Hello,

I am migrating my Spark SQL external datasource integration from Spark
1.2.x to Spark 1.3.

I noticed there are a couple of new filters now, e.g.
org.apache.spark.sql.sources.And.
However, for a SQL query with the condition "A AND B",
PrunedFilteredScan.buildScan
still gets an Array[Filter] containing the two filters A and B, whereas I
had expected a single "And" filter with left == A and right == B.

So my first question is: where can I find the "rules" for how a
SQL condition is converted into the filters passed to PrunedFilteredScan.buildScan?
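
For reference, here is roughly what I am seeing, as a minimal sketch of my
relation (class name, schema, and the example predicate are placeholders):

  import org.apache.spark.rdd.RDD
  import org.apache.spark.sql.{Row, SQLContext}
  import org.apache.spark.sql.sources._
  import org.apache.spark.sql.types.StructType

  class MyRelation(val sqlContext: SQLContext) extends BaseRelation
      with PrunedFilteredScan {

    // Real schema omitted for brevity.
    override def schema: StructType = StructType(Nil)

    override def buildScan(requiredColumns: Array[String],
                           filters: Array[Filter]): RDD[Row] = {
      // For "WHERE a = 1 AND b > 2" this receives two top-level filters,
      // e.g. Array(EqualTo("a", 1), GreaterThan("b", 2)), rather than a
      // single And(EqualTo("a", 1), GreaterThan("b", 2)).
      filters.foreach(println)
      sqlContext.sparkContext.emptyRDD[Row]
    }
  }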

I do like what I see in the And, Or, and Not filters, which allow recursively
nested definitions to connect filters together (see the sketch below). If this
is the direction we are heading, my second question is: could buildScan take a
single Filter object instead of an Array[Filter]?
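
What I mean is that a whole condition such as "(a = 1 AND b > 2) OR c IS NOT
NULL" can already be expressed as one nested Filter value (attribute names
here are just examples):

  import org.apache.spark.sql.sources._

  val whole: Filter =
    Or(
      And(EqualTo("a", 1), GreaterThan("b", 2)),
      IsNotNull("c"))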

My third question is: what is our plan for allowing a relation provider to
inform Spark which filters it has already handled, so that there is
no redundant filtering?
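
Purely as a hypothetical (I know this is not in the 1.3 API), I am imagining
some contract along these lines, where the relation reports back which filters
it could not handle so Spark knows what still needs re-evaluation:

  import org.apache.spark.sql.sources.Filter

  // Hypothetical hook, not part of the current sources API.
  trait ReportsHandledFilters {
    def unhandledFilters(filters: Array[Filter]): Array[Filter]
  }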

I'd appreciate comments and links to any existing documentation or discussion.


Yang
