[ https://issues.apache.org/jira/browse/DRILL-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147610#comment-15147610 ]
Victoria Markman commented on DRILL-4201: ----------------------------------------- I verified that partial filter is getting pushed down, however it is not going to happen always. It depends on the costing and heuristic there is a bit tricky. In the case below, filter is not going to be pushed pass project, because file vicky.json contains only 2 rows: {code} 0: jdbc:drill:schema=dfs> explain plan for select . . . . . . . . . . . . > * . . . . . . . . . . . . > from . . . . . . . . . . . . > hive.lineitem_text_hive l . . . . . . . . . . . . > inner join . . . . . . . . . . . . > ( select . . . . . . . . . . . . > flatten(test) as test, . . . . . . . . . . . . > o_orderkey as orderkey . . . . . . . . . . . . > from . . . . . . . . . . . . > dfs.`/drill/testdata/Tpch0.01/json/orders/vicky.json`) as o . . . . . . . . . . . . > on ( l.l_orderkey = o.orderkey ) . . . . . . . . . . . . > where test = 1 and o.orderkey = 22; +------+------+ | text | json | +------+------+ | 00-00 Screen 00-01 Project(l_orderkey=[$0], l_partkey=[$1], l_suppkey=[$2], l_linenumber=[$3], l_quantity=[$4], l_extendedprice=[$5], l_discount=[$6], l_tax=[$7], l_returnflag=[$8], l_linestatus=[$9], l_shipdate=[$10], l_commitdate=[$11], l_receiptdate=[$12], l_shipinstruct=[$13], l_shipmode=[$14], l_comment=[$15], test=[$16], orderkey=[$17]) 00-02 Project(l_orderkey=[$0], l_partkey=[$1], l_suppkey=[$2], l_linenumber=[$3], l_quantity=[$4], l_extendedprice=[$5], l_discount=[$6], l_tax=[$7], l_returnflag=[$8], l_linestatus=[$9], l_shipdate=[$10], l_commitdate=[$11], l_receiptdate=[$12], l_shipinstruct=[$13], l_shipmode=[$14], l_comment=[$15], test=[$16], orderkey=[$17]) 00-03 HashJoin(condition=[=($0, $17)], joinType=[inner]) 00-05 Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:lineitem_text_hive), columns=[`*`], numPartitions=0, partitions= null, inputDirectories=[maprfs:/drill/testdata/partition_pruning/hive/text/lineitem]]]) 00-04 SelectionVectorRemover 00-06 Filter(condition=[AND(=($0, 1), =($1, 22))]) 00-07 Flatten(flattenField=[$0]) 00-08 Project(test=[$1], orderkey=[$0]) 00-09 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/Tpch0.01/json/orders/vicky.json, numFiles=1, columns=[`test`, `o_orderkey`], files=[maprfs:///drill/testdata/Tpch0.01/json/orders/vicky.json]]]) {code} It's not going to be pushed pass project even if I add 40 columns to be projected (json file with 2 rows): {code} 0: jdbc:drill:schema=dfs> explain plan for select . . . . . . . . . . . . > * . . . . . . . . . . . . > from . . . . . . . . . . . . > hive.lineitem_text_hive l . . . . . . . . . . . . > inner join . . . . . . . . . . . . > ( select . . . . . . . . . . . . > flatten(test) as test, . . . . . . . . . . . . > o_orderkey as orderkey, . . . . . . . . . . . . > o_orderkey + 1 as o1, . . . . . . . . . . . . > o_orderkey + 2 as o2, . . . . . . . . . . . . > o_orderkey + 3 as o3, . . . . . . . . . . . . > o_orderkey + 4 as o4, . . . . . . . . . . . . > o_orderkey + 5 as o5, . . . . . . . . . . . . > o_orderkey + 6 as o6, . . . . . . . . . . . . > o_orderkey + 7 as o7, . . . . . . . . . . . . > o_orderkey + 8 as o8, . . . . . . . . . . . . > o_orderkey + 9 as o9, . . . . . . . . . . . . > o_orderkey + 10 as o10, . . . . . . . . . . . . > o_orderkey + 11 as o11, . . . . . . . . . . . . > o_orderkey + 12 as o12, . . . . . . . . . . . . > o_orderkey + 13 as o13, . . . . . . . . . . . . > o_orderkey + 14 as o14, . . . . . . . . . . . . > o_orderkey + 15 as o15, . . . . . . . . . . . . > o_orderkey + 16 as o16, . . . . . . . . . . . . > o_orderkey + 17 as o17, . . . . . . . . . . . . > o_orderkey + 18 as o18, . . . . . . . . . . . . > o_orderkey + 19 as o19, . . . . . . . . . . . . > o_orderkey + 20 as o20, . . . . . . . . . . . . > o_orderkey + 21 as o21, . . . . . . . . . . . . > o_orderkey + 22 as o22, . . . . . . . . . . . . > o_orderkey + 23 as o23, . . . . . . . . . . . . > o_orderkey + 24 as o24, . . . . . . . . . . . . > o_orderkey + 25 as o25, . . . . . . . . . . . . > o_orderkey + 26 as o26, . . . . . . . . . . . . > o_orderkey + 27 as o27, . . . . . . . . . . . . > o_orderkey + 28 as o28, . . . . . . . . . . . . > o_orderkey + 29 as o29, . . . . . . . . . . . . > o_orderkey + 30 as o30, . . . . . . . . . . . . > o_orderkey + 31 as o31, . . . . . . . . . . . . > o_orderkey + 32 as o32, . . . . . . . . . . . . > o_orderkey + 33 as o33, . . . . . . . . . . . . > o_orderkey + 34 as o34, . . . . . . . . . . . . > o_orderkey + 35 as o35, . . . . . . . . . . . . > o_orderkey + 36 as o36, . . . . . . . . . . . . > o_orderkey + 37 as o37, . . . . . . . . . . . . > o_orderkey + 38 as o38, . . . . . . . . . . . . > o_orderkey + 39 as o39, . . . . . . . . . . . . > o_orderkey + 40 as o40 . . . . . . . . . . . . > from . . . . . . . . . . . . > dfs.`/drill/testdata/Tpch0.01/json/orders/vicky.json`) as o . . . . . . . . . . . . > on ( l.l_orderkey = o.orderkey) . . . . . . . . . . . . > where test = 1 and o.orderkey = 22; +------+------+ | text | json | +------+------+ | 00-00 Screen 00-01 Project(l_orderkey=[$0], l_partkey=[$1], l_suppkey=[$2], l_linenumber=[$3], l_quantity=[$4], l_extendedprice=[$5], l_discount=[$6], l_tax=[$7], l_returnflag=[$8], l_linestatus=[$9], l_shipdate=[$10], l_commitdate=[$11], l_receiptdate=[$12], l_shipinstruct=[$13], l_shipmode=[$14], l_comment=[$15], test=[$16], orderkey=[$17], o1=[$18], o2=[$19], o3=[$20], o4=[$21], o5=[$22], o6=[$23], o7=[$24], o8=[$25], o9=[$26], o10=[$27], o11=[$28], o12=[$29], o13=[$30], o14=[$31], o15=[$32], o16=[$33], o17=[$34], o18=[$35], o19=[$36], o20=[$37], o21=[$38], o22=[$39], o23=[$40], o24=[$41], o25=[$42], o26=[$43], o27=[$44], o28=[$45], o29=[$46], o30=[$47], o31=[$48], o32=[$49], o33=[$50], o34=[$51], o35=[$52], o36=[$53], o37=[$54], o38=[$55], o39=[$56], o40=[$57]) 00-02 Project(l_orderkey=[$0], l_partkey=[$1], l_suppkey=[$2], l_linenumber=[$3], l_quantity=[$4], l_extendedprice=[$5], l_discount=[$6], l_tax=[$7], l_returnflag=[$8], l_linestatus=[$9], l_shipdate=[$10], l_commitdate=[$11], l_receiptdate=[$12], l_shipinstruct=[$13], l_shipmode=[$14], l_comment=[$15], test=[$16], orderkey=[$17], o1=[$18], o2=[$19], o3=[$20], o4=[$21], o5=[$22], o6=[$23], o7=[$24], o8=[$25], o9=[$26], o10=[$27], o11=[$28], o12=[$29], o13=[$30], o14=[$31], o15=[$32], o16=[$33], o17=[$34], o18=[$35], o19=[$36], o20=[$37], o21=[$38], o22=[$39], o23=[$40], o24=[$41], o25=[$42], o26=[$43], o27=[$44], o28=[$45], o29=[$46], o30=[$47], o31=[$48], o32=[$49], o33=[$50], o34=[$51], o35=[$52], o36=[$53], o37=[$54], o38=[$55], o39=[$56], o40=[$57]) 00-03 HashJoin(condition=[=($0, $17)], joinType=[inner]) 00-05 Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:lineitem_text_hive), columns=[`*`], numPartitions=0, partitions= null, inputDirectories=[maprfs:/drill/testdata/partition_pruning/hive/text/lineitem]]]) 00-04 SelectionVectorRemover 00-06 Filter(condition=[AND(=($0, 1), =($1, 22))]) 00-07 Flatten(flattenField=[$0]) 00-08 Project(test=[$1], orderkey=[$0], o1=[+($0, 1)], o2=[+($0, 2)], o3=[+($0, 3)], o4=[+($0, 4)], o5=[+($0, 5)], o6=[+($0, 6)], o7=[+($0, 7)], o8=[+($0, 8)], o9=[+($0, 9)], o10=[+($0, 10)], o11=[+($0, 11)], o12=[+($0, 12)], o13=[+($0, 13)], o14=[+($0, 14)], o15=[+($0, 15)], o16=[+($0, 16)], o17=[+($0, 17)], o18=[+($0, 18)], o19=[+($0, 19)], o20=[+($0, 20)], o21=[+($0, 21)], o22=[+($0, 22)], o23=[+($0, 23)], o24=[+($0, 24)], o25=[+($0, 25)], o26=[+($0, 26)], o27=[+($0, 27)], o28=[+($0, 28)], o29=[+($0, 29)], o30=[+($0, 30)], o31=[+($0, 31)], o32=[+($0, 32)], o33=[+($0, 33)], o34=[+($0, 34)], o35=[+($0, 35)], o36=[+($0, 36)], o37=[+($0, 37)], o38=[+($0, 38)], o39=[+($0, 39)], o40=[+($0, 40)]) 00-09 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/Tpch0.01/json/orders/vicky.json, numFiles=1, columns=[`test`, `o_orderkey`], files=[maprfs:///drill/testdata/Tpch0.01/json/orders/vicky.json]]]) {code} Filter pushdown is triggered when size of the table increased to 300+ rows: {code} 0: jdbc:drill:schema=dfs> explain plan for select . . . . . . . . . . . . > * . . . . . . . . . . . . > from . . . . . . . . . . . . > hive.lineitem_text_hive l . . . . . . . . . . . . > inner join . . . . . . . . . . . . > ( select . . . . . . . . . . . . > flatten(test) as test, . . . . . . . . . . . . > o_orderkey as orderkey . . . . . . . . . . . . > from . . . . . . . . . . . . > dfs.`/drill/testdata/Tpch0.01/json/orders/orders_1.json`) as o . . . . . . . . . . . . > on ( l.l_orderkey = o.orderkey ) . . . . . . . . . . . . > where test = 1 and o.orderkey = 22; +------+------+ | text | json | +------+------+ | 00-00 Screen 00-01 Project(l_orderkey=[$0], l_partkey=[$1], l_suppkey=[$2], l_linenumber=[$3], l_quantity=[$4], l_extendedprice=[$5], l_discount=[$6], l_tax=[$7], l_returnflag=[$8], l_linestatus=[$9], l_shipdate=[$10], l_commitdate=[$11], l_receiptdate=[$12], l_shipinstruct=[$13], l_shipmode=[$14], l_comment=[$15], test=[$16], orderkey=[$17]) 00-02 Project(l_orderkey=[$0], l_partkey=[$1], l_suppkey=[$2], l_linenumber=[$3], l_quantity=[$4], l_extendedprice=[$5], l_discount=[$6], l_tax=[$7], l_returnflag=[$8], l_linestatus=[$9], l_shipdate=[$10], l_commitdate=[$11], l_receiptdate=[$12], l_shipinstruct=[$13], l_shipmode=[$14], l_comment=[$15], test=[$16], orderkey=[$17]) 00-03 HashJoin(condition=[=($0, $17)], joinType=[inner]) 00-05 Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:lineitem_text_hive), columns=[`*`], numPartitions=0, partitions= null, inputDirectories=[maprfs:/drill/testdata/partition_pruning/hive/text/lineitem]]]) 00-04 SelectionVectorRemover 00-06 Filter(condition=[=($0, 1)]) 00-07 Flatten(flattenField=[$0]) 00-08 Project(test=[$0], orderkey=[$1]) 00-09 SelectionVectorRemover 00-10 Filter(condition=[=($1, 22)]) 00-11 Project(test=[$1], o_orderkey=[$0]) 00-12 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/Tpch0.01/json/orders/orders_1.json, numFiles=1, columns=[`test`, `o_orderkey`], files=[maprfs:///drill/testdata/Tpch0.01/json/orders/orders_1.json]]]) {code} I will add this test case to the test suite. It is different from Jinfeng's unit tests, he is testing with directory labels, which is a bit different code pass. In my opinion, we should revisit test coverage in this area after we switch to HEP planner (DRILL-3996). Will file a separate enhancement for that. Consider this fixed and verified in: {code} #Generated by Git-Commit-Id-Plugin #Sat Feb 13 00:36:18 UTC 2016 git.commit.id.abbrev=0a2518d git.commit.user.email=j...@apache.org git.commit.message.full=DRILL-4363\: Row count based pruning for parquet table used in Limit n query.\n\nModify two existint unit testcase\:\n1) TestPartitionFilter.testMainQueryFalseCondition()\: rowCount pruning applied after false condition is transformed into LIMIT 0\n2) TestLimitWithExchanges.testPushLimitPastUnionExchange()\: modify the testcase to use Json source, so that it does not mix with PushLimitIntoScanRule.\n git.commit.id=0a2518d7cf01a92a27a82e29edac5424bedf31d5 git.commit.message.short=DRILL-4363\: Row count based pruning for parquet table used in Limit n query. git.commit.user.name=Jinfeng Ni git.build.user.name=vmarkman git.commit.id.describe=0.9.0-602-g0a2518d git.build.user.email=vmark...@maprtech.com git.branch=0a2518d7cf01a92a27a82e29edac5424bedf31d5 git.commit.time=11.02.2016 @ 23\:01\:15 UTC git.build.time=13.02.2016 @ 00\:36\:18 UTC git.remote.origin.url=g...@github.com\:apache/drill.git {code} > DrillPushFilterPastProject should allow partial filter pushdown. > ----------------------------------------------------------------- > > Key: DRILL-4201 > URL: https://issues.apache.org/jira/browse/DRILL-4201 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization > Reporter: Jinfeng Ni > Assignee: Jinfeng Ni > Fix For: 1.5.0 > > > Currently, DrillPushFilterPastProjectRule will stop pushing the filter down, > if the filter itself has ITEM or FLATTEN function, or its input reference is > referring to an ITEM or FLATTEN function. However, in case that the filter is > a conjunction of multiple sub-filters, some of them refer to ITEM or FLATTEN > but the other not, then we should allow partial filter to be pushed down. For > instance, > WHERE partition_col > 10 and flatten_output_col = 'ABC'. > The "flatten_output_col" comes from the output of FLATTEN operator, and > therefore flatten_output_col = 'ABC' should not pushed past the project. But > partiion_col > 10 should be pushed down, such that we could trigger the > pruning rule to apply partition pruning. > It would be improve Drill query performance, when the partially pushed filter > leads to partition pruning, or the partially pushed filter results in early > filtering in upstream operator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)