[ https://issues.apache.org/jira/browse/DRILL-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147610#comment-15147610 ]

Victoria Markman commented on DRILL-4201:
-----------------------------------------

I verified that the partial filter is getting pushed down; however, it does not 
always happen. It depends on costing, and the heuristic there is a bit tricky.
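
For context, here is a minimal sketch of the kind of conjunctive predicate the 
rule splits (the table path is illustrative, not from the test data below): only 
the conjunct that does not reference the FLATTEN output is eligible to be pushed 
past the project.
{code}
-- Illustrative sketch (hypothetical path). Of the two conjuncts below,
-- orderkey = 22 refers only to a pass-through column and may be pushed
-- past the project, while test = 1 refers to FLATTEN output and must
-- stay above it.
select *
from (  select  flatten(test)   as test,
                o_orderkey      as orderkey
        from    dfs.`/path/to/orders.json` ) as o
where o.orderkey = 22   -- pushable conjunct
  and o.test = 1;       -- non-pushable conjunct
{code}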

In the case below, the filter is not pushed past the project, because the file 
vicky.json contains only 2 rows:
{code}
0: jdbc:drill:schema=dfs> explain plan for select
. . . . . . . . . . . . >         *
. . . . . . . . . . . . > from
. . . . . . . . . . . . >         hive.lineitem_text_hive l
. . . . . . . . . . . . >         inner join
. . . . . . . . . . . . >         ( select
. . . . . . . . . . . . >                 flatten(test)   as test,
. . . . . . . . . . . . >                 o_orderkey      as orderkey
. . . . . . . . . . . . >         from
. . . . . . . . . . . . >                 
dfs.`/drill/testdata/Tpch0.01/json/orders/vicky.json`) as o
. . . . . . . . . . . . > on ( l.l_orderkey = o.orderkey )
. . . . . . . . . . . . > where test = 1 and o.orderkey = 22;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(l_orderkey=[$0], l_partkey=[$1], l_suppkey=[$2], 
l_linenumber=[$3], l_quantity=[$4], l_extendedprice=[$5], l_discount=[$6], 
l_tax=[$7], l_returnflag=[$8], l_linestatus=[$9], l_shipdate=[$10], 
l_commitdate=[$11], l_receiptdate=[$12], l_shipinstruct=[$13], 
l_shipmode=[$14], l_comment=[$15], test=[$16], orderkey=[$17])
00-02        Project(l_orderkey=[$0], l_partkey=[$1], l_suppkey=[$2], 
l_linenumber=[$3], l_quantity=[$4], l_extendedprice=[$5], l_discount=[$6], 
l_tax=[$7], l_returnflag=[$8], l_linestatus=[$9], l_shipdate=[$10], 
l_commitdate=[$11], l_receiptdate=[$12], l_shipinstruct=[$13], 
l_shipmode=[$14], l_comment=[$15], test=[$16], orderkey=[$17])
00-03          HashJoin(condition=[=($0, $17)], joinType=[inner])
00-05            Scan(groupscan=[HiveScan [table=Table(dbName:default, 
tableName:lineitem_text_hive), columns=[`*`], numPartitions=0, partitions= 
null, 
inputDirectories=[maprfs:/drill/testdata/partition_pruning/hive/text/lineitem]]])
00-04            SelectionVectorRemover
00-06              Filter(condition=[AND(=($0, 1), =($1, 22))])
00-07                Flatten(flattenField=[$0])
00-08                  Project(test=[$1], orderkey=[$0])
00-09                    Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/drill/testdata/Tpch0.01/json/orders/vicky.json, 
numFiles=1, columns=[`test`, `o_orderkey`], 
files=[maprfs:///drill/testdata/Tpch0.01/json/orders/vicky.json]]])
{code}

It is not pushed past the project even if I add 40 columns to be projected 
(JSON file with 2 rows):
{code}
0: jdbc:drill:schema=dfs> explain plan for select
. . . . . . . . . . . . >         *
. . . . . . . . . . . . > from
. . . . . . . . . . . . >         hive.lineitem_text_hive l
. . . . . . . . . . . . >         inner join
. . . . . . . . . . . . >         ( select
. . . . . . . . . . . . >                 flatten(test)           as test,
. . . . . . . . . . . . >                 o_orderkey              as orderkey,
. . . . . . . . . . . . >                 o_orderkey + 1          as o1,
. . . . . . . . . . . . >                 o_orderkey + 2          as o2,
. . . . . . . . . . . . >                 o_orderkey + 3          as o3,
. . . . . . . . . . . . >                 o_orderkey + 4          as o4,
. . . . . . . . . . . . >                 o_orderkey + 5          as o5,
. . . . . . . . . . . . >                 o_orderkey + 6          as o6,
. . . . . . . . . . . . >                 o_orderkey + 7          as o7,
. . . . . . . . . . . . >                 o_orderkey + 8          as o8,
. . . . . . . . . . . . >                 o_orderkey + 9          as o9,
. . . . . . . . . . . . >                 o_orderkey + 10         as o10,
. . . . . . . . . . . . >                 o_orderkey + 11         as o11,
. . . . . . . . . . . . >                 o_orderkey + 12         as o12,
. . . . . . . . . . . . >                 o_orderkey + 13         as o13,
. . . . . . . . . . . . >                 o_orderkey + 14         as o14,
. . . . . . . . . . . . >                 o_orderkey + 15         as o15,
. . . . . . . . . . . . >                 o_orderkey + 16         as o16,
. . . . . . . . . . . . >                 o_orderkey + 17         as o17,
. . . . . . . . . . . . >                 o_orderkey + 18         as o18,
. . . . . . . . . . . . >                 o_orderkey + 19         as o19,
. . . . . . . . . . . . >                 o_orderkey + 20         as o20,
. . . . . . . . . . . . >                 o_orderkey + 21         as o21,
. . . . . . . . . . . . >                 o_orderkey + 22         as o22,
. . . . . . . . . . . . >                 o_orderkey + 23         as o23,
. . . . . . . . . . . . >                 o_orderkey + 24         as o24,
. . . . . . . . . . . . >                 o_orderkey + 25         as o25,
. . . . . . . . . . . . >                 o_orderkey + 26         as o26,
. . . . . . . . . . . . >                 o_orderkey + 27         as o27,
. . . . . . . . . . . . >                 o_orderkey + 28         as o28,
. . . . . . . . . . . . >                 o_orderkey + 29         as o29,
. . . . . . . . . . . . >                 o_orderkey + 30         as o30,
. . . . . . . . . . . . >                 o_orderkey + 31         as o31,
. . . . . . . . . . . . >                 o_orderkey + 32         as o32,
. . . . . . . . . . . . >                 o_orderkey + 33         as o33,
. . . . . . . . . . . . >                 o_orderkey + 34         as o34,
. . . . . . . . . . . . >                 o_orderkey + 35         as o35,
. . . . . . . . . . . . >                 o_orderkey + 36         as o36,
. . . . . . . . . . . . >                 o_orderkey + 37         as o37,
. . . . . . . . . . . . >                 o_orderkey + 38         as o38,
. . . . . . . . . . . . >                 o_orderkey + 39         as o39,
. . . . . . . . . . . . >                 o_orderkey + 40         as o40
. . . . . . . . . . . . >         from
. . . . . . . . . . . . >                 
dfs.`/drill/testdata/Tpch0.01/json/orders/vicky.json`) as o
. . . . . . . . . . . . > on ( l.l_orderkey = o.orderkey)
. . . . . . . . . . . . > where test = 1 and o.orderkey = 22;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(l_orderkey=[$0], l_partkey=[$1], l_suppkey=[$2], 
l_linenumber=[$3], l_quantity=[$4], l_extendedprice=[$5], l_discount=[$6], 
l_tax=[$7], l_returnflag=[$8], l_linestatus=[$9], l_shipdate=[$10], 
l_commitdate=[$11], l_receiptdate=[$12], l_shipinstruct=[$13], 
l_shipmode=[$14], l_comment=[$15], test=[$16], orderkey=[$17], o1=[$18], 
o2=[$19], o3=[$20], o4=[$21], o5=[$22], o6=[$23], o7=[$24], o8=[$25], o9=[$26], 
o10=[$27], o11=[$28], o12=[$29], o13=[$30], o14=[$31], o15=[$32], o16=[$33], 
o17=[$34], o18=[$35], o19=[$36], o20=[$37], o21=[$38], o22=[$39], o23=[$40], 
o24=[$41], o25=[$42], o26=[$43], o27=[$44], o28=[$45], o29=[$46], o30=[$47], 
o31=[$48], o32=[$49], o33=[$50], o34=[$51], o35=[$52], o36=[$53], o37=[$54], 
o38=[$55], o39=[$56], o40=[$57])
00-02        Project(l_orderkey=[$0], l_partkey=[$1], l_suppkey=[$2], 
l_linenumber=[$3], l_quantity=[$4], l_extendedprice=[$5], l_discount=[$6], 
l_tax=[$7], l_returnflag=[$8], l_linestatus=[$9], l_shipdate=[$10], 
l_commitdate=[$11], l_receiptdate=[$12], l_shipinstruct=[$13], 
l_shipmode=[$14], l_comment=[$15], test=[$16], orderkey=[$17], o1=[$18], 
o2=[$19], o3=[$20], o4=[$21], o5=[$22], o6=[$23], o7=[$24], o8=[$25], o9=[$26], 
o10=[$27], o11=[$28], o12=[$29], o13=[$30], o14=[$31], o15=[$32], o16=[$33], 
o17=[$34], o18=[$35], o19=[$36], o20=[$37], o21=[$38], o22=[$39], o23=[$40], 
o24=[$41], o25=[$42], o26=[$43], o27=[$44], o28=[$45], o29=[$46], o30=[$47], 
o31=[$48], o32=[$49], o33=[$50], o34=[$51], o35=[$52], o36=[$53], o37=[$54], 
o38=[$55], o39=[$56], o40=[$57])
00-03          HashJoin(condition=[=($0, $17)], joinType=[inner])
00-05            Scan(groupscan=[HiveScan [table=Table(dbName:default, 
tableName:lineitem_text_hive), columns=[`*`], numPartitions=0, partitions= 
null, 
inputDirectories=[maprfs:/drill/testdata/partition_pruning/hive/text/lineitem]]])
00-04            SelectionVectorRemover
00-06              Filter(condition=[AND(=($0, 1), =($1, 22))])
00-07                Flatten(flattenField=[$0])
00-08                  Project(test=[$1], orderkey=[$0], o1=[+($0, 1)], 
o2=[+($0, 2)], o3=[+($0, 3)], o4=[+($0, 4)], o5=[+($0, 5)], o6=[+($0, 6)], 
o7=[+($0, 7)], o8=[+($0, 8)], o9=[+($0, 9)], o10=[+($0, 10)], o11=[+($0, 11)], 
o12=[+($0, 12)], o13=[+($0, 13)], o14=[+($0, 14)], o15=[+($0, 15)], o16=[+($0, 
16)], o17=[+($0, 17)], o18=[+($0, 18)], o19=[+($0, 19)], o20=[+($0, 20)], 
o21=[+($0, 21)], o22=[+($0, 22)], o23=[+($0, 23)], o24=[+($0, 24)], o25=[+($0, 
25)], o26=[+($0, 26)], o27=[+($0, 27)], o28=[+($0, 28)], o29=[+($0, 29)], 
o30=[+($0, 30)], o31=[+($0, 31)], o32=[+($0, 32)], o33=[+($0, 33)], o34=[+($0, 
34)], o35=[+($0, 35)], o36=[+($0, 36)], o37=[+($0, 37)], o38=[+($0, 38)], 
o39=[+($0, 39)], o40=[+($0, 40)])
00-09                    Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/drill/testdata/Tpch0.01/json/orders/vicky.json, 
numFiles=1, columns=[`test`, `o_orderkey`], 
files=[maprfs:///drill/testdata/Tpch0.01/json/orders/vicky.json]]])
{code}

Filter pushdown is triggered when the size of the table is increased to 300+ rows:
{code}
0: jdbc:drill:schema=dfs> explain plan for select
. . . . . . . . . . . . >         *
. . . . . . . . . . . . > from
. . . . . . . . . . . . >         hive.lineitem_text_hive l
. . . . . . . . . . . . >         inner join
. . . . . . . . . . . . >         ( select
. . . . . . . . . . . . >                 flatten(test)           as test,
. . . . . . . . . . . . >                 o_orderkey              as orderkey
. . . . . . . . . . . . > from
. . . . . . . . . . . . >                 
dfs.`/drill/testdata/Tpch0.01/json/orders/orders_1.json`) as o
. . . . . . . . . . . . > on ( l.l_orderkey = o.orderkey )
. . . . . . . . . . . . > where test = 1 and o.orderkey = 22;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(l_orderkey=[$0], l_partkey=[$1], l_suppkey=[$2], 
l_linenumber=[$3], l_quantity=[$4], l_extendedprice=[$5], l_discount=[$6], 
l_tax=[$7], l_returnflag=[$8], l_linestatus=[$9], l_shipdate=[$10], 
l_commitdate=[$11], l_receiptdate=[$12], l_shipinstruct=[$13], 
l_shipmode=[$14], l_comment=[$15], test=[$16], orderkey=[$17])
00-02        Project(l_orderkey=[$0], l_partkey=[$1], l_suppkey=[$2], 
l_linenumber=[$3], l_quantity=[$4], l_extendedprice=[$5], l_discount=[$6], 
l_tax=[$7], l_returnflag=[$8], l_linestatus=[$9], l_shipdate=[$10], 
l_commitdate=[$11], l_receiptdate=[$12], l_shipinstruct=[$13], 
l_shipmode=[$14], l_comment=[$15], test=[$16], orderkey=[$17])
00-03          HashJoin(condition=[=($0, $17)], joinType=[inner])
00-05            Scan(groupscan=[HiveScan [table=Table(dbName:default, 
tableName:lineitem_text_hive), columns=[`*`], numPartitions=0, partitions= 
null, 
inputDirectories=[maprfs:/drill/testdata/partition_pruning/hive/text/lineitem]]])
00-04            SelectionVectorRemover
00-06              Filter(condition=[=($0, 1)])
00-07                Flatten(flattenField=[$0])
00-08                  Project(test=[$0], orderkey=[$1])
00-09                    SelectionVectorRemover
00-10                      Filter(condition=[=($1, 22)])
00-11                        Project(test=[$1], o_orderkey=[$0])
00-12                          Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/drill/testdata/Tpch0.01/json/orders/orders_1.json, 
numFiles=1, columns=[`test`, `o_orderkey`], 
files=[maprfs:///drill/testdata/Tpch0.01/json/orders/orders_1.json]]])
{code}

I will add this test case to the test suite. It is different from Jinfeng's 
unit tests: he is testing with directory labels, which exercises a slightly 
different code path.
In my opinion, we should revisit test coverage in this area after we switch to 
the HEP planner (DRILL-3996). I will file a separate enhancement for that.

Consider this fixed and verified in:
{code}
#Generated by Git-Commit-Id-Plugin
#Sat Feb 13 00:36:18 UTC 2016
git.commit.id.abbrev=0a2518d
git.commit.user.email=j...@apache.org
git.commit.message.full=DRILL-4363\: Row count based pruning for parquet table 
used in Limit n query.\n\nModify two existint unit testcase\:\n1) 
TestPartitionFilter.testMainQueryFalseCondition()\: rowCount pruning applied 
after false condition is transformed into LIMIT 0\n2) 
TestLimitWithExchanges.testPushLimitPastUnionExchange()\: modify the testcase 
to use Json source, so that it does not mix with PushLimitIntoScanRule.\n
git.commit.id=0a2518d7cf01a92a27a82e29edac5424bedf31d5
git.commit.message.short=DRILL-4363\: Row count based pruning for parquet table 
used in Limit n query.
git.commit.user.name=Jinfeng Ni
git.build.user.name=vmarkman
git.commit.id.describe=0.9.0-602-g0a2518d
git.build.user.email=vmark...@maprtech.com
git.branch=0a2518d7cf01a92a27a82e29edac5424bedf31d5
git.commit.time=11.02.2016 @ 23\:01\:15 UTC
git.build.time=13.02.2016 @ 00\:36\:18 UTC
git.remote.origin.url=g...@github.com\:apache/drill.git
{code}

> DrillPushFilterPastProject should allow partial filter pushdown. 
> -----------------------------------------------------------------
>
>                 Key: DRILL-4201
>                 URL: https://issues.apache.org/jira/browse/DRILL-4201
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>             Fix For: 1.5.0
>
>
> Currently, DrillPushFilterPastProjectRule will stop pushing the filter down 
> if the filter itself has an ITEM or FLATTEN function, or its input reference 
> refers to an ITEM or FLATTEN function. However, when the filter is a 
> conjunction of multiple sub-filters, some of which refer to ITEM or FLATTEN 
> while the others do not, we should allow the partial filter to be pushed 
> down. For instance,
> WHERE  partition_col > 10 and flatten_output_col = 'ABC'. 
> The "flatten_output_col" comes from the output of the FLATTEN operator, and 
> therefore flatten_output_col = 'ABC' should not be pushed past the project. 
> But partition_col > 10 should be pushed down, so that we can trigger the 
> pruning rule to apply partition pruning.
> It would improve Drill query performance when the partially pushed filter 
> leads to partition pruning, or when it results in early filtering in the 
> upstream operator. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)