[ https://issues.apache.org/jira/browse/DRILL-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Barclay (Drill) updated DRILL-2735: ------------------------------------------ Summary: Broadcast plan gets "lost" when the same query is used in UNION ALL (was: Broadcast plan get's "lost" when the same query is used in UNION ALL) > Broadcast plan gets "lost" when the same query is used in UNION ALL > ------------------------------------------------------------------- > > Key: DRILL-2735 > URL: https://issues.apache.org/jira/browse/DRILL-2735 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization > Affects Versions: 0.9.0 > Reporter: Victoria Markman > Assignee: Jinfeng Ni > Fix For: 1.2.0 > > Attachments: j1_j2_tables.tar > > > I get a broadcast plan for simple inner join query. > {code} > 0: jdbc:drill:schema=dfs> explain plan for select j1.c_integer from j1, j2 > where j1.c_integer = j2.c_integer; > +------------+------------+ > | text | json | > +------------+------------+ > | 00-00 Screen > 00-01 UnionExchange > 01-01 Project(c_integer=[$0]) > 01-02 HashJoin(condition=[=($0, $1)], joinType=[inner]) > 01-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=maprfs:/drill/testdata/ctas/j1]], > selectionRoot=/drill/testdata/ctas/j1, numFiles=1, columns=[`c_integer`]]]) > 01-03 Project(c_integer0=[$0]) > 01-05 BroadcastExchange > 02-01 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j2]], > selectionRoot=/drill/testdata/ctas/j2, numFiles=1, columns=[`c_integer`]]]) > | { > "head" : { > "version" : 1, > "generator" : { > "type" : "ExplainHandler", > "info" : "" > }, > "type" : "APACHE_DRILL_PHYSICAL", > "options" : [ { > "name" : "planner.broadcast_factor", > "kind" : "DOUBLE", > "type" : "SESSION", > "float_val" : 0.0 > }, { > "name" : "planner.slice_target", > "kind" : "LONG", > "type" : "SESSION", > "num_val" : 1 > } ], > {code} > Create table succeeds and multiple fragments are executed: > {code} > 0: jdbc:drill:schema=dfs> create table test(a1) as select j1.c_integer from > j1, j2 where j1.c_integer = j2.c_integer; > +------------+---------------------------+ > | Fragment | Number of records written | > +------------+---------------------------+ > | 1_1 | 0 | > | 1_3 | 0 | > | 1_31 | 0 | > | 1_43 | 0 | > | 1_35 | 0 | > | 1_21 | 0 | > | 1_19 | 0 | > | 1_27 | 1 | > | 1_17 | 1 | > | 1_13 | 0 | > | 1_29 | 0 | > | 1_33 | 0 | > | 1_25 | 0 | > | 1_7 | 0 | > | 1_11 | 0 | > | 1_37 | 0 | > | 1_45 | 0 | > | 1_9 | 0 | > | 1_23 | 1 | > | 1_15 | 0 | > | 1_41 | 0 | > | 1_39 | 0 | > | 1_5 | 0 | > | 1_10 | 0 | > | 1_14 | 0 | > | 1_24 | 0 | > | 1_16 | 0 | > | 1_12 | 0 | > | 1_36 | 0 | > | 1_20 | 0 | > | 1_34 | 1 | > | 1_40 | 0 | > | 1_22 | 0 | > | 1_26 | 0 | > | 1_32 | 1 | > | 1_8 | 0 | > | 1_18 | 0 | > | 1_42 | 0 | > | 1_44 | 0 | > | 1_38 | 0 | > | 1_30 | 0 | > | 1_28 | 1 | > | 1_4 | 10 | > | 1_2 | 1 | > | 1_6 | 0 | > | 1_0 | 0 | > +------------+---------------------------+ > 46 rows selected (2.337 seconds) > {code} > 8 parquet files are written: > {code} > [Wed Apr 08 11:41:10 root@/mapr/vmarkman.cluster.com/drill/testdata/ctas/test > ] # ls -ltr > total 4 > -rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_17_0.parquet > -rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_27_0.parquet > -rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_23_0.parquet > -rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_34_0.parquet > -rwxr-xr-x 1 mapr mapr 185 Apr 8 11:40 1_4_0.parquet > -rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_32_0.parquet > -rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_28_0.parquet > -rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_2_0.parquet > {code} > However: > 1. broadcast join is not planned when the same join query used in legs of the > union all oprartor: > {code} > 0: jdbc:drill:schema=dfs> explain plan for select j1.c_integer from j1, j2 > where j1.c_integer = j2.c_integer union all select j1.c_integer from j1, j2 > where j1.c_integer = j2.c_integer; > +------------+------------+ > | text | json | > +------------+------------+ > | 00-00 Screen > 00-01 UnionAll(all=[true]) > 00-03 Project(c_integer=[$0]) > 00-05 HashJoin(condition=[=($0, $1)], joinType=[inner]) > 00-09 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=maprfs:/drill/testdata/ctas/j1]], > selectionRoot=/drill/testdata/ctas/j1, numFiles=1, columns=[`c_integer`]]]) > 00-08 Project(c_integer0=[$0]) > 00-11 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j2]], > selectionRoot=/drill/testdata/ctas/j2, numFiles=1, columns=[`c_integer`]]]) > 00-02 Project(c_integer=[$0]) > 00-04 HashJoin(condition=[=($0, $1)], joinType=[inner]) > 00-07 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=maprfs:/drill/testdata/ctas/j1]], > selectionRoot=/drill/testdata/ctas/j1, numFiles=1, columns=[`c_integer`]]]) > 00-06 Project(c_integer0=[$0]) > 00-10 Scan(groupscan=[ParquetGroupScan > [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j2]], > selectionRoot=/drill/testdata/ctas/j2, numFiles=1, columns=[`c_integer`]]]) > | { > "head" : { > "version" : 1, > "generator" : { > "type" : "ExplainHandler", > "info" : "" > }, > "type" : "APACHE_DRILL_PHYSICAL", > "options" : [ { > "name" : "planner.broadcast_factor", > "kind" : "DOUBLE", > "type" : "SESSION", > "float_val" : 0.0 > }, { > "name" : "planner.slice_target", > "kind" : "LONG", > "type" : "SESSION", > "num_val" : 1 > } ], > {code} > 2. CTAS with this query produces 1 parquet file: > {code} > [Thu Apr 09 09:13:26 root@/mapr/vmarkman.cluster.com/drill/testdata/ctas/yyy > ] # ls > 0_0_0.parquet > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)