Victoria Markman created DRILL-2735: ---------------------------------------
Summary: Broadcast plan get's "lost" when the same query is used in UNION ALL Key: DRILL-2735 URL: https://issues.apache.org/jira/browse/DRILL-2735 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 0.9.0 Reporter: Victoria Markman Assignee: Jinfeng Ni I get a broadcast plan for simple inner join query. {code} 0: jdbc:drill:schema=dfs> explain plan for select j1.c_integer from j1, j2 where j1.c_integer = j2.c_integer; +------------+------------+ | text | json | +------------+------------+ | 00-00 Screen 00-01 UnionExchange 01-01 Project(c_integer=[$0]) 01-02 HashJoin(condition=[=($0, $1)], joinType=[inner]) 01-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j1]], selectionRoot=/drill/testdata/ctas/j1, numFiles=1, columns=[`c_integer`]]]) 01-03 Project(c_integer0=[$0]) 01-05 BroadcastExchange 02-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j2]], selectionRoot=/drill/testdata/ctas/j2, numFiles=1, columns=[`c_integer`]]]) | { "head" : { "version" : 1, "generator" : { "type" : "ExplainHandler", "info" : "" }, "type" : "APACHE_DRILL_PHYSICAL", "options" : [ { "name" : "planner.broadcast_factor", "kind" : "DOUBLE", "type" : "SESSION", "float_val" : 0.0 }, { "name" : "planner.slice_target", "kind" : "LONG", "type" : "SESSION", "num_val" : 1 } ], {code} Create table succeeds and multiple fragments are executed: {code} 0: jdbc:drill:schema=dfs> create table test(a1) as select j1.c_integer from j1, j2 where j1.c_integer = j2.c_integer; +------------+---------------------------+ | Fragment | Number of records written | +------------+---------------------------+ | 1_1 | 0 | | 1_3 | 0 | | 1_31 | 0 | | 1_43 | 0 | | 1_35 | 0 | | 1_21 | 0 | | 1_19 | 0 | | 1_27 | 1 | | 1_17 | 1 | | 1_13 | 0 | | 1_29 | 0 | | 1_33 | 0 | | 1_25 | 0 | | 1_7 | 0 | | 1_11 | 0 | | 1_37 | 0 | | 1_45 | 0 | | 1_9 | 0 | | 1_23 | 1 | | 1_15 | 0 | | 1_41 | 0 | | 1_39 | 0 | | 1_5 | 0 | | 1_10 | 0 | | 1_14 | 0 | | 1_24 | 0 | | 1_16 | 0 | | 1_12 | 0 | | 1_36 | 0 | | 1_20 | 0 | | 1_34 | 1 | | 1_40 | 0 | | 1_22 | 0 | | 1_26 | 0 | | 1_32 | 1 | | 1_8 | 0 | | 1_18 | 0 | | 1_42 | 0 | | 1_44 | 0 | | 1_38 | 0 | | 1_30 | 0 | | 1_28 | 1 | | 1_4 | 10 | | 1_2 | 1 | | 1_6 | 0 | | 1_0 | 0 | +------------+---------------------------+ 46 rows selected (2.337 seconds) {code} 8 parquet files are written: {code} [Wed Apr 08 11:41:10 root@/mapr/vmarkman.cluster.com/drill/testdata/ctas/test ] # ls -ltr total 4 -rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_17_0.parquet -rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_27_0.parquet -rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_23_0.parquet -rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_34_0.parquet -rwxr-xr-x 1 mapr mapr 185 Apr 8 11:40 1_4_0.parquet -rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_32_0.parquet -rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_28_0.parquet -rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_2_0.parquet {code} However: 1. broadcast join is not planned when the same join query used in legs of the union all oprartor: {code} 0: jdbc:drill:schema=dfs> explain plan for select j1.c_integer from j1, j2 where j1.c_integer = j2.c_integer union all select j1.c_integer from j1, j2 where j1.c_integer = j2.c_integer; +------------+------------+ | text | json | +------------+------------+ | 00-00 Screen 00-01 UnionAll(all=[true]) 00-03 Project(c_integer=[$0]) 00-05 HashJoin(condition=[=($0, $1)], joinType=[inner]) 00-09 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j1]], selectionRoot=/drill/testdata/ctas/j1, numFiles=1, columns=[`c_integer`]]]) 00-08 Project(c_integer0=[$0]) 00-11 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j2]], selectionRoot=/drill/testdata/ctas/j2, numFiles=1, columns=[`c_integer`]]]) 00-02 Project(c_integer=[$0]) 00-04 HashJoin(condition=[=($0, $1)], joinType=[inner]) 00-07 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j1]], selectionRoot=/drill/testdata/ctas/j1, numFiles=1, columns=[`c_integer`]]]) 00-06 Project(c_integer0=[$0]) 00-10 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j2]], selectionRoot=/drill/testdata/ctas/j2, numFiles=1, columns=[`c_integer`]]]) | { "head" : { "version" : 1, "generator" : { "type" : "ExplainHandler", "info" : "" }, "type" : "APACHE_DRILL_PHYSICAL", "options" : [ { "name" : "planner.broadcast_factor", "kind" : "DOUBLE", "type" : "SESSION", "float_val" : 0.0 }, { "name" : "planner.slice_target", "kind" : "LONG", "type" : "SESSION", "num_val" : 1 } ], {code} 2. CTAS with this query produces 1 parquet file: {code} [Thu Apr 09 09:13:26 root@/mapr/vmarkman.cluster.com/drill/testdata/ctas/yyy ] # ls 0_0_0.parquet {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)