[jira] [Created] (DRILL-2735) Broadcast plan get's "lost" when the same query is used in UNION ALL

Victoria Markman (JIRA) Thu, 09 Apr 2015 09:22:43 -0700

Victoria Markman created DRILL-2735:
---------------------------------------


             Summary: Broadcast plan get's "lost" when the same query is used 
in UNION ALL
                 Key: DRILL-2735
                 URL: https://issues.apache.org/jira/browse/DRILL-2735
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 0.9.0
            Reporter: Victoria Markman
            Assignee: Jinfeng Ni


I get a broadcast plan for simple inner join query.

{code}
0: jdbc:drill:schema=dfs> explain plan for select j1.c_integer from j1, j2 
where j1.c_integer = j2.c_integer;
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      UnionExchange
01-01        Project(c_integer=[$0])
01-02          HashJoin(condition=[=($0, $1)], joinType=[inner])
01-04            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=maprfs:/drill/testdata/ctas/j1]], selectionRoot=/drill/testdata/ctas/j1, 
numFiles=1, columns=[`c_integer`]]])
01-03            Project(c_integer0=[$0])
01-05              BroadcastExchange
02-01                Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j2]], 
selectionRoot=/drill/testdata/ctas/j2, numFiles=1, columns=[`c_integer`]]])
 | {
  "head" : {
    "version" : 1,
    "generator" : {
      "type" : "ExplainHandler",
      "info" : ""
    },
    "type" : "APACHE_DRILL_PHYSICAL",
    "options" : [ {
      "name" : "planner.broadcast_factor",
      "kind" : "DOUBLE",
      "type" : "SESSION",
      "float_val" : 0.0
    }, {
      "name" : "planner.slice_target",
      "kind" : "LONG",
      "type" : "SESSION",
      "num_val" : 1
    } ],
{code}

Create table succeeds and multiple fragments are executed:
{code}
0: jdbc:drill:schema=dfs> create table test(a1) as  select j1.c_integer from 
j1, j2 where j1.c_integer = j2.c_integer;
+------------+---------------------------+
|  Fragment  | Number of records written |
+------------+---------------------------+
| 1_1        | 0                         |
| 1_3        | 0                         |
| 1_31       | 0                         |
| 1_43       | 0                         |
| 1_35       | 0                         |
| 1_21       | 0                         |
| 1_19       | 0                         |
| 1_27       | 1                         |
| 1_17       | 1                         |
| 1_13       | 0                         |
| 1_29       | 0                         |
| 1_33       | 0                         |
| 1_25       | 0                         |
| 1_7        | 0                         |
| 1_11       | 0                         |
| 1_37       | 0                         |
| 1_45       | 0                         |
| 1_9        | 0                         |
| 1_23       | 1                         |
| 1_15       | 0                         |
| 1_41       | 0                         |
| 1_39       | 0                         |
| 1_5        | 0                         |
| 1_10       | 0                         |
| 1_14       | 0                         |
| 1_24       | 0                         |
| 1_16       | 0                         |
| 1_12       | 0                         |
| 1_36       | 0                         |
| 1_20       | 0                         |
| 1_34       | 1                         |
| 1_40       | 0                         |
| 1_22       | 0                         |
| 1_26       | 0                         |
| 1_32       | 1                         |
| 1_8        | 0                         |
| 1_18       | 0                         |
| 1_42       | 0                         |
| 1_44       | 0                         |
| 1_38       | 0                         |
| 1_30       | 0                         |
| 1_28       | 1                         |
| 1_4        | 10                        |
| 1_2        | 1                         |
| 1_6        | 0                         |
| 1_0        | 0                         |
+------------+---------------------------+
46 rows selected (2.337 seconds)
{code}

8 parquet files are written:
{code}
[Wed Apr 08 11:41:10 root@/mapr/vmarkman.cluster.com/drill/testdata/ctas/test ] 
# ls -ltr
total 4
-rwxr-xr-x 1 mapr mapr 146 Apr  8 11:40 1_17_0.parquet
-rwxr-xr-x 1 mapr mapr 146 Apr  8 11:40 1_27_0.parquet
-rwxr-xr-x 1 mapr mapr 146 Apr  8 11:40 1_23_0.parquet
-rwxr-xr-x 1 mapr mapr 146 Apr  8 11:40 1_34_0.parquet
-rwxr-xr-x 1 mapr mapr 185 Apr  8 11:40 1_4_0.parquet
-rwxr-xr-x 1 mapr mapr 146 Apr  8 11:40 1_32_0.parquet
-rwxr-xr-x 1 mapr mapr 146 Apr  8 11:40 1_28_0.parquet
-rwxr-xr-x 1 mapr mapr 146 Apr  8 11:40 1_2_0.parquet
{code}

However:

1. broadcast join is not planned when the same join query used in legs of the 
union all oprartor:

{code}
0: jdbc:drill:schema=dfs> explain plan for select j1.c_integer from j1, j2 
where j1.c_integer = j2.c_integer union all select j1.c_integer from j1, j2 
where j1.c_integer = j2.c_integer;
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      UnionAll(all=[true])
00-03        Project(c_integer=[$0])
00-05          HashJoin(condition=[=($0, $1)], joinType=[inner])
00-09            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=maprfs:/drill/testdata/ctas/j1]], selectionRoot=/drill/testdata/ctas/j1, 
numFiles=1, columns=[`c_integer`]]])
00-08            Project(c_integer0=[$0])
00-11              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=maprfs:/drill/testdata/ctas/j2]], selectionRoot=/drill/testdata/ctas/j2, 
numFiles=1, columns=[`c_integer`]]])
00-02        Project(c_integer=[$0])
00-04          HashJoin(condition=[=($0, $1)], joinType=[inner])
00-07            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=maprfs:/drill/testdata/ctas/j1]], selectionRoot=/drill/testdata/ctas/j1, 
numFiles=1, columns=[`c_integer`]]])
00-06            Project(c_integer0=[$0])
00-10              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=maprfs:/drill/testdata/ctas/j2]], selectionRoot=/drill/testdata/ctas/j2, 
numFiles=1, columns=[`c_integer`]]])
 | {
  "head" : {
    "version" : 1,
    "generator" : {
      "type" : "ExplainHandler",
      "info" : ""
    },
    "type" : "APACHE_DRILL_PHYSICAL",
    "options" : [ {
      "name" : "planner.broadcast_factor",
      "kind" : "DOUBLE",
      "type" : "SESSION",
      "float_val" : 0.0
    }, {
      "name" : "planner.slice_target",
      "kind" : "LONG",
      "type" : "SESSION",
      "num_val" : 1
    } ],
{code}

2. CTAS with this query produces 1 parquet file:

{code}
[Thu Apr 09 09:13:26 root@/mapr/vmarkman.cluster.com/drill/testdata/ctas/yyy ] 
# ls
0_0_0.parquet
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (DRILL-2735) Broadcast plan get's "lost" when the same query is used in UNION ALL

Reply via email to