slim bouguerra created HIVE-21934:
-------------------------------------

             Summary: Materialized view on top of Druid not pushing everything
                 Key: HIVE-21934
                 URL: https://issues.apache.org/jira/browse/HIVE-21934
             Project: Hive
          Issue Type: Improvement
            Reporter: slim bouguerra
            Assignee: Jesus Camacho Rodriguez


The title is not very informative, but the examples hopefully are.

This is the plan when the query goes through the materialized view:

{code}
explain SELECT MONTH(`dates_n1`.`__time`) AS `mn___time_ok`,
CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
SUM(1) AS `sum_number_of_records_ok`,
YEAR(`dates_n1`.`__time`) AS `yr___time_ok`
FROM `mv_ssb_100_scale`.`lineorder_n0` `lineorder_n0`
JOIN `mv_ssb_100_scale`.`dates_n1` `dates_n1` ON (`lineorder_n0`.`lo_orderdate` = `dates_n1`.`d_datekey`)
JOIN `mv_ssb_100_scale`.`customer_n1` `customer_n1` ON (`lineorder_n0`.`lo_custkey` = `customer_n1`.`c_custkey`)
JOIN `mv_ssb_100_scale`.`supplier_n0` `supplier_n0` ON (`lineorder_n0`.`lo_suppkey` = `supplier_n0`.`s_suppkey`)
JOIN `mv_ssb_100_scale`.`ssb_part_n0` `ssb_part_n0` ON (`lineorder_n0`.`lo_partkey` = `ssb_part_n0`.`p_partkey`)
GROUP BY MONTH(`dates_n1`.`__time`),
CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT),
YEAR(`dates_n1`.`__time`)
INFO : Starting task [Stage-3:EXPLAIN] in serial mode
INFO : Completed executing command(queryId=sbouguerra_20190627113101_1493ee87-0288-4e30-b53c-0ee729ce3977); Time taken: 0.005 seconds
INFO : OK
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| Plan optimized by CBO. |
| |
| Vertex dependency in root stage |
| Reducer 2 <- Map 1 (SIMPLE_EDGE) |
| |
| Stage-0 |
| Fetch Operator |
| limit:-1 |
| Stage-1 |
| Reducer 2 vectorized, llap |
| File Output Operator [FS_13] |
| Select Operator [SEL_12] (rows=300018951 width=38) |
| Output:["_col0","_col1","_col2","_col3"] |
| Group By Operator [GBY_11] (rows=300018951 width=38) |
| Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0, KEY._col1, KEY._col2 |
| <-Map 1 [SIMPLE_EDGE] vectorized, llap |
| SHUFFLE [RS_10] |
| PartitionCols:_col0, _col1, _col2 |
| Group By Operator [GBY_9] (rows=600037902 width=38) |
| Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(1)"],keys:_col0, 
_col1, _col2 |
| Select Operator [SEL_8] (rows=600037902 width=38) |
| Output:["_col0","_col1","_col2"] |
| TableScan [TS_0] (rows=600037902 width=38) |
| mv_ssb_100_scale@ssb_mv_druid_100,ssb_mv_druid_100,Tbl:COMPLETE,Col:NONE,Output:["vc"],properties:{"druid.fieldNames":"vc","druid.fieldTypes":"timestamp","druid.query.json":"{\"queryType\":\"scan\",\"dataSource\":\"mv_ssb_100_scale.ssb_mv_druid_100\",\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"\\\"__time\\\"\",\"outputType\":\"LONG\"}],\"columns\":[\"vc\"],\"resultFormat\":\"compactedList\"}","druid.query.type":"scan"} |
| |
+----------------------------------------------------+

{code}

With the materialized view, only a plain scan of the Druid {{__time}} column (exposed as the virtual column {{vc}}) is pushed to Druid; the month/year extraction, the quarter computation, and the SUM aggregation all run in Hive. If I use a simple Druid table without the MV, the whole query is pushed down to Druid as a single groupBy:

{code}
explain SELECT MONTH(`__time`) AS `mn___time_ok`,
CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
SUM(1) AS `sum_number_of_records_ok`,
YEAR(`__time`) AS `yr___time_ok`
FROM `druid_ssb.ssb_druid_100`
GROUP BY MONTH(`__time`),
CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT),
YEAR(`__time`);

{code}

{code}
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| Plan optimized by CBO. |
| |
| Stage-0 |
| Fetch Operator |
| limit:-1 |
| Select Operator [SEL_1] |
| Output:["_col0","_col1","_col2","_col3"] |
| TableScan [TS_0] |
| Output:["extract_month","vc","$f3","extract_year"],properties:{"druid.fieldNames":"extract_month,vc,extract_year,$f3","druid.fieldTypes":"int,bigint,int,bigint","druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_ssb.ssb_druid_100\",\"granularity\":\"all\",\"dimensions\":[{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_month\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"M\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}},{\"type\":\"default\",\"dimension\":\"vc\",\"outputName\":\"vc\",\"outputType\":\"LONG\"},{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_year\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"yyyy\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}}],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"CAST(((CAST((timestamp_extract(\\\"__time\\\",'MONTH','America/New_York') - 1), 'DOUBLE') / CAST(3, 'DOUBLE')) + CAST(1, 'DOUBLE')), 'LONG')\",\"outputType\":\"LONG\"}],\"limitSpec\":{\"type\":\"default\"},\"aggregations\":[{\"type\":\"longSum\",\"name\":\"$f3\",\"expression\":\"1\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"} |
| |
+----------------------------------------------------+

{code}
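For reference, if the rewrite pushed everything down, the MV case should effectively reduce to the same shape as the plain-table case above, i.e. a single Druid groupBy over the MV's datasource. Below is a hand-written sketch of that target query, not output from Hive; it assumes the MV table mv_ssb_100_scale.ssb_mv_druid_100 exposes the Druid {{__time}} column directly, as the TableScan in the first plan suggests.

{code}
-- Hand-written sketch, not generated by Hive: the shape the MV rewrite is
-- expected to reduce the original query to, so that the extraction functions,
-- the quarter CAST and the SUM(1) end up inside druid.query.json (queryType
-- groupBy) instead of in Hive operators, as in the plain-table plan above.
-- Assumes mv_ssb_100_scale.ssb_mv_druid_100 exposes the Druid __time column.
explain SELECT MONTH(`__time`) AS `mn___time_ok`,
CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
SUM(1) AS `sum_number_of_records_ok`,
YEAR(`__time`) AS `yr___time_ok`
FROM `mv_ssb_100_scale`.`ssb_mv_druid_100`
GROUP BY MONTH(`__time`),
CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT),
YEAR(`__time`);
{code}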


