[ https://issues.apache.org/jira/browse/HIVE-21934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875140#comment-16875140 ]
slim bouguerra commented on HIVE-21934:
---------------------------------------

view definition
{code}
CREATE MATERIALIZED VIEW `ssb_mv_druid_100`
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  "druid.segment.granularity" = "MONTH",
  "druid.query.granularity" = "DAY")
AS
SELECT
  `__time` as `__time`,
  cast(c_city as string) c_city,
  cast(c_nation as string) c_nation,
  cast(c_region as string) c_region,
  c_mktsegment as c_mktsegment,
  cast(d_weeknuminyear as string) d_weeknuminyear,
  cast(d_year as string) d_year,
  cast(d_yearmonth as string) d_yearmonth,
  cast(d_yearmonthnum as string) d_yearmonthnum,
  cast(p_brand1 as string) p_brand1,
  cast(p_category as string) p_category,
  cast(p_mfgr as string) p_mfgr,
  p_type,
  s_name,
  cast(s_city as string) s_city,
  cast(s_nation as string) s_nation,
  cast(s_region as string) s_region,
  cast(`lo_ordpriority` as string) lo_ordpriority,
  cast(`lo_shippriority` as string) lo_shippriority,
  `d_sellingseason` `lo_shipmode`,
  lo_revenue,
  lo_supplycost,
  lo_discount,
  `lo_quantity`,
  `lo_extendedprice`,
  `lo_ordtotalprice`,
  lo_extendedprice * lo_discount discounted_price,
  lo_revenue - lo_supplycost net_revenue
FROM customer_n1, dates_n1, lineorder_n1, ssb_part_n0, supplier_n0
where lo_orderdate = d_datekey
  and lo_partkey = p_partkey
  and lo_suppkey = s_suppkey
  and lo_custkey = c_custkey;
{code}

> Materialized view on top of Druid not pushing every thing
> ---------------------------------------------------------
>
>                 Key: HIVE-21934
>                 URL: https://issues.apache.org/jira/browse/HIVE-21934
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: slim bouguerra
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>
> The title is not very informative, but examples hopefully are.
> This is the plan with the view:
> {code}
> explain SELECT MONTH(`dates_n1`.`__time`) AS `mn___time_ok`,
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
> SUM(1) AS `sum_number_of_records_ok`,
> YEAR(`dates_n1`.`__time`) AS `yr___time_ok`
> FROM `mv_ssb_100_scale`.`lineorder_n0` `lineorder_n0`
> JOIN `mv_ssb_100_scale`.`dates_n1` `dates_n1` ON (`lineorder_n0`.`lo_orderdate` = `dates_n1`.`d_datekey`)
> JOIN `mv_ssb_100_scale`.`customer_n1` `customer_n1` ON (`lineorder_n0`.`lo_custkey` = `customer_n1`.`c_custkey`)
> JOIN `mv_ssb_100_scale`.`supplier_n0` `supplier_n0` ON (`lineorder_n0`.`lo_suppkey` = `supplier_n0`.`s_suppkey`)
> JOIN `mv_ssb_100_scale`.`ssb_part_n0` `ssb_part_n0` ON (`lineorder_n0`.`lo_partkey` = `ssb_part_n0`.`p_partkey`)
> GROUP BY MONTH(`dates_n1`.`__time`),
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT),
> YEAR(`dates_n1`.`__time`)
> INFO : Starting task [Stage-3:EXPLAIN] in serial mode
> INFO : Completed executing command(queryId=sbouguerra_20190627113101_1493ee87-0288-4e30-b53c-0ee729ce3977); Time taken: 0.005 seconds
> INFO : OK
> +----------------------------------------------------+
> | Explain |
> +----------------------------------------------------+
> | Plan optimized by CBO. |
> | |
> | Vertex dependency in root stage |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE) |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Stage-1 |
> | Reducer 2 vectorized, llap |
> | File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=300018951 width=38) |
> | Output:["_col0","_col1","_col2","_col3"] |
> | Group By Operator [GBY_11] (rows=300018951 width=38) |
> | Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0, KEY._col1, KEY._col2 |
> | <-Map 1 [SIMPLE_EDGE] vectorized, llap |
> | SHUFFLE [RS_10] |
> | PartitionCols:_col0, _col1, _col2 |
> | Group By Operator [GBY_9] (rows=600037902 width=38) |
> | Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(1)"],keys:_col0, _col1, _col2 |
> | Select Operator [SEL_8] (rows=600037902 width=38) |
> | Output:["_col0","_col1","_col2"] |
> | TableScan [TS_0] (rows=600037902 width=38) |
> | mv_ssb_100_scale@ssb_mv_druid_100,ssb_mv_druid_100,Tbl:COMPLETE,Col:NONE,Output:["vc"],properties:{"druid.fieldNames":"vc","druid.fieldTypes":"timestamp","druid.query.json":"{\"queryType\":\"scan\",\"dataSource\":\"mv_ssb_100_scale.ssb_mv_druid_100\",\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"\\\"__time\\\"\",\"outputType\":\"LONG\"}],\"columns\":[\"vc\"],\"resultFormat\":\"compactedList\"}","druid.query.type":"scan"} |
> | |
> +----------------------------------------------------+
>
> {code}
> If I use a simple Druid table without the MV:
> {code}
> explain SELECT MONTH(`__time`) AS `mn___time_ok`,
> CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
> SUM(1) AS `sum_number_of_records_ok`,
> YEAR(`__time`) AS `yr___time_ok`
> FROM `druid_ssb.ssb_druid_100`
> GROUP BY MONTH(`__time`),
> CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT),
> YEAR(`__time`);
> {code}
> {code}
> +----------------------------------------------------+
> | Explain |
> +----------------------------------------------------+
> | Plan optimized by CBO. |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Select Operator [SEL_1] |
> | Output:["_col0","_col1","_col2","_col3"] |
> | TableScan [TS_0] |
> | Output:["extract_month","vc","$f3","extract_year"],properties:{"druid.fieldNames":"extract_month,vc,extract_year,$f3","druid.fieldTypes":"int,bigint,int,bigint","druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_ssb.ssb_druid_100\",\"granularity\":\"all\",\"dimensions\":[{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_month\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"M\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}},{\"type\":\"default\",\"dimension\":\"vc\",\"outputName\":\"vc\",\"outputType\":\"LONG\"},{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_year\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"yyyy\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}}],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"CAST(((CAST((timestamp_extract(\\\"__time\\\",'MONTH','America/New_York') - 1), 'DOUBLE') / CAST(3, 'DOUBLE')) + CAST(1, 'DOUBLE')), 'LONG')\",\"outputType\":\"LONG\"}],\"limitSpec\":{\"type\":\"default\"},\"aggregations\":[{\"type\":\"longSum\",\"name\":\"$f3\",\"expression\":\"1\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"} |
> | |
> +----------------------------------------------------+
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)