[jira] [Commented] (HIVE-21934) Materialized view on top of Druid not pushing everything
[ https://issues.apache.org/jira/browse/HIVE-21934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883477#comment-16883477 ]

Hive QA commented on HIVE-21934:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12974278/HIVE-21934.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 16648 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/17991/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/17991/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-17991/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12974278 - PreCommit-HIVE-Build

> Materialized view on top of Druid not pushing everything
>
> Key: HIVE-21934
> URL: https://issues.apache.org/jira/browse/HIVE-21934
> Project: Hive
> Issue Type: Improvement
> Components: Druid integration, Materialized views
> Reporter: slim bouguerra
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21934.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The title is not very informative, but examples hopefully are.
> this is the plan with the view
> {code}
> explain SELECT MONTH(`dates_n1`.`__time`) AS `mn___time_ok`,
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
> SUM(1) AS `sum_number_of_records_ok`,
> YEAR(`dates_n1`.`__time`) AS `yr___time_ok`
> FROM `mv_ssb_100_scale`.`lineorder_n0` `lineorder_n0`
> JOIN `mv_ssb_100_scale`.`dates_n1` `dates_n1` ON
> (`lineorder_n0`.`lo_orderdate` = `dates_n1`.`d_datekey`)
> JOIN `mv_ssb_100_scale`.`customer_n1` `customer_n1` ON
> (`lineorder_n0`.`lo_custkey` = `customer_n1`.`c_custkey`)
> JOIN `mv_ssb_100_scale`.`supplier_n0` `supplier_n0` ON
> (`lineorder_n0`.`lo_suppkey` = `supplier_n0`.`s_suppkey`)
> JOIN `mv_ssb_100_scale`.`ssb_part_n0` `ssb_part_n0` ON
> (`lineorder_n0`.`lo_partkey` = `ssb_part_n0`.`p_partkey`)
> GROUP BY MONTH(`dates_n1`.`__time`),
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT),
> YEAR(`dates_n1`.`__time`)
> INFO : Starting task [Stage-3:EXPLAIN] in serial mode
> INFO : Completed executing
> command(queryId=sbouguerra_20190627113101_1493ee87-0288-4e30-b53c-0ee729ce3977);
> Time taken: 0.005 seconds
> INFO : OK
> ++
> | Explain |
> ++
> | Plan optimized by CBO. |
> | |
> | Vertex dependency in root stage |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE) |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Stage-1 |
> | Reducer 2 vectorized, llap |
> | File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=300018951 width=38) |
> | Output:["_col0","_col1","_col2","_col3"] |
> | Group By Operator [GBY_11] (rows=300018951 width=38) |
> |
> Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0,
> KEY._col1, KEY._col2 |
> | <-Map 1 [SIMPLE_EDGE] vectorized, llap |
> | SHUFFLE [RS_10] |
> | PartitionCols:_col0, _col1, _col2 |
> | Group By Operator [GBY_9] (rows=600037902 width=38) |
> |
> Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(1)"],keys:_col0,
> _col1, _col2 |
> | Select Operator [SEL_8] (rows=600037902 width=38) |
> | Output:["_col0","_col1","_col2"] |
> | TableScan [TS_0] (rows=600037902 width=38) |
> |
> mv_ssb_100_scale@ssb_mv_druid_100,ssb_mv_druid_100,Tbl:COMPLETE,Col:NONE,Output:["vc"],properties:\{"druid.fieldNames":"vc","druid.fieldTypes":"timestamp","druid.query.json":"{\"queryType\":\"scan\",\"dataSource\":\"mv_ssb_100_scale.ssb_mv_druid_100\",\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"\\\"__time\\\"\",\"outputType\":\"LONG\"}],\"columns\":[\"vc\"],\"resultFormat\":\"compactedList\"}","druid.query.type":"scan"}
> |
> | |
> ++
>
> {code}
> if i use a simple druid table without MV
> {code}
> explain SELECT MONTH(`__time`) AS `mn___time_ok`,
> CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
> SUM(1) AS `sum_number_of_records_ok`,
> YEAR(`__time`) AS `yr___time_ok`
> FROM `druid_ssb.ssb_druid_100`
> GROUP BY MONTH(`__time`),
> CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT),
> YEAR(`__
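Editorial note: the `qr___time_ok` expression in the quoted query, `CAST((MONTH(...) - 1) / 3 + 1 AS BIGINT)`, derives a quarter from a month. The sketch below mimics that arithmetic in Python, assuming that Hive's `/` operator returns a DOUBLE and that the cast to BIGINT drops the fractional part. The function name `hive_quarter` is hypothetical and is not part of Hive or of the patch.

```python
def hive_quarter(month: int) -> int:
    """Mimic CAST((MONTH(t) - 1) / 3 + 1 AS BIGINT) for a month in 1..12."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    # Assumption: `/` is floating-point division in Hive, so the
    # intermediate value is a DOUBLE; int() truncates like the BIGINT cast.
    return int((month - 1) / 3 + 1)


# Map every calendar month to the quarter the expression would produce.
quarters = {month: hive_quarter(month) for month in range(1, 13)}
print(quarters)
```

Under these assumptions the expression groups months 1 to 3, 4 to 6, 7 to 9, and 10 to 12 together, which is why the query uses it as a grouping key alongside `MONTH` and `YEAR`.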
[jira] [Commented] (HIVE-21934) Materialized view on top of Druid not pushing everything
[ https://issues.apache.org/jira/browse/HIVE-21934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883457#comment-16883457 ]

Hive QA commented on HIVE-21934:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 46s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 4m 3s{color} | {color:blue} ql in master has 2255 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 95 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 23s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-17991/dev-support/hive-personality.sh |
| git revision | master / 7cfe729 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-17991/yetus/whitespace-eol.txt |
| modules | C: ql itests U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-17991/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-21934) Materialized view on top of Druid not pushing everything
[ https://issues.apache.org/jira/browse/HIVE-21934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883221#comment-16883221 ]

slim bouguerra commented on HIVE-21934:

+1

> Materialized view on top of Druid not pushing everything
>
> Key: HIVE-21934
> URL: https://issues.apache.org/jira/browse/HIVE-21934
> Project: Hive
> Issue Type: Improvement
> Components: Druid integration, Materialized views
> Reporter: slim bouguerra
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21934.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The title is not very informative, but examples hopefully are.
> this is the plan with the view
> {code}
> explain SELECT MONTH(`dates_n1`.`__time`) AS `mn___time_ok`,
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
> SUM(1) AS `sum_number_of_records_ok`,
> YEAR(`dates_n1`.`__time`) AS `yr___time_ok`
> FROM `mv_ssb_100_scale`.`lineorder_n0` `lineorder_n0`
> JOIN `mv_ssb_100_scale`.`dates_n1` `dates_n1` ON
> (`lineorder_n0`.`lo_orderdate` = `dates_n1`.`d_datekey`)
> JOIN `mv_ssb_100_scale`.`customer_n1` `customer_n1` ON
> (`lineorder_n0`.`lo_custkey` = `customer_n1`.`c_custkey`)
> JOIN `mv_ssb_100_scale`.`supplier_n0` `supplier_n0` ON
> (`lineorder_n0`.`lo_suppkey` = `supplier_n0`.`s_suppkey`)
> JOIN `mv_ssb_100_scale`.`ssb_part_n0` `ssb_part_n0` ON
> (`lineorder_n0`.`lo_partkey` = `ssb_part_n0`.`p_partkey`)
> GROUP BY MONTH(`dates_n1`.`__time`),
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT),
> YEAR(`dates_n1`.`__time`)
> INFO : Starting task [Stage-3:EXPLAIN] in serial mode
> INFO : Completed executing
> command(queryId=sbouguerra_20190627113101_1493ee87-0288-4e30-b53c-0ee729ce3977);
> Time taken: 0.005 seconds
> INFO : OK
> ++
> | Explain |
> ++
> | Plan optimized by CBO. |
> | |
> | Vertex dependency in root stage |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE) |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Stage-1 |
> | Reducer 2 vectorized, llap |
> | File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=300018951 width=38) |
> | Output:["_col0","_col1","_col2","_col3"] |
> | Group By Operator [GBY_11] (rows=300018951 width=38) |
> |
> Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0,
> KEY._col1, KEY._col2 |
> | <-Map 1 [SIMPLE_EDGE] vectorized, llap |
> | SHUFFLE [RS_10] |
> | PartitionCols:_col0, _col1, _col2 |
> | Group By Operator [GBY_9] (rows=600037902 width=38) |
> |
> Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(1)"],keys:_col0,
> _col1, _col2 |
> | Select Operator [SEL_8] (rows=600037902 width=38) |
> | Output:["_col0","_col1","_col2"] |
> | TableScan [TS_0] (rows=600037902 width=38) |
> |
> mv_ssb_100_scale@ssb_mv_druid_100,ssb_mv_druid_100,Tbl:COMPLETE,Col:NONE,Output:["vc"],properties:\{"druid.fieldNames":"vc","druid.fieldTypes":"timestamp","druid.query.json":"{\"queryType\":\"scan\",\"dataSource\":\"mv_ssb_100_scale.ssb_mv_druid_100\",\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"\\\"__time\\\"\",\"outputType\":\"LONG\"}],\"columns\":[\"vc\"],\"resultFormat\":\"compactedList\"}","druid.query.type":"scan"}
> |
> | |
> ++
>
> {code}
> if i use a simple druid table without MV
> {code}
> explain SELECT MONTH(`__time`) AS `mn___time_ok`,
> CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
> SUM(1) AS `sum_number_of_records_ok`,
> YEAR(`__time`) AS `yr___time_ok`
> FROM `druid_ssb.ssb_druid_100`
> GROUP BY MONTH(`__time`),
> CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT),
> YEAR(`__time`);
> {code}
> {code}
> ++
> | Explain |
> ++
> | Plan optimized by CBO. |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Select Operator [SEL_1] |
> | Output:["_col0","_col1","_col2","_col3"] |
> | TableScan [TS_0] |
> |
> Output:["extract_month","vc","$f3","extract_year"],properties:\{"druid.fieldNames":"extract_month,vc,extract_year,$f3","druid.fieldTypes":"int,bigint,int,bigint","druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_ssb.ssb_druid_100\",\"granularity\":\"all\",\"dimensions\":[{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_month\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"M\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}},\{\"type\":\"default\",\"dimension\":\"vc\"
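Editorial note: the difference between the two quoted plans shows up in the `druid.query.json` property of the TableScan. The materialized-view plan sends Druid a `scan` query and leaves the Group By operators in Hive, while the plain Druid table plan sends a `groupBy` query. The Python sketch below is one way to check this from the plan text. The helper `summarize` is hypothetical, the first JSON string is copied (unescaped) from the quoted plan, and the second is trimmed to its first dimension because the quoted plan is cut off after that point. Treating `groupBy`, `timeseries`, and `topN` as the aggregating query types is an assumption of this sketch.

```python
import json

# Unescaped druid.query.json from the TableScan of the materialized-view plan.
MV_QUERY_JSON = r'''{"queryType":"scan","dataSource":"mv_ssb_100_scale.ssb_mv_druid_100","intervals":["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"],"virtualColumns":[{"type":"expression","name":"vc","expression":"\"__time\"","outputType":"LONG"}],"columns":["vc"],"resultFormat":"compactedList"}'''

# Start of the druid.query.json from the plain Druid table plan, closed by
# hand after the first dimension (the rest is truncated in the quoted text).
GROUPBY_QUERY_JSON = r'''{"queryType":"groupBy","dataSource":"druid_ssb.ssb_druid_100","granularity":"all","dimensions":[{"type":"extraction","dimension":"__time","outputName":"extract_month","extractionFn":{"type":"timeFormat","format":"M","timeZone":"America/New_York","locale":"en-US"}}]}'''

# Assumption: these Druid native query types perform aggregation in Druid.
AGGREGATING_TYPES = {"groupBy", "timeseries", "topN"}


def summarize(query_json: str) -> dict:
    """Report whether a Druid native query aggregates on the Druid side."""
    query = json.loads(query_json)
    dimensions = [
        dim if isinstance(dim, str) else dim.get("outputName", dim.get("dimension"))
        for dim in query.get("dimensions", [])
    ]
    return {
        "queryType": query["queryType"],
        "dataSource": query["dataSource"],
        "aggregation_pushed": query["queryType"] in AGGREGATING_TYPES,
        "dimensions": dimensions,
    }


for label, text in (("materialized view", MV_QUERY_JSON), ("druid table", GROUPBY_QUERY_JSON)):
    print(label, summarize(text))
```

For the materialized-view plan the summary reports a `scan` with no dimensions, which matches the 600037902-row TableScan feeding Hive-side Group By operators in the quoted plan.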
[jira] [Commented] (HIVE-21934) Materialized view on top of Druid not pushing everything
[ https://issues.apache.org/jira/browse/HIVE-21934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882549#comment-16882549 ]

Jesus Camacho Rodriguez commented on HIVE-21934:

[~bslim], could you take a look? It is a one line fix.
https://github.com/apache/hive/pull/717
Thanks