[ https://issues.apache.org/jira/browse/HIVE-23485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111592#comment-17111592 ]
Hive QA commented on HIVE-23485:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/13003386/HIVE-23485.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 99 failed/errored test(s), 17270 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=225)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[nonmr_fetch] (batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_count] (batchId=5)
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver[udaf_example_group_concat] (batchId=229)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_queries] (batchId=14)
org.apache.hadoop.hive.cli.TestKuduCliDriver.testCliDriver[kudu_complex_queries] (batchId=224)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[except_distinct] (batchId=26)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] (batchId=27)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[intersect_all] (batchId=25)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[intersect_distinct] (batchId=28)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[intersect_merge] (batchId=23)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_stats] (batchId=20)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llapdecider] (batchId=20)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[mm_all] (batchId=27)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_merge10] (batchId=26)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_merge1] (batchId=21)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_merge_diff_fs] (batchId=20)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[parallel_colstats] (batchId=23)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[reduce_deduplicate_distinct] (batchId=29)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=25)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[authorization_show_grant] (batchId=48)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[multi_insert_distinct] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[multi_insert_gby3] (batchId=108)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[multi_insert_mixed] (batchId=55)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[notable_alias1] (batchId=88)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[notable_alias2] (batchId=44)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[nullgroup2] (batchId=82)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[nullgroup4] (batchId=55)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[nullgroup4_multi_distinct] (batchId=41)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[outer_reference_windowed] (batchId=72)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[parquet_vectorization_13] (batchId=84)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[parquet_vectorization_14] (batchId=71)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[parquet_vectorization_16] (batchId=115)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[parquet_vectorization_9] (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[parquet_vectorization_limit] (batchId=55)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ppd2] (batchId=57)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ppd_gby2] (batchId=121)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ppd_gby] (batchId=70)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ppd_join_filter] (batchId=90)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[ptfgroupbyjoin] (batchId=119)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[quotedid_partition] (batchId=50)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[reduce_deduplicate_extended2] (batchId=92)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=82)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[setop_subq] (batchId=36)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_empty_dyn_part] (batchId=63)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subq2] (batchId=37)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subq_where_serialization] (batchId=121)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_exists_having] (batchId=34)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_multiinsert] (batchId=117)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notexists] (batchId=124)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notexists_having] (batchId=119)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin_having] (batchId=82)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_unqual_corr_expr] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_unqualcolumnrefs] (batchId=48)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=68)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb_schq] (batchId=86)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union14] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union17] (batchId=105)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union19] (batchId=93)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union24] (batchId=93)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union28] (batchId=77)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union30] (batchId=111)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union31] (batchId=33)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union33] (batchId=55)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_pos_alias] (batchId=85)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_remove_6_subq] (batchId=70)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_remove_plan] (batchId=36)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_partition] (batchId=50)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_empty_where] (batchId=54)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_orc_nested_column_pruning] (batchId=55)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[view_cbo] (batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[acid_vectorization_original_tez] (batchId=19)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[tez-tag] (batchId=18)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[vector_join_part_col_char] (batchId=18)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] (batchId=231)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query23] (batchId=231)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query33] (batchId=231)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query41] (batchId=231)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query45] (batchId=231)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query54] (batchId=231)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query56] (batchId=231)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query58] (batchId=231)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query60] (batchId=231)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query6] (batchId=231)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query83] (batchId=231)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query8] (batchId=231)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query23] (batchId=230)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query33] (batchId=230)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query41] (batchId=230)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query45] (batchId=230)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query54] (batchId=230)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query56] (batchId=230)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query58] (batchId=230)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query60] (batchId=230)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query6] (batchId=230)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query83] (batchId=230)
org.apache.hadoop.hive.cli.TestTezPerfConstraintsCliDriver.testCliDriver[query8] (batchId=230)
org.apache.hadoop.hive.metastore.txn.TestTxnHandler.allocateNextWriteIdRetriesAfterDetectingConflictingConcurrentInsert (batchId=247)
{noformat}

Test results:
https://builds.apache.org/job/PreCommit-HIVE-Build/22465/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/22465/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-22465/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 99 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 13003386 - PreCommit-HIVE-Build

> Bound GroupByOperator stats using largest NDV among columns
> -----------------------------------------------------------
>
>                 Key: HIVE-23485
>                 URL: https://issues.apache.org/jira/browse/HIVE-23485
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>         Attachments: HIVE-23485.01.patch
>
>
> Consider the following SQL query:
> {code:sql}
> select id, name from person group by id, name;
> {code}
> and assume that the person table contains the following tuples:
> {code:sql}
> insert into person values (0, 'A') ;
> insert into person values (1, 'A') ;
> insert into person values (2, 'B') ;
> insert into person values (3, 'B') ;
> insert into person values (4, 'B') ;
> insert into person values (5, 'C') ;
> {code}
> If we know the number of distinct values (NDV) for all columns in the group
> by clause, then we can infer a lower bound for the total number of rows by
> taking the maximum NDV of the involved columns.
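The bound described in the issue can be checked by hand on the six sample tuples. The following is only a small Python sketch of that arithmetic, not Hive's stats code or the attached patch; the helper names `ndv` and `group_by_lower_bound` are made up for this illustration.

```python
# Sample tuples from the issue description, as (id, name) pairs.
person = [(0, "A"), (1, "A"), (2, "B"), (3, "B"), (4, "B"), (5, "C")]


def ndv(rows, col):
    """Number of distinct values in the column at index `col` (hypothetical helper)."""
    return len({row[col] for row in rows})


def group_by_lower_bound(rows, key_cols):
    """Lower bound on GROUP BY output rows: the largest NDV among the key columns.

    Every distinct value of any single key column must appear in at least one
    output group, so the output has at least max(NDV) rows.
    """
    return max(ndv(rows, c) for c in key_cols)


ndv_id = ndv(person, 0)        # distinct ids
ndv_name = ndv(person, 1)      # distinct names
lower_bound = group_by_lower_bound(person, [0, 1])
actual_groups = len(set(person))  # true result size of GROUP BY id, name

print(ndv_id, ndv_name, lower_bound, actual_groups)
```

On this data the id column has 6 distinct values and the name column has 3, so the bound is 6, which here coincides with the true number of groups.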
> Currently the query in the scenario above has the following plan:
> {noformat}
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
>     limit:-1
>     Stage-1
>       Reducer 2 vectorized
>       File Output Operator [FS_11]
>         Group By Operator [GBY_10] (rows=3 width=92)
>           Output:["_col0","_col1"],keys:KEY._col0, KEY._col1
>         <-Map 1 [SIMPLE_EDGE] vectorized
>           SHUFFLE [RS_9]
>             PartitionCols:_col0, _col1
>             Group By Operator [GBY_8] (rows=3 width=92)
>               Output:["_col0","_col1"],keys:id, name
>               Select Operator [SEL_7] (rows=6 width=92)
>                 Output:["id","name"]
>                 TableScan [TS_0] (rows=6 width=92)
>                   default@person,person,Tbl:COMPLETE,Col:COMPLETE,Output:["id","name"]
> {noformat}
> Observe that the stats for the group by operators report 3 rows, but given
> that the id column is part of the grouping key and has 6 distinct values,
> the result cannot have fewer than 6 rows.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)