[jira] [Commented] (HIVE-15528) Expose Spark job error in SparkTask
[ https://issues.apache.org/jira/browse/HIVE-15528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15792519#comment-15792519 ] Hive QA commented on HIVE-15528: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12845276/HIVE-15528.000.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10879 tests executed *Failed tests:* {noformat} TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=233) TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=139) [skewjoinopt15.q,vector_coalesce.q,orc_ppd_decimal.q,cbo_rp_lineage2.q,insert_into_with_schema.q,join_emit_interval.q,load_dyn_part3.q,auto_sortmerge_join_14.q,vector_null_projection.q,vector_cast_constant.q,mapjoin2.q,bucket_map_join_tez2.q,correlationoptimizer4.q,schema_evol_orc_acidvec_part_update.q,vectorization_12.q,vector_number_compare_projection.q,orc_merge_incompat3.q,vector_leftsemi_mapjoin.q,update_all_non_partitioned.q,multi_column_in_single.q,schema_evol_orc_nonvec_table.q,cbo_rp_semijoin.q,tez_insert_overwrite_local_directory_1.q,schema_evol_text_vecrow_table.q,vector_count.q,auto_sortmerge_join_15.q,vector_if_expr.q,delete_whole_partition.q,vector_decimal_6.q,sample1.q] org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=135) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=93) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] (batchId=92) org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver (batchId=228) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2758/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2758/console Test logs: 
http://104.198.109.242/logs/PreCommit-HIVE-Build-2758/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12845276 - PreCommit-HIVE-Build > Expose Spark job error in SparkTask > --- > > Key: HIVE-15528 > URL: https://issues.apache.org/jira/browse/HIVE-15528 > Project: Hive > Issue Type: Improvement > Components: Spark >Affects Versions: 2.2.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Attachments: HIVE-15528.000.patch > > > Expose the Spark job error in SparkTask by propagating the Spark job error to > the task exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
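The description above is terse. A minimal hedged sketch of the idea (the class and field names below are invented for illustration and are not taken from HIVE-15528.000.patch): keep the Spark job's failure cause attached to the exception the task reports, instead of reducing it to a return code.

```java
// Sketch only: "SparkTaskSketch", "jobError" and "taskException" are invented
// names illustrating the idea of HIVE-15528, not the actual patch code.
public class SparkTaskSketch {
    static Throwable jobError;      // stands in for the remote Spark job's failure cause
    static Throwable taskException; // what the task surfaces to its caller

    // Returns 0 on success, non-zero on failure, keeping the Spark error
    // attached to the task exception instead of dropping it.
    static int execute() {
        try {
            runJob();
            return 0;
        } catch (Exception e) {
            taskException = e; // propagate instead of only reporting a return code
            return 1;
        }
    }

    static void runJob() throws Exception {
        if (jobError != null) {
            // Wrap the Spark failure so the original cause survives the propagation
            throw new Exception("Spark job failed", jobError);
        }
    }
}
```

With this shape, a caller inspecting the failed task can reach the underlying Spark error via `getCause()` rather than only seeing a generic non-zero exit.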
[jira] [Commented] (HIVE-15529) LLAP: TaskSchedulerService can get stuck when scheduleTask returns DELAYED_RESOURCES
[ https://issues.apache.org/jira/browse/HIVE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15793297#comment-15793297 ] Pengcheng Xiong commented on HIVE-15529: [~rajesh.balamohan], this sounds related to HIVE-15467? > LLAP: TaskSchedulerService can get stuck when scheduleTask returns > DELAYED_RESOURCES > > > Key: HIVE-15529 > URL: https://issues.apache.org/jira/browse/HIVE-15529 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Priority: Critical > > An easier way to simulate the issue: > 1. Start the Hive CLI with "--hiveconf hive.execution.mode=llap" > 2. Run a SQL script file (e.g. a SQL script containing TPC-DS queries) > 3. In the middle of the run, press "Ctrl+C", which interrupts the current > job. This should not exit the Hive CLI yet. > 4. After some time, launch the same SQL script in the same CLI. It gets > stuck indefinitely (waiting for the splits to be computed). > Even when the CLI is quit, the AM runs forever until explicitly killed. > The issue seems to be around {{LlapTaskSchedulerService::schedulePendingTasks}} > dealing with the loop when it encounters {{DELAYED_RESOURCES}} on task > scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
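As a toy illustration of the suspected failure mode (all names below are invented; this is not the actual LlapTaskSchedulerService code): a scheduling loop that treats DELAYED_RESOURCES as "retry" can spin or wait forever if nothing else, such as a node re-enable timeout, ever makes progress. Bounding the attempts, as in this sketch, at least lets the stuck condition surface.

```java
import java.util.List;

// Invented names throughout; a sketch of the loop pitfall described above,
// not the real LLAP scheduler.
public class SchedulerLoopSketch {
    enum ScheduleResult { SCHEDULED, DELAYED_RESOURCES }

    interface Node {
        ScheduleResult schedule(String task);
    }

    // Tries to place a task; returns false instead of hanging when every node
    // keeps answering DELAYED_RESOURCES for maxAttempts rounds. In the real
    // scheduler, a delayed-task queue or node re-enable timer must make
    // progress between rounds -- if it never fires, the loop never exits.
    static boolean schedulePendingTask(String task, List<Node> nodes, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            for (Node n : nodes) {
                if (n.schedule(task) == ScheduleResult.SCHEDULED) {
                    return true;
                }
            }
            // All nodes delayed this round; retry (bounded here, unbounded in the bug).
        }
        return false;
    }
}
```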
[jira] [Updated] (HIVE-15507) Nested column pruning: fix issue when selecting struct field from array/map element
[ https://issues.apache.org/jira/browse/HIVE-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-15507: Resolution: Fixed Fix Version/s: 2.2.0 Target Version/s: 2.2.0 Status: Resolved (was: Patch Available) Committed to the master branch. Thanks [~Ferd] for the review! > Nested column pruning: fix issue when selecting struct field from array/map > element > --- > > Key: HIVE-15507 > URL: https://issues.apache.org/jira/browse/HIVE-15507 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Physical Optimizer, > Serializers/Deserializers >Affects Versions: 2.2.0 >Reporter: Chao Sun >Assignee: Chao Sun > Fix For: 2.2.0 > > Attachments: 15507.1.patch > > > When running the following query: > {code} > SELECT count(col), arr[0].f > FROM tbl > GROUP BY arr[0].f > {code} > where {{arr}} is an array of structs with field {{f}}, nested column pruning > will fail. This is because we currently process {{GenericUDFIndex}} in the > same way as any other UDF. In this case, it will generate the path {{arr.f}}, > which will not match the struct type info when doing the pruning. > The same applies to maps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
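A toy model of the mismatch described above (all class and method names here are invented; this is not Hive's actual type or pruning code): a path like {{arr.f}} generated from {{arr[0].f}} fails a naive struct-field lookup because {{arr}} is an array type, so resolution has to step through the array's element type first.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration only -- invented names, not Hive's TypeInfo machinery.
public class PruningPathDemo {
    // A struct type has named fields; an array type wraps an element type;
    // a node with neither is treated as a primitive.
    static class TypeNode {
        Map<String, TypeNode> fields; // non-null for structs
        TypeNode element;             // non-null for arrays
    }

    static TypeNode struct(String name, TypeNode field) {
        TypeNode t = new TypeNode();
        t.fields = new HashMap<>();
        t.fields.put(name, field);
        return t;
    }

    static TypeNode array(TypeNode e) {
        TypeNode t = new TypeNode();
        t.element = e;
        return t;
    }

    // Naive resolution treats every path segment as a struct field lookup,
    // so the generated path "arr.f" fails when "arr" is an array of structs.
    static boolean naiveResolve(TypeNode root, String path) {
        TypeNode cur = root;
        for (String seg : path.split("\\.")) {
            if (cur.fields == null || !cur.fields.containsKey(seg)) {
                return false;
            }
            cur = cur.fields.get(seg);
        }
        return true;
    }

    // Element-aware resolution unwraps array element types before each field
    // lookup -- the kind of adjustment the pruning fix needs for arr[0].f.
    static boolean elementAwareResolve(TypeNode root, String path) {
        TypeNode cur = root;
        for (String seg : path.split("\\.")) {
            while (cur.element != null) {
                cur = cur.element; // step into the array's element type
            }
            if (cur.fields == null || !cur.fields.containsKey(seg)) {
                return false;
            }
            cur = cur.fields.get(seg);
        }
        return true;
    }
}
```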
[jira] [Commented] (HIVE-15527) Memory usage is unbound in SortByShuffler for Spark
[ https://issues.apache.org/jira/browse/HIVE-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15793570#comment-15793570 ] Chao Sun commented on HIVE-15527: - Patch looks good. Any idea why the qfile result is different? > Memory usage is unbound in SortByShuffler for Spark > --- > > Key: HIVE-15527 > URL: https://issues.apache.org/jira/browse/HIVE-15527 > Project: Hive > Issue Type: Improvement > Components: Spark >Affects Versions: 1.1.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-15527.1.patch, HIVE-15527.2.patch, > HIVE-15527.3.patch, HIVE-15527.patch > > > In SortByShuffler.java, an ArrayList is used to back the iterator for values > that have the same key in the shuffled result produced by the Spark > transformation sortByKey. It's possible that memory can be exhausted because > of a large key group.
> {code}
> @Override
> public Tuple2<HiveKey, Iterable<BytesWritable>> next() {
>   // TODO: implement this by accumulating rows with the same key into a list.
>   // Note that this list needs to be improved to prevent excessive memory
>   // usage, but this can be done in a later phase.
>   while (it.hasNext()) {
>     Tuple2<HiveKey, BytesWritable> pair = it.next();
>     if (curKey != null && !curKey.equals(pair._1())) {
>       HiveKey key = curKey;
>       List<BytesWritable> values = curValues;
>       curKey = pair._1();
>       curValues = new ArrayList<BytesWritable>();
>       curValues.add(pair._2());
>       return new Tuple2<HiveKey, Iterable<BytesWritable>>(key, values);
>     }
>     curKey = pair._1();
>     curValues.add(pair._2());
>   }
>   if (curKey == null) {
>     throw new NoSuchElementException();
>   }
>   // if we get here, this should be the last element we have
>   HiveKey key = curKey;
>   curKey = null;
>   return new Tuple2<HiveKey, Iterable<BytesWritable>>(key, curValues);
> }
> {code}
> Since the output from sortByKey is already sorted on key, it's possible to > back the value iterable using the same input iterator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
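A hedged sketch of the idea in the last sentence of the description (generic names, plain `Map.Entry` pairs, and the class name are all invented; this is not the patch itself): because the input is already sorted by key, each key's values can be served lazily from the same input iterator with a one-element lookahead, so no per-key ArrayList is needed. The caller must consume one group's values before asking for the next key.

```java
import java.util.AbstractMap;
import java.util.Iterator;
import java.util.Map;

// Illustrative only: streams per-key value groups from a key-sorted iterator
// without buffering each group in memory.
public class SortedGroupIterator<K, V> {
    private final Iterator<Map.Entry<K, V>> in;
    private Map.Entry<K, V> lookahead; // next pair not yet handed out

    public SortedGroupIterator(Iterator<Map.Entry<K, V>> in) {
        this.in = in;
        this.lookahead = in.hasNext() ? in.next() : null;
    }

    public boolean hasNextKey() {
        return lookahead != null;
    }

    /** Returns the next key with an iterator over its values, backed by the shared input. */
    public Map.Entry<K, Iterator<V>> nextKey() {
        final K key = lookahead.getKey();
        Iterator<V> values = new Iterator<V>() {
            public boolean hasNext() {
                // The group ends when the lookahead pair carries a different key.
                return lookahead != null && lookahead.getKey().equals(key);
            }
            public V next() {
                V v = lookahead.getValue();
                lookahead = in.hasNext() ? in.next() : null; // advance the shared iterator
                return v;
            }
        };
        return new AbstractMap.SimpleEntry<>(key, values);
    }
}
```

The trade-off versus the ArrayList version: memory per group drops to a single lookahead element, but the group's values become single-pass and must be read in order, which matches how a sorted shuffle output is normally consumed.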
[jira] [Commented] (HIVE-15525) Hooking ChangeManager to "drop table", "drop partition"
[ https://issues.apache.org/jira/browse/HIVE-15525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15793805#comment-15793805 ] Thejas M Nair commented on HIVE-15525: -- [~daijy] can you please include a reviewboard link or pull request? > Hooking ChangeManager to "drop table", "drop partition" > --- > > Key: HIVE-15525 > URL: https://issues.apache.org/jira/browse/HIVE-15525 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-15525.1.patch > > > When Hive runs "drop table"/"drop partition", we will move the data files into > cmroot in case the replication destination needs them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15481) Support multiple and nested subqueries
[ https://issues.apache.org/jira/browse/HIVE-15481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-15481: --- Status: Open (was: Patch Available) > Support multiple and nested subqueries > -- > > Key: HIVE-15481 > URL: https://issues.apache.org/jira/browse/HIVE-15481 > Project: Hive > Issue Type: Sub-task > Components: Query Planning >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-15481.1.patch, HIVE-15481.2.patch, > HIVE-15481.3.patch > > > This is a continuation of the work done in HIVE-15192. As listed under > [Restrictions | > https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf ], > it is currently not possible to execute queries that have either more than > one subquery or a nested subquery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15481) Support multiple and nested subqueries
[ https://issues.apache.org/jira/browse/HIVE-15481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-15481: --- Attachment: HIVE-15481.4.patch > Support multiple and nested subqueries > -- > > Key: HIVE-15481 > URL: https://issues.apache.org/jira/browse/HIVE-15481 > Project: Hive > Issue Type: Sub-task > Components: Query Planning >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-15481.1.patch, HIVE-15481.2.patch, > HIVE-15481.3.patch, HIVE-15481.4.patch > > > This is a continuation of the work done in HIVE-15192. As listed under > [Restrictions | > https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf ], > it is currently not possible to execute queries that have either more than > one subquery or a nested subquery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15481) Support multiple and nested subqueries
[ https://issues.apache.org/jira/browse/HIVE-15481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-15481: --- Status: Patch Available (was: Open) Added back the restriction to disable multiple subqueries with OR since it produces wrong results (CALCITE-1546). > Support multiple and nested subqueries > -- > > Key: HIVE-15481 > URL: https://issues.apache.org/jira/browse/HIVE-15481 > Project: Hive > Issue Type: Sub-task > Components: Query Planning >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-15481.1.patch, HIVE-15481.2.patch, > HIVE-15481.3.patch, HIVE-15481.4.patch > > > This is a continuation of the work done in HIVE-15192. As listed under > [Restrictions | > https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf ], > it is currently not possible to execute queries that have either more than > one subquery or a nested subquery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-15530: -- Description: Currently when a table is altered, if any of the below conditions is true, HMS would try to update column statistics for the table: # database name is changed # table name is changed # old columns and new columns are not the same As a result, when a column is added to a table, Hive also tries to update column statistics, which is not necessary. We can loosen the last condition by checking whether any of the existing columns has changed. If none has, we don't have to update the stats info. was: Currently when a table is altered, if any of the below conditions is false, HMS would try to update column statistics for the table: # database name is changed # table name is changed # old columns and new columns are not the same As a result, when a column is added to a table, Hive also tries to update column statistics, which is not necessary. We can loosen the last condition by checking whether any of the existing columns has changed. If none has, we don't have to update the stats info. > Optimize the column stats update logic in table alteration > -- > > Key: HIVE-15530 > URL: https://issues.apache.org/jira/browse/HIVE-15530 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi > > Currently when a table is altered, if any of the below conditions is true, HMS > would try to update column statistics for the table: > # database name is changed > # table name is changed > # old columns and new columns are not the same > As a result, when a column is added to a table, Hive also tries to update > column statistics, which is not necessary. We can loosen the last condition by > checking whether any of the existing columns has changed. If none has, we > don't have to update the stats info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
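A hedged sketch of the relaxed condition described above (the class name, method name, and (name, type) pair layout are invented for illustration, not taken from HIVE-15530.1.patch): stats only need updating when an existing column was dropped, renamed, or retyped; merely appending columns leaves the old columns' statistics valid.

```java
import java.util.List;

// Illustrative only -- not the actual HMS alter-table code path.
public class ColumnStatsCheck {
    // Each column is represented as a {name, type} pair for this sketch.
    public static boolean needsStatsUpdate(List<String[]> oldCols, List<String[]> newCols) {
        if (newCols.size() < oldCols.size()) {
            return true; // an existing column was dropped
        }
        for (int i = 0; i < oldCols.size(); i++) {
            String[] o = oldCols.get(i);
            String[] n = newCols.get(i);
            if (!o[0].equalsIgnoreCase(n[0]) || !o[1].equalsIgnoreCase(n[1])) {
                return true; // an existing column was renamed or retyped
            }
        }
        return false; // old columns are an unchanged prefix: keep the stats
    }
}
```

This prefix comparison assumes added columns appear after the existing ones, which is how ALTER TABLE ADD COLUMNS behaves.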
[jira] [Commented] (HIVE-15527) Memory usage is unbound in SortByShuffler for Spark
[ https://issues.apache.org/jira/browse/HIVE-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15793903#comment-15793903 ] Rui Li commented on HIVE-15527: --- Since the HiveKVResultCache here only stores values for the same key, I think we can avoid Ser/De of the HiveKey to improve performance? > Memory usage is unbound in SortByShuffler for Spark > --- > > Key: HIVE-15527 > URL: https://issues.apache.org/jira/browse/HIVE-15527 > Project: Hive > Issue Type: Improvement > Components: Spark >Affects Versions: 1.1.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-15527.1.patch, HIVE-15527.2.patch, > HIVE-15527.3.patch, HIVE-15527.patch > > > In SortByShuffler.java, an ArrayList is used to back the iterator for values > that have the same key in the shuffled result produced by the Spark > transformation sortByKey. It's possible that memory can be exhausted because > of a large key group.
> {code}
> @Override
> public Tuple2<HiveKey, Iterable<BytesWritable>> next() {
>   // TODO: implement this by accumulating rows with the same key into a list.
>   // Note that this list needs to be improved to prevent excessive memory
>   // usage, but this can be done in a later phase.
>   while (it.hasNext()) {
>     Tuple2<HiveKey, BytesWritable> pair = it.next();
>     if (curKey != null && !curKey.equals(pair._1())) {
>       HiveKey key = curKey;
>       List<BytesWritable> values = curValues;
>       curKey = pair._1();
>       curValues = new ArrayList<BytesWritable>();
>       curValues.add(pair._2());
>       return new Tuple2<HiveKey, Iterable<BytesWritable>>(key, values);
>     }
>     curKey = pair._1();
>     curValues.add(pair._2());
>   }
>   if (curKey == null) {
>     throw new NoSuchElementException();
>   }
>   // if we get here, this should be the last element we have
>   HiveKey key = curKey;
>   curKey = null;
>   return new Tuple2<HiveKey, Iterable<BytesWritable>>(key, curValues);
> }
> {code}
> Since the output from sortByKey is already sorted on key, it's possible to > back the value iterable using the same input iterator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15481) Support multiple and nested subqueries
[ https://issues.apache.org/jira/browse/HIVE-15481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15793920#comment-15793920 ] Hive QA commented on HIVE-15481: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12845313/HIVE-15481.4.patch {color:green}SUCCESS:{color} +1 due to 14 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10920 tests executed *Failed tests:* {noformat} TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=233) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[case_sensitivity] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_testxpath] (batchId=28) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_coalesce] (batchId=75) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=134) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=135) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=93) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] (batchId=93) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] (batchId=92) org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropTable (batchId=208) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2759/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2759/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2759/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} This message 
is automatically generated. ATTACHMENT ID: 12845313 - PreCommit-HIVE-Build > Support multiple and nested subqueries > -- > > Key: HIVE-15481 > URL: https://issues.apache.org/jira/browse/HIVE-15481 > Project: Hive > Issue Type: Sub-task > Components: Query Planning >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-15481.1.patch, HIVE-15481.2.patch, > HIVE-15481.3.patch, HIVE-15481.4.patch > > > This is a continuation of the work done in HIVE-15192. As listed under > [Restrictions | > https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf ], > it is currently not possible to execute queries that have either more than > one subquery or a nested subquery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15527) Memory usage is unbound in SortByShuffler for Spark
[ https://issues.apache.org/jira/browse/HIVE-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15793967#comment-15793967 ] liyunzhang_intel commented on HIVE-15527: - [~xuefuz] and [~lirui]: HiveKVResultCache will write key-value pairs if the buffer is full, which limits the memory usage. But is there anything to show that the ArrayList uses a lot of memory? Did you test this with a memory analysis tool? > Memory usage is unbound in SortByShuffler for Spark > --- > > Key: HIVE-15527 > URL: https://issues.apache.org/jira/browse/HIVE-15527 > Project: Hive > Issue Type: Improvement > Components: Spark >Affects Versions: 1.1.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-15527.1.patch, HIVE-15527.2.patch, > HIVE-15527.3.patch, HIVE-15527.patch > > > In SortByShuffler.java, an ArrayList is used to back the iterator for values > that have the same key in the shuffled result produced by the Spark > transformation sortByKey. It's possible that memory can be exhausted because > of a large key group.
> {code}
> @Override
> public Tuple2<HiveKey, Iterable<BytesWritable>> next() {
>   // TODO: implement this by accumulating rows with the same key into a list.
>   // Note that this list needs to be improved to prevent excessive memory
>   // usage, but this can be done in a later phase.
>   while (it.hasNext()) {
>     Tuple2<HiveKey, BytesWritable> pair = it.next();
>     if (curKey != null && !curKey.equals(pair._1())) {
>       HiveKey key = curKey;
>       List<BytesWritable> values = curValues;
>       curKey = pair._1();
>       curValues = new ArrayList<BytesWritable>();
>       curValues.add(pair._2());
>       return new Tuple2<HiveKey, Iterable<BytesWritable>>(key, values);
>     }
>     curKey = pair._1();
>     curValues.add(pair._2());
>   }
>   if (curKey == null) {
>     throw new NoSuchElementException();
>   }
>   // if we get here, this should be the last element we have
>   HiveKey key = curKey;
>   curKey = null;
>   return new Tuple2<HiveKey, Iterable<BytesWritable>>(key, curValues);
> }
> {code}
> Since the output from sortByKey is already sorted on key, it's possible to > back the value iterable using the same input iterator.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-15527) Memory usage is unbound in SortByShuffler for Spark
[ https://issues.apache.org/jira/browse/HIVE-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15793967#comment-15793967 ] liyunzhang_intel edited comment on HIVE-15527 at 1/3/17 2:53 AM: - [~xuefuz] and [~lirui]: HiveKVResultCache will write key-value pairs to disk if the buffer is full, which limits the memory usage. But is there anything to show that the ArrayList uses a lot of memory? Did you test this with a memory analysis tool? was (Author: kellyzly): [~xuefuz] and [~lirui]: HiveKVResultCache will write key-value pairs if the buffer is full, which limits the memory usage. But is there anything to show that the ArrayList uses a lot of memory? Did you test this with a memory analysis tool? > Memory usage is unbound in SortByShuffler for Spark > --- > > Key: HIVE-15527 > URL: https://issues.apache.org/jira/browse/HIVE-15527 > Project: Hive > Issue Type: Improvement > Components: Spark >Affects Versions: 1.1.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-15527.1.patch, HIVE-15527.2.patch, > HIVE-15527.3.patch, HIVE-15527.patch > > > In SortByShuffler.java, an ArrayList is used to back the iterator for values > that have the same key in the shuffled result produced by the Spark > transformation sortByKey. It's possible that memory can be exhausted because > of a large key group.
> {code}
> @Override
> public Tuple2<HiveKey, Iterable<BytesWritable>> next() {
>   // TODO: implement this by accumulating rows with the same key into a list.
>   // Note that this list needs to be improved to prevent excessive memory
>   // usage, but this can be done in a later phase.
>   while (it.hasNext()) {
>     Tuple2<HiveKey, BytesWritable> pair = it.next();
>     if (curKey != null && !curKey.equals(pair._1())) {
>       HiveKey key = curKey;
>       List<BytesWritable> values = curValues;
>       curKey = pair._1();
>       curValues = new ArrayList<BytesWritable>();
>       curValues.add(pair._2());
>       return new Tuple2<HiveKey, Iterable<BytesWritable>>(key, values);
>     }
>     curKey = pair._1();
>     curValues.add(pair._2());
>   }
>   if (curKey == null) {
>     throw new NoSuchElementException();
>   }
>   // if we get here, this should be the last element we have
>   HiveKey key = curKey;
>   curKey = null;
>   return new Tuple2<HiveKey, Iterable<BytesWritable>>(key, curValues);
> }
> {code}
> Since the output from sortByKey is already sorted on key, it's possible to > back the value iterable using the same input iterator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15529) LLAP: TaskSchedulerService can get stuck when scheduleTask returns DELAYED_RESOURCES
[ https://issues.apache.org/jira/browse/HIVE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794069#comment-15794069 ] Rajesh Balamohan commented on HIVE-15529: - [~pxiong] - Yes, on task failure the node gets into a disabled state. Will debug more on this. > LLAP: TaskSchedulerService can get stuck when scheduleTask returns > DELAYED_RESOURCES > > > Key: HIVE-15529 > URL: https://issues.apache.org/jira/browse/HIVE-15529 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Rajesh Balamohan >Priority: Critical > > An easier way to simulate the issue: > 1. Start the Hive CLI with "--hiveconf hive.execution.mode=llap" > 2. Run a SQL script file (e.g. a SQL script containing TPC-DS queries) > 3. In the middle of the run, press "Ctrl+C", which interrupts the current > job. This should not exit the Hive CLI yet. > 4. After some time, launch the same SQL script in the same CLI. It gets > stuck indefinitely (waiting for the splits to be computed). > Even when the CLI is quit, the AM runs forever until explicitly killed. > The issue seems to be around {{LlapTaskSchedulerService::schedulePendingTasks}} > dealing with the loop when it encounters {{DELAYED_RESOURCES}} on task > scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15313) Add export spark.yarn.archive or spark.yarn.jars variable in Hive on Spark document
[ https://issues.apache.org/jira/browse/HIVE-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794156#comment-15794156 ] Ferdinand Xu commented on HIVE-15313: - [~lirui], any progress or plan on HIVE-15302? I think we can resolve this ticket first since HIVE-15302 doesn't block HIVE-15313. Any suggestions? [~xuefuz] [~lirui] > Add export spark.yarn.archive or spark.yarn.jars variable in Hive on Spark > document > --- > > Key: HIVE-15313 > URL: https://issues.apache.org/jira/browse/HIVE-15313 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Priority: Minor > Attachments: performance.improvement.after.set.spark.yarn.archive.PNG > > > According to the [wiki|https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started], queries were run in HOS16 and HOS20 in YARN mode. > The following table shows the difference in query time between HOS16 and HOS20. > ||Version||Total time||Time for Jobs||Time for preparing jobs|| > |Spark16|51|39|12| > |Spark20|54|40|14| > HOS20 spends more time (2 secs) on preparing jobs than HOS16. After reviewing the Spark source code, I found that the following causes this: > [Client#distribute|https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L546]. In spark20, if Spark cannot find spark.yarn.archive or spark.yarn.jars in the Spark configuration file, it will first copy all jars in $SPARK_HOME/jars to a tmp directory and upload the tmp directory to the distributed cache. Compare with [spark16|https://github.com/apache/spark/blob/branch-1.6/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1145]: in spark16, it searches for spark-assembly*.jar and uploads it to the distributed cache. > In spark20, it spends 2 more seconds copying all jars in $SPARK_HOME/jars to a tmp directory if we don't set "spark.yarn.archive" or "spark.yarn.jars".
> We can accelerate the startup of Hive on Spark 2.0 by setting "spark.yarn.archive" or "spark.yarn.jars":
> set "spark.yarn.archive":
> {code}
> cd $SPARK_HOME/jars
> zip spark-archive.zip ./*.jar  # important: enter the jars folder, then zip
> hadoop fs -copyFromLocal spark-archive.zip
> echo "spark.yarn.archive=hdfs:///xxx:8020/spark-archive.zip" >> conf/spark-defaults.conf
> {code}
> set "spark.yarn.jars":
> {code}
> hadoop fs -mkdir spark-2.0.0-bin-hadoop
> hadoop fs -copyFromLocal $SPARK_HOME/jars/* spark-2.0.0-bin-hadoop
> echo "spark.yarn.jars=hdfs:///xxx:8020/spark-2.0.0-bin-hadoop/*" >> conf/spark-defaults.conf
> {code}
> I suggest adding this part to the wiki.
> performance.improvement.after.set.spark.yarn.archive.PNG shows the detailed performance improvement after setting spark.yarn.archive in small queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15313) Add export spark.yarn.archive or spark.yarn.jars variable in Hive on Spark document
[ https://issues.apache.org/jira/browse/HIVE-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-15313: Assignee: liyunzhang_intel > Add export spark.yarn.archive or spark.yarn.jars variable in Hive on Spark > document > --- > > Key: HIVE-15313 > URL: https://issues.apache.org/jira/browse/HIVE-15313 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel >Priority: Minor > Attachments: performance.improvement.after.set.spark.yarn.archive.PNG > > > According to the [wiki|https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started], queries were run in HOS16 and HOS20 in YARN mode. > The following table shows the difference in query time between HOS16 and HOS20. > ||Version||Total time||Time for Jobs||Time for preparing jobs|| > |Spark16|51|39|12| > |Spark20|54|40|14| > HOS20 spends more time (2 secs) on preparing jobs than HOS16. After reviewing the Spark source code, I found that the following causes this: > [Client#distribute|https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L546]. In spark20, if Spark cannot find spark.yarn.archive or spark.yarn.jars in the Spark configuration file, it will first copy all jars in $SPARK_HOME/jars to a tmp directory and upload the tmp directory to the distributed cache. Compare with [spark16|https://github.com/apache/spark/blob/branch-1.6/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1145]: in spark16, it searches for spark-assembly*.jar and uploads it to the distributed cache. > In spark20, it spends 2 more seconds copying all jars in $SPARK_HOME/jars to a tmp directory if we don't set "spark.yarn.archive" or "spark.yarn.jars".
> We can accelerate the startup of Hive on Spark 2.0 by setting "spark.yarn.archive" or "spark.yarn.jars":
> set "spark.yarn.archive":
> {code}
> cd $SPARK_HOME/jars
> zip spark-archive.zip ./*.jar  # important: enter the jars folder, then zip
> hadoop fs -copyFromLocal spark-archive.zip
> echo "spark.yarn.archive=hdfs:///xxx:8020/spark-archive.zip" >> conf/spark-defaults.conf
> {code}
> set "spark.yarn.jars":
> {code}
> hadoop fs -mkdir spark-2.0.0-bin-hadoop
> hadoop fs -copyFromLocal $SPARK_HOME/jars/* spark-2.0.0-bin-hadoop
> echo "spark.yarn.jars=hdfs:///xxx:8020/spark-2.0.0-bin-hadoop/*" >> conf/spark-defaults.conf
> {code}
> I suggest adding this part to the wiki.
> performance.improvement.after.set.spark.yarn.archive.PNG shows the detailed performance improvement after setting spark.yarn.archive in small queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-15530: -- Attachment: HIVE-15530.1.patch > Optimize the column stats update logic in table alteration > -- > > Key: HIVE-15530 > URL: https://issues.apache.org/jira/browse/HIVE-15530 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi > Attachments: HIVE-15530.1.patch > > > Currently when a table is altered, if any of the below conditions is true, HMS > would try to update column statistics for the table: > # database name is changed > # table name is changed > # old columns and new columns are not the same > As a result, when a column is added to a table, Hive also tries to update > column statistics, which is not necessary. We can loosen the last condition by > checking whether any of the existing columns has changed. If none has, we > don't have to update the stats info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi reassigned HIVE-15530: - Assignee: Yibing Shi > Optimize the column stats update logic in table alteration > -- > > Key: HIVE-15530 > URL: https://issues.apache.org/jira/browse/HIVE-15530 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-15530.1.patch > > > Currently when a table is altered, if any of the below conditions is true, HMS > would try to update column statistics for the table: > # database name is changed > # table name is changed > # old columns and new columns are not the same > As a result, when a column is added to a table, Hive also tries to update > column statistics, which is not necessary. We can loosen the last condition by > checking whether any of the existing columns has changed. If none has, we > don't have to update the stats info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-15530: -- Status: Patch Available (was: Open) > Optimize the column stats update logic in table alteration > -- > > Key: HIVE-15530 > URL: https://issues.apache.org/jira/browse/HIVE-15530 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi > Attachments: HIVE-15530.1.patch > > > Currently, when a table is altered, HMS tries to update the table's column > statistics if any of the conditions below is true: > # the database name is changed > # the table name is changed > # the old columns and new columns are not the same > As a result, when a column is added to a table, Hive also tries to update > column statistics, which is not necessary. We can loosen the last condition by > checking whether any of the existing columns has changed. If none has, we don't > have to update the stats info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794176#comment-15794176 ] Pengcheng Xiong commented on HIVE-15530: [~Yibing], could you add a test case for this? Thanks. > Optimize the column stats update logic in table alteration > -- > > Key: HIVE-15530 > URL: https://issues.apache.org/jira/browse/HIVE-15530 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-15530.1.patch > > > Currently, when a table is altered, HMS tries to update the table's column > statistics if any of the conditions below is true: > # the database name is changed > # the table name is changed > # the old columns and new columns are not the same > As a result, when a column is added to a table, Hive also tries to update > column statistics, which is not necessary. We can loosen the last condition by > checking whether any of the existing columns has changed. If none has, we don't > have to update the stats info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
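The relaxation proposed in the issue description can be sketched as follows. This is a simplified, hypothetical illustration only: Hive's real check compares FieldSchema objects inside HMS, and the names StatsCheck and needsStatsUpdate are invented for the example; columns are modeled as "name:type" strings.

```java
import java.util.*;

/**
 * Hedged sketch of the proposed check: treat an alteration as
 * stats-preserving when the old column list is an unchanged prefix of the
 * new one, i.e. columns were only appended at the end.
 */
final class StatsCheck {
    static boolean needsStatsUpdate(List<String> oldCols, List<String> newCols) {
        // Appending columns leaves the existing column statistics valid, so
        // no update is needed; changing or removing an existing column is
        // what invalidates the stats.
        boolean onlyAppended = newCols.size() >= oldCols.size()
                && newCols.subList(0, oldCols.size()).equals(oldCols);
        return !onlyAppended;
    }
}
```

With such a check, an `ALTER TABLE ... ADD COLUMNS` call would skip the stats update entirely, while a type change or column removal would still trigger it.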
[jira] [Resolved] (HIVE-8373) OOM for a simple query with spark.master=local [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu resolved HIVE-8373. Resolution: Fixed Fix Version/s: 2.2.0 WIKI has been updated. Thanks [~kellyzly] for the contributions. > OOM for a simple query with spark.master=local [Spark Branch] > - > > Key: HIVE-8373 > URL: https://issues.apache.org/jira/browse/HIVE-8373 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Xuefu Zhang >Assignee: liyunzhang_intel > Fix For: 2.2.0 > > > I have a straightforward query to run in Spark local mode, but get an OOM > even though the data volume is tiny: > {code} > Exception in thread "Spark Context Cleaner" > Exception: java.lang.OutOfMemoryError thrown from the > UncaughtExceptionHandler in thread "Spark Context Cleaner" > Exception in thread "Executor task launch worker-1" > Exception: java.lang.OutOfMemoryError thrown from the > UncaughtExceptionHandler in thread "Executor task launch worker-1" > Exception in thread "Keep-Alive-Timer" > Exception: java.lang.OutOfMemoryError thrown from the > UncaughtExceptionHandler in thread "Keep-Alive-Timer" > Exception in thread "Driver Heartbeater" > Exception: java.lang.OutOfMemoryError thrown from the > UncaughtExceptionHandler in thread "Driver Heartbeater" > {code} > The query is: > {code} > select product_name, avg(item_price) as avg_price from product join item on > item.product_pk=product.product_pk group by product_name order by avg_price; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15527) Memory usage is unbound in SortByShuffler for Spark
[ https://issues.apache.org/jira/browse/HIVE-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794194#comment-15794194 ] Xuefu Zhang commented on HIVE-15527: [~csun], [~lirui], and [~kellyzly], thanks for your feedback. The patch here is more like a POC, so improvement is needed for production. Here are a few thoughts: 1) I'm not sure what caused the result diff, though there might be a bug in HiveKVResultCache that it manifests. The diff seems invalid when compared to the MR result. Also, there seems to be some randomness in generating the diff. 2) As to performance, Rui's concern is valid. What I tried to demonstrate is that we need something similar to HiveKVResultCache but only for values. 3) Similar to 2), we need a good cache size to avoid file I/O for regular group sizes. Currently HiveKVResultCache caches only 1024 rows, which seems rather small. 4) The performance impact needs to be evaluated. 5) The idea here could be used to solve the same problem for Spark's groupByKey() in Hive. We could use Spark's reduceByKey() instead and do in-group value caching in Hive like we do here. I'm not sure if I have the bandwidth to move this forward at full speed. Please feel free to take this (and other issues) forward. Thanks. > Memory usage is unbound in SortByShuffler for Spark > --- > > Key: HIVE-15527 > URL: https://issues.apache.org/jira/browse/HIVE-15527 > Project: Hive > Issue Type: Improvement > Components: Spark >Affects Versions: 1.1.0 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > Attachments: HIVE-15527.1.patch, HIVE-15527.2.patch, > HIVE-15527.3.patch, HIVE-15527.patch > > > In SortByShuffler.java, an ArrayList is used to back the iterator for values > that have the same key in the shuffled result produced by the Spark > transformation sortByKey. It's possible that memory can be exhausted because > of a large key group. 
> {code}
> @Override
> public Tuple2<HiveKey, List<BytesWritable>> next() {
>   // TODO: implement this by accumulating rows with the same key into a list.
>   // Note that this list needs to be improved to prevent excessive memory
>   // usage, but this can be done in a later phase.
>   while (it.hasNext()) {
>     Tuple2<HiveKey, BytesWritable> pair = it.next();
>     if (curKey != null && !curKey.equals(pair._1())) {
>       HiveKey key = curKey;
>       List<BytesWritable> values = curValues;
>       curKey = pair._1();
>       curValues = new ArrayList<BytesWritable>();
>       curValues.add(pair._2());
>       return new Tuple2<HiveKey, List<BytesWritable>>(key, values);
>     }
>     curKey = pair._1();
>     curValues.add(pair._2());
>   }
>   if (curKey == null) {
>     throw new NoSuchElementException();
>   }
>   // if we get here, this should be the last element we have
>   HiveKey key = curKey;
>   curKey = null;
>   return new Tuple2<HiveKey, List<BytesWritable>>(key, curValues);
> }
> {code}
> Since the output from sortByKey is already sorted on key, it's possible to
> back the value iterable using the same input iterator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
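The closing suggestion above, backing the value iterable with the same input iterator, can be sketched as follows. This is a simplified, hypothetical illustration, not Hive's actual code: it uses String keys and values instead of HiveKey/BytesWritable, and the class and method names are invented. The key caveat of this design is that each group's values must be fully consumed before advancing to the next group, since all groups stream from the one shared iterator.

```java
import java.util.*;

/**
 * Sketch: group a key-sorted iterator into (key, values) pairs without
 * materializing each group in an ArrayList, keeping memory bounded.
 */
class LazyGrouper {
    private final Iterator<String[]> input; // each element: {key, value}, sorted by key
    private String[] pending;               // next pair not yet handed out

    LazyGrouper(Iterator<String[]> input) {
        this.input = input;
        this.pending = input.hasNext() ? input.next() : null;
    }

    boolean hasNextGroup() {
        return pending != null;
    }

    /** Key of the current group; call values() to stream its values. */
    String nextKey() {
        return pending[0];
    }

    /** Streams the current key's values directly from the shared iterator. */
    Iterator<String> values() {
        final String key = pending[0];
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                // The group ends when the shared iterator is exhausted or
                // the next pending pair carries a different key.
                return pending != null && pending[0].equals(key);
            }

            @Override
            public String next() {
                String v = pending[1];
                pending = input.hasNext() ? input.next() : null;
                return v;
            }
        };
    }
}
```

Because values are pulled lazily from the shared iterator, a group of any size uses constant memory, at the cost of forbidding re-iteration of a group once consumed.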
[jira] [Updated] (HIVE-15526) Some tests need SORT_QUERY_RESULTS
[ https://issues.apache.org/jira/browse/HIVE-15526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-15526: -- Attachment: HIVE-15526.1.patch > Some tests need SORT_QUERY_RESULTS > -- > > Key: HIVE-15526 > URL: https://issues.apache.org/jira/browse/HIVE-15526 > Project: Hive > Issue Type: Test >Reporter: Rui Li >Assignee: Rui Li >Priority: Minor > Attachments: HIVE-15526.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15526) Some tests need SORT_QUERY_RESULTS
[ https://issues.apache.org/jira/browse/HIVE-15526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-15526: -- Status: Patch Available (was: Open) > Some tests need SORT_QUERY_RESULTS > -- > > Key: HIVE-15526 > URL: https://issues.apache.org/jira/browse/HIVE-15526 > Project: Hive > Issue Type: Test >Reporter: Rui Li >Assignee: Rui Li >Priority: Minor > Attachments: HIVE-15526.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15313) Add export spark.yarn.archive or spark.yarn.jars variable in Hive on Spark document
[ https://issues.apache.org/jira/browse/HIVE-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794215#comment-15794215 ] Rui Li commented on HIVE-15313: --- [~Ferd], yeah I'm OK to update the wiki for performance first. I'll update again once the minimum set is determined. > Add export spark.yarn.archive or spark.yarn.jars variable in Hive on Spark > document > --- > > Key: HIVE-15313 > URL: https://issues.apache.org/jira/browse/HIVE-15313 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel >Priority: Minor > Attachments: performance.improvement.after.set.spark.yarn.archive.PNG > > > Following the > [wiki|https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started], > we ran queries in HOS16 and HOS20 in yarn mode. > The following table shows the difference in query time between HOS16 and HOS20. > ||Version||Total time||Time for Jobs||Time for preparing jobs|| > |Spark16|51|39|12| > |Spark20|54|40|14| > HOS20 spends more time (2 secs) on preparing jobs than HOS16. After reviewing > the Spark source code, we found that the following point causes this: > code:[Client#distribute|https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L546]. > In Spark 2.0, if Spark cannot find spark.yarn.archive or spark.yarn.jars in the > Spark configuration file, it first copies all jars in $SPARK_HOME/jars to > a tmp directory and uploads that directory to the distributed cache. Compare > [spark16|https://github.com/apache/spark/blob/branch-1.6/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1145]: > in Spark 1.6, it searches for spark-assembly*.jar and uploads it to the distributed cache. > In Spark 2.0, it spends 2 more seconds copying all jars in $SPARK_HOME/jars to a > tmp directory if we don't set "spark.yarn.archive" or "spark.yarn.jars". 
> We can accelerate the startup of Hive on Spark 2.0 by setting
> "spark.yarn.archive" or "spark.yarn.jars":
> set "spark.yarn.archive":
> {code}
> cd $SPARK_HOME/jars
> zip spark-archive.zip ./*.jar  # this is important: enter the jars folder, then zip
> hadoop fs -copyFromLocal spark-archive.zip
> echo "spark.yarn.archive=hdfs:///xxx:8020/spark-archive.zip" >> conf/spark-defaults.conf
> {code}
> set "spark.yarn.jars":
> {code}
> hadoop fs -mkdir spark-2.0.0-bin-hadoop
> hadoop fs -copyFromLocal $SPARK_HOME/jars/* spark-2.0.0-bin-hadoop
> echo "spark.yarn.jars=hdfs:///xxx:8020/spark-2.0.0-bin-hadoop/*" >> conf/spark-defaults.conf
> {code}
> We suggest adding this part to the wiki.
> performance.improvement.after.set.spark.yarn.archive.PNG shows the detailed
> performance improvement after setting spark.yarn.archive in small queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15525) Hooking ChangeManager to "drop table", "drop partition"
[ https://issues.apache.org/jira/browse/HIVE-15525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-15525: -- Attachment: HIVE-15525.2.patch > Hooking ChangeManager to "drop table", "drop partition" > --- > > Key: HIVE-15525 > URL: https://issues.apache.org/jira/browse/HIVE-15525 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-15525.1.patch, HIVE-15525.2.patch > > > When Hive executes "drop table"/"drop partition", we will move the data files > into cmroot in case the replication destination needs them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-15313) Add export spark.yarn.archive or spark.yarn.jars variable in Hive on Spark document
[ https://issues.apache.org/jira/browse/HIVE-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu resolved HIVE-15313. - Resolution: Fixed Fix Version/s: 2.2.0 Updated the Configuring Hive section (https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started#HiveonSpark:GettingStarted-ConfiguringHive). Thanks [~kellyzly], [~xuefuz] and [~lirui] for the review and contribution. > Add export spark.yarn.archive or spark.yarn.jars variable in Hive on Spark > document > --- > > Key: HIVE-15313 > URL: https://issues.apache.org/jira/browse/HIVE-15313 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel >Priority: Minor > Fix For: 2.2.0 > > Attachments: performance.improvement.after.set.spark.yarn.archive.PNG > > > Following the > [wiki|https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started], > we ran queries in HOS16 and HOS20 in yarn mode. > The following table shows the difference in query time between HOS16 and HOS20. > ||Version||Total time||Time for Jobs||Time for preparing jobs|| > |Spark16|51|39|12| > |Spark20|54|40|14| > HOS20 spends more time (2 secs) on preparing jobs than HOS16. After reviewing > the Spark source code, we found that the following point causes this: > code:[Client#distribute|https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L546]. > In Spark 2.0, if Spark cannot find spark.yarn.archive or spark.yarn.jars in the > Spark configuration file, it first copies all jars in $SPARK_HOME/jars to > a tmp directory and uploads that directory to the distributed cache. Compare > [spark16|https://github.com/apache/spark/blob/branch-1.6/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1145]: > in Spark 1.6, it searches for spark-assembly*.jar and uploads it to the distributed cache. 
> In Spark 2.0, it spends 2 more seconds copying all jars in $SPARK_HOME/jars to a
> tmp directory if we don't set "spark.yarn.archive" or "spark.yarn.jars".
> We can accelerate the startup of Hive on Spark 2.0 by setting
> "spark.yarn.archive" or "spark.yarn.jars":
> set "spark.yarn.archive":
> {code}
> cd $SPARK_HOME/jars
> zip spark-archive.zip ./*.jar  # this is important: enter the jars folder, then zip
> hadoop fs -copyFromLocal spark-archive.zip
> echo "spark.yarn.archive=hdfs:///xxx:8020/spark-archive.zip" >> conf/spark-defaults.conf
> {code}
> set "spark.yarn.jars":
> {code}
> hadoop fs -mkdir spark-2.0.0-bin-hadoop
> hadoop fs -copyFromLocal $SPARK_HOME/jars/* spark-2.0.0-bin-hadoop
> echo "spark.yarn.jars=hdfs:///xxx:8020/spark-2.0.0-bin-hadoop/*" >> conf/spark-defaults.conf
> {code}
> We suggest adding this part to the wiki.
> performance.improvement.after.set.spark.yarn.archive.PNG shows the detailed
> performance improvement after setting spark.yarn.archive in small queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15530) Optimize the column stats update logic in table alteration
[ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794303#comment-15794303 ] Hive QA commented on HIVE-15530: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12845326/HIVE-15530.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10883 tests executed *Failed tests:* {noformat} TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=233) TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=139) [skewjoinopt15.q,vector_coalesce.q,orc_ppd_decimal.q,cbo_rp_lineage2.q,insert_into_with_schema.q,join_emit_interval.q,load_dyn_part3.q,auto_sortmerge_join_14.q,vector_null_projection.q,vector_cast_constant.q,mapjoin2.q,bucket_map_join_tez2.q,correlationoptimizer4.q,schema_evol_orc_acidvec_part_update.q,vectorization_12.q,vector_number_compare_projection.q,orc_merge_incompat3.q,vector_leftsemi_mapjoin.q,update_all_non_partitioned.q,multi_column_in_single.q,schema_evol_orc_nonvec_table.q,cbo_rp_semijoin.q,tez_insert_overwrite_local_directory_1.q,schema_evol_text_vecrow_table.q,vector_count.q,auto_sortmerge_join_15.q,vector_if_expr.q,delete_whole_partition.q,vector_decimal_6.q,sample1.q] org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[case_sensitivity] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_testxpath] (batchId=28) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_coalesce] (batchId=75) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=134) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=135) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=93) 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] (batchId=92) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2760/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2760/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2760/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12845326 - PreCommit-HIVE-Build > Optimize the column stats update logic in table alteration > -- > > Key: HIVE-15530 > URL: https://issues.apache.org/jira/browse/HIVE-15530 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-15530.1.patch > > > Currently, when a table is altered, HMS tries to update the table's column > statistics if any of the conditions below is true: > # the database name is changed > # the table name is changed > # the old columns and new columns are not the same > As a result, when a column is added to a table, Hive also tries to update > column statistics, which is not necessary. We can loosen the last condition by > checking whether any of the existing columns has changed. If none has, we don't > have to update the stats info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-15526) Some tests need SORT_QUERY_RESULTS
[ https://issues.apache.org/jira/browse/HIVE-15526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794388#comment-15794388 ] Hive QA commented on HIVE-15526: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12845328/HIVE-15526.1.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10913 tests executed *Failed tests:* {noformat} TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=233) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[case_sensitivity] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_testxpath] (batchId=28) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_coalesce] (batchId=75) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=134) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=135) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=93) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2761/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2761/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2761/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12845328 - PreCommit-HIVE-Build > Some tests need SORT_QUERY_RESULTS > -- > > Key: HIVE-15526 > URL: https://issues.apache.org/jira/browse/HIVE-15526 > Project: Hive > Issue Type: Test >Reporter: Rui Li >Assignee: Rui Li >Priority: Minor > Attachments: HIVE-15526.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-15526) Some tests need SORT_QUERY_RESULTS
[ https://issues.apache.org/jira/browse/HIVE-15526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-15526: -- Description: {{temp_table_gb1.q}} and {{vector_between_in.q}} > Some tests need SORT_QUERY_RESULTS > -- > > Key: HIVE-15526 > URL: https://issues.apache.org/jira/browse/HIVE-15526 > Project: Hive > Issue Type: Test >Reporter: Rui Li >Assignee: Rui Li >Priority: Minor > Attachments: HIVE-15526.1.patch > > > {{temp_table_gb1.q}} and {{vector_between_in.q}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
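For context, SORT_QUERY_RESULTS is a directive placed as a comment at the top of a .q test file; the test driver sorts the query output before diffing it against the expected results, so tests whose queries don't guarantee an ordering stop producing spurious diffs. A minimal illustration (the query itself is made up and not from the patch):

```sql
-- SORT_QUERY_RESULTS

-- Without the directive above, a GROUP BY without an ORDER BY may emit rows
-- in any order and cause flaky diffs against the golden output file.
SELECT key, count(*) FROM src GROUP BY key;
```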
[jira] [Commented] (HIVE-15526) Some tests need SORT_QUERY_RESULTS
[ https://issues.apache.org/jira/browse/HIVE-15526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794403#comment-15794403 ] Rui Li commented on HIVE-15526: --- Failures not related. [~xuefuz], could you have a look? The change is trivial. Thanks. > Some tests need SORT_QUERY_RESULTS > -- > > Key: HIVE-15526 > URL: https://issues.apache.org/jira/browse/HIVE-15526 > Project: Hive > Issue Type: Test >Reporter: Rui Li >Assignee: Rui Li >Priority: Minor > Attachments: HIVE-15526.1.patch > > > {{temp_table_gb1.q}} and {{vector_between_in.q}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)