[jira] [Commented] (HIVE-17267) Make HMS Notification Listeners typesafe
[ https://issues.apache.org/jira/browse/HIVE-17267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124491#comment-16124491 ]

Barna Zsombor Klara commented on HIVE-17267:
--------------------------------------------

Unit test failures should not be related.

> Make HMS Notification Listeners typesafe
> ----------------------------------------
>
> Key: HIVE-17267
> URL: https://issues.apache.org/jira/browse/HIVE-17267
> Project: Hive
> Issue Type: Bug
> Reporter: Barna Zsombor Klara
> Assignee: Barna Zsombor Klara
> Attachments: HIVE-17267.01.patch, HIVE-17267.02.patch, HIVE-17267.03.patch
>
>
> Currently the HMS supports two types of notification listeners: transactional and non-transactional. Transactional listeners are invoked only if the JDBC transaction finishes successfully, while non-transactional listeners are expected to be resilient and are invoked in every case, even on failure.
> Having the same type for these two is a source of confusion and opens the door for misconfigurations. We should fix this.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
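[Editor's note] The type separation the issue describes can be sketched as follows. This is an illustrative sketch only, not Hive's actual listener API: the interface and class names below are invented for the example. The point is that a distinct subtype for transactional listeners lets the compiler reject the misconfiguration the issue worries about.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative names; Hive's real listener classes differ.
interface MetaStoreListener {
    void onEvent(String event);
}

// Marker subtype: implementations are invoked only after the JDBC
// transaction commits successfully.
interface TransactionalMetaStoreListener extends MetaStoreListener {
}

class ListenerRegistry {
    private final List<TransactionalMetaStoreListener> transactional = new ArrayList<>();
    private final List<MetaStoreListener> resilient = new ArrayList<>();

    // The compiler now rejects registering a plain listener as transactional,
    // which is exactly the misconfiguration the issue wants to prevent.
    void addTransactional(TransactionalMetaStoreListener l) { transactional.add(l); }
    void addResilient(MetaStoreListener l) { resilient.add(l); }

    int transactionalCount() { return transactional.size(); }
    int resilientCount() { return resilient.size(); }
}

public class TypedListeners {
    public static void main(String[] args) {
        ListenerRegistry r = new ListenerRegistry();
        r.addTransactional(e -> {});
        r.addResilient(e -> {});
        System.out.println(r.transactionalCount() + " transactional, "
            + r.resilientCount() + " resilient");
    }
}
```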
[jira] [Assigned] (HIVE-17305) New insert overwrite dynamic partitions qtests need to have the golden file regenerated
[ https://issues.apache.org/jira/browse/HIVE-17305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Barna Zsombor Klara reassigned HIVE-17305:
------------------------------------------

> New insert overwrite dynamic partitions qtests need to have the golden file regenerated
> ---------------------------------------------------------------------------------------
>
> Key: HIVE-17305
> URL: https://issues.apache.org/jira/browse/HIVE-17305
> Project: Hive
> Issue Type: Bug
> Components: Tests
> Affects Versions: 3.0.0
> Reporter: Barna Zsombor Klara
> Assignee: Barna Zsombor Klara
> Priority: Trivial
>
[jira] [Commented] (HIVE-17089) make acid 2.0 the default
[ https://issues.apache.org/jira/browse/HIVE-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124480#comment-16124480 ]

Hive QA commented on HIVE-17089:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881561/HIVE-17089.11.patch

{color:green}SUCCESS:{color} +1 due to 13 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 10968 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lateral_view_cp] (batchId=82)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning (batchId=291)
org.apache.hadoop.hive.ql.io.orc.TestOrcRawRecordMerger.testRecordReaderIncompleteDelta (batchId=264)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
org.apache.hive.service.cli.operation.TestOperationLoggingAPIWithMr.testFetchResultsOfLogWithNoneMode (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6364/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6364/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6364/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881561 - PreCommit-HIVE-Build

> make acid 2.0 the default
> -------------------------
>
> Key: HIVE-17089
> URL: https://issues.apache.org/jira/browse/HIVE-17089
> Project: Hive
> Issue Type: New Feature
> Components: Transactions
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
> Attachments: HIVE-17089.01.patch, HIVE-17089.03.patch, HIVE-17089.05.patch, HIVE-17089.06.patch, HIVE-17089.07.patch, HIVE-17089.10.patch, HIVE-17089.10.patch, HIVE-17089.11.patch
>
>
> acid 2.0 was introduced in HIVE-14035. It replaces Update events with a combination of Delete + Insert events. This makes U=D+I the default (and only) supported acid table type in Hive 3.0.
> The expectation for the upgrade is that major compaction has to be run on all acid tables in the existing Hive cluster, and that no new writes to these tables take place after the start of compaction (we need to add a mechanism to put a table in read-only mode, so it can still be read while it's being compacted). Then the upgrade to Hive 3.0 can take place.
[jira] [Commented] (HIVE-17301) Make JSONMessageFactory.getTObj method thread safe
[ https://issues.apache.org/jira/browse/HIVE-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124479#comment-16124479 ]

Tao Li commented on HIVE-17301:
-------------------------------

Test failures are tracked in HIVE-15058 and don't seem related to this change.

> Make JSONMessageFactory.getTObj method thread safe
> --------------------------------------------------
>
> Key: HIVE-17301
> URL: https://issues.apache.org/jira/browse/HIVE-17301
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Reporter: Tao Li
> Assignee: Tao Li
> Attachments: HIVE-17301.1.patch
>
>
> This static method uses a singleton instance of TDeserializer, which is not thread safe. Instead we want to create a new instance per method call. The class is lightweight, so this should be fine from a performance perspective.
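[Editor's note] The fix pattern described above (replace a shared stateful singleton with a cheap per-call instance) can be sketched as follows. Thrift's TDeserializer is the real class involved in the issue; the `StatefulDeserializer` stand-in below is invented for the example, to show why a shared instance with internal mutable state races under concurrent calls.

```java
// Stand-in for a deserializer that keeps per-instance mutable state,
// like Thrift's TDeserializer does internally.
class StatefulDeserializer {
    private final StringBuilder scratch = new StringBuilder(); // per-instance state

    String deserialize(String payload) {
        scratch.setLength(0);   // two threads sharing one instance would race here
        scratch.append(payload);
        return scratch.toString();
    }
}

public class PerCallDeserialize {
    // BAD (what the issue describes): one instance shared by all callers.
    // private static final StatefulDeserializer SHARED = new StatefulDeserializer();

    // GOOD (the proposed fix): construct per call; the object is lightweight,
    // so the extra allocation is negligible.
    static String getTObjSketch(String payload) {
        return new StatefulDeserializer().deserialize(payload);
    }

    public static void main(String[] args) {
        System.out.println(getTObjSketch("{\"db\":\"default\"}"));
    }
}
```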
[jira] [Commented] (HIVE-17283) Enable parallel edges of semijoin along with mapjoins
[ https://issues.apache.org/jira/browse/HIVE-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124477#comment-16124477 ]

Lefty Leverenz commented on HIVE-17283:
---------------------------------------

Doc note: This adds *hive.tez.dynamic.semijoin.reduction.for.mapjoin* to HiveConf.java, so it needs to be documented in the wiki.

* [Configuration Properties -- Tez | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Tez]

Added a TODOC3.0 label.

> Enable parallel edges of semijoin along with mapjoins
> -----------------------------------------------------
>
> Key: HIVE-17283
> URL: https://issues.apache.org/jira/browse/HIVE-17283
> Project: Hive
> Issue Type: Bug
> Reporter: Deepak Jaiswal
> Assignee: Deepak Jaiswal
> Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-17283.1.patch, HIVE-17283.2.patch
>
>
> https://issues.apache.org/jira/browse/HIVE-16260 removes parallel edges of semijoin with mapjoin. However, in some cases it may be beneficial to have them, so we need a config which can enable that.
> The default should be false, which maintains the existing behavior.
[jira] [Updated] (HIVE-17283) Enable parallel edges of semijoin along with mapjoins
[ https://issues.apache.org/jira/browse/HIVE-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-17283:
----------------------------------
    Labels: TODOC3.0  (was: )

> Enable parallel edges of semijoin along with mapjoins
> -----------------------------------------------------
>
> Key: HIVE-17283
> URL: https://issues.apache.org/jira/browse/HIVE-17283
> Project: Hive
> Issue Type: Bug
> Reporter: Deepak Jaiswal
> Assignee: Deepak Jaiswal
> Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-17283.1.patch, HIVE-17283.2.patch
>
>
> https://issues.apache.org/jira/browse/HIVE-16260 removes parallel edges of semijoin with mapjoin. However, in some cases it may be beneficial to have them, so we need a config which can enable that.
> The default should be false, which maintains the existing behavior.
[jira] [Commented] (HIVE-17240) Function ACOS(2) and ASIN(2) should be null
[ https://issues.apache.org/jira/browse/HIVE-17240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124472#comment-16124472 ]

Yuming Wang commented on HIVE-17240:
------------------------------------

[~sershe], is it ready to be committed? Thanks.

> Function ACOS(2) and ASIN(2) should be null
> -------------------------------------------
>
> Key: HIVE-17240
> URL: https://issues.apache.org/jira/browse/HIVE-17240
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Affects Versions: 1.1.1, 1.2.2, 2.2.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Attachments: HIVE-17240.1.patch, HIVE-17240.2.patch, HIVE-17240.3.patch, HIVE-17240.4.patch, HIVE-17240.5.patch, HIVE-17240.6.patch
>
>
> {{acos(2)}} should be null, same as MySQL:
> {code:sql}
> hive> desc function extended acos;
> OK
> acos(x) - returns the arc cosine of x if -1<=x<=1 or NULL otherwise
> Example:
>   > SELECT acos(1) FROM src LIMIT 1;
>   0
>   > SELECT acos(2) FROM src LIMIT 1;
>   NULL
> Time taken: 0.009 seconds, Fetched: 6 row(s)
> hive> select acos(2);
> OK
> NaN
> Time taken: 0.437 seconds, Fetched: 1 row(s)
> {code}
> {code:sql}
> mysql> select acos(2);
> +---------+
> | acos(2) |
> +---------+
> |    NULL |
> +---------+
> 1 row in set (0.00 sec)
> {code}
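[Editor's note] The mismatch above comes from Java: `Math.acos` returns `NaN` for inputs outside [-1, 1], and Hive surfaces that `NaN` instead of SQL NULL. The sketch below shows the gist of the fix with a hypothetical helper (`acosOrNull` is not Hive's actual UDF code): map out-of-domain inputs to Java `null`, which Hive renders as NULL.

```java
public class AcosNullSketch {
    // Hypothetical helper mirroring the fix: an out-of-domain input yields
    // SQL NULL (Java null) instead of NaN.
    static Double acosOrNull(double x) {
        return (x < -1.0 || x > 1.0) ? null : Math.acos(x);
    }

    public static void main(String[] args) {
        System.out.println(Math.acos(2));    // NaN: the behavior being fixed
        System.out.println(acosOrNull(2.0)); // null: the desired result
        System.out.println(acosOrNull(1.0)); // 0.0: in-domain values unchanged
    }
}
```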
[jira] [Commented] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
[ https://issues.apache.org/jira/browse/HIVE-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124473#comment-16124473 ]

Yuming Wang commented on HIVE-15794:
------------------------------------

[~sershe], is it ready to be committed? Thanks.

> Support get hdfsEncryptionShim if FileSystem is ViewFileSystem
> --------------------------------------------------------------
>
> Key: HIVE-15794
> URL: https://issues.apache.org/jira/browse/HIVE-15794
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Affects Versions: 1.2.0, 1.1.0, 2.2.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Attachments: HIVE-15794.1.patch, HIVE-15794.2.patch, HIVE-15794.3.patch
>
>
> *SQL*:
> {code:sql}
> hive> create table table2 as select * from table1;
> hive> show create table table2;
> OK
> CREATE TABLE `table2`(
>   `id` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://cluster4/user/hive/warehouse/table2'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1486050317')
> {code}
> *LOG*:
> {noformat}
> 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] session.SessionState: Could not get hdfsEncryptionShim, it is only applicable to hdfs filesystem.
> 2017-02-02T20:12:49,738 INFO [99374b82-e9ca-4654-b803-93b194b9331b main] session.SessionState: Could not get hdfsEncryptionShim, it is only applicable to hdfs filesystem.
> {noformat}
> We can't get an hdfsEncryptionShim if the {{FileSystem}} is a [ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html]; we should support it.
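[Editor's note] The log above suggests a scheme gate: the encryption shim is only built when the filesystem scheme is `hdfs`, so `viewfs://` paths are skipped even when the mount point ultimately resolves to an encrypted HDFS directory. A minimal sketch of that gate (the method name is illustrative, not Hive's actual code):

```java
public class EncryptionShimScheme {
    // Illustrative sketch of the check implied by the log message: the shim
    // is only applicable to "hdfs", which is why viewfs tables fall through.
    // The fix would resolve the viewfs mount to its underlying filesystem
    // before applying this check.
    static boolean supportsEncryptionShim(String scheme) {
        return "hdfs".equalsIgnoreCase(scheme);
    }

    public static void main(String[] args) {
        System.out.println(supportsEncryptionShim("hdfs"));   // shim is created
        System.out.println(supportsEncryptionShim("viewfs")); // the gap this issue fixes
    }
}
```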
[jira] [Commented] (HIVE-17302) ReduceRecordSource should not add batch string to Exception message
[ https://issues.apache.org/jira/browse/HIVE-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124450#comment-16124450 ]

Hive QA commented on HIVE-17302:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881552/HIVE-17302.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11004 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6363/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6363/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6363/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881552 - PreCommit-HIVE-Build

> ReduceRecordSource should not add batch string to Exception message
> -------------------------------------------------------------------
>
> Key: HIVE-17302
> URL: https://issues.apache.org/jira/browse/HIVE-17302
> Project: Hive
> Issue Type: Bug
> Reporter: slim bouguerra
> Assignee: slim bouguerra
> Attachments: HIVE-17302.patch, stack.txt
>
>
> ReduceRecordSource is adding the batch data as a string to the exception stack; this can lead to an OOM of the query AM when the query fails due to another issue.
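[Editor's note] The failure mode described above can be sketched as follows. This is an illustrative pattern, not Hive's actual code: instead of concatenating `batch.toString()` (which can be enormous and keeps the whole rendered batch alive in the AM heap while the exception propagates), the wrapper reports only small, cheap metadata about the failing row.

```java
public class ExceptionMessageSketch {
    // Illustrative fix pattern: report cheap identifiers, never the batch body.
    static RuntimeException wrapFailure(int batchSize, long rowNumber, Throwable cause) {
        return new RuntimeException(
            "Hive Runtime Error while processing row (batchSize=" + batchSize
                + ", row=" + rowNumber + ")", cause);
    }

    public static void main(String[] args) {
        RuntimeException e = wrapFailure(1024, 42, new IllegalStateException("boom"));
        // The message stays a few dozen bytes regardless of batch contents.
        System.out.println(e.getMessage());
    }
}
```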
[jira] [Commented] (HIVE-1941) support explicit view partitioning
[ https://issues.apache.org/jira/browse/HIVE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124447#comment-16124447 ]

Lefty Leverenz commented on HIVE-1941:
--------------------------------------

Quite right, this is only documented as a design doc and the DDL doc doesn't even have a link to it. There's also a Views design doc that has information not covered in the DDL section.

* [DDL -- Create/Drop/Alter View | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/AlterView]
* [Design Docs -- Partitioned Views | https://cwiki.apache.org/confluence/display/Hive/PartitionedViews]
* [Design Docs -- Views | https://cwiki.apache.org/confluence/display/Hive/ViewDev]

By tracking, do you mean a TODOC label? We'd have to create a new one -- either a generic "TODOC" label or a version-specific one for 0.8.0, which would be TODOC8 for consistency with the other pre-1.0.0 labels. (I'm inclined to avoid label proliferation with a generic label.)

> support explicit view partitioning
> ----------------------------------
>
> Key: HIVE-1941
> URL: https://issues.apache.org/jira/browse/HIVE-1941
> Project: Hive
> Issue Type: New Feature
> Components: Query Processor, Views
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.8.0
>
> Attachments: HIVE-1941.1.patch, HIVE-1941.2.patch, HIVE-1941.3.patch, HIVE-1941.4.patch, HIVE-1941.5.patch
>
>
> Allow creation of a view with an explicit partitioning definition, and support ALTER VIEW ADD/DROP PARTITION for instantiating partitions.
> For more information, see
> (obsolete: http://wiki.apache.org/hadoop/Hive/PartitionedViews)
> https://cwiki.apache.org/confluence/display/Hive/PartitionedViews
[jira] [Updated] (HIVE-1941) support explicit view partitioning
[ https://issues.apache.org/jira/browse/HIVE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-1941:
---------------------------------
    Description:
        Allow creation of a view with an explicit partitioning definition, and support ALTER VIEW ADD/DROP PARTITION for instantiating partitions.
        For more information, see
        (obsolete: http://wiki.apache.org/hadoop/Hive/PartitionedViews)
        https://cwiki.apache.org/confluence/display/Hive/PartitionedViews

    was:
        Allow creation of a view with an explicit partitioning definition, and support ALTER VIEW ADD/DROP PARTITION for instantiating partitions.
        For more information, see
        http://wiki.apache.org/hadoop/Hive/PartitionedViews

> support explicit view partitioning
> ----------------------------------
>
> Key: HIVE-1941
> URL: https://issues.apache.org/jira/browse/HIVE-1941
> Project: Hive
> Issue Type: New Feature
> Components: Query Processor, Views
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.8.0
>
> Attachments: HIVE-1941.1.patch, HIVE-1941.2.patch, HIVE-1941.3.patch, HIVE-1941.4.patch, HIVE-1941.5.patch
>
>
> Allow creation of a view with an explicit partitioning definition, and support ALTER VIEW ADD/DROP PARTITION for instantiating partitions.
> For more information, see
> (obsolete: http://wiki.apache.org/hadoop/Hive/PartitionedViews)
> https://cwiki.apache.org/confluence/display/Hive/PartitionedViews
[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
[ https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124425#comment-16124425 ]

Hive QA commented on HIVE-14731:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881548/HIVE-14731.19.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 46 failed/errored test(s), 11008 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_partition_pruning_2] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_join0] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_join29] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_join30] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_join_filters] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_join_nulls] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[auto_sortmerge_join_12] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cross_prod_1] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cross_prod_2] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cross_prod_4] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cross_product_check_2] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[empty_join] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_1] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[jdbc_handler] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[leftsemijoin] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_exists] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_multi] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_null_agg] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_select] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_between_columns] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_complex_all] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_mapjoin] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_include_no_sel] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_join30] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_join_filters] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_join_nulls] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_leftsemi_mapjoin] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_multi_output_select] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[hybridgrace_hashjoin_1] (batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.t
[jira] [Comment Edited] (HIVE-17265) Cache merged column stats from retrieved partitions
[ https://issues.apache.org/jira/browse/HIVE-17265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124404#comment-16124404 ]

Jesus Camacho Rodriguez edited comment on HIVE-17265 at 8/12/17 2:17 AM:
-------------------------------------------------------------------------

[~ashutoshc], sure, I have created it at https://reviews.apache.org/r/61604/

Thanks

was (Author: jcamachorodriguez):
Sure, I have created it at https://reviews.apache.org/r/61604/

Thanks

> Cache merged column stats from retrieved partitions
> ---------------------------------------------------
>
> Key: HIVE-17265
> URL: https://issues.apache.org/jira/browse/HIVE-17265
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Affects Versions: 3.0.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17265.02.patch, HIVE-17265.patch
>
>
> Currently, when we retrieve stats from the metastore for a column in a partitioned table, we execute the logic to merge the column stats coming from each partition multiple times.
> Even though we avoid multiple calls to the metastore if the cache for the stats is enabled, merging the stats for a given column can take a large amount of time if there is a large number of partitions.
[jira] [Commented] (HIVE-17265) Cache merged column stats from retrieved partitions
[ https://issues.apache.org/jira/browse/HIVE-17265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124404#comment-16124404 ]

Jesus Camacho Rodriguez commented on HIVE-17265:
------------------------------------------------

Sure, I have created it at https://reviews.apache.org/r/61604/

Thanks

> Cache merged column stats from retrieved partitions
> ---------------------------------------------------
>
> Key: HIVE-17265
> URL: https://issues.apache.org/jira/browse/HIVE-17265
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Affects Versions: 3.0.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17265.02.patch, HIVE-17265.patch
>
>
> Currently, when we retrieve stats from the metastore for a column in a partitioned table, we execute the logic to merge the column stats coming from each partition multiple times.
> Even though we avoid multiple calls to the metastore if the cache for the stats is enabled, merging the stats for a given column can take a large amount of time if there is a large number of partitions.
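[Editor's note] The caching idea in this issue can be sketched as a memoization of the merge result keyed by column and partition set. Everything below is illustrative (class, key format, and the toy NDV merge are invented for the example, not Hive's actual stats-merging code); the point is that the expensive merge runs once per key and subsequent lookups reuse it.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MergedStatsCache {
    // Hypothetical cache: key = column + partition set, value = merged stat.
    private final Map<String, Long> mergedNdv = new ConcurrentHashMap<>();

    long mergedNdvFor(String column, List<String> partitions,
                      Map<String, Long> ndvPerPartition) {
        String key = column + "|" + String.join(",", partitions);
        // computeIfAbsent runs the (potentially expensive) merge only once
        // per key; later calls with the same partition set hit the cache.
        return mergedNdv.computeIfAbsent(key, k ->
            // Toy merge: take the max NDV across partitions as an upper bound.
            partitions.stream()
                .mapToLong(p -> ndvPerPartition.getOrDefault(p, 0L))
                .max().orElse(0L));
    }

    public static void main(String[] args) {
        MergedStatsCache cache = new MergedStatsCache();
        Map<String, Long> ndv = new java.util.HashMap<>();
        ndv.put("ds=2017-08-11", 120L);
        ndv.put("ds=2017-08-12", 150L);
        long merged = cache.mergedNdvFor("user_id",
            java.util.Arrays.asList("ds=2017-08-11", "ds=2017-08-12"), ndv);
        System.out.println("merged NDV upper bound: " + merged);
    }
}
```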
[jira] [Commented] (HIVE-17301) Make JSONMessageFactory.getTObj method thread safe
[ https://issues.apache.org/jira/browse/HIVE-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124401#comment-16124401 ]

Hive QA commented on HIVE-17301:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881518/HIVE-17301.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11004 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6361/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6361/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6361/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881518 - PreCommit-HIVE-Build

> Make JSONMessageFactory.getTObj method thread safe
> --------------------------------------------------
>
> Key: HIVE-17301
> URL: https://issues.apache.org/jira/browse/HIVE-17301
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Reporter: Tao Li
> Assignee: Tao Li
> Attachments: HIVE-17301.1.patch
>
>
> This static method uses a singleton instance of TDeserializer, which is not thread safe. Instead we want to create a new instance per method call. The class is lightweight, so this should be fine from a performance perspective.
[jira] [Commented] (HIVE-17265) Cache merged column stats from retrieved partitions
[ https://issues.apache.org/jira/browse/HIVE-17265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124383#comment-16124383 ]

Ashutosh Chauhan commented on HIVE-17265:
-----------------------------------------

Can you please create a RB for this? Got some minor comments.

> Cache merged column stats from retrieved partitions
> ---------------------------------------------------
>
> Key: HIVE-17265
> URL: https://issues.apache.org/jira/browse/HIVE-17265
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Affects Versions: 3.0.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17265.02.patch, HIVE-17265.patch
>
>
> Currently, when we retrieve stats from the metastore for a column in a partitioned table, we execute the logic to merge the column stats coming from each partition multiple times.
> Even though we avoid multiple calls to the metastore if the cache for the stats is enabled, merging the stats for a given column can take a large amount of time if there is a large number of partitions.
[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitor for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124382#comment-16124382 ]

Prasanth Jayachandran commented on HIVE-17304:
----------------------------------------------

This provides better estimates, and the estimates are pretty close to the actual object size (observed from heap dumps), at least for the vectorized case. Also bringing down the inflation factor from 2.0 to 1.5 as a result. Still testing this patch on a larger dataset.

> ThreadMXBean based memory allocation monitor for hash table loader
> ------------------------------------------------------------------
>
> Key: HIVE-17304
> URL: https://issues.apache.org/jira/browse/HIVE-17304
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Attachments: HIVE-17304.1.patch
>
>
> Hash table memory monitoring is based on the Java data model, which can be unreliable for various reasons (wrong object size estimation, adding new variables to a class without accounting for their size in memory monitoring, etc.). We can instead use the per-thread allocation size provided by ThreadMXBean, and fall back to the data model in case the JDK doesn't support thread-based allocation tracking.
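[Editor's note] The ThreadMXBean approach described above can be sketched as follows. The `com.sun.management.ThreadMXBean` methods (`isThreadAllocatedMemorySupported`, `isThreadAllocatedMemoryEnabled`, `getThreadAllocatedBytes`) are the real JDK API; the surrounding structure and the negative-return fallback convention are illustrative, not Hive's actual patch.

```java
import java.lang.management.ManagementFactory;

public class AllocationProbe {
    // Prefer per-thread allocation counters when the JDK exposes them;
    // return a negative value to signal the caller to fall back to
    // data-model-based size estimation.
    static long threadAllocatedBytesOrNegative() {
        java.lang.management.ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (mx instanceof com.sun.management.ThreadMXBean) {
            com.sun.management.ThreadMXBean sun = (com.sun.management.ThreadMXBean) mx;
            if (sun.isThreadAllocatedMemorySupported()
                && sun.isThreadAllocatedMemoryEnabled()) {
                return sun.getThreadAllocatedBytes(Thread.currentThread().getId());
            }
        }
        return -1L; // caller falls back to the Java data model estimate
    }

    public static void main(String[] args) {
        long before = threadAllocatedBytesOrNegative();
        byte[] chunk = new byte[1 << 20]; // allocate ~1 MiB on this thread
        long after = threadAllocatedBytesOrNegative();
        if (before < 0) {
            System.out.println("thread allocation tracking not supported; falling back");
        } else {
            // delta reflects this thread's allocations, including `chunk`
            System.out.println("allocated since probe: " + (after - before)
                + " bytes (chunk=" + chunk.length + ")");
        }
    }
}
```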
[jira] [Updated] (HIVE-17304) ThreadMXBean based memory allocation monitor for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-17304:
-----------------------------------------
    Attachment: HIVE-17304.1.patch

> ThreadMXBean based memory allocation monitor for hash table loader
> ------------------------------------------------------------------
>
> Key: HIVE-17304
> URL: https://issues.apache.org/jira/browse/HIVE-17304
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Attachments: HIVE-17304.1.patch
>
>
> Hash table memory monitoring is based on the Java data model, which can be unreliable for various reasons (wrong object size estimation, adding new variables to a class without accounting for their size in memory monitoring, etc.). We can instead use the per-thread allocation size provided by ThreadMXBean, and fall back to the data model in case the JDK doesn't support thread-based allocation tracking.
[jira] [Updated] (HIVE-17304) ThreadMXBean based memory allocation monitoring for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17304: - Status: Patch Available (was: Open) > ThreadMXBean based memory allocation monitory for hash table loader > --- > > Key: HIVE-17304 > URL: https://issues.apache.org/jira/browse/HIVE-17304 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17304.1.patch > > > Hash table memory monitoring is based on java data model which can be > unreliable because of various reasons (wrong object size estimation, adding > new variables to any class without accounting its size for memory monitoring, > etc.). We can use allocation size per thread that is provided by ThreadMXBean > and fallback to DataModel in case if JDK doesn't support thread based > allocations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17268) WebUI / QueryPlan: query plan is sometimes null when explain output conf is on
[ https://issues.apache.org/jira/browse/HIVE-17268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124375#comment-16124375 ] Hive QA commented on HIVE-17268: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881481/HIVE-17268.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11004 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6360/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6360/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6360/ Messages: {noformat} Executing 
org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881481 - PreCommit-HIVE-Build > WebUI / QueryPlan: query plan is sometimes null when explain output conf is on > -- > > Key: HIVE-17268 > URL: https://issues.apache.org/jira/browse/HIVE-17268 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Minor > Attachments: HIVE-17268.2.patch, HIVE-17268.3.patch, HIVE-17268.patch > > > The Hive WebUI's Query Plan tab displays "SET hive.log.explain.output TO true > TO VIEW PLAN" even when hive.log.explain.output is set to true, when the > query cannot be compiled, because the plan is null in this case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17286) Avoid expensive String serialization/deserialization for bitvectors
[ https://issues.apache.org/jira/browse/HIVE-17286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17286: --- Attachment: HIVE-17286.03.patch > Avoid expensive String serialization/deserialization for bitvectors > --- > > Key: HIVE-17286 > URL: https://issues.apache.org/jira/browse/HIVE-17286 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17286.01.patch, HIVE-17286.02.patch, > HIVE-17286.03.patch, HIVE-17286.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17286) Avoid expensive String serialization/deserialization for bitvectors
[ https://issues.apache.org/jira/browse/HIVE-17286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17286: --- Attachment: HIVE-17286.02.patch > Avoid expensive String serialization/deserialization for bitvectors > --- > > Key: HIVE-17286 > URL: https://issues.apache.org/jira/browse/HIVE-17286 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17286.01.patch, HIVE-17286.02.patch, > HIVE-17286.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16873) Remove Thread Cache From Logging
[ https://issues.apache.org/jira/browse/HIVE-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124310#comment-16124310 ] Hive QA commented on HIVE-16873: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881476/HIVE-16873.3.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11003 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout (batchId=228) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6359/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6359/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6359/ Messages: {noformat} Executing 
org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881476 - PreCommit-HIVE-Build > Remove Thread Cache From Logging > > > Key: HIVE-16873 > URL: https://issues.apache.org/jira/browse/HIVE-16873 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16873.1.patch, HIVE-16873.2.patch, > HIVE-16873.3.patch > > > In {{org.apache.hadoop.hive.metastore.HiveMetaStore}} we have a {{Formatter}} > class (and its buffer) tied to every thread. > This {{Formatter}} is for logging purposes. I would suggest that we simply > let let the logging framework itself handle these kind of details and ditch > the buffer per thread. > {code} > public static final String AUDIT_FORMAT = > "ugi=%s\t" + // ugi > "ip=%s\t" + // remote IP > "cmd=%s\t"; // command > public static final Logger auditLog = LoggerFactory.getLogger( > HiveMetaStore.class.getName() + ".audit"); > private static final ThreadLocal auditFormatter = > new ThreadLocal() { > @Override > protected Formatter initialValue() { > return new Formatter(new StringBuilder(AUDIT_FORMAT.length() * > 4)); > } > }; > ... > private static final void logAuditEvent(String cmd) { > final Formatter fmt = auditFormatter.get(); > ((StringBuilder) fmt.out()).setLength(0); > String address = getIPAddress(); > if (address == null) { > address = "unknown-ip-addr"; > } > auditLog.info(fmt.format(AUDIT_FORMAT, ugi.getUserName(), > address, cmd).toString()); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
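The suggestion in HIVE-16873 — drop the `ThreadLocal<Formatter>` and format per call — can be sketched as follows. This is a hypothetical standalone class, not the actual HiveMetaStore patch; the point is that a plain per-call format has no shared mutable state to cache per thread:

```java
public class AuditFormat {
    // Same tab-separated layout as the original AUDIT_FORMAT constant.
    static final String AUDIT_FORMAT = "ugi=%s\tip=%s\tcmd=%s\t";

    // One String.format call per audit event; the logging framework (or an
    // SLF4J parameterized message) can take it from here. No ThreadLocal,
    // no buffer reset, nothing tied to thread lifetime.
    static String formatAuditEvent(String ugi, String ip, String cmd) {
        return String.format(AUDIT_FORMAT,
                ugi,
                ip == null ? "unknown-ip-addr" : ip,  // same fallback as the original
                cmd);
    }

    public static void main(String[] args) {
        System.out.println(formatAuditEvent("hive", null, "get_table"));
    }
}
```

The per-call allocation is tiny relative to the rest of a metastore operation, which is the argument for letting the logging framework own this detail.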
[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition
[ https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-17148: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Pushed to master. Thanks, Vlad! > Incorrect result for Hive join query with COALESCE in WHERE condition > - > > Key: HIVE-17148 > URL: https://issues.apache.org/jira/browse/HIVE-17148 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 2.1.1 >Reporter: Vlad Gudikov >Assignee: Vlad Gudikov > Fix For: 3.0.0 > > Attachments: HIVE-17148.1.patch, HIVE-17148.2.patch, > HIVE-17148.3.patch, HIVE-17148.patch > > > The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo > enabled: > STEPS TO REPRODUCE: > {code} > Step 1: Create a table ct1 > create table ct1 (a1 string,b1 string); > Step 2: Create a table ct2 > create table ct2 (a2 string); > Step 3 : Insert following data into table ct1 > insert into table ct1 (a1) values ('1'); > Step 4 : Insert following data into table ct2 > insert into table ct2 (a2) values ('1'); > Step 5 : Execute the following query > select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2; > {code} > ACTUAL RESULT: > {code} > The query returns nothing; > {code} > EXPECTED RESULT: > {code} > 1 NULL1 > {code} > The issue seems to be because of the incorrect query plan. In the plan we can > see: > predicate:(a1 is not null and b1 is not null) > which does not look correct. As a result, it is filtering out all the rows is > any column mentioned in the COALESCE has null value. > Please find the query plan below: > {code} > Plan optimized by CBO. 
> Vertex dependency in root stage > Map 1 <- Map 2 (BROADCAST_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 > File Output Operator [FS_10] > Map Join Operator [MAPJOIN_15] (rows=1 width=4) > > Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"] > <-Map 2 [BROADCAST_EDGE] > BROADCAST [RS_7] > PartitionCols:_col0 > Select Operator [SEL_5] (rows=1 width=1) > Output:["_col0"] > Filter Operator [FIL_14] (rows=1 width=1) > predicate:a2 is not null > TableScan [TS_3] (rows=1 width=1) > default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"] > <-Select Operator [SEL_2] (rows=1 width=4) > Output:["_col0","_col1"] > Filter Operator [FIL_13] (rows=1 width=4) > predicate:(a1 is not null and b1 is not null) > TableScan [TS_0] (rows=1 width=4) > default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"] > {code} > This happens only if join is inner type, otherwise HiveJoinAddNotRule which > creates this problem is skipped. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
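The null-semantics mistake in the plan above can be demonstrated outside Hive. This sketch (hypothetical names) contrasts the over-strict inferred predicate with the sound one, using the repro row `(a1='1', b1=NULL)` joined against `a2='1'`:

```java
public class CoalescePredicate {
    // SQL COALESCE over two values: first non-null argument.
    static String coalesce(String a, String b) {
        return a != null ? a : b;
    }

    public static void main(String[] args) {
        // Row from ct1: a1='1', b1=NULL; row from ct2: a2='1'.
        String a1 = "1", b1 = null, a2 = "1";

        // The plan's inferred predicate requires BOTH columns non-null,
        // so this row is filtered out before the join ever runs.
        boolean inferredPredicate = (a1 != null && b1 != null);

        // The sound inference from COALESCE(a1,b1)=a2 is only that the
        // coalesced value itself is non-null.
        boolean soundPredicate = coalesce(a1, b1) != null;
        boolean rowJoins = soundPredicate && coalesce(a1, b1).equals(a2);

        System.out.println(inferredPredicate + " " + rowJoins); // false true
    }
}
```

This matches the expected result in the description: the row survives under the correct semantics and is lost under the per-column not-null rewrite.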
[jira] [Commented] (HIVE-17241) Change metastore classes to not use the shims
[ https://issues.apache.org/jira/browse/HIVE-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124271#comment-16124271 ] ASF GitHub Bot commented on HIVE-17241: --- GitHub user alanfgates opened a pull request: https://github.com/apache/hive/pull/228 HIVE-17241 Removed shims from metastore. For HDFS and getPassword I just access… … those operations directly. I copied all of the HadoopThriftAuthBridge stuff over from Hive common. You can merge this pull request into a Git repository by running: $ git pull https://github.com/alanfgates/hive hive17241 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/228.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #228 > Change metastore classes to not use the shims > - > > Key: HIVE-17241 > URL: https://issues.apache.org/jira/browse/HIVE-17241 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > > As part of moving the metastore into a standalone package, it will no longer > have access to the shims. This means we need to either copy them or access > the underlying Hadoop operations directly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17304) ThreadMXBean based memory allocation monitoring for hash table loader
[ https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reassigned HIVE-17304: > ThreadMXBean based memory allocation monitory for hash table loader > --- > > Key: HIVE-17304 > URL: https://issues.apache.org/jira/browse/HIVE-17304 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > > Hash table memory monitoring is based on java data model which can be > unreliable because of various reasons (wrong object size estimation, adding > new variables to any class without accounting its size for memory monitoring, > etc.). We can use allocation size per thread that is provided by ThreadMXBean > and fallback to DataModel in case if JDK doesn't support thread based > allocations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17287) HoS can not deal with skewed data group by
[ https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124263#comment-16124263 ] Xuefu Zhang commented on HIVE-17287: [~kellyzly], just curious, what error did you get for the failed tasks? Memory related? > HoS can not deal with skewed data group by > -- > > Key: HIVE-17287 > URL: https://issues.apache.org/jira/browse/HIVE-17287 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: query67-fail-at-groupby.png, > query67-groupby_shuffle_metric.png > > > In > [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql], > fact table {{store_sales}} joins with small tables {{date_dim}}, > {{item}},{{store}}. After join, groupby the intermediate data. > Here the data of {{store_sales}} on 3TB tpcds is skewed: there are 1824 > partitions. The biggest partition is 25.7G and others are 715M. > {code} > hadoop fs -du -h > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales > > 715.0 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639 > 713.9 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640 > 714.1 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641 > 712.9 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642 > 25.7 G > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__ > {code} > The skewed table {{store_sales}} caused the failed job. Is there any way to > solve the groupby problem of skewed table? I tried to enable > {{hive.groupby.skewindata}} to first divide the data more evenly then start > do group by. But the job still hangs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124224#comment-16124224 ] Hive QA commented on HIVE-17289: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881462/HIVE-17289.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 11002 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=240) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_join_partition_key] (batchId=13) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=159) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) {noformat} Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/6358/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6358/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6358/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881462 - PreCommit-HIVE-Build > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > Attachments: HIVE-17289.01.patch > > > Currently, EXPORT uses distcp to dump data files to dump directory and IMPORT > uses distcp to copy the larger files/large number of files from dump > directory to table staging directory. But, this copy fails as distcp is > always done with doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs' by default. > Need to remove usage of doAs user when try to distcp from EXPORT/IMPORT flow. > Privileged user based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as "hdfs" super-user is never allowed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
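Until the default described above changes, the doAs user can be set explicitly in hive-site.xml. This fragment simply restates the property named in the description with the proposed "hive" value — shown for illustration, not as the committed default:

```xml
<property>
  <name>hive.distcp.privileged.doAs</name>
  <value>hive</value>
  <description>User under which privileged distcp runs for REPL DUMP/LOAD.</description>
</property>
```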
[jira] [Updated] (HIVE-17274) RowContainer spills for timestamp column throws exception
[ https://issues.apache.org/jira/browse/HIVE-17274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17274: - Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Status: Resolved (was: Patch Available) Test failures are not related to this patch. Committed patch to master and branch-2. Thanks for the review! > RowContainer spills for timestamp column throws exception > - > > Key: HIVE-17274 > URL: https://issues.apache.org/jira/browse/HIVE-17274 > Project: Hive > Issue Type: Bug >Affects Versions: 1.3.0, 3.0.0, 2.4.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 3.0.0, 2.4.0 > > Attachments: HIVE-17274.1.patch > > > Path names cannot contain ":" (HADOOP-3257) > Join key toString() is used as part of filename. > https://github.com/apache/hive/blob/16bfb9c9405b68a24c7e6c1b13bec00e38bbe213/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java#L523 > If join key is timestamp column then this will throw following exception. 
> {code} > 2017-08-05 23:51:33,631 ERROR [main] > org.apache.hadoop.hive.ql.exec.persistence.RowContainer: > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: .RowContainer7551143976922371245.[1792453531, > 2016-09-02 01:17:43,%202016-09-02%5D.tmp.crc > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: .RowContainer7551143976922371245.[1792453531, > 2016-09-02 01:17:43,%202016-09-02%5D.tmp.crc > at org.apache.hadoop.fs.Path.initialize(Path.java:205) > at org.apache.hadoop.fs.Path.(Path.java:171) > at org.apache.hadoop.fs.Path.(Path.java:93) > at > org.apache.hadoop.fs.ChecksumFileSystem.getChecksumFile(ChecksumFileSystem.java:94) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:404) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:463) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:442) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:926) > at > org.apache.hadoop.io.SequenceFile$Writer.(SequenceFile.java:1137) > at > org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273) > at > org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530) > at > org.apache.hadoop.hive.ql.exec.Utilities.createSequenceWriter(Utilities.java:1643) > at > org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat.getHiveRecordWriter(HiveSequenceFileOutputFormat.java:64) > at > org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:243) > at > org.apache.hadoop.hive.ql.exec.persistence.RowContainer.setupWriter(RowContainer.java:538) > at > org.apache.hadoop.hive.ql.exec.persistence.RowContainer.spillBlock(RowContainer.java:299) > at > org.apache.hadoop.hive.ql.exec.persistence.RowContainer.copyToDFSDirecory(RowContainer.java:407) > at > org.apache.hadoop.hive.ql.exec.SkewJoinHandler.endGroup(SkewJoinHandler.java:185) > at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:249) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:195) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) > Caused by: java.net.URISyntaxException: Relative path in absolute URI: > .RowContainer7551143976922371245.[1792453531, 2016-09-02 > 01:17:43,%202016-09-02%5D.tmp.crc > at java.net.URI.checkPath(URI.java:1823) > at java.net.URI.(URI.java:745) > at org.apache.hadoop.fs.Path.initialize(Path.java:202) > ... 26 more > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
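Because `org.apache.hadoop.fs.Path` rejects ':' in path components (HADOOP-3257), any key-derived string must be cleaned before being embedded in a spill-file name. A hedged sketch — the replacement scheme here is illustrative, not necessarily the fix the committed patch uses:

```java
public class SpillFileName {
    // Replace every character outside a conservative filename-safe set with
    // '_'. Timestamp join keys like "2016-09-02 01:17:43" contain ':' and
    // ' ', both of which would otherwise end up in the spill-file name.
    static String sanitize(String joinKey) {
        return joinKey.replaceAll("[^a-zA-Z0-9_.-]", "_");
    }

    public static void main(String[] args) {
        // The key shape from the stack trace above (timestamp column in the key).
        String key = "[1792453531, 2016-09-02 01:17:43, 2016-09-02]";
        System.out.println("RowContainer" + sanitize(key) + ".tmp");
    }
}
```

Alternatives with the same effect include hashing the key instead of embedding it, which also bounds the filename length.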
[jira] [Commented] (HIVE-17303) Mismatch between roaring bitmap library used by druid and the one coming from tez
[ https://issues.apache.org/jira/browse/HIVE-17303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124206#comment-16124206 ] Gopal V commented on HIVE-17303: [~bslim]: do you know the versions mismatched between Druid & Tez? Tez will upgrade eventually, but it would be good to know the versions. > Missmatch between roaring bitmap library used by druid and the one coming > from tez > -- > > Key: HIVE-17303 > URL: https://issues.apache.org/jira/browse/HIVE-17303 > Project: Hive > Issue Type: Bug > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17303.patch > > > {code} > > Caused by: java.util.concurrent.ExecutionException: > java.lang.NoSuchMethodError: > org.roaringbitmap.buffer.MutableRoaringBitmap.runOptimize()Z > at > org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) > at > org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) > at > org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) > at > org.apache.hadoop.hive.druid.io.DruidRecordWriter.pushSegments(DruidRecordWriter.java:165) > ... 
25 more > Caused by: java.lang.NoSuchMethodError: > org.roaringbitmap.buffer.MutableRoaringBitmap.runOptimize()Z > at > org.apache.hive.druid.com.metamx.collections.bitmap.WrappedRoaringBitmap.toImmutableBitmap(WrappedRoaringBitmap.java:65) > at > org.apache.hive.druid.com.metamx.collections.bitmap.RoaringBitmapFactory.makeImmutableBitmap(RoaringBitmapFactory.java:88) > at > org.apache.hive.druid.io.druid.segment.StringDimensionMergerV9.writeIndexes(StringDimensionMergerV9.java:348) > at > org.apache.hive.druid.io.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:218) > at > org.apache.hive.druid.io.druid.segment.IndexMerger.merge(IndexMerger.java:438) > at > org.apache.hive.druid.io.druid.segment.IndexMerger.persist(IndexMerger.java:186) > at > org.apache.hive.druid.io.druid.segment.IndexMerger.persist(IndexMerger.java:152) > at > org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.persistHydrant(AppenderatorImpl.java:996) > at > org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.access$200(AppenderatorImpl.java:93) > at > org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl$2.doCall(AppenderatorImpl.java:385) > at > org.apache.hive.druid.io.druid.common.guava.ThreadRenamingCallable.call(ThreadRenamingCallable.java:44) > ... 4 more > ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 > killedTasks:89, Vertex vertex_1502470020457_0005_12_05 [Reducer 2] > killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to > VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2) > Options > Attachments > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17089) make acid 2.0 the default
[ https://issues.apache.org/jira/browse/HIVE-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124192#comment-16124192 ] Eugene Koifman commented on HIVE-17089: --- patch 11 - fix remaining 3 failures in TestInputOutputFormat > make acid 2.0 the default > - > > Key: HIVE-17089 > URL: https://issues.apache.org/jira/browse/HIVE-17089 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-17089.01.patch, HIVE-17089.03.patch, > HIVE-17089.05.patch, HIVE-17089.06.patch, HIVE-17089.07.patch, > HIVE-17089.10.patch, HIVE-17089.10.patch, HIVE-17089.11.patch > > > acid 2.0 is introduced in HIVE-14035. It replaces Update events with a > combination of Delete + Insert events. This now makes U=D+I the default (and > only) supported acid table type in Hive 3.0. > The expectation for upgrade is that Major compaction has to be run on all > acid tables in the existing Hive cluster and that no new writes to these > table take place since the start of compaction (Need to add a mechanism to > put a table in read-only mode - this way it can still be read while it's > being compacted). Then upgrade to Hive 3.0 can take place. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17089) make acid 2.0 the default
[ https://issues.apache.org/jira/browse/HIVE-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17089: -- Attachment: HIVE-17089.11.patch > make acid 2.0 the default > - > > Key: HIVE-17089 > URL: https://issues.apache.org/jira/browse/HIVE-17089 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-17089.01.patch, HIVE-17089.03.patch, > HIVE-17089.05.patch, HIVE-17089.06.patch, HIVE-17089.07.patch, > HIVE-17089.10.patch, HIVE-17089.10.patch, HIVE-17089.11.patch > > > acid 2.0 is introduced in HIVE-14035. It replaces Update events with a > combination of Delete + Insert events. This now makes U=D+I the default (and > only) supported acid table type in Hive 3.0. > The expectation for upgrade is that Major compaction has to be run on all > acid tables in the existing Hive cluster and that no new writes to these > table take place since the start of compaction (Need to add a mechanism to > put a table in read-only mode - this way it can still be read while it's > being compacted). Then upgrade to Hive 3.0 can take place. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17303) Mismatch between roaring bitmap library used by druid and the one coming from tez
[ https://issues.apache.org/jira/browse/HIVE-17303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17303: -- Attachment: HIVE-17303.patch > Missmatch between roaring bitmap library used by druid and the one coming > from tez > -- > > Key: HIVE-17303 > URL: https://issues.apache.org/jira/browse/HIVE-17303 > Project: Hive > Issue Type: Bug > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17303.patch > > > {code} > > Caused by: java.util.concurrent.ExecutionException: > java.lang.NoSuchMethodError: > org.roaringbitmap.buffer.MutableRoaringBitmap.runOptimize()Z > at > org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) > at > org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) > at > org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) > at > org.apache.hadoop.hive.druid.io.DruidRecordWriter.pushSegments(DruidRecordWriter.java:165) > ... 
25 more > Caused by: java.lang.NoSuchMethodError: > org.roaringbitmap.buffer.MutableRoaringBitmap.runOptimize()Z > at > org.apache.hive.druid.com.metamx.collections.bitmap.WrappedRoaringBitmap.toImmutableBitmap(WrappedRoaringBitmap.java:65) > at > org.apache.hive.druid.com.metamx.collections.bitmap.RoaringBitmapFactory.makeImmutableBitmap(RoaringBitmapFactory.java:88) > at > org.apache.hive.druid.io.druid.segment.StringDimensionMergerV9.writeIndexes(StringDimensionMergerV9.java:348) > at > org.apache.hive.druid.io.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:218) > at > org.apache.hive.druid.io.druid.segment.IndexMerger.merge(IndexMerger.java:438) > at > org.apache.hive.druid.io.druid.segment.IndexMerger.persist(IndexMerger.java:186) > at > org.apache.hive.druid.io.druid.segment.IndexMerger.persist(IndexMerger.java:152) > at > org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.persistHydrant(AppenderatorImpl.java:996) > at > org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.access$200(AppenderatorImpl.java:93) > at > org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl$2.doCall(AppenderatorImpl.java:385) > at > org.apache.hive.druid.io.druid.common.guava.ThreadRenamingCallable.call(ThreadRenamingCallable.java:44) > ... 4 more > ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 > killedTasks:89, Vertex vertex_1502470020457_0005_12_05 [Reducer 2] > killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to > VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2) > Options > Attachments > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17303) Mismatch between roaring bitmap library used by druid and the one coming from tez
[ https://issues.apache.org/jira/browse/HIVE-17303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17303: -- Status: Patch Available (was: Open) > Mismatch between roaring bitmap library used by druid and the one coming > from tez > -- > > Key: HIVE-17303 > URL: https://issues.apache.org/jira/browse/HIVE-17303 > Project: Hive > Issue Type: Bug > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > > (description elided; identical to the NoSuchMethodError stack trace quoted in the first HIVE-17303 notification above) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17303) Mismatch between roaring bitmap library used by druid and the one coming from tez
[ https://issues.apache.org/jira/browse/HIVE-17303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra reassigned HIVE-17303: - > Mismatch between roaring bitmap library used by druid and the one coming > from tez > -- > > Key: HIVE-17303 > URL: https://issues.apache.org/jira/browse/HIVE-17303 > Project: Hive > Issue Type: Bug > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > > (description elided; identical to the NoSuchMethodError stack trace quoted in the first HIVE-17303 notification above) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
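A `NoSuchMethodError` on a method that exists in the sources you compiled against is the classic symptom of two versions of one library on the classpath, with the older jar (here, the roaring bitmap pulled in by tez) shadowing the one druid expects. A small reflection probe can confirm which jar a class was loaded from and whether the expected method is present. This is a generic diagnostic sketch, not part of the patch; `java.util.BitSet` stands in for the bitmap class so the demo is self-contained.

```java
import java.security.CodeSource;

public class BitmapProbe {
    // Report where a class was loaded from and whether it has the expected
    // zero-arg method; a missing method usually means an older jar shadowed
    // the version you built against.
    static String probe(String className, String methodName) {
        try {
            Class<?> c = Class.forName(className);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            String where = (src == null || src.getLocation() == null)
                    ? "bootstrap classpath" : src.getLocation().toString();
            c.getMethod(methodName); // throws if the loaded version lacks it
            return methodName + " found; class loaded from " + where;
        } catch (ClassNotFoundException e) {
            return "class not on classpath: " + className;
        } catch (NoSuchMethodException e) {
            return "method missing (version mismatch?): " + methodName;
        }
    }

    public static void main(String[] args) {
        // BitSet stands in for MutableRoaringBitmap; on a real cluster you
        // would probe "org.roaringbitmap.buffer.MutableRoaringBitmap" and
        // "runOptimize" from within the Tez task classpath.
        System.out.println(probe("java.util.BitSet", "cardinality"));
        System.out.println(probe("java.util.BitSet", "runOptimize"));
    }
}
```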
[jira] [Commented] (HIVE-17302) ReduceRecordSource should not add batch string to Exception message
[ https://issues.apache.org/jira/browse/HIVE-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124183#comment-16124183 ] Prasanth Jayachandran commented on HIVE-17302: -- +1, pending tests > ReduceRecordSource should not add batch string to Exception message > --- > > Key: HIVE-17302 > URL: https://issues.apache.org/jira/browse/HIVE-17302 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17302.patch, stack.txt > > > ReduceRecordSource is adding the batch data as a string to the exception > stack; this can lead to an OOM of the Query AM when the query fails due to > some other issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17194) JDBC: Implement Gzip servlet filter
[ https://issues.apache.org/jira/browse/HIVE-17194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124172#comment-16124172 ] Thejas M Nair commented on HIVE-17194: -- Patch looks good. +1 More information from offline discussion with [~gopalv] - tests show +4% CPU for a ~3x reduction in size with TPC-DS customer_demographics. Overall time remained the same. > JDBC: Implement Gzip servlet filter > --- > > Key: HIVE-17194 > URL: https://issues.apache.org/jira/browse/HIVE-17194 > Project: Hive > Issue Type: Improvement > Components: HiveServer2, JDBC >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-17194.1.patch, HIVE-17194.2.patch, > HIVE-17194.3.patch > > > {code} > POST /cliservice HTTP/1.1 > Content-Type: application/x-thrift > Accept: application/x-thrift > User-Agent: Java/THttpClient/HC > Authorization: Basic YW5vbnltb3VzOmFub255bW91cw== > Content-Length: 71 > Host: localhost:10007 > Connection: Keep-Alive > Accept-Encoding: gzip,deflate > X-XSRF-HEADER: true > {code} > The Beeline client clearly sends out HTTP compression headers which are > ignored by the HTTP service layer in HS2. > After the patch, the response looks like > {code} > HTTP/1.1 200 OK > Date: Tue, 01 Aug 2017 01:47:23 GMT > Content-Type: application/x-thrift > Vary: Accept-Encoding, User-Agent > Content-Encoding: gzip > Transfer-Encoding: chunked > Server: Jetty(9.3.8.v20160314) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
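The trade-off quoted above (a few percent of CPU for roughly 3x smaller responses) reflects how well repetitive thrift row data compresses. It is easy to reproduce with the JDK's own gzip streams; this is a standalone sketch of the codec cost, not the HS2 servlet filter itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {
    // Compress a byte array with gzip, the same codec Jetty's filter applies.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Decompress, as the JDBC client side would.
    static byte[] gunzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Repetitive rows, like a result set of low-cardinality columns
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("customer_demographics,row,").append(i % 10).append('\n');
        }
        byte[] plain = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(plain);
        System.out.println("plain=" + plain.length + " bytes, gzip=" + packed.length + " bytes");
    }
}
```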
[jira] [Commented] (HIVE-17302) ReduceRecordSource should not add batch string to Exception message
[ https://issues.apache.org/jira/browse/HIVE-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124164#comment-16124164 ] slim bouguerra commented on HIVE-17302: --- [~ashutoshc] can you check this out? > ReduceRecordSource should not add batch string to Exception message > --- > > Key: HIVE-17302 > URL: https://issues.apache.org/jira/browse/HIVE-17302 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17302.patch, stack.txt > > > ReduceRecordSource is adding the batch data as a string to the exception > stack; this can lead to an OOM of the Query AM when the query fails due to > some other issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17302) ReduceRecordSource should not add batch string to Exception message
[ https://issues.apache.org/jira/browse/HIVE-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17302: -- Attachment: HIVE-17302.patch > ReduceRecordSource should not add batch string to Exception message > --- > > Key: HIVE-17302 > URL: https://issues.apache.org/jira/browse/HIVE-17302 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17302.patch, stack.txt > > > ReduceRecordSource is adding the batch data as a string to the exception > stack; this can lead to an OOM of the Query AM when the query fails due to > some other issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17302) ReduceRecordSource should not add batch string to Exception message
[ https://issues.apache.org/jira/browse/HIVE-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17302: -- Assignee: slim bouguerra Status: Patch Available (was: Open) > ReduceRecordSource should not add batch string to Exception message > --- > > Key: HIVE-17302 > URL: https://issues.apache.org/jira/browse/HIVE-17302 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: stack.txt > > > ReduceRecordSource is adding the batch data as a string to the exception > stack; this can lead to an OOM of the Query AM when the query fails due to > some other issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
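The direction of the fix is to bound how much row context ever reaches an exception message, so a failing query cannot balloon the AM heap with a giant batch string. The helper below is a hypothetical sketch of that idea; `SafeMessages` and the `MAX_DETAIL` limit are assumptions for illustration, not code from the actual patch.

```java
public class SafeMessages {
    // Assumption for this sketch: 1 KB of batch context is enough to debug
    // with, while an unbounded toString() of a whole row batch is what can
    // OOM the query AM.
    static final int MAX_DETAIL = 1024;

    // Return the detail string unchanged if small, otherwise cut it off and
    // note how much was dropped.
    static String truncated(String detail) {
        if (detail == null || detail.length() <= MAX_DETAIL) {
            return detail;
        }
        int dropped = detail.length() - MAX_DETAIL;
        return detail.substring(0, MAX_DETAIL) + "... (" + dropped + " chars dropped)";
    }

    public static void main(String[] args) {
        StringBuilder batch = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            batch.append("col").append(i).append(',');
        }
        // The message stays small no matter how large the batch grew.
        String msg = "error processing row: " + truncated(batch.toString());
        System.out.println("message length: " + msg.length());
    }
}
```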
[jira] [Commented] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
[ https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124139#comment-16124139 ] Zhiyuan Yang commented on HIVE-14731: - [~hagleitn] Non deterministic behavior should come from multiple cross product reducers > Use Tez cartesian product edge in Hive (unpartitioned case only) > > > Key: HIVE-14731 > URL: https://issues.apache.org/jira/browse/HIVE-14731 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, > HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, > HIVE-14731.15.patch, HIVE-14731.16.patch, HIVE-14731.17.patch, > HIVE-14731.18.patch, HIVE-14731.19.patch, HIVE-14731.1.patch, > HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, > HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, > HIVE-14731.8.patch, HIVE-14731.9.patch > > > Given cartesian product edge is available in Tez now (see TEZ-3230), let's > integrate it into Hive on Tez. This allows us to have more than one reducer > in cross product queries. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17218) Canonical-ize hostnames for Hive metastore, and HS2 servers.
[ https://issues.apache.org/jira/browse/HIVE-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124135#comment-16124135 ] Thejas M Nair commented on HIVE-17218: -- +1 pending test verification > Canonical-ize hostnames for Hive metastore, and HS2 servers. > > > Key: HIVE-17218 > URL: https://issues.apache.org/jira/browse/HIVE-17218 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Metastore, Security >Affects Versions: 1.2.2, 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17218.1.patch > > > Currently, the {{HiveMetastoreClient}} and {{HiveConnection}} do not > canonical-ize the hostnames of the metastore/HS2 servers. In deployments > where there are multiple such servers behind a VIP, this causes a number of > inconveniences: > # The client-side configuration (e.g. {{hive.metastore.uris}} in > {{hive-site.xml}}) needs to specify the VIP's hostname, and cannot use a > simplified CNAME, in the thrift URL. If the > {{hive.metastore.kerberos.principal}} is specified using {{_HOST}}, one sees > GSS failures as follows: > {noformat} > hive --hiveconf hive.metastore.kerberos.principal=hive/_h...@grid.myth.net > --hiveconf > hive.metastore.uris="thrift://simplified-hcat-cname.grid.myth.net:56789" > ... > Exception in thread "main" java.lang.RuntimeException: > java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:542) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) > ... > {noformat} > This is because {{_HOST}} is filled in with the CNAME, and not the > canonicalized name. > # Oozie workflows that use HCat {{}} have to always use the VIP > hostname, and can't use {{_HOST}}-based service principals, if the CNAME > differs from the VIP name. 
> If the client-code simply canonical-ized the hostnames, it would enable the > use of both simplified CNAMEs, and _HOST in service principals. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
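The client-side change described above amounts to resolving the configured name to its canonical form before it is substituted for `_HOST` in the Kerberos principal. A minimal sketch of that resolution step (an illustrative helper, not the actual patch code):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostCanonicalizer {
    // Resolve a possibly-simplified CNAME (or VIP alias) to the canonical
    // hostname; fall back to the input if resolution fails, so misconfigured
    // names still produce the old behavior rather than an exception.
    static String canonicalize(String host) {
        try {
            return InetAddress.getByName(host).getCanonicalHostName();
        } catch (UnknownHostException e) {
            return host;
        }
    }

    public static void main(String[] args) {
        // e.g. "simplified-hcat-cname.grid.myth.net" would resolve to the
        // VIP's canonical name, which then fills in _HOST correctly.
        System.out.println(canonicalize("localhost"));
    }
}
```

Note that `getCanonicalHostName()` performs a reverse lookup and never returns null; on failure it falls back to the textual IP address, which is why the principal ends up matching the server's keytab entry only when forward and reverse DNS agree.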
[jira] [Updated] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
[ https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-14731: -- Attachment: HIVE-14731.19.patch [~aplusplus] i've rebased the patch. i'm getting non-deterministic results in cross_prod_1-4 btw. Have you ever seen this? > Use Tez cartesian product edge in Hive (unpartitioned case only) > > > Key: HIVE-14731 > URL: https://issues.apache.org/jira/browse/HIVE-14731 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Gunther Hagleitner > Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, > HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, > HIVE-14731.15.patch, HIVE-14731.16.patch, HIVE-14731.17.patch, > HIVE-14731.18.patch, HIVE-14731.19.patch, HIVE-14731.1.patch, > HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, > HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, > HIVE-14731.8.patch, HIVE-14731.9.patch > > > Given cartesian product edge is available in Tez now (see TEZ-3230), let's > integrate it into Hive on Tez. This allows us to have more than one reducer > in cross product queries. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
[ https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner reassigned HIVE-14731: - Assignee: Zhiyuan Yang (was: Gunther Hagleitner) > Use Tez cartesian product edge in Hive (unpartitioned case only) > > > Key: HIVE-14731 > URL: https://issues.apache.org/jira/browse/HIVE-14731 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, > HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, > HIVE-14731.15.patch, HIVE-14731.16.patch, HIVE-14731.17.patch, > HIVE-14731.18.patch, HIVE-14731.19.patch, HIVE-14731.1.patch, > HIVE-14731.2.patch, HIVE-14731.3.patch, HIVE-14731.4.patch, > HIVE-14731.5.patch, HIVE-14731.6.patch, HIVE-14731.7.patch, > HIVE-14731.8.patch, HIVE-14731.9.patch > > > Given cartesian product edge is available in Tez now (see TEZ-3230), let's > integrate it into Hive on Tez. This allows us to have more than one reducer > in cross product queries. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-14731) Use Tez cartesian product edge in Hive (unpartitioned case only)
[ https://issues.apache.org/jira/browse/HIVE-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner reassigned HIVE-14731: - Assignee: Gunther Hagleitner (was: Zhiyuan Yang) > Use Tez cartesian product edge in Hive (unpartitioned case only) > > > Key: HIVE-14731 > URL: https://issues.apache.org/jira/browse/HIVE-14731 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Gunther Hagleitner > Attachments: HIVE-14731.10.patch, HIVE-14731.11.patch, > HIVE-14731.12.patch, HIVE-14731.13.patch, HIVE-14731.14.patch, > HIVE-14731.15.patch, HIVE-14731.16.patch, HIVE-14731.17.patch, > HIVE-14731.18.patch, HIVE-14731.1.patch, HIVE-14731.2.patch, > HIVE-14731.3.patch, HIVE-14731.4.patch, HIVE-14731.5.patch, > HIVE-14731.6.patch, HIVE-14731.7.patch, HIVE-14731.8.patch, HIVE-14731.9.patch > > > Given cartesian product edge is available in Tez now (see TEZ-3230), let's > integrate it into Hive on Tez. This allows us to have more than one reducer > in cross product queries. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17224) Move JDO classes to standalone metastore
[ https://issues.apache.org/jira/browse/HIVE-17224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124060#comment-16124060 ] ASF GitHub Bot commented on HIVE-17224: --- Github user alanfgates closed the pull request at: https://github.com/apache/hive/pull/220 > Move JDO classes to standalone metastore > > > Key: HIVE-17224 > URL: https://issues.apache.org/jira/browse/HIVE-17224 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > Fix For: 3.0.0 > > Attachments: HIVE-17224.patch > > > The JDO model classes (MDatabase, MTable, etc.) and the package.jdo file that > defines the DB mapping need to be moved to the standalone metastore. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17301) Make JSONMessageFactory.getTObj method thread safe
[ https://issues.apache.org/jira/browse/HIVE-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124052#comment-16124052 ] Daniel Dai commented on HIVE-17301: --- +1 pending test. > Make JSONMessageFactory.getTObj method thread safe > -- > > Key: HIVE-17301 > URL: https://issues.apache.org/jira/browse/HIVE-17301 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Tao Li >Assignee: Tao Li > Attachments: HIVE-17301.1.patch > > > This static method is using a singleton instance of TDeserializer, which is > not thread safe. Instead we want to create a new instance per method call. > This class is lightweight, so it should be fine from a perf perspective. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
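The pattern behind the fix is general: a stateful helper that is cheap to construct but not thread safe should be created per call rather than shared as a static singleton. In the sketch below, `SimpleDateFormat` (another classically non-thread-safe JDK class) stands in for `TDeserializer`; the shape of the change, not the actual Hive code, is what is shown.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class PerCallParser {
    // BAD (the pre-fix shape): one shared instance whose internal state is
    // mutated by every concurrent caller.
    // private static final SimpleDateFormat SHARED = new SimpleDateFormat("yyyy-MM-dd");

    // GOOD: a fresh instance per call. Construction is cheap, so the perf
    // cost is negligible and concurrent callers can no longer corrupt
    // each other's parse state.
    static Date parse(String s) throws ParseException {
        return new SimpleDateFormat("yyyy-MM-dd").parse(s);
    }

    static String format(Date d) {
        return new SimpleDateFormat("yyyy-MM-dd").format(d);
    }
}
```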
[jira] [Commented] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition
[ https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124042#comment-16124042 ] Hive QA commented on HIVE-17148: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881456/HIVE-17148.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 11003 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_decimal] (batchId=9) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=159) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) 
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6357/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6357/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6357/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881456 - PreCommit-HIVE-Build > Incorrect result for Hive join query with COALESCE in WHERE condition > - > > Key: HIVE-17148 > URL: https://issues.apache.org/jira/browse/HIVE-17148 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 2.1.1 >Reporter: Vlad Gudikov >Assignee: Vlad Gudikov > Attachments: HIVE-17148.1.patch, HIVE-17148.2.patch, > HIVE-17148.3.patch, HIVE-17148.patch > > > The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo > enabled: > STEPS TO REPRODUCE: > {code} > Step 1: Create a table ct1 > create table ct1 (a1 string,b1 string); > Step 2: Create a table ct2 > create table ct2 (a2 string); > Step 3 : Insert following data into table ct1 > insert into table ct1 (a1) values ('1'); > Step 4 : Insert following data into table ct2 > insert into table ct2 (a2) values ('1'); > Step 5 : Execute the following query > select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2; > {code} > ACTUAL RESULT: > {code} > The query returns nothing; > {code} > EXPECTED RESULT: > {code} > 1 NULL1 > {code} > The issue seems to be because of the incorrect query plan. In the plan we can > see: > predicate:(a1 is not null and b1 is not null) > which does not look correct. 
As a result, it is filtering out all the rows if > any column mentioned in the COALESCE has a null value. > Please find the query plan below: > {code} > Plan optimized by CBO. > Vertex dependency in root stage > Map 1 <- Map 2 (BROADCAST_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 > File Output Operator [FS_10] > Map Join Operator [MAPJOIN_15] (rows=1 width=4) > > Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"] > <-Map 2 [BROADCAST_EDGE] > BROADCAST [RS_7] > PartitionCols:_col0 > Select Operator [SEL_5] (rows=1 width=1) > Output:["_col0"] > Filter Operator [FIL_14] (rows=1 width=1) > predicate:a2 is not null > TableScan [TS_3] (rows=1 width=1) >
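The wrongly derived predicate is easy to falsify by evaluating the COALESCE semantics directly: with a1='1', b1=NULL, a2='1' the join key is '1', so the row should survive, yet `b1 is not null` filters it out. Below is a plain-Java model of the SQL three-valued semantics; the class and method names are illustrative only.

```java
public class CoalesceJoinCheck {
    // SQL COALESCE: the first non-null argument, else null.
    static String coalesce(String... vals) {
        for (String v : vals) {
            if (v != null) {
                return v;
            }
        }
        return null;
    }

    // The join condition COALESCE(a1, b1) = a2 under SQL three-valued logic:
    // a null on either side compares as unknown, i.e. the row does not match.
    static boolean joinMatches(String a1, String b1, String a2) {
        String key = coalesce(a1, b1);
        return key != null && a2 != null && key.equals(a2);
    }

    public static void main(String[] args) {
        // The repro row from the issue: a1='1', b1=NULL, a2='1'.
        // It matches, so the derived predicate "b1 is not null" is too strict.
        System.out.println(joinMatches("1", null, "1"));
    }
}
```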
[jira] [Updated] (HIVE-17283) Enable parallel edges of semijoin along with mapjoins
[ https://issues.apache.org/jira/browse/HIVE-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-17283: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Committed to master > Enable parallel edges of semijoin along with mapjoins > - > > Key: HIVE-17283 > URL: https://issues.apache.org/jira/browse/HIVE-17283 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > Fix For: 3.0.0 > > Attachments: HIVE-17283.1.patch, HIVE-17283.2.patch > > > https://issues.apache.org/jira/browse/HIVE-16260 removes parallel edges of > semijoin with mapjoin. However, in some cases it maybe beneficial to have it. > We need a config which can enable it. > The default should be false which maintains the existing behavior. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17281) LLAP external client not properly handling KILLED notification that occurs when a fragment is rejected
[ https://issues.apache.org/jira/browse/HIVE-17281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-17281: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Committed to master > LLAP external client not properly handling KILLED notification that occurs > when a fragment is rejected > -- > > Key: HIVE-17281 > URL: https://issues.apache.org/jira/browse/HIVE-17281 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Fix For: 3.0.0 > > Attachments: HIVE-17281.1.patch > > > When LLAP fragment submission is rejected, the external client receives both > REJECTED and KILLED notifications for the fragment. The KILLED notification > is being treated as an error, which prevents the retry logic from > resubmitting the fragment. This needs to be fixed in the client logic. > {noformat} > 17/08/02 04:36:16 INFO LlapBaseInputFormat: Registered id: > attempt_2519876382789748565_0005_0_00_21_0 > 17/08/02 04:36:16 INFO LlapTaskUmbilicalExternalClient: Fragment: > attempt_2519876382789748565_0005_0_00_21_0 rejected. Server Busy. > 17/08/02 04:36:16 ERROR LlapTaskUmbilicalExternalClient: Task killed - > attempt_2519876382789748565_0005_0_00_21_0 > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17224) Move JDO classes to standalone metastore
[ https://issues.apache.org/jira/browse/HIVE-17224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-17224: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Patch committed. Thank you Vihang for the review. > Move JDO classes to standalone metastore > > > Key: HIVE-17224 > URL: https://issues.apache.org/jira/browse/HIVE-17224 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > Fix For: 3.0.0 > > Attachments: HIVE-17224.patch > > > The JDO model classes (MDatabase, MTable, etc.) and the package.jdo file that > defines the DB mapping need to be moved to the standalone metastore. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17302) ReduceRecordSource should not add batch string to Exception message
[ https://issues.apache.org/jira/browse/HIVE-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17302: -- Attachment: stack.txt > ReduceRecordSource should not add batch string to Exception message > --- > > Key: HIVE-17302 > URL: https://issues.apache.org/jira/browse/HIVE-17302 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra > Attachments: stack.txt > > > ReduceRecordSource is adding the batch data as a string to the exception > stack; this can lead to an OOM of the Query AM when the query fails due to > some other issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123943#comment-16123943 ] Alan Gates commented on HIVE-8472: -- +1 > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0, 3.0.0 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1.patch, HIVE-8472.3.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17256) add a notion of a guaranteed task to LLAP
[ https://issues.apache.org/jira/browse/HIVE-17256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17256: Attachment: HIVE-17256.01.patch > add a notion of a guaranteed task to LLAP > - > > Key: HIVE-17256 > URL: https://issues.apache.org/jira/browse/HIVE-17256 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17256.01.patch, HIVE-17256.patch > > > Tasks are basically on two levels, guaranteed and speculative, with > speculative being the default. As long as no one uses the new flag, the tasks > behave the same. > All the tasks that do have the flag also behave the same with regard to each > other. > The difference is that a guaranteed task is always higher priority, and > preempts a speculative task. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17256) add a notion of a guaranteed task to LLAP
[ https://issues.apache.org/jira/browse/HIVE-17256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123907#comment-16123907 ] Sergey Shelukhin commented on HIVE-17256: - Will remove the wtf comment; it was just surprising. There's a test for queue ordering (the first one); not sure if scheduling tests apply here, that would be in the AM patch. If you mean LLAP scheduling tests, that seems to be covered by the added tests. Deadlock is possible with incorrect usage, similar to task priority inversions if they were to happen in the AM... > add a notion of a guaranteed task to LLAP > - > > Key: HIVE-17256 > URL: https://issues.apache.org/jira/browse/HIVE-17256 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17256.patch > > > Tasks are basically on two levels, guaranteed and speculative, with > speculative being the default. As long as no one uses the new flag, the tasks > behave the same. > All the tasks that do have the flag also behave the same with regard to each > other. > The difference is that a guaranteed task is always higher priority, and > preempts a speculative task. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
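The two-level ordering described in the issue, where any guaranteed task outranks any speculative one and tasks within a level keep their existing order, can be sketched as a comparator. The names and the FIFO tiebreak below are illustrative assumptions, not Hive's actual scheduler code.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class TwoLevelQueueSketch {
    static final class Task {
        final String id;
        final boolean guaranteed;
        final long seq; // submission order, used as the within-level tiebreak

        Task(String id, boolean guaranteed, long seq) {
            this.id = id;
            this.guaranteed = guaranteed;
            this.seq = seq;
        }
    }

    // Guaranteed tasks sort strictly ahead of speculative ones; within a
    // level, earlier submissions run first.
    static final Comparator<Task> ORDER =
            Comparator.<Task>comparingInt(t -> t.guaranteed ? 0 : 1)
                      .thenComparingLong(t -> t.seq);

    public static void main(String[] args) {
        PriorityQueue<Task> q = new PriorityQueue<>(ORDER);
        q.add(new Task("spec-1", false, 1));
        q.add(new Task("spec-2", false, 2));
        q.add(new Task("guar-1", true, 3)); // submitted last, still runs first
        while (!q.isEmpty()) {
            System.out.println(q.poll().id);
        }
    }
}
```

Preemption would pick a victim by the same ordering in reverse: a newly arriving guaranteed task evicts the lowest-ranked running speculative task, which is also where the mentioned inversion-style deadlocks could arise if the flag were misused.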
[jira] [Commented] (HIVE-12631) LLAP: support ORC ACID tables
[ https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123901#comment-16123901 ] Sergey Shelukhin commented on HIVE-12631: - Sorry, a couple more comments on RB. > LLAP: support ORC ACID tables > - > > Key: HIVE-12631 > URL: https://issues.apache.org/jira/browse/HIVE-12631 > Project: Hive > Issue Type: Bug > Components: llap, Transactions >Reporter: Sergey Shelukhin >Assignee: Teddy Choi > Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, > HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, > HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, > HIVE-12631.17.patch, HIVE-12631.18.patch, HIVE-12631.19.patch, > HIVE-12631.1.patch, HIVE-12631.20.patch, HIVE-12631.21.patch, > HIVE-12631.22.patch, HIVE-12631.23.patch, HIVE-12631.24.patch, > HIVE-12631.25.patch, HIVE-12631.26.patch, HIVE-12631.2.patch, > HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, > HIVE-12631.6.patch, HIVE-12631.7.patch, HIVE-12631.8.patch, > HIVE-12631.8.patch, HIVE-12631.9.patch > > > LLAP uses a completely separate read path in ORC to allow for caching and > parallelization of reads and processing. This path does not support ACID. As > far as I remember ACID logic is embedded inside ORC format; we need to > refactor it to be on top of some interface, if practical; or just port it to > LLAP read path. > Another consideration is how the logic will work with cache. The cache is > currently low-level (CB-level in ORC), so we could just use it to read bases > and deltas (deltas should be cached with higher priority) and merge as usual. > We could also cache merged representation in future. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-14746) Remove branch from profiles by sending them from ptest-client
[ https://issues.apache.org/jira/browse/HIVE-14746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123863#comment-16123863 ] Hive QA commented on HIVE-14746: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881451/HIVE-14746.03.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6356/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6356/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6356/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult [localFile=/data/hiveptest/logs/PreCommit-HIVE-Build-6356/succeeded/33-TestCliDriver-ql_rewrite_gbtoidx.q-json_serde1.q-constantPropWhen.q-and-27-more, remoteFile=/home/hiveptest/35.192.6.159-hiveptest-1/logs/, getExitCode()=255, getException()=null, getUser()=hiveptest, getHost()=35.192.6.159, getInstance()=1]: 'Warning: Permanently added '35.192.6.159' (ECDSA) to the list of known hosts. 
receiving incremental file list ./ TEST-33-TestCliDriver-ql_rewrite_gbtoidx.q-json_serde1.q-constantPropWhen.q-and-27-more-TEST-org.apache.hadoop.hive.cli.TestCliDriver.xml 0 0%0.00kB/s0:00:00 8,630 100%8.23MB/s0:00:00 (xfr#1, to-chk=5/7) maven-test.txt 0 0%0.00kB/s0:00:00 47,461 100%1.33MB/s0:00:00 (xfr#2, to-chk=4/7) logs/ logs/derby.log 0 0%0.00kB/s0:00:00 996 100% 28.61kB/s0:00:00 (xfr#3, to-chk=1/7) logs/hive.log 0 0%0.00kB/s0:00:00 30,212,096 1% 28.81MB/s0:00:55 85,327,872 5% 40.71MB/s0:00:38 142,082,048 8% 45.20MB/s0:00:33 198,770,688 11% 47.43MB/s0:00:30 255,459,328 15% 53.74MB/s0:00:25 312,868,864 18% 54.29MB/s0:00:24 369,655,808 22% 54.30MB/s0:00:23 427,261,952 25% 54.48MB/s0:00:22 Timeout, server 35.192.6.159 not responding. rsync: connection unexpectedly closed (484318874 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(226) [receiver=3.1.1] rsync: connection unexpectedly closed (441 bytes received so far) [generator] rsync error: unexplained error (code 255) at io.c(226) [generator=3.1.1] ssh: connect to host 35.192.6.159 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ssh: connect to host 35.192.6.159 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ssh: connect to host 35.192.6.159 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ssh: connect to host 35.192.6.159 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12881451 - PreCommit-HIVE-Build > Remove branch from profiles by sending them from ptest-client > - > > Key: HIVE-14746 > URL: https://issues.apache.org/jira/browse/HIVE-14746 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Barna Zsombor Klara > Attachments: HIVE-14746.01.patch, HIVE-14746.02.patch, > HIVE-14746.03.patch > > > Hive ptest uses some properties files per branch that contain information > about how to execute the tests. > This profile includes the branch name used to fetch the branch code. We > should get rid of this by detecting the branch from the > jenkins-execute-build.sh script, and send the information directly to > ptest-server as command line parameters. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information
[ https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123835#comment-16123835 ] Xuefu Zhang commented on HIVE-17291: You're right. Thanks for the explanation. +1 to the patch. > Set the number of executors based on config if client does not provide > information > -- > > Key: HIVE-17291 > URL: https://issues.apache.org/jira/browse/HIVE-17291 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-17291.1.patch > > > When calculating the memory and cores and the client does not provide > information we should try to use the one provided by default. This can happen > on startup, when {{spark.dynamicAllocation.enabled}} is not enabled -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17301) Make JSONMessageFactory.getTObj method thread safe
[ https://issues.apache.org/jira/browse/HIVE-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Li updated HIVE-17301: -- Status: Patch Available (was: Open) > Make JSONMessageFactory.getTObj method thread safe > -- > > Key: HIVE-17301 > URL: https://issues.apache.org/jira/browse/HIVE-17301 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Tao Li >Assignee: Tao Li > Attachments: HIVE-17301.1.patch > > > This static method is using a singleton instance of TDeserializer, which is > not thread safe. Instead we want to create a new instance per method call. > This class is lightweight, so it should be fine from perf perspective. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
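The fix pattern in HIVE-17301 — replace a shared singleton of a non-thread-safe class with a fresh instance per call — can be sketched without Thrift on the classpath. `java.text.SimpleDateFormat` (also not thread safe) stands in for `TDeserializer` here; the class and method names below are invented for illustration and are not the actual `JSONMessageFactory` code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Sketch of the fix pattern only: SimpleDateFormat stands in for Thrift's
// TDeserializer, since both carry internal mutable state and are unsafe to
// share across threads.
public class PerCallInstance {
    // BAD: a static singleton with internal mutable state, shared by every
    // thread that calls the static method — the shape HIVE-17301 removes.
    private static final SimpleDateFormat SHARED = new SimpleDateFormat("yyyy-MM-dd");

    // GOOD: a cheap, short-lived instance per call; no state is shared, so
    // no synchronization is needed and correctness no longer depends on callers.
    static Date parse(String s) throws ParseException {
        return new SimpleDateFormat("yyyy-MM-dd").parse(s);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("2017-08-11"));
    }
}
```

As the issue notes, this trade-off only works because the object is lightweight; for an expensive-to-construct helper, a `ThreadLocal` would be the usual alternative.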
[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information
[ https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123826#comment-16123826 ] Peter Vary commented on HIVE-17291: --- Hi [~xuefuz], I might not see the whole picture here, but my intention was to modify only the case when the dynamic allocation is not enabled. The patch modifies only the {{SparkSessionImpl.getMemoryAndCores()}} method which is only used by {{SetSparkReducerParallelism.getSparkMemoryAndCores()}} method which looks like this: {code:title=SetSparkReducerParallelism} private void getSparkMemoryAndCores(OptimizeSparkProcContext context) throws SemanticException { if (sparkMemoryAndCores != null) { return; } if (context.getConf().getBoolean(SPARK_DYNAMIC_ALLOCATION_ENABLED, false)) { // If dynamic allocation is enabled, numbers for memory and cores are meaningless. So, we don't // try to get it. sparkMemoryAndCores = null; return; } [..] try { [..] sparkMemoryAndCores = sparkSession.getMemoryAndCores(); } catch (HiveException e) { [..] } } {code} If the above statements are true, then in case of dynamic allocation we do not use this data, and the number of reducers based only on the size of the data: {code:title=SetSparkReducerParallelism} @Override public Object process(Node nd, Stack stack, NodeProcessorCtx procContext, Object... nodeOutputs) throws SemanticException { [..] 
// Divide it by 2 so that we can have more reducers long bytesPerReducer = context.getConf().getLongVar(HiveConf.ConfVars.BYTESPERREDUCER) / 2; int numReducers = Utilities.estimateReducers(numberOfBytes, bytesPerReducer, maxReducers, false); getSparkMemoryAndCores(context); <-- In case of dynamic allocation this sets sparkMemoryAndCores to null if (sparkMemoryAndCores != null && sparkMemoryAndCores.getFirst() > 0 && sparkMemoryAndCores.getSecond() > 0) { // warn the user if bytes per reducer is much larger than memory per task if ((double) sparkMemoryAndCores.getFirst() / bytesPerReducer < 0.5) { LOG.warn("Average load of a reducer is much larger than its available memory. " + "Consider decreasing hive.exec.reducers.bytes.per.reducer"); } // If there are more cores, use the number of cores numReducers = Math.max(numReducers, sparkMemoryAndCores.getSecond()); } numReducers = Math.min(numReducers, maxReducers); LOG.info("Set parallelism for reduce sink " + sink + " to: " + numReducers + " (calculated)"); desc.setNumReducers(numReducers); [..] } {code} I might have missed something, since I am quite a newbie in this part of the code. Thanks for taking the time and looking at this! Peter > Set the number of executors based on config if client does not provide > information > -- > > Key: HIVE-17291 > URL: https://issues.apache.org/jira/browse/HIVE-17291 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-17291.1.patch > > > When calculating the memory and cores and the client does not provide > information we should try to use the one provided by default. This can happen > on startup, when {{spark.dynamicAllocation.enabled}} is not enabled -- This message was sent by Atlassian JIRA (v6.4.14#64029)
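The reducer-count arithmetic quoted in the comment above boils down to a few clamped operations. This is a hedged sketch under the assumption that `Utilities.estimateReducers` amounts to a ceiling division of input size by bytes-per-reducer; the real method handles more cases, and the method below is invented for illustration.

```java
// Hedged sketch of the reducer-count arithmetic from the snippet above;
// the real logic lives in Utilities.estimateReducers and is more involved.
public class ReducerEstimate {
    static int estimate(long totalBytes, long configuredBytesPerReducer,
                        int maxReducers, int availableCores) {
        // "Divide it by 2 so that we can have more reducers", as in the snippet.
        long bytesPerReducer = configuredBytesPerReducer / 2;
        // Assumed shape of estimateReducers: ceiling division, at least 1.
        int numReducers = (int) Math.max(1,
            (totalBytes + bytesPerReducer - 1) / bytesPerReducer);
        // "If there are more cores, use the number of cores" — only applies
        // when memory/core info is available, i.e. no dynamic allocation.
        numReducers = Math.max(numReducers, availableCores);
        // Never exceed the configured maximum.
        return Math.min(numReducers, maxReducers);
    }

    public static void main(String[] args) {
        // 10 GiB with an effective 256 MiB per reducer -> 40, capped at 32.
        System.out.println(estimate(10L << 30, 512L << 20, 32, 8));
    }
}
```

This also shows why the patch only matters when dynamic allocation is off: with it on, `sparkMemoryAndCores` is null, the core-based `max` is skipped, and the count depends purely on data size.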
[jira] [Updated] (HIVE-17301) Make JSONMessageFactory.getTObj method thread safe
[ https://issues.apache.org/jira/browse/HIVE-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Li updated HIVE-17301: -- Attachment: HIVE-17301.1.patch > Make JSONMessageFactory.getTObj method thread safe > -- > > Key: HIVE-17301 > URL: https://issues.apache.org/jira/browse/HIVE-17301 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Tao Li >Assignee: Tao Li > Attachments: HIVE-17301.1.patch > > > This static method is using a singleton instance of TDeserializer, which is > not thread safe. Instead we want to create a new instance per method call. > This class is lightweight, so it should be fine from perf perspective. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17301) Make JSONMessageFactory.getTObj method thread safe
[ https://issues.apache.org/jira/browse/HIVE-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Li reassigned HIVE-17301: - Assignee: Tao Li > Make JSONMessageFactory.getTObj method thread safe > -- > > Key: HIVE-17301 > URL: https://issues.apache.org/jira/browse/HIVE-17301 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Tao Li >Assignee: Tao Li > > This static method is using a singleton instance of TDeserializer, which is > not thread safe. Instead we want to create a new instance per method call. > This class is lightweight, so it should be fine from perf perspective. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17301) Make JSONMessageFactory.getTObj method thread safe
[ https://issues.apache.org/jira/browse/HIVE-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Li updated HIVE-17301: -- Component/s: Metastore > Make JSONMessageFactory.getTObj method thread safe > -- > > Key: HIVE-17301 > URL: https://issues.apache.org/jira/browse/HIVE-17301 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Tao Li >Assignee: Tao Li > > This static method is using a singleton instance of TDeserializer, which is > not thread safe. Instead we want to create a new instance per method call. > This class is lightweight, so it should be fine from perf perspective. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17267) Make HMS Notification Listeners typesafe
[ https://issues.apache.org/jira/browse/HIVE-17267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123798#comment-16123798 ] Hive QA commented on HIVE-17267: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881449/HIVE-17267.03.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11002 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=159) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6355/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6355/console Test logs: 
http://104.198.109.242/logs/PreCommit-HIVE-Build-6355/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881449 - PreCommit-HIVE-Build > Make HMS Notification Listeners typesafe > > > Key: HIVE-17267 > URL: https://issues.apache.org/jira/browse/HIVE-17267 > Project: Hive > Issue Type: Bug >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > Attachments: HIVE-17267.01.patch, HIVE-17267.02.patch, > HIVE-17267.03.patch > > > Currently in the HMS we support two types of notification listeners, > transactional and non-transactional ones. Transactional listeners will only > be invoked if the jdbc transaction finished successfully while > non-transactional ones are supposed to be resilient and will be invoked in > any case, even for failures. > Having the same type for these two is a source of confusion and opens the > door for misconfigurations. We should try to fix this. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-17294) LLAP: switch task heartbeats to protobuf
[ https://issues.apache.org/jira/browse/HIVE-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123787#comment-16123787 ] Siddharth Seth edited comment on HIVE-17294 at 8/11/17 6:21 PM: The endpoint is in the LLAP AM plugin. However, that extends an upstream tez plugin - so some translation will likely be required. That's an unnecessary step. Other than time to work on this, another thing to watch out for is the cost of serializing protobuf, and memory overhead of copying buffers with protobuf rpc engine. At the moment, some parts of the system use the ProtobufRpcEngine from Hadoop, other parts use the WritableRpcEngine (specifically the task to AM communication). When I say part of the work is done - it's the representation of various pieces of information in protobuf. was (Author: sseth): The endpoint is in the LLAP AM plugin. However, that extends an upstream tez plugin - so some translation will likely be required. That's an unnecessary step. Other than time to work on this, another thing to watch out for is the cost of serializing protobuf, and memory overhead of copying buffers with protobuf rpc engine. At the moment, some parts of the system use the ProtobufRpcEngine from Hadoop, other parts use the WritableRpcEngine (specifically the task to AM communication). > LLAP: switch task heartbeats to protobuf > > > Key: HIVE-17294 > URL: https://issues.apache.org/jira/browse/HIVE-17294 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17294) LLAP: switch task heartbeats to protobuf
[ https://issues.apache.org/jira/browse/HIVE-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123787#comment-16123787 ] Siddharth Seth commented on HIVE-17294: --- The endpoint is in the LLAP AM plugin. However, that extends an upstream tez plugin - so some translation will likely be required. That's an unnecessary step. Other than time to work on this, another thing to watch out for is the cost of serializing protobuf, and memory overhead of copying buffers with protobuf rpc engine. At the moment, some parts of the system use the ProtobufRpcEngine from Hadoop, other parts use the WritableRpcEngine (specifically the task to AM communication). > LLAP: switch task heartbeats to protobuf > > > Key: HIVE-17294 > URL: https://issues.apache.org/jira/browse/HIVE-17294 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123761#comment-16123761 ] Sankar Hariappan commented on HIVE-17289: - The test failures are irrelevant to this patch! > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > Attachments: HIVE-17289.01.patch > > > Currently, EXPORT uses distcp to dump data files to dump directory and IMPORT > uses distcp to copy the larger files/large number of files from dump > directory to table staging directory. But, this copy fails as distcp is > always done with doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs' by default. > Need to remove usage of doAs user when try to distcp from EXPORT/IMPORT flow. > Privileged user based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as "hdfs" super-user is never allowed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information
[ https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123749#comment-16123749 ] Xuefu Zhang commented on HIVE-17291: Thanks for working on this, [~pvary]. The patch looks good. However, I was a little confused. The description suggests that we are fixing the case when dynamic allocation is not enabled. However, the code seemingly will get executed in either case. I'm not sure if it's proper to use {{spark.executor.instances}} when dynamic allocation is enabled. Any thoughts? > Set the number of executors based on config if client does not provide > information > -- > > Key: HIVE-17291 > URL: https://issues.apache.org/jira/browse/HIVE-17291 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-17291.1.patch > > > When calculating the memory and cores and the client does not provide > information we should try to use the one provided by default. This can happen > on startup, when {{spark.dynamicAllocation.enabled}} is not enabled -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16873) Remove Thread Cache From Logging
[ https://issues.apache.org/jira/browse/HIVE-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123732#comment-16123732 ] Aihua Xu commented on HIVE-16873: - Yeah. That seems unnecessary. The change looks good to me. +1. > Remove Thread Cache From Logging > > > Key: HIVE-16873 > URL: https://issues.apache.org/jira/browse/HIVE-16873 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16873.1.patch, HIVE-16873.2.patch, > HIVE-16873.3.patch > > > In {{org.apache.hadoop.hive.metastore.HiveMetaStore}} we have a {{Formatter}} > class (and its buffer) tied to every thread. > This {{Formatter}} is for logging purposes. I would suggest that we simply > let the logging framework itself handle these kinds of details and ditch > the buffer per thread. > {code} > public static final String AUDIT_FORMAT = > "ugi=%s\t" + // ugi > "ip=%s\t" + // remote IP > "cmd=%s\t"; // command > public static final Logger auditLog = LoggerFactory.getLogger( > HiveMetaStore.class.getName() + ".audit"); > private static final ThreadLocal<Formatter> auditFormatter = > new ThreadLocal<Formatter>() { > @Override > protected Formatter initialValue() { > return new Formatter(new StringBuilder(AUDIT_FORMAT.length() * > 4)); > } > }; > ... > private static final void logAuditEvent(String cmd) { > final Formatter fmt = auditFormatter.get(); > ((StringBuilder) fmt.out()).setLength(0); > String address = getIPAddress(); > if (address == null) { > address = "unknown-ip-addr"; > } > auditLog.info(fmt.format(AUDIT_FORMAT, ugi.getUserName(), > address, cmd).toString()); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
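The per-thread `Formatter` quoted in the issue description can be contrasted with a plain stateless call. The sketch below reproduces the thread-local shape from the snippet and shows that a simple `String.format` yields the same output; class and method names are invented for the comparison, and the claim that the two are interchangeable holds for this format string (the real `logAuditEvent` also pulls in the UGI and IP address).

```java
import java.util.Formatter;

// Sketch contrasting the snippet above with the simpler alternative the
// issue proposes: the per-thread Formatter buys nothing here that a plain
// String.format call doesn't, and the logging I/O dominates either way.
public class AuditFormat {
    static final String AUDIT_FORMAT = "ugi=%s\tip=%s\tcmd=%s\t";

    // Before: a thread-local Formatter whose StringBuilder is reset per call.
    private static final ThreadLocal<Formatter> FMT = ThreadLocal.withInitial(
        () -> new Formatter(new StringBuilder(AUDIT_FORMAT.length() * 4)));

    static String formatOld(String ugi, String ip, String cmd) {
        Formatter fmt = FMT.get();
        ((StringBuilder) fmt.out()).setLength(0); // reuse the buffer
        return fmt.format(AUDIT_FORMAT, ugi, ip, cmd).toString();
    }

    // After: stateless formatting, no per-thread cache to maintain.
    static String formatNew(String ugi, String ip, String cmd) {
        return String.format(AUDIT_FORMAT, ugi, ip, cmd);
    }

    public static void main(String[] args) {
        String a = formatOld("hive", "10.0.0.1", "get_table");
        String b = formatNew("hive", "10.0.0.1", "get_table");
        System.out.println(a.equals(b));
    }
}
```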
[jira] [Commented] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123668#comment-16123668 ] Hive QA commented on HIVE-17289: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881462/HIVE-17289.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 11002 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=240) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testConnection (batchId=241) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValid (batchId=241) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValidNeg (batchId=241) 
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeProxyAuth (batchId=241) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeTokenAuth (batchId=241) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testProxyAuth (batchId=241) org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testTokenAuth (batchId=241) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6354/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6354/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6354/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 18 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881462 - PreCommit-HIVE-Build > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > Attachments: HIVE-17289.01.patch > > > Currently, EXPORT uses distcp to dump data files to dump directory and IMPORT > uses distcp to copy the larger files/large number of files from dump > directory to table staging directory. But, this copy fails as distcp is > always done with doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs' by default. > Need to remove usage of doAs user when try to distcp from EXPORT/IMPORT flow. > Privileged user based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as "hdfs" super-user is never allowed. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17300) WebUI query plan graphs
[ https://issues.apache.org/jira/browse/HIVE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang reassigned HIVE-17300: -- Assignee: Karen Coppage > WebUI query plan graphs > --- > > Key: HIVE-17300 > URL: https://issues.apache.org/jira/browse/HIVE-17300 > Project: Hive > Issue Type: Improvement > Components: Web UI >Reporter: Karen Coppage >Assignee: Karen Coppage > Attachments: complete_success.png, full_mapred_stats.png, > graph_with_mapred_stats.png, last_stage_error.png, last_stage_running.png, > non_mapred_task_selected.png > > > Hi all, > I’m working on a feature of the Hive WebUI Query Plan tab that would provide > the option to display the query plan as a nice graph (scroll down for > screenshots). If you click on one of the graph’s stages, the plan for that > stage appears as text below. > Stages are color-coded if they have a status (Success, Error, Running), and > the rest are grayed out. Coloring is based on status already available in the > WebUI, under the Stages tab. > There is an additional option to display stats for MapReduce tasks. This > includes the job’s ID, tracking URL (where the logs are found), and mapper > and reducer numbers/progress, among other info. > The library I’m using for the graph is called vis.js (http://visjs.org/). It > has an Apache license, and the only necessary file to be included from this > library is about 700 KB. > I tried to keep server-side changes minimal, and graph generation is taken > care of by the client. Plans with more than a given number of stages > (default: 25) won't be displayed in order to preserve resources. > I’d love to hear any and all input from the community about this feature: do > you think it’s useful, and is there anything important I’m missing? > Thanks, > Karen Coppage -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15767) Hive On Spark is not working on secure clusters from Oozie
[ https://issues.apache.org/jira/browse/HIVE-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-15767: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Pushed to master. Thanks [~gezapeti] for your contribution! > Hive On Spark is not working on secure clusters from Oozie > -- > > Key: HIVE-15767 > URL: https://issues.apache.org/jira/browse/HIVE-15767 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 1.2.1, 2.1.1 >Reporter: Peter Cseh >Assignee: Peter Cseh > Fix For: 3.0.0 > > Attachments: HIVE-15767-001.patch, HIVE-15767-002.patch, > HIVE-15767.1.patch > > > When a HiveAction is launched form Oozie with Hive On Spark enabled, we're > getting errors: > {noformat} > Caused by: java.io.IOException: Exception reading > file:/yarn/nm/usercache/yshi/appcache/application_1485271416004_0022/container_1485271416004_0022_01_02/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:188) > at > org.apache.hadoop.mapreduce.security.TokenCache.mergeBinaryTokens(TokenCache.java:155) > {noformat} > This is caused by passing the {{mapreduce.job.credentials.binary}} property > to the Spark configuration in RemoteHiveSparkClient. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17292) Change TestMiniSparkOnYarnCliDriver test configuration to use the configured cores
[ https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123582#comment-16123582 ] Peter Vary commented on HIVE-17292: --- Sorry, I made a mistake when looking up the stuff again and copied the wrong config name. We have to set {{yarn.scheduler.increment-allocation-mb}} - Notice *increment* :) This is a FairScheduler-only configuration: https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Properties_that_can_be_placed_in_yarn-site.xml ??The fairscheduler grants memory in increments of this value. If you submit a task with resource request that is not a multiple of increment-allocation-mb, the request will be rounded up to the nearest increment. Defaults to 1024 MB.?? Also see why it is not in the YARN documentation (YARN-5902). Sorry for the confusion :( Peter > Change TestMiniSparkOnYarnCliDriver test configuration to use the configured > cores > -- > > Key: HIVE-17292 > URL: https://issues.apache.org/jira/browse/HIVE-17292 > Project: Hive > Issue Type: Sub-task > Components: Spark, Test >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-17292.1.patch > > > Currently the {{hive-site.xml}} for the {{TestMiniSparkOnYarnCliDriver}} test > defines 2 cores, and 2 executors, but only 1 is used, because the MiniCluster > does not allow the creation of the 3rd container. > The FairScheduler uses 1GB increments for memory, but the containers would > like to use only 512MB. We should change the fairscheduler configuration to > use only the requested 512MB -- This message was sent by Atlassian JIRA (v6.4.14#64029)
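The FairScheduler rounding quoted above is just a round-up to the nearest multiple of the increment. A minimal sketch of that arithmetic (the method name is invented; the 1024 MB default and the round-up rule come from the quoted documentation):

```java
// Sketch of the FairScheduler behavior described above: a resource request
// that is not a multiple of yarn.scheduler.increment-allocation-mb is
// rounded up to the nearest increment (default 1024 MB).
public class IncrementAllocation {
    static int roundUpMb(int requestedMb, int incrementMb) {
        // Integer ceiling division, then scale back up to the increment.
        return ((requestedMb + incrementMb - 1) / incrementMb) * incrementMb;
    }

    public static void main(String[] args) {
        // With the default 1024 MB increment, a 512 MB container costs a full
        // 1024 MB — which is why the mini cluster could not fit the extra
        // executor the test configuration asked for.
        System.out.println(roundUpMb(512, 1024));
        // Lowering the increment to 512 MB lets the request through as-is.
        System.out.println(roundUpMb(512, 512));
    }
}
```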
[jira] [Commented] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter
[ https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123550#comment-16123550 ] Hive QA commented on HIVE-17261: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881406/HIVE-17261.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 11002 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=240) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_char] (batchId=9) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=159) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout (batchId=228) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6353/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6353/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6353/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 15 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881406 - PreCommit-HIVE-Build > Hive uses deprecated ParquetInputSplit constructor which blocks parquet > dictionary filter > - > > Key: HIVE-17261 > URL: https://issues.apache.org/jira/browse/HIVE-17261 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0 >Reporter: Junjie Chen >Assignee: Junjie Chen > Attachments: HIVE-17261.2.patch, HIVE-17261.3.patch, HIVE-17261.diff, > HIVE-17261.patch > > > Hive uses the deprecated ParquetInputSplit constructor in > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128] > Please see the interface definition in > [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80] > The old interface sets rowGroupOffsets values, which leads to the dictionary > filter being skipped in Parquet. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17292) Change TestMiniSparkOnYarnCliDriver test configuration to use the configured cores
[ https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123544#comment-16123544 ] Rui Li commented on HIVE-17292: --- I mean we did set {{RM_SCHEDULER_MINIMUM_ALLOCATION_MB}} to 512 in the code:
{code}
public MiniSparkShim(Configuration conf, int numberOfTaskTrackers, String nameNode, int numDir) throws IOException {
  mr = new MiniSparkOnYARNCluster("sparkOnYarn");
  conf.set("fs.defaultFS", nameNode);
  conf.set("yarn.resourcemanager.scheduler.class",
      "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler");
  // disable resource monitoring, although it should be off by default
  conf.setBoolean(YarnConfiguration.YARN_MINICLUSTER_CONTROL_RESOURCE_MONITORING, false);
  conf.setInt(YarnConfiguration.YARN_MINICLUSTER_NM_PMEM_MB, 2048);
  conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 512);
  conf.setInt(YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB, 2048);
  configureImpersonation(conf);
  mr.init(conf);
  mr.start();
  this.conf = mr.getConfig();
}
{code}
Do you mean {{RM_SCHEDULER_MINIMUM_ALLOCATION_MB}} is different from {{yarn.scheduler.minimum-allocation-mb}}, or that we set it in the wrong place? > Change TestMiniSparkOnYarnCliDriver test configuration to use the configured > cores > -- > > Key: HIVE-17292 > URL: https://issues.apache.org/jira/browse/HIVE-17292 > Project: Hive > Issue Type: Sub-task > Components: Spark, Test >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-17292.1.patch > > > Currently the {{hive-site.xml}} for the {{TestMiniSparkOnYarnCliDriver}} test > defines 2 cores, and 2 executors, but only 1 is used, because the MiniCluster > does not allow the creation of the 3rd container. > The FairScheduler uses 1GB increments for memory, but the containers would > like to use only 512MB. We should change the FairScheduler configuration to > use only the requested 512MB. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17293) ETL split strategy not accounting for empty base and non-empty delta buckets
[ https://issues.apache.org/jira/browse/HIVE-17293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17293: -- Component/s: Transactions > ETL split strategy not accounting for empty base and non-empty delta buckets > > > Key: HIVE-17293 > URL: https://issues.apache.org/jira/browse/HIVE-17293 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0, 3.0.0, 2.4.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Critical > > Observed an issue with a customer case where there are 2 buckets (bucket_0 > and bucket_1). > Base bucket 0 had some rows whereas bucket 1 was empty. > Delta buckets 0 and 1 had some rows. > The ETL split strategy did not generate an OrcSplit for bucket 1 even though it had > some rows in the delta directories. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-14746) Remove branch from profiles by sending them from ptest-client
[ https://issues.apache.org/jira/browse/HIVE-14746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123512#comment-16123512 ] Hive QA commented on HIVE-14746: {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/Hive-Build/10/testReport Console output: https://builds.apache.org/job/Hive-Build/10/console Test logs: http://104.199.114.197/logs/Hive-Build-10/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult [localFile=/data/hiveptest/logs/Hive-Build-10/succeeded/54-TestCliDriver-push_or.q-encryption_move_tbl.q-vectorization_5.q-and-27-more, remoteFile=/home/hiveptest/35.184.192.137-hiveptest-0/logs/, getExitCode()=255, getException()=null, getUser()=hiveptest, getHost()=35.184.192.137, getInstance()=0]: 'ssh: connect to host 35.184.192.137 port 22: Connection refused rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ssh: connect to host 35.184.192.137 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ssh: connect to host 35.184.192.137 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ssh: connect to host 35.184.192.137 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ssh: connect to host 35.184.192.137 port 22: Connection 
timed out rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.1] ' {noformat} This message is automatically generated. ATTACHMENT ID: - Hive-Build > Remove branch from profiles by sending them from ptest-client > - > > Key: HIVE-14746 > URL: https://issues.apache.org/jira/browse/HIVE-14746 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Barna Zsombor Klara > Attachments: HIVE-14746.01.patch, HIVE-14746.02.patch, > HIVE-14746.03.patch > > > Hive ptest uses some properties files per branch that contain information > about how to execute the tests. > This profile includes the branch name used to fetch the branch code. We > should get rid of this by detecting the branch from the > jenkins-execute-build.sh script, and send the information directly to > ptest-server as command line parameters. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17268) WebUI / QueryPlan: query plan is sometimes null when explain output conf is on
[ https://issues.apache.org/jira/browse/HIVE-17268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage updated HIVE-17268: - Attachment: HIVE-17268.3.patch > WebUI / QueryPlan: query plan is sometimes null when explain output conf is on > -- > > Key: HIVE-17268 > URL: https://issues.apache.org/jira/browse/HIVE-17268 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Minor > Attachments: HIVE-17268.2.patch, HIVE-17268.3.patch, HIVE-17268.patch > > > The Hive WebUI's Query Plan tab displays "SET hive.log.explain.output TO true > TO VIEW PLAN" even when hive.log.explain.output is set to true, if the > query cannot be compiled, because the plan is null in that case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16873) Remove Thread Cache From Logging
[ https://issues.apache.org/jira/browse/HIVE-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123420#comment-16123420 ] BELUGA BEHR commented on HIVE-16873: [~aihuaxu] > Remove Thread Cache From Logging > > > Key: HIVE-16873 > URL: https://issues.apache.org/jira/browse/HIVE-16873 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16873.1.patch, HIVE-16873.2.patch, > HIVE-16873.3.patch > > > In {{org.apache.hadoop.hive.metastore.HiveMetaStore}} we have a {{Formatter}} > class (and its buffer) tied to every thread. > This {{Formatter}} is for logging purposes. I would suggest that we simply > let the logging framework itself handle these kinds of details and ditch > the buffer per thread.
> {code}
> public static final String AUDIT_FORMAT =
>     "ugi=%s\t" + // ugi
>     "ip=%s\t" +  // remote IP
>     "cmd=%s\t";  // command
> public static final Logger auditLog = LoggerFactory.getLogger(
>     HiveMetaStore.class.getName() + ".audit");
> private static final ThreadLocal<Formatter> auditFormatter =
>     new ThreadLocal<Formatter>() {
>       @Override
>       protected Formatter initialValue() {
>         return new Formatter(new StringBuilder(AUDIT_FORMAT.length() * 4));
>       }
>     };
> ...
> private static final void logAuditEvent(String cmd) {
>   final Formatter fmt = auditFormatter.get();
>   ((StringBuilder) fmt.out()).setLength(0);
>   String address = getIPAddress();
>   if (address == null) {
>     address = "unknown-ip-addr";
>   }
>   auditLog.info(fmt.format(AUDIT_FORMAT, ugi.getUserName(),
>       address, cmd).toString());
> }
> {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
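The trade-off discussed in HIVE-16873 can be sketched with plain stdlib Java: the current per-thread `Formatter` buffer versus simply formatting per call. This is an illustrative comparison only, not the actual Hive patch; the class and method names below are invented for the demo.

```java
import java.util.Formatter;

public class AuditFormatDemo {
    static final String AUDIT_FORMAT = "ugi=%s\tip=%s\tcmd=%s\t";

    // Current approach: one Formatter (and its StringBuilder buffer) per thread.
    static final ThreadLocal<Formatter> FMT = ThreadLocal.withInitial(
            () -> new Formatter(new StringBuilder(AUDIT_FORMAT.length() * 4)));

    static String withThreadLocal(String ugi, String ip, String cmd) {
        Formatter fmt = FMT.get();
        ((StringBuilder) fmt.out()).setLength(0); // reset the reused buffer
        return fmt.format(AUDIT_FORMAT, ugi, ip, cmd).toString();
    }

    // Proposed simplification: allocate per call and let the logging
    // framework / JIT worry about short-lived garbage.
    static String withStringFormat(String ugi, String ip, String cmd) {
        return String.format(AUDIT_FORMAT, ugi, ip, cmd);
    }

    public static void main(String[] args) {
        String a = withThreadLocal("hive", "10.0.0.1", "get_table");
        String b = withStringFormat("hive", "10.0.0.1", "get_table");
        // Both produce identical audit lines; only the buffering differs.
        System.out.println(a.equals(b));
    }
}
```

Both paths emit the same audit line, which is why the ThreadLocal machinery buys little beyond avoiding a short-lived allocation.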
[jira] [Commented] (HIVE-17288) LlapOutputFormatService: Increase netty event loop threads
[ https://issues.apache.org/jira/browse/HIVE-17288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123421#comment-16123421 ] Hive QA commented on HIVE-17288: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881150/HIVE-17288.1.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6352/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6352/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6352/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult [localFile=/data/hiveptest/logs/PreCommit-HIVE-Build-6352/failed/225_TestSSL, remoteFile=/home/hiveptest/35.202.179.168-hiveptest-1/logs/, getExitCode()=23, getException()=null, getUser()=hiveptest, getHost()=35.202.179.168, getInstance()=1]: 'Warning: Permanently added '35.202.179.168' (ECDSA) to the list of known hosts. receiving incremental file list rsync: change_dir "/home/hiveptest/35.202.179.168-hiveptest-1/logs" failed: No such file or directory (2) sent 8 bytes received 123 bytes 262.00 bytes/sec total size is 0 speedup is 0.00 rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1655) [Receiver=3.1.1] rsync: [Receiver] write error: Broken pipe (32) Warning: Permanently added '35.202.179.168' (ECDSA) to the list of known hosts. 
receiving incremental file list rsync: change_dir "/home/hiveptest/35.202.179.168-hiveptest-1/logs" failed: No such file or directory (2) sent 8 bytes received 123 bytes 87.33 bytes/sec total size is 0 speedup is 0.00 rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1655) [Receiver=3.1.1] rsync: [Receiver] write error: Broken pipe (32) Warning: Permanently added '35.202.179.168' (ECDSA) to the list of known hosts. receiving incremental file list rsync: change_dir "/home/hiveptest/35.202.179.168-hiveptest-1/logs" failed: No such file or directory (2) sent 8 bytes received 123 bytes 87.33 bytes/sec total size is 0 speedup is 0.00 rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1655) [Receiver=3.1.1] rsync: [Receiver] write error: Broken pipe (32) Warning: Permanently added '35.202.179.168' (ECDSA) to the list of known hosts. receiving incremental file list rsync: change_dir "/home/hiveptest/35.202.179.168-hiveptest-1/logs" failed: No such file or directory (2) sent 8 bytes received 123 bytes 262.00 bytes/sec total size is 0 speedup is 0.00 rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1655) [Receiver=3.1.1] rsync: [Receiver] write error: Broken pipe (32) Warning: Permanently added '35.202.179.168' (ECDSA) to the list of known hosts. receiving incremental file list rsync: change_dir "/home/hiveptest/35.202.179.168-hiveptest-1/logs" failed: No such file or directory (2) sent 8 bytes received 123 bytes 87.33 bytes/sec total size is 0 speedup is 0.00 rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1655) [Receiver=3.1.1] rsync: [Receiver] write error: Broken pipe (32) ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12881150 - PreCommit-HIVE-Build > LlapOutputFormatService: Increase netty event loop threads > -- > > Key: HIVE-17288 > URL: https://issues.apache.org/jira/browse/HIVE-17288 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-17288.1.patch > > > Currently it is set to 1, which would be used for both the parent (acceptor) and > client groups. It would be good to leave it at the default, which sets the number > of threads to "number of processors * 2". It can be modified later via > {{-Dio.netty.eventLoopThreads}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
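The default sizing mentioned in the description ("number of processors * 2", overridable via {{-Dio.netty.eventLoopThreads}}) can be approximated in plain Java. This is a sketch of the convention as described above, not Netty's actual implementation:

```java
public class EventLoopSizing {
    // Approximation of the default event-loop-group sizing convention:
    // honor the io.netty.eventLoopThreads system property when set,
    // otherwise use 2 * available processors, but never fewer than 1 thread.
    static int defaultEventLoopThreads() {
        int fallback = Runtime.getRuntime().availableProcessors() * 2;
        return Math.max(1, Integer.getInteger("io.netty.eventLoopThreads", fallback));
    }

    public static void main(String[] args) {
        System.out.println("default event loop threads: " + defaultEventLoopThreads());
    }
}
```

So leaving the group size unspecified scales the service with the host, while still letting operators pin it down with a single system property.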
[jira] [Updated] (HIVE-16873) Remove Thread Cache From Logging
[ https://issues.apache.org/jira/browse/HIVE-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HIVE-16873: --- Attachment: HIVE-16873.3.patch > Remove Thread Cache From Logging > > > Key: HIVE-16873 > URL: https://issues.apache.org/jira/browse/HIVE-16873 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16873.1.patch, HIVE-16873.2.patch, > HIVE-16873.3.patch > > > In {{org.apache.hadoop.hive.metastore.HiveMetaStore}} we have a {{Formatter}} > class (and its buffer) tied to every thread. > This {{Formatter}} is for logging purposes. I would suggest that we simply > let the logging framework itself handle these kinds of details and ditch > the buffer per thread.
> {code}
> public static final String AUDIT_FORMAT =
>     "ugi=%s\t" + // ugi
>     "ip=%s\t" +  // remote IP
>     "cmd=%s\t";  // command
> public static final Logger auditLog = LoggerFactory.getLogger(
>     HiveMetaStore.class.getName() + ".audit");
> private static final ThreadLocal<Formatter> auditFormatter =
>     new ThreadLocal<Formatter>() {
>       @Override
>       protected Formatter initialValue() {
>         return new Formatter(new StringBuilder(AUDIT_FORMAT.length() * 4));
>       }
>     };
> ...
> private static final void logAuditEvent(String cmd) {
>   final Formatter fmt = auditFormatter.get();
>   ((StringBuilder) fmt.out()).setLength(0);
>   String address = getIPAddress();
>   if (address == null) {
>     address = "unknown-ip-addr";
>   }
>   auditLog.info(fmt.format(AUDIT_FORMAT, ugi.getUserName(),
>       address, cmd).toString());
> }
> {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17300) WebUI query plan graphs
[ https://issues.apache.org/jira/browse/HIVE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123413#comment-16123413 ] Karen Coppage commented on HIVE-17300: -- [~xuefuz] Thanks very much for the suggestion to create a JIRA on this topic. > WebUI query plan graphs > --- > > Key: HIVE-17300 > URL: https://issues.apache.org/jira/browse/HIVE-17300 > Project: Hive > Issue Type: Improvement > Components: Web UI >Reporter: Karen Coppage > Attachments: complete_success.png, full_mapred_stats.png, > graph_with_mapred_stats.png, last_stage_error.png, last_stage_running.png, > non_mapred_task_selected.png > > > Hi all, > I’m working on a feature of the Hive WebUI Query Plan tab that would provide > the option to display the query plan as a nice graph (scroll down for > screenshots). If you click on one of the graph’s stages, the plan for that > stage appears as text below. > Stages are color-coded if they have a status (Success, Error, Running), and > the rest are grayed out. Coloring is based on status already available in the > WebUI, under the Stages tab. > There is an additional option to display stats for MapReduce tasks. This > includes the job’s ID, tracking URL (where the logs are found), and mapper > and reducer numbers/progress, among other info. > The library I’m using for the graph is called vis.js (http://visjs.org/). It > has an Apache license, and the only necessary file to be included from this > library is about 700 KB. > I tried to keep server-side changes minimal, and graph generation is taken > care of by the client. Plans with more than a given number of stages > (default: 25) won't be displayed in order to preserve resources. > I’d love to hear any and all input from the community about this feature: do > you think it’s useful, and is there anything important I’m missing? > Thanks, > Karen Coppage -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17292) Change TestMiniSparkOnYarnCliDriver test configuration to use the configured cores
[ https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123383#comment-16123383 ] Peter Vary commented on HIVE-17292: --- Yeah, I was able to identify the root cause of the problem. When a scheduler is used, there is an additional configuration value for the minimum allocation. See: https://github.com/apache/hadoop/blob/branch-2.8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java#L161 This is defined by {{yarn.scheduler.minimum-allocation-mb}}. By default this is set to {{1024}}, so the minimum memory allocation is 1GB. This is also documented here: https://hadoop.apache.org/docs/r2.8.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml ??The minimum allocation for every container request at the RM, in MBs. Memory requests lower than this will throw a InvalidResourceRequestException.?? When I set {{yarn.scheduler.minimum-allocation-mb}} to {{512}} in hive-site.xml, I get 4 reducers as expected. As for the test results - we have to wait for HIVE-17291 to get in to have consistent outputs. Thanks, Peter > Change TestMiniSparkOnYarnCliDriver test configuration to use the configured > cores > -- > > Key: HIVE-17292 > URL: https://issues.apache.org/jira/browse/HIVE-17292 > Project: Hive > Issue Type: Sub-task > Components: Spark, Test >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-17292.1.patch > > > Currently the {{hive-site.xml}} for the {{TestMiniSparkOnYarnCliDriver}} test > defines 2 cores, and 2 executors, but only 1 is used, because the MiniCluster > does not allow the creation of the 3rd container. > The FairScheduler uses 1GB increments for memory, but the containers would > like to use only 512MB. We should change the FairScheduler configuration to > use only the requested 512MB. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
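The container arithmetic behind this root cause can be illustrated with a small, simplified model (hypothetical helper names; real schedulers also round to increments, which is ignored here):

```java
public class AllocationRounding {
    // Simplified model: a request below the scheduler minimum is rounded up,
    // so each container effectively consumes max(requested, minimum) MB.
    static int effectiveAllocationMb(int requestedMb, int minimumMb) {
        return Math.max(requestedMb, minimumMb);
    }

    // How many such containers fit on a NodeManager with the given memory.
    static int containersThatFit(int nodeMemoryMb, int requestedMb, int minimumMb) {
        return nodeMemoryMb / effectiveAllocationMb(requestedMb, minimumMb);
    }

    public static void main(String[] args) {
        // 2048MB NodeManager, 512MB container requests:
        System.out.println(containersThatFit(2048, 512, 1024)); // default 1024MB minimum
        System.out.println(containersThatFit(2048, 512, 512));  // minimum lowered to 512MB
    }
}
```

With the default 1024MB minimum only two containers fit on a 2048MB node; lowering the minimum to 512MB doubles that, which matches the "4 reducers as expected" observation above.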
[jira] [Comment Edited] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123051#comment-16123051 ] Sankar Hariappan edited comment on HIVE-17289 at 8/11/17 1:59 PM: -- Added 01.patch with the changes below.
- Used CopyUtils to copy files from ReplCopyTask (IMPORT/REPL LOAD)
- Set the distcp doAs input to null in the EXPORT and IMPORT flows. The user config hive.distcp.privileged.doAs will be used only for REPL LOAD.
- Assumed lazy copy is set only for REPL LOAD and hence set the doAs user input to hive.distcp.privileged.doAs if lazy copy is true and null if false. This avoids passing this argument from multiple flows; also, the incremental REPL LOAD shares common code with IMPORT.
- Enabled distcp for copies within the same file system in case of a large number of files or large files.
- Removed redundant code in ReplCopyTask/ReplCopyWork as it re-uses the CopyUtils implementation, which does the same.
- Refactored ReplCopyTask.execute to properly distinguish the code paths for _files reads and actual data files.
- Set the default value of hive.distcp.privileged.doAs to "hive".
- Moved CopyUtils from parse.repl.dump.io to parse.repl as it is common for dump/load.
- No tests added, as the existing tests cover the changes except the distcp flow (due to hive.in.test), which needs to be tested manually.
Request [~thejas]/[~daijy] to please review it! > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > Attachments: HIVE-17289.01.patch > > > Currently, EXPORT uses distcp to dump data files to the dump directory and IMPORT > uses distcp to copy the larger files/large number of files from the dump > directory to the table staging directory. But, this copy fails as distcp is > always done with the doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs" by default. > Need to remove usage of the doAs user when trying to distcp from the EXPORT/IMPORT flow. > Privileged-user-based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as the "hdfs" super-user is never allowed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
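The doAs selection rule described in the comment above (privileged user only on REPL LOAD's lazy-copy path, null for EXPORT/IMPORT so distcp runs as the session user) reduces to a small decision function. The names below are hypothetical for illustration, not the patch's actual API:

```java
public class DistcpDoAs {
    // Sketch of the rule: only the lazy-copy (REPL LOAD) path runs distcp as
    // the privileged user from hive.distcp.privileged.doAs; EXPORT/IMPORT
    // return null so distcp falls back to the session user.
    static String distcpDoAsUser(boolean lazyCopy, String privilegedUser) {
        return lazyCopy ? privilegedUser : null;
    }

    public static void main(String[] args) {
        System.out.println(distcpDoAsUser(true, "hive"));   // REPL LOAD path
        System.out.println(distcpDoAsUser(false, "hive"));  // EXPORT/IMPORT path
    }
}
```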
[jira] [Updated] (HIVE-17300) WebUI query plan graphs
[ https://issues.apache.org/jira/browse/HIVE-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage updated HIVE-17300: - Attachment: complete_success.png full_mapred_stats.png graph_with_mapred_stats.png last_stage_error.png last_stage_running.png non_mapred_task_selected.png > WebUI query plan graphs > --- > > Key: HIVE-17300 > URL: https://issues.apache.org/jira/browse/HIVE-17300 > Project: Hive > Issue Type: Improvement > Components: Web UI >Reporter: Karen Coppage > Attachments: complete_success.png, full_mapred_stats.png, > graph_with_mapred_stats.png, last_stage_error.png, last_stage_running.png, > non_mapred_task_selected.png > > > Hi all, > I’m working on a feature of the Hive WebUI Query Plan tab that would provide > the option to display the query plan as a nice graph (scroll down for > screenshots). If you click on one of the graph’s stages, the plan for that > stage appears as text below. > Stages are color-coded if they have a status (Success, Error, Running), and > the rest are grayed out. Coloring is based on status already available in the > WebUI, under the Stages tab. > There is an additional option to display stats for MapReduce tasks. This > includes the job’s ID, tracking URL (where the logs are found), and mapper > and reducer numbers/progress, among other info. > The library I’m using for the graph is called vis.js (http://visjs.org/). It > has an Apache license, and the only necessary file to be included from this > library is about 700 KB. > I tried to keep server-side changes minimal, and graph generation is taken > care of by the client. Plans with more than a given number of stages > (default: 25) won't be displayed in order to preserve resources. > I’d love to hear any and all input from the community about this feature: do > you think it’s useful, and is there anything important I’m missing? > Thanks, > Karen Coppage -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17291) Set the number of executors based on config if client does not provide information
[ https://issues.apache.org/jira/browse/HIVE-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123370#comment-16123370 ] Peter Vary commented on HIVE-17291: --- Failures are not related. [~lirui], [~xuefuz], could you please review? Thanks, Peter > Set the number of executors based on config if client does not provide > information > -- > > Key: HIVE-17291 > URL: https://issues.apache.org/jira/browse/HIVE-17291 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-17291.1.patch > > > When calculating the memory and cores, if the client does not provide the > information, we should try to use the configured defaults. This can happen > on startup, when {{spark.dynamicAllocation.enabled}} is not enabled -- This message was sent by Atlassian JIRA (v6.4.14#64029)
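The fallback described in the issue (use the configured value when the client reports nothing, e.g. at startup) is essentially this; the names are illustrative, not the patch's API:

```java
public class ExecutorDefaults {
    // Prefer the executor count reported by the running client; when it is
    // missing or non-positive (e.g. at startup, before executors register),
    // fall back to the configured default.
    static int executorCount(Integer reportedByClient, int configuredDefault) {
        return (reportedByClient != null && reportedByClient > 0)
                ? reportedByClient
                : configuredDefault;
    }

    public static void main(String[] args) {
        System.out.println(executorCount(null, 2)); // startup: nothing reported yet
        System.out.println(executorCount(3, 2));    // client reported 3 executors
    }
}
```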
[jira] [Updated] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17289: Status: Patch Available (was: Open) > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > Attachments: HIVE-17289.01.patch > > > Currently, EXPORT uses distcp to dump data files to the dump directory and IMPORT > uses distcp to copy the larger files/large number of files from the dump > directory to the table staging directory. But, this copy fails as distcp is > always done with the doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs" by default. > Need to remove usage of the doAs user when trying to distcp from the EXPORT/IMPORT flow. > Privileged-user-based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as the "hdfs" super-user is never allowed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17289: Attachment: HIVE-17289.01.patch > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > Attachments: HIVE-17289.01.patch > > > Currently, EXPORT uses distcp to dump data files to the dump directory and IMPORT > uses distcp to copy the larger files/large number of files from the dump > directory to the table staging directory. But, this copy fails as distcp is > always done with the doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs" by default. > Need to remove usage of the doAs user when trying to distcp from the EXPORT/IMPORT flow. > Privileged-user-based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as the "hdfs" super-user is never allowed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17289: Attachment: (was: HIVE-17289.01.patch) > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > > Currently, EXPORT uses distcp to dump data files to the dump directory and IMPORT > uses distcp to copy the larger files/large number of files from the dump > directory to the table staging directory. But, this copy fails as distcp is > always done with the doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs" by default. > Need to remove usage of the doAs user when trying to distcp from the EXPORT/IMPORT flow. > Privileged-user-based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as the "hdfs" super-user is never allowed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17289: Status: Open (was: Patch Available) > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > > Currently, EXPORT uses distcp to dump data files to the dump directory and IMPORT > uses distcp to copy the larger files/large number of files from the dump > directory to the table staging directory. But, this copy fails as distcp is > always done with the doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs" by default. > Need to remove usage of the doAs user when trying to distcp from the EXPORT/IMPORT flow. > Privileged-user-based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as the "hdfs" super-user is never allowed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-14746) Remove branch from profiles by sending them from ptest-client
[ https://issues.apache.org/jira/browse/HIVE-14746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123349#comment-16123349 ] Hive QA commented on HIVE-14746: {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/Hive-Build/9/testReport Console output: https://builds.apache.org/job/Hive-Build/9/console Test logs: http://104.199.114.197/logs/Hive-Build-9/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/branch-2.3-working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-08-11 13:40:40.425 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + MAVEN_OPTS='-Xmx1g -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hiveptest/branch-2.3-working/ + tee /data/hiveptest/logs/Hive-Build-9/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z branch-2.3 ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source ]] + git clone https://github.com/apache/hive.git apache-github-source-source Cloning into 'apache-github-source-source'... 
+ date '+%Y-%m-%d %T.%3N' 2017-08-11 13:41:08.117 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at efa5b54 HIVE-17008: Fix boolean flag switchup in DropTableEvent (Dan Burkert, reviewed by Mohit Sabharwal and Peter Vary) + git clean -f -d + git checkout branch-2.3 Switched to a new branch 'branch-2.3' Branch branch-2.3 set up to track remote branch branch-2.3 from origin. + git reset --hard origin/branch-2.3 HEAD is now at 6f4c35c Release Notes + git merge --ff-only origin/branch-2.3 Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-08-11 13:41:10.272 + patchCommandPath=/data/hiveptest/branch-2.3-working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/branch-2.3-working/scratch/build.patch + [[ -f /data/hiveptest/branch-2.3-working/scratch/build.patch ]] + [[ maven == \m\a\v\e\n ]] + mvn -B clean install -DskipTests -T 4 -q -Dmaven.repo.local=/data/hiveptest/branch-2.3-working/maven [ERROR] Failed to execute goal on project hive-hcatalog: Could not resolve dependencies for project org.apache.hive.hcatalog:hive-hcatalog:pom:2.3.0: Failure to transfer javax.xml.bind:jaxb-api:jar:2.2.2 from http://www.datanucleus.org/downloads/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of datanucleus has elapsed or updates are forced. Original error: Could not transfer artifact javax.xml.bind:jaxb-api:jar:2.2.2 from/to datanucleus (http://www.datanucleus.org/downloads/maven2): Connect to localhost:3128 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused (Connection refused) -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
[ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :hive-hcatalog + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: - Hive-Build > Remove branch from profiles by sending them from ptest-client > - > > Key: HIVE-14746 > URL: https://issues.apache.org/jira/browse/HIVE-14746 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Barna Zsombor Klara > Attachments: HIVE-14746.01.patch, HIVE-14746.02.patch, > HIVE-14746.03.patch > > > Hive ptest uses some properties files per branch that contain information > about how to execute the tests. > This profile includes the branch name used to fetch the branch code. We > should get rid of this by detecting the branch from the > jenkins-execute-build.sh script, and send the information directly to > ptest-server as command line parameters. -- This message was sent by Atlass
[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition
[ https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vlad Gudikov updated HIVE-17148: Attachment: HIVE-17148.3.patch > Incorrect result for Hive join query with COALESCE in WHERE condition > - > > Key: HIVE-17148 > URL: https://issues.apache.org/jira/browse/HIVE-17148 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 2.1.1 >Reporter: Vlad Gudikov >Assignee: Vlad Gudikov > Attachments: HIVE-17148.1.patch, HIVE-17148.2.patch, > HIVE-17148.3.patch, HIVE-17148.patch > > > The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo > enabled: > STEPS TO REPRODUCE: > {code} > Step 1: Create a table ct1 > create table ct1 (a1 string,b1 string); > Step 2: Create a table ct2 > create table ct2 (a2 string); > Step 3 : Insert following data into table ct1 > insert into table ct1 (a1) values ('1'); > Step 4 : Insert following data into table ct2 > insert into table ct2 (a2) values ('1'); > Step 5 : Execute the following query > select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2; > {code} > ACTUAL RESULT: > {code} > The query returns nothing; > {code} > EXPECTED RESULT: > {code} > 1 NULL1 > {code} > The issue seems to be caused by an incorrect query plan. In the plan we can > see: > predicate:(a1 is not null and b1 is not null) > which does not look correct. As a result, it filters out all the rows if > any column mentioned in the COALESCE has a null value. > Please find the query plan below: > {code} > Plan optimized by CBO. 
> Vertex dependency in root stage > Map 1 <- Map 2 (BROADCAST_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 > File Output Operator [FS_10] > Map Join Operator [MAPJOIN_15] (rows=1 width=4) > > Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"] > <-Map 2 [BROADCAST_EDGE] > BROADCAST [RS_7] > PartitionCols:_col0 > Select Operator [SEL_5] (rows=1 width=1) > Output:["_col0"] > Filter Operator [FIL_14] (rows=1 width=1) > predicate:a2 is not null > TableScan [TS_3] (rows=1 width=1) > default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"] > <-Select Operator [SEL_2] (rows=1 width=4) > Output:["_col0","_col1"] > Filter Operator [FIL_13] (rows=1 width=4) > predicate:(a1 is not null and b1 is not null) > TableScan [TS_0] (rows=1 width=4) > default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"] > {code} > This happens only if join is inner type, otherwise HiveJoinAddNotRule which > creates this problem is skipped. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
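The derived predicate in the plan above is stricter than the join condition it came from: COALESCE(a1, b1) = a2 can hold even when b1 is NULL, so requiring both columns to be non-null wrongly rejects matching rows. A minimal Python sketch of that semantics (None stands in for SQL NULL; this is illustrative only, not Hive code):

```python
def coalesce(*args):
    # Return the first non-None argument, mimicking SQL COALESCE
    # (None stands in for SQL NULL).
    for a in args:
        if a is not None:
            return a
    return None

# The row from the reproduction: ct1 holds (a1='1', b1=NULL), ct2 holds (a2='1').
a1, b1 = '1', None
a2 = '1'

# The join condition COALESCE(a1, b1) = a2 is satisfied for this row...
assert coalesce(a1, b1) == a2

# ...but the predicate the planner derives, (a1 IS NOT NULL AND b1 IS NOT NULL),
# rejects it, which is why the query returns nothing.
derived_predicate = a1 is not None and b1 is not None
assert derived_predicate is False
```

This matches the expected result in the description: the row (1, NULL, 1) should survive the join, so the added not-null filter on b1 is incorrect for COALESCE.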
[jira] [Commented] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter
[ https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123320#comment-16123320 ] Hive QA commented on HIVE-17261: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881376/HIVE-17261.2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 11002 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_char] (batchId=9) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=159) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235) org.apache.hadoop.hive.ql.io.parquet.TestParquetRowGroupFilter.testRowGroupFilterTakeEffect (batchId=263) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema 
(batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDate2 (batchId=183) org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDecimalXY (batchId=183) org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteTimestamp (batchId=183) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6351/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6351/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6351/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 18 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881376 - PreCommit-HIVE-Build > Hive use deprecated ParquetInputSplit constructor which blocked parquet > dictionary filter > - > > Key: HIVE-17261 > URL: https://issues.apache.org/jira/browse/HIVE-17261 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0 >Reporter: Junjie Chen >Assignee: Junjie Chen > Attachments: HIVE-17261.2.patch, HIVE-17261.3.patch, HIVE-17261.diff, > HIVE-17261.patch > > > Hive use deprecated ParquetInputSplit in > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128] > Please see interface definition in > [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80] > Old interface set rowgroupoffset values which will lead to skip dictionary > filter in parquet. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-14746) Remove branch from profiles by sending them from ptest-client
[ https://issues.apache.org/jira/browse/HIVE-14746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara updated HIVE-14746: --- Attachment: HIVE-14746.03.patch > Remove branch from profiles by sending them from ptest-client > - > > Key: HIVE-14746 > URL: https://issues.apache.org/jira/browse/HIVE-14746 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Barna Zsombor Klara > Attachments: HIVE-14746.01.patch, HIVE-14746.02.patch, > HIVE-14746.03.patch > > > Hive ptest uses some properties files per branch that contain information > about how to execute the tests. > This profile includes the branch name used to fetch the branch code. We > should get rid of this by detecting the branch from the > jenkins-execute-build.sh script, and send the information directly to > ptest-server as command line parameters. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123051#comment-16123051 ] Sankar Hariappan edited comment on HIVE-17289 at 8/11/17 11:49 AM: --- Added 01.patch with below changes. - Used CopyUtils to copy files from ReplCopyTask (IMPORT/REPL LOAD) - Set distcp doAs input as null in case of EXPORT and IMPORT flow. Will use the user config hive.distcp.privileged.doAs in case of REPL LOAD. - Assumed lazy copy is set only for REPL LOAD and hence set doAs user input to hive.distcp.privileged.doAs if lazy copy is true and null if false. This is just to avoid passing this argument from multiple flows and also, the incremental REPL LOAD shares common code with IMPORT. - Enabled distcp for copy within same file systems in case of large number of files or large size files. - Removed redundant code in ReplCopyTask/ReplCopyWork as it re-uses the CopyUtils implementation which does the same. - Refactored ReplCopyTask.execute to properly distinguish code path for _files read and actual data files. - Set the default value of hive.distcp.privileged.doAs to "hive". - Moved CopyUtils from parse.repl.dump.io to parse.repl package as it is common for dump/load. - No tests added as the existing tests itself will cover the changes except distcp flow (due to hive.in.test) which needs to be tested manually. was (Author: sankarh): Added 01.patch with below changes. - Used CopyUtils to copy files from ReplCopyTask (IMPORT/REPL LOAD) - Set distcp doAs input as null in case of EXPORT and IMPORT flow. Will use the user config hive.distcp.privileged.doAs in case of REPL LOAD. - Assumed lazy copy is set only for REPL LOAD and hence set doAs user input to hive.distcp.privileged.doAs if lazy copy is true and null if false. This is just to avoid passing this argument from multiple flows and also, the incremental REPL LOAD shares common code with IMPORT. 
- Removed redundant code in ReplCopyTask/ReplCopyWork as it re-uses the CopyUtils implementation which does the same. - Refactored ReplCopyTask.execute to properly distinguish code path for _files read and actual data files. - Set the default value of hive.distcp.privileged.doAs to "hive". - Moved CopyUtils from parse.repl.dump.io to parse.repl package as it is common for dump/load. - No tests added as the existing tests itself will cover the changes except distcp flow (due to hive.in.test) which needs to be tested manually. > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > Attachments: HIVE-17289.01.patch > > > Currently, EXPORT uses distcp to dump data files to dump directory and IMPORT > uses distcp to copy the larger files/large number of files from dump > directory to table staging directory. But, this copy fails as distcp is > always done with doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs' by default. > Need to remove usage of doAs user when try to distcp from EXPORT/IMPORT flow. > Privileged user based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as "hdfs" super-user is never allowed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
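The doAs selection described in the comment above (privileged user only for REPL LOAD, signalled by lazy copy; null for EXPORT/IMPORT) can be sketched as follows. This is an illustration of the described decision logic, not Hive code; the function name and the dict-based config lookup are assumptions:

```python
def distcp_doas_user(conf, lazy_copy):
    """Pick the user distcp should run as, per the approach in the comment:
    the privileged doAs user applies only to REPL LOAD (signalled here by
    lazy_copy); EXPORT/IMPORT pass None so distcp runs as the session user.
    Function name and config shape are illustrative, not Hive's API."""
    if lazy_copy:
        # REPL LOAD path: use the configured privileged user (default "hive").
        return conf.get("hive.distcp.privileged.doAs", "hive")
    # EXPORT/IMPORT path: no privileged user.
    return None
```

For example, with an empty config a REPL LOAD copy would run as "hive" (the proposed default), while an EXPORT/IMPORT copy would run as the session user.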
[jira] [Commented] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client
[ https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123226#comment-16123226 ] Barna Zsombor Klara commented on HIVE-14747: [~spena] the last comment was generated by an instance of PTest which contained this patch. I ran it to validate my change. Do you have any other comments or suggestions? > Remove JAVA paths from profiles by sending them from ptest-client > - > > Key: HIVE-14747 > URL: https://issues.apache.org/jira/browse/HIVE-14747 > Project: Hive > Issue Type: Sub-task > Components: Hive, Testing Infrastructure >Reporter: Sergio Peña >Assignee: Barna Zsombor Klara > Attachments: HIVE-14747.01.patch, HIVE-14747.02.patch, > HIVE-14747.03.patch, HIVE-14747.04.patch, HIVE-14747.05.patch > > > Hive ptest uses some properties files per branch that contain information > about how to execute the tests. > This profile includes JAVA paths to build and execute the tests. We should > get rid of these by passing such information from Jenkins to the > ptest-server. In case a profile needs a different java version, then we can > create a specific Jenkins job for that. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17267) Make HMS Notification Listeners typesafe
[ https://issues.apache.org/jira/browse/HIVE-17267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara updated HIVE-17267: --- Attachment: HIVE-17267.03.patch > Make HMS Notification Listeners typesafe > > > Key: HIVE-17267 > URL: https://issues.apache.org/jira/browse/HIVE-17267 > Project: Hive > Issue Type: Bug >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > Attachments: HIVE-17267.01.patch, HIVE-17267.02.patch, > HIVE-17267.03.patch > > > Currently in the HMS we support two types of notification listeners, > transactional and non-transactional ones. Transactional listeners will only > be invoked if the jdbc transaction finished successfully while > non-transactional ones are supposed to be resilient and will be invoked in > any case, even for failures. > Having the same type for these two is a source of confusion and opens the > door for misconfigurations. We should try to fix this. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17251) Remove usage of org.apache.pig.ResourceStatistics#setmBytes method in HCatLoader
[ https://issues.apache.org/jira/browse/HIVE-17251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123219#comment-16123219 ] Nandor Kollar commented on HIVE-17251: -- Thank you all for taking care of my ticket! > Remove usage of org.apache.pig.ResourceStatistics#setmBytes method in > HCatLoader > > > Key: HIVE-17251 > URL: https://issues.apache.org/jira/browse/HIVE-17251 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Reporter: Nandor Kollar >Assignee: Adam Szita >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-17251.0.patch > > > org.apache.pig.ResourceStatistics#setmBytes is marked as deprecated, and is > going to be removed from Pig. Is it possible to use the proper > replacement method (ResourceStatistics#setSizeInBytes) instead? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17287) HoS can not deal with skewed data group by
[ https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123204#comment-16123204 ] Rui Li commented on HIVE-17287: --- [~kellyzly], I mean you can check each of the group keys to see how they are skewed. > HoS can not deal with skewed data group by > -- > > Key: HIVE-17287 > URL: https://issues.apache.org/jira/browse/HIVE-17287 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: query67-fail-at-groupby.png, > query67-groupby_shuffle_metric.png > > > In > [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql], > fact table {{store_sales}} joins with small tables {{date_dim}}, > {{item}},{{store}}. After join, groupby the intermediate data. > Here the data of {{store_sales}} on 3TB tpcds is skewed: there are 1824 > partitions. The biggest partition is 25.7G and others are 715M. > {code} > hadoop fs -du -h > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales > > 715.0 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639 > 713.9 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640 > 714.1 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641 > 712.9 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642 > 25.7 G > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__ > {code} > The skewed table {{store_sales}} caused the failed job. Is there any way to > solve the groupby problem of skewed table? I tried to enable > {{hive.groupby.skewindata}} to first divide the data more evenly then start > do group by. But the job still hangs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17287) HoS can not deal with skewed data group by
[ https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123123#comment-16123123 ] liyunzhang_intel commented on HIVE-17287: - [~lirui] : bq. You can run some statistics on the group key to confirm I don't quite understand; do you mean "add select count(i_category), i_category, to see the number of every key"? bq. what will the metrics look like if you enable hive.groupby.skewindata? I enabled {{hive.groupby.skewindata}} before; the job still hangs on the group by stage after sending the data randomly. > HoS can not deal with skewed data group by > -- > > Key: HIVE-17287 > URL: https://issues.apache.org/jira/browse/HIVE-17287 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: query67-fail-at-groupby.png, > query67-groupby_shuffle_metric.png > > > In > [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql], > fact table {{store_sales}} joins with small tables {{date_dim}}, > {{item}},{{store}}. After join, groupby the intermediate data. > Here the data of {{store_sales}} on 3TB tpcds is skewed: there are 1824 > partitions. The biggest partition is 25.7G and others are 715M. > {code} > hadoop fs -du -h > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales > > 715.0 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639 > 713.9 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640 > 714.1 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641 > 712.9 M > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642 > 25.7 G > /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__ > {code} > The skewed table {{store_sales}} caused the failed job. 
Is there any way to > solve the group-by problem of a skewed table? I tried to enable > {{hive.groupby.skewindata}} to first divide the data more evenly and then > do the group by. But the job still hangs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
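For reference, the two-stage aggregation strategy being discussed here (spray rows to reducers at random, partially aggregate on each, then merge the partials by key, which is the general idea behind {{hive.groupby.skewindata}}) can be sketched in Python. This is an illustrative model of the strategy, not Hive's implementation:

```python
import random
from collections import Counter

def two_stage_group_count(keys, num_reducers=4):
    """Sketch of skew-tolerant two-stage GROUP BY ... COUNT(*).
    Stage 1 sends each row to a random reducer and partially aggregates,
    so no single reducer receives all rows of a hot key; stage 2 merges
    the partial counts by key. Illustrative only."""
    # Stage 1: random spray + partial aggregation per reducer.
    partials = [Counter() for _ in range(num_reducers)]
    for key in keys:
        partials[random.randrange(num_reducers)][key] += 1
    # Stage 2: merge partial counts by key to get the final result.
    final = Counter()
    for partial in partials:
        final.update(partial)
    return dict(final)

# A skewed key distribution, loosely mirroring the oversized
# __HIVE_DEFAULT_PARTITION__ partition described above:
rows = ["hot"] * 1000 + ["a", "b", "c"]
counts = two_stage_group_count(rows)
assert counts["hot"] == 1000
```

The random spray balances stage 1 regardless of skew, which is why this helps for algebraic aggregates; as the comments in this thread note, it does not automatically fix every skewed workload.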