[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join
[ https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199970#comment-14199970 ]

Gunther Hagleitner commented on HIVE-8745:
------------------------------------------

[~xuefuz] [~jdere] is right. You can't have it both ways. I don't see how you create an object that compares as equal on the byte level but then magically reconstructs additional information on deserialization. You could add info to the value part of the MR key/value tuple, but that's an unnecessarily complex solution.

As [~jdere] says: this is a regression, and I think we should revert HIVE-7373. The other option would be to pad all values to the column spec and make sure we compute the spec as the max for the join keys. I'm not sure why you were against that in the first place - it seems that's what most DBs do. However, that's complicated and should be tackled in 0.15.0.

Joins on decimal keys return different results whether they are run as reduce join or map join
----------------------------------------------------------------------------------------------

                Key: HIVE-8745
                URL: https://issues.apache.org/jira/browse/HIVE-8745
            Project: Hive
         Issue Type: Bug
   Affects Versions: 0.14.0
           Reporter: Gunther Hagleitner
           Assignee: Jason Dere
           Priority: Critical
            Fix For: 0.14.0
        Attachments: join_test.q

See attached .q file to reproduce. The difference seems to be whether trailing 0s are considered the same value or not.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
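For context (not code from the thread): Hive's decimal type behaves like java.math.BigDecimal here, where values that are numerically equal can still carry different scales. A minimal sketch using plain BigDecimal illustrates why a byte-for-byte key comparison and numeric equality can disagree on trailing zeros, and why padding to a common scale (the "pad all values to the column spec" option above) restores agreement:

```java
import java.math.BigDecimal;

public class TrailingZeroDemo {
    // Numeric equality ignores scale (trailing zeros).
    public static boolean numericallyEqual(BigDecimal a, BigDecimal b) {
        return a.compareTo(b) == 0;
    }

    // equals() is scale-sensitive, like a byte-for-byte serialized-key
    // comparison: 2.0 and 2.00 serialize differently.
    public static boolean scaleSensitiveEqual(BigDecimal a, BigDecimal b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        BigDecimal x = new BigDecimal("2.0");
        BigDecimal y = new BigDecimal("2.00");
        System.out.println(numericallyEqual(x, y));      // true
        System.out.println(scaleSensitiveEqual(x, y));   // false
        // Padding both values to a common scale makes the
        // representations match again:
        System.out.println(x.setScale(2).equals(y));     // true
    }
}
```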
[jira] [Updated] (HIVE-8122) Make use of SearchArgument classes for Parquet SERDE
[ https://issues.apache.org/jira/browse/HIVE-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdinand Xu updated HIVE-8122:
-------------------------------
    Attachment: HIVE-8122.1.patch

Thanks, Szehon, for your update. Updated the patch with the following changes:
1. Fix code-style issues
2. Fix failed cases
3. Fix NPE issues

Make use of SearchArgument classes for Parquet SERDE
----------------------------------------------------

                Key: HIVE-8122
                URL: https://issues.apache.org/jira/browse/HIVE-8122
            Project: Hive
         Issue Type: Sub-task
           Reporter: Brock Noland
           Assignee: Ferdinand Xu
        Attachments: HIVE-8122.1.patch, HIVE-8122.patch

ParquetSerde could be much cleaner if we used SearchArgument and associated classes like ORC does:
https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgument.java
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1416#comment-1416 ]

Hive QA commented on HIVE-8732:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679704/HIVE-8732.patch

{color:red}ERROR:{color} -1 due to 32 failed/errored test(s), 6680 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_split_elimination
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge6
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_10_0
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testCombinationInputFormatWithAcid
org.apache.hadoop.hive.ql.io.orc.TestOrcSplitElimination.testSplitEliminationComplexExpr
org.apache.hadoop.hive.ql.io.orc.TestOrcSplitElimination.testSplitEliminationLargeMaxSplit
org.apache.hadoop.hive.ql.io.orc.TestOrcSplitElimination.testSplitEliminationSmallMaxSplit
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1658/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1658/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1658/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 32 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679704 - PreCommit-HIVE-TRUNK-Build

ORC string statistics are not merged correctly
----------------------------------------------

                Key: HIVE-8732
                URL: https://issues.apache.org/jira/browse/HIVE-8732
            Project: Hive
         Issue Type: Bug
         Components: File Formats
           Reporter: Owen O'Malley
           Assignee: Owen O'Malley
           Priority: Blocker
            Fix For: 0.14.0
        Attachments: HIVE-8732.patch, HIVE-8732.patch

Currently ORC's string statistics do not merge correctly, causing incorrect maximum values.
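Not code from the patch, but the kind of merge logic at stake can be sketched as follows. The hypothetical StringStats class below (a stand-in, not ORC's actual statistics classes) combines per-stripe min/max statistics; a correct merge must take the lexicographic minimum of minimums and maximum of maximums, and the reported bug is of the kind where a merge keeps a stale bound instead:

```java
public class StringStatsMerge {
    static final class StringStats {
        String min;
        String max;
        StringStats(String min, String max) { this.min = min; this.max = max; }

        // Merge another stripe's statistics into this one: the combined
        // range must cover both inputs.
        void merge(StringStats other) {
            if (other.min != null && (min == null || other.min.compareTo(min) < 0)) {
                min = other.min;
            }
            if (other.max != null && (max == null || other.max.compareTo(max) > 0)) {
                max = other.max;
            }
        }
    }

    // Convenience wrapper: merge two [min, max] ranges.
    public static String[] mergedRange(String[] a, String[] b) {
        StringStats s = new StringStats(a[0], a[1]);
        s.merge(new StringStats(b[0], b[1]));
        return new String[] { s.min, s.max };
    }

    public static void main(String[] args) {
        String[] merged = mergedRange(new String[]{"apple", "pear"},
                                      new String[]{"fig", "zebra"});
        System.out.println(merged[0] + " .. " + merged[1]); // apple .. zebra
    }
}
```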
[jira] [Commented] (HIVE-8726) Collect Spark TaskMetrics and build job statistic[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1425#comment-1425 ]

Hive QA commented on HIVE-8726:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679804/HIVE-8726.1-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7123 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetTimestampUtils.testTimezone
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/317/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/317/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-317/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679804 - PreCommit-HIVE-SPARK-Build

Collect Spark TaskMetrics and build job statistic[Spark Branch]
---------------------------------------------------------------

                Key: HIVE-8726
                URL: https://issues.apache.org/jira/browse/HIVE-8726
            Project: Hive
         Issue Type: Sub-task
         Components: Spark
           Reporter: Chengxiang Li
           Assignee: Chengxiang Li
             Labels: Spark-M3
        Attachments: HIVE-8726.1-spark.patch

Implement SparkListener to collect TaskMetrics, and build SparkStatistic.
[jira] [Commented] (HIVE-8073) Go thru all operator plan optimizations and disable those that are not suitable for Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200023#comment-14200023 ]

Rui Li commented on HIVE-8073:
------------------------------

Hi [~xuefuz], I've investigated all the optimizations in {{Optimizer}} and I don't think any of them is unsuitable for Spark. I'm not saying they will all work properly with Spark (we need to enable more tests to catch that), but I think the ideas behind them apply to Spark just as they apply to MR.

Go thru all operator plan optimizations and disable those that are not suitable for Spark [Spark Branch]
--------------------------------------------------------------------------------------------------------

                Key: HIVE-8073
                URL: https://issues.apache.org/jira/browse/HIVE-8073
            Project: Hive
         Issue Type: Task
         Components: Spark
           Reporter: Xuefu Zhang
           Assignee: Rui Li

I have seen some optimization done in the logical plan that's not applicable, such as in HIVE-8054. We should go thru all those optimizations to identify if there are any others.
[jira] [Commented] (HIVE-8753) TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200035#comment-14200035 ]

Hive QA commented on HIVE-8753:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679735/HIVE-8753.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6674 tests executed

*Failed tests:*
{noformat}
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1659/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1659/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1659/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679735 - PreCommit-HIVE-TRUNK-Build

TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk
-------------------------------------------------------------------------

                Key: HIVE-8753
                URL: https://issues.apache.org/jira/browse/HIVE-8753
            Project: Hive
         Issue Type: Test
         Components: Logical Optimizer
   Affects Versions: 0.15.0
           Reporter: Ashutosh Chauhan
           Assignee: Ashutosh Chauhan
        Attachments: HIVE-8753.patch

Because of HIVE-7111, it needs a .q.out update.
[jira] [Commented] (HIVE-7777) Add CSV Serde based on OpenCSV
[ https://issues.apache.org/jira/browse/HIVE-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200053#comment-14200053 ]

Alon Goldshuv commented on HIVE-7777:
-------------------------------------

While the serde works fine, it has an issue which is quite serious IMO - it forces all the column types to string. This means that running a query on data that isn't all string-typed can return wrong query results. In the unit tests I see a single example of a table using all string columns, and in the tests linked here there are many tables with non-string types, but all the queries seem to be simple COUNT(*), which won't catch the problem.

Consider the following example:
{noformat}
CREATE EXTERNAL TABLE test (totalprice DECIMAL(38,10))
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
WITH serdeproperties ("separatorChar" = ",", "quoteChar" = "'", "escapeChar" = "\\")
STORED AS TEXTFILE
LOCATION 'some location'
tblproperties ("skip.header.line.count"="1");
{noformat}

Now consider this SQL:
{noformat}
hive> select min(totalprice) from test;
{noformat}
In this case, given my data, the result should have been 874.89, but the actual result was 11.57 (as it comes first according to the byte ordering of a string type). This is a wrong result.

{noformat}
hive> desc extended test;
OK
o_totalprice	string	from deserializer
...
{noformat}

I apologize if it's a false alarm and I'm misusing the DDL somehow. Otherwise this is a concern, as wrong query results are a bad thing...

Add CSV Serde based on OpenCSV
------------------------------

                Key: HIVE-7777
                URL: https://issues.apache.org/jira/browse/HIVE-7777
            Project: Hive
         Issue Type: Bug
         Components: Serializers/Deserializers
           Reporter: Ferdinand Xu
           Assignee: Ferdinand Xu
             Labels: TODOC14
            Fix For: 0.14.0
        Attachments: HIVE-7777.1.patch, HIVE-7777.2.patch, HIVE-7777.3.patch, HIVE-7777.patch, csv-serde-master.zip

There is no official CSV serde support for Hive, while there is an open source project on github (https://github.com/ogrodnek/csv-serde). CSV is a data format in high-frequency use.
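The lexicographic-ordering pitfall described above is easy to reproduce outside Hive. A small sketch (with made-up price values, not the reporter's data): any string starting with '1' sorts before one starting with '8', so MIN over string-typed columns can return a numerically larger value.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class StringMinPitfall {
    // What you get when a DECIMAL column is silently read as string:
    // MIN uses byte/lexicographic ordering.
    public static String stringMin(List<String> values) {
        return Collections.min(values);
    }

    // What the query author actually meant: numeric ordering.
    public static BigDecimal numericMin(List<String> values) {
        return values.stream().map(BigDecimal::new)
                .min(BigDecimal::compareTo).get();
    }

    public static void main(String[] args) {
        List<String> prices = Arrays.asList("11235.57", "874.89", "9001.00");
        System.out.println(stringMin(prices));   // 11235.57 (lexicographic!)
        System.out.println(numericMin(prices));  // 874.89
    }
}
```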
[jira] [Commented] (HIVE-8611) grant/revoke syntax should support additional objects for authorization plugins
[ https://issues.apache.org/jira/browse/HIVE-8611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200062#comment-14200062 ]

Hive QA commented on HIVE-8611:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679742/HIVE-8611.3.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6678 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_grant_server
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_grant_uri
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1660/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1660/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1660/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679742 - PreCommit-HIVE-TRUNK-Build

grant/revoke syntax should support additional objects for authorization plugins
-------------------------------------------------------------------------------

                Key: HIVE-8611
                URL: https://issues.apache.org/jira/browse/HIVE-8611
            Project: Hive
         Issue Type: Bug
         Components: Authentication, SQL
   Affects Versions: 0.13.0
           Reporter: Prasad Mujumdar
           Assignee: Prasad Mujumdar
            Fix For: 0.14.0
        Attachments: HIVE-8611.1.patch, HIVE-8611.2.patch, HIVE-8611.2.patch, HIVE-8611.3.patch

The authorization framework supports URI and global objects. The SQL syntax, however, doesn't allow granting privileges on these objects. We should allow the compiler to parse these so that they can be handled by authorization plugins.
[jira] [Commented] (HIVE-8711) DB deadlocks not handled in TxnHandler for Postgres, Oracle, and SQLServer
[ https://issues.apache.org/jira/browse/HIVE-8711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200099#comment-14200099 ]

Hive QA commented on HIVE-8711:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679759/HIVE-8711.2.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6674 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1661/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1661/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1661/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679759 - PreCommit-HIVE-TRUNK-Build

DB deadlocks not handled in TxnHandler for Postgres, Oracle, and SQLServer
--------------------------------------------------------------------------

                Key: HIVE-8711
                URL: https://issues.apache.org/jira/browse/HIVE-8711
            Project: Hive
         Issue Type: Bug
         Components: Transactions
   Affects Versions: 0.14.0
           Reporter: Alan Gates
           Assignee: Alan Gates
           Priority: Critical
            Fix For: 0.14.0
        Attachments: HIVE-8711.2.patch, HIVE-8711.patch

TxnHandler.detectDeadlock has code to catch deadlocks in MySQL and Derby, but it does not detect a deadlock for Postgres, Oracle, or SQLServer.
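For reference (a sketch, not code from the patch): a cross-database deadlock check generally has to inspect vendor-specific SQLSTATEs and error codes, since each database signals a deadlock differently - MySQL and Derby use SQLSTATE "40001", Postgres uses "40P01" (deadlock_detected), Oracle raises ORA-00060 (error code 60), and SQL Server uses error 1205.

```java
import java.sql.SQLException;

public class DeadlockDetector {
    /**
     * Best-effort, cross-database test for whether a SQLException
     * represents a deadlock. The codes checked: SQLSTATE "40001"
     * (MySQL, Derby, SQL-standard serialization failure), SQLSTATE
     * "40P01" (Postgres), vendor code 60 (Oracle ORA-00060), and
     * vendor code 1205 (SQL Server deadlock victim).
     */
    public static boolean isDeadlock(SQLException e) {
        String state = e.getSQLState();
        int code = e.getErrorCode();
        if ("40001".equals(state)) return true;
        if ("40P01".equals(state)) return true;
        return code == 60 || code == 1205;
    }

    public static void main(String[] args) {
        SQLException pg = new SQLException("deadlock detected", "40P01", 0);
        SQLException syntax = new SQLException("syntax error", "42000", 0);
        System.out.println(isDeadlock(pg));     // true
        System.out.println(isDeadlock(syntax)); // false
    }
}
```

In practice the retry loop around such a check also needs a bounded retry count and a short backoff, since a deadlock victim can deadlock again immediately.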
[jira] [Commented] (HIVE-8612) Support metadata result filter hooks
[ https://issues.apache.org/jira/browse/HIVE-8612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200142#comment-14200142 ]

Hive QA commented on HIVE-8612:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679764/HIVE-8612.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6678 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1662/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1662/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1662/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679764 - PreCommit-HIVE-TRUNK-Build

Support metadata result filter hooks
------------------------------------

                Key: HIVE-8612
                URL: https://issues.apache.org/jira/browse/HIVE-8612
            Project: Hive
         Issue Type: Bug
         Components: Authorization, Metastore
   Affects Versions: 0.13.1
           Reporter: Prasad Mujumdar
           Assignee: Prasad Mujumdar
            Fix For: 0.14.0, 0.15.0
        Attachments: HIVE-8612.1.patch, HIVE-8612.2.patch, HIVE-8612.3.patch

Support a metadata filter hook for the metastore client. This will be useful for authorization plugins on HiveServer2 to filter metadata results, especially in non-impersonation mode, where the metastore doesn't know the end user's identity.
[jira] [Commented] (HIVE-8548) Integrate with remote Spark context after HIVE-8528 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200155#comment-14200155 ]

Chengxiang Li commented on HIVE-8548:
-------------------------------------

[~xuefuz], if we set spark.master to local, Hive users connect to HiveServer2, which uses a local Spark context to submit jobs with a separate session for each user, so we may still hit the multiple-Spark-context issue. So HiveServer2 could only use a remote Spark context, while the CLI may use either a local or a remote Spark context; we can add a parameter to configure this and set the local Spark context as the default. What do you think about it?

Integrate with remote Spark context after HIVE-8528 [Spark Branch]
------------------------------------------------------------------

                Key: HIVE-8548
                URL: https://issues.apache.org/jira/browse/HIVE-8548
            Project: Hive
         Issue Type: Sub-task
         Components: Spark
           Reporter: Xuefu Zhang
           Assignee: Chengxiang Li

With HIVE-8528, HiveServer2 should use the remote Spark context to submit jobs, monitor progress, etc. This is necessary if Hive runs on a standalone cluster, YARN, or Mesos. If Hive runs with spark.master=local, we should continue using SparkContext in the current way.
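The selection rule proposed above can be sketched as a tiny decision function (hypothetical names and flag, not Hive's actual classes or configuration keys): HiveServer2 always gets a remote context, since per-user sessions would otherwise collide on a single JVM-local SparkContext, while the CLI defaults to local but can opt into remote.

```java
public class SparkContextChooser {
    enum ContextKind { LOCAL, REMOTE }

    /**
     * Hypothetical selection logic for the proposal above:
     * - HiveServer2 must use a remote Spark context (multiple user
     *   sessions cannot share one local SparkContext safely);
     * - the CLI defaults to a local context, switchable via a flag.
     */
    public static ContextKind choose(boolean isHiveServer2, boolean cliPrefersRemote) {
        if (isHiveServer2) {
            return ContextKind.REMOTE;
        }
        return cliPrefersRemote ? ContextKind.REMOTE : ContextKind.LOCAL;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false));  // REMOTE
        System.out.println(choose(false, false)); // LOCAL
        System.out.println(choose(false, true));  // REMOTE
    }
}
```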
[jira] [Assigned] (HIVE-8542) Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li reassigned HIVE-8542:
----------------------------
    Assignee: Rui Li

Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]
----------------------------------------------------------------------------

                Key: HIVE-8542
                URL: https://issues.apache.org/jira/browse/HIVE-8542
            Project: Hive
         Issue Type: Test
         Components: Spark
           Reporter: Chao
           Assignee: Rui Li

Currently, in the Spark branch, results for these two test files are very different from MR's. We need to find out the cause for this, and identify any potential bug in our current implementation.
[jira] [Commented] (HIVE-8542) Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200162#comment-14200162 ]

Rui Li commented on HIVE-8542:
------------------------------

Hi [~csun], let me take this one, as it seems to be a bug in group by.

Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]
----------------------------------------------------------------------------

                Key: HIVE-8542
                URL: https://issues.apache.org/jira/browse/HIVE-8542
            Project: Hive
         Issue Type: Test
         Components: Spark
           Reporter: Chao
           Assignee: Rui Li

Currently, in the Spark branch, results for these two test files are very different from MR's. We need to find out the cause for this, and identify any potential bug in our current implementation.
[jira] [Commented] (HIVE-8636) CBO: split cbo_correctness test
[ https://issues.apache.org/jira/browse/HIVE-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200181#comment-14200181 ]

Hive QA commented on HIVE-8636:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679769/HIVE-8636.02.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6699 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1663/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1663/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1663/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679769 - PreCommit-HIVE-TRUNK-Build

CBO: split cbo_correctness test
-------------------------------

                Key: HIVE-8636
                URL: https://issues.apache.org/jira/browse/HIVE-8636
            Project: Hive
         Issue Type: Bug
           Reporter: Sergey Shelukhin
           Assignee: Sergey Shelukhin
        Attachments: HIVE-8636.01.patch, HIVE-8636.01.patch, HIVE-8636.02.patch, HIVE-8636.patch

The CBO correctness test is extremely annoying - it runs forever; if anything fails, it's hard to debug due to the volume of logs from all the stuff, and it doesn't run further, so if multiple things fail they can only be discovered one by one. Also, SORT_QUERY_RESULTS cannot be used, because some queries presumably use sorting. It should be split into separate tests; the numbers in there now may be good as boundaries.
[jira] [Created] (HIVE-8758) Fix hadoop-1 build [Spark Branch]
Xuefu Zhang created HIVE-8758:
---------------------------------

            Summary: Fix hadoop-1 build [Spark Branch]
                Key: HIVE-8758
                URL: https://issues.apache.org/jira/browse/HIVE-8758
            Project: Hive
         Issue Type: Bug
         Components: Spark
           Reporter: Xuefu Zhang
           Assignee: Jimmy Xiang

This may mean merging patches from trunk and fixing whatever problem is specific to the Spark branch. Here are user-reported problems:

Problem 1:
{code}
Hive Serde ......... FAILURE [ 2.357 s]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-serde: Compilation failure: Compilation failure:
[ERROR] /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[27,24] cannot find symbol
[ERROR] symbol: class Nullable
[ERROR] location: package javax.annotation
[ERROR] /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[67,36] cannot find symbol
[ERROR] symbol: class Nullable
[ERROR] location: class org.apache.hadoop.hive.serde2.AbstractSerDe
{code}
My understanding: it looks like the Nullable annotation was added recently in the branch. Added the below dependency in the project hive-serde:
{code}
<dependency>
  <groupId>com.google.code.findbugs</groupId>
  <artifactId>jsr305</artifactId>
  <version>3.0.0</version>
</dependency>
{code}

Problem 2: After adding the dependency for hive-serde, got the below compilation error:
{code}
[INFO] Hive Query Language FAILURE [01:35 min]
/data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java:[35,39] error: package org.apache.hadoop.mapreduce.util does not exist
{code}
The dependency jar for hadoop-1 (hadoop-core-1.2.1.jar) does not have the package "org.apache.hadoop.mapreduce.util". To circumvent it, added the below dependency, where we had the package (not sure it is right - I badly wanted to make the build successful):
{code}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>0.23.11</version>
</dependency>
{code}

Problem 3: After making the above change, the build again failed in the same project, at the file /data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java. In the snippet below, taken from the file, we can see that "fileStatus.isFile()" is called, which is not available in the "org.apache.hadoop.fs.FileStatus" hadoop-1 API.
{code}
for (FileStatus fileStatus: fs.listStatus(folder)) {
  Path filePath = fileStatus.getPath();
  if (!fileStatus.isFile()) {
    throw new HiveException("Error, not a file: " + filePath);
{code}
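The usual workaround for Problem 3 is to negate the directory check, since hadoop-1's FileStatus exposes isDir() while isFile() only appeared in the hadoop-2 API. A sketch, illustrated with a minimal stand-in interface rather than Hadoop's real FileStatus so it stands alone:

```java
public class FileStatusShim {
    // Minimal stand-in for the part of org.apache.hadoop.fs.FileStatus
    // that exists on hadoop-1: there is isDir(), but no isFile().
    interface Hadoop1FileStatus {
        boolean isDir();
    }

    // Negating the directory check compiles against hadoop-1 and behaves
    // the same as isFile() for plain files and directories on hadoop-2.
    public static boolean isFile(Hadoop1FileStatus status) {
        return !status.isDir();
    }

    public static void main(String[] args) {
        Hadoop1FileStatus plainFile = () -> false;
        Hadoop1FileStatus directory = () -> true;
        System.out.println(isFile(plainFile)); // true
        System.out.println(isFile(directory)); // false
    }
}
```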
[jira] [Updated] (HIVE-8758) Fix hadoop-1 build [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-8758:
------------------------------
    Issue Type: Sub-task  (was: Bug)
        Parent: HIVE-7292

Fix hadoop-1 build [Spark Branch]
---------------------------------

                Key: HIVE-8758
                URL: https://issues.apache.org/jira/browse/HIVE-8758
            Project: Hive
         Issue Type: Sub-task
         Components: Spark
           Reporter: Xuefu Zhang
           Assignee: Jimmy Xiang

This may mean merging patches from trunk and fixing whatever problem is specific to the Spark branch. Here are user-reported problems:

Problem 1:
{code}
Hive Serde ......... FAILURE [ 2.357 s]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-serde: Compilation failure: Compilation failure:
[ERROR] /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[27,24] cannot find symbol
[ERROR] symbol: class Nullable
[ERROR] location: package javax.annotation
[ERROR] /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[67,36] cannot find symbol
[ERROR] symbol: class Nullable
[ERROR] location: class org.apache.hadoop.hive.serde2.AbstractSerDe
{code}
My understanding: it looks like the Nullable annotation was added recently in the branch. Added the below dependency in the project hive-serde:
{code}
<dependency>
  <groupId>com.google.code.findbugs</groupId>
  <artifactId>jsr305</artifactId>
  <version>3.0.0</version>
</dependency>
{code}

Problem 2: After adding the dependency for hive-serde, got the below compilation error:
{code}
[INFO] Hive Query Language FAILURE [01:35 min]
/data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java:[35,39] error: package org.apache.hadoop.mapreduce.util does not exist
{code}
The dependency jar for hadoop-1 (hadoop-core-1.2.1.jar) does not have the package "org.apache.hadoop.mapreduce.util". To circumvent it, added the below dependency, where we had the package (not sure it is right - I badly wanted to make the build successful):
{code}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>0.23.11</version>
</dependency>
{code}

Problem 3: After making the above change, the build again failed in the same project, at the file /data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java. In the snippet below, taken from the file, we can see that "fileStatus.isFile()" is called, which is not available in the "org.apache.hadoop.fs.FileStatus" hadoop-1 API.
{code}
for (FileStatus fileStatus: fs.listStatus(folder)) {
  Path filePath = fileStatus.getPath();
  if (!fileStatus.isFile()) {
    throw new HiveException("Error, not a file: " + filePath);
{code}
[jira] [Commented] (HIVE-8758) Fix hadoop-1 build [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200185#comment-14200185 ] Xuefu Zhang commented on HIVE-8758: --- [~jxiang], since problem #3 seemed related to your recent change, I assigned the JIRA to you for investigation/fix. Thanks. Fix hadoop-1 build [Spark Branch] - Key: HIVE-8758 URL: https://issues.apache.org/jira/browse/HIVE-8758 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang This may mean merging patches from trunk and fixing whatever problem specific to Spark branch. Here are user reported problems: Problem 1: {code} Hive Serde . FAILURE [ 2.357 s] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-serde: Compilation failure: Compilation failure: [ERROR] /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[27,24] cannot find symbol [ERROR] symbol: class Nullable [ERROR] location: package javax.annotation [ERROR] /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[67,36] cannot find symbol [ERROR] symbol: class Nullable [ERROR] location: class org.apache.hadoop.hive.serde2.AbstractSerDe {code} My understanding: Looks the Nullable annotation was recently added in the recent branch. 
Added the below dependency in the project hive-serde:
{code}
<dependency>
  <groupId>com.google.code.findbugs</groupId>
  <artifactId>jsr305</artifactId>
  <version>3.0.0</version>
</dependency>
{code}
Problem 2: After adding the dependency for hive-serde, got the below compilation error:
{code}
[INFO] Hive Query Language FAILURE [01:35 min]
/data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java:[35,39] error: package org.apache.hadoop.mapreduce.util does not exist
{code}
The hadoop-1 dependency jar (hadoop-core-1.2.1.jar) does not contain the package "org.apache.hadoop.mapreduce.util". To circumvent this, added the below dependency, which does contain that package (not sure it is right; I just wanted to make the build succeed):
{code}
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>0.23.11</version>
  </dependency>
</dependencies>
{code}
Problem 3: After making the above change, the build failed again in the same project, in the file /data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java. In the snippet below, taken from that file, "fileStatus.isFile()" is called, which is not available in the hadoop-1 "org.apache.hadoop.fs.FileStatus" API.
{code}
for (FileStatus fileStatus: fs.listStatus(folder)) {
  Path filePath = fileStatus.getPath();
  if (!fileStatus.isFile()) {
    throw new HiveException("Error, not a file: " + filePath);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
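One way hadoop-1/hadoop-2 API gaps like the missing {{FileStatus.isFile()}} are commonly bridged is a reflection-based shim that tries the newer method and falls back to negating the hadoop-1 {{isDir()}}. The sketch below is purely illustrative (the class and its name are hypothetical, not part of any Hive patch), and it uses only reflection so it is self-contained:

```java
import java.lang.reflect.Method;

// Hypothetical compatibility shim (NOT from the patch): prefer the hadoop-2
// FileStatus.isFile() method; on hadoop-1, where isFile() does not exist,
// fall back to negating isDir().
public class FileStatusCompat {
    public static boolean isFile(Object fileStatus) {
        try {
            // hadoop-2 and later expose isFile() directly
            Method m = fileStatus.getClass().getMethod("isFile");
            return (Boolean) m.invoke(fileStatus);
        } catch (NoSuchMethodException e) {
            try {
                // hadoop-1: FileStatus only exposes isDir()
                Method m = fileStatus.getClass().getMethod("isDir");
                return !(Boolean) m.invoke(fileStatus);
            } catch (Exception inner) {
                throw new IllegalStateException(inner);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same effect is often achieved in Hive through the shims layer rather than reflection; this snippet only shows the shape of the compatibility problem.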
[jira] [Commented] (HIVE-8073) Go thru all operator plan optimizations and disable those that are not suitable for Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200191#comment-14200191 ] Xuefu Zhang commented on HIVE-8073: --- Hi [~ruili], thanks for the investigation. I think we can close this task now. Go thru all operator plan optimizations and disable those that are not suitable for Spark [Spark Branch] Key: HIVE-8073 URL: https://issues.apache.org/jira/browse/HIVE-8073 Project: Hive Issue Type: Task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Fix For: spark-branch I have seen some optimizations done in the logical plan that are not applicable, such as in HIVE-8054. We should go thru all those optimizations to identify any such cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8073) Go thru all operator plan optimizations and disable those that are not suitable for Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8073. --- Resolution: Done Fix Version/s: spark-branch Go thru all operator plan optimizations and disable those that are not suitable for Spark [Spark Branch] Key: HIVE-8073 URL: https://issues.apache.org/jira/browse/HIVE-8073 Project: Hive Issue Type: Task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Fix For: spark-branch I have seen some optimizations done in the logical plan that are not applicable, such as in HIVE-8054. We should go thru all those optimizations to identify any such cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5469) support nullif
[ https://issues.apache.org/jira/browse/HIVE-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200211#comment-14200211 ] Daniel Dinnyes commented on HIVE-5469: -- Thanks for the workaround code snippet. support nullif -- Key: HIVE-5469 URL: https://issues.apache.org/jira/browse/HIVE-5469 Project: Hive Issue Type: Improvement Affects Versions: 0.11.0 Reporter: N Campbell Assignee: Navis Priority: Minor Attachments: HIVE-5469.1.patch.txt, HIVE-5469.2.patch.txt, HIVE-5469.3.patch.txt Have to express case expression to work around lack of NULLIF select nullif(cint, 1) from tint select cint, case when cint = 1 then null else cint end from tint -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8548) Integrate with remote Spark context after HIVE-8528 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200248#comment-14200248 ] Xuefu Zhang commented on HIVE-8548: --- Hi [~chengxiang li], I think nobody is going to deploy HS2 in production with local mode, and HS2 embedded mode (embedded in Beeline) should behave like Hive CLI. Thus, I think it might be better to keep them consistent. Based on this, I think local should be the default whether it's Hive CLI or HS2, and they actually share the same code path. In addition, local should refer to local spark context in both cases. As to the concurrency problem, we just need some proper documentation. Remote spark context should be used when {{spark.master != local}}. I think his approach makes the implementation simpler with seemingly better usability. We can revisit this at a later phase. Integrate with remote Spark context after HIVE-8528 [Spark Branch] -- Key: HIVE-8548 URL: https://issues.apache.org/jira/browse/HIVE-8548 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li With HIVE-8528, HiveServer2 should use remote Spark context to submit jobs and monitor progress, etc. This is necessary if Hive runs on a standalone cluster, Yarn, or Mesos. If Hive runs with spark.master=local, we should continue using SparkContext in the current way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
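The routing rule proposed in the comment (local SparkContext when {{spark.master}} is local, remote Spark context otherwise) can be sketched as below. This is an illustrative stand-in, not Hive's actual API; the class and enum names are invented for the example, and treating {{local[N]}} variants as local is an assumption borrowed from Spark's master-URL conventions:

```java
// Illustrative sketch of the dispatch policy discussed above (names are
// hypothetical, not Hive's actual classes).
public class SparkContextChooser {
    public enum Kind { LOCAL, REMOTE }

    public static Kind choose(String sparkMaster) {
        // "local" and variants such as "local[4]" run in-process; anything
        // else (standalone, YARN, Mesos masters) goes through the remote
        // Spark context.
        if (sparkMaster != null && sparkMaster.startsWith("local")) {
            return Kind.LOCAL;
        }
        return Kind.REMOTE;
    }
}
```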
[jira] [Comment Edited] (HIVE-8548) Integrate with remote Spark context after HIVE-8528 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200248#comment-14200248 ] Xuefu Zhang edited comment on HIVE-8548 at 11/6/14 2:56 PM: Hi [~chengxiang li], I think nobody is going to deploy HS2 in production with local mode and HS2 embedded mode (embedded in Beeline) should behave like Hive CLI. Thus, I think it might be better to keep them consistent. Based on this, I think local should be the default whether it's Hive CLI or HS2, and they actually share the same code path (w.r.t. spark integration). In addition, local should refer to local spark context in both cases. As to the concurrentcy problem, we just need some proper documentation. Remote spark context should be used when {{spark.master != local}}. I think his approach makes the implemention simpler with seemingly better usability. We can revist this at a later phase. was (Author: xuefuz): Hi [~chengxiang li], I think nobody is going to deploy HS2 in production with local mode and HS2 embedded mode (embedded in Beeline) should behave like Hive CLI. Thus, I think it might be better to keep them consistent. Based on this, I think local should be the default whether it's Hive CLI or HS2, and they actually share the same code path. In addition, local should refer to local spark context in both cases. As to the concurrentcy problem, we just need some proper documentation. Remote spark context should be used when {{spark.master != local}}. I think his approach makes the implemention simpler with seemingly better usability. We can revist this at a later phase. Integrate with remote Spark context after HIVE-8528 [Spark Branch] -- Key: HIVE-8548 URL: https://issues.apache.org/jira/browse/HIVE-8548 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li With HIVE-8528, HiverSever2 should use remote Spark context to submit job and monitor progress, etc. 
This is necessary if Hive runs on a standalone cluster, Yarn, or Mesos. If Hive runs with spark.master=local, we should continue using SparkContext in the current way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8735) statistics update can fail due to long paths
[ https://issues.apache.org/jira/browse/HIVE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200255#comment-14200255 ] Hive QA commented on HIVE-8735: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12679773/HIVE-8735.02.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6674 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1664/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1664/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1664/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12679773 - PreCommit-HIVE-TRUNK-Build statistics update can fail due to long paths Key: HIVE-8735 URL: https://issues.apache.org/jira/browse/HIVE-8735 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8735.01.patch, HIVE-8735.02.patch, HIVE-8735.patch {noformat} 2014-11-04 01:34:38,610 ERROR jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(198)) - Error during publishing statistics. 
java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
	at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
	at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:147)
	at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:144)
	at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2910)
	at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:160)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1153)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:992)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:205)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
Caused by:
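The failure above is Derby refusing to truncate a value that exceeds the stats table's VARCHAR(255) key column. A hedged sketch of the constraint (the column width is taken from the error message; the class and method names below are invented for illustration, not Hive's actual stats code):

```java
// Illustrative only: a stats publisher could check the key length up front
// instead of letting the database reject the INSERT. The 255 limit mirrors
// the VARCHAR(255) column in the Derby-backed stats schema from the log.
public class StatsKeyCheck {
    static final int ID_COLUMN_MAX = 255;

    public static boolean fits(String statsKey) {
        return statsKey != null && statsKey.length() <= ID_COLUMN_MAX;
    }
}
```

Long paths such as table location + .hive-staging + temporary subdirectories easily exceed this limit, which is exactly what the stack trace shows.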
[jira] [Updated] (HIVE-8748) jdbc uber jar is missing commons-logging
[ https://issues.apache.org/jira/browse/HIVE-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8748: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk. [~hagleitn] ok for 0.14 ? jdbc uber jar is missing commons-logging Key: HIVE-8748 URL: https://issues.apache.org/jira/browse/HIVE-8748 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.15.0 Attachments: HIVE-8748.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5077) Provide an option to run local task in process
[ https://issues.apache.org/jira/browse/HIVE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5077: --- Resolution: Duplicate Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Fixed via HIVE-7271, after which a local task can be run in-process via {{hive.exec.submit.local.task.via.child}} if desired. Provide an option to run local task in process -- Key: HIVE-5077 URL: https://issues.apache.org/jira/browse/HIVE-5077 Project: Hive Issue Type: Bug Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.14.0 Attachments: HIVE-5077.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-3109) metastore state not cleared
[ https://issues.apache.org/jira/browse/HIVE-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-3109. Resolution: Cannot Reproduce Doesn't repro anymore. metastore state not cleared --- Key: HIVE-3109 URL: https://issues.apache.org/jira/browse/HIVE-3109 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Ashutosh Chauhan When some of the tests are run in a certain order, random bugs are encountered. ant test -Dtestcase=TestCliDriver -Dqfile=part_inherit_tbl_props.q,stats1.q leads to an error in stats1.q We ran into this error as part of parallel testing (HIVE-3085). As part of HIVE-3085, this will be fixed temporarily by clearing hive.metastore.partition.inherit.table.properties at the end of the test. But, in general, any property set in one .q file should not affect anything in other tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5054) Remove unused property submitviachild
[ https://issues.apache.org/jira/browse/HIVE-5054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5054: --- Resolution: Won't Fix Status: Resolved (was: Patch Available) HIVE-7271 relies on this to speed up unit tests Remove unused property submitviachild - Key: HIVE-5054 URL: https://issues.apache.org/jira/browse/HIVE-5054 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-5054.patch, HIVE-5054.patch This property only exists in HiveConf and is always set to false. Let's get rid of dead code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-1033) change default value of hive.exec.parallel to true
[ https://issues.apache.org/jira/browse/HIVE-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-1033: --- Assignee: (was: Ashutosh Chauhan) change default value of hive.exec.parallel to true -- Key: HIVE-1033 URL: https://issues.apache.org/jira/browse/HIVE-1033 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Attachments: HIVE-1033.2.patch, HIVE-1033.3.patch, hive.1033.1.patch There is no harm in changing it to true. Inside facebook, we have been testing it and it seems to be stable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6297) [Refactor] Move new Auth Interface to common/
[ https://issues.apache.org/jira/browse/HIVE-6297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6297: --- Resolution: Won't Fix Status: Resolved (was: Patch Available) Seems an impossible task now [Refactor] Move new Auth Interface to common/ -- Key: HIVE-6297 URL: https://issues.apache.org/jira/browse/HIVE-6297 Project: Hive Issue Type: Task Components: Security Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6297.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200337#comment-14200337 ] Brock Noland commented on HIVE-8744: works for me! hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large - Key: HIVE-8744 URL: https://issues.apache.org/jira/browse/HIVE-8744 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8744.1.patch This test is related to the bug HIVE-8065 where I am trying to support HDFS encryption. One of the enhancements to support it is to create a .hive-staging directory on the same table directory location where the query is executed. Now, when running the hbase_stats3.q test from a temporary directory that has a large path, then the new path, a combination of table location + .hive-staging + random temporary subdirectories, is too large to fit into the statistics table, so the path is truncated. This causes the following error: {noformat} 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error during publishing statistics. java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255. 
	at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
	at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
	at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
	at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
	at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
	at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
	... 30 more
Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
	at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown Source)
[jira] [Commented] (HIVE-7777) Add CSV Serde based on OpenCSV
[ https://issues.apache.org/jira/browse/HIVE-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200341#comment-14200341 ] Brock Noland commented on HIVE-7777: After some research, I think that was a limitation of the original Serde: https://github.com/ogrodnek/csv-serde However, we should be able to resolve this. Can you open a JIRA for adding non-string types to the OpenCSV Serde? Add CSV Serde based on OpenCSV -- Key: HIVE-7777 URL: https://issues.apache.org/jira/browse/HIVE-7777 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Ferdinand Xu Assignee: Ferdinand Xu Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7777.1.patch, HIVE-7777.2.patch, HIVE-7777.3.patch, HIVE-7777.patch, csv-serde-master.zip There is no official CSV SerDe support in Hive, though there is an open source project on GitHub (https://github.com/ogrodnek/csv-serde). CSV is a very frequently used data format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8661) JDBC MinimizeJAR should be configurable in pom.xml
[ https://issues.apache.org/jira/browse/HIVE-8661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8661: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Gopal! JDBC MinimizeJAR should be configurable in pom.xml -- Key: HIVE-8661 URL: https://issues.apache.org/jira/browse/HIVE-8661 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Fix For: 0.15.0 Attachments: HIVE-8661.1.patch, HIVE-8661.2.patch A large amount of dev time is wasted waiting for JDBC to minimize JARs from 33Mb to 16Mb during developer cycles. This should only kick in during -Pdist, allowing it to be disabled during dev cycles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
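One common way to make the shade plugin's {{minimizeJar}} flag profile-dependent is to drive it from a Maven property that a {{dist}} profile flips on. The fragment below is only a sketch of that pattern, not the committed HIVE-8661 patch; the property name is invented for illustration:

```xml
<!-- Sketch only: minimizeJar is driven by a property that defaults to
     false, so dev builds skip minimization; -Pdist turns it back on. -->
<properties>
  <shade.minimize>false</shade.minimize>
</properties>

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <configuration>
        <minimizeJar>${shade.minimize}</minimizeJar>
      </configuration>
    </plugin>
  </plugins>
</build>

<profiles>
  <profile>
    <id>dist</id>
    <properties>
      <shade.minimize>true</shade.minimize>
    </properties>
  </profile>
</profiles>
```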
[jira] [Updated] (HIVE-8661) JDBC MinimizeJAR should be configurable in pom.xml
[ https://issues.apache.org/jira/browse/HIVE-8661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8661: --- Issue Type: Improvement (was: Bug) JDBC MinimizeJAR should be configurable in pom.xml -- Key: HIVE-8661 URL: https://issues.apache.org/jira/browse/HIVE-8661 Project: Hive Issue Type: Improvement Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Fix For: 0.15.0 Attachments: HIVE-8661.1.patch, HIVE-8661.2.patch A large amount of dev time is wasted waiting for JDBC to minimize JARs from 33Mb to 16Mb during developer cycles. This should only kick in during -Pdist, allowing it to be disabled during dev cycles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8661) JDBC MinimizeJAR should be configurable in pom.xml
[ https://issues.apache.org/jira/browse/HIVE-8661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8661: --- Affects Version/s: 0.14.0 JDBC MinimizeJAR should be configurable in pom.xml -- Key: HIVE-8661 URL: https://issues.apache.org/jira/browse/HIVE-8661 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Fix For: 0.15.0 Attachments: HIVE-8661.1.patch, HIVE-8661.2.patch A large amount of dev time is wasted waiting for JDBC to minimize JARs from 33Mb to 16Mb during developer cycles. This should only kick in during -Pdist, allowing it to be disabled during dev cycles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8754) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster
[ https://issues.apache.org/jira/browse/HIVE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200353#comment-14200353 ] Eugene Koifman commented on HIVE-8754: -- This is a WebHCat-only change, specifically around job submission. There are no unit tests that cover this. Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster --- Key: HIVE-8754 URL: https://issues.apache.org/jira/browse/HIVE-8754 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Priority: Critical Fix For: 0.14.0, 0.15.0 Attachments: HIVE-8754.2.patch, HIVE-8754.patch HIVE-8588 added support for this by copying jdbc jars to lib/ of the localized/exploded Sqoop tar. Unfortunately, in a secure cluster, Dist Cache intentionally sets permissions on exploded tars such that they are not writable. This needs to be fixed; otherwise users would have to modify their Sqoop tar to include the relevant jdbc jars, which is burdensome if different DBs are used and may create headaches around licensing issues. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8757) YARN dep in scheduler shim should be optional
[ https://issues.apache.org/jira/browse/HIVE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200360#comment-14200360 ] Hive QA commented on HIVE-8757: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12679778/HIVE-8757.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6674 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1665/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1665/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1665/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12679778 - PreCommit-HIVE-TRUNK-Build YARN dep in scheduler shim should be optional - Key: HIVE-8757 URL: https://issues.apache.org/jira/browse/HIVE-8757 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8757.patch The {{hadoop-yarn-server-resourcemanager}} dep in the scheduler shim should be optional so that yarn doesn't pollute dependent classpaths. Users who want to use this feature must provide the yarn classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
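The fix described above, marking the YARN dependency optional so it stays off dependent classpaths, would look roughly like this in the scheduler shim's pom.xml (a sketch using the coordinates named in the issue; Maven's {{<optional>true</optional>}} prevents the dependency from being inherited transitively):

```xml
<!-- Sketch of the change discussed in HIVE-8757: consumers who want the
     fair-scheduler feature must supply the YARN classes themselves. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
  <optional>true</optional>
</dependency>
```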
[jira] [Updated] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8744: -- Status: Open (was: Patch Available) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large - Key: HIVE-8744 URL: https://issues.apache.org/jira/browse/HIVE-8744 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8744.1.patch This test is related to the bug HIVE-8065 where I am trying to support HDFS encryption. One of the enhancements to support it is to create a .hive-staging directory on the same table directory location where the query is executed. Now, when running the hbase_stats3.q test from a temporary directory that has a large path, then the new path, a combination of table location + .hive-staging + random temporary subdirectories, is too large to fit into the statistics table, so the path is truncated. This causes the following error: {noformat} 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error during publishing statistics. java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255. 
	at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
	at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
	at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
	at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
	at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
	at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
	... 30 more
Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
	at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown Source)
	at
[jira] [Updated] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8744: -- Attachment: HIVE-8744.2.patch hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large - Key: HIVE-8744 URL: https://issues.apache.org/jira/browse/HIVE-8744 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8744.1.patch, HIVE-8744.2.patch This test is related to the bug HIVE-8065 where I am trying to support HDFS encryption. One of the enhancements to support it is to create a .hive-staging directory on the same table directory location where the query is executed. Now, when running the hbase_stats3.q test from a temporary directory that has a large path, then the new path, a combination of table location + .hive-staging + random temporary subdirectories, is too large to fit into the statistics table, so the path is truncated. This causes the following error: {noformat} 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error during publishing statistics. java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255. 
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
... 30 more
Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown Source)
at
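For illustration, a short self-contained sketch of how a staging path assembled from a long table location overflows a VARCHAR(255) key column. All directory names below are made up for the example; the real ptest path is truncated in the log above.

```java
// Illustration of the failure mode above: a staging path assembled from a
// long table location plus ".hive-staging" plus temporary subdirectories
// overflows a VARCHAR(255) statistics key column. The directory names
// below are hypothetical.
public class StagingPathLength {

    static String buildStagingPath() {
        StringBuilder sb = new StringBuilder(
            "pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.cloudera.com");
        // Deeply nested test workspace directories (made up for the example):
        for (String dir : new String[] {"ptest2", "working", "dir",
                "apache-svn-trunk-source", "itests", "qtest", "target",
                "localscratchdir", "warehouse", "stats_part", "ds=2014-11-04"}) {
            sb.append('/').append(dir);
        }
        // Hive appends the staging dir plus random temporary subdirectories:
        sb.append("/.hive-staging_hive_2014-11-04_08-57-36_680_1234567890123456789-1");
        sb.append("/-ext-10000/tmpstats-0");
        return sb.toString();
    }

    public static void main(String[] args) {
        String path = buildStagingPath();
        System.out.println("length = " + path.length());
        if (path.length() > 255) {
            System.out.println("a VARCHAR(255) column would have to truncate this key");
        }
    }
}
```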
[jira] [Updated] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8744: -- Status: Patch Available (was: Open) Submitted new patch that changes the stats table name to v3 hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large - Key: HIVE-8744 URL: https://issues.apache.org/jira/browse/HIVE-8744 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8744.1.patch, HIVE-8744.2.patch This test is related to the bug HIVE-8065 where I am trying to support HDFS encryption. One of the enhancements to support it is to create a .hive-staging directory on the same table directory location where the query is executed. Now, when running the hbase_stats3.q test from a temporary directory that has a large path, then the new path, a combination of table location + .hive-staging + random temporary subdirectories, is too large to fit into the statistics table, so the path is truncated. This causes the following error: {noformat} 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error during publishing statistics. java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255. 
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
... 30 more
Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at
[jira] [Commented] (HIVE-8122) Make use of SearchArgument classes for Parquet SERDE
[ https://issues.apache.org/jira/browse/HIVE-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200456#comment-14200456 ] Hive QA commented on HIVE-8122: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12679808/HIVE-8122.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6674 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1666/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1666/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1666/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12679808 - PreCommit-HIVE-TRUNK-Build Make use of SearchArgument classes for Parquet SERDE Key: HIVE-8122 URL: https://issues.apache.org/jira/browse/HIVE-8122 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Attachments: HIVE-8122.1.patch, HIVE-8122.patch ParquetSerde could be much cleaner if we used SearchArgument and associated classes like ORC does: https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgument.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive
[ https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200462#comment-14200462 ] Sergio Peña commented on HIVE-8065: --- Hi [~Ferd] Thanks for trying to help. There is already basic work done for this issue in a local branch for hive 0.13. I will apply the patch for trunk and commit the changes to the HIVE-8065 branch. What we don't have yet are the unit query tests. Would you like to take that task? Support HDFS encryption functionality on Hive - Key: HIVE-8065 URL: https://issues.apache.org/jira/browse/HIVE-8065 Project: Hive Issue Type: Improvement Affects Versions: 0.13.1 Reporter: Sergio Peña Assignee: Sergio Peña The new encryption support on HDFS makes Hive incompatible and unusable when this feature is used. HDFS encryption is designed so that a user can configure different encryption zones (or directories) for multi-tenant environments. An encryption zone has an exclusive encryption key, such as AES-128 or AES-256. Because of security compliance, HDFS does not allow moving/renaming files between encryption zones. Renames are allowed only inside the same encryption zone. A copy is allowed between encryption zones. See HDFS-6134 for more details about the HDFS encryption design. Hive currently uses a scratch directory (like /tmp/$user/$random). This scratch directory is used for the output of intermediate data (between MR jobs) and for the final output of the hive query, which is later moved to the table directory location. If Hive tables are in different encryption zones than the scratch directory, then Hive won't be able to rename those files/directories, which makes Hive unusable. To handle this problem, we can change the scratch directory of the query/statement to be inside the same encryption zone as the table directory location. This way, the renaming process will be successful. Also, for statements that move files between encryption zones (i.e. LOAD DATA), a copy may be executed instead of a rename. This will cause an overhead when copying large data files, but it won't break the encryption on Hive. Another security aspect to consider is joins. If Hive joins tables with different encryption key strengths, then the results of the select might break the security compliance of the tables. Let's say two tables with 128-bit and 256-bit encryption are joined; the temporary results might then be stored in the 128-bit encryption zone, which conflicts with the compliance of the table encrypted with 256 bits. To fix this, Hive should be able to select the most secured/encrypted scratch directory, so that the intermediate data is stored temporarily without compliance issues. For instance: {noformat} SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id; {noformat} - This should use a scratch directory (or staging directory) inside the table-aes256 table location. {noformat} INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1; {noformat} - This should use a scratch directory inside the table-aes1 location. {noformat} FROM table-unencrypted INSERT OVERWRITE TABLE table-aes128 SELECT id, name INSERT OVERWRITE TABLE table-aes256 SELECT id, name {noformat} - This should use a scratch directory on each of the tables' locations. - The first SELECT will have its scratch directory on the table-aes128 directory. - The second SELECT will have its scratch directory on the table-aes256 directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
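The "pick the most secured zone" rule described above could be sketched as follows. The method name and the key-bits bookkeeping are hypothetical, not the HIVE-8065 implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the rule described above: among the tables referenced by a
// query, place the staging directory under the table whose encryption
// zone uses the strongest key. Names and the key-bit bookkeeping are
// hypothetical; HIVE-8065 may implement this differently.
public class ScratchDirChooser {

    /** keyBits == 0 means the table is unencrypted. */
    static String chooseStagingTable(Map<String, Integer> tableKeyBits) {
        String best = null;
        int bestBits = -1;
        for (Map.Entry<String, Integer> e : tableKeyBits.entrySet()) {
            if (e.getValue() > bestBits) {
                best = e.getKey();
                bestBits = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> tables = new LinkedHashMap<>();
        tables.put("table-aes128", 128);
        tables.put("table-aes256", 256);
        // SELECT ... FROM table-aes128 JOIN table-aes256: stage under aes256.
        System.out.println(chooseStagingTable(tables)); // prints table-aes256
    }
}
```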
[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join
[ https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200464#comment-14200464 ] Xuefu Zhang commented on HIVE-8745: --- {quote} The other option would be to pad all values to the column spec and make sure we compute the spec as the max for the join keys. I'm not sure why you were against that in the first place {quote} I'm not sure what this refers to. Nevertheless, I think the serde should be able to trim the zeros and pad them back as long as it has the right metadata. It seems it does have the metadata for each column. Joins on decimal keys return different results whether they are run as reduce join or map join -- Key: HIVE-8745 URL: https://issues.apache.org/jira/browse/HIVE-8745 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Jason Dere Priority: Critical Fix For: 0.14.0 Attachments: join_test.q See attached .q file to reproduce. The difference seems to be whether trailing 0s are considered the same value or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
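The trailing-zero ambiguity at the center of this discussion can be reproduced with plain java.math.BigDecimal, with no Hive classes involved (Hive's HiveDecimal wraps BigDecimal, but this standalone example does not claim to match its serialization):

```java
import java.math.BigDecimal;

// Demonstrates the trailing-zero ambiguity discussed above: two decimals
// that compare as numerically equal but carry different scales, so a
// byte-level (unscaledValue/scale) encoding would not match.
public class DecimalTrailingZeros {
    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.0");   // scale 1
        BigDecimal b = new BigDecimal("1.00");  // scale 2
        System.out.println(a.compareTo(b));     // 0: numerically equal
        System.out.println(a.equals(b));        // false: scales differ
        // unscaledValue/scale is what a byte-level serialization would see:
        System.out.println(a.unscaledValue() + " / " + a.scale()); // 10 / 1
        System.out.println(b.unscaledValue() + " / " + b.scale()); // 100 / 2
    }
}
```

This is why a reduce join (which compares serialized key bytes) and a map join (which can compare values semantically) may disagree unless the representation is normalized or padded consistently.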
[jira] [Commented] (HIVE-8758) Fix hadoop-1 build [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200479#comment-14200479 ] Jimmy Xiang commented on HIVE-8758: --- Sure, will take a look soon. Fix hadoop-1 build [Spark Branch] - Key: HIVE-8758 URL: https://issues.apache.org/jira/browse/HIVE-8758 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang This may mean merging patches from trunk and fixing whatever problems are specific to the Spark branch. Here are user-reported problems: Problem 1: {code} Hive Serde . FAILURE [ 2.357 s]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-serde: Compilation failure: Compilation failure:
[ERROR] /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[27,24] cannot find symbol
[ERROR] symbol: class Nullable
[ERROR] location: package javax.annotation
[ERROR] /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[67,36] cannot find symbol
[ERROR] symbol: class Nullable
[ERROR] location: class org.apache.hadoop.hive.serde2.AbstractSerDe {code} My understanding: it looks like the Nullable annotation was added to the branch recently.
Added the below dependency in the project hive-serde {code}
<dependency>
  <groupId>com.google.code.findbugs</groupId>
  <artifactId>jsr305</artifactId>
  <version>3.0.0</version>
</dependency>
{code} Problem 2: After adding the dependency for hive-serde, got the below compilation error {code} [INFO] Hive Query Language FAILURE [01:35 min]
/data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java:[35,39] error: package org.apache.hadoop.mapreduce.util does not exist {code} The dependency jar for hadoop-1 (hadoop-core-1.2.1.jar) does not have the package "org.apache.hadoop.mapreduce.util". To circumvent it, I added the below dependency, which does have the package (not sure it is right – I badly wanted to make the build successful :() {code}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>0.23.11</version>
</dependency>
{code} Problem 3: After making the above change, the build again failed in the same project, in the file /data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java. In the snippet below taken from the file, we can see that "fileStatus.isFile()" is called, which is not available in the "org.apache.hadoop.fs.FileStatus" hadoop-1 API. {code} for (FileStatus fileStatus: fs.listStatus(folder)) { Path filePath = fileStatus.getPath(); if (!fileStatus.isFile()) { throw new HiveException("Error, not a file: " + filePath); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
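For Problem 3 above, a shim-style workaround on hadoop-1 is to negate isDir(), which exists in both API generations. The sketch below uses a stand-in class (MockFileStatus is hypothetical, not a Hadoop class) so it is self-contained; Hive's real compatibility code lives in its shims layer.

```java
// hadoop-1's FileStatus lacks isFile(); a compatible check negates isDir().
// MockFileStatus stands in for org.apache.hadoop.fs.FileStatus so this
// example compiles without Hadoop on the classpath.
public class IsFileCompat {

    static class MockFileStatus {            // stand-in for FileStatus
        private final boolean dir;
        MockFileStatus(boolean dir) { this.dir = dir; }
        boolean isDir() { return dir; }      // the method hadoop-1 does have
    }

    // hadoop-1-safe replacement for fileStatus.isFile()
    static boolean isFile(MockFileStatus status) {
        return !status.isDir();
    }

    public static void main(String[] args) {
        System.out.println(isFile(new MockFileStatus(false))); // true
        System.out.println(isFile(new MockFileStatus(true)));  // false
    }
}
```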
[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join
[ https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200483#comment-14200483 ] Xuefu Zhang commented on HIVE-8745: --- [~spena], could you do some research on this? Thanks. Joins on decimal keys return different results whether they are run as reduce join or map join -- Key: HIVE-8745 URL: https://issues.apache.org/jira/browse/HIVE-8745 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Jason Dere Priority: Critical Fix For: 0.14.0 Attachments: join_test.q See attached .q file to reproduce. The difference seems to be whether trailing 0s are considered the same value or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 27687: HIVE-8649 Increase level of parallelism in reduce phase [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27687/ --- Review request for hive and Xuefu Zhang. Bugs: HIVE-8649 https://issues.apache.org/jira/browse/HIVE-8649 Repository: hive-git Description --- First patch for HIVE-8649, to increase the number of reducers for spark based on some info about the spark cluster. We need to add a SparkListener to handle cluster status change if such events are supported by spark. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java 5766787 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java 2dbb5a3 Diff: https://reviews.apache.org/r/27687/diff/ Testing --- Thanks, Jimmy Xiang
[jira] [Commented] (HIVE-8649) Increase level of parallelism in reduce phase [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200500#comment-14200500 ] Jimmy Xiang commented on HIVE-8649: --- Sure. Here is the patch on RB: https://reviews.apache.org/r/27687/. Thanks. Increase level of parallelism in reduce phase [Spark Branch] Key: HIVE-8649 URL: https://issues.apache.org/jira/browse/HIVE-8649 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-8649.1-spark.patch We calculate the number of reducers based on the same code for MapReduce. However, reducers are vastly cheaper in Spark and it's generally recommended we have many more reducers than in MR. Sandy Ryza who works on Spark has some ideas about a heuristic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
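The kind of heuristic under discussion can be illustrated with a minimal sketch: derive a reducer count from input size, then scale it up for Spark, where reducers are cheaper than in MapReduce. The bytes-per-reducer constant and the Spark multiplier below are invented for illustration, not Hive's actual values.

```java
// Illustrative sketch of a reducer-count heuristic: size-based estimate
// scaled by a Spark factor. Constants here are hypothetical.
public class ReducerHeuristic {

    static int estimateReducers(long inputBytes, long bytesPerReducer,
                                int maxReducers, double sparkFactor) {
        // Ceiling division: at least one reducer per bytesPerReducer chunk.
        long base = (inputBytes + bytesPerReducer - 1) / bytesPerReducer;
        long scaled = Math.round(base * sparkFactor);
        return (int) Math.max(1, Math.min(maxReducers, scaled));
    }

    public static void main(String[] args) {
        // 10 GB input, 256 MB per reducer, scaled 2x for Spark.
        int n = estimateReducers(10L << 30, 256L << 20, 999, 2.0);
        System.out.println(n); // 80: ceil(10240 MB / 256 MB) = 40, times 2
    }
}
```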
[jira] [Commented] (HIVE-3779) An empty value to hive.logquery.location can't disable the creation of hive history log files
[ https://issues.apache.org/jira/browse/HIVE-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200521#comment-14200521 ] Anthony Hsu commented on HIVE-3779: --- In case you're still using an older version of Hive that doesn't let you disable the history log files, one workaround is to run {code} !rm -r /path/to/hive.querylog.location; {code} as your first shell command before running your queries. An empty value to hive.logquery.location can't disable the creation of hive history log files - Key: HIVE-3779 URL: https://issues.apache.org/jira/browse/HIVE-3779 Project: Hive Issue Type: Bug Components: Documentation Affects Versions: 0.9.0 Reporter: Bing Li Priority: Minor In AdminManual Configuration (https://cwiki.apache.org/Hive/adminmanual-configuration.html), the description of hive.querylog.location mentions that if the variable is set to an empty string, structured logs will not be created. But it fails with the following setting:
<property>
  <name>hive.querylog.location</name>
  <value></value>
</property>
It seems that it does not read an empty value from HiveConf.ConfVars.HIVEHISTORYFILELOC, but falls back to the default value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8757) YARN dep in scheduler shim should be optional
[ https://issues.apache.org/jira/browse/HIVE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8757: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) YARN dep in scheduler shim should be optional - Key: HIVE-8757 URL: https://issues.apache.org/jira/browse/HIVE-8757 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.15.0 Attachments: HIVE-8757.patch The {{hadoop-yarn-server-resourcemanager}} dep in the scheduler shim should be optional so that yarn doesn't pollute dependent classpaths. Users who want to use this feature must provide the yarn classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8759) HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper
Vaibhav Gumashta created HIVE-8759: -- Summary: HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper Key: HIVE-8759 URL: https://issues.apache.org/jira/browse/HIVE-8759 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8698) default log4j.properties not included in jar files anymore
[ https://issues.apache.org/jira/browse/HIVE-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8698: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to 0.14 and trunk. Thank you Thejas! default log4j.properties not included in jar files anymore -- Key: HIVE-8698 URL: https://issues.apache.org/jira/browse/HIVE-8698 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8698.1.patch trunk and hive 0.14 based builds no longer have hive-log4j.properties in the jars. This means that in default tar, unless hive-log4j.properties is created in conf dir (from hive-log4j.properties.template file), hive cli is much more verbose in what is printed to console. Hiveserver2 fails to come up, as it errors out with - org.apache.hadoop.hive.common.LogUtils$LogInitializationException: Unable to initialize logging using hive-log4j.properties, not found on CLASSPATH! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8759) HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper
[ https://issues.apache.org/jira/browse/HIVE-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8759: --- Attachment: HIVE-8759.1.patch HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper --- Key: HIVE-8759 URL: https://issues.apache.org/jira/browse/HIVE-8759 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Attachments: HIVE-8759.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
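The difference this patch is about can be sketched with the JDK alone. The znode naming below is illustrative, not HiveServer2's actual registration format.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Shows the contrast HIVE-8759 is about: registering the machine's
// hostname in ZooKeeper instead of its IP address. The znode path
// format below is illustrative only.
public class HostnameVsIp {
    public static void main(String[] args) throws UnknownHostException {
        InetAddress local = InetAddress.getLocalHost();
        String byIp   = local.getHostAddress();       // e.g. an address like 10.0.0.12
        String byName = local.getCanonicalHostName(); // e.g. a name like hs2-node1.example.com
        System.out.println("ip-based znode:   server-" + byIp + ":10000");
        System.out.println("name-based znode: server-" + byName + ":10000");
    }
}
```

A hostname entry stays valid when the server's address changes (e.g. DHCP lease renewal or failover to a host with the same name), which is why it is preferable for service discovery.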
[jira] [Commented] (HIVE-8674) Fix tests after merge [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200546#comment-14200546 ] Xuefu Zhang commented on HIVE-8674: --- Hi [~brocknoland], is this ready to be committed? It looks like the auto_join29.q failure is due to a known issue, but I'm not sure about the nullscan one. The other failures also seem to be pre-existing. Fix tests after merge [Spark Branch] Key: HIVE-8674 URL: https://issues.apache.org/jira/browse/HIVE-8674 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8674.1-spark.patch, HIVE-8674.2-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8611) grant/revoke syntax should support additional objects for authorization plugins
[ https://issues.apache.org/jira/browse/HIVE-8611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-8611: -- Attachment: HIVE-8611.4.patch Update patch to fix test failure due to new error message grant/revoke syntax should support additional objects for authorization plugins --- Key: HIVE-8611 URL: https://issues.apache.org/jira/browse/HIVE-8611 Project: Hive Issue Type: Bug Components: Authentication, SQL Affects Versions: 0.13.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.14.0 Attachments: HIVE-8611.1.patch, HIVE-8611.2.patch, HIVE-8611.2.patch, HIVE-8611.3.patch, HIVE-8611.4.patch The authorization framework supports URI and global objects. The SQL syntax however doesn't allow granting privileges on these objects. We should allow the compiler to parse these so that it can be handled by authorization plugins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8759) HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper
[ https://issues.apache.org/jira/browse/HIVE-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8759: --- Status: Patch Available (was: Open) HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper --- Key: HIVE-8759 URL: https://issues.apache.org/jira/browse/HIVE-8759 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Attachments: HIVE-8759.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200561#comment-14200561 ] Hive QA commented on HIVE-8744: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12679872/HIVE-8744.2.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6674 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1667/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1667/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1667/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12679872 - PreCommit-HIVE-TRUNK-Build hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large - Key: HIVE-8744 URL: https://issues.apache.org/jira/browse/HIVE-8744 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8744.1.patch, HIVE-8744.2.patch This test is related to the bug HIVE-8065 where I am trying to support HDFS encryption. One of the enhancements to support it is to create a .hive-staging directory on the same table directory location where the query is executed. 
Now, when running the hbase_stats3.q test from a temporary directory that has a large path, then the new path, a combination of table location + .hive-staging + random temporary subdirectories, is too large to fit into the statistics table, so the path is truncated. This causes the following error: {noformat} 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error during publishing statistics. java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at
[jira] [Commented] (HIVE-8561) Expose Hive optiq operator tree to be able to support other sql on hadoop query engines
[ https://issues.apache.org/jira/browse/HIVE-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200562#comment-14200562 ] Brock Noland commented on HIVE-8561: Hi [~nyang], Thank you for the annotations! Since the CBO is such a huge component and in its infancy, I feel like {{Unstable}} might be more appropriate than {{Evolving}}. However, before making that change I think we should settle with [~jpullokkaran] on the correct way to perform this integration. bq. Why can't Drill be plugged in as another execution engine just like MR, TEZ, Spark? [~jpullokkaran] It's reasonable for Drill to add APIs in order to use the query plan. The Drill project, like many other projects, is a user of Hive. As mentioned previously, it's important to agree upon some kind of API visibility and stability. Na has agreed to an unstable interface (it is the caller's responsibility to follow Hive-side changes). As one of the CBO experts, if there is a better alternative implementation, could you please share how this could be improved? Expose Hive optiq operator tree to be able to support other sql on hadoop query engines --- Key: HIVE-8561 URL: https://issues.apache.org/jira/browse/HIVE-8561 Project: Hive Issue Type: Task Components: CBO Affects Versions: 0.14.0 Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-8561.2.patch, HIVE-8561.3.patch, HIVE-8561.patch Hive 0.14 added cost-based optimization, and an optiq operator tree is created for select queries. However, the optiq operator tree is not visible from outside and is hard to use from other SQL-on-Hadoop query engines such as Apache Drill. To allow Drill to access the Hive optiq operator tree, we need to add a public API that returns it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8636) CBO: split cbo_correctness test
[ https://issues.apache.org/jira/browse/HIVE-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200588#comment-14200588 ] Sergey Shelukhin commented on HIVE-8636: test failures are unrelated CBO: split cbo_correctness test --- Key: HIVE-8636 URL: https://issues.apache.org/jira/browse/HIVE-8636 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8636.01.patch, HIVE-8636.01.patch, HIVE-8636.02.patch, HIVE-8636.patch The CBO correctness test is extremely annoying: it runs forever; if anything fails, it's hard to debug due to the volume of logs; and it doesn't run further, so multiple failures can only be discovered one by one. Also, SORT_QUERY_RESULTS cannot be used, because some queries presumably rely on sorting. It should be split into separate tests; the numbers in there now may be good as boundaries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8636) CBO: split cbo_correctness test
[ https://issues.apache.org/jira/browse/HIVE-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200606#comment-14200606 ] Ashutosh Chauhan commented on HIVE-8636: +1 CBO: split cbo_correctness test --- Key: HIVE-8636 URL: https://issues.apache.org/jira/browse/HIVE-8636 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8636.01.patch, HIVE-8636.01.patch, HIVE-8636.02.patch, HIVE-8636.patch The CBO correctness test is extremely annoying: it runs forever; if anything fails, it's hard to debug due to the volume of logs; and it doesn't run further, so multiple failures can only be discovered one by one. Also, SORT_QUERY_RESULTS cannot be used, because some queries presumably rely on sorting. It should be split into separate tests; the numbers in there now may be good as boundaries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8753) TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200612#comment-14200612 ] Prasanth J commented on HIVE-8753: -- +1 TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk - Key: HIVE-8753 URL: https://issues.apache.org/jira/browse/HIVE-8753 Project: Hive Issue Type: Test Components: Logical Optimizer Affects Versions: 0.15.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8753.patch Because of HIVE-7111 needs .q.out update -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8753) TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8753: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk. TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk - Key: HIVE-8753 URL: https://issues.apache.org/jira/browse/HIVE-8753 Project: Hive Issue Type: Test Components: Logical Optimizer Affects Versions: 0.15.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.15.0 Attachments: HIVE-8753.patch Because of HIVE-7111 needs .q.out update -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join
[ https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200636#comment-14200636 ] Jason Dere commented on HIVE-8745: -- {quote} Nevertheless, I think the serde should be able to trim the zeros and pad it back as long as it has the right metadata. It seems it does have the metadata for each column. {quote} We have the metadata for the column, but not for individual values in each row. If you have a decimal(10,5) column with the values {noformat} 1 1.0 1.00 {noformat} the only thing we could get from the column metadata is the scale=5, so we could only pad everything to 1.00000. To know how many extra zeros we need for each value of the column, we would have to save something for each value. Joins on decimal keys return different results whether they are run as reduce join or map join -- Key: HIVE-8745 URL: https://issues.apache.org/jira/browse/HIVE-8745 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Jason Dere Priority: Critical Fix For: 0.14.0 Attachments: join_test.q See attached .q file to reproduce. The difference seems to be whether trailing 0s are considered the same value or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
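The per-value scale loss Jason describes can be seen directly with the JDK's BigDecimal (an illustrative sketch using only the JDK class, not Hive's HiveDecimal): numerically equal values with different trailing-zero counts compare equal but carry different scales, and once the zeros are stripped, that scale is unrecoverable from column metadata alone.

```java
import java.math.BigDecimal;

public class DecimalScaleDemo {
    public static void main(String[] args) {
        BigDecimal one = new BigDecimal("1");
        BigDecimal oneOhOh = new BigDecimal("1.00");

        // Numerically equal, so they must hash/join as the same key...
        System.out.println(one.compareTo(oneOhOh));   // 0
        // ...but equals() distinguishes them, because the scales differ (0 vs 2).
        System.out.println(one.equals(oneOhOh));      // false

        // Once trailing zeros are trimmed, the per-value scale is gone for good:
        System.out.println(oneOhOh.stripTrailingZeros().scale()); // 0

        // Column metadata only supports padding every value to one fixed scale:
        System.out.println(one.setScale(5));          // 1.00000
    }
}
```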
[jira] [Commented] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200659#comment-14200659 ] Szehon Ho commented on HIVE-8744: - +1, thanks. hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large - Key: HIVE-8744 URL: https://issues.apache.org/jira/browse/HIVE-8744 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8744.1.patch, HIVE-8744.2.patch This test is related to the bug HIVE-8065 where I am trying to support HDFS encryption. One of the enhancements to support it is to create a .hive-staging directory on the same table directory location where the query is executed. Now, when running the hbase_stats3.q test from a temporary directory that has a large path, then the new path, a combination of table location + .hive-staging + random temporary subdirectories, is too large to fit into the statistics table, so the path is truncated. This causes the following error: {noformat} 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error during publishing statistics. java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255. 
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source) at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source) at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148) at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145) at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667) at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source) ... 30 more Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255. at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown
[jira] [Commented] (HIVE-8746) ORC timestamp columns are sensitive to daylight savings time
[ https://issues.apache.org/jira/browse/HIVE-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200658#comment-14200658 ] Dain Sundstrom commented on HIVE-8746: -- A good first step would be to record the writer timezone in the file postscript. Then the current reader could throw an exception if the JVM timezone doesn't match the timezone declared in the postscript. Then, when someone has more time, they could adjust the base epoch to the file timezone. What do you think? ORC timestamp columns are sensitive to daylight savings time Key: HIVE-8746 URL: https://issues.apache.org/jira/browse/HIVE-8746 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley Hive uses Java's Timestamp class to manipulate timestamp columns. Unfortunately, the textual parsing in Timestamp is done in local time while the internal storage is in UTC. ORC mostly sidesteps this issue by computing the difference between the time and a base time, both in local time, and storing that difference in the file. Reading the file between timezones will mostly work correctly: 2014-01-01 12:34:56 will read correctly in every timezone. However, moving between timezones with different daylight saving rules creates trouble. In particular, moving from a computer in PST to UTC will read 2014-06-06 12:34:56 as 2014-06-06 11:34:56. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
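The local-time parsing Owen describes is easy to demonstrate with java.sql.Timestamp alone (an illustrative sketch, not ORC code): the same timestamp text maps to different epoch millis depending on the JVM's default zone, and in June the Los Angeles/UTC gap is the 7-hour PDT offset rather than the 8-hour PST one.

```java
import java.sql.Timestamp;
import java.util.TimeZone;

public class TimestampZoneDemo {
    public static void main(String[] args) {
        // Parse the same text under two different default zones.
        TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"));
        long la = Timestamp.valueOf("2014-06-06 12:34:56").getTime();

        TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
        long utc = Timestamp.valueOf("2014-06-06 12:34:56").getTime();

        // In June, Los Angeles observes PDT (UTC-7), so the epochs differ
        // by 7 hours: the kind of shift a reader sees across zones.
        System.out.println((la - utc) / (3600L * 1000)); // 7
    }
}
```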
[jira] [Updated] (HIVE-8636) CBO: split cbo_correctness test
[ https://issues.apache.org/jira/browse/HIVE-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8636: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Sergey! CBO: split cbo_correctness test --- Key: HIVE-8636 URL: https://issues.apache.org/jira/browse/HIVE-8636 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.15.0 Attachments: HIVE-8636.01.patch, HIVE-8636.01.patch, HIVE-8636.02.patch, HIVE-8636.patch The CBO correctness test is extremely annoying: it runs forever; if anything fails, it's hard to debug due to the volume of logs; and it doesn't run further, so multiple failures can only be discovered one by one. Also, SORT_QUERY_RESULTS cannot be used, because some queries presumably rely on sorting. It should be split into separate tests; the numbers in there now may be good as boundaries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 27687: HIVE-8649 Increase level of parallelism in reduce phase [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27687/#review60210 --- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java https://reviews.apache.org/r/27687/#comment101574 I don't feel we need to cache this, as it can change during a user session. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java https://reviews.apache.org/r/27687/#comment101575 Can we document what is in the tuple, especially what each element means? ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java https://reviews.apache.org/r/27687/#comment101576 I'm not sure why this needs to be synchronized. Will this method be called by concurrent threads? It doesn't seem to be the case. - Xuefu Zhang On Nov. 6, 2014, 5:25 p.m., Jimmy Xiang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27687/ --- (Updated Nov. 6, 2014, 5:25 p.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8649 https://issues.apache.org/jira/browse/HIVE-8649 Repository: hive-git Description --- First patch for HIVE-8649, to increase the number of reducers for Spark based on some info about the Spark cluster. We need to add a SparkListener to handle cluster status changes if such events are supported by Spark. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java 5766787 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java 2dbb5a3 Diff: https://reviews.apache.org/r/27687/diff/ Testing --- Thanks, Jimmy Xiang
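For readers following the review, the idea under discussion can be sketched roughly as follows. This is a hypothetical heuristic, not the patch's actual code, and the method and parameter names are made up for illustration: size the reducer count from the input data as MapReduce does, then round up toward the cluster's available cores, since reducers are much cheaper on Spark.

```java
public class ReducerHeuristic {
    // Hypothetical sketch: data-driven count as in MR, bumped up to cluster
    // parallelism, clamped by a configured maximum.
    static int pickReducers(long inputBytes, long bytesPerReducer,
                            int clusterCores, int maxReducers) {
        // ceil(inputBytes / bytesPerReducer), at least one reducer
        int byData = (int) Math.max(1, (inputBytes + bytesPerReducer - 1) / bytesPerReducer);
        return Math.min(maxReducers, Math.max(byData, clusterCores));
    }

    public static void main(String[] args) {
        // 1 GB at 256 MB/reducer would give 4 reducers under MR sizing;
        // with 16 cores available we round up to 16 instead.
        System.out.println(pickReducers(1_000_000_000L, 256_000_000L, 16, 1009));
    }
}
```

A real implementation would also need the SparkListener mentioned in the description, since the core count can change as executors come and go during a session.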
[jira] [Updated] (HIVE-8674) Fix tests after merge [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8674: --- Attachment: HIVE-8674.2-spark.patch Fix tests after merge [Spark Branch] Key: HIVE-8674 URL: https://issues.apache.org/jira/browse/HIVE-8674 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8674.1-spark.patch, HIVE-8674.2-spark.patch, HIVE-8674.2-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join
[ https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200678#comment-14200678 ] Gunther Hagleitner commented on HIVE-8745: -- [~xuefuz] I'm going to revert HIVE-7373. I think that's reasonable given that it causes this issue. I'm also worried about changing the behavior of decimals in Hive 0.14 again while there are still questions about it. We already changed the behavior from 0.12 to 0.13 and it caused a lot of grief for some folks. Given that BinarySortableSerde is involved, we also need to look into window functions, group by with/without map-side aggregation, etc. Jason also brings up another good point: performance. The decision to maintain the trailing zeroes for each individual value instead of at the column level means that we will never be able to simply encode decimals in two longs, which was the idea behind limiting the precision of decimals in the first place. Joins on decimal keys return different results whether they are run as reduce join or map join -- Key: HIVE-8745 URL: https://issues.apache.org/jira/browse/HIVE-8745 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Jason Dere Priority: Critical Fix For: 0.14.0 Attachments: join_test.q See attached .q file to reproduce. The difference seems to be whether trailing 0s are considered the same value or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
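The "two longs" remark refers to Hive capping decimal precision at 38 digits; a quick check shows why that cap was chosen (illustrative arithmetic, not Hive code): the largest unscaled value of a 38-digit decimal needs only 127 bits, so the magnitude plus a sign bit fits in two 64-bit longs, provided no extra per-value scale information has to ride along with it.

```java
import java.math.BigInteger;

public class DecimalBitsDemo {
    public static void main(String[] args) {
        // Largest unscaled value a 38-digit decimal can hold: 10^38 - 1.
        BigInteger max = BigInteger.TEN.pow(38).subtract(BigInteger.ONE);
        // 127 bits of magnitude plus a sign bit: exactly two 64-bit longs.
        System.out.println(max.bitLength()); // 127
        // One more digit would overflow the two-long representation:
        System.out.println(BigInteger.TEN.pow(39).bitLength()); // 130
    }
}
```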
[jira] [Commented] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200683#comment-14200683 ] Prasanth J commented on HIVE-8744: -- HIVE-8735 is also addressing the same problem. Usually the client which publishes the key (FSOperator, StatsTask) has some logic to trim down the length of the key using an MD5 hash. If the key gets longer than the max stats key prefix (from the Hive config), the Utilities.getHashedPrefixKey() method is invoked to get a shorter key. Can you try the patch from HIVE-8735 to see if the test case works? HIVE-8735 truncates the key before publishing. hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large - Key: HIVE-8744 URL: https://issues.apache.org/jira/browse/HIVE-8744 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8744.1.patch, HIVE-8744.2.patch This test is related to the bug HIVE-8065 where I am trying to support HDFS encryption. One of the enhancements to support it is to create a .hive-staging directory on the same table directory location where the query is executed. Now, when running the hbase_stats3.q test from a temporary directory that has a large path, then the new path, a combination of table location + .hive-staging + random temporary subdirectories, is too large to fit into the statistics table, so the path is truncated. This causes the following error: {noformat} 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error during publishing statistics. java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255. 
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source) at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source) at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148) at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145) at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667) at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
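Prasanth's hashing approach can be sketched like this (a hypothetical illustration; shortenKey and its parameters are made-up names, not Hive's actual Utilities helper): when a stats key exceeds the column limit, keep a recognizable prefix of the path and replace the tail with a fixed-length MD5 digest, so the stored key always fits and remains unique with high probability.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class StatsKeyShortener {
    // Hypothetical sketch of trimming an over-long stats key with an MD5 hash.
    static String shortenKey(String key, int maxLen) throws Exception {
        if (key.length() <= maxLen) {
            return key;
        }
        MessageDigest md = MessageDigest.getInstance("MD5");
        // %032x keeps the digest at a fixed 32 hex characters.
        String hash = String.format("%032x",
                new BigInteger(1, md.digest(key.getBytes(StandardCharsets.UTF_8))));
        // Keep the start of the path so the key stays human-recognizable.
        return key.substring(0, maxLen - hash.length()) + hash;
    }

    public static void main(String[] args) throws Exception {
        StringBuilder longKey = new StringBuilder("pfile:/home/hiveptest/");
        for (int i = 0; i < 40; i++) {
            longKey.append("subdir/"); // simulate deep staging directories
        }
        // Always fits a VARCHAR(255) column, regardless of the original depth.
        System.out.println(shortenKey(longKey.toString(), 255).length()); // 255
    }
}
```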
[jira] [Commented] (HIVE-8649) Increase level of parallelism in reduce phase [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200711#comment-14200711 ] Xuefu Zhang commented on HIVE-8649: --- Some comments on review board. Increase level of parallelism in reduce phase [Spark Branch] Key: HIVE-8649 URL: https://issues.apache.org/jira/browse/HIVE-8649 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-8649.1-spark.patch We calculate the number of reducers based on the same code for MapReduce. However, reducers are vastly cheaper in Spark and it's generally recommended we have many more reducers than in MR. Sandy Ryza who works on Spark has some ideas about a heuristic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8760) Pass a copy of HiveConf to hooks
Ashutosh Chauhan created HIVE-8760: -- Summary: Pass a copy of HiveConf to hooks Key: HIVE-8760 URL: https://issues.apache.org/jira/browse/HIVE-8760 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.13.0, 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan because hadoop's {{Configuration}} is not thread-safe -- This message was sent by Atlassian JIRA (v6.3.4#6332)
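The rationale can be illustrated with a plain-JDK stand-in (java.util.Properties playing the role of HiveConf here; this is not Hive code): handing each hook its own copy means a hook's mutations can never race against, or leak into, the session's shared configuration.

```java
import java.util.Properties;

public class ConfCopyDemo {
    public static void main(String[] args) {
        Properties shared = new Properties(); // stands in for the session's HiveConf
        shared.setProperty("hive.mapred.mode", "nonstrict");

        // Give the "hook" its own copy instead of the shared object.
        Properties hookCopy = new Properties();
        hookCopy.putAll(shared);
        hookCopy.setProperty("hive.mapred.mode", "strict"); // hook mutates freely

        // The shared configuration is untouched.
        System.out.println(shared.getProperty("hive.mapred.mode")); // nonstrict
    }
}
```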
[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join
[ https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200716#comment-14200716 ] Xuefu Zhang commented on HIVE-8745: --- Okay. Could you guys add this repro case so that when HIVE-7373 is revisited, the issue here can be caught early? Joins on decimal keys return different results whether they are run as reduce join or map join -- Key: HIVE-8745 URL: https://issues.apache.org/jira/browse/HIVE-8745 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Jason Dere Priority: Critical Fix For: 0.14.0 Attachments: join_test.q See attached .q file to reproduce. The difference seems to be whether trailing 0s are considered the same value or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8760) Pass a copy of HiveConf to hooks
[ https://issues.apache.org/jira/browse/HIVE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8760: --- Status: Patch Available (was: Open) Pass a copy of HiveConf to hooks Key: HIVE-8760 URL: https://issues.apache.org/jira/browse/HIVE-8760 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.13.0, 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8760.patch because hadoop's {{Configuration}} is not thread-safe -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8760) Pass a copy of HiveConf to hooks
[ https://issues.apache.org/jira/browse/HIVE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8760: --- Attachment: HIVE-8760.patch Pass a copy of HiveConf to hooks Key: HIVE-8760 URL: https://issues.apache.org/jira/browse/HIVE-8760 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.13.0, 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8760.patch because hadoop's {{Configuration}} is not thread-safe -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8633) Move Alan Gates from committer list to PMC list on website
[ https://issues.apache.org/jira/browse/HIVE-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8633: - Attachment: HIVE-8633.patch Move Alan Gates from committer list to PMC list on website -- Key: HIVE-8633 URL: https://issues.apache.org/jira/browse/HIVE-8633 Project: Hive Issue Type: Task Components: Website Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8633.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8633) Move Alan Gates from committer list to PMC list on website
[ https://issues.apache.org/jira/browse/HIVE-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8633: - Status: Patch Available (was: Open) Move Alan Gates from committer list to PMC list on website -- Key: HIVE-8633 URL: https://issues.apache.org/jira/browse/HIVE-8633 Project: Hive Issue Type: Task Components: Website Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8633.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8674) Fix tests after merge [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200743#comment-14200743 ] Xuefu Zhang commented on HIVE-8674: --- I'm not sure it helps, but this one constantly fails, shown also in http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/317/testReport. Trunk doesn't seem to have this. Maybe we let it be until after the next merge. Fix tests after merge [Spark Branch] Key: HIVE-8674 URL: https://issues.apache.org/jira/browse/HIVE-8674 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8674.1-spark.patch, HIVE-8674.2-spark.patch, HIVE-8674.2-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join
[ https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200771#comment-14200771 ] Jason Dere commented on HIVE-8745: -- Sounds like a good idea, will do. Joins on decimal keys return different results whether they are run as reduce join or map join -- Key: HIVE-8745 URL: https://issues.apache.org/jira/browse/HIVE-8745 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Jason Dere Priority: Critical Fix For: 0.14.0 Attachments: join_test.q See attached .q file to reproduce. The difference seems to be whether trailing 0s are considered the same value or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8748) jdbc uber jar is missing commons-logging
[ https://issues.apache.org/jira/browse/HIVE-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200784#comment-14200784 ] Gunther Hagleitner commented on HIVE-8748: -- +1 for hive .14 jdbc uber jar is missing commons-logging Key: HIVE-8748 URL: https://issues.apache.org/jira/browse/HIVE-8748 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.15.0 Attachments: HIVE-8748.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8754) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster
[ https://issues.apache.org/jira/browse/HIVE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200790#comment-14200790 ] Gunther Hagleitner commented on HIVE-8754: -- Alright. +1 for hive.14 Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster --- Key: HIVE-8754 URL: https://issues.apache.org/jira/browse/HIVE-8754 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Priority: Critical Fix For: 0.14.0, 0.15.0 Attachments: HIVE-8754.2.patch, HIVE-8754.patch HIVE-8588 added support for this by copying jdbc jars to lib/ of the localized/exploded Sqoop tar. Unfortunately, in a secure cluster, the Dist Cache intentionally sets permissions on exploded tars such that they are not writable. This needs to be fixed; otherwise users would have to modify their Sqoop tar to include the relevant jdbc jars, which is burdensome if different DBs are used and may create headaches around licensing issues. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 27672: HIVE-8726 Collect Spark TaskMetrics and build job statistic[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27672/#review60224 --- Ship it! Ship It! - Xuefu Zhang On Nov. 6, 2014, 7:30 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27672/ --- (Updated Nov. 6, 2014, 7:30 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8726 https://issues.apache.org/jira/browse/HIVE-8726 Repository: hive-git Description --- collection spark task metrics and combine into job level metric and build into SparkStatistics. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 7ab9a34 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java f6cc581 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/JobStateListener.java b4f753f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/SimpleSparkJobStatus.java 78e16c5 Diff: https://reviews.apache.org/r/27672/diff/ Testing --- Thanks, chengxiang li
[jira] [Commented] (HIVE-8726) Collect Spark TaskMetrics and build job statistic[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200797#comment-14200797 ] Xuefu Zhang commented on HIVE-8726: --- +1 Collect Spark TaskMetrics and build job statistic[Spark Branch] --- Key: HIVE-8726 URL: https://issues.apache.org/jira/browse/HIVE-8726 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M3 Attachments: HIVE-8726.1-spark.patch Implement SparkListener to collect TaskMetrics, and build SparkStatistic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8633) Move Alan Gates from committer list to PMC list on website
[ https://issues.apache.org/jira/browse/HIVE-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200800#comment-14200800 ] Ashutosh Chauhan commented on HIVE-8633: +1 Don't think we need to pay Hive QA cycles for this : ) Move Alan Gates from committer list to PMC list on website -- Key: HIVE-8633 URL: https://issues.apache.org/jira/browse/HIVE-8633 Project: Hive Issue Type: Task Components: Website Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8633.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8726) Collect Spark TaskMetrics and build job statistic[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8726: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to Spark branch. Thanks to Chengxiang for this wonderful contribution. Collect Spark TaskMetrics and build job statistic[Spark Branch] --- Key: HIVE-8726 URL: https://issues.apache.org/jira/browse/HIVE-8726 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M3 Fix For: spark-branch Attachments: HIVE-8726.1-spark.patch Implement SparkListener to collect TaskMetrics, and build SparkStatistic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 27687: HIVE-8649 Increase level of parallelism in reduce phase [Spark Branch]
On Nov. 6, 2014, 7:01 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java, line 86 https://reviews.apache.org/r/27687/diff/1/?file=751768#file751768line86 Can we document what is in the tuple, especially what each element means? Sure. Will add a doc. On Nov. 6, 2014, 7:01 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java, line 75 https://reviews.apache.org/r/27687/diff/1/?file=751768#file751768line75 I don't feel we need to cache this, as this can change during a user session. Yes, it will change during a user session. I was thinking of updating this when things change, based on some event callbacks. Such info may be needed many times if there are many reducers. It should save us some time going to the Spark master (assuming getExecutorMemoryStatus checks with the master). On Nov. 6, 2014, 7:01 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java, line 89 https://reviews.apache.org/r/27687/diff/1/?file=751768#file751768line89 I'm not sure why this needs to be synchronized. Will this method be called by concurrent threads? It doesn't seem to be the case. Are you saying it won't be called by many threads? Each JVM can run one query at a time in all deployment modes? How come SparkClient.getInstance is synchronized? - Jimmy --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27687/#review60210 --- On Nov. 6, 2014, 5:25 p.m., Jimmy Xiang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27687/ --- (Updated Nov. 6, 2014, 5:25 p.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8649 https://issues.apache.org/jira/browse/HIVE-8649 Repository: hive-git Description --- First patch for HIVE-8649, to increase the number of reducers for Spark based on some info about the Spark cluster. 
We need to add a SparkListener to handle cluster status change if such events are supported by spark. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java 5766787 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java 2dbb5a3 Diff: https://reviews.apache.org/r/27687/diff/ Testing --- Thanks, Jimmy Xiang
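The heuristic under review (sizing reduce-phase parallelism from cluster capacity) can be sketched as below. The method name and constants are illustrative assumptions, not the patch's actual code:

```java
// Hypothetical sketch: cap reducer count by available executor memory,
// mirroring the "reducers from cluster info" idea in HIVE-8649.
public class ReducerParallelismSketch {
    static int estimateReducers(long totalExecutorMemoryBytes,
                                long bytesPerReducer,
                                int maxReducers) {
        // At least one reducer; otherwise one per bytesPerReducer chunk.
        long byMemory = Math.max(1, totalExecutorMemoryBytes / bytesPerReducer);
        // Never exceed the configured ceiling.
        return (int) Math.min(byMemory, maxReducers);
    }

    public static void main(String[] args) {
        // 8 GB of executor memory, 1 GB per reducer, ceiling of 999.
        System.out.println(estimateReducers(8L << 30, 1L << 30, 999)); // prints "8"
    }
}
```

The real SetSparkReducerParallelism also factors in per-operator data-size statistics; this sketch shows only the cluster-capacity bound discussed in the review.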
[jira] [Commented] (HIVE-8748) jdbc uber jar is missing commons-logging
[ https://issues.apache.org/jira/browse/HIVE-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200806#comment-14200806 ] Ashutosh Chauhan commented on HIVE-8748: Committed to 0.14 jdbc uber jar is missing commons-logging Key: HIVE-8748 URL: https://issues.apache.org/jira/browse/HIVE-8748 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.14.0 Attachments: HIVE-8748.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8748) jdbc uber jar is missing commons-logging
[ https://issues.apache.org/jira/browse/HIVE-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8748: --- Fix Version/s: (was: 0.15.0) 0.14.0 jdbc uber jar is missing commons-logging Key: HIVE-8748 URL: https://issues.apache.org/jira/browse/HIVE-8748 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.14.0 Attachments: HIVE-8748.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8760) Pass a copy of HiveConf to hooks
[ https://issues.apache.org/jira/browse/HIVE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200809#comment-14200809 ] Gopal V commented on HIVE-8760: --- This is a perf regression for all well-written Hive hooks that exist. There is prior evidence indicating that copying a Configuration is not a fast operation. In HIVE-4486 we went from a query which took 347.66 seconds down to 218 seconds by throwing out unnecessary {{new HiveConf();}} calls. If this is a thread-safety issue, then the hook spawning its own threads should synchronize - since the hook is pluggable, class-based config, that is very clearly the minimum-impact fix:
{code}
@Override
public void run(final HookContext hookContext) throws Exception {
  final long currentTime = System.currentTimeMillis();
+ final HiveConf confCopy = new HiveConf(hookContext.getConf());
  executor.submit(new Runnable() {
    ... // use the local value off the closure capture in the thread runnable
{code}
Pass a copy of HiveConf to hooks Key: HIVE-8760 URL: https://issues.apache.org/jira/browse/HIVE-8760 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.13.0, 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8760.patch because hadoop's {{Configuration}} is not thread-safe -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8760) Pass a copy of HiveConf to hooks
[ https://issues.apache.org/jira/browse/HIVE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200813#comment-14200813 ] Ashutosh Chauhan commented on HIVE-8760: But why make an assumption about what the hook is doing? Isn't it prudent that Hive does the safe thing when it can? Pass a copy of HiveConf to hooks Key: HIVE-8760 URL: https://issues.apache.org/jira/browse/HIVE-8760 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.13.0, 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8760.patch because hadoop's {{Configuration}} is not thread-safe -- This message was sent by Atlassian JIRA (v6.3.4#6332)
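The trade-off being argued can be seen in miniature with java.util.Properties standing in for HiveConf (a hedged illustration, not Hive code): handing the hook a snapshot isolates it from later driver-side mutations, at the cost of the copy that Gopal measures as expensive.

```java
import java.util.Properties;

// Stand-in demo (Properties in place of HiveConf): the hook receives a
// snapshot, so later driver-side mutations cannot race with its thread.
public class ConfCopyDemo {
    static Properties snapshot(Properties conf) {
        Properties copy = new Properties();
        copy.putAll(conf); // the defensive copy the patch proposes
        return copy;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("hive.exec.mode", "tez");
        Properties snap = snapshot(conf);
        conf.setProperty("hive.exec.mode", "spark"); // driver keeps mutating
        System.out.println(snap.getProperty("hive.exec.mode")); // prints "tez"
    }
}
```

Gopal's counter-position is that this isolation should be opt-in per hook (as in his snippet above) rather than paid unconditionally by every hook invocation.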
[jira] [Commented] (HIVE-8759) HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper
[ https://issues.apache.org/jira/browse/HIVE-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200827#comment-14200827 ] Hive QA commented on HIVE-8759: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12679892/HIVE-8759.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6700 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1668/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1668/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1668/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12679892 - PreCommit-HIVE-TRUNK-Build HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper --- Key: HIVE-8759 URL: https://issues.apache.org/jira/browse/HIVE-8759 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Attachments: HIVE-8759.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8759) HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper
[ https://issues.apache.org/jira/browse/HIVE-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200839#comment-14200839 ] Vaibhav Gumashta commented on HIVE-8759: Test failures are unrelated. HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper --- Key: HIVE-8759 URL: https://issues.apache.org/jira/browse/HIVE-8759 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Attachments: HIVE-8759.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8674) Fix tests after merge [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200842#comment-14200842 ] Hive QA commented on HIVE-8674: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12679911/HIVE-8674.2-spark.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7123 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetTimestampUtils.testTimezone org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/318/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/318/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-318/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12679911 - PreCommit-HIVE-SPARK-Build Fix tests after merge [Spark Branch] Key: HIVE-8674 URL: https://issues.apache.org/jira/browse/HIVE-8674 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8674.1-spark.patch, HIVE-8674.2-spark.patch, HIVE-8674.2-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8754) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster
[ https://issues.apache.org/jira/browse/HIVE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-8754: - Resolution: Fixed Status: Resolved (was: Patch Available) Thanks [~thejas] for the review. Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster --- Key: HIVE-8754 URL: https://issues.apache.org/jira/browse/HIVE-8754 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Priority: Critical Fix For: 0.14.0, 0.15.0 Attachments: HIVE-8754.2.patch, HIVE-8754.patch HIVE-8588 added support for this by copying jdbc jars to lib/ of the localized/exploded Sqoop tar. Unfortunately, in a secure cluster, Dist Cache intentionally sets permissions on exploded tars such that they are not writable. This needs to be fixed; otherwise users would have to modify their Sqoop tar to include the relevant jdbc jars, which is burdensome if different DBs are used and may create headaches around licensing issues. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8674) Fix tests after merge [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200848#comment-14200848 ] Brock Noland commented on HIVE-8674: Ok, I will just commit the fix for parallel.q and resolve this guy. Fix tests after merge [Spark Branch] Key: HIVE-8674 URL: https://issues.apache.org/jira/browse/HIVE-8674 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8674.1-spark.patch, HIVE-8674.2-spark.patch, HIVE-8674.2-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8761) JDOPersistenceManager creation should be controlled at the Server level and not the Thread level
Vaibhav Gumashta created HIVE-8761: -- Summary: JDOPersistenceManager creation should be controlled at the Server level and not the Thread level Key: HIVE-8761 URL: https://issues.apache.org/jira/browse/HIVE-8761 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.15.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta When using JDO, we create a thread-local RawStore (ObjectStore) object in each metastore thread. This leads to the creation of a new JDOPersistenceManager per thread, which is cached in JDOPersistenceManagerFactory. To remove a JDOPersistenceManager from JDOPersistenceManagerFactory, an explicit JDOPersistenceManager.close needs to be called. This is a bad candidate for a thread local, as effective object destruction requires the application to call close. So, when metastore threads are killed by the threadpool, the object will never be removed from the JDOPersistenceManagerFactory cache. We fixed this for HiveServer2 using an embedded metastore (HIVE-7353) by customizing the GC collection of the dying thread, but I believe a better and more efficient solution is to pool JDOPersistenceManager objects and let each thread get an object for its use from the pool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
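The pooling alternative proposed here can be sketched with a bounded BlockingQueue. This is a hypothetical design sketch, not the eventual patch; the generic type parameter stands in for JDOPersistenceManager:

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical pool: threads borrow a manager and return it when done,
// so a thread killed by the threadpool never strands an unclosed
// instance in a thread-local / factory cache.
public class ManagerPool<T> {
    private final BlockingQueue<T> pool;

    public ManagerPool(List<T> managers) {
        // Fair FIFO queue pre-filled with the shared manager instances.
        this.pool = new ArrayBlockingQueue<>(managers.size(), true, managers);
    }

    public T borrow() {
        try {
            return pool.take(); // blocks until a manager is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted waiting for a manager", e);
        }
    }

    public void release(T manager) {
        pool.offer(manager); // hand back for reuse instead of closing
    }
}
```

With this shape, close() is only ever needed at server shutdown, when the pool itself is drained.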
[jira] [Updated] (HIVE-8761) JDOPersistenceManager creation should be controlled at the Server level and not the Thread level
[ https://issues.apache.org/jira/browse/HIVE-8761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8761: --- Fix Version/s: 0.15.0 JDOPersistenceManager creation should be controlled at the Server level and not the Thread level --- Key: HIVE-8761 URL: https://issues.apache.org/jira/browse/HIVE-8761 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.15.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.15.0 When using JDO, we create a thread-local RawStore (ObjectStore) object in each metastore thread. This leads to the creation of a new JDOPersistenceManager per thread, which is cached in JDOPersistenceManagerFactory. To remove a JDOPersistenceManager from JDOPersistenceManagerFactory, an explicit JDOPersistenceManager.close needs to be called. This is a bad candidate for a thread local, as effective object destruction requires the application to call close. So, when metastore threads are killed by the threadpool, the object will never be removed from the JDOPersistenceManagerFactory cache. We fixed this for HiveServer2 using an embedded metastore (HIVE-7353) by customizing the GC collection of the dying thread, but I believe a better and more efficient solution is to pool JDOPersistenceManager objects and let each thread get an object for its use from the pool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8760) Pass a copy of HiveConf to hooks
[ https://issues.apache.org/jira/browse/HIVE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200852#comment-14200852 ] Gopal V commented on HIVE-8760: --- Immutable structures are prudent from an API-design standpoint - always copying to get to a shared-nothing model potentially breaks anyone who relied on synchronizing on that object elsewhere. The stability impact of copying is currently invisible and unknown, but eventually a lot of System.identityHashCode and System.err gets applied to debug those, because LOG.info() is synchronized. The performance impact, however, is well known (as quoted earlier). The core API issue overall for me is that we don't have immutable Conf objects - I keep hitting these {{new Configuration()}} perf issues (track HADOOP-11223 for the impact on HDFS). At the very least, I know the stability impact of copying in one Hook; the surface is rather narrow for that problem to trace through (i.e. ship Hook2.java, Hook3.java etc. and test them without rebuilding all of Hive). On top of it, the biggest user of Hooks seems to be itests (which ships something like 20 single-thread hooks). You'll be slowing down all of them, all the time. Pass a copy of HiveConf to hooks Key: HIVE-8760 URL: https://issues.apache.org/jira/browse/HIVE-8760 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.13.0, 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8760.patch because hadoop's {{Configuration}} is not thread-safe -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8754) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster
[ https://issues.apache.org/jira/browse/HIVE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-8754: Fix Version/s: (was: 0.15.0) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster --- Key: HIVE-8754 URL: https://issues.apache.org/jira/browse/HIVE-8754 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8754.2.patch, HIVE-8754.patch HIVE-8588 added support for this by copying jdbc jars to lib/ of the localized/exploded Sqoop tar. Unfortunately, in a secure cluster, Dist Cache intentionally sets permissions on exploded tars such that they are not writable. This needs to be fixed; otherwise users would have to modify their Sqoop tar to include the relevant jdbc jars, which is burdensome if different DBs are used and may create headaches around licensing issues. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 27687: HIVE-8649 Increase level of parallelism in reduce phase [Spark Branch]
On Nov. 6, 2014, 7:01 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java, line 75 https://reviews.apache.org/r/27687/diff/1/?file=751768#file751768line75 I don't feel we need to cache this, as this can change during a user session. Jimmy Xiang wrote: Yes, it will change during a user session. I was thinking of updating this when things change, based on some event callbacks. Such info may be needed many times if there are many reducers. It should save us some time going to the Spark master (assuming getExecutorMemoryStatus checks with the master). 1. I don't think there will be a callback. 2. Yeah, it will be called many times if there are multiple reducers. Therefore, it probably makes sense to put the info in SetSparkReducerParallelism, which is created for each query. 3. You also need to make sure this works for a Spark standalone cluster. I'm not sure if you can get the number of executors/memory in the same way. On Nov. 6, 2014, 7:01 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java, line 89 https://reviews.apache.org/r/27687/diff/1/?file=751768#file751768line89 I'm not sure why this needs to be synchronized. Will this method be called by concurrent threads? It doesn't seem to be the case. Jimmy Xiang wrote: Are you saying it won't be called by many threads? Each JVM can run one query at a time in all deployment modes? How come SparkClient.getInstance is synchronized? Yeah. Right now this is a little messy. Changes are coming. Concurrency isn't tested yet. It's fine to leave the synchronization there. - Xuefu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27687/#review60210 --- On Nov. 6, 2014, 5:25 p.m., Jimmy Xiang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27687/ --- (Updated Nov. 6, 2014, 5:25 p.m.) Review request for hive and Xuefu Zhang. 
Bugs: HIVE-8649 https://issues.apache.org/jira/browse/HIVE-8649 Repository: hive-git Description --- First patch for HIVE-8649, to increase the number of reducers for spark based on some info about the spark cluster. We need to add a SparkListener to handle cluster status change if such events are supported by spark. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java 5766787 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java 2dbb5a3 Diff: https://reviews.apache.org/r/27687/diff/ Testing --- Thanks, Jimmy Xiang
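Xuefu's suggestion in the thread above - fetch the cluster snapshot once per query inside SetSparkReducerParallelism rather than caching it in the long-lived client - amounts to simple memoization. This sketch uses a hypothetical supplier in place of the real call to the Spark master:

```java
import java.util.function.LongSupplier;

// Hypothetical per-query memoization: the first call fetches from the
// master; later calls within the same query reuse the cached value.
// The real holder would be the per-query optimizer object.
public class ClusterMemoryCache {
    private Long cached; // null until first fetch

    public long get(LongSupplier fetchFromMaster) {
        if (cached == null) {
            cached = fetchFromMaster.getAsLong(); // one master round-trip
        }
        return cached;
    }
}
```

Because the cache lives only as long as the query, a stale value cannot leak across the user session, which was Xuefu's objection to caching in SparkClient.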
[jira] [Created] (HIVE-8762) HiveMetaStore.BooleanPointer should be replaced with an AtomicBoolean
Alan Gates created HIVE-8762: Summary: HiveMetaStore.BooleanPointer should be replaced with an AtomicBoolean Key: HIVE-8762 URL: https://issues.apache.org/jira/browse/HIVE-8762 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates AtomicBoolean will serve the same purpose, with the added bonus that it will perform correctly if two threads try to write to it simultaneously. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
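The proposed swap can be illustrated directly. This is a minimal demo of java.util.concurrent.atomic.AtomicBoolean, not the HiveMetaStore code itself: compareAndSet makes "check then write" a single atomic step, which a plain mutable boolean holder cannot guarantee across threads.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class AtomicFlagDemo {
    public static void main(String[] args) {
        AtomicBoolean stop = new AtomicBoolean(false);

        // Only one caller can win the false -> true transition, even if
        // two threads attempt it simultaneously.
        boolean first = stop.compareAndSet(false, true);
        boolean second = stop.compareAndSet(false, true);

        System.out.println(first + " " + second); // prints "true false"
    }
}
```

A BooleanPointer-style holder with a bare field offers neither this atomicity nor the visibility guarantee of AtomicBoolean's volatile semantics.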
[jira] [Commented] (HIVE-8711) DB deadlocks not handled in TxnHandler for Postgres, Oracle, and SQLServer
[ https://issues.apache.org/jira/browse/HIVE-8711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200854#comment-14200854 ] Alan Gates commented on HIVE-8711: -- Created HIVE-8762 for the BooleanPointer issue brought up by Eugene. DB deadlocks not handled in TxnHandler for Postgres, Oracle, and SQLServer -- Key: HIVE-8711 URL: https://issues.apache.org/jira/browse/HIVE-8711 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8711.2.patch, HIVE-8711.patch TxnHandler.detectDeadlock has code to catch deadlocks in MySQL and Derby. But it does not detect a deadlock for Postgres, Oracle, or SQLServer -- This message was sent by Atlassian JIRA (v6.3.4#6332)