[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join
[ https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199970#comment-14199970 ]

Gunther Hagleitner commented on HIVE-8745:
------------------------------------------

[~xuefuz] [~jdere] is right. You can't have it both ways. I don't see how you create an object that compares as equal on the byte level but then magically reconstructs additional information on deserialization. You could add info to the value part of the MR key/value tuple, but that's an unnecessarily complex solution.

As [~jdere] says: this is a regression, and I think we should revert HIVE-7373. The other option would be to pad all values to the column spec and make sure we compute the spec as the max for the join keys. I'm not sure why you were against that in the first place - it seems that's what most DBs do. However, that's complicated and should be tackled in 0.15.0.

Joins on decimal keys return different results whether they are run as reduce join or map join
----------------------------------------------------------------------------------------------

                Key: HIVE-8745
                URL: https://issues.apache.org/jira/browse/HIVE-8745
            Project: Hive
         Issue Type: Bug
   Affects Versions: 0.14.0
           Reporter: Gunther Hagleitner
           Assignee: Jason Dere
           Priority: Critical
            Fix For: 0.14.0
        Attachments: join_test.q

See attached .q file to reproduce. The difference seems to be whether trailing 0s are considered the same value or not.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
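For context (not code from the thread): Hive's decimal type behaves like java.math.BigDecimal here, where values that are numerically equal can still carry different scales. A minimal sketch using plain BigDecimal illustrates why a byte-for-byte key comparison and numeric equality can disagree on trailing zeros, and why padding to a common scale (the "pad all values to the column spec" option above) restores agreement:

```java
import java.math.BigDecimal;

public class TrailingZeroDemo {
    // Numeric equality ignores scale (trailing zeros).
    public static boolean numericallyEqual(BigDecimal a, BigDecimal b) {
        return a.compareTo(b) == 0;
    }

    // equals() is scale-sensitive, like a byte-for-byte serialized-key
    // comparison: 2.0 and 2.00 serialize differently.
    public static boolean scaleSensitiveEqual(BigDecimal a, BigDecimal b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        BigDecimal x = new BigDecimal("2.0");
        BigDecimal y = new BigDecimal("2.00");
        System.out.println(numericallyEqual(x, y));      // true
        System.out.println(scaleSensitiveEqual(x, y));   // false
        // Padding both values to a common scale makes the
        // representations match again:
        System.out.println(x.setScale(2).equals(y));     // true
    }
}
```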
[jira] [Updated] (HIVE-8122) Make use of SearchArgument classes for Parquet SERDE
[ https://issues.apache.org/jira/browse/HIVE-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdinand Xu updated HIVE-8122:
-------------------------------
    Attachment: HIVE-8122.1.patch

Thanks, Szehon, for your update. Updated the patch with the following changes:
1. Fix code-style issues
2. Fix failed cases
3. Fix NPE issues

Make use of SearchArgument classes for Parquet SERDE
----------------------------------------------------

                Key: HIVE-8122
                URL: https://issues.apache.org/jira/browse/HIVE-8122
            Project: Hive
         Issue Type: Sub-task
           Reporter: Brock Noland
           Assignee: Ferdinand Xu
        Attachments: HIVE-8122.1.patch, HIVE-8122.patch

ParquetSerde could be much cleaner if we used SearchArgument and associated classes like ORC does:
https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgument.java
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1416#comment-1416 ]

Hive QA commented on HIVE-8732:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679704/HIVE-8732.patch

{color:red}ERROR:{color} -1 due to 32 failed/errored test(s), 6680 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_split_elimination
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge6
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_10_0
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testCombinationInputFormatWithAcid
org.apache.hadoop.hive.ql.io.orc.TestOrcSplitElimination.testSplitEliminationComplexExpr
org.apache.hadoop.hive.ql.io.orc.TestOrcSplitElimination.testSplitEliminationLargeMaxSplit
org.apache.hadoop.hive.ql.io.orc.TestOrcSplitElimination.testSplitEliminationSmallMaxSplit
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1658/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1658/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1658/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 32 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679704 - PreCommit-HIVE-TRUNK-Build

ORC string statistics are not merged correctly
----------------------------------------------

                Key: HIVE-8732
                URL: https://issues.apache.org/jira/browse/HIVE-8732
            Project: Hive
         Issue Type: Bug
         Components: File Formats
           Reporter: Owen O'Malley
           Assignee: Owen O'Malley
           Priority: Blocker
            Fix For: 0.14.0
        Attachments: HIVE-8732.patch, HIVE-8732.patch

Currently ORC's string statistics do not merge correctly, causing incorrect maximum values.
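Not code from the patch, but the kind of merge logic at stake can be sketched as follows. The hypothetical StringStats class below (a stand-in, not ORC's actual statistics classes) combines per-stripe min/max statistics; a correct merge must take the lexicographic minimum of minimums and maximum of maximums, and the reported bug is of the kind where a merge keeps a stale bound instead:

```java
public class StringStatsMerge {
    static final class StringStats {
        String min;
        String max;
        StringStats(String min, String max) { this.min = min; this.max = max; }

        // Merge another stripe's statistics into this one: the combined
        // range must cover both inputs.
        void merge(StringStats other) {
            if (other.min != null && (min == null || other.min.compareTo(min) < 0)) {
                min = other.min;
            }
            if (other.max != null && (max == null || other.max.compareTo(max) > 0)) {
                max = other.max;
            }
        }
    }

    // Convenience wrapper: merge two [min, max] ranges.
    public static String[] mergedRange(String[] a, String[] b) {
        StringStats s = new StringStats(a[0], a[1]);
        s.merge(new StringStats(b[0], b[1]));
        return new String[] { s.min, s.max };
    }

    public static void main(String[] args) {
        String[] merged = mergedRange(new String[]{"apple", "pear"},
                                      new String[]{"fig", "zebra"});
        System.out.println(merged[0] + " .. " + merged[1]); // apple .. zebra
    }
}
```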
[jira] [Commented] (HIVE-8726) Collect Spark TaskMetrics and build job statistic[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1425#comment-1425 ]

Hive QA commented on HIVE-8726:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679804/HIVE-8726.1-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7123 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetTimestampUtils.testTimezone
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/317/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/317/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-317/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679804 - PreCommit-HIVE-SPARK-Build

Collect Spark TaskMetrics and build job statistic[Spark Branch]
---------------------------------------------------------------

                Key: HIVE-8726
                URL: https://issues.apache.org/jira/browse/HIVE-8726
            Project: Hive
         Issue Type: Sub-task
         Components: Spark
           Reporter: Chengxiang Li
           Assignee: Chengxiang Li
             Labels: Spark-M3
        Attachments: HIVE-8726.1-spark.patch

Implement SparkListener to collect TaskMetrics, and build SparkStatistic.
[jira] [Commented] (HIVE-8073) Go thru all operator plan optimizations and disable those that are not suitable for Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200023#comment-14200023 ]

Rui Li commented on HIVE-8073:
------------------------------

Hi [~xuefuz], I've investigated all the optimizations in {{Optimizer}} and I don't think any of them is unsuitable for Spark. I'm not saying they will all work properly with Spark (we need to enable more tests to catch that), but I think the ideas behind them apply to Spark just as they apply to MR.

Go thru all operator plan optimizations and disable those that are not suitable for Spark [Spark Branch]
--------------------------------------------------------------------------------------------------------

                Key: HIVE-8073
                URL: https://issues.apache.org/jira/browse/HIVE-8073
            Project: Hive
         Issue Type: Task
         Components: Spark
           Reporter: Xuefu Zhang
           Assignee: Rui Li

I have seen some optimization done in the logical plan that's not applicable, such as in HIVE-8054. We should go thru all those optimizations to identify if there are any others.
[jira] [Commented] (HIVE-8753) TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200035#comment-14200035 ]

Hive QA commented on HIVE-8753:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679735/HIVE-8753.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6674 tests executed

*Failed tests:*
{noformat}
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1659/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1659/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1659/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679735 - PreCommit-HIVE-TRUNK-Build

TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk
-------------------------------------------------------------------------

                Key: HIVE-8753
                URL: https://issues.apache.org/jira/browse/HIVE-8753
            Project: Hive
         Issue Type: Test
         Components: Logical Optimizer
   Affects Versions: 0.15.0
           Reporter: Ashutosh Chauhan
           Assignee: Ashutosh Chauhan
        Attachments: HIVE-8753.patch

Because of HIVE-7111, it needs a .q.out update.
[jira] [Commented] (HIVE-7777) Add CSV Serde based on OpenCSV
[ https://issues.apache.org/jira/browse/HIVE-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200053#comment-14200053 ]

Alon Goldshuv commented on HIVE-7777:
-------------------------------------

While the serde works fine, it has an issue which is quite serious IMO - it forces all the column types to string. This means that running a query on data that isn't all string-typed can return wrong query results. In the unit tests I see a single example of a table using all string columns, and in the tests linked here there are many tables with non-string types, but all the queries seem to be simple COUNT(*), which won't catch the problem.

Consider the following example:
{noformat}
CREATE EXTERNAL TABLE test (totalprice DECIMAL(38,10))
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
WITH serdeproperties ("separatorChar" = ",", "quoteChar" = "'", "escapeChar" = "\\")
STORED AS TEXTFILE
LOCATION 'some location'
tblproperties ("skip.header.line.count"="1");
{noformat}

Now consider this SQL:
{noformat}
hive> select min(totalprice) from test;
{noformat}
In this case, given my data, the result should have been 874.89, but the actual result was 11.57 (as it comes first according to the byte ordering of a string type). This is a wrong result.

{noformat}
hive> desc extended test;
OK
o_totalprice	string	from deserializer
...
{noformat}

I apologize if it's a false alarm and I'm misusing the DDL somehow. Otherwise this is a concern, as wrong query results are a bad thing...

Add CSV Serde based on OpenCSV
------------------------------

                Key: HIVE-7777
                URL: https://issues.apache.org/jira/browse/HIVE-7777
            Project: Hive
         Issue Type: Bug
         Components: Serializers/Deserializers
           Reporter: Ferdinand Xu
           Assignee: Ferdinand Xu
             Labels: TODOC14
            Fix For: 0.14.0
        Attachments: HIVE-7777.1.patch, HIVE-7777.2.patch, HIVE-7777.3.patch, HIVE-7777.patch, csv-serde-master.zip

There is no official CSV serde support for Hive, while there is an open source project on github (https://github.com/ogrodnek/csv-serde). CSV is a data format in high-frequency use.
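The lexicographic-ordering pitfall described above is easy to reproduce outside Hive. A small sketch (with made-up price values, not the reporter's data): any string starting with '1' sorts before one starting with '8', so MIN over string-typed columns can return a numerically larger value.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class StringMinPitfall {
    // What you get when a DECIMAL column is silently read as string:
    // MIN uses byte/lexicographic ordering.
    public static String stringMin(List<String> values) {
        return Collections.min(values);
    }

    // What the query author actually meant: numeric ordering.
    public static BigDecimal numericMin(List<String> values) {
        return values.stream().map(BigDecimal::new)
                .min(BigDecimal::compareTo).get();
    }

    public static void main(String[] args) {
        List<String> prices = Arrays.asList("11235.57", "874.89", "9001.00");
        System.out.println(stringMin(prices));   // 11235.57 (lexicographic!)
        System.out.println(numericMin(prices));  // 874.89
    }
}
```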
[jira] [Commented] (HIVE-8611) grant/revoke syntax should support additional objects for authorization plugins
[ https://issues.apache.org/jira/browse/HIVE-8611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200062#comment-14200062 ]

Hive QA commented on HIVE-8611:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679742/HIVE-8611.3.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6678 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_grant_server
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_grant_uri
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1660/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1660/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1660/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679742 - PreCommit-HIVE-TRUNK-Build

grant/revoke syntax should support additional objects for authorization plugins
-------------------------------------------------------------------------------

                Key: HIVE-8611
                URL: https://issues.apache.org/jira/browse/HIVE-8611
            Project: Hive
         Issue Type: Bug
         Components: Authentication, SQL
   Affects Versions: 0.13.0
           Reporter: Prasad Mujumdar
           Assignee: Prasad Mujumdar
            Fix For: 0.14.0
        Attachments: HIVE-8611.1.patch, HIVE-8611.2.patch, HIVE-8611.2.patch, HIVE-8611.3.patch

The authorization framework supports URI and global objects. The SQL syntax, however, doesn't allow granting privileges on these objects. We should allow the compiler to parse these so that they can be handled by authorization plugins.
[jira] [Commented] (HIVE-8711) DB deadlocks not handled in TxnHandler for Postgres, Oracle, and SQLServer
[ https://issues.apache.org/jira/browse/HIVE-8711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200099#comment-14200099 ]

Hive QA commented on HIVE-8711:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679759/HIVE-8711.2.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6674 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1661/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1661/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1661/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679759 - PreCommit-HIVE-TRUNK-Build

DB deadlocks not handled in TxnHandler for Postgres, Oracle, and SQLServer
--------------------------------------------------------------------------

                Key: HIVE-8711
                URL: https://issues.apache.org/jira/browse/HIVE-8711
            Project: Hive
         Issue Type: Bug
         Components: Transactions
   Affects Versions: 0.14.0
           Reporter: Alan Gates
           Assignee: Alan Gates
           Priority: Critical
            Fix For: 0.14.0
        Attachments: HIVE-8711.2.patch, HIVE-8711.patch

TxnHandler.detectDeadlock has code to catch deadlocks in MySQL and Derby, but it does not detect a deadlock for Postgres, Oracle, or SQLServer.
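For reference (a sketch, not code from the patch): a cross-database deadlock check generally has to inspect vendor-specific SQLSTATEs and error codes, since each database signals a deadlock differently - MySQL and Derby use SQLSTATE "40001", Postgres uses "40P01" (deadlock_detected), Oracle raises ORA-00060 (error code 60), and SQL Server uses error 1205.

```java
import java.sql.SQLException;

public class DeadlockDetector {
    /**
     * Best-effort, cross-database test for whether a SQLException
     * represents a deadlock. The codes checked: SQLSTATE "40001"
     * (MySQL, Derby, SQL-standard serialization failure), SQLSTATE
     * "40P01" (Postgres), vendor code 60 (Oracle ORA-00060), and
     * vendor code 1205 (SQL Server deadlock victim).
     */
    public static boolean isDeadlock(SQLException e) {
        String state = e.getSQLState();
        int code = e.getErrorCode();
        if ("40001".equals(state)) return true;
        if ("40P01".equals(state)) return true;
        return code == 60 || code == 1205;
    }

    public static void main(String[] args) {
        SQLException pg = new SQLException("deadlock detected", "40P01", 0);
        SQLException syntax = new SQLException("syntax error", "42000", 0);
        System.out.println(isDeadlock(pg));     // true
        System.out.println(isDeadlock(syntax)); // false
    }
}
```

In practice the retry loop around such a check also needs a bounded retry count and a short backoff, since a deadlock victim can deadlock again immediately.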
[jira] [Commented] (HIVE-8612) Support metadata result filter hooks
[ https://issues.apache.org/jira/browse/HIVE-8612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200142#comment-14200142 ]

Hive QA commented on HIVE-8612:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679764/HIVE-8612.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6678 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1662/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1662/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1662/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679764 - PreCommit-HIVE-TRUNK-Build

Support metadata result filter hooks
------------------------------------

                Key: HIVE-8612
                URL: https://issues.apache.org/jira/browse/HIVE-8612
            Project: Hive
         Issue Type: Bug
         Components: Authorization, Metastore
   Affects Versions: 0.13.1
           Reporter: Prasad Mujumdar
           Assignee: Prasad Mujumdar
            Fix For: 0.14.0, 0.15.0
        Attachments: HIVE-8612.1.patch, HIVE-8612.2.patch, HIVE-8612.3.patch

Support a metadata filter hook for the metastore client. This will be useful for authorization plugins on HiveServer2 to filter metadata results, especially in non-impersonation mode, where the metastore doesn't know the end user's identity.
[jira] [Commented] (HIVE-8548) Integrate with remote Spark context after HIVE-8528 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200155#comment-14200155 ]

Chengxiang Li commented on HIVE-8548:
-------------------------------------

[~xuefuz], if we set spark.master to local, Hive users connect to HiveServer2, which uses a local Spark context to submit jobs with a separate session for each user, so we may still hit the multiple-Spark-context issue. So HiveServer2 could only use a remote Spark context, while the CLI may use either a local or a remote Spark context; we can add a parameter to configure this and set the local Spark context as the default. What do you think about it?

Integrate with remote Spark context after HIVE-8528 [Spark Branch]
------------------------------------------------------------------

                Key: HIVE-8548
                URL: https://issues.apache.org/jira/browse/HIVE-8548
            Project: Hive
         Issue Type: Sub-task
         Components: Spark
           Reporter: Xuefu Zhang
           Assignee: Chengxiang Li

With HIVE-8528, HiveServer2 should use the remote Spark context to submit jobs, monitor progress, etc. This is necessary if Hive runs on a standalone cluster, YARN, or Mesos. If Hive runs with spark.master=local, we should continue using SparkContext in the current way.
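The selection rule proposed above can be sketched as a tiny decision function (hypothetical names and flag, not Hive's actual classes or configuration keys): HiveServer2 always gets a remote context, since per-user sessions would otherwise collide on a single JVM-local SparkContext, while the CLI defaults to local but can opt into remote.

```java
public class SparkContextChooser {
    enum ContextKind { LOCAL, REMOTE }

    /**
     * Hypothetical selection logic for the proposal above:
     * - HiveServer2 must use a remote Spark context (multiple user
     *   sessions cannot share one local SparkContext safely);
     * - the CLI defaults to a local context, switchable via a flag.
     */
    public static ContextKind choose(boolean isHiveServer2, boolean cliPrefersRemote) {
        if (isHiveServer2) {
            return ContextKind.REMOTE;
        }
        return cliPrefersRemote ? ContextKind.REMOTE : ContextKind.LOCAL;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, false));  // REMOTE
        System.out.println(choose(false, false)); // LOCAL
        System.out.println(choose(false, true));  // REMOTE
    }
}
```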
[jira] [Assigned] (HIVE-8542) Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li reassigned HIVE-8542:
----------------------------
    Assignee: Rui Li

Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]
----------------------------------------------------------------------------

                Key: HIVE-8542
                URL: https://issues.apache.org/jira/browse/HIVE-8542
            Project: Hive
         Issue Type: Test
         Components: Spark
           Reporter: Chao
           Assignee: Rui Li

Currently, in the Spark branch, results for these two test files are very different from MR's. We need to find out the cause for this, and identify any potential bug in our current implementation.
[jira] [Commented] (HIVE-8542) Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200162#comment-14200162 ]

Rui Li commented on HIVE-8542:
------------------------------

Hi [~csun], let me take this one, as it seems to be a bug in group by.

Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]
----------------------------------------------------------------------------

                Key: HIVE-8542
                URL: https://issues.apache.org/jira/browse/HIVE-8542
            Project: Hive
         Issue Type: Test
         Components: Spark
           Reporter: Chao
           Assignee: Rui Li

Currently, in the Spark branch, results for these two test files are very different from MR's. We need to find out the cause for this, and identify any potential bug in our current implementation.
[jira] [Commented] (HIVE-8636) CBO: split cbo_correctness test
[ https://issues.apache.org/jira/browse/HIVE-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200181#comment-14200181 ]

Hive QA commented on HIVE-8636:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679769/HIVE-8636.02.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6699 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1663/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1663/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1663/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679769 - PreCommit-HIVE-TRUNK-Build

CBO: split cbo_correctness test
-------------------------------

                Key: HIVE-8636
                URL: https://issues.apache.org/jira/browse/HIVE-8636
            Project: Hive
         Issue Type: Bug
           Reporter: Sergey Shelukhin
           Assignee: Sergey Shelukhin
        Attachments: HIVE-8636.01.patch, HIVE-8636.01.patch, HIVE-8636.02.patch, HIVE-8636.patch

The CBO correctness test is extremely annoying - it runs forever; if anything fails, it's hard to debug due to the volume of logs from all the stuff, and it doesn't run further, so if multiple things fail they can only be discovered one by one. Also, SORT_QUERY_RESULTS cannot be used, because some queries presumably use sorting. It should be split into separate tests; the numbers in there now may be good as boundaries.
[jira] [Created] (HIVE-8758) Fix hadoop-1 build [Spark Branch]
Xuefu Zhang created HIVE-8758:
---------------------------------

            Summary: Fix hadoop-1 build [Spark Branch]
                Key: HIVE-8758
                URL: https://issues.apache.org/jira/browse/HIVE-8758
            Project: Hive
         Issue Type: Bug
         Components: Spark
           Reporter: Xuefu Zhang
           Assignee: Jimmy Xiang

This may mean merging patches from trunk and fixing whatever problem is specific to the Spark branch. Here are user-reported problems:

Problem 1:
{code}
Hive Serde ......... FAILURE [ 2.357 s]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-serde: Compilation failure: Compilation failure:
[ERROR] /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[27,24] cannot find symbol
[ERROR] symbol: class Nullable
[ERROR] location: package javax.annotation
[ERROR] /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[67,36] cannot find symbol
[ERROR] symbol: class Nullable
[ERROR] location: class org.apache.hadoop.hive.serde2.AbstractSerDe
{code}
My understanding: it looks like the Nullable annotation was added recently in the branch. Added the below dependency in the project hive-serde:
{code}
<dependency>
  <groupId>com.google.code.findbugs</groupId>
  <artifactId>jsr305</artifactId>
  <version>3.0.0</version>
</dependency>
{code}

Problem 2: After adding the dependency for hive-serde, got the below compilation error:
{code}
[INFO] Hive Query Language FAILURE [01:35 min]
/data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java:[35,39] error: package org.apache.hadoop.mapreduce.util does not exist
{code}
The dependency jar for hadoop-1 (hadoop-core-1.2.1.jar) does not have the package "org.apache.hadoop.mapreduce.util". To circumvent it, added the below dependency, where we had the package (not sure it is right - I badly wanted to make the build successful):
{code}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>0.23.11</version>
</dependency>
{code}

Problem 3: After making the above change, the build again failed in the same project, at the file /data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java. In the snippet below, taken from the file, we can see that "fileStatus.isFile()" is called, which is not available in the "org.apache.hadoop.fs.FileStatus" hadoop-1 API.
{code}
for (FileStatus fileStatus: fs.listStatus(folder)) {
  Path filePath = fileStatus.getPath();
  if (!fileStatus.isFile()) {
    throw new HiveException("Error, not a file: " + filePath);
{code}
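The usual workaround for Problem 3 is to negate the directory check, since hadoop-1's FileStatus exposes isDir() while isFile() only appeared in the hadoop-2 API. A sketch, illustrated with a minimal stand-in interface rather than Hadoop's real FileStatus so it stands alone:

```java
public class FileStatusShim {
    // Minimal stand-in for the part of org.apache.hadoop.fs.FileStatus
    // that exists on hadoop-1: there is isDir(), but no isFile().
    interface Hadoop1FileStatus {
        boolean isDir();
    }

    // Negating the directory check compiles against hadoop-1 and behaves
    // the same as isFile() for plain files and directories on hadoop-2.
    public static boolean isFile(Hadoop1FileStatus status) {
        return !status.isDir();
    }

    public static void main(String[] args) {
        Hadoop1FileStatus plainFile = () -> false;
        Hadoop1FileStatus directory = () -> true;
        System.out.println(isFile(plainFile)); // true
        System.out.println(isFile(directory)); // false
    }
}
```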
[jira] [Updated] (HIVE-8758) Fix hadoop-1 build [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-8758:
------------------------------
    Issue Type: Sub-task  (was: Bug)
        Parent: HIVE-7292

Fix hadoop-1 build [Spark Branch]
---------------------------------

                Key: HIVE-8758
                URL: https://issues.apache.org/jira/browse/HIVE-8758
            Project: Hive
         Issue Type: Sub-task
         Components: Spark
           Reporter: Xuefu Zhang
           Assignee: Jimmy Xiang

This may mean merging patches from trunk and fixing whatever problem is specific to the Spark branch. Here are user-reported problems:

Problem 1:
{code}
Hive Serde ......... FAILURE [ 2.357 s]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-serde: Compilation failure: Compilation failure:
[ERROR] /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[27,24] cannot find symbol
[ERROR] symbol: class Nullable
[ERROR] location: package javax.annotation
[ERROR] /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[67,36] cannot find symbol
[ERROR] symbol: class Nullable
[ERROR] location: class org.apache.hadoop.hive.serde2.AbstractSerDe
{code}
My understanding: it looks like the Nullable annotation was added recently in the branch. Added the below dependency in the project hive-serde:
{code}
<dependency>
  <groupId>com.google.code.findbugs</groupId>
  <artifactId>jsr305</artifactId>
  <version>3.0.0</version>
</dependency>
{code}

Problem 2: After adding the dependency for hive-serde, got the below compilation error:
{code}
[INFO] Hive Query Language FAILURE [01:35 min]
/data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java:[35,39] error: package org.apache.hadoop.mapreduce.util does not exist
{code}
The dependency jar for hadoop-1 (hadoop-core-1.2.1.jar) does not have the package "org.apache.hadoop.mapreduce.util". To circumvent it, added the below dependency, where we had the package (not sure it is right - I badly wanted to make the build successful):
{code}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>0.23.11</version>
</dependency>
{code}

Problem 3: After making the above change, the build again failed in the same project, at the file /data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java. In the snippet below, taken from the file, we can see that "fileStatus.isFile()" is called, which is not available in the "org.apache.hadoop.fs.FileStatus" hadoop-1 API.
{code}
for (FileStatus fileStatus: fs.listStatus(folder)) {
  Path filePath = fileStatus.getPath();
  if (!fileStatus.isFile()) {
    throw new HiveException("Error, not a file: " + filePath);
{code}
[jira] [Commented] (HIVE-8758) Fix hadoop-1 build [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200185#comment-14200185 ] Xuefu Zhang commented on HIVE-8758: --- [~jxiang], since problem #3 seemed related to your recent change, I assigned the JIRA to you for investigation/fix. Thanks. Fix hadoop-1 build [Spark Branch] - Key: HIVE-8758 URL: https://issues.apache.org/jira/browse/HIVE-8758 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang This may mean merging patches from trunk and fixing whatever problem specific to Spark branch. Here are user reported problems: Problem 1: {code} Hive Serde . FAILURE [ 2.357 s] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-serde: Compilation failure: Compilation failure: [ERROR] /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[27,24] cannot find symbol [ERROR] symbol: class Nullable [ERROR] location: package javax.annotation [ERROR] /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[67,36] cannot find symbol [ERROR] symbol: class Nullable [ERROR] location: class org.apache.hadoop.hive.serde2.AbstractSerDe {code} My understanding: Looks the Nullable annotation was recently added in the recent branch. 
Added the below dependency in the project hive-serde:
{code}
<dependency>
  <groupId>com.google.code.findbugs</groupId>
  <artifactId>jsr305</artifactId>
  <version>3.0.0</version>
</dependency>
{code}
Problem 2: After adding the dependency for hive-serde, got the below compilation error:
{code}
[INFO] Hive Query Language FAILURE [01:35 min]
/data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java:[35,39] error: package org.apache.hadoop.mapreduce.util does not exist
{code}
The hadoop-1 dependency jar (hadoop-core-1.2.1.jar) does not contain the package "org.apache.hadoop.mapreduce.util". To circumvent this, added the below dependency, which does contain that package (not sure it is right; I just wanted to make the build succeed):
{code}
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>0.23.11</version>
  </dependency>
</dependencies>
{code}
Problem 3: After making the above change, the build failed again in the same project, in the file /data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java. In the snippet below, taken from that file, "fileStatus.isFile()" is called, which is not available in the hadoop-1 "org.apache.hadoop.fs.FileStatus" API.
{code}
for (FileStatus fileStatus: fs.listStatus(folder)) {
  Path filePath = fileStatus.getPath();
  if (!fileStatus.isFile()) {
    throw new HiveException("Error, not a file: " + filePath);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
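One way hadoop-1/hadoop-2 API gaps like the missing {{FileStatus.isFile()}} are commonly bridged is a reflection-based shim that tries the newer method and falls back to negating the hadoop-1 {{isDir()}}. The sketch below is purely illustrative (the class and its name are hypothetical, not part of any Hive patch), and it uses only reflection so it is self-contained:

```java
import java.lang.reflect.Method;

// Hypothetical compatibility shim (NOT from the patch): prefer the hadoop-2
// FileStatus.isFile() method; on hadoop-1, where isFile() does not exist,
// fall back to negating isDir().
public class FileStatusCompat {
    public static boolean isFile(Object fileStatus) {
        try {
            // hadoop-2 and later expose isFile() directly
            Method m = fileStatus.getClass().getMethod("isFile");
            return (Boolean) m.invoke(fileStatus);
        } catch (NoSuchMethodException e) {
            try {
                // hadoop-1: FileStatus only exposes isDir()
                Method m = fileStatus.getClass().getMethod("isDir");
                return !(Boolean) m.invoke(fileStatus);
            } catch (Exception inner) {
                throw new IllegalStateException(inner);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same effect is often achieved in Hive through the shims layer rather than reflection; this snippet only shows the shape of the compatibility problem.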
[jira] [Commented] (HIVE-8073) Go thru all operator plan optimizations and disable those that are not suitable for Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200191#comment-14200191 ] Xuefu Zhang commented on HIVE-8073: --- Hi [~ruili], thanks for the investigation. I think we can close this task now. Go thru all operator plan optimizations and disable those that are not suitable for Spark [Spark Branch] Key: HIVE-8073 URL: https://issues.apache.org/jira/browse/HIVE-8073 Project: Hive Issue Type: Task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Fix For: spark-branch I have seen some optimizations done in the logical plan that are not applicable, such as in HIVE-8054. We should go thru all those optimizations to identify any such cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8073) Go thru all operator plan optimizations and disable those that are not suitable for Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8073. --- Resolution: Done Fix Version/s: spark-branch Go thru all operator plan optimizations and disable those that are not suitable for Spark [Spark Branch] Key: HIVE-8073 URL: https://issues.apache.org/jira/browse/HIVE-8073 Project: Hive Issue Type: Task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Fix For: spark-branch I have seen some optimizations done in the logical plan that are not applicable, such as in HIVE-8054. We should go thru all those optimizations to identify any such cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5469) support nullif
[ https://issues.apache.org/jira/browse/HIVE-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200211#comment-14200211 ] Daniel Dinnyes commented on HIVE-5469: -- Thanks for the workaround code snippet. support nullif -- Key: HIVE-5469 URL: https://issues.apache.org/jira/browse/HIVE-5469 Project: Hive Issue Type: Improvement Affects Versions: 0.11.0 Reporter: N Campbell Assignee: Navis Priority: Minor Attachments: HIVE-5469.1.patch.txt, HIVE-5469.2.patch.txt, HIVE-5469.3.patch.txt Have to express case expression to work around lack of NULLIF select nullif(cint, 1) from tint select cint, case when cint = 1 then null else cint end from tint -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8548) Integrate with remote Spark context after HIVE-8528 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200248#comment-14200248 ] Xuefu Zhang commented on HIVE-8548: --- Hi [~chengxiang li], I think nobody is going to deploy HS2 in production with local mode, and HS2 embedded mode (embedded in Beeline) should behave like Hive CLI. Thus, I think it might be better to keep them consistent. Based on this, I think local should be the default whether it's Hive CLI or HS2, and they actually share the same code path. In addition, local should refer to local spark context in both cases. As to the concurrency problem, we just need some proper documentation. Remote spark context should be used when {{spark.master != local}}. I think his approach makes the implementation simpler with seemingly better usability. We can revisit this at a later phase. Integrate with remote Spark context after HIVE-8528 [Spark Branch] -- Key: HIVE-8548 URL: https://issues.apache.org/jira/browse/HIVE-8548 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li With HIVE-8528, HiveServer2 should use remote Spark context to submit jobs and monitor progress, etc. This is necessary if Hive runs on a standalone cluster, Yarn, or Mesos. If Hive runs with spark.master=local, we should continue using SparkContext in the current way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
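The routing rule proposed in the comment (local SparkContext when {{spark.master}} is local, remote Spark context otherwise) can be sketched as below. This is an illustrative stand-in, not Hive's actual API; the class and enum names are invented for the example, and treating {{local[N]}} variants as local is an assumption borrowed from Spark's master-URL conventions:

```java
// Illustrative sketch of the dispatch policy discussed above (names are
// hypothetical, not Hive's actual classes).
public class SparkContextChooser {
    public enum Kind { LOCAL, REMOTE }

    public static Kind choose(String sparkMaster) {
        // "local" and variants such as "local[4]" run in-process; anything
        // else (standalone, YARN, Mesos masters) goes through the remote
        // Spark context.
        if (sparkMaster != null && sparkMaster.startsWith("local")) {
            return Kind.LOCAL;
        }
        return Kind.REMOTE;
    }
}
```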
[jira] [Comment Edited] (HIVE-8548) Integrate with remote Spark context after HIVE-8528 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200248#comment-14200248 ] Xuefu Zhang edited comment on HIVE-8548 at 11/6/14 2:56 PM: Hi [~chengxiang li], I think nobody is going to deploy HS2 in production with local mode and HS2 embedded mode (embedded in Beeline) should behave like Hive CLI. Thus, I think it might be better to keep them consistent. Based on this, I think local should be the default whether it's Hive CLI or HS2, and they actually share the same code path (w.r.t. spark integration). In addition, local should refer to local spark context in both cases. As to the concurrentcy problem, we just need some proper documentation. Remote spark context should be used when {{spark.master != local}}. I think his approach makes the implemention simpler with seemingly better usability. We can revist this at a later phase. was (Author: xuefuz): Hi [~chengxiang li], I think nobody is going to deploy HS2 in production with local mode and HS2 embedded mode (embedded in Beeline) should behave like Hive CLI. Thus, I think it might be better to keep them consistent. Based on this, I think local should be the default whether it's Hive CLI or HS2, and they actually share the same code path. In addition, local should refer to local spark context in both cases. As to the concurrentcy problem, we just need some proper documentation. Remote spark context should be used when {{spark.master != local}}. I think his approach makes the implemention simpler with seemingly better usability. We can revist this at a later phase. Integrate with remote Spark context after HIVE-8528 [Spark Branch] -- Key: HIVE-8548 URL: https://issues.apache.org/jira/browse/HIVE-8548 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li With HIVE-8528, HiverSever2 should use remote Spark context to submit job and monitor progress, etc. 
This is necessary if Hive runs on a standalone cluster, Yarn, or Mesos. If Hive runs with spark.master=local, we should continue using SparkContext in the current way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8735) statistics update can fail due to long paths
[ https://issues.apache.org/jira/browse/HIVE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200255#comment-14200255 ] Hive QA commented on HIVE-8735: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12679773/HIVE-8735.02.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6674 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1664/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1664/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1664/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12679773 - PreCommit-HIVE-TRUNK-Build statistics update can fail due to long paths Key: HIVE-8735 URL: https://issues.apache.org/jira/browse/HIVE-8735 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8735.01.patch, HIVE-8735.02.patch, HIVE-8735.patch {noformat} 2014-11-04 01:34:38,610 ERROR jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(198)) - Error during publishing statistics. 
java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub' to length 255.
	at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
	at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:147)
	at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:144)
	at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2910)
	at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:160)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1153)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:992)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:205)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
Caused by:
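The failure above is Derby refusing to truncate a value that exceeds the stats table's VARCHAR(255) key column. A hedged sketch of the constraint (the column width is taken from the error message; the class and method names below are invented for illustration, not Hive's actual stats code):

```java
// Illustrative only: a stats publisher could check the key length up front
// instead of letting the database reject the INSERT. The 255 limit mirrors
// the VARCHAR(255) column in the Derby-backed stats schema from the log.
public class StatsKeyCheck {
    static final int ID_COLUMN_MAX = 255;

    public static boolean fits(String statsKey) {
        return statsKey != null && statsKey.length() <= ID_COLUMN_MAX;
    }
}
```

Long paths such as table location + .hive-staging + temporary subdirectories easily exceed this limit, which is exactly what the stack trace shows.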
[jira] [Updated] (HIVE-8748) jdbc uber jar is missing commons-logging
[ https://issues.apache.org/jira/browse/HIVE-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8748: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk. [~hagleitn] ok for 0.14 ? jdbc uber jar is missing commons-logging Key: HIVE-8748 URL: https://issues.apache.org/jira/browse/HIVE-8748 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.15.0 Attachments: HIVE-8748.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5077) Provide an option to run local task in process
[ https://issues.apache.org/jira/browse/HIVE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5077: --- Resolution: Duplicate Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Fixed via HIVE-7271, after which a local task can be run in-process via {{hive.exec.submit.local.task.via.child}} if desired. Provide an option to run local task in process -- Key: HIVE-5077 URL: https://issues.apache.org/jira/browse/HIVE-5077 Project: Hive Issue Type: Bug Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.14.0 Attachments: HIVE-5077.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-3109) metastore state not cleared
[ https://issues.apache.org/jira/browse/HIVE-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-3109. Resolution: Cannot Reproduce Doesn't repro anymore. metastore state not cleared --- Key: HIVE-3109 URL: https://issues.apache.org/jira/browse/HIVE-3109 Project: Hive Issue Type: Bug Reporter: Namit Jain Assignee: Ashutosh Chauhan When some of the tests are run in a certain order, random bugs are encountered. ant test -Dtestcase=TestCliDriver -Dqfile=part_inherit_tbl_props.q,stats1.q leads to an error in stats1.q We ran into this error as part of parallel testing (HIVE-3085). As part of HIVE-3085, this will be fixed temporarily by clearing hive.metastore.partition.inherit.table.properties at the end of the test. But, in general, any property set in one .q file should not affect anything in other tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5054) Remove unused property submitviachild
[ https://issues.apache.org/jira/browse/HIVE-5054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5054: --- Resolution: Won't Fix Status: Resolved (was: Patch Available) HIVE-7271 relies on this to speed up unit tests Remove unused property submitviachild - Key: HIVE-5054 URL: https://issues.apache.org/jira/browse/HIVE-5054 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-5054.patch, HIVE-5054.patch This property only exists in HiveConf and is always set to false. Let's get rid of dead code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-1033) change default value of hive.exec.parallel to true
[ https://issues.apache.org/jira/browse/HIVE-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-1033: --- Assignee: (was: Ashutosh Chauhan) change default value of hive.exec.parallel to true -- Key: HIVE-1033 URL: https://issues.apache.org/jira/browse/HIVE-1033 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Attachments: HIVE-1033.2.patch, HIVE-1033.3.patch, hive.1033.1.patch There is no harm in changing it to true. Inside facebook, we have been testing it and it seems to be stable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6297) [Refactor] Move new Auth Interface to common/
[ https://issues.apache.org/jira/browse/HIVE-6297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6297: --- Resolution: Won't Fix Status: Resolved (was: Patch Available) Seems an impossible task now [Refactor] Move new Auth Interface to common/ -- Key: HIVE-6297 URL: https://issues.apache.org/jira/browse/HIVE-6297 Project: Hive Issue Type: Task Components: Security Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6297.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200337#comment-14200337 ] Brock Noland commented on HIVE-8744: works for me! hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large - Key: HIVE-8744 URL: https://issues.apache.org/jira/browse/HIVE-8744 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8744.1.patch This test is related to the bug HIVE-8065 where I am trying to support HDFS encryption. One of the enhancements to support it is to create a .hive-staging directory on the same table directory location where the query is executed. Now, when running the hbase_stats3.q test from a temporary directory that has a large path, then the new path, a combination of table location + .hive-staging + random temporary subdirectories, is too large to fit into the statistics table, so the path is truncated. This causes the following error: {noformat} 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error during publishing statistics. java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255. 
	at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
	at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
	at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
	at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
	at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
	at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
	... 30 more
Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
	at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown Source)
[jira] [Commented] (HIVE-7777) Add CSV Serde based on OpenCSV
[ https://issues.apache.org/jira/browse/HIVE-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200341#comment-14200341 ] Brock Noland commented on HIVE-7777: After some research, I think that was a limitation of the original Serde: https://github.com/ogrodnek/csv-serde However, we should be able to resolve this. Can you open a JIRA for adding non-string types to the OpenCSV Serde? Add CSV Serde based on OpenCSV -- Key: HIVE-7777 URL: https://issues.apache.org/jira/browse/HIVE-7777 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Ferdinand Xu Assignee: Ferdinand Xu Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-7777.1.patch, HIVE-7777.2.patch, HIVE-7777.3.patch, HIVE-7777.patch, csv-serde-master.zip There is no official CSV SerDe support in Hive, though there is an open source project on GitHub (https://github.com/ogrodnek/csv-serde). CSV is a very frequently used data format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8661) JDBC MinimizeJAR should be configurable in pom.xml
[ https://issues.apache.org/jira/browse/HIVE-8661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8661: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Gopal! JDBC MinimizeJAR should be configurable in pom.xml -- Key: HIVE-8661 URL: https://issues.apache.org/jira/browse/HIVE-8661 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Fix For: 0.15.0 Attachments: HIVE-8661.1.patch, HIVE-8661.2.patch A large amount of dev time is wasted waiting for JDBC to minimize JARs from 33Mb to 16Mb during developer cycles. This should only kick in during -Pdist, allowing it to be disabled during dev cycles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
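One common way to make the shade plugin's {{minimizeJar}} flag profile-dependent is to drive it from a Maven property that a {{dist}} profile flips on. The fragment below is only a sketch of that pattern, not the committed HIVE-8661 patch; the property name is invented for illustration:

```xml
<!-- Sketch only: minimizeJar is driven by a property that defaults to
     false, so dev builds skip minimization; -Pdist turns it back on. -->
<properties>
  <shade.minimize>false</shade.minimize>
</properties>

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <configuration>
        <minimizeJar>${shade.minimize}</minimizeJar>
      </configuration>
    </plugin>
  </plugins>
</build>

<profiles>
  <profile>
    <id>dist</id>
    <properties>
      <shade.minimize>true</shade.minimize>
    </properties>
  </profile>
</profiles>
```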
[jira] [Updated] (HIVE-8661) JDBC MinimizeJAR should be configurable in pom.xml
[ https://issues.apache.org/jira/browse/HIVE-8661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8661: --- Issue Type: Improvement (was: Bug) JDBC MinimizeJAR should be configurable in pom.xml -- Key: HIVE-8661 URL: https://issues.apache.org/jira/browse/HIVE-8661 Project: Hive Issue Type: Improvement Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Fix For: 0.15.0 Attachments: HIVE-8661.1.patch, HIVE-8661.2.patch A large amount of dev time is wasted waiting for JDBC to minimize JARs from 33Mb to 16Mb during developer cycles. This should only kick in during -Pdist, allowing it to be disabled during dev cycles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8661) JDBC MinimizeJAR should be configurable in pom.xml
[ https://issues.apache.org/jira/browse/HIVE-8661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8661: --- Affects Version/s: 0.14.0 JDBC MinimizeJAR should be configurable in pom.xml -- Key: HIVE-8661 URL: https://issues.apache.org/jira/browse/HIVE-8661 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Fix For: 0.15.0 Attachments: HIVE-8661.1.patch, HIVE-8661.2.patch A large amount of dev time is wasted waiting for JDBC to minimize JARs from 33Mb to 16Mb during developer cycles. This should only kick in during -Pdist, allowing it to be disabled during dev cycles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8754) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster
[ https://issues.apache.org/jira/browse/HIVE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200353#comment-14200353 ] Eugene Koifman commented on HIVE-8754: -- This is a WebHCat-only change, specifically around job submission. There are no unit tests that cover this. Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster --- Key: HIVE-8754 URL: https://issues.apache.org/jira/browse/HIVE-8754 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Priority: Critical Fix For: 0.14.0, 0.15.0 Attachments: HIVE-8754.2.patch, HIVE-8754.patch HIVE-8588 added support for this by copying jdbc jars to lib/ of the localized/exploded Sqoop tar. Unfortunately, in a secure cluster, Dist Cache intentionally sets permissions on exploded tars such that they are not writable. This needs to be fixed; otherwise users would have to modify their Sqoop tar to include the relevant jdbc jars, which is burdensome if different DBs are used and may create headaches around licensing issues. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8757) YARN dep in scheduler shim should be optional
[ https://issues.apache.org/jira/browse/HIVE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200360#comment-14200360 ] Hive QA commented on HIVE-8757: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12679778/HIVE-8757.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6674 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1665/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1665/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1665/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12679778 - PreCommit-HIVE-TRUNK-Build YARN dep in scheduler shim should be optional - Key: HIVE-8757 URL: https://issues.apache.org/jira/browse/HIVE-8757 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8757.patch The {{hadoop-yarn-server-resourcemanager}} dep in the scheduler shim should be optional so that yarn doesn't pollute dependent classpaths. Users who want to use this feature must provide the yarn classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
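The fix described above, marking the YARN dependency optional so it stays off dependent classpaths, would look roughly like this in the scheduler shim's pom.xml (a sketch using the coordinates named in the issue; Maven's {{<optional>true</optional>}} prevents the dependency from being inherited transitively):

```xml
<!-- Sketch of the change discussed in HIVE-8757: consumers who want the
     fair-scheduler feature must supply the YARN classes themselves. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
  <optional>true</optional>
</dependency>
```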
[jira] [Updated] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8744: -- Status: Open (was: Patch Available) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large - Key: HIVE-8744 URL: https://issues.apache.org/jira/browse/HIVE-8744 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8744.1.patch This test is related to the bug HIVE-8065 where I am trying to support HDFS encryption. One of the enhancements to support it is to create a .hive-staging directory on the same table directory location where the query is executed. Now, when running the hbase_stats3.q test from a temporary directory that has a large path, then the new path, a combination of table location + .hive-staging + random temporary subdirectories, is too large to fit into the statistics table, so the path is truncated. This causes the following error: {noformat} 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error during publishing statistics. java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255. 
	at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
	at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
	at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
	at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
	at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
	at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
	at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
	... 30 more
Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
	at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown Source)
	at
[jira] [Updated] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8744: -- Attachment: HIVE-8744.2.patch hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large - Key: HIVE-8744 URL: https://issues.apache.org/jira/browse/HIVE-8744 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8744.1.patch, HIVE-8744.2.patch This test is related to the bug HIVE-8065 where I am trying to support HDFS encryption. One of the enhancements to support it is to create a .hive-staging directory on the same table directory location where the query is executed. Now, when running the hbase_stats3.q test from a temporary directory that has a large path, then the new path, a combination of table location + .hive-staging + random temporary subdirectories, is too large to fit into the statistics table, so the path is truncated. This causes the following error: {noformat} 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error during publishing statistics. java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255. 
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
... 30 more
Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown Source)
at
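For illustration, a short self-contained sketch of how a staging path assembled from a long table location overflows a VARCHAR(255) key column. All directory names below are made up for the example; the real ptest path is truncated in the log above.

```java
// Illustration of the failure mode above: a staging path assembled from a
// long table location plus ".hive-staging" plus temporary subdirectories
// overflows a VARCHAR(255) statistics key column. The directory names
// below are hypothetical.
public class StagingPathLength {

    static String buildStagingPath() {
        StringBuilder sb = new StringBuilder(
            "pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.cloudera.com");
        // Deeply nested test workspace directories (made up for the example):
        for (String dir : new String[] {"ptest2", "working", "dir",
                "apache-svn-trunk-source", "itests", "qtest", "target",
                "localscratchdir", "warehouse", "stats_part", "ds=2014-11-04"}) {
            sb.append('/').append(dir);
        }
        // Hive appends the staging dir plus random temporary subdirectories:
        sb.append("/.hive-staging_hive_2014-11-04_08-57-36_680_1234567890123456789-1");
        sb.append("/-ext-10000/tmpstats-0");
        return sb.toString();
    }

    public static void main(String[] args) {
        String path = buildStagingPath();
        System.out.println("length = " + path.length());
        if (path.length() > 255) {
            System.out.println("a VARCHAR(255) column would have to truncate this key");
        }
    }
}
```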
[jira] [Updated] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8744: -- Status: Patch Available (was: Open) Submitted new patch that changes the stats table name to v3 hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large - Key: HIVE-8744 URL: https://issues.apache.org/jira/browse/HIVE-8744 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8744.1.patch, HIVE-8744.2.patch This test is related to the bug HIVE-8065 where I am trying to support HDFS encryption. One of the enhancements to support it is to create a .hive-staging directory on the same table directory location where the query is executed. Now, when running the hbase_stats3.q test from a temporary directory that has a large path, then the new path, a combination of table location + .hive-staging + random temporary subdirectories, is too large to fit into the statistics table, so the path is truncated. This causes the following error: {noformat} 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error during publishing statistics. java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255. 
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
... 30 more
Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at
[jira] [Commented] (HIVE-8122) Make use of SearchArgument classes for Parquet SERDE
[ https://issues.apache.org/jira/browse/HIVE-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200456#comment-14200456 ] Hive QA commented on HIVE-8122: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12679808/HIVE-8122.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6674 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1666/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1666/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1666/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12679808 - PreCommit-HIVE-TRUNK-Build Make use of SearchArgument classes for Parquet SERDE Key: HIVE-8122 URL: https://issues.apache.org/jira/browse/HIVE-8122 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Ferdinand Xu Attachments: HIVE-8122.1.patch, HIVE-8122.patch ParquetSerde could be much cleaner if we used SearchArgument and associated classes like ORC does: https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgument.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive
[ https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200462#comment-14200462 ] Sergio Peña commented on HIVE-8065: --- Hi [~Ferd] Thanks for trying to help. There is already basic work done for this issue in a local branch for hive 0.13. I will apply the patch for trunk and commit the changes to the HIVE-8065 branch. What we don't have yet are the unit query tests. Would you like to take that task? Support HDFS encryption functionality on Hive - Key: HIVE-8065 URL: https://issues.apache.org/jira/browse/HIVE-8065 Project: Hive Issue Type: Improvement Affects Versions: 0.13.1 Reporter: Sergio Peña Assignee: Sergio Peña The new encryption support on HDFS makes Hive incompatible and unusable when this feature is used. HDFS encryption is designed so that a user can configure different encryption zones (or directories) for multi-tenant environments. An encryption zone has an exclusive encryption key, such as AES-128 or AES-256. Because of security compliance, HDFS does not allow moving/renaming files between encryption zones. Renames are allowed only inside the same encryption zone. A copy is allowed between encryption zones. See HDFS-6134 for more details about the HDFS encryption design. Hive currently uses a scratch directory (like /tmp/$user/$random). This scratch directory is used for the output of intermediate data (between MR jobs) and for the final output of the hive query, which is later moved to the table directory location. If Hive tables are in different encryption zones than the scratch directory, then Hive won't be able to rename those files/directories, which makes Hive unusable. To handle this problem, we can change the scratch directory of the query/statement to be inside the same encryption zone as the table directory location. This way, the renaming process will be successful. Also, for statements that move files between encryption zones (i.e. LOAD DATA), a copy may be executed instead of a rename. This will cause an overhead when copying large data files, but it won't break the encryption on Hive. Another security aspect to consider is joins. If Hive joins tables with different encryption key strengths, then the results of the select might break the security compliance of the tables. Let's say two tables with 128-bit and 256-bit encryption are joined; the temporary results might then be stored in the 128-bit encryption zone, which conflicts with the compliance of the table encrypted with 256 bits. To fix this, Hive should be able to select the most secured/encrypted scratch directory, so that the intermediate data is stored temporarily without compliance issues. For instance: {noformat} SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id; {noformat} - This should use a scratch directory (or staging directory) inside the table-aes256 table location. {noformat} INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1; {noformat} - This should use a scratch directory inside the table-aes1 location. {noformat} FROM table-unencrypted INSERT OVERWRITE TABLE table-aes128 SELECT id, name INSERT OVERWRITE TABLE table-aes256 SELECT id, name {noformat} - This should use a scratch directory on each of the tables' locations. - The first SELECT will have its scratch directory on the table-aes128 directory. - The second SELECT will have its scratch directory on the table-aes256 directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
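The "pick the most secured zone" rule described above could be sketched as follows. The method name and the key-bits bookkeeping are hypothetical, not the HIVE-8065 implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the rule described above: among the tables referenced by a
// query, place the staging directory under the table whose encryption
// zone uses the strongest key. Names and the key-bit bookkeeping are
// hypothetical; HIVE-8065 may implement this differently.
public class ScratchDirChooser {

    /** keyBits == 0 means the table is unencrypted. */
    static String chooseStagingTable(Map<String, Integer> tableKeyBits) {
        String best = null;
        int bestBits = -1;
        for (Map.Entry<String, Integer> e : tableKeyBits.entrySet()) {
            if (e.getValue() > bestBits) {
                best = e.getKey();
                bestBits = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> tables = new LinkedHashMap<>();
        tables.put("table-aes128", 128);
        tables.put("table-aes256", 256);
        // SELECT ... FROM table-aes128 JOIN table-aes256: stage under aes256.
        System.out.println(chooseStagingTable(tables)); // prints table-aes256
    }
}
```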
[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join
[ https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200464#comment-14200464 ] Xuefu Zhang commented on HIVE-8745: --- {quote} The other option would be to pad all values to the column spec and make sure we compute the spec as the max for the join keys. I'm not sure why you were against that in the first place {quote} I'm not sure what this refers to. Nevertheless, I think the serde should be able to trim the zeros and pad them back as long as it has the right metadata. It seems it does have the metadata for each column. Joins on decimal keys return different results whether they are run as reduce join or map join -- Key: HIVE-8745 URL: https://issues.apache.org/jira/browse/HIVE-8745 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Jason Dere Priority: Critical Fix For: 0.14.0 Attachments: join_test.q See attached .q file to reproduce. The difference seems to be whether trailing 0s are considered the same value or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
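The trailing-zero ambiguity at the center of this discussion can be reproduced with plain java.math.BigDecimal, with no Hive classes involved (Hive's HiveDecimal wraps BigDecimal, but this standalone example does not claim to match its serialization):

```java
import java.math.BigDecimal;

// Demonstrates the trailing-zero ambiguity discussed above: two decimals
// that compare as numerically equal but carry different scales, so a
// byte-level (unscaledValue/scale) encoding would not match.
public class DecimalTrailingZeros {
    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.0");   // scale 1
        BigDecimal b = new BigDecimal("1.00");  // scale 2
        System.out.println(a.compareTo(b));     // 0: numerically equal
        System.out.println(a.equals(b));        // false: scales differ
        // unscaledValue/scale is what a byte-level serialization would see:
        System.out.println(a.unscaledValue() + " / " + a.scale()); // 10 / 1
        System.out.println(b.unscaledValue() + " / " + b.scale()); // 100 / 2
    }
}
```

This is why a reduce join (which compares serialized key bytes) and a map join (which can compare values semantically) may disagree unless the representation is normalized or padded consistently.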
[jira] [Commented] (HIVE-8758) Fix hadoop-1 build [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200479#comment-14200479 ] Jimmy Xiang commented on HIVE-8758: --- Sure, will take a look soon. Fix hadoop-1 build [Spark Branch] - Key: HIVE-8758 URL: https://issues.apache.org/jira/browse/HIVE-8758 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Jimmy Xiang This may mean merging patches from trunk and fixing whatever problems are specific to the Spark branch. Here are user-reported problems: Problem 1: {code} Hive Serde . FAILURE [ 2.357 s]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-serde: Compilation failure: Compilation failure:
[ERROR] /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[27,24] cannot find symbol
[ERROR] symbol: class Nullable
[ERROR] location: package javax.annotation
[ERROR] /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[67,36] cannot find symbol
[ERROR] symbol: class Nullable
[ERROR] location: class org.apache.hadoop.hive.serde2.AbstractSerDe {code} My understanding: it looks like the Nullable annotation was added to the branch recently.
Added the below dependency in the project hive-serde {code}
<dependency>
  <groupId>com.google.code.findbugs</groupId>
  <artifactId>jsr305</artifactId>
  <version>3.0.0</version>
</dependency>
{code} Problem 2: After adding the dependency for hive-serde, got the below compilation error {code} [INFO] Hive Query Language FAILURE [01:35 min]
/data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java:[35,39] error: package org.apache.hadoop.mapreduce.util does not exist {code} The dependency jar for hadoop-1 (hadoop-core-1.2.1.jar) does not have the package "org.apache.hadoop.mapreduce.util". To circumvent it, I added the below dependency, which does have the package (not sure it is right – I badly wanted to make the build successful :() {code}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>0.23.11</version>
</dependency>
{code} Problem 3: After making the above change, the build again failed in the same project, in the file /data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java. In the snippet below taken from the file, we can see that "fileStatus.isFile()" is called, which is not available in the "org.apache.hadoop.fs.FileStatus" hadoop-1 API. {code} for (FileStatus fileStatus: fs.listStatus(folder)) { Path filePath = fileStatus.getPath(); if (!fileStatus.isFile()) { throw new HiveException("Error, not a file: " + filePath); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
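For Problem 3 above, a shim-style workaround on hadoop-1 is to negate isDir(), which exists in both API generations. The sketch below uses a stand-in class (MockFileStatus is hypothetical, not a Hadoop class) so it is self-contained; Hive's real compatibility code lives in its shims layer.

```java
// hadoop-1's FileStatus lacks isFile(); a compatible check negates isDir().
// MockFileStatus stands in for org.apache.hadoop.fs.FileStatus so this
// example compiles without Hadoop on the classpath.
public class IsFileCompat {

    static class MockFileStatus {            // stand-in for FileStatus
        private final boolean dir;
        MockFileStatus(boolean dir) { this.dir = dir; }
        boolean isDir() { return dir; }      // the method hadoop-1 does have
    }

    // hadoop-1-safe replacement for fileStatus.isFile()
    static boolean isFile(MockFileStatus status) {
        return !status.isDir();
    }

    public static void main(String[] args) {
        System.out.println(isFile(new MockFileStatus(false))); // true
        System.out.println(isFile(new MockFileStatus(true)));  // false
    }
}
```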
[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join
[ https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200483#comment-14200483 ] Xuefu Zhang commented on HIVE-8745: --- [~spena], could you do some research on this? Thanks. Joins on decimal keys return different results whether they are run as reduce join or map join -- Key: HIVE-8745 URL: https://issues.apache.org/jira/browse/HIVE-8745 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Jason Dere Priority: Critical Fix For: 0.14.0 Attachments: join_test.q See attached .q file to reproduce. The difference seems to be whether trailing 0s are considered the same value or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 27687: HIVE-8649 Increase level of parallelism in reduce phase [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27687/ --- Review request for hive and Xuefu Zhang. Bugs: HIVE-8649 https://issues.apache.org/jira/browse/HIVE-8649 Repository: hive-git Description --- First patch for HIVE-8649, to increase the number of reducers for spark based on some info about the spark cluster. We need to add a SparkListener to handle cluster status change if such events are supported by spark. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java 5766787 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java 2dbb5a3 Diff: https://reviews.apache.org/r/27687/diff/ Testing --- Thanks, Jimmy Xiang
[jira] [Commented] (HIVE-8649) Increase level of parallelism in reduce phase [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200500#comment-14200500 ] Jimmy Xiang commented on HIVE-8649: --- Sure. Here is the patch on RB: https://reviews.apache.org/r/27687/. Thanks. Increase level of parallelism in reduce phase [Spark Branch] Key: HIVE-8649 URL: https://issues.apache.org/jira/browse/HIVE-8649 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-8649.1-spark.patch We calculate the number of reducers based on the same code for MapReduce. However, reducers are vastly cheaper in Spark and it's generally recommended we have many more reducers than in MR. Sandy Ryza who works on Spark has some ideas about a heuristic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
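The kind of heuristic under discussion can be illustrated with a minimal sketch: derive a reducer count from input size, then scale it up for Spark, where reducers are cheaper than in MapReduce. The bytes-per-reducer constant and the Spark multiplier below are invented for illustration, not Hive's actual values.

```java
// Illustrative sketch of a reducer-count heuristic: size-based estimate
// scaled by a Spark factor. Constants here are hypothetical.
public class ReducerHeuristic {

    static int estimateReducers(long inputBytes, long bytesPerReducer,
                                int maxReducers, double sparkFactor) {
        // Ceiling division: at least one reducer per bytesPerReducer chunk.
        long base = (inputBytes + bytesPerReducer - 1) / bytesPerReducer;
        long scaled = Math.round(base * sparkFactor);
        return (int) Math.max(1, Math.min(maxReducers, scaled));
    }

    public static void main(String[] args) {
        // 10 GB input, 256 MB per reducer, scaled 2x for Spark.
        int n = estimateReducers(10L << 30, 256L << 20, 999, 2.0);
        System.out.println(n); // 80: ceil(10240 MB / 256 MB) = 40, times 2
    }
}
```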
[jira] [Commented] (HIVE-3779) An empty value to hive.logquery.location can't disable the creation of hive history log files
[ https://issues.apache.org/jira/browse/HIVE-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200521#comment-14200521 ] Anthony Hsu commented on HIVE-3779: --- In case you're still using an older version of Hive that doesn't let you disable the history log files, one workaround is to run {code} !rm -r /path/to/hive.querylog.location; {code} as your first shell command before running your queries. An empty value to hive.logquery.location can't disable the creation of hive history log files - Key: HIVE-3779 URL: https://issues.apache.org/jira/browse/HIVE-3779 Project: Hive Issue Type: Bug Components: Documentation Affects Versions: 0.9.0 Reporter: Bing Li Priority: Minor In AdminManual Configuration (https://cwiki.apache.org/Hive/adminmanual-configuration.html), the description of hive.querylog.location mentions that if the variable is set to an empty string, structured logs will not be created. But it fails with the following setting:
<property>
  <name>hive.querylog.location</name>
  <value></value>
</property>
It seems that it does not read an empty value from HiveConf.ConfVars.HIVEHISTORYFILELOC, but falls back to the default value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8757) YARN dep in scheduler shim should be optional
[ https://issues.apache.org/jira/browse/HIVE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8757: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) YARN dep in scheduler shim should be optional - Key: HIVE-8757 URL: https://issues.apache.org/jira/browse/HIVE-8757 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.15.0 Attachments: HIVE-8757.patch The {{hadoop-yarn-server-resourcemanager}} dep in the scheduler shim should be optional so that yarn doesn't pollute dependent classpaths. Users who want to use this feature must provide the yarn classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8759) HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper
Vaibhav Gumashta created HIVE-8759: -- Summary: HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper Key: HIVE-8759 URL: https://issues.apache.org/jira/browse/HIVE-8759 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8698) default log4j.properties not included in jar files anymore
[ https://issues.apache.org/jira/browse/HIVE-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8698: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to 0.14 and trunk. Thank you Thejas! default log4j.properties not included in jar files anymore -- Key: HIVE-8698 URL: https://issues.apache.org/jira/browse/HIVE-8698 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8698.1.patch trunk and hive 0.14 based builds no longer have hive-log4j.properties in the jars. This means that in default tar, unless hive-log4j.properties is created in conf dir (from hive-log4j.properties.template file), hive cli is much more verbose in what is printed to console. Hiveserver2 fails to come up, as it errors out with - org.apache.hadoop.hive.common.LogUtils$LogInitializationException: Unable to initialize logging using hive-log4j.properties, not found on CLASSPATH! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8759) HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper
[ https://issues.apache.org/jira/browse/HIVE-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8759: --- Attachment: HIVE-8759.1.patch HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper --- Key: HIVE-8759 URL: https://issues.apache.org/jira/browse/HIVE-8759 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Attachments: HIVE-8759.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
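The difference this patch is about can be sketched with the JDK alone. The znode naming below is illustrative, not HiveServer2's actual registration format.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Shows the contrast HIVE-8759 is about: registering the machine's
// hostname in ZooKeeper instead of its IP address. The znode path
// format below is illustrative only.
public class HostnameVsIp {
    public static void main(String[] args) throws UnknownHostException {
        InetAddress local = InetAddress.getLocalHost();
        String byIp   = local.getHostAddress();       // e.g. an address like 10.0.0.12
        String byName = local.getCanonicalHostName(); // e.g. a name like hs2-node1.example.com
        System.out.println("ip-based znode:   server-" + byIp + ":10000");
        System.out.println("name-based znode: server-" + byName + ":10000");
    }
}
```

A hostname entry stays valid when the server's address changes (e.g. DHCP lease renewal or failover to a host with the same name), which is why it is preferable for service discovery.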
[jira] [Commented] (HIVE-8674) Fix tests after merge [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200546#comment-14200546 ] Xuefu Zhang commented on HIVE-8674: --- Hi [~brocknoland], is this ready to be committed? It looks like the auto_join29.q failure is due to a known issue, but I'm not sure about the nullscan one. The other failures also seem to be pre-existing. Fix tests after merge [Spark Branch] Key: HIVE-8674 URL: https://issues.apache.org/jira/browse/HIVE-8674 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8674.1-spark.patch, HIVE-8674.2-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8611) grant/revoke syntax should support additional objects for authorization plugins
[ https://issues.apache.org/jira/browse/HIVE-8611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-8611: -- Attachment: HIVE-8611.4.patch Update patch to fix test failure due to new error message grant/revoke syntax should support additional objects for authorization plugins --- Key: HIVE-8611 URL: https://issues.apache.org/jira/browse/HIVE-8611 Project: Hive Issue Type: Bug Components: Authentication, SQL Affects Versions: 0.13.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.14.0 Attachments: HIVE-8611.1.patch, HIVE-8611.2.patch, HIVE-8611.2.patch, HIVE-8611.3.patch, HIVE-8611.4.patch The authorization framework supports URI and global objects. The SQL syntax however doesn't allow granting privileges on these objects. We should allow the compiler to parse these so that it can be handled by authorization plugins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8759) HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper
[ https://issues.apache.org/jira/browse/HIVE-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8759: --- Status: Patch Available (was: Open) HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper --- Key: HIVE-8759 URL: https://issues.apache.org/jira/browse/HIVE-8759 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Attachments: HIVE-8759.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200561#comment-14200561 ] Hive QA commented on HIVE-8744: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12679872/HIVE-8744.2.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6674 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1667/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1667/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1667/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12679872 - PreCommit-HIVE-TRUNK-Build hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large - Key: HIVE-8744 URL: https://issues.apache.org/jira/browse/HIVE-8744 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8744.1.patch, HIVE-8744.2.patch This test is related to the bug HIVE-8065 where I am trying to support HDFS encryption. One of the enhancements to support it is to create a .hive-staging directory on the same table directory location where the query is executed. 
Now, when running the hbase_stats3.q test from a temporary directory that has a large path, then the new path, a combination of table location + .hive-staging + random temporary subdirectories, is too large to fit into the statistics table, so the path is truncated. This causes the following error: {noformat} 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error during publishing statistics. java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255.
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at
[jira] [Commented] (HIVE-8561) Expose Hive optiq operator tree to be able to support other sql on hadoop query engines
[ https://issues.apache.org/jira/browse/HIVE-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200562#comment-14200562 ] Brock Noland commented on HIVE-8561: Hi [~nyang], Thank you for the annotations! Since the CBO is such a huge component and in its infancy, I feel like {{Unstable}} might be more appropriate than {{Evolving}}. However, before making that change I think we should settle with [~jpullokkaran] on the correct way to perform this integration. bq. Why can't Drill be plugged in as another execution engine just like MR, TEZ, Spark? [~jpullokkaran] It's reasonable for Drill to add APIs in order to use the query plan. The Drill project, like many other projects, is a user of Hive. As mentioned previously, it's important to agree upon some kind of API visibility and stability. Na has agreed to an unstable interface (it is the caller's responsibility to follow Hive-side changes). As one of the CBO experts, if there is a better alternative implementation, could you please share how this could be improved? Expose Hive optiq operator tree to be able to support other sql on hadoop query engines --- Key: HIVE-8561 URL: https://issues.apache.org/jira/browse/HIVE-8561 Project: Hive Issue Type: Task Components: CBO Affects Versions: 0.14.0 Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-8561.2.patch, HIVE-8561.3.patch, HIVE-8561.patch Hive 0.14 added cost-based optimization, and an optiq operator tree is created for select queries. However, the optiq operator tree is not visible from outside and is hard to use from other SQL-on-Hadoop query engines such as Apache Drill. To allow Drill to access the Hive optiq operator tree, we need to add a public API that returns it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8636) CBO: split cbo_correctness test
[ https://issues.apache.org/jira/browse/HIVE-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200588#comment-14200588 ] Sergey Shelukhin commented on HIVE-8636: test failures are unrelated CBO: split cbo_correctness test --- Key: HIVE-8636 URL: https://issues.apache.org/jira/browse/HIVE-8636 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8636.01.patch, HIVE-8636.01.patch, HIVE-8636.02.patch, HIVE-8636.patch The CBO correctness test is extremely annoying: it runs forever; if anything fails, it's hard to debug due to the volume of logs; and it doesn't run further, so multiple failures can only be discovered one by one. Also, SORT_QUERY_RESULTS cannot be used, because some queries presumably rely on sorting. It should be split into separate tests; the numbers in there now may be good as boundaries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8636) CBO: split cbo_correctness test
[ https://issues.apache.org/jira/browse/HIVE-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200606#comment-14200606 ] Ashutosh Chauhan commented on HIVE-8636: +1 CBO: split cbo_correctness test --- Key: HIVE-8636 URL: https://issues.apache.org/jira/browse/HIVE-8636 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8636.01.patch, HIVE-8636.01.patch, HIVE-8636.02.patch, HIVE-8636.patch The CBO correctness test is extremely annoying: it runs forever; if anything fails, it's hard to debug due to the volume of logs; and it doesn't run further, so multiple failures can only be discovered one by one. Also, SORT_QUERY_RESULTS cannot be used, because some queries presumably rely on sorting. It should be split into separate tests; the numbers in there now may be good as boundaries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8753) TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200612#comment-14200612 ] Prasanth J commented on HIVE-8753: -- +1 TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk - Key: HIVE-8753 URL: https://issues.apache.org/jira/browse/HIVE-8753 Project: Hive Issue Type: Test Components: Logical Optimizer Affects Versions: 0.15.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8753.patch Because of HIVE-7111 needs .q.out update -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8753) TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8753: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk. TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk - Key: HIVE-8753 URL: https://issues.apache.org/jira/browse/HIVE-8753 Project: Hive Issue Type: Test Components: Logical Optimizer Affects Versions: 0.15.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.15.0 Attachments: HIVE-8753.patch Because of HIVE-7111 needs .q.out update -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join
[ https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200636#comment-14200636 ] Jason Dere commented on HIVE-8745: -- {quote} Nevertheless, I think the serde should be able to trim the zeros and pad it back as long as it has the right metadata. It seems it does have the metadata for each column. {quote} We have the metadata for the column, but not for individual values in each row. If you have a decimal(10,5) column with the values {noformat} 1 1.0 1.00 {noformat} the only thing we could get from the column metadata is the scale=5, so we could only pad everything to 1.00000. To know how many extra zeros we need for each value of the column, we would have to save something for each value. Joins on decimal keys return different results whether they are run as reduce join or map join -- Key: HIVE-8745 URL: https://issues.apache.org/jira/browse/HIVE-8745 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Jason Dere Priority: Critical Fix For: 0.14.0 Attachments: join_test.q See attached .q file to reproduce. The difference seems to be whether trailing 0s are considered the same value or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
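The per-value scale loss Jason describes can be seen directly with the JDK's BigDecimal (an illustrative sketch using only the JDK class, not Hive's HiveDecimal): numerically equal values with different trailing-zero counts compare equal but carry different scales, and once the zeros are stripped, that scale is unrecoverable from column metadata alone.

```java
import java.math.BigDecimal;

public class DecimalScaleDemo {
    public static void main(String[] args) {
        BigDecimal one = new BigDecimal("1");
        BigDecimal oneOhOh = new BigDecimal("1.00");

        // Numerically equal, so they must hash/join as the same key...
        System.out.println(one.compareTo(oneOhOh));   // 0
        // ...but equals() distinguishes them, because the scales differ (0 vs 2).
        System.out.println(one.equals(oneOhOh));      // false

        // Once trailing zeros are trimmed, the per-value scale is gone for good:
        System.out.println(oneOhOh.stripTrailingZeros().scale()); // 0

        // Column metadata only supports padding every value to one fixed scale:
        System.out.println(one.setScale(5));          // 1.00000
    }
}
```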
[jira] [Commented] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200659#comment-14200659 ] Szehon Ho commented on HIVE-8744: - +1, thanks. hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large - Key: HIVE-8744 URL: https://issues.apache.org/jira/browse/HIVE-8744 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8744.1.patch, HIVE-8744.2.patch This test is related to the bug HIVE-8065 where I am trying to support HDFS encryption. One of the enhancements to support it is to create a .hive-staging directory on the same table directory location where the query is executed. Now, when running the hbase_stats3.q test from a temporary directory that has a large path, then the new path, a combination of table location + .hive-staging + random temporary subdirectories, is too large to fit into the statistics table, so the path is truncated. This causes the following error: {noformat} 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error during publishing statistics. java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255. 
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source) at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source) at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148) at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145) at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667) at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source) ... 30 more Caused by: ERROR 22001: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255. at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown
[jira] [Commented] (HIVE-8746) ORC timestamp columns are sensitive to daylight savings time
[ https://issues.apache.org/jira/browse/HIVE-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200658#comment-14200658 ] Dain Sundstrom commented on HIVE-8746: -- A good first step would be to record the writer timezone in the file postscript. Then the current reader could throw an exception if the JVM timezone doesn't match the timezone declared in the postscript. Then, when someone has more time, they could adjust the base epoch to the file timezone. What do you think? ORC timestamp columns are sensitive to daylight savings time Key: HIVE-8746 URL: https://issues.apache.org/jira/browse/HIVE-8746 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley Hive uses Java's Timestamp class to manipulate timestamp columns. Unfortunately, the textual parsing in Timestamp is done in local time while the internal storage is in UTC. ORC mostly sidesteps this issue by computing the difference between the time and a base time, both in local time, and storing that difference in the file. Reading the file between timezones will mostly work correctly: 2014-01-01 12:34:56 will read correctly in every timezone. However, moving between timezones with different daylight saving rules creates trouble. In particular, moving from a computer in PST to UTC will read 2014-06-06 12:34:56 as 2014-06-06 11:34:56. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
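The local-time parsing Owen describes is easy to demonstrate with java.sql.Timestamp alone (an illustrative sketch, not ORC code): the same timestamp text maps to different epoch millis depending on the JVM's default zone, and in June the Los Angeles/UTC gap is the 7-hour PDT offset rather than the 8-hour PST one.

```java
import java.sql.Timestamp;
import java.util.TimeZone;

public class TimestampZoneDemo {
    public static void main(String[] args) {
        // Parse the same text under two different default zones.
        TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"));
        long la = Timestamp.valueOf("2014-06-06 12:34:56").getTime();

        TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
        long utc = Timestamp.valueOf("2014-06-06 12:34:56").getTime();

        // In June, Los Angeles observes PDT (UTC-7), so the epochs differ
        // by 7 hours: the kind of shift a reader sees across zones.
        System.out.println((la - utc) / (3600L * 1000)); // 7
    }
}
```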
[jira] [Updated] (HIVE-8636) CBO: split cbo_correctness test
[ https://issues.apache.org/jira/browse/HIVE-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8636: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Sergey! CBO: split cbo_correctness test --- Key: HIVE-8636 URL: https://issues.apache.org/jira/browse/HIVE-8636 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.15.0 Attachments: HIVE-8636.01.patch, HIVE-8636.01.patch, HIVE-8636.02.patch, HIVE-8636.patch The CBO correctness test is extremely annoying: it runs forever; if anything fails, it's hard to debug due to the volume of logs; and it doesn't run further, so multiple failures can only be discovered one by one. Also, SORT_QUERY_RESULTS cannot be used, because some queries presumably rely on sorting. It should be split into separate tests; the numbers in there now may be good as boundaries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 27687: HIVE-8649 Increase level of parallelism in reduce phase [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27687/#review60210 --- ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java https://reviews.apache.org/r/27687/#comment101574 I don't feel we need to cache this, as it can change during a user session. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java https://reviews.apache.org/r/27687/#comment101575 Can we document what is in the tuple, especially what each element means? ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java https://reviews.apache.org/r/27687/#comment101576 I'm not sure why this needs to be synchronized. Will this method be called by concurrent threads? It doesn't seem to be the case. - Xuefu Zhang On Nov. 6, 2014, 5:25 p.m., Jimmy Xiang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27687/ --- (Updated Nov. 6, 2014, 5:25 p.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8649 https://issues.apache.org/jira/browse/HIVE-8649 Repository: hive-git Description --- First patch for HIVE-8649, to increase the number of reducers for Spark based on some info about the Spark cluster. We need to add a SparkListener to handle cluster status changes if such events are supported by Spark. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java 5766787 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java 2dbb5a3 Diff: https://reviews.apache.org/r/27687/diff/ Testing --- Thanks, Jimmy Xiang
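For readers following the review, the idea under discussion can be sketched roughly as follows. This is a hypothetical heuristic, not the patch's actual code, and the method and parameter names are made up for illustration: size the reducer count from the input data as MapReduce does, then round up toward the cluster's available cores, since reducers are much cheaper on Spark.

```java
public class ReducerHeuristic {
    // Hypothetical sketch: data-driven count as in MR, bumped up to cluster
    // parallelism, clamped by a configured maximum.
    static int pickReducers(long inputBytes, long bytesPerReducer,
                            int clusterCores, int maxReducers) {
        // ceil(inputBytes / bytesPerReducer), at least one reducer
        int byData = (int) Math.max(1, (inputBytes + bytesPerReducer - 1) / bytesPerReducer);
        return Math.min(maxReducers, Math.max(byData, clusterCores));
    }

    public static void main(String[] args) {
        // 1 GB at 256 MB/reducer would give 4 reducers under MR sizing;
        // with 16 cores available we round up to 16 instead.
        System.out.println(pickReducers(1_000_000_000L, 256_000_000L, 16, 1009));
    }
}
```

A real implementation would also need the SparkListener mentioned in the description, since the core count can change as executors come and go during a session.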
[jira] [Updated] (HIVE-8674) Fix tests after merge [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8674: --- Attachment: HIVE-8674.2-spark.patch Fix tests after merge [Spark Branch] Key: HIVE-8674 URL: https://issues.apache.org/jira/browse/HIVE-8674 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8674.1-spark.patch, HIVE-8674.2-spark.patch, HIVE-8674.2-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join
[ https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200678#comment-14200678 ] Gunther Hagleitner commented on HIVE-8745: -- [~xuefuz] I'm going to revert HIVE-7373. I think that's reasonable given that it causes this issue. I'm also worried about changing the behavior of decimals in Hive 0.14 again while there are still questions about it. We already changed the behavior from 0.12 to 0.13 and it caused a lot of grief for some folks. Given that BinarySortableSerde is involved, we also need to look into window functions, group by with/without map-side aggregation, etc. Jason also brings up another good point: performance. The decision to maintain the trailing zeroes for each individual value instead of at the column level means that we will never be able to simply encode decimals in two longs, which was the idea behind limiting the precision of decimals in the first place. Joins on decimal keys return different results whether they are run as reduce join or map join -- Key: HIVE-8745 URL: https://issues.apache.org/jira/browse/HIVE-8745 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Jason Dere Priority: Critical Fix For: 0.14.0 Attachments: join_test.q See attached .q file to reproduce. The difference seems to be whether trailing 0s are considered the same value or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
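The "two longs" remark refers to Hive capping decimal precision at 38 digits; a quick check shows why that cap was chosen (illustrative arithmetic, not Hive code): the largest unscaled value of a 38-digit decimal needs only 127 bits, so the magnitude plus a sign bit fits in two 64-bit longs, provided no extra per-value scale information has to ride along with it.

```java
import java.math.BigInteger;

public class DecimalBitsDemo {
    public static void main(String[] args) {
        // Largest unscaled value a 38-digit decimal can hold: 10^38 - 1.
        BigInteger max = BigInteger.TEN.pow(38).subtract(BigInteger.ONE);
        // 127 bits of magnitude plus a sign bit: exactly two 64-bit longs.
        System.out.println(max.bitLength()); // 127
        // One more digit would overflow the two-long representation:
        System.out.println(BigInteger.TEN.pow(39).bitLength()); // 130
    }
}
```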
[jira] [Commented] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large
[ https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200683#comment-14200683 ] Prasanth J commented on HIVE-8744: -- HIVE-8735 is also addressing the same problem. Usually the client which publishes the key (FSOperator, StatsTask) has some logic to trim down the length of the key using an MD5 hash. If the key gets longer than the max stats key prefix (from the Hive config), the Utilities.getHashedPrefixKey() method is invoked to get a shorter key. Can you try the patch from HIVE-8735 to see if the test case works? HIVE-8735 truncates the key before publishing. hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large - Key: HIVE-8744 URL: https://issues.apache.org/jira/browse/HIVE-8744 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-8744.1.patch, HIVE-8744.2.patch This test is related to the bug HIVE-8065 where I am trying to support HDFS encryption. One of the enhancements to support it is to create a .hive-staging directory on the same table directory location where the query is executed. Now, when running the hbase_stats3.q test from a temporary directory that has a large path, then the new path, a combination of table location + .hive-staging + random temporary subdirectories, is too large to fit into the statistics table, so the path is truncated. This causes the following error: {noformat} 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error during publishing statistics. java.sql.SQLDataException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255. 
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source) at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source) at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source) at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source) at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown Source) at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown Source) at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148) at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145) at org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667) at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.sql.SQLException: A truncation error was encountered trying to shrink VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.' to length 255. at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
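Prasanth's hashing approach can be sketched like this (a hypothetical illustration; shortenKey and its parameters are made-up names, not Hive's actual Utilities helper): when a stats key exceeds the column limit, keep a recognizable prefix of the path and replace the tail with a fixed-length MD5 digest, so the stored key always fits and remains unique with high probability.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class StatsKeyShortener {
    // Hypothetical sketch of trimming an over-long stats key with an MD5 hash.
    static String shortenKey(String key, int maxLen) throws Exception {
        if (key.length() <= maxLen) {
            return key;
        }
        MessageDigest md = MessageDigest.getInstance("MD5");
        // %032x keeps the digest at a fixed 32 hex characters.
        String hash = String.format("%032x",
                new BigInteger(1, md.digest(key.getBytes(StandardCharsets.UTF_8))));
        // Keep the start of the path so the key stays human-recognizable.
        return key.substring(0, maxLen - hash.length()) + hash;
    }

    public static void main(String[] args) throws Exception {
        StringBuilder longKey = new StringBuilder("pfile:/home/hiveptest/");
        for (int i = 0; i < 40; i++) {
            longKey.append("subdir/"); // simulate deep staging directories
        }
        // Always fits a VARCHAR(255) column, regardless of the original depth.
        System.out.println(shortenKey(longKey.toString(), 255).length()); // 255
    }
}
```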
[jira] [Commented] (HIVE-8649) Increase level of parallelism in reduce phase [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200711#comment-14200711 ] Xuefu Zhang commented on HIVE-8649: --- Some comments on review board. Increase level of parallelism in reduce phase [Spark Branch] Key: HIVE-8649 URL: https://issues.apache.org/jira/browse/HIVE-8649 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-8649.1-spark.patch We calculate the number of reducers based on the same code for MapReduce. However, reducers are vastly cheaper in Spark and it's generally recommended we have many more reducers than in MR. Sandy Ryza who works on Spark has some ideas about a heuristic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8760) Pass a copy of HiveConf to hooks
Ashutosh Chauhan created HIVE-8760: -- Summary: Pass a copy of HiveConf to hooks Key: HIVE-8760 URL: https://issues.apache.org/jira/browse/HIVE-8760 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.13.0, 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan because hadoop's {{Configuration}} is not thread-safe -- This message was sent by Atlassian JIRA (v6.3.4#6332)
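The rationale can be illustrated with a plain-JDK stand-in (java.util.Properties playing the role of HiveConf here; this is not Hive code): handing each hook its own copy means a hook's mutations can never race against, or leak into, the session's shared configuration.

```java
import java.util.Properties;

public class ConfCopyDemo {
    public static void main(String[] args) {
        Properties shared = new Properties(); // stands in for the session's HiveConf
        shared.setProperty("hive.mapred.mode", "nonstrict");

        // Give the "hook" its own copy instead of the shared object.
        Properties hookCopy = new Properties();
        hookCopy.putAll(shared);
        hookCopy.setProperty("hive.mapred.mode", "strict"); // hook mutates freely

        // The shared configuration is untouched.
        System.out.println(shared.getProperty("hive.mapred.mode")); // nonstrict
    }
}
```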
[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join
[ https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200716#comment-14200716 ] Xuefu Zhang commented on HIVE-8745: --- Okay. Could you guys add this repro case so that when HIVE-7373 is revisited, the issue here can be caught early? Joins on decimal keys return different results whether they are run as reduce join or map join -- Key: HIVE-8745 URL: https://issues.apache.org/jira/browse/HIVE-8745 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Jason Dere Priority: Critical Fix For: 0.14.0 Attachments: join_test.q See attached .q file to reproduce. The difference seems to be whether trailing 0s are considered the same value or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8760) Pass a copy of HiveConf to hooks
[ https://issues.apache.org/jira/browse/HIVE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8760: --- Status: Patch Available (was: Open) Pass a copy of HiveConf to hooks Key: HIVE-8760 URL: https://issues.apache.org/jira/browse/HIVE-8760 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.13.0, 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8760.patch because hadoop's {{Configuration}} is not thread-safe -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8760) Pass a copy of HiveConf to hooks
[ https://issues.apache.org/jira/browse/HIVE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8760: --- Attachment: HIVE-8760.patch Pass a copy of HiveConf to hooks Key: HIVE-8760 URL: https://issues.apache.org/jira/browse/HIVE-8760 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.13.0, 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8760.patch because hadoop's {{Configuration}} is not thread-safe -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8633) Move Alan Gates from committer list to PMC list on website
[ https://issues.apache.org/jira/browse/HIVE-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8633: - Attachment: HIVE-8633.patch Move Alan Gates from committer list to PMC list on website -- Key: HIVE-8633 URL: https://issues.apache.org/jira/browse/HIVE-8633 Project: Hive Issue Type: Task Components: Website Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8633.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8633) Move Alan Gates from committer list to PMC list on website
[ https://issues.apache.org/jira/browse/HIVE-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8633: - Status: Patch Available (was: Open) Move Alan Gates from committer list to PMC list on website -- Key: HIVE-8633 URL: https://issues.apache.org/jira/browse/HIVE-8633 Project: Hive Issue Type: Task Components: Website Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8633.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8674) Fix tests after merge [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200743#comment-14200743 ] Xuefu Zhang commented on HIVE-8674: --- I'm not sure it helps, but this one constantly fails, shown also in http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/317/testReport. Trunk doesn't seem to have this. Maybe we let it be until after the next merge. Fix tests after merge [Spark Branch] Key: HIVE-8674 URL: https://issues.apache.org/jira/browse/HIVE-8674 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8674.1-spark.patch, HIVE-8674.2-spark.patch, HIVE-8674.2-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join
[ https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200771#comment-14200771 ] Jason Dere commented on HIVE-8745: -- Sounds like a good idea, will do. Joins on decimal keys return different results whether they are run as reduce join or map join -- Key: HIVE-8745 URL: https://issues.apache.org/jira/browse/HIVE-8745 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Jason Dere Priority: Critical Fix For: 0.14.0 Attachments: join_test.q See attached .q file to reproduce. The difference seems to be whether trailing 0s are considered the same value or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8748) jdbc uber jar is missing commons-logging
[ https://issues.apache.org/jira/browse/HIVE-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200784#comment-14200784 ] Gunther Hagleitner commented on HIVE-8748: -- +1 for hive .14 jdbc uber jar is missing commons-logging Key: HIVE-8748 URL: https://issues.apache.org/jira/browse/HIVE-8748 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.15.0 Attachments: HIVE-8748.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8754) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster
[ https://issues.apache.org/jira/browse/HIVE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200790#comment-14200790 ] Gunther Hagleitner commented on HIVE-8754: -- Alright. +1 for hive.14 Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster --- Key: HIVE-8754 URL: https://issues.apache.org/jira/browse/HIVE-8754 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Priority: Critical Fix For: 0.14.0, 0.15.0 Attachments: HIVE-8754.2.patch, HIVE-8754.patch HIVE-8588 added support for this by copying jdbc jars to lib/ of the localized/exploded Sqoop tar. Unfortunately, in a secure cluster, the Dist Cache intentionally sets permissions on exploded tars such that they are not writable. This needs to be fixed; otherwise users would have to modify their Sqoop tar to include the relevant jdbc jars, which is burdensome if different DBs are used and may create headaches around licensing issues. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 27672: HIVE-8726 Collect Spark TaskMetrics and build job statistic[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27672/#review60224 --- Ship it! Ship It! - Xuefu Zhang On Nov. 6, 2014, 7:30 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27672/ --- (Updated Nov. 6, 2014, 7:30 a.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8726 https://issues.apache.org/jira/browse/HIVE-8726 Repository: hive-git Description --- collection spark task metrics and combine into job level metric and build into SparkStatistics. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 7ab9a34 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java f6cc581 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/JobStateListener.java b4f753f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/SimpleSparkJobStatus.java 78e16c5 Diff: https://reviews.apache.org/r/27672/diff/ Testing --- Thanks, chengxiang li
[jira] [Commented] (HIVE-8726) Collect Spark TaskMetrics and build job statistic[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200797#comment-14200797 ] Xuefu Zhang commented on HIVE-8726: --- +1 Collect Spark TaskMetrics and build job statistic[Spark Branch] --- Key: HIVE-8726 URL: https://issues.apache.org/jira/browse/HIVE-8726 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M3 Attachments: HIVE-8726.1-spark.patch Implement SparkListener to collect TaskMetrics, and build SparkStatistic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8633) Move Alan Gates from committer list to PMC list on website
[ https://issues.apache.org/jira/browse/HIVE-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200800#comment-14200800 ] Ashutosh Chauhan commented on HIVE-8633: +1 Don't think we need to pay Hive QA cycles for this : ) Move Alan Gates from committer list to PMC list on website -- Key: HIVE-8633 URL: https://issues.apache.org/jira/browse/HIVE-8633 Project: Hive Issue Type: Task Components: Website Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8633.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8726) Collect Spark TaskMetrics and build job statistic[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8726: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to Spark branch. Thanks to Chengxiang for this wonderful contribution. Collect Spark TaskMetrics and build job statistic[Spark Branch] --- Key: HIVE-8726 URL: https://issues.apache.org/jira/browse/HIVE-8726 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M3 Fix For: spark-branch Attachments: HIVE-8726.1-spark.patch Implement SparkListener to collect TaskMetrics, and build SparkStatistic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 27687: HIVE-8649 Increase level of parallelism in reduce phase [Spark Branch]
On Nov. 6, 2014, 7:01 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java, line 86 https://reviews.apache.org/r/27687/diff/1/?file=751768#file751768line86 Can we document what is in the tuple, especially what each element means? Sure. Will add a doc. On Nov. 6, 2014, 7:01 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java, line 75 https://reviews.apache.org/r/27687/diff/1/?file=751768#file751768line75 I don't feel we need to cache this, as this can change during a user session. Yes, it will change during a user session. I was thinking of updating this when things change, based on some event callbacks. Such info may be needed many times if there are many reducers. It should save us some time going to the Spark master (assuming getExecutorMemoryStatus checks with the master). On Nov. 6, 2014, 7:01 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java, line 89 https://reviews.apache.org/r/27687/diff/1/?file=751768#file751768line89 I'm not sure why this needs to be synchronized. Will this method be called by concurrent threads? It doesn't seem to be the case. Are you saying it won't be called by many threads? Each JVM can run one query at a time in all deployment modes? How come SparkClient.getInstance is synchronized? - Jimmy --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27687/#review60210 --- On Nov. 6, 2014, 5:25 p.m., Jimmy Xiang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27687/ --- (Updated Nov. 6, 2014, 5:25 p.m.) Review request for hive and Xuefu Zhang. Bugs: HIVE-8649 https://issues.apache.org/jira/browse/HIVE-8649 Repository: hive-git Description --- First patch for HIVE-8649, to increase the number of reducers for Spark based on some info about the Spark cluster. 
We need to add a SparkListener to handle cluster status change if such events are supported by spark. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java 5766787 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java 2dbb5a3 Diff: https://reviews.apache.org/r/27687/diff/ Testing --- Thanks, Jimmy Xiang
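The heuristic under review (sizing reduce-phase parallelism from cluster capacity) can be sketched as below. The method name and constants are illustrative assumptions, not the patch's actual code:

```java
// Hypothetical sketch: cap reducer count by available executor memory,
// mirroring the "reducers from cluster info" idea in HIVE-8649.
public class ReducerParallelismSketch {
    static int estimateReducers(long totalExecutorMemoryBytes,
                                long bytesPerReducer,
                                int maxReducers) {
        // At least one reducer; otherwise one per bytesPerReducer chunk.
        long byMemory = Math.max(1, totalExecutorMemoryBytes / bytesPerReducer);
        // Never exceed the configured ceiling.
        return (int) Math.min(byMemory, maxReducers);
    }

    public static void main(String[] args) {
        // 8 GB of executor memory, 1 GB per reducer, ceiling of 999.
        System.out.println(estimateReducers(8L << 30, 1L << 30, 999)); // prints "8"
    }
}
```

The real SetSparkReducerParallelism also factors in per-operator data-size statistics; this sketch shows only the cluster-capacity bound discussed in the review.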
[jira] [Commented] (HIVE-8748) jdbc uber jar is missing commons-logging
[ https://issues.apache.org/jira/browse/HIVE-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200806#comment-14200806 ] Ashutosh Chauhan commented on HIVE-8748: Committed to 0.14 jdbc uber jar is missing commons-logging Key: HIVE-8748 URL: https://issues.apache.org/jira/browse/HIVE-8748 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.14.0 Attachments: HIVE-8748.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8748) jdbc uber jar is missing commons-logging
[ https://issues.apache.org/jira/browse/HIVE-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8748: --- Fix Version/s: (was: 0.15.0) 0.14.0 jdbc uber jar is missing commons-logging Key: HIVE-8748 URL: https://issues.apache.org/jira/browse/HIVE-8748 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.14.0 Attachments: HIVE-8748.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8760) Pass a copy of HiveConf to hooks
[ https://issues.apache.org/jira/browse/HIVE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200809#comment-14200809 ] Gopal V commented on HIVE-8760: --- This is a perf regression for all well-written Hive hooks that exist. There is prior evidence indicating that copying a Configuration is not a fast operation. In HIVE-4486 we went from a query which took 347.66 seconds down to 218 seconds by throwing out unnecessary {{new HiveConf();}} calls. If this is a thread-safety issue, then the hook spawning its own threads should synchronize - since the hook is pluggable, class-based config, that is very clearly the minimum-impact fix:
{code}
@Override
public void run(final HookContext hookContext) throws Exception {
  final long currentTime = System.currentTimeMillis();
+ final HiveConf confCopy = new HiveConf(hookContext.getConf());
  executor.submit(new Runnable() {
    ... // use the local value off the closure capture in the thread runnable
{code}
Pass a copy of HiveConf to hooks Key: HIVE-8760 URL: https://issues.apache.org/jira/browse/HIVE-8760 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.13.0, 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8760.patch because hadoop's {{Configuration}} is not thread-safe -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8760) Pass a copy of HiveConf to hooks
[ https://issues.apache.org/jira/browse/HIVE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200813#comment-14200813 ] Ashutosh Chauhan commented on HIVE-8760: But why make an assumption about what the hook is doing? Isn't it prudent that Hive does the safe thing when it can? Pass a copy of HiveConf to hooks Key: HIVE-8760 URL: https://issues.apache.org/jira/browse/HIVE-8760 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.13.0, 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8760.patch because hadoop's {{Configuration}} is not thread-safe -- This message was sent by Atlassian JIRA (v6.3.4#6332)
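The trade-off being argued can be seen in miniature with java.util.Properties standing in for HiveConf (a hedged illustration, not Hive code): handing the hook a snapshot isolates it from later driver-side mutations, at the cost of the copy that Gopal measures as expensive.

```java
import java.util.Properties;

// Stand-in demo (Properties in place of HiveConf): the hook receives a
// snapshot, so later driver-side mutations cannot race with its thread.
public class ConfCopyDemo {
    static Properties snapshot(Properties conf) {
        Properties copy = new Properties();
        copy.putAll(conf); // the defensive copy the patch proposes
        return copy;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("hive.exec.mode", "tez");
        Properties snap = snapshot(conf);
        conf.setProperty("hive.exec.mode", "spark"); // driver keeps mutating
        System.out.println(snap.getProperty("hive.exec.mode")); // prints "tez"
    }
}
```

Gopal's counter-position is that this isolation should be opt-in per hook (as in his snippet above) rather than paid unconditionally by every hook invocation.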
[jira] [Commented] (HIVE-8759) HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper
[ https://issues.apache.org/jira/browse/HIVE-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200827#comment-14200827 ] Hive QA commented on HIVE-8759: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12679892/HIVE-8759.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6700 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1668/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1668/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1668/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12679892 - PreCommit-HIVE-TRUNK-Build HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper --- Key: HIVE-8759 URL: https://issues.apache.org/jira/browse/HIVE-8759 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Attachments: HIVE-8759.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8759) HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper
[ https://issues.apache.org/jira/browse/HIVE-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200839#comment-14200839 ] Vaibhav Gumashta commented on HIVE-8759: Test failures are unrelated. HiveServer2 dynamic service discovery should add hostname instead of ipaddress to ZooKeeper --- Key: HIVE-8759 URL: https://issues.apache.org/jira/browse/HIVE-8759 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Attachments: HIVE-8759.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8674) Fix tests after merge [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200842#comment-14200842 ] Hive QA commented on HIVE-8674: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12679911/HIVE-8674.2-spark.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7123 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetTimestampUtils.testTimezone org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/318/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/318/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-318/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12679911 - PreCommit-HIVE-SPARK-Build Fix tests after merge [Spark Branch] Key: HIVE-8674 URL: https://issues.apache.org/jira/browse/HIVE-8674 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8674.1-spark.patch, HIVE-8674.2-spark.patch, HIVE-8674.2-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8754) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster
[ https://issues.apache.org/jira/browse/HIVE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-8754: - Resolution: Fixed Status: Resolved (was: Patch Available) Thanks [~thejas] for the review. Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster --- Key: HIVE-8754 URL: https://issues.apache.org/jira/browse/HIVE-8754 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Priority: Critical Fix For: 0.14.0, 0.15.0 Attachments: HIVE-8754.2.patch, HIVE-8754.patch HIVE-8588 added support for this by copying jdbc jars to lib/ of the localized/exploded Sqoop tar. Unfortunately, in a secure cluster, Dist Cache intentionally sets permissions on exploded tars such that they are not writable. This needs to be fixed; otherwise users would have to modify their Sqoop tar to include the relevant jdbc jars, which is burdensome if different DBs are used and may create headaches around licensing issues. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8674) Fix tests after merge [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200848#comment-14200848 ] Brock Noland commented on HIVE-8674: Ok, I will just commit the fix for parallel.q and resolve this guy. Fix tests after merge [Spark Branch] Key: HIVE-8674 URL: https://issues.apache.org/jira/browse/HIVE-8674 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8674.1-spark.patch, HIVE-8674.2-spark.patch, HIVE-8674.2-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8761) JDOPersistenceManager creation should be controlled at the Server level and not the Thread level
Vaibhav Gumashta created HIVE-8761: -- Summary: JDOPersistenceManager creation should be controlled at the Server level and not the Thread level Key: HIVE-8761 URL: https://issues.apache.org/jira/browse/HIVE-8761 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.15.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta When using JDO, we create a thread-local RawStore (ObjectStore) object in each metastore thread. This leads to the creation of a new JDOPersistenceManager per thread, which is cached in JDOPersistenceManagerFactory. To remove a JDOPersistenceManager from JDOPersistenceManagerFactory, an explicit JDOPersistenceManager.close needs to be called. This is a bad candidate for a thread local, as effective object destruction requires the application to call close. So, when metastore threads are killed by the threadpool, the object will never be removed from the JDOPersistenceManagerFactory cache. We fixed this for HiveServer2 using an embedded metastore (HIVE-7353) by customizing the GC collection of the dying thread, but I believe a better and more efficient solution is to pool JDOPersistenceManager objects and let each thread get an object for its use from the pool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
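The pooling alternative proposed here can be sketched with a bounded BlockingQueue. This is a hypothetical design sketch, not the eventual patch; the generic type parameter stands in for JDOPersistenceManager:

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical pool: threads borrow a manager and return it when done,
// so a thread killed by the threadpool never strands an unclosed
// instance in a thread-local / factory cache.
public class ManagerPool<T> {
    private final BlockingQueue<T> pool;

    public ManagerPool(List<T> managers) {
        // Fair FIFO queue pre-filled with the shared manager instances.
        this.pool = new ArrayBlockingQueue<>(managers.size(), true, managers);
    }

    public T borrow() {
        try {
            return pool.take(); // blocks until a manager is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted waiting for a manager", e);
        }
    }

    public void release(T manager) {
        pool.offer(manager); // hand back for reuse instead of closing
    }
}
```

With this shape, close() is only ever needed at server shutdown, when the pool itself is drained.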
[jira] [Updated] (HIVE-8761) JDOPersistenceManager creation should be controlled at the Server level and not the Thread level
[ https://issues.apache.org/jira/browse/HIVE-8761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8761: --- Fix Version/s: 0.15.0 JDOPersistenceManager creation should be controlled at the Server level and not the Thread level --- Key: HIVE-8761 URL: https://issues.apache.org/jira/browse/HIVE-8761 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.15.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.15.0 When using JDO, we create a thread-local RawStore (ObjectStore) object in each metastore thread. This leads to the creation of a new JDOPersistenceManager per thread, which is cached in JDOPersistenceManagerFactory. To remove a JDOPersistenceManager from JDOPersistenceManagerFactory, an explicit JDOPersistenceManager.close needs to be called. This is a bad candidate for a thread local, as effective object destruction requires the application to call close. So, when metastore threads are killed by the threadpool, the object will never be removed from the JDOPersistenceManagerFactory cache. We fixed this for HiveServer2 using an embedded metastore (HIVE-7353) by customizing the GC collection of the dying thread, but I believe a better and more efficient solution is to pool JDOPersistenceManager objects and let each thread get an object for its use from the pool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8760) Pass a copy of HiveConf to hooks
[ https://issues.apache.org/jira/browse/HIVE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200852#comment-14200852 ] Gopal V commented on HIVE-8760: --- Immutable structures are prudent from an API-design standpoint - always copying to get to a shared-nothing model potentially breaks anyone who relied on synchronizing on that object elsewhere. The stability impact of copying is currently invisible and unknown, but eventually a lot of System.identityHashCode and System.err gets applied to debug those, because LOG.info() is synchronized. The performance impact, however, is well known (as quoted earlier). The core API issue overall for me is that we don't have immutable Conf objects - I keep hitting these {{new Configuration()}} perf issues (track HADOOP-11223 for the impact on HDFS). At the very least, I know the stability impact of copying in one Hook; the surface is rather narrow for that problem to trace through (i.e. ship Hook2.java, Hook3.java etc. and test them without rebuilding all of Hive). On top of it, the biggest user of Hooks seems to be itests (which ships something like 20 single-thread hooks). You'll be slowing down all of them, all the time. Pass a copy of HiveConf to hooks Key: HIVE-8760 URL: https://issues.apache.org/jira/browse/HIVE-8760 Project: Hive Issue Type: Bug Components: Configuration Affects Versions: 0.13.0, 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8760.patch because hadoop's {{Configuration}} is not thread-safe -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8754) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster
[ https://issues.apache.org/jira/browse/HIVE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-8754: Fix Version/s: (was: 0.15.0) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster --- Key: HIVE-8754 URL: https://issues.apache.org/jira/browse/HIVE-8754 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8754.2.patch, HIVE-8754.patch HIVE-8588 added support for this by copying jdbc jars to lib/ of the localized/exploded Sqoop tar. Unfortunately, in a secure cluster, Dist Cache intentionally sets permissions on exploded tars such that they are not writable. This needs to be fixed; otherwise users would have to modify their Sqoop tar to include the relevant jdbc jars, which is burdensome if different DBs are used and may create headaches around licensing issues. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 27687: HIVE-8649 Increase level of parallelism in reduce phase [Spark Branch]
On Nov. 6, 2014, 7:01 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java, line 75 https://reviews.apache.org/r/27687/diff/1/?file=751768#file751768line75 I don't feel we need to cache this, as this can change during a user session. Jimmy Xiang wrote: Yes, it will change during a user session. I was thinking of updating this when things change, based on some event callbacks. Such info may be needed many times if there are many reducers. It should save us some time going to the Spark master (assuming getExecutorMemoryStatus checks with the master). 1. I don't think there will be a callback. 2. Yeah, it will be called many times if there are multiple reducers. Therefore, it probably makes sense to put the info in SetSparkReducerParallelism, which is created for each query. 3. You also need to make sure this works for a Spark standalone cluster. I'm not sure if you can get the number of executors/memory in the same way. On Nov. 6, 2014, 7:01 p.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java, line 89 https://reviews.apache.org/r/27687/diff/1/?file=751768#file751768line89 I'm not sure why this needs to be synchronized. Will this method be called by concurrent threads? It doesn't seem to be the case. Jimmy Xiang wrote: Are you saying it won't be called by many threads? Each JVM can run one query at a time in all deployment modes? How come SparkClient.getInstance is synchronized? Yeah. Right now this is a little messy. Changes are coming. Concurrency isn't tested yet. It's fine to leave the synchronization there. - Xuefu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27687/#review60210 --- On Nov. 6, 2014, 5:25 p.m., Jimmy Xiang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27687/ --- (Updated Nov. 6, 2014, 5:25 p.m.) Review request for hive and Xuefu Zhang. 
Bugs: HIVE-8649 https://issues.apache.org/jira/browse/HIVE-8649 Repository: hive-git Description --- First patch for HIVE-8649, to increase the number of reducers for spark based on some info about the spark cluster. We need to add a SparkListener to handle cluster status change if such events are supported by spark. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java 5766787 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java 2dbb5a3 Diff: https://reviews.apache.org/r/27687/diff/ Testing --- Thanks, Jimmy Xiang
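Xuefu's suggestion in the thread above - fetch the cluster snapshot once per query inside SetSparkReducerParallelism rather than caching it in the long-lived client - amounts to simple memoization. This sketch uses a hypothetical supplier in place of the real call to the Spark master:

```java
import java.util.function.LongSupplier;

// Hypothetical per-query memoization: the first call fetches from the
// master; later calls within the same query reuse the cached value.
// The real holder would be the per-query optimizer object.
public class ClusterMemoryCache {
    private Long cached; // null until first fetch

    public long get(LongSupplier fetchFromMaster) {
        if (cached == null) {
            cached = fetchFromMaster.getAsLong(); // one master round-trip
        }
        return cached;
    }
}
```

Because the cache lives only as long as the query, a stale value cannot leak across the user session, which was Xuefu's objection to caching in SparkClient.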
[jira] [Created] (HIVE-8762) HiveMetaStore.BooleanPointer should be replaced with an AtomicBoolean
Alan Gates created HIVE-8762: Summary: HiveMetaStore.BooleanPointer should be replaced with an AtomicBoolean Key: HIVE-8762 URL: https://issues.apache.org/jira/browse/HIVE-8762 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates AtomicBoolean will serve the same purpose, with the added bonus that it will perform correctly if two threads try to write to it simultaneously. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
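The proposed swap can be illustrated directly. This is a minimal demo of java.util.concurrent.atomic.AtomicBoolean, not the HiveMetaStore code itself: compareAndSet makes "check then write" a single atomic step, which a plain mutable boolean holder cannot guarantee across threads.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class AtomicFlagDemo {
    public static void main(String[] args) {
        AtomicBoolean stop = new AtomicBoolean(false);

        // Only one caller can win the false -> true transition, even if
        // two threads attempt it simultaneously.
        boolean first = stop.compareAndSet(false, true);
        boolean second = stop.compareAndSet(false, true);

        System.out.println(first + " " + second); // prints "true false"
    }
}
```

A BooleanPointer-style holder with a bare field offers neither this atomicity nor the visibility guarantee of AtomicBoolean's volatile semantics.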
[jira] [Commented] (HIVE-8711) DB deadlocks not handled in TxnHandler for Postgres, Oracle, and SQLServer
[ https://issues.apache.org/jira/browse/HIVE-8711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200854#comment-14200854 ] Alan Gates commented on HIVE-8711: -- Created HIVE-8762 for the BooleanPointer issue brought up by Eugene. DB deadlocks not handled in TxnHandler for Postgres, Oracle, and SQLServer -- Key: HIVE-8711 URL: https://issues.apache.org/jira/browse/HIVE-8711 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8711.2.patch, HIVE-8711.patch TxnHandler.detectDeadlock has code to catch deadlocks in MySQL and Derby. But it does not detect a deadlock for Postgres, Oracle, or SQLServer -- This message was sent by Atlassian JIRA (v6.3.4#6332)