[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join

2014-11-05 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199960#comment-14199960
 ] 

Jason Dere commented on HIVE-8745:
--

{quote}
If the serde needs to remove trailing zeros during serialization, this is fine 
as long as it can get them back upon deserialization.
{quote}

What I'm saying is I don't think it's possible to get the trailing zeros back 
upon deserialization. Since this is BinarySortableSerDe, the comparison is 
based on the bytes representing the decimal value. If we had a way to 
differentiate 1.0 from 1.00 during deserialization, then there would have to be 
something in the BinarySortable representation of the decimal value to 
distinguish one trailing zero from two, which would make the byte comparison 
between 1.0 and 1.00 fail. So the trailing zeros would be permanently trimmed.
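
To illustrate with plain java.math.BigDecimal (a minimal sketch, not Hive's 
decimal code): a byte-comparable encoding must map numerically equal values to 
identical bytes, which forces the scale information carried by trailing zeros 
to be normalized away:

{noformat}
import java.math.BigDecimal;

BigDecimal a = new BigDecimal("1.0");   // scale 1
BigDecimal b = new BigDecimal("1.00");  // scale 2
System.out.println(a.compareTo(b));           // 0: numerically equal, so the
                                              // encodings must be identical
System.out.println(a.equals(b));              // false: scales differ, but that
                                              // difference cannot be encoded
System.out.println(a.stripTrailingZeros());   // 1: both normalize to the same
System.out.println(b.stripTrailingZeros());   // 1: value, so 1.0 vs 1.00 is lost
{noformat}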

> Joins on decimal keys return different results whether they are run as reduce 
> join or map join
> --
>
> Key: HIVE-8745
> URL: https://issues.apache.org/jira/browse/HIVE-8745
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Gunther Hagleitner
>Assignee: Jason Dere
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: join_test.q
>
>
> See attached .q file to reproduce. The difference seems to be whether 
> trailing 0s are considered the same value or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large

2014-11-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199956#comment-14199956
 ] 

Hive QA commented on HIVE-8744:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679708/HIVE-8744.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6674 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1657/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1657/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1657/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679708 - PreCommit-HIVE-TRUNK-Build

> hbase_stats3.q test fails when paths stored at 
> JDBCStatsUtils.getIdColumnName() are too large
> -
>
> Key: HIVE-8744
> URL: https://issues.apache.org/jira/browse/HIVE-8744
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-8744.1.patch
>
>
> This test is related to the bug HIVE-8065, where I am trying to support HDFS 
> encryption. One of the enhancements to support it is to create a 
> .hive-staging directory in the same table directory location where the query 
> is executed.
> Now, when running the hbase_stats3.q test from a temporary directory with a 
> long path, the new path, a combination of table location + .hive-staging + 
> random temporary subdirectories, is too long to fit into the statistics 
> table, so the path is truncated.
> This causes the following error:
> {noformat}
> 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: 
> jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error 
> during publishing statistics. 
> java.sql.SQLDataException: A truncation error was encountered trying to 
> shrink VARCHAR 
> 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.&' to length 255.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown 
> Source)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(Loca

[jira] [Updated] (HIVE-8726) Collect Spark TaskMetrics and build job statistic[Spark Branch]

2014-11-05 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-8726:

Status: Patch Available  (was: Open)

> Collect Spark TaskMetrics and build job statistic[Spark Branch]
> ---
>
> Key: HIVE-8726
> URL: https://issues.apache.org/jira/browse/HIVE-8726
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M3
> Attachments: HIVE-8726.1-spark.patch
>
>
> Implement SparkListener to collect TaskMetrics, and build SparkStatistic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 27672: HIVE-8726 Collect Spark TaskMetrics and build job statistic[Spark Branch]

2014-11-05 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27672/
---

(Updated Nov. 6, 2014, 7:30 a.m.)


Review request for hive and Xuefu Zhang.


Bugs: HIVE-8726
https://issues.apache.org/jira/browse/HIVE-8726


Repository: hive-git


Description
---

Collect Spark task metrics, combine them into job-level metrics, and build 
SparkStatistics.
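
For context, a rough sketch of the listener idea (this is not the attached 
patch; it assumes the abstract-class form of Spark's SparkListener API, and the 
class and field names here are illustrative):

{noformat}
import org.apache.spark.scheduler.SparkListener;
import org.apache.spark.scheduler.SparkListenerTaskEnd;

// Accumulates per-task metrics into a job-level total as tasks finish.
public class TaskMetricsCollector extends SparkListener {
  private long totalExecutorRunTimeMs = 0;

  @Override
  public void onTaskEnd(SparkListenerTaskEnd taskEnd) {
    if (taskEnd.taskMetrics() != null) {
      totalExecutorRunTimeMs += taskEnd.taskMetrics().executorRunTime();
    }
  }

  public long getTotalExecutorRunTimeMs() {
    return totalExecutorRunTimeMs;
  }
}
{noformat}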


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 7ab9a34 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java 
f6cc581 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/JobStateListener.java
 b4f753f 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/SimpleSparkJobStatus.java
 78e16c5 

Diff: https://reviews.apache.org/r/27672/diff/


Testing
---


Thanks,

chengxiang li



Review Request 27672: HIVE-8726 Collect Spark TaskMetrics and build job statistic[Spark Branch]

2014-11-05 Thread chengxiang li

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27672/
---

Review request for hive and Xuefu Zhang.


Bugs: HIVE-8726
https://issues.apache.org/jira/browse/HIVE-8726


Repository: hive-git


Description
---

Collect Spark task metrics, combine them into job-level metrics, and build 
SparkStatistics.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 7ab9a34 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/SparkJobStatus.java 
f6cc581 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/JobStateListener.java
 b4f753f 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/SimpleSparkJobStatus.java
 78e16c5 

Diff: https://reviews.apache.org/r/27672/diff/


Testing
---


Thanks,

chengxiang li



[jira] [Updated] (HIVE-8726) Collect Spark TaskMetrics and build job statistic[Spark Branch]

2014-11-05 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-8726:

Attachment: HIVE-8726.1-spark.patch

> Collect Spark TaskMetrics and build job statistic[Spark Branch]
> ---
>
> Key: HIVE-8726
> URL: https://issues.apache.org/jira/browse/HIVE-8726
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M3
> Attachments: HIVE-8726.1-spark.patch
>
>
> Implement SparkListener to collect TaskMetrics, and build SparkStatistic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8754) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster

2014-11-05 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199916#comment-14199916
 ] 

Gunther Hagleitner commented on HIVE-8754:
--

Why did you disable precommit tests for this?

> Sqoop job submission via WebHCat doesn't properly localize required jdbc jars 
> in secure cluster
> ---
>
> Key: HIVE-8754
> URL: https://issues.apache.org/jira/browse/HIVE-8754
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Fix For: 0.14.0, 0.15.0
>
> Attachments: HIVE-8754.2.patch, HIVE-8754.patch
>
>
> HIVE-8588 added support for this by copying jdbc jars to lib/ of 
> localized/exploded Sqoop tar.  Unfortunately, in a secure cluster, Dist Cache 
> intentionally sets permissions on exploded tars such that they are not 
> writable.
> This needs to be fixed; otherwise users would have to modify their Sqoop 
> tar to include the relevant JDBC jars, which is burdensome if different DBs 
> are used and may create headaches around licensing issues.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8754) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster

2014-11-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199899#comment-14199899
 ] 

Eugene Koifman commented on HIVE-8754:
--

done

> Sqoop job submission via WebHCat doesn't properly localize required jdbc jars 
> in secure cluster
> ---
>
> Key: HIVE-8754
> URL: https://issues.apache.org/jira/browse/HIVE-8754
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Fix For: 0.14.0, 0.15.0
>
> Attachments: HIVE-8754.2.patch, HIVE-8754.patch
>
>
> HIVE-8588 added support for this by copying jdbc jars to lib/ of 
> localized/exploded Sqoop tar.  Unfortunately, in a secure cluster, Dist Cache 
> intentionally sets permissions on exploded tars such that they are not 
> writable.
> This needs to be fixed; otherwise users would have to modify their Sqoop 
> tar to include the relevant JDBC jars, which is burdensome if different DBs 
> are used and may create headaches around licensing issues.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8754) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster

2014-11-05 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-8754:
-
Attachment: HIVE-8754.2.patch

> Sqoop job submission via WebHCat doesn't properly localize required jdbc jars 
> in secure cluster
> ---
>
> Key: HIVE-8754
> URL: https://issues.apache.org/jira/browse/HIVE-8754
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Fix For: 0.14.0, 0.15.0
>
> Attachments: HIVE-8754.2.patch, HIVE-8754.patch
>
>
> HIVE-8588 added support for this by copying jdbc jars to lib/ of 
> localized/exploded Sqoop tar.  Unfortunately, in a secure cluster, Dist Cache 
> intentionally sets permissions on exploded tars such that they are not 
> writable.
> This needs to be fixed; otherwise users would have to modify their Sqoop 
> tar to include the relevant JDBC jars, which is burdensome if different DBs 
> are used and may create headaches around licensing issues.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8737) setEnv is not portable, which fails TestCliDriverMethods#testprocessInitFiles on Windows

2014-11-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199893#comment-14199893
 ] 

Hive QA commented on HIVE-8737:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679696/HIVE-8737.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6674 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1655/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1655/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1655/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679696 - PreCommit-HIVE-TRUNK-Build

> setEnv is not portable, which fails TestCliDriverMethods#testprocessInitFiles 
> on Windows
> 
>
> Key: HIVE-8737
> URL: https://issues.apache.org/jira/browse/HIVE-8737
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
> Environment: Windows
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HIVE-8737.1.patch, HIVE-8737.2.patch, HIVE-8737.3.patch
>
>
> repro:
> {noformat}
> mvn test -Phadoop-2 -Dtest=TestCliDriverMethods#testprocessInitFiles
> {noformat}
> setEnv tries to make JVM-wide changes to system environment variables; the 
> previous approach is not portable.
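
(A minimal sketch of one portable pattern, not necessarily what this patch 
does: instead of mutating the JVM-wide environment, pass the variables to the 
child process explicitly; the variable and value below are hypothetical:)

{noformat}
import java.io.IOException;
import java.util.Map;

ProcessBuilder pb = new ProcessBuilder("hive");
Map<String, String> env = pb.environment();   // mutable, scoped to this process
env.put("HIVE_OPTS", "-hiveconf hive.execution.engine=mr");  // hypothetical
Process p = pb.start();                       // throws IOException
{noformat}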



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8754) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster

2014-11-05 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199891#comment-14199891
 ] 

Thejas M Nair commented on HIVE-8754:
-

+1. As discussed offline, can you also make that change to use the Java path 
separator string instead of ":", so that it works across different OSes?
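
(For reference, a minimal sketch of the suggested change; {{jdbcJars}} is a 
hypothetical name for the collection of jar paths being joined:)

{noformat}
import java.io.File;

// File.pathSeparator is ":" on Unix and ";" on Windows, so the joined
// list works on both, unlike a hard-coded ":".
StringBuilder sb = new StringBuilder();
for (String jar : jdbcJars) {
  if (sb.length() > 0) {
    sb.append(File.pathSeparator);
  }
  sb.append(jar);
}
String jarList = sb.toString();
{noformat}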


> Sqoop job submission via WebHCat doesn't properly localize required jdbc jars 
> in secure cluster
> ---
>
> Key: HIVE-8754
> URL: https://issues.apache.org/jira/browse/HIVE-8754
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Fix For: 0.14.0, 0.15.0
>
> Attachments: HIVE-8754.patch
>
>
> HIVE-8588 added support for this by copying jdbc jars to lib/ of 
> localized/exploded Sqoop tar.  Unfortunately, in a secure cluster, Dist Cache 
> intentionally sets permissions on exploded tars such that they are not 
> writable.
> This needs to be fixed; otherwise users would have to modify their Sqoop 
> tar to include the relevant JDBC jars, which is burdensome if different DBs 
> are used and may create headaches around licensing issues.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large

2014-11-05 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199887#comment-14199887
 ] 

Szehon Ho commented on HIVE-8744:
-

Sounds good. Another thought came to mind: what do you guys think about 
renaming the table to v3? I saw it was done for a schema change in HIVE-2471 a 
while back.

That way, users don't have to manually drop the table/schema, and we can just 
note in the release docs that v2 can be deleted. Just wanted to bring it up, 
not sure what you guys think. Thanks.

> hbase_stats3.q test fails when paths stored at 
> JDBCStatsUtils.getIdColumnName() are too large
> -
>
> Key: HIVE-8744
> URL: https://issues.apache.org/jira/browse/HIVE-8744
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-8744.1.patch
>
>
> This test is related to the bug HIVE-8065, where I am trying to support HDFS 
> encryption. One of the enhancements to support it is to create a 
> .hive-staging directory in the same table directory location where the query 
> is executed.
> Now, when running the hbase_stats3.q test from a temporary directory with a 
> long path, the new path, a combination of table location + .hive-staging + 
> random temporary subdirectories, is too long to fit into the statistics 
> table, so the path is truncated.
> This causes the following error:
> {noformat}
> 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: 
> jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error 
> during publishing statistics. 
> java.sql.SQLDataException: A truncation error was encountered trying to 
> shrink VARCHAR 
> 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.&' to length 255.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown 
> Source)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.sql.SQLException: A truncation error was encountered trying 
> to shrink VARCHAR 
> 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.&' to length 255.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>   at 
> org.apache.derby

[jira] [Commented] (HIVE-8754) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster

2014-11-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199839#comment-14199839
 ] 

Eugene Koifman commented on HIVE-8754:
--

[~hagleitn], could we get this into 0.14 please?

> Sqoop job submission via WebHCat doesn't properly localize required jdbc jars 
> in secure cluster
> ---
>
> Key: HIVE-8754
> URL: https://issues.apache.org/jira/browse/HIVE-8754
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Fix For: 0.14.0, 0.15.0
>
> Attachments: HIVE-8754.patch
>
>
> HIVE-8588 added support for this by copying jdbc jars to lib/ of 
> localized/exploded Sqoop tar.  Unfortunately, in a secure cluster, Dist Cache 
> intentionally sets permissions on exploded tars such that they are not 
> writable.
> This needs to be fixed; otherwise users would have to modify their Sqoop 
> tar to include the relevant JDBC jars, which is burdensome if different DBs 
> are used and may create headaches around licensing issues.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8509) UT: fix list_bucket_dml_2 test [Spark Branch]

2014-11-05 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8509:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Committed to Spark branch. Thanks to Chinna for the contribution.

> UT: fix list_bucket_dml_2 test [Spark Branch]
> -
>
> Key: HIVE-8509
> URL: https://issues.apache.org/jira/browse/HIVE-8509
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Friedrich
>Assignee: Chinna Rao Lalam
>Priority: Minor
> Fix For: spark-branch
>
> Attachments: HIVE-8509-spark.patch, HIVE-8509-spark.patch
>
>
> The test list_bucket_dml_2 fails in FileSinkOperator.publishStats:
> org.apache.hadoop.hive.ql.metadata.HiveException: [Error 30002]: 
> StatsPublisher cannot be connected to.There was a error while connecting to 
> the StatsPublisher, and retrying might help. If you dont want the query to 
> fail because accurate statistics could not be collected, set 
> hive.stats.reliable=false
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1079)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:971)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:582)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:175)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:121)
> I debugged and found that FileSinkOperator.publishStats throws the exception 
> when calling statsPublisher.connect here:
> {noformat}
> if (!statsPublisher.connect(hconf)) {
>   // just return, stats gathering should not block the main query
>   LOG.error("StatsPublishing error: cannot connect to database");
>   if (isStatsReliable) {
>     throw new HiveException(ErrorMsg.STATSPUBLISHER_CONNECTION_ERROR.getErrorCodedMsg());
>   }
>   return;
> }
> {noformat}
> With hive.stats.dbclass set to counter in data/conf/spark/hive-site.xml, the 
> statsPublisher is of type CounterStatsPublisher.
> In CounterStatsPublisher, the exception is thrown because getReporter() 
> returns null for the MapredContext:
> {noformat}
> MapredContext context = MapredContext.get();
> if (context == null || context.getReporter() == null) {
>   return false;
> }
> {noformat}
> When changing hive.stats.dbclass to jdbc:derby in 
> data/conf/spark/hive-site.xml, similar to TestCliDriver, it works:
> {noformat}
> <property>
>   <name>hive.stats.dbclass</name>
>   <value>jdbc:derby</value>
>   <description>The default storage that stores temporary Hive statistics.
>   Currently, jdbc, hbase and counter types are supported</description>
> </property>
> {noformat}
> In addition, I had to generate the out file for the test case for Spark.
> When running this test with TestCliDriver and hive.stats.dbclass set to 
> counter, the test case still works. The reporter is set to 
> org.apache.hadoop.mapred.Task$TaskReporter.
> Might need some additional investigation into why CounterStatsPublisher has 
> no reporter in the case of Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8509) UT: fix list_bucket_dml_2 test [Spark Branch]

2014-11-05 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8509:
--
Summary: UT: fix list_bucket_dml_2 test [Spark Branch]  (was: UT: fix 
list_bucket_dml_2 test)

> UT: fix list_bucket_dml_2 test [Spark Branch]
> -
>
> Key: HIVE-8509
> URL: https://issues.apache.org/jira/browse/HIVE-8509
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Friedrich
>Assignee: Chinna Rao Lalam
>Priority: Minor
> Attachments: HIVE-8509-spark.patch, HIVE-8509-spark.patch
>
>
> The test list_bucket_dml_2 fails in FileSinkOperator.publishStats:
> org.apache.hadoop.hive.ql.metadata.HiveException: [Error 30002]: 
> StatsPublisher cannot be connected to.There was a error while connecting to 
> the StatsPublisher, and retrying might help. If you dont want the query to 
> fail because accurate statistics could not be collected, set 
> hive.stats.reliable=false
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1079)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:971)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:582)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:175)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:121)
> I debugged and found that FileSinkOperator.publishStats throws the exception 
> when calling statsPublisher.connect here:
> {noformat}
> if (!statsPublisher.connect(hconf)) {
>   // just return, stats gathering should not block the main query
>   LOG.error("StatsPublishing error: cannot connect to database");
>   if (isStatsReliable) {
>     throw new HiveException(ErrorMsg.STATSPUBLISHER_CONNECTION_ERROR.getErrorCodedMsg());
>   }
>   return;
> }
> {noformat}
> With hive.stats.dbclass set to counter in data/conf/spark/hive-site.xml, the 
> statsPublisher is of type CounterStatsPublisher.
> In CounterStatsPublisher, the exception is thrown because getReporter() 
> returns null for the MapredContext:
> {noformat}
> MapredContext context = MapredContext.get();
> if (context == null || context.getReporter() == null) {
>   return false;
> }
> {noformat}
> When changing hive.stats.dbclass to jdbc:derby in 
> data/conf/spark/hive-site.xml, similar to TestCliDriver, it works:
> {noformat}
> <property>
>   <name>hive.stats.dbclass</name>
>   <value>jdbc:derby</value>
>   <description>The default storage that stores temporary Hive statistics.
>   Currently, jdbc, hbase and counter types are supported</description>
> </property>
> {noformat}
> In addition, I had to generate the out file for the test case for Spark.
> When running this test with TestCliDriver and hive.stats.dbclass set to 
> counter, the test case still works. The reporter is set to 
> org.apache.hadoop.mapred.Task$TaskReporter.
> Might need some additional investigation into why CounterStatsPublisher has 
> no reporter in the case of Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8509) UT: fix list_bucket_dml_2 test

2014-11-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199836#comment-14199836
 ] 

Xuefu Zhang commented on HIVE-8509:
---

+1. The test passing now with counter is probably due to the recent inclusion 
of support for publishing stats via counters.

The test failure, auto_join1.q, is again likely caused by HIVE-8578.

> UT: fix list_bucket_dml_2 test
> --
>
> Key: HIVE-8509
> URL: https://issues.apache.org/jira/browse/HIVE-8509
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Friedrich
>Assignee: Chinna Rao Lalam
>Priority: Minor
> Attachments: HIVE-8509-spark.patch, HIVE-8509-spark.patch
>
>
> The test list_bucket_dml_2 fails in FileSinkOperator.publishStats:
> org.apache.hadoop.hive.ql.metadata.HiveException: [Error 30002]: 
> StatsPublisher cannot be connected to.There was a error while connecting to 
> the StatsPublisher, and retrying might help. If you dont want the query to 
> fail because accurate statistics could not be collected, set 
> hive.stats.reliable=false
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1079)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:971)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:582)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:175)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:121)
> I debugged and found that FileSinkOperator.publishStats throws the exception 
> when calling statsPublisher.connect here:
> {noformat}
> if (!statsPublisher.connect(hconf)) {
>   // just return, stats gathering should not block the main query
>   LOG.error("StatsPublishing error: cannot connect to database");
>   if (isStatsReliable) {
>     throw new HiveException(ErrorMsg.STATSPUBLISHER_CONNECTION_ERROR.getErrorCodedMsg());
>   }
>   return;
> }
> {noformat}
> With hive.stats.dbclass set to counter in data/conf/spark/hive-site.xml, the 
> statsPublisher is of type CounterStatsPublisher.
> In CounterStatsPublisher, the exception is thrown because getReporter() 
> returns null for the MapredContext:
> {noformat}
> MapredContext context = MapredContext.get();
> if (context == null || context.getReporter() == null) {
>   return false;
> }
> {noformat}
> When changing hive.stats.dbclass to jdbc:derby in 
> data/conf/spark/hive-site.xml, similar to TestCliDriver, it works:
> {noformat}
> <property>
>   <name>hive.stats.dbclass</name>
>   <value>jdbc:derby</value>
>   <description>The default storage that stores temporary Hive statistics.
>   Currently, jdbc, hbase and counter types are supported</description>
> </property>
> {noformat}
> In addition, I had to generate the out file for the test case for Spark.
> When running this test with TestCliDriver and hive.stats.dbclass set to 
> counter, the test case still works. The reporter is set to 
> org.apache.hadoop.mapred.Task$TaskReporter.
> Might need some additional investigation into why CounterStatsPublisher has 
> no reporter in the case of Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8661) JDBC MinimizeJAR should be configurable in pom.xml

2014-11-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199832#comment-14199832
 ] 

Hive QA commented on HIVE-8661:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679698/HIVE-8661.2.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6674 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1654/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1654/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1654/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679698 - PreCommit-HIVE-TRUNK-Build

> JDBC MinimizeJAR should be configurable in pom.xml
> --
>
> Key: HIVE-8661
> URL: https://issues.apache.org/jira/browse/HIVE-8661
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-8661.1.patch, HIVE-8661.2.patch
>
>
> A large amount of dev time is wasted waiting for the JDBC build to minimize 
> the uber JAR from 33 MB to 16 MB during developer cycles.
> This should only kick in during -Pdist, allowing it to be disabled during 
> dev cycles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8509) UT: fix list_bucket_dml_2 test

2014-11-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199824#comment-14199824
 ] 

Hive QA commented on HIVE-8509:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679785/HIVE-8509-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7099 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18
org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetTimestampUtils.testTimezone
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/316/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/316/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-316/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679785 - PreCommit-HIVE-SPARK-Build

> UT: fix list_bucket_dml_2 test
> --
>
> Key: HIVE-8509
> URL: https://issues.apache.org/jira/browse/HIVE-8509
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Friedrich
>Assignee: Chinna Rao Lalam
>Priority: Minor
> Attachments: HIVE-8509-spark.patch, HIVE-8509-spark.patch
>
>
> The test list_bucket_dml_2 fails in FileSinkOperator.publishStats:
> org.apache.hadoop.hive.ql.metadata.HiveException: [Error 30002]: 
> StatsPublisher cannot be connected to.There was a error while connecting to 
> the StatsPublisher, and retrying might help. If you dont want the query to 
> fail because accurate statistics could not be collected, set 
> hive.stats.reliable=false
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1079)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:971)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:582)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:175)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:121)
> I debugged and found that FileSinkOperator.publishStats throws the exception 
> when calling statsPublisher.connect here:
> {noformat}
> if (!statsPublisher.connect(hconf)) {
>   // just return, stats gathering should not block the main query
>   LOG.error("StatsPublishing error: cannot connect to database");
>   if (isStatsReliable) {
>     throw new HiveException(ErrorMsg.STATSPUBLISHER_CONNECTION_ERROR.getErrorCodedMsg());
>   }
>   return;
> }
> {noformat}
> With hive.stats.dbclass set to counter in data/conf/spark/hive-site.xml, the 
> statsPublisher is of type CounterStatsPublisher.
> In CounterStatsPublisher, the exception is thrown because getReporter() 
> returns null for the MapredContext:
> {noformat}
> MapredContext context = MapredContext.get();
> if (context == null || context.getReporter() == null) {
>   return false;
> }
> {noformat}
> When changing hive.stats.dbclass to jdbc:derby in 
> data/conf/spark/hive-site.xml, similar to TestCliDriver, it works:
> {noformat}
> <property>
>   <name>hive.stats.dbclass</name>
>   <value>jdbc:derby</value>
>   <description>The default storage that stores temporary Hive statistics.
>   Currently, jdbc, hbase and counter types are supported</description>
> </property>
> {noformat}
> In addition, I had to generate the out file for the test case for Spark.
> When running this test with TestCliDriver and hive.stats.dbclass set to 
> counter, the test case still works. The reporter is set to 
> org.apache.hadoop.mapred.Task$TaskReporter.
> Might need some additional investigation into why CounterStatsPublisher has 
> no reporter in the case of Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8578) Investigate test failures related to HIVE-8545 [Spark Branch]

2014-11-05 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199798#comment-14199798
 ] 

Chao commented on HIVE-8578:


I'm wondering if we can get a stack trace when the job fails, instead of just 
the "Execution has failed" message.

> Investigate test failures related to HIVE-8545 [Spark Branch]
> -
>
> Key: HIVE-8578
> URL: https://issues.apache.org/jira/browse/HIVE-8578
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chao
>Assignee: Jimmy Xiang
>
> In HIVE-8545, there are a few test failures, for instance, 
> {{multi_insert_lateral_view.q}} and {{ppr_multi_insert.q}}. They appear to 
> happen at random and are not reproducible locally. We need to track down the 
> root cause and fix it in this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7729) Enable q-tests for ANALYZE TABLE feature [Spark Branch]

2014-11-05 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7729:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Patch committed to Spark branch. Thanks to Na for the contribution.

> Enable q-tests for ANALYZE TABLE feature [Spark Branch]
> ---
>
> Key: HIVE-7729
> URL: https://issues.apache.org/jira/browse/HIVE-7729
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Na Yang
>  Labels: Spark-M3
> Fix For: spark-branch
>
> Attachments: HIVE-7729.1-spark.patch, HIVE-7729.2-spark.patch, 
> HIVE-7729.3-spark.patch
>
>
> Enable q-tests for the ANALYZE TABLE feature since the automated test 
> environment is ready.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7729) Enable q-tests for ANALYZE TABLE feature [Spark Branch]

2014-11-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199795#comment-14199795
 ] 

Xuefu Zhang commented on HIVE-7729:
---

+1, patch looks good to me too.

The pcr.q failure seems related to HIVE-8578 instead.

> Enable q-tests for ANALYZE TABLE feature [Spark Branch]
> ---
>
> Key: HIVE-7729
> URL: https://issues.apache.org/jira/browse/HIVE-7729
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Na Yang
>  Labels: Spark-M3
> Attachments: HIVE-7729.1-spark.patch, HIVE-7729.2-spark.patch, 
> HIVE-7729.3-spark.patch
>
>
> Enable q-tests for the ANALYZE TABLE feature since the automated test 
> environment is ready.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7795) Enable ptf.q and ptf_streaming.q.[Spark Branch]

2014-11-05 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7795:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to Spark branch. Thanks to Jimmy for the contribution.

> Enable ptf.q and ptf_streaming.q.[Spark Branch]
> ---
>
> Key: HIVE-7795
> URL: https://issues.apache.org/jira/browse/HIVE-7795
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-7795.1-spark.patch
>
>
> ptf.q and ptf_streaming.q contain join queries; we should enable these 
> qtests in milestone 2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8748) jdbc uber jar is missing commons-logging

2014-11-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199775#comment-14199775
 ] 

Hive QA commented on HIVE-8748:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679684/HIVE-8748.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6674 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1653/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1653/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1653/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679684 - PreCommit-HIVE-TRUNK-Build

> jdbc uber jar is missing commons-logging
> 
>
> Key: HIVE-8748
> URL: https://issues.apache.org/jira/browse/HIVE-8748
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 0.14.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-8748.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8578) Investigate test failures related to HIVE-8545 [Spark Branch]

2014-11-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199774#comment-14199774
 ] 

Xuefu Zhang commented on HIVE-8578:
---

I think the problem is still happening from time to time. The latest example:

http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/311/testReport/org.apache.hadoop.hive.cli/TestSparkCliDriver/testCliDriver_skewjoinopt13/

> Investigate test failures related to HIVE-8545 [Spark Branch]
> -
>
> Key: HIVE-8578
> URL: https://issues.apache.org/jira/browse/HIVE-8578
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chao
>
> In HIVE-8545, there are a few test failures, for instance, 
> {{multi_insert_lateral_view.q}} and {{ppr_multi_insert.q}}. They appear to 
> happen at random and are not reproducible locally. We need to track down the 
> root cause and fix it in this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8578) Investigate test failures related to HIVE-8545 [Spark Branch]

2014-11-05 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-8578:
-

Assignee: Xuefu Zhang

> Investigate test failures related to HIVE-8545 [Spark Branch]
> -
>
> Key: HIVE-8578
> URL: https://issues.apache.org/jira/browse/HIVE-8578
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chao
>Assignee: Xuefu Zhang
>
> In HIVE-8545, there are a few test failures, for instance, 
> {{multi_insert_lateral_view.q}} and {{ppr_multi_insert.q}}. They appear to 
> happen at random and are not reproducible locally. We need to track down the 
> root cause and fix it in this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8578) Investigate test failures related to HIVE-8545 [Spark Branch]

2014-11-05 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-8578:
-

Assignee: Jimmy Xiang  (was: Xuefu Zhang)

> Investigate test failures related to HIVE-8545 [Spark Branch]
> -
>
> Key: HIVE-8578
> URL: https://issues.apache.org/jira/browse/HIVE-8578
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chao
>Assignee: Jimmy Xiang
>
> In HIVE-8545, there are a few test failures, for instance, 
> {{multi_insert_lateral_view.q}} and {{ppr_multi_insert.q}}. They appear to 
> happen at random and are not reproducible locally. We need to track down the 
> root cause and fix it in this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7795) Enable ptf.q and ptf_streaming.q.[Spark Branch]

2014-11-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199768#comment-14199768
 ] 

Xuefu Zhang commented on HIVE-7795:
---

+1.

I think skewjoinopt13 is related to HIVE-8578.

> Enable ptf.q and ptf_streaming.q.[Spark Branch]
> ---
>
> Key: HIVE-7795
> URL: https://issues.apache.org/jira/browse/HIVE-7795
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-7795.1-spark.patch
>
>
> ptf.q and ptf_streaming.q contain join queries; we should enable these 
> qtests in milestone 2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large

2014-11-05 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199770#comment-14199770
 ] 

Brock Noland commented on HIVE-8744:


That's a pretty old database and will be older when we release 0.15. I think we 
should move ahead...

> hbase_stats3.q test fails when paths stored at 
> JDBCStatsUtils.getIdColumnName() are too large
> -
>
> Key: HIVE-8744
> URL: https://issues.apache.org/jira/browse/HIVE-8744
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-8744.1.patch
>
>
> This test is related to the bug HIVE-8065, where I am trying to support HDFS 
> encryption. One of the enhancements to support it is to create a 
> .hive-staging directory in the same table directory location where the query 
> is executed.
> Now, when running the hbase_stats3.q test from a temporary directory with a 
> long path, the new path, a combination of table location + .hive-staging + 
> random temporary subdirectories, is too long to fit into the statistics 
> table, so the path is truncated.
> This causes the following error:
> {noformat}
> 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: 
> jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error 
> during publishing statistics. 
> java.sql.SQLDataException: A truncation error was encountered trying to 
> shrink VARCHAR 
> 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.&' to length 255.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown 
> Source)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.sql.SQLException: A truncation error was encountered trying 
> to shrink VARCHAR 
> 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.&' to length 255.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
>  Source)
>   ... 30 more
> Caused by: ERROR 22001: A truncation error was encountered trying to shrink 
> VARCHAR 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.&' t

[jira] [Updated] (HIVE-8578) Investigate test failures related to HIVE-8545 [Spark Branch]

2014-11-05 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8578:
--
Issue Type: Sub-task  (was: Test)
Parent: HIVE-7292

> Investigate test failures related to HIVE-8545 [Spark Branch]
> -
>
> Key: HIVE-8578
> URL: https://issues.apache.org/jira/browse/HIVE-8578
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chao
>
> In HIVE-8545, there are a few test failures, for instance, 
> {{multi_insert_lateral_view.q}} and {{ppr_multi_insert.q}}. They appear to 
> happen at random and are not reproducible locally. We need to track down the 
> root cause and fix it in this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8611) grant/revoke syntax should support additional objects for authorization plugins

2014-11-05 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199765#comment-14199765
 ] 

Brock Noland commented on HIVE-8611:


+1 pending tests

> grant/revoke syntax should support additional objects for authorization 
> plugins
> ---
>
> Key: HIVE-8611
> URL: https://issues.apache.org/jira/browse/HIVE-8611
> Project: Hive
>  Issue Type: Bug
>  Components: Authentication, SQL
>Affects Versions: 0.13.0
>Reporter: Prasad Mujumdar
>Assignee: Prasad Mujumdar
> Fix For: 0.14.0
>
> Attachments: HIVE-8611.1.patch, HIVE-8611.2.patch, HIVE-8611.2.patch, 
> HIVE-8611.3.patch
>
>
> The authorization framework supports URI and global objects. The SQL syntax, 
> however, doesn't allow granting privileges on these objects. We should allow 
> the compiler to parse these so that they can be handled by authorization 
> plugins.
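
For illustration, the kind of statements the extended grammar would need to 
accept might look like the following. The syntax here is a hypothetical 
sketch (object names, paths, and the exact keywords are placeholders, not the 
grammar added by the patch):

{noformat}
-- hypothetical grants on a URI object and a server-level (global) object
GRANT ALL ON URI 'hdfs://namenode:8020/data/landing' TO ROLE etl_role;
GRANT ALL ON SERVER server1 TO ROLE admin_role;
{noformat}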



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8509) UT: fix list_bucket_dml_2 test

2014-11-05 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8509:
--
Attachment: HIVE-8509-spark.patch

Reattaching the same patch to rerun the tests, as many of the test failures 
seemed unrelated.

> UT: fix list_bucket_dml_2 test
> --
>
> Key: HIVE-8509
> URL: https://issues.apache.org/jira/browse/HIVE-8509
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Friedrich
>Assignee: Chinna Rao Lalam
>Priority: Minor
> Attachments: HIVE-8509-spark.patch, HIVE-8509-spark.patch
>
>
> The test list_bucket_dml_2 fails in FileSinkOperator.publishStats:
> org.apache.hadoop.hive.ql.metadata.HiveException: [Error 30002]: 
> StatsPublisher cannot be connected to. There was an error while connecting to 
> the StatsPublisher, and retrying might help. If you don't want the query to 
> fail because accurate statistics could not be collected, set 
> hive.stats.reliable=false
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1079)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:971)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:582)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:175)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:121)
> I debugged and found that FileSinkOperator.publishStats throws the exception 
> when calling statsPublisher.connect here:
> {noformat}
> if (!statsPublisher.connect(hconf)) {
>   // just return, stats gathering should not block the main query
>   LOG.error("StatsPublishing error: cannot connect to database");
>   if (isStatsReliable) {
>     throw new HiveException(ErrorMsg.STATSPUBLISHER_CONNECTION_ERROR.getErrorCodedMsg());
>   }
>   return;
> }
> {noformat}
> With hive.stats.dbclass set to counter in data/conf/spark/hive-site.xml, 
> the statsPublisher is of type CounterStatsPublisher.
> In CounterStatsPublisher, the exception is thrown because getReporter() 
> returns null for the MapredContext:
> {noformat}
> MapredContext context = MapredContext.get();
> if (context == null || context.getReporter() == null) {
>   return false;
> }
> {noformat}
> When changing hive.stats.dbclass to jdbc:derby in 
> data/conf/spark/hive-site.xml, similar to TestCliDriver, it works:
> {noformat}
> <property>
>   <name>hive.stats.dbclass</name>
>   <value>jdbc:derby</value>
>   <description>The default storage that stores temporary hive statistics. 
>   Currently, jdbc, hbase and counter types are supported.</description>
> </property>
> {noformat}
> In addition, I had to generate the out file for the test case for Spark.
> When running this test with TestCliDriver and hive.stats.dbclass set to 
> counter, the test case still works. The reporter is set to 
> org.apache.hadoop.mapred.Task$TaskReporter. 
> Some additional investigation may be needed into why CounterStatsPublisher 
> has no reporter in the case of Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-8219) Multi-Insert optimization, don't sink the source into a file [Spark Branch]

2014-11-05 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang resolved HIVE-8219.
---
Resolution: Won't Fix

With HIVE-8118, this is no longer needed.

> Multi-Insert optimization, don't sink the source into a file [Spark Branch]
> ---
>
> Key: HIVE-8219
> URL: https://issues.apache.org/jira/browse/HIVE-8219
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>  Labels: Spark-M1
>
> The current implementation splits the operator plan at the lowest common 
> ancestor by inserting one FileSinkOperator and a list of TableScanOperators. 
> Writing to a file (by the FS) is expensive. We should be able to insert a 
> ReduceSinkOperator instead. The result RDD from the first job can be cached 
> and referred to in subsequent Spark jobs.
> This is a followup for HIVE-7503.
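
As a generic illustration of the caching idea (a standalone Spark fragment, 
unrelated to Hive's actual operator plan; it assumes an existing 
{{JavaRDD<String> input}} and the usual Spark Java API imports, and the 
predicates are placeholders):

{noformat}
// Compute the shared source once, cache it, and reuse it across the two
// Spark jobs triggered by the two count() actions below.
JavaRDD<String> shared = input.map(String::toUpperCase);
shared.persist(StorageLevel.MEMORY_AND_DISK());
long a = shared.filter(s -> s.startsWith("A")).count();  // job 1 materializes the cache
long b = shared.filter(s -> s.startsWith("B")).count();  // job 2 reads from the cache
{noformat}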



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8649) Increase level of parallelism in reduce phase [Spark Branch]

2014-11-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199753#comment-14199753
 ] 

Hive QA commented on HIVE-8649:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679762/HIVE-8649.1-spark.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 7098 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetTimestampUtils.testTimezone
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/315/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/315/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-315/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679762 - PreCommit-HIVE-SPARK-Build

> Increase level of parallelism in reduce phase [Spark Branch]
> 
>
> Key: HIVE-8649
> URL: https://issues.apache.org/jira/browse/HIVE-8649
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-8649.1-spark.patch
>
>
> We calculate the number of reducers based on the same code for MapReduce. 
> However, reducers are vastly cheaper in Spark and it's generally recommended 
> we have many more reducers than in MR.
> Sandy Ryza who works on Spark has some ideas about a heuristic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8649) Increase level of parallelism in reduce phase [Spark Branch]

2014-11-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199742#comment-14199742
 ] 

Xuefu Zhang commented on HIVE-8649:
---

Hi [~jxiang], could you provide an RB link for this? Thanks.

> Increase level of parallelism in reduce phase [Spark Branch]
> 
>
> Key: HIVE-8649
> URL: https://issues.apache.org/jira/browse/HIVE-8649
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-8649.1-spark.patch
>
>
> We calculate the number of reducers based on the same code for MapReduce. 
> However, reducers are vastly cheaper in Spark and it's generally recommended 
> we have many more reducers than in MR.
> Sandy Ryza who works on Spark has some ideas about a heuristic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8757) YARN dep in scheduler shim should be optional

2014-11-05 Thread Prasad Mujumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199738#comment-14199738
 ] 

Prasad Mujumdar commented on HIVE-8757:
---

+1 pending tests

[~brocknoland] Thanks for catching the problem and for the fix!

> YARN dep in scheduler shim should be optional
> -
>
> Key: HIVE-8757
> URL: https://issues.apache.org/jira/browse/HIVE-8757
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-8757.patch
>
>
> The {{hadoop-yarn-server-resourcemanager}} dep in the scheduler shim should 
> be optional so that yarn doesn't pollute dependent classpaths. Users who want 
> to use this feature must provide the yarn classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8711) DB deadlocks not handled in TxnHandler for Postgres, Oracle, and SQLServer

2014-11-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199734#comment-14199734
 ] 

Eugene Koifman commented on HIVE-8711:
--

+1

> DB deadlocks not handled in TxnHandler for Postgres, Oracle, and SQLServer
> --
>
> Key: HIVE-8711
> URL: https://issues.apache.org/jira/browse/HIVE-8711
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8711.2.patch, HIVE-8711.patch
>
>
> TxnHandler.detectDeadlock has code to catch deadlocks in MySQL and Derby.  
> But it does not detect a deadlock for Postgres, Oracle, or SQLServer
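
A minimal sketch of what broader detection could look like, keyed off the 
standard SQLSTATE values and vendor error codes each database documents for 
deadlocks. The method shape and enum below are illustrative only, not the 
actual patch:

{noformat}
import java.sql.SQLException;

public final class DeadlockDetector {
  enum DatabaseProduct { DERBY, MYSQL, POSTGRES, ORACLE, SQLSERVER }

  // SQLSTATE 40001 is the serialization-failure/deadlock state used by MySQL
  // and Derby; 40P01 is PostgreSQL's deadlock_detected; ORA-00060 surfaces as
  // vendor error code 60 on Oracle; SQL Server reports deadlock victims as
  // error 1205.
  static boolean isDeadlock(DatabaseProduct db, SQLException e) {
    switch (db) {
      case MYSQL:
      case DERBY:
        return "40001".equals(e.getSQLState());
      case POSTGRES:
        return "40P01".equals(e.getSQLState());
      case ORACLE:
        return e.getErrorCode() == 60;
      case SQLSERVER:
        return e.getErrorCode() == 1205;
      default:
        return false;
    }
  }
}
{noformat}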



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8612) Support metadata result filter hooks

2014-11-05 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199730#comment-14199730
 ] 

Brock Noland commented on HIVE-8612:


Thanks for the updated patch.

+1 pending tests

> Support metadata result filter hooks
> 
>
> Key: HIVE-8612
> URL: https://issues.apache.org/jira/browse/HIVE-8612
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Metastore
>Affects Versions: 0.13.1
>Reporter: Prasad Mujumdar
>Assignee: Prasad Mujumdar
> Fix For: 0.14.0, 0.15.0
>
> Attachments: HIVE-8612.1.patch, HIVE-8612.2.patch, HIVE-8612.3.patch
>
>
> Support metadata filter hook for metastore client. This will be useful for 
> authorization plugins on hiveserver2 to filter metadata results, especially 
> in case of non-impersonation mode where the metastore doesn't know the end 
> user's identity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8757) YARN dep in scheduler shim should be optional

2014-11-05 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8757:
---
Summary: YARN dep in scheduler shim should be optional  (was: Yarn dep in 
scheduler shim should be optional)

> YARN dep in scheduler shim should be optional
> -
>
> Key: HIVE-8757
> URL: https://issues.apache.org/jira/browse/HIVE-8757
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-8757.patch
>
>
> The {{hadoop-yarn-server-resourcemanager}} dep in the scheduler shim should 
> be optional so that yarn doesn't pollute dependent classpaths. Users who want 
> to use this feature must provide the yarn classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8757) Yarn dep in scheduler shim should be optional

2014-11-05 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8757:
---
Attachment: HIVE-8757.patch

FYI [~prasadm]

> Yarn dep in scheduler shim should be optional
> -
>
> Key: HIVE-8757
> URL: https://issues.apache.org/jira/browse/HIVE-8757
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-8757.patch
>
>
> The {{hadoop-yarn-server-resourcemanager}} dep in the scheduler shim should 
> be optional so that yarn doesn't pollute dependent classpaths. Users who want 
> to use this feature must provide the yarn classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8757) Yarn dep in scheduler shim should be optional

2014-11-05 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8757:
---
Affects Version/s: 0.15.0
   Status: Patch Available  (was: Open)

> Yarn dep in scheduler shim should be optional
> -
>
> Key: HIVE-8757
> URL: https://issues.apache.org/jira/browse/HIVE-8757
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-8757.patch
>
>
> The {{hadoop-yarn-server-resourcemanager}} dep in the scheduler shim should 
> be optional so that yarn doesn't pollute dependent classpaths. Users who want 
> to use this feature must provide the yarn classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8757) Yarn dep in scheduler shim should be optional

2014-11-05 Thread Brock Noland (JIRA)
Brock Noland created HIVE-8757:
--

 Summary: Yarn dep in scheduler shim should be optional
 Key: HIVE-8757
 URL: https://issues.apache.org/jira/browse/HIVE-8757
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland


The {{hadoop-yarn-server-resourcemanager}} dep in the scheduler shim should be 
optional so that yarn doesn't pollute dependent classpaths. Users who want to 
use this feature must provide the yarn classes.
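
In Maven terms, the change amounts to flagging the dependency as optional in 
the shim module's POM, roughly as below (a sketch; version and scope are 
omitted):

{noformat}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
  <optional>true</optional>
</dependency>
{noformat}

With {{<optional>true</optional>}}, the dependency stays on the shim's own 
compile classpath but is not propagated transitively, which is exactly why 
downstream users who want the feature must add the YARN classes themselves.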



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8735) statistics update can fail due to long paths

2014-11-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199718#comment-14199718
 ] 

Hive QA commented on HIVE-8735:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679672/HIVE-8735.01.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6674 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1652/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1652/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1652/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679672 - PreCommit-HIVE-TRUNK-Build

> statistics update can fail due to long paths
> 
>
> Key: HIVE-8735
> URL: https://issues.apache.org/jira/browse/HIVE-8735
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-8735.01.patch, HIVE-8735.02.patch, HIVE-8735.patch
>
>
> {noformat}
> 2014-11-04 01:34:38,610 ERROR jdbc.JDBCStatsPublisher 
> (JDBCStatsPublisher.java:publishStat(198)) - Error during publishing 
> statistics. 
> java.sql.SQLDataException: A truncation error was encountered trying to 
> shrink VARCHAR 
> 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub&' to length 255.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown 
> Source)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:147)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:144)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2910)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1153)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:992)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:205)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:722)
> Caused by: java.sql.SQLException: A truncation error was encountered trying 
> to shrink VARCHAR 
> 'pf

[jira] [Comment Edited] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join

2014-11-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199713#comment-14199713
 ] 

Xuefu Zhang edited comment on HIVE-8745 at 11/6/14 3:35 AM:


I don't think reverting HIVE-7373 is a viable option. That's the trigger but 
not the root cause of the problem. The root cause is that BinarySortableSerde 
should be able to serialize decimal values in such a way that comparison can 
be performed at the byte level without requiring values to be normalized for 
trailing zeros. I don't know how to do this, but I think that's the direction 
in which we should look. If the serde needs to remove trailing zeros during 
serialization, this is fine as long as it can get them back upon 
deserialization.


was (Author: xuefuz):
I don't thinking reverting HIVE-7373 is a viable option. That's the trigger but 
not the root cause of the problem. The root cause is that BinarySortableSerde 
should be able to serailize decimal values in such way that comparison can be 
performed at byte level w/o requiring values be trailed for zeroes. I don't 
know how to do this, but I think that's the direction into which we should 
look. If the serde needs to remove trailing zeros during serialization, this is 
fine as long as it can get them back upon deserialization.

> Joins on decimal keys return different results whether they are run as reduce 
> join or map join
> --
>
> Key: HIVE-8745
> URL: https://issues.apache.org/jira/browse/HIVE-8745
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Gunther Hagleitner
>Assignee: Jason Dere
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: join_test.q
>
>
> See attached .q file to reproduce. The difference seems to be whether 
> trailing 0s are considered the same value or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join

2014-11-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199713#comment-14199713
 ] 

Xuefu Zhang commented on HIVE-8745:
---

I don't think reverting HIVE-7373 is a viable option. That's the trigger but 
not the root cause of the problem. The root cause is that BinarySortableSerde 
should be able to serialize decimal values in such a way that comparison can 
be performed at the byte level without requiring values to be normalized for 
trailing zeros. I don't know how to do this, but I think that's the direction 
in which we should look. If the serde needs to remove trailing zeros during 
serialization, this is fine as long as it can get them back upon 
deserialization.
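
To make the trade-off concrete, here is a small standalone sketch (plain 
java.math.BigDecimal, not the actual serde code) of why byte-level comparison 
forces normalization: once trailing zeros are stripped so that 1.0 and 1.00 
produce identical bytes, the scale that distinguished them is gone and cannot 
be recovered on deserialization.

{noformat}
import java.math.BigDecimal;

public class TrailingZeros {
  public static void main(String[] args) {
    BigDecimal a = new BigDecimal("1.0");   // scale 1
    BigDecimal b = new BigDecimal("1.00");  // scale 2
    // Unequal as objects (different scale), equal as numeric values:
    System.out.println(a.equals(b));          // false
    System.out.println(a.compareTo(b) == 0);  // true
    // Normalizing makes the two forms identical...
    System.out.println(a.stripTrailingZeros().equals(b.stripTrailingZeros())); // true
    // ...but the original scales (1 vs 2) are now indistinguishable.
  }
}
{noformat}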

> Joins on decimal keys return different results whether they are run as reduce 
> join or map join
> --
>
> Key: HIVE-8745
> URL: https://issues.apache.org/jira/browse/HIVE-8745
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Gunther Hagleitner
>Assignee: Jason Dere
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: join_test.q
>
>
> See attached .q file to reproduce. The difference seems to be whether 
> trailing 0s are considered the same value or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 27640: HIVE-8700 Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch]

2014-11-05 Thread Suhas Satish


> On Nov. 5, 2014, 9:23 p.m., Szehon Ho wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java,
> >  line 254
> > 
> >
> > Are you sure we don't need to initialize the HTSOperator's values like 
> > it does in LocalMapJoinProcFactory?
> 
> Suhas Satish wrote:
> I will take a closer look.

I dug into the history of this changeset a bit. 
It was introduced in this commit 
https://github.com/apache/hive/commit/9b4ba6a9bb2a1184857fc8cca11e3dc6c48c1380

From one of the comments on HIVE-4867: 
there is a problem in mapjoin on Tez. The MR compiler replaces the RS with a 
HashSink made from the value exprs of the Join, but the Tez compiler uses the 
RS as is, assuming it has the same columns as the value exprs of the Join, 
which is not true.

HIVE-4867 dedups columns in the RS for reducer joins and the RS for order-by. 
But small aliases of a mapjoin in MR tasks still contain key columns in their 
value exprs.

Not having this can, at worst, be a performance issue on memory (a slightly 
larger footprint), but it does not impact functionality.


- Suhas


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27640/#review60031
---


On Nov. 5, 2014, 8:29 p.m., Suhas Satish wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27640/
> ---
> 
> (Updated Nov. 5, 2014, 8:29 p.m.)
> 
> 
> Review request for hive, Chao Sun, Jimmy Xiang, Szehon Ho, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This replaces ReduceSinks with HashTableSinks in smaller tables for a 
> map-join. But the condition check field to detect a map-join is actually 
> being set in CommonJoinResolver, which doesn't exist yet. We need to decide 
> where the right place is to populate this field. 
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
> 795a5d7 
> 
> Diff: https://reviews.apache.org/r/27640/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Suhas Satish
> 
>



[jira] [Commented] (HIVE-7729) Enable q-tests for ANALYZE TABLE feature [Spark Branch]

2014-11-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199707#comment-14199707
 ] 

Hive QA commented on HIVE-7729:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679758/HIVE-7729.3-spark.patch

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 7120 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetTimestampUtils.testTimezone
org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/314/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/314/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-314/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679758 - PreCommit-HIVE-SPARK-Build

> Enable q-tests for ANALYZE TABLE feature [Spark Branch]
> ---
>
> Key: HIVE-7729
> URL: https://issues.apache.org/jira/browse/HIVE-7729
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Na Yang
>  Labels: Spark-M3
> Attachments: HIVE-7729.1-spark.patch, HIVE-7729.2-spark.patch, 
> HIVE-7729.3-spark.patch
>
>
> Enable q-tests for ANALYZE TABLE feature since automatic test environment is 
> ready.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8735) statistics update can fail due to long paths

2014-11-05 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199705#comment-14199705
 ] 

Prasanth J commented on HIVE-8735:
--

+1. It will be good if you can add some tests.
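
One common way to keep such an ID within a fixed VARCHAR limit, shown here 
only as a sketch of the general technique (not necessarily what the patch 
does), is to truncate the string and append a hash of the full value so that 
distinct long paths remain distinct. The class and method names are made up; 
it assumes maxLen is at least 32 (the MD5 hex length):

{noformat}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public final class IdShortener {
  // Shorten id to at most maxLen chars: keep a prefix of the original and
  // append an MD5 hex digest (32 chars) of the full id to preserve uniqueness.
  public static String shorten(String id, int maxLen) throws Exception {
    if (id.length() <= maxLen) {
      return id;
    }
    MessageDigest md = MessageDigest.getInstance("MD5");
    byte[] digest = md.digest(id.getBytes(StandardCharsets.UTF_8));
    StringBuilder hex = new StringBuilder();
    for (byte b : digest) {
      hex.append(String.format("%02x", b));
    }
    return id.substring(0, maxLen - hex.length()) + hex;
  }
}
{noformat}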

> statistics update can fail due to long paths
> 
>
> Key: HIVE-8735
> URL: https://issues.apache.org/jira/browse/HIVE-8735
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-8735.01.patch, HIVE-8735.02.patch, HIVE-8735.patch
>
>
> {noformat}
> 2014-11-04 01:34:38,610 ERROR jdbc.JDBCStatsPublisher 
> (JDBCStatsPublisher.java:publishStat(198)) - Error during publishing 
> statistics. 
> java.sql.SQLDataException: A truncation error was encountered trying to 
> shrink VARCHAR 
> 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub&' to length 255.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown 
> Source)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:147)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:144)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2910)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1153)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:992)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:205)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:722)
> Caused by: java.sql.SQLException: A truncation error was encountered trying 
> to shrink VARCHAR 
> 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub&' to length 255.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
>  Source)
>   ... 31 more
> Caused by: ERROR 22001: A truncation error was encountered trying to shrink 
> VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub&' to 
> length 255.
>   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>   at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown Source)
>   at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
>   at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
>   at org.apache.derby.iapi.types.DataTypeDescriptor.normalize(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeColumn(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeRow(Unknown 
> Source)
>   at 
> org.apache.derby.impl

Re: Review Request 27599: HIVE-8735 : statistics update can fail due to long paths

2014-11-05 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27599/#review60099
---

Ship it!


Ship It!

- Prasanth_J


On Nov. 6, 2014, 3:07 a.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27599/
> ---
> 
> (Updated Nov. 6, 2014, 3:07 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java 
> b074ca9 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java 
> 5e317ab 
>   
> ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsSetupConstants.java 
> 70badf2 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java 
> 4625d27 
> 
> Diff: https://reviews.apache.org/r/27599/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



Re: Review Request 27599: HIVE-8735 : statistics update can fail due to long paths

2014-11-05 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27599/
---

(Updated Nov. 6, 2014, 3:07 a.m.)


Review request for hive.


Repository: hive-git


Description
---

see jira


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java 
b074ca9 
  ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java 
5e317ab 
  ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsSetupConstants.java 
70badf2 
  ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java 4625d27 

Diff: https://reviews.apache.org/r/27599/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Updated] (HIVE-8735) statistics update can fail due to long paths

2014-11-05 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-8735:
---
Attachment: HIVE-8735.02.patch

> statistics update can fail due to long paths
> 
>
> Key: HIVE-8735
> URL: https://issues.apache.org/jira/browse/HIVE-8735
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-8735.01.patch, HIVE-8735.02.patch, HIVE-8735.patch
>
>
> {noformat}
> 2014-11-04 01:34:38,610 ERROR jdbc.JDBCStatsPublisher 
> (JDBCStatsPublisher.java:publishStat(198)) - Error during publishing 
> statistics. 
> java.sql.SQLDataException: A truncation error was encountered trying to 
> shrink VARCHAR 
> 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub&' to length 255.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown 
> Source)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:147)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:144)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2910)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1153)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:992)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:205)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:722)
> Caused by: java.sql.SQLException: A truncation error was encountered trying 
> to shrink VARCHAR 
> 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub&' to length 255.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
>  Source)
>   ... 31 more
> Caused by: ERROR 22001: A truncation error was encountered trying to shrink 
> VARCHAR 'pfile:/grid/0/jenkins/workspace/UT-hive-champlain-common/sub&' to 
> length 255.
>   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>   at org.apache.derby.iapi.types.SQLChar.hasNonBlankChars(Unknown Source)
>   at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
>   at org.apache.derby.iapi.types.SQLVarchar.normalize(Unknown Source)
>   at org.apache.derby.iapi.types.DataTypeDescriptor.normalize(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeColumn(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.execute.NormalizeResultSet.normalizeRow(Unknown 
> Source)
>   at 
> org.apache.derby.impl.sql.execute.NormalizeResultSet.getNextRowCore(Unknown 

[jira] [Assigned] (HIVE-8512) queries with star and aggregate should fail

2014-11-05 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-8512:
--

Assignee: Sergey Shelukhin

> queries with star and aggregate should fail
> ---
>
> Key: HIVE-8512
> URL: https://issues.apache.org/jira/browse/HIVE-8512
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> The ctas_colname test uses these, for example.
> This errors out: {noformat} select key, sum(key) from src;{noformat}
> But this passes:  {noformat} select *, sum(key), count(value) from 
> src;{noformat}
> It looks like it returns 2 sums and 2 counts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8636) CBO: split cbo_correctness test

2014-11-05 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199690#comment-14199690
 ] 

Sergey Shelukhin commented on HIVE-8636:


I cannot update the RB because the patch is too big. I did address the comment 
about the stats writer. There were no other significant changes. Patches can 
be compared by applying them to a tree and doing "git difftool" with opendiff, 
diffmerge, meld, or something similar (opendiff is available by default on Mac).
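
Spelled out, that comparison workflow would look roughly like the following 
(branch names are placeholders, and it assumes a trunk checkout and a 
configured diff.tool):

{noformat}
git checkout -b cbo-v1 && git apply HIVE-8636.01.patch && git add -A && git commit -m v1
git checkout trunk && git checkout -b cbo-v2 && git apply HIVE-8636.02.patch && git add -A && git commit -m v2
git difftool cbo-v1 cbo-v2   # opens opendiff/diffmerge/meld, per your diff.tool setting
{noformat}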

> CBO: split cbo_correctness test
> ---
>
> Key: HIVE-8636
> URL: https://issues.apache.org/jira/browse/HIVE-8636
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-8636.01.patch, HIVE-8636.01.patch, 
> HIVE-8636.02.patch, HIVE-8636.patch
>
>
> The CBO correctness test is extremely annoying: it runs forever, and if 
> anything fails it's hard to debug due to the volume of logs. It also doesn't 
> run further, so if multiple things fail they can only be discovered one by 
> one. Also, SORT_QUERY_RESULTS cannot be used, because some queries presumably 
> rely on sorting.
> It should be split into separate tests; the numbers in there now may be good 
> as boundaries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8726) Collect Spark TaskMetrics and build job statistic[Spark Branch]

2014-11-05 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li reassigned HIVE-8726:
---

Assignee: Chengxiang Li

> Collect Spark TaskMetrics and build job statistic[Spark Branch]
> ---
>
> Key: HIVE-8726
> URL: https://issues.apache.org/jira/browse/HIVE-8726
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M3
>
> Implement SparkListener to collect TaskMetrics, and build SparkStatistic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8636) CBO: split cbo_correctness test

2014-11-05 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-8636:
---
Attachment: HIVE-8636.02.patch

Removed the creation of tables from the tests that create part and lineitem.

> CBO: split cbo_correctness test
> ---
>
> Key: HIVE-8636
> URL: https://issues.apache.org/jira/browse/HIVE-8636
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-8636.01.patch, HIVE-8636.01.patch, 
> HIVE-8636.02.patch, HIVE-8636.patch
>
>
> The CBO correctness test is extremely annoying: it runs forever, and if 
> anything fails it's hard to debug due to the volume of logs. It also doesn't 
> run further, so if multiple things fail they can only be discovered one by 
> one. Also, SORT_QUERY_RESULTS cannot be used, because some queries presumably 
> rely on sorting.
> It should be split into separate tests; the numbers in there now may be good 
> as boundaries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6977) Delete Hiveserver1

2014-11-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199676#comment-14199676
 ] 

Hive QA commented on HIVE-6977:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679717/HIVE-6977.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6633 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1651/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1651/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1651/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679717 - PreCommit-HIVE-TRUNK-Build

> Delete Hiveserver1
> --
>
> Key: HIVE-6977
> URL: https://issues.apache.org/jira/browse/HIVE-6977
> Project: Hive
>  Issue Type: Task
>  Components: JDBC, Server Infrastructure
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-6977.1.patch, HIVE-6977.patch
>
>
> See mailing list discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8751) Null Pointer Exception when counter is used for stats during inserting overwrite partitioned tables [Spark Branch]

2014-11-05 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Yang reassigned HIVE-8751:
-

Assignee: Na Yang

> Null Pointer Exception when counter is used for stats during inserting 
> overwrite partitioned tables [Spark Branch]
> --
>
> Key: HIVE-8751
> URL: https://issues.apache.org/jira/browse/HIVE-8751
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Na Yang
>Assignee: Na Yang
>
> The following queries cause a NullPointerException:
> {noformat}
> set hive.stats.dbclass=counter;
> set hive.stats.autogather=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> create table dummy (key string, value string) partitioned by (ds string, hr 
> string);
> insert overwrite table dummy partition (ds='10',hr='11') select * from src;
> {noformat}
> Here is the stacktrace
> {noformat}
> 2014-11-05 15:30:42,621 ERROR [main] exec.Task (SparkTask.java:execute(116)) 
> - Failed to execute spark task.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask.getRequiredCounterPrefix(SparkTask.java:235)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:103)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1644)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1404)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1216)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1043)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1033)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:247)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {noformat}
> The cause of this NullPointerException is that the SparkTask tries to get the 
> table partition info to set the required counter prefix before data are 
> actually inserted into the partitioned table. Therefore, a null partition 
> info is returned.
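
As described, a fix would need either to defer reading the partition info or 
to guard against the missing partitions. A trivial guard might look like the 
sketch below; the method and variable names are hypothetical, for illustration 
only, not the actual SparkTask code:

{noformat}
// Sketch only; names are hypothetical, not the actual SparkTask code.
private java.util.List<String> counterPrefixes(String tableName,
    java.util.List<String> partitionNames) {
  if (partitionNames == null || partitionNames.isEmpty()) {
    // Dynamic partitions do not exist yet when the task starts, so fall back
    // to a table-level prefix instead of dereferencing a null partition list.
    return java.util.Collections.singletonList(tableName);
  }
  java.util.List<String> prefixes = new java.util.ArrayList<String>();
  for (String p : partitionNames) {
    prefixes.add(tableName + "/" + p);
  }
  return prefixes;
}
{noformat}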



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8756) numRows and rawDataSize are not collected by the Spark stats [Spark Branch]

2014-11-05 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Yang reassigned HIVE-8756:
-

Assignee: Na Yang

> numRows and rawDataSize are not collected by the Spark stats [Spark Branch]
> ---
>
> Key: HIVE-8756
> URL: https://issues.apache.org/jira/browse/HIVE-8756
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Na Yang
>Assignee: Na Yang
>
> Run the following hive queries
> {noformat}
> set datanucleus.cache.collections=false;
> set hive.stats.autogather=true;
> set hive.merge.mapfiles=false;
> set hive.merge.mapredfiles=false;
> set hive.map.aggr=true;
> create table tmptable(key string, value string);
> INSERT OVERWRITE TABLE tmptable
> SELECT unionsrc.key, unionsrc.value 
> FROM (SELECT 'tst1' AS key, cast(count(1) AS string) AS value FROM src s1
>   UNION  ALL  
>   SELECT s2.key AS key, s2.value AS value FROM src1 s2) unionsrc;
> DESCRIBE FORMATTED tmptable;
> {noformat}
> Hive on Spark prints the following table parameters:
> {noformat}
> COLUMN_STATS_ACCURATE   true
> numFiles                2
> numRows                 0
> rawDataSize             0
> totalSize               225
> {noformat}
> Hive on MR prints the following table parameters:
> {noformat}
> Table Parameters:
>   COLUMN_STATS_ACCURATE   true
>   numFiles                2
>   numRows                 26
>   rawDataSize             199
>   totalSize               225
> {noformat}
> As shown above, numRows and rawDataSize are not collected by the Hive on 
> Spark stats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8612) Support metadata result filter hooks

2014-11-05 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-8612:
--
Attachment: HIVE-8612.3.patch

> Support metadata result filter hooks
> 
>
> Key: HIVE-8612
> URL: https://issues.apache.org/jira/browse/HIVE-8612
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Metastore
>Affects Versions: 0.13.1
>Reporter: Prasad Mujumdar
>Assignee: Prasad Mujumdar
> Fix For: 0.14.0, 0.15.0
>
> Attachments: HIVE-8612.1.patch, HIVE-8612.2.patch, HIVE-8612.3.patch
>
>
> Support metadata filter hook for metastore client. This will be useful for 
> authorization plugins on hiveserver2 to filter metadata results, especially 
> in case of non-impersonation mode where the metastore doesn't know the end 
> user's identity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8756) numRows and rawDataSize are not collected by the Spark stats [Spark Branch]

2014-11-05 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Yang updated HIVE-8756:
--
Issue Type: Sub-task  (was: Bug)
Parent: HIVE-7292

> numRows and rawDataSize are not collected by the Spark stats [Spark Branch]
> ---
>
> Key: HIVE-8756
> URL: https://issues.apache.org/jira/browse/HIVE-8756
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Na Yang
>
> Run the following hive queries
> {noformat}
> set datanucleus.cache.collections=false;
> set hive.stats.autogather=true;
> set hive.merge.mapfiles=false;
> set hive.merge.mapredfiles=false;
> set hive.map.aggr=true;
> create table tmptable(key string, value string);
> INSERT OVERWRITE TABLE tmptable
> SELECT unionsrc.key, unionsrc.value 
> FROM (SELECT 'tst1' AS key, cast(count(1) AS string) AS value FROM src s1
>   UNION  ALL  
>   SELECT s2.key AS key, s2.value AS value FROM src1 s2) unionsrc;
> DESCRIBE FORMATTED tmptable;
> {noformat}
> Hive on Spark prints the following table parameters:
> {noformat}
> COLUMN_STATS_ACCURATE   true
> numFiles                2
> numRows                 0
> rawDataSize             0
> totalSize               225
> {noformat}
> Hive on MR prints the following table parameters:
> {noformat}
> Table Parameters:
>   COLUMN_STATS_ACCURATE   true
>   numFiles                2
>   numRows                 26
>   rawDataSize             199
>   totalSize               225
> {noformat}
> As shown above, numRows and rawDataSize are not collected by the Hive on 
> Spark stats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8756) numRows and rawDataSize are not collected by the Spark stats [Spark Branch]

2014-11-05 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Yang updated HIVE-8756:
--
Component/s: Spark

> numRows and rawDataSize are not collected by the Spark stats [Spark Branch]
> ---
>
> Key: HIVE-8756
> URL: https://issues.apache.org/jira/browse/HIVE-8756
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Na Yang
>
> Run the following hive queries
> {noformat}
> set datanucleus.cache.collections=false;
> set hive.stats.autogather=true;
> set hive.merge.mapfiles=false;
> set hive.merge.mapredfiles=false;
> set hive.map.aggr=true;
> create table tmptable(key string, value string);
> INSERT OVERWRITE TABLE tmptable
> SELECT unionsrc.key, unionsrc.value 
> FROM (SELECT 'tst1' AS key, cast(count(1) AS string) AS value FROM src s1
>   UNION  ALL  
>   SELECT s2.key AS key, s2.value AS value FROM src1 s2) unionsrc;
> DESCRIBE FORMATTED tmptable;
> {noformat}
> Hive on Spark prints the following table parameters:
> {noformat}
> COLUMN_STATS_ACCURATE   true
> numFiles                2
> numRows                 0
> rawDataSize             0
> totalSize               225
> {noformat}
> Hive on MR prints the following table parameters:
> {noformat}
> Table Parameters:
>   COLUMN_STATS_ACCURATE   true
>   numFiles                2
>   numRows                 26
>   rawDataSize             199
>   totalSize               225
> {noformat}
> As shown above, numRows and rawDataSize are not collected by the Hive on 
> Spark stats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8756) numRows and rawDataSize are not collected by the Spark stats [Spark Branch]

2014-11-05 Thread Na Yang (JIRA)
Na Yang created HIVE-8756:
-

 Summary: numRows and rawDataSize are not collected by the Spark 
stats [Spark Branch]
 Key: HIVE-8756
 URL: https://issues.apache.org/jira/browse/HIVE-8756
 Project: Hive
  Issue Type: Bug
Reporter: Na Yang


Run the following hive queries
{noformat}
set datanucleus.cache.collections=false;
set hive.stats.autogather=true;
set hive.merge.mapfiles=false;
set hive.merge.mapredfiles=false;
set hive.map.aggr=true;

create table tmptable(key string, value string);
INSERT OVERWRITE TABLE tmptable
SELECT unionsrc.key, unionsrc.value 
FROM (SELECT 'tst1' AS key, cast(count(1) AS string) AS value FROM src s1
  UNION  ALL  
  SELECT s2.key AS key, s2.value AS value FROM src1 s2) unionsrc;
DESCRIBE FORMATTED tmptable;
{noformat}

Hive on Spark prints the following table parameters:
{noformat}
COLUMN_STATS_ACCURATE   true
numFiles                2
numRows                 0
rawDataSize             0
totalSize               225
{noformat}

Hive on MR prints the following table parameters:
{noformat}
Table Parameters:
  COLUMN_STATS_ACCURATE   true
  numFiles                2
  numRows                 26
  rawDataSize             199
  totalSize               225
{noformat}

As shown above, numRows and rawDataSize are not collected by the Hive on Spark 
stats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8649) Increase level of parallelism in reduce phase [Spark Branch]

2014-11-05 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199651#comment-14199651
 ] 

Jimmy Xiang commented on HIVE-8649:
---

This patch assumes the executor information doesn't change. If we do support 
changes such as increasing the number of executors, we need a listener for 
such events.
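
For context, one shape such a heuristic could take, as a hedged sketch (the 
threshold, floor factor, and names below are made up for illustration, not 
taken from the patch):

{noformat}
// Illustrative heuristic: keep the MR-style size-based estimate, but add a
// floor tied to available executor cores, since Spark reducers are cheap.
public static int estimateReducers(long totalInputBytes, long bytesPerReducer,
    int totalExecutorCores, int maxReducers) {
  int bySize = (int) Math.ceil(totalInputBytes / (double) bytesPerReducer);
  int floor = totalExecutorCores * 2;   // keep all cores busy, with headroom
  return Math.max(1, Math.min(maxReducers, Math.max(bySize, floor)));
}
{noformat}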

> Increase level of parallelism in reduce phase [Spark Branch]
> 
>
> Key: HIVE-8649
> URL: https://issues.apache.org/jira/browse/HIVE-8649
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-8649.1-spark.patch
>
>
> We calculate the number of reducers based on the same code for MapReduce. 
> However, reducers are vastly cheaper in Spark and it's generally recommended 
> we have many more reducers than in MR.
> Sandy Ryza who works on Spark has some ideas about a heuristic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8649) Increase level of parallelism in reduce phase [Spark Branch]

2014-11-05 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-8649:
--
Fix Version/s: spark-branch
   Status: Patch Available  (was: Open)

> Increase level of parallelism in reduce phase [Spark Branch]
> 
>
> Key: HIVE-8649
> URL: https://issues.apache.org/jira/browse/HIVE-8649
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-8649.1-spark.patch
>
>
> We calculate the number of reducers based on the same code for MapReduce. 
> However, reducers are vastly cheaper in Spark and it's generally recommended 
> we have many more reducers than in MR.
> Sandy Ryza who works on Spark has some ideas about a heuristic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8649) Increase level of parallelism in reduce phase [Spark Branch]

2014-11-05 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-8649:
--
Attachment: HIVE-8649.1-spark.patch

> Increase level of parallelism in reduce phase [Spark Branch]
> 
>
> Key: HIVE-8649
> URL: https://issues.apache.org/jira/browse/HIVE-8649
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Jimmy Xiang
> Attachments: HIVE-8649.1-spark.patch
>
>
> We calculate the number of reducers based on the same code for MapReduce. 
> However, reducers are vastly cheaper in Spark and it's generally recommended 
> we have many more reducers than in MR.
> Sandy Ryza who works on Spark has some ideas about a heuristic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join

2014-11-05 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199637#comment-14199637
 ] 

Jason Dere commented on HIVE-8745:
--

Marking for 0.14 - since this behavior change happened in 0.14, I would like 
this to be addressed before it gets released.

> Joins on decimal keys return different results whether they are run as reduce 
> join or map join
> --
>
> Key: HIVE-8745
> URL: https://issues.apache.org/jira/browse/HIVE-8745
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Gunther Hagleitner
>Assignee: Jason Dere
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: join_test.q
>
>
> See attached .q file to reproduce. The difference seems to be whether 
> trailing 0s are considered the same value or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join

2014-11-05 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-8745:
-
Fix Version/s: 0.14.0

> Joins on decimal keys return different results whether they are run as reduce 
> join or map join
> --
>
> Key: HIVE-8745
> URL: https://issues.apache.org/jira/browse/HIVE-8745
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Gunther Hagleitner
>Assignee: Jason Dere
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: join_test.q
>
>
> See attached .q file to reproduce. The difference seems to be whether 
> trailing 0s are considered the same value or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join

2014-11-05 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199629#comment-14199629
 ] 

Jason Dere commented on HIVE-8745:
--

Looking into this, it's not just HiveDecimal.equals() - BinarySortableSerde is 
serializing decimals in such a way that 1.1 is not the same as 1.10. This is 
why we're seeing the difference in the join behavior.

It looks like this difference in behavior is due to HIVE-7373. Before, 
HiveDecimal automatically trimmed trailing zeros, so 1.1 and 1.10 would both 
be represented as 1.1. Now that they have different internal representations, 
there seem to be some unexpected differences in behavior, like the one we're 
seeing with BinarySortableSerde. We may want to consider backing out the 
changes from HIVE-7373.

If we were to try to fix this issue without reverting HIVE-7373, we would still 
have to trim trailing zeros within BinarySortableSerde so that 1.1 == 1.10. If 
we do this, it will result in the trimmed behavior that was deemed undesirable 
in HIVE-7373, but only when BinarySortableSerde is used (joins), which seems a 
bit odd.
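
To make the trailing-zero problem concrete, here is a minimal sketch using 
java.math.BigDecimal (which backs HiveDecimal); illustrative only, not the 
Hive code path:
{code}
import java.math.BigDecimal;

public class TrailingZeroDemo {
  public static void main(String[] args) {
    BigDecimal a = new BigDecimal("1.1");
    BigDecimal b = new BigDecimal("1.10");
    System.out.println(a.equals(b));         // false: scales differ (1 vs 2)
    System.out.println(a.compareTo(b) == 0); // true: numerically equal
    // Trimming trailing zeros yields a canonical form, so serialized bytes can match:
    System.out.println(a.stripTrailingZeros().equals(b.stripTrailingZeros())); // true
  }
}
{code}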

Thoughts?

> Joins on decimal keys return different results whether they are run as reduce 
> join or map join
> --
>
> Key: HIVE-8745
> URL: https://issues.apache.org/jira/browse/HIVE-8745
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Gunther Hagleitner
>Assignee: Jason Dere
>Priority: Critical
> Attachments: join_test.q
>
>
> See attached .q file to reproduce. The difference seems to be whether 
> trailing 0s are considered the same value or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7729) Enable q-tests for ANALYZE TABLE feature [Spark Branch]

2014-11-05 Thread Na Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199623#comment-14199623
 ] 

Na Yang commented on HIVE-7729:
---

I re-ran the stats19.q test on my local machine a few times, but still got the 
same output file. For now, I removed that test case from the patch and will 
handle it separately. 

> Enable q-tests for ANALYZE TABLE feature [Spark Branch]
> ---
>
> Key: HIVE-7729
> URL: https://issues.apache.org/jira/browse/HIVE-7729
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Na Yang
>  Labels: Spark-M3
> Attachments: HIVE-7729.1-spark.patch, HIVE-7729.2-spark.patch, 
> HIVE-7729.3-spark.patch
>
>
> Enable q-tests for ANALYZE TABLE feature since automatic test environment is 
> ready.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8711) DB deadlocks not handled in TxnHandler for Postgres, Oracle, and SQLServer

2014-11-05 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8711:
-
Attachment: HIVE-8711.2.patch

A new version of the patch with changes suggested by Eugene and with a change 
in how I detect deadlocks in Oracle since I discovered a few things in testing.
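
For context, a hedged sketch of what DB-specific deadlock detection looks 
like (the SQLState and vendor codes shown are the standard values; this is 
not the actual TxnHandler code):
{code}
import java.sql.SQLException;

public class DeadlockCheck {
  // Deadlocks surface as SQLState 40001 (serialization failure) or 40P01
  // (Postgres deadlock_detected); vendor codes: ORA-00060 (Oracle),
  // 1205 (SQL Server), 1213 (MySQL).
  static boolean looksLikeDeadlock(String dbProduct, SQLException e) {
    String state = e.getSQLState();
    if ("40001".equals(state) || "40P01".equals(state)) return true;
    int code = e.getErrorCode();
    if (dbProduct.contains("Oracle") && code == 60) return true;
    if (dbProduct.contains("Microsoft") && code == 1205) return true;
    if (dbProduct.contains("MySQL") && code == 1213) return true;
    return false;
  }
}
{code}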

> DB deadlocks not handled in TxnHandler for Postgres, Oracle, and SQLServer
> --
>
> Key: HIVE-8711
> URL: https://issues.apache.org/jira/browse/HIVE-8711
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8711.2.patch, HIVE-8711.patch
>
>
> TxnHandler.detectDeadlock has code to catch deadlocks in MySQL and Derby.  
> But it does not detect a deadlock for Postgres, Oracle, or SQLServer



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8711) DB deadlocks not handled in TxnHandler for Postgres, Oracle, and SQLServer

2014-11-05 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8711:
-
Status: Patch Available  (was: Open)

> DB deadlocks not handled in TxnHandler for Postgres, Oracle, and SQLServer
> --
>
> Key: HIVE-8711
> URL: https://issues.apache.org/jira/browse/HIVE-8711
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8711.2.patch, HIVE-8711.patch
>
>
> TxnHandler.detectDeadlock has code to catch deadlocks in MySQL and Derby.  
> But it does not detect a deadlock for Postgres, Oracle, or SQLServer



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7729) Enable q-tests for ANALYZE TABLE feature [Spark Branch]

2014-11-05 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Yang updated HIVE-7729:
--
Attachment: HIVE-7729.3-spark.patch

> Enable q-tests for ANALYZE TABLE feature [Spark Branch]
> ---
>
> Key: HIVE-7729
> URL: https://issues.apache.org/jira/browse/HIVE-7729
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Na Yang
>  Labels: Spark-M3
> Attachments: HIVE-7729.1-spark.patch, HIVE-7729.2-spark.patch, 
> HIVE-7729.3-spark.patch
>
>
> Enable q-tests for ANALYZE TABLE feature since automatic test environment is 
> ready.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8065) Support HDFS encryption functionality on Hive

2014-11-05 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199585#comment-14199585
 ] 

Ferdinand Xu commented on HIVE-8065:


Hi [~spena], I am interested in this JIRA. Do you need some help moving it 
forward? If so, please feel free to divide this one into several sub-tasks 
which I can work on. Thanks! 

> Support HDFS encryption functionality on Hive
> -
>
> Key: HIVE-8065
> URL: https://issues.apache.org/jira/browse/HIVE-8065
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.13.1
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>
> The new encryption support on HDFS makes Hive incompatible and unusable when 
> this feature is used.
> HDFS encryption is designed so that a user can configure different 
> encryption zones (or directories) for multi-tenant environments. An 
> encryption zone has an exclusive encryption key, such as AES-128 or AES-256. 
> For security compliance, HDFS does not allow moving/renaming files 
> between encryption zones. Renames are allowed only inside the same encryption 
> zone. A copy is allowed between encryption zones.
> See HDFS-6134 for more details about HDFS encryption design.
> Hive currently uses a scratch directory (like /tmp/$user/$random). This 
> scratch directory is used for the output of intermediate data (between MR 
> jobs) and for the final output of the hive query which is later moved to the 
> table directory location.
> If Hive tables are in different encryption zones than the scratch directory, 
> then Hive won't be able to rename those files/directories, which makes 
> Hive unusable.
> To handle this problem, we can change the scratch directory of the 
> query/statement to be inside the same encryption zone of the table directory 
> location. This way, the renaming process will be successful. 
> Also, for statements that move files between encryption zones (i.e. LOAD 
> DATA), a copy may be executed instead of a rename. This will cause an 
> overhead when copying large data files, but it won't break the encryption on 
> Hive.
> Another security aspect to consider is join selects. If Hive joins 
> tables with different encryption key strengths, then the results of 
> the select might break the security compliance of the tables. Let's say two 
> tables with 128-bit and 256-bit encryption are joined; the temporary 
> results might be stored in the 128-bit encryption zone, which conflicts 
> with the compliance of the table encrypted with 256 bits.
> To fix this, Hive should be able to select the scratch directory that is more 
> secured/encrypted, in order to save the intermediate data temporarily with no 
> compliance issues.
> For instance:
> {noformat}
> SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id;
> {noformat}
> - This should use a scratch directory (or staging directory) inside the 
> table-aes256 table location.
> {noformat}
> INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1;
> {noformat}
> - This should use a scratch directory inside the table-aes1 location.
> {noformat}
> FROM table-unencrypted
> INSERT OVERWRITE TABLE table-aes128 SELECT id, name
> INSERT OVERWRITE TABLE table-aes256 SELECT id, name
> {noformat}
> - This should use a scratch directory on each of the tables locations.
> - The first SELECT will have its scratch directory on table-aes128 directory.
> - The second SELECT will have its scratch directory on table-aes256 directory.
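
For illustration, a minimal sketch of the "pick the stronger zone" rule 
described above (helper name and signature are assumptions, not the 
HIVE-8065 implementation):
{code}
import org.apache.hadoop.fs.Path;

public class ScratchDirChooser {
  // Prefer the zone with the longer key so intermediate data is never written
  // with weaker encryption than either input table.
  static Path chooseScratchDir(Path dirA, int keyBitsA, Path dirB, int keyBitsB) {
    return (keyBitsA >= keyBitsB) ? dirA : dirB;
  }
}
{code}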



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8698) default log4j.properties not included in jar files anymore

2014-11-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199579#comment-14199579
 ] 

Hive QA commented on HIVE-8698:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12678775/HIVE-8698.1.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6674 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1650/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1650/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1650/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12678775 - PreCommit-HIVE-TRUNK-Build

> default log4j.properties not included in jar files anymore
> --
>
> Key: HIVE-8698
> URL: https://issues.apache.org/jira/browse/HIVE-8698
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8698.1.patch
>
>
> trunk and hive 0.14 based builds no longer have  hive-log4j.properties in the 
> jars. This means that in default tar, unless  hive-log4j.properties is 
> created in conf dir (from  hive-log4j.properties.template file), hive cli is 
> much more verbose in what is printed to console. Hiveserver2 fails to come 
> up, as it errors out with - 
> org.apache.hadoop.hive.common.LogUtils$LogInitializationException: Unable to 
> initialize logging using hive-log4j.properties, not found on CLASSPATH!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 27640: HIVE-8700 Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch]

2014-11-05 Thread Suhas Satish


> On Nov. 5, 2014, 10:41 p.m., Szehon Ho wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java,
> >  line 201
> > 
> >
> > Hi Suhas, I was taking a look through the code, I don't think it's easy 
> > now to identify which is the big-table parent vs small-table parent.  There 
> > is a HashTableDummyOperator representing the small-table but it only has 
> > some basic information.
> > 
> > Maybe you know more about it, but was wondering do we need to save the 
> > info to a context when we cut the small-table RS from MapJoin in 
> > ReduceSinkMapJoinProc?  Thanks.
> 
> Suhas Satish wrote:
> Hi Szehon,
> GenSparkProcContext has this - 
>   // we need to keep the original list of operators in the map join to 
> know
>   // what position in the mapjoin the different parent work items will 
> have.
>   public final Map<MapJoinOperator, List<Operator<?>>> mapJoinParentMap;
>   
> There is also another data structure in GenSparkProcContext to keep track 
> of which MapJoinWork is connected to which ReduceSinks. 
>   // a map to keep track of what reduce sinks have to be hooked up to
>   // map join work
>   public final Map<BaseWork, List<ReduceSinkOperator>> linkWorkWithReduceSinkMap;
>   
> Maybe we need to introduce a similar one for HashTableSinkOperator  like 
>  public final Map<BaseWork, List<HashTableSinkOperator>> linkWorkWithHashTableSinkMap;
>  
> In any case,  we should pass this GenSparkProcContext along to the 
> physicalContext in the physical resolvers. Let me know your thoughts.
> 
> Szehon Ho wrote:
> Hi Suhas, can we re-use that even?  It seems that only small-table RS are 
> connected to MJ at this point.  So big-table RS should never get into here.  
> If we can't re-use it we will have to create a new data structure.  The idea 
> is to identify which RS to replace with HashTableSink.  Hope that makes 
> sense, thanks.

Agreed. We only replace a reduceSink with HashTableSink if it exists in 
mapJoinParentMap for that mapJoin. Also, another thing I had forgotten in the 
code was to set the parents for the hashTableSinkOps introduced. I have added 
this now: hashTableSinkOp.setParentOperators(parentOps);

No new data structures are needed.
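
For illustration, the check would look roughly like this (context/field names 
from the thread; replaceWithHashTableSink is a made-up helper, not the actual 
patch):

  // Replace a ReduceSink only if it is a recorded small-table parent
  // of this map join in mapJoinParentMap.
  if (context.mapJoinParentMap.get(mapJoinOp).contains(reduceSinkOp)) {
    HashTableSinkOperator hashTableSinkOp = replaceWithHashTableSink(reduceSinkOp);
    hashTableSinkOp.setParentOperators(parentOps);
  }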


- Suhas


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27640/#review60052
---


On Nov. 5, 2014, 8:29 p.m., Suhas Satish wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27640/
> ---
> 
> (Updated Nov. 5, 2014, 8:29 p.m.)
> 
> 
> Review request for hive, Chao Sun, Jimmy Xiang, Szehon Ho, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This replaces ReduceSinks with HashTableSinks in smaller tables for a 
> map-join. But the condition check field to detect map-join is actually being 
> set in CommonJoinResolver, which doesn't exist yet. We need to decide where 
> the right place is to populate this field. 
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
> 795a5d7 
> 
> Diff: https://reviews.apache.org/r/27640/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Suhas Satish
> 
>



[jira] [Updated] (HIVE-8754) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster

2014-11-05 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-8754:
-
Description: 
HIVE-8588 added support for this by copying jdbc jars to lib/ of 
localized/exploded Sqoop tar.  Unfortunately, in a secure cluster, Dist Cache 
intentionally sets permissions on exploded tars such that they are not writable.

This needs to be fixed; otherwise users would have to modify their Sqoop 
tar to include the relevant JDBC jars, which is burdensome if different DBs are 
used and may create headaches around licensing issues.

NO PRECOMMIT TESTS

  was:
HIVE-8588 added support for this by copying jdbc jars to lib/ of 
localized/exploded Sqoop tar.  Unfortunately, in a secure cluster, Dist Cache 
intentionally sets permissions on exploded tars such that they are not writable.

This needs to be fixed; otherwise users would have to modify their Sqoop 
tar to include the relevant JDBC jars, which is burdensome if different DBs are 
used and may create headaches around licensing issues.


> Sqoop job submission via WebHCat doesn't properly localize required jdbc jars 
> in secure cluster
> ---
>
> Key: HIVE-8754
> URL: https://issues.apache.org/jira/browse/HIVE-8754
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Fix For: 0.14.0, 0.15.0
>
> Attachments: HIVE-8754.patch
>
>
> HIVE-8588 added support for this by copying jdbc jars to lib/ of 
> localized/exploded Sqoop tar.  Unfortunately, in a secure cluster, Dist Cache 
> intentionally sets permissions on exploded tars such that they are not 
> writable.
> This needs to be fixed; otherwise users would have to modify their Sqoop 
> tar to include the relevant JDBC jars, which is burdensome if different DBs 
> are used and may create headaches around licensing issues.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8754) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster

2014-11-05 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-8754:
-
Attachment: HIVE-8754.patch

> Sqoop job submission via WebHCat doesn't properly localize required jdbc jars 
> in secure cluster
> ---
>
> Key: HIVE-8754
> URL: https://issues.apache.org/jira/browse/HIVE-8754
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Fix For: 0.14.0, 0.15.0
>
> Attachments: HIVE-8754.patch
>
>
> HIVE-8588 added support for this by copying jdbc jars to lib/ of 
> localized/exploded Sqoop tar.  Unfortunately, in a secure cluster, Dist Cache 
> intentionally sets permissions on exploded tars such that they are not 
> writable.
> This needs to be fixed; otherwise users would have to modify their Sqoop 
> tar to include the relevant JDBC jars, which is burdensome if different DBs 
> are used and may create headaches around licensing issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8754) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster

2014-11-05 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-8754:
-
Status: Patch Available  (was: Open)

> Sqoop job submission via WebHCat doesn't properly localize required jdbc jars 
> in secure cluster
> ---
>
> Key: HIVE-8754
> URL: https://issues.apache.org/jira/browse/HIVE-8754
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Fix For: 0.14.0, 0.15.0
>
> Attachments: HIVE-8754.patch
>
>
> HIVE-8588 added support for this by copying jdbc jars to lib/ of 
> localized/exploded Sqoop tar.  Unfortunately, in a secure cluster, Dist Cache 
> intentionally sets permissions on exploded tars such that they are not 
> writable.
> This needs to be fixed; otherwise users would have to modify their Sqoop 
> tar to include the relevant jdbc jars, which is burdensome if different DBs 
> are used and may create headaches around licensing issues.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8755) Create unit tests for encryption integration

2014-11-05 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8755:
---
Summary: Create unit tests for encryption integration  (was: Create some 
unit tests which use encryption)

> Create unit tests for encryption integration
> 
>
> Key: HIVE-8755
> URL: https://issues.apache.org/jira/browse/HIVE-8755
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>
> We'll want to create some unit tests which use encryption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8755) Create some unit tests which use encryption

2014-11-05 Thread Brock Noland (JIRA)
Brock Noland created HIVE-8755:
--

 Summary: Create some unit tests which use encryption
 Key: HIVE-8755
 URL: https://issues.apache.org/jira/browse/HIVE-8755
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland


We'll want to create some unit tests which use encryption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 27640: HIVE-8700 Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch]

2014-11-05 Thread Szehon Ho


> On Nov. 5, 2014, 10:41 p.m., Szehon Ho wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java,
> >  line 201
> > 
> >
> > Hi Suhas, I was taking a look through the code, I don't think it's easy 
> > now to identify which is the big-table parent vs small-table parent.  There 
> > is a HashTableDummyOperator representing the small-table but it only has 
> > some basic information.
> > 
> > Maybe you know more about it, but was wondering do we need to save the 
> > info to a context when we cut the small-table RS from MapJoin in 
> > ReduceSinkMapJoinProc?  Thanks.
> 
> Suhas Satish wrote:
> Hi Szehon,
> GenSparkProcContext has this - 
>   // we need to keep the original list of operators in the map join to 
> know
>   // what position in the mapjoin the different parent work items will 
> have.
>   public final Map<MapJoinOperator, List<Operator<?>>> mapJoinParentMap;
>   
> There is also another data structure in GenSparkProcContext to keep track 
> of which MapJoinWork is connected to which ReduceSinks. 
>   // a map to keep track of what reduce sinks have to be hooked up to
>   // map join work
>   public final Map<BaseWork, List<ReduceSinkOperator>> linkWorkWithReduceSinkMap;
>   
> Maybe we need to introduce a similar one for HashTableSinkOperator  like 
>  public final Map<BaseWork, List<HashTableSinkOperator>> linkWorkWithHashTableSinkMap;
>  
> In any case,  we should pass this GenSparkProcContext along to the 
> physicalContext in the physical resolvers. Let me know your thoughts.

Hi Suhas, can we re-use that even?  It seems that only small-table RS are 
connected to MJ at this point.  So big-table RS should never get into here.  If 
we can't re-use it we will have to create a new data structure.  The idea is to 
identify which RS to replace with HashTableSink.  Hope that makes sense, thanks.


- Szehon


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27640/#review60052
---


On Nov. 5, 2014, 8:29 p.m., Suhas Satish wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27640/
> ---
> 
> (Updated Nov. 5, 2014, 8:29 p.m.)
> 
> 
> Review request for hive, Chao Sun, Jimmy Xiang, Szehon Ho, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This replaces ReduceSinks with HashTableSinks in smaller tables for a 
> map-join. But the condition check field to detect map-join is actually being 
> set in CommonJoinResolver, which doesn't exist yet. We need to decide where 
> the right place is to populate this field. 
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
> 795a5d7 
> 
> Diff: https://reviews.apache.org/r/27640/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Suhas Satish
> 
>



[jira] [Created] (HIVE-8754) Sqoop job submission via WebHCat doesn't properly localize required jdbc jars in secure cluster

2014-11-05 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-8754:


 Summary: Sqoop job submission via WebHCat doesn't properly 
localize required jdbc jars in secure cluster
 Key: HIVE-8754
 URL: https://issues.apache.org/jira/browse/HIVE-8754
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.14.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
Priority: Critical
 Fix For: 0.14.0, 0.15.0


HIVE-8588 added support for this by copying jdbc jars to lib/ of 
localized/exploded Sqoop tar.  Unfortunately, in a secure cluster, Dist Cache 
intentionally sets permissions on exploded tars such that they are not writable.

This needs to be fixed; otherwise users would have to modify their Sqoop 
tar to include the relevant JDBC jars, which is burdensome if different DBs are 
used and may create headaches around licensing issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6977) Delete Hiveserver1

2014-11-05 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199532#comment-14199532
 ] 

Vaibhav Gumashta commented on HIVE-6977:


+1 

> Delete Hiveserver1
> --
>
> Key: HIVE-6977
> URL: https://issues.apache.org/jira/browse/HIVE-6977
> Project: Hive
>  Issue Type: Task
>  Components: JDBC, Server Infrastructure
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-6977.1.patch, HIVE-6977.patch
>
>
> See mailing list discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 27646: delete HS1

2014-11-05 Thread Vaibhav Gumashta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27646/#review60083
---

Ship it!


Ship It!

- Vaibhav Gumashta


On Nov. 6, 2014, 12:49 a.m., Ashutosh Chauhan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27646/
> ---
> 
> (Updated Nov. 6, 2014, 12:49 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-6977
> https://issues.apache.org/jira/browse/HIVE-6977
> 
> 
> Repository: hive
> 
> 
> Description
> ---
> 
> delete HS1
> 
> 
> Diffs
> -
> 
>   trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1637002 
>   trunk/cli/src/java/org/apache/hadoop/hive/cli/CliSessionState.java 1637002 
>   trunk/cli/src/java/org/apache/hadoop/hive/cli/OptionsProcessor.java 1637002 
>   trunk/cli/src/test/org/apache/hadoop/hive/cli/TestCliDriverMethods.java 
> 1637002 
>   trunk/cli/src/test/org/apache/hadoop/hive/cli/TestCliSessionState.java 
> 1637002 
>   trunk/cli/src/test/org/apache/hadoop/hive/cli/TestOptionsProcessor.java 
> 1637002 
>   
> trunk/itests/hive-unit/src/test/java/org/apache/hadoop/hive/jdbc/TestJdbcDriver.java
>  1637002 
>   
> trunk/itests/hive-unit/src/test/java/org/apache/hadoop/hive/service/TestHiveServer.java
>  1637002 
>   trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveBaseResultSet.java 
> 1637002 
>   trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveCallableStatement.java 
> 1637002 
>   trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java 1637002 
>   trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveDataSource.java 1637002 
>   trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveDatabaseMetaData.java 
> 1637002 
>   trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveDriver.java 1637002 
>   trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveMetaDataResultSet.java 
> 1637002 
>   trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HivePreparedStatement.java 
> 1637002 
>   trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveQueryResultSet.java 
> 1637002 
>   trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveResultSetMetaData.java 
> 1637002 
>   trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveStatement.java 1637002 
>   trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/JdbcColumn.java 1637002 
>   trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/JdbcTable.java 1637002 
>   trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/Utils.java 1637002 
>   trunk/service/src/java/org/apache/hadoop/hive/service/HiveClient.java 
> 1637002 
>   trunk/service/src/java/org/apache/hadoop/hive/service/HiveInterface.java 
> 1637002 
>   trunk/service/src/java/org/apache/hadoop/hive/service/HiveServer.java 
> 1637002 
>   
> trunk/service/src/test/org/apache/hadoop/hive/service/TestHiveServerSessions.java
>  1637002 
> 
> Diff: https://reviews.apache.org/r/27646/diff/
> 
> 
> Testing
> ---
> 
> removed test cases for HS1
> 
> 
> Thanks,
> 
> Ashutosh Chauhan
> 
>



[jira] [Updated] (HIVE-8611) grant/revoke syntax should support additional objects for authorization plugins

2014-11-05 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-8611:
--
Attachment: HIVE-8611.3.patch

> grant/revoke syntax should support additional objects for authorization 
> plugins
> ---
>
> Key: HIVE-8611
> URL: https://issues.apache.org/jira/browse/HIVE-8611
> Project: Hive
>  Issue Type: Bug
>  Components: Authentication, SQL
>Affects Versions: 0.13.0
>Reporter: Prasad Mujumdar
>Assignee: Prasad Mujumdar
> Fix For: 0.14.0
>
> Attachments: HIVE-8611.1.patch, HIVE-8611.2.patch, HIVE-8611.2.patch, 
> HIVE-8611.3.patch
>
>
> The authorization framework supports URI and global objects. The SQL syntax 
> however doesn't allow granting privileges on these objects. We should allow 
> the compiler to parse these so that it can be handled by authorization 
> plugins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8753) TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk

2014-11-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8753:
---
Status: Patch Available  (was: Open)

[~navis] can you take a quick look?

> TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk
> -
>
> Key: HIVE-8753
> URL: https://issues.apache.org/jira/browse/HIVE-8753
> Project: Hive
>  Issue Type: Test
>  Components: Logical Optimizer
>Affects Versions: 0.15.0
>Reporter: Ashutosh Chauhan
> Attachments: HIVE-8753.patch
>
>
> Because of HIVE-7111 
> needs .q.out update



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8753) TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk

2014-11-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan reassigned HIVE-8753:
--

Assignee: Ashutosh Chauhan

> TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk
> -
>
> Key: HIVE-8753
> URL: https://issues.apache.org/jira/browse/HIVE-8753
> Project: Hive
>  Issue Type: Test
>  Components: Logical Optimizer
>Affects Versions: 0.15.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-8753.patch
>
>
> Because of HIVE-7111 
> needs .q.out update



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8661) JDBC MinimizeJAR should be configurable in pom.xml

2014-11-05 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199514#comment-14199514
 ] 

Ashutosh Chauhan commented on HIVE-8661:


+1

> JDBC MinimizeJAR should be configurable in pom.xml
> --
>
> Key: HIVE-8661
> URL: https://issues.apache.org/jira/browse/HIVE-8661
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-8661.1.patch, HIVE-8661.2.patch
>
>
> A large amount of dev time is wasted waiting for JDBC to minimize JARs from 
> 33Mb -> 16Mb during developer cycles.
> This should only kick in during -Pdist, allowing for disabling this during 
> dev cycles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8753) TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk

2014-11-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8753:
---
Attachment: HIVE-8753.patch

> TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk
> -
>
> Key: HIVE-8753
> URL: https://issues.apache.org/jira/browse/HIVE-8753
> Project: Hive
>  Issue Type: Test
>  Components: Logical Optimizer
>Affects Versions: 0.15.0
>Reporter: Ashutosh Chauhan
> Attachments: HIVE-8753.patch
>
>
> Because of HIVE-7111 
> needs .q.out update



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join

2014-11-05 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199506#comment-14199506
 ] 

Matt McCline commented on HIVE-8745:


We should include vectorization in any new tests for this issue.

> Joins on decimal keys return different results whether they are run as reduce 
> join or map join
> --
>
> Key: HIVE-8745
> URL: https://issues.apache.org/jira/browse/HIVE-8745
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Gunther Hagleitner
>Assignee: Jason Dere
>Priority: Critical
> Attachments: join_test.q
>
>
> See attached .q file to reproduce. The difference seems to be whether 
> trailing 0s are considered the same value or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 27646: delete HS1

2014-11-05 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27646/
---

(Updated Nov. 6, 2014, 12:49 a.m.)


Review request for hive.


Changes
---

removed HS1 service as well


Bugs: HIVE-6977
https://issues.apache.org/jira/browse/HIVE-6977


Repository: hive


Description
---

delete HS1


Diffs (updated)
-

  trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1637002 
  trunk/cli/src/java/org/apache/hadoop/hive/cli/CliSessionState.java 1637002 
  trunk/cli/src/java/org/apache/hadoop/hive/cli/OptionsProcessor.java 1637002 
  trunk/cli/src/test/org/apache/hadoop/hive/cli/TestCliDriverMethods.java 
1637002 
  trunk/cli/src/test/org/apache/hadoop/hive/cli/TestCliSessionState.java 
1637002 
  trunk/cli/src/test/org/apache/hadoop/hive/cli/TestOptionsProcessor.java 
1637002 
  
trunk/itests/hive-unit/src/test/java/org/apache/hadoop/hive/jdbc/TestJdbcDriver.java
 1637002 
  
trunk/itests/hive-unit/src/test/java/org/apache/hadoop/hive/service/TestHiveServer.java
 1637002 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveBaseResultSet.java 
1637002 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveCallableStatement.java 
1637002 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java 1637002 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveDataSource.java 1637002 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveDatabaseMetaData.java 
1637002 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveDriver.java 1637002 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveMetaDataResultSet.java 
1637002 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HivePreparedStatement.java 
1637002 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveQueryResultSet.java 
1637002 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveResultSetMetaData.java 
1637002 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveStatement.java 1637002 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/JdbcColumn.java 1637002 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/JdbcTable.java 1637002 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/Utils.java 1637002 
  trunk/service/src/java/org/apache/hadoop/hive/service/HiveClient.java 1637002 
  trunk/service/src/java/org/apache/hadoop/hive/service/HiveInterface.java 
1637002 
  trunk/service/src/java/org/apache/hadoop/hive/service/HiveServer.java 1637002 
  
trunk/service/src/test/org/apache/hadoop/hive/service/TestHiveServerSessions.java
 1637002 

Diff: https://reviews.apache.org/r/27646/diff/


Testing
---

removed test cases for HS1


Thanks,

Ashutosh Chauhan



[jira] [Updated] (HIVE-6977) Delete Hiveserver1

2014-11-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6977:
---
Status: Patch Available  (was: Open)

> Delete Hiveserver1
> --
>
> Key: HIVE-6977
> URL: https://issues.apache.org/jira/browse/HIVE-6977
> Project: Hive
>  Issue Type: Task
>  Components: JDBC, Server Infrastructure
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-6977.1.patch, HIVE-6977.patch
>
>
> See mailing list discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6977) Delete Hiveserver1

2014-11-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6977:
---
Attachment: HIVE-6977.1.patch

> Delete Hiveserver1
> --
>
> Key: HIVE-6977
> URL: https://issues.apache.org/jira/browse/HIVE-6977
> Project: Hive
>  Issue Type: Task
>  Components: JDBC, Server Infrastructure
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-6977.1.patch, HIVE-6977.patch
>
>
> See mailing list discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8753) TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce failing on trunk

2014-11-05 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-8753:
--

 Summary: TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce 
failing on trunk
 Key: HIVE-8753
 URL: https://issues.apache.org/jira/browse/HIVE-8753
 Project: Hive
  Issue Type: Test
  Components: Logical Optimizer
Affects Versions: 0.15.0
Reporter: Ashutosh Chauhan


Because of HIVE-7111 
needs .q.out update



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding

2014-11-05 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8740:
-
   Resolution: Fixed
Fix Version/s: 0.15.0
   0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-0.14

> Sorted dynamic partition does not work correctly with constant folding
> --
>
> Key: HIVE-8740
> URL: https://issues.apache.org/jira/browse/HIVE-8740
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
> Fix For: 0.14.0, 0.15.0
>
> Attachments: HIVE-8740.1.patch, HIVE-8740.2.patch, HIVE-8740.3.patch, 
> HIVE-8740.4.patch
>
>
> Sorted dynamic partition optimization looks for partition columns from the 
> operator above FileSinkOperator. As per Hive convention it expects partition 
> columns to come last. But with HIVE-8585, equality filters on partition 
> columns get folded to constants. The column pruner then prunes the constant 
> expressions, as they don't reference any columns. In some cases this yields 
> unexpected results (an ArrayIndexOutOfBounds exception) with the sorted 
> dynamic partition insert optimization. In such cases we don't really need the 
> sorted dynamic partition optimization.
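
For illustration, a hedged sketch of the guard the description implies (names 
are assumptions, not the committed HIVE-8740 fix):
{code}
import java.util.List;

public class SortedDynPartGuard {
  // Hypothetical guard: if constant folding + column pruning left fewer
  // columns than the dynamic partition spec expects, skip the sorted rewrite
  // instead of indexing past the end of the column list.
  static boolean shouldApplySortedDynPartOpt(List<String> availablePartCols,
                                             List<String> dynPartSpec) {
    return availablePartCols.size() >= dynPartSpec.size();
  }
}
{code}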



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8740) Sorted dynamic partition does not work correctly with constant folding

2014-11-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199472#comment-14199472
 ] 

Hive QA commented on HIVE-8740:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679663/HIVE-8740.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6674 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1649/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1649/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1649/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679663 - PreCommit-HIVE-TRUNK-Build

> Sorted dynamic partition does not work correctly with constant folding
> --
>
> Key: HIVE-8740
> URL: https://issues.apache.org/jira/browse/HIVE-8740
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
> Fix For: 0.14.0, 0.15.0
>
> Attachments: HIVE-8740.1.patch, HIVE-8740.2.patch, HIVE-8740.3.patch, 
> HIVE-8740.4.patch
>
>
> Sorted dynamic partition optimization looks for partition columns from the 
> operator above FileSinkOperator. As per Hive convention it expects partition 
> columns to come last. But with HIVE-8585, equality filters on partition 
> columns get folded to constants. The column pruner then prunes the constant 
> expressions, as they don't reference any columns. In some cases this yields 
> unexpected results (an ArrayIndexOutOfBounds exception) with the sorted 
> dynamic partition insert optimization. In such cases we don't really need the 
> sorted dynamic partition optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 27640: HIVE-8700 Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch]

2014-11-05 Thread Suhas Satish


> On Nov. 5, 2014, 10:41 p.m., Szehon Ho wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java,
> >  line 201
> > 
> >
> > Hi Suhas, I was taking a look through the code, I don't think it's easy 
> > now to identify which is the big-table parent vs small-table parent.  There 
> > is a HashTableDummyOperator representing the small-table but it only has 
> > some basic information.
> > 
> > Maybe you know more about it, but was wondering do we need to save the 
> > info to a context when we cut the small-table RS from MapJoin in 
> > ReduceSinkMapJoinProc?  Thanks.

Hi Szehon,
GenSparkProcContext has this - 
  // we need to keep the original list of operators in the map join to know
  // what position in the mapjoin the different parent work items will have.
  public final Map<MapJoinOperator, List<Operator<?>>> mapJoinParentMap;
  
There is also another data structure in GenSparkProcContext to keep track of 
which MapJoinWork is connected to which ReduceSinks. 
  // a map to keep track of what reduce sinks have to be hooked up to
  // map join work
  public final Map<BaseWork, List<ReduceSinkOperator>> linkWorkWithReduceSinkMap;
  
Maybe we need to introduce a similar one for HashTableSinkOperator  like 
 public final Map<BaseWork, List<HashTableSinkOperator>> linkWorkWithHashTableSinkMap;
 
In any case,  we should pass this GenSparkProcContext along to the 
physicalContext in the physical resolvers. Let me know your thoughts.


- Suhas


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27640/#review60052
---


On Nov. 5, 2014, 8:29 p.m., Suhas Satish wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27640/
> ---
> 
> (Updated Nov. 5, 2014, 8:29 p.m.)
> 
> 
> Review request for hive, Chao Sun, Jimmy Xiang, Szehon Ho, and Xuefu Zhang.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> This replaces ReduceSinks with HashTableSinks in smaller tables for a 
> map-join. But the condition check field to detect map-join is actually being 
> set in CommonJoinResolver, which doesn't exist yet. We need to decide where 
> the right place is to populate this field. 
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 
> 795a5d7 
> 
> Diff: https://reviews.apache.org/r/27640/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Suhas Satish
> 
>



[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-05 Thread Dain Sundstrom (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199468#comment-14199468
 ] 

Dain Sundstrom commented on HIVE-8732:
--

The patch looks reasonable.  

We still need to decide how to deal with doubles with NaN. 
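
For reference, a minimal sketch of a correct min/max merge for string 
statistics (illustrative only; not ORC's actual ColumnStatisticsImpl):
{code}
public class StringStats {
  String min, max;

  // Merging must widen the range in both directions; a merge that overwrites
  // instead of comparing produces the incorrect maximums described here.
  void merge(StringStats other) {
    if (min == null || (other.min != null && other.min.compareTo(min) < 0)) min = other.min;
    if (max == null || (other.max != null && other.max.compareTo(max) > 0)) max = other.max;
  }
}
{code}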

> ORC string statistics are not merged correctly
> --
>
> Key: HIVE-8732
> URL: https://issues.apache.org/jira/browse/HIVE-8732
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: HIVE-8732.patch, HIVE-8732.patch
>
>
> Currently ORC's string statistics do not merge correctly causing incorrect 
> maximum values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8752) Disjunction cardinality estimation has selectivity of 1

2014-11-05 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created HIVE-8752:
-

 Summary: Disjunction cardinality estimation has selectivity of 1
 Key: HIVE-8752
 URL: https://issues.apache.org/jira/browse/HIVE-8752
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Laljo John Pullokkaran
Priority: Critical
 Fix For: 0.14.0


TPC-DS Q89 has the wrong join order.
Store_sales should join with item first, then date_dim.

The issue is that the predicate on item shows a selectivity of 1 
{code}
((i_category in ('Home','Books','Electronics') and
  i_class in ('wallpaper','parenting','musical')
 )
  or (i_category in ('Shoes','Jewelry','Men') and
  i_class in ('womens','birdal','pants') 
))
{code}

{code}
HiveProjectRel(i_item_sk=[$0], i_brand=[$8], i_class=[$10], 
i_category=[$12]): rowcount = 462000.0, cumulative cost = {0.0 rows, 0.0 cpu, 
0.0 io}, id = 4052
  HiveFilterRel(condition=[OR(AND(in($12, 'Home', 'Books', 
'Electronics'), in($10, 'wallpaper', 'parenting', 'musical')), AND(in($12, 
'Shoes', 'Jewelry', 'Men'), in($10, 'womens', 'birdal', 'pants')))]): rowcount 
= 462000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4050

HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_3.item]]): rowcount = 
462000.0, cumulative cost = {0}, id = 3818
{code}

Query
{code}

select  *
from(
select i_category, i_class, i_brand,
   s_store_name, s_company_name,
   d_moy,
   sum(ss_sales_price) sum_sales,
   avg(sum(ss_sales_price)) over
 (partition by i_category, i_brand, s_store_name, s_company_name)
 avg_monthly_sales
from item, store_sales, date_dim, store
where store_sales.ss_item_sk = item.i_item_sk and
  store_sales.ss_sold_date_sk = date_dim.d_date_sk and
  store_sales.ss_store_sk = store.s_store_sk and
  d_year in (2000) and
((i_category in ('Home','Books','Electronics') and
  i_class in ('wallpaper','parenting','musical')
 )
  or (i_category in ('Shoes','Jewelry','Men') and
  i_class in ('womens','birdal','pants') 
))
group by i_category, i_class, i_brand,
 s_store_name, s_company_name, d_moy) tmp1
where case when (avg_monthly_sales <> 0) then (abs(sum_sales - 
avg_monthly_sales) / avg_monthly_sales) else null end > 0.1
order by sum_sales - avg_monthly_sales, s_store_name
limit 100
{code}

The result of the wrong join order is that the query runs in 335 seconds 
compared to 124 seconds with the correct join order.

Removing the disjunction in the item filter produces the correct plan
{code}
 i_category in ('Home','Books','Electronics') and
  i_class in ('wallpaper','parenting','musical')
{code}
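
For reference, the standard independence-based estimate a CBO would apply to 
such a disjunction instead of selectivity 1.0 (illustrative sketch, not 
Hive's optimizer code):
{code}
public class Selectivity {
  // P(A or B) = P(A) + P(B) - P(A)*P(B), assuming A and B are independent;
  // always below 1 when both inputs are below 1, unlike the estimate above.
  static double orSelectivity(double selA, double selB) {
    return selA + selB - selA * selB;
  }
}
{code}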



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8744) hbase_stats3.q test fails when paths stored at JDBCStatsUtils.getIdColumnName() are too large

2014-11-05 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199462#comment-14199462
 ] 

Szehon Ho commented on HIVE-8744:
-

Thanks Sergio.  It looks ok to me; my only comment is that old tables need to 
be re-created, which might warrant a release note.

Also, old versions of MySQL before 5.0.3 don't seem to support varchar 
beyond 255; not sure if it's a concern.  [~brocknoland] do you happen to know?  
Thanks

> hbase_stats3.q test fails when paths stored at 
> JDBCStatsUtils.getIdColumnName() are too large
> -
>
> Key: HIVE-8744
> URL: https://issues.apache.org/jira/browse/HIVE-8744
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.15.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-8744.1.patch
>
>
> This test is related to the bug HIVE-8065 where I am trying to support HDFS 
> encryption. One of the enhancements to support it is to create a 
> .hive-staging directory on the same table directory location where the query 
> is executed.
> Now, when running the hbase_stats3.q test from a temporary directory that has 
> a long path, the new path, a combination of table location + 
> .hive-staging + random temporary subdirectories, is too long to fit into the 
> statistics table, so the path is truncated.
> This causes the following error:
> {noformat}
> 2014-11-04 08:57:36,680 ERROR [LocalJobRunner Map Task Executor #0]: 
> jdbc.JDBCStatsPublisher (JDBCStatsPublisher.java:publishStat(199)) - Error 
> during publishing statistics. 
> java.sql.SQLDataException: A truncation error was encountered trying to 
> shrink VARCHAR 
> 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.&' to length 255.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
> Source)
>   at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeLargeUpdate(Unknown 
> Source)
>   at 
> org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeUpdate(Unknown 
> Source)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:148)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher$2.run(JDBCStatsPublisher.java:145)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.executeWithRetry(Utilities.java:2667)
>   at 
> org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.publishStat(JDBCStatsPublisher.java:161)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1031)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:870)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:579)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:591)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.sql.SQLException: A truncation error was encountered trying 
> to shrink VARCHAR 
> 'pfile:/home/hiveptest/hive-ptest-cloudera-slaves-ee9-24.vpc.&' to length 255.
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>   at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown

[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-05 Thread Dain Sundstrom (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199460#comment-14199460
 ] 

Dain Sundstrom commented on HIVE-8732:
--

This seems like a reasonable plan.

> ORC string statistics are not merged correctly
> --
>
> Key: HIVE-8732
> URL: https://issues.apache.org/jira/browse/HIVE-8732
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: HIVE-8732.patch, HIVE-8732.patch
>
>
> Currently ORC's string statistics do not merge correctly causing incorrect 
> maximum values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

