[jira] [Updated] (HIVE-8766) Hive RetryHMSHandler should be retrying the metastore operation in case of NucleusException

2014-11-07 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-8766:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to .14 branch and trunk. Thanks [~hsubramaniyan]!

> Hive RetryHMSHandler should be retrying the metastore operation in case of 
> NucleusException
> ---
>
> Key: HIVE-8766
> URL: https://issues.apache.org/jira/browse/HIVE-8766
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 0.14.0
>
> Attachments: HIVE-8766.1.patch, HIVE-8766.2.patch
>
>
> When Metastore operations run against a Metastore database that is heavily 
> loaded or takes a long time to respond, we might run into NucleusExceptions 
> as shown in the stack trace below. In this scenario, the Metastore DB is 
> SQL Server, and SQL Server is configured to time out and terminate with a 
> connection reset after 'x' seconds if it doesn't return a ResultSet. While 
> this needs a configuration change on the Metastore DB side, we need to make 
> sure that in such cases the HMS retrying mechanism does not apply a rigid 
> rule that fails such Hive queries. The proposed fix is to allow retries when 
> we hit a NucleusException as shown below: 
> {noformat}
> 2014-11-04 06:40:03,208 ERROR bonecp.ConnectionHandle 
> (ConnectionHandle.java:markPossiblyBroken(388)) - Database access problem. 
> Killing off this connection and all remaining connections in the connection 
> pool. SQL State = 08S01
> 2014-11-04 06:40:03,213 ERROR DataNucleus.Transaction 
> (Log4JLogger.java:error(115)) - Operation rollback failed on resource: 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl$EmulatedXAResource@1a35cc16,
>  error code UNKNOWN and transaction: [DataNucleus Transaction, ID=Xid=   �, 
> enlisted 
> resources=[org.datanucleus.store.rdbms.ConnectionFactoryImpl$EmulatedXAResource@1a35cc16]]
> 2014-11-04 06:40:03,217 ERROR metastore.RetryingHMSHandler 
> (RetryingHMSHandler.java:invoke(139)) - 
> MetaException(message:org.datanucleus.exceptions.NucleusDataStoreException: 
> Size request failed : SELECT COUNT(*) FROM SKEWED_VALUES THIS WHERE 
> THIS.SD_ID_OID=? AND THIS.INTEGER_IDX>=0)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5183)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_core(HiveMetaStore.java:1738)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1699)
>   at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:101)
>   at com.sun.proxy.$Proxy11.get_table(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1091)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:112)
>   at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
>   at com.sun.proxy.$Proxy12.getTable(Unknown Source)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1060)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1015)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getTable(BaseSemanticAnalyzer.java:1316)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getTable(BaseSemanticAnalyzer.java:1309)
>   at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.addInputsOutputsAlterTable(DDLSemanticAnalyzer.java:1387)
>   at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableSerde(DDLSemanticAnalyzer.java:1356)
>   at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:299)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:415)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:303)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1067)
>   at org.apache.hadoop.hive.ql.Driver.

[jira] [Updated] (HIVE-8781) Nullsafe joins are busted on Tez

2014-11-07 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-8781:
-
Component/s: Tez

> Nullsafe joins are busted on Tez
> 
>
> Key: HIVE-8781
> URL: https://issues.apache.org/jira/browse/HIVE-8781
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 0.14.0
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8781.1.patch, HIVE-8781.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8781) Nullsafe joins are busted on Tez

2014-11-07 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-8781:
-
Affects Version/s: 0.14.0

> Nullsafe joins are busted on Tez
> 
>
> Key: HIVE-8781
> URL: https://issues.apache.org/jira/browse/HIVE-8781
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 0.14.0
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8781.1.patch, HIVE-8781.2.patch
>
>






[jira] [Updated] (HIVE-6977) Delete Hiveserver1

2014-11-07 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-6977:
-
Labels: TODOC15  (was: )

> Delete Hiveserver1
> --
>
> Key: HIVE-6977
> URL: https://issues.apache.org/jira/browse/HIVE-6977
> Project: Hive
>  Issue Type: Task
>  Components: JDBC, Server Infrastructure
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>  Labels: TODOC15
> Fix For: 0.15.0
>
> Attachments: HIVE-6977.1.patch, HIVE-6977.patch
>
>
> See mailing list discussion.





[jira] [Updated] (HIVE-8700) Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch]

2014-11-07 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-8700:

   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Committed to spark.  Thanks a lot to Suhas for the fix!

> Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch]
> --
>
> Key: HIVE-8700
> URL: https://issues.apache.org/jira/browse/HIVE-8700
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Suhas Satish
> Fix For: spark-branch
>
> Attachments: HIVE-8700-spark.patch, HIVE-8700.2-spark.patch, 
> HIVE-8700.3-spark.patch, HIVE-8700.patch
>
>
> With HIVE-8616 enabled, the new plan has a ReduceSinkOperator for the small 
> tables. For example, the following shows the operator plan for the small 
> table dec1, derived from the query {code}explain select /*+ MAPJOIN(dec)*/ * from 
> dec join dec1 on dec.value=dec1.d;{code}
> {code}
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: dec1
>   Statistics: Num rows: 0 Data size: 107 Basic stats: PARTIAL 
> Column stats: NONE
>   Filter Operator
> predicate: d is not null (type: boolean)
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: NONE
> Reduce Output Operator
>   key expressions: d (type: decimal(5,2))
>   sort order: +
>   Map-reduce partition columns: d (type: decimal(5,2))
>   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: NONE
>   value expressions: i (type: int)
> {code}
> With the new design for broadcasting small tables, we need to replace the 
> ReduceSinkOperator with a HashTableSinkOperator, or an equivalent, in the new plan.
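The conversion described above amounts to a node substitution in the operator tree. A minimal sketch, with hypothetical class names standing in for Hive's operator classes:

```java
// Illustrative sketch (not actual Hive code): walk a small operator tree and
// replace each ReduceSink node with a HashTableSink node, keeping its subtree.
import java.util.ArrayList;
import java.util.List;

public class OperatorSwapSketch {
    static class Op {
        String name;
        List<Op> children = new ArrayList<>();
        Op(String name) { this.name = name; }
    }

    // Recursively replace ReduceSink operators with HashTableSink.
    static void replaceReduceSink(Op op) {
        for (int i = 0; i < op.children.size(); i++) {
            Op child = op.children.get(i);
            if (child.name.equals("ReduceSink")) {
                Op hts = new Op("HashTableSink");
                hts.children = child.children; // keep the subtree intact
                op.children.set(i, hts);
            }
            replaceReduceSink(op.children.get(i));
        }
    }

    public static void main(String[] args) {
        Op scan = new Op("TableScan");
        Op filter = new Op("Filter");
        scan.children.add(filter);
        filter.children.add(new Op("ReduceSink"));
        replaceReduceSink(scan);
        System.out.println(scan.children.get(0).children.get(0).name); // HashTableSink
    }
}
```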





[jira] [Commented] (HIVE-6977) Delete Hiveserver1

2014-11-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201808#comment-14201808
 ] 

Lefty Leverenz commented on HIVE-6977:
--

This needs to be documented in the wiki, but we can't just remove all the old 
docs because Hive Server still exists in pre-0.15 releases.

For starters, these need to be updated:

* [Hive Client | https://cwiki.apache.org/confluence/display/Hive/HiveClient]
* [Setting Up Hive Server | 
https://cwiki.apache.org/confluence/display/Hive/AdminManual+SettingUpHiveServer]
** [Thrift Hive Server | 
https://cwiki.apache.org/confluence/display/Hive/HiveServer]
** [Hive JDBC Interface -- Integration with Pentaho (see step 3) | 
https://cwiki.apache.org/confluence/display/Hive/HiveJDBCInterface#HiveJDBCInterface-IntegrationwithPentaho]
** [Hive ODBC Driver | 
https://cwiki.apache.org/confluence/display/Hive/HiveODBC]

A search of the wiki will uncover more places to update.  Does this also get 
rid of the CLI?

> Delete Hiveserver1
> --
>
> Key: HIVE-6977
> URL: https://issues.apache.org/jira/browse/HIVE-6977
> Project: Hive
>  Issue Type: Task
>  Components: JDBC, Server Infrastructure
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>  Labels: TODOC15
> Fix For: 0.15.0
>
> Attachments: HIVE-6977.1.patch, HIVE-6977.patch
>
>
> See mailing list discussion.





[jira] [Commented] (HIVE-8771) Abstract merge file operator does not move/rename incompatible files correctly

2014-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201813#comment-14201813
 ] 

Hive QA commented on HIVE-8771:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12680021/HIVE-8771.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6665 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1681/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1681/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1681/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12680021 - PreCommit-HIVE-TRUNK-Build

> Abstract merge file operator does not move/rename incompatible files correctly
> --
>
> Key: HIVE-8771
> URL: https://issues.apache.org/jira/browse/HIVE-8771
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8771.1.patch
>
>
> AbstractFileMergeOperator moves incompatible files (files which cannot be 
> merged) to the final destination. The destination path must be a directory, 
> not a file. This causes orc_merge_incompat2.q to fail under CentOS with an 
> IOException when renaming/moving files.
> Stack trace:
> {code}
> 2014-11-05 02:38:56,588 DEBUG fs.FileSystem 
> (RawLocalFileSystem.java:rename(337)) - Falling through to a copy of 
> file:/home/prasanth/hive/itests/qtest/target/warehouse/orc_merge5a/st=80.0/00_0
>  to 
> file:/home/prasanth/hive/itests/qtest/target/tmp/scratchdir/prasanth/0de64e52-6615-4c5a-bdfb-c3b2c28131f6/hive_2014-11-05_02-38-55_511_7578595409877157627-1/_tmp.-ext-1/00_0/00_0
> 2014-11-05 02:38:56,589 INFO  mapred.LocalJobRunner 
> (LocalJobRunner.java:runTasks(456)) - map task executor complete.
> 2014-11-05 02:38:56,590 WARN  mapred.LocalJobRunner 
> (LocalJobRunner.java:run(560)) - job_local1144733438_0036
> java.lang.Exception: java.io.IOException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close 
> AbstractFileMergeOperator
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
> Caused by: java.io.IOException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close 
> AbstractFileMergeOperator
> at 
> org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.close(MergeFileMapper.java:100)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:679)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close 
> AbstractFileMergeOperator
> at 
> org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:233)
> at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:220)
> at 
> org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.close(MergeFileMapper.java:98)
> ... 10 more
> Caused by: java.io.FileNotFoundException: Destination exists and is not a 
> directory: 
> /home/prasanth/hive/itests/qtest/target/tmp/scratchdir/prasanth/0de64e52-6615-4c5a-bdfb-c3b2c28131f6/hive_2014-11-05_02-38-55_511_7578595409877157627-1/_tmp.-ext-1/00_0
> at 
> org.apache.hadoop.fs.RawLocalFi

[jira] [Updated] (HIVE-8771) Abstract merge file operator does not move/rename incompatible files correctly

2014-11-07 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-8771:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk and .14. Thanks [~prasanth_j]!

> Abstract merge file operator does not move/rename incompatible files correctly
> --
>
> Key: HIVE-8771
> URL: https://issues.apache.org/jira/browse/HIVE-8771
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8771.1.patch
>
>
> AbstractFileMergeOperator moves incompatible files (files which cannot be 
> merged) to the final destination. The destination path must be a directory, 
> not a file. This causes orc_merge_incompat2.q to fail under CentOS with an 
> IOException when renaming/moving files.
> Stack trace:
> {code}
> 2014-11-05 02:38:56,588 DEBUG fs.FileSystem 
> (RawLocalFileSystem.java:rename(337)) - Falling through to a copy of 
> file:/home/prasanth/hive/itests/qtest/target/warehouse/orc_merge5a/st=80.0/00_0
>  to 
> file:/home/prasanth/hive/itests/qtest/target/tmp/scratchdir/prasanth/0de64e52-6615-4c5a-bdfb-c3b2c28131f6/hive_2014-11-05_02-38-55_511_7578595409877157627-1/_tmp.-ext-1/00_0/00_0
> 2014-11-05 02:38:56,589 INFO  mapred.LocalJobRunner 
> (LocalJobRunner.java:runTasks(456)) - map task executor complete.
> 2014-11-05 02:38:56,590 WARN  mapred.LocalJobRunner 
> (LocalJobRunner.java:run(560)) - job_local1144733438_0036
> java.lang.Exception: java.io.IOException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close 
> AbstractFileMergeOperator
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
> Caused by: java.io.IOException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close 
> AbstractFileMergeOperator
> at 
> org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.close(MergeFileMapper.java:100)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:679)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close 
> AbstractFileMergeOperator
> at 
> org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:233)
> at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:220)
> at 
> org.apache.hadoop.hive.ql.io.merge.MergeFileMapper.close(MergeFileMapper.java:98)
> ... 10 more
> Caused by: java.io.FileNotFoundException: Destination exists and is not a 
> directory: 
> /home/prasanth/hive/itests/qtest/target/tmp/scratchdir/prasanth/0de64e52-6615-4c5a-bdfb-c3b2c28131f6/hive_2014-11-05_02-38-55_511_7578595409877157627-1/_tmp.-ext-1/00_0
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:423)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:267)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:257)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:365)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:339)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:507)
> at 
> org.apache.hadoop.fs.FilterFileSystem.rename(FilterFileSystem.java:214)
> at 
> org.apache.hadoop.fs.ProxyFileSystem.rename(ProxyFileSystem.java:177)
> at 
> org.apache.hadoop.fs.FilterFileSystem.rename(FilterFileSystem.java:214)
> at 
> org.apache.hadoop.hive.ql.exec.Utiliti
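The fix direction described above (treat the destination as a directory, and move files into it) can be sketched with plain java.nio.file rather than the Hadoop FileSystem API, so the example is self-contained; the method name is hypothetical:

```java
// Hedged sketch of the idea: before moving an incompatible file to the final
// destination, ensure the destination exists as a *directory*, so the move
// lands as destDir/file instead of failing when a same-named file is treated
// as the target (the "Destination exists and is not a directory" error above).
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class MoveIncompatibleSketch {
    static Path moveToDir(Path src, Path destDir) throws IOException {
        // Create the destination directory if needed (the bug was effectively
        // treating the destination as a file path).
        Files.createDirectories(destDir);
        Path target = destDir.resolve(src.getFileName());
        return Files.move(src, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("merge-sketch");
        Path src = Files.createFile(tmp.resolve("000000_0"));
        // Prints something like .../merge-sketch.../final-dir/000000_0
        System.out.println(moveToDir(src, tmp.resolve("final-dir")));
    }
}
```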

[jira] [Commented] (HIVE-8773) Fix TestWebHCatE2e#getStatus for Java8

2014-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201875#comment-14201875
 ] 

Hive QA commented on HIVE-8773:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12680025/HIVE-8773.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6665 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1682/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1682/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1682/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12680025 - PreCommit-HIVE-TRUNK-Build

> Fix TestWebHCatE2e#getStatus for Java8
> --
>
> Key: HIVE-8773
> URL: https://issues.apache.org/jira/browse/HIVE-8773
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Mohit Sabharwal
>Assignee: Mohit Sabharwal
> Attachments: HIVE-8773.patch
>
>
> [HttpMethod.getResponseBodyAsString|https://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/HttpMethod.html#getResponseBodyAsString()]
>  returns the response body in a different order on Java 8





[jira] [Commented] (HIVE-8774) CBO: enable groupBy index

2014-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201922#comment-14201922
 ] 

Hive QA commented on HIVE-8774:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12680028/HIVE-8774.1.patch

{color:red}ERROR:{color} -1 due to 944 failed/errored test(s),  tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver_accumulo_queries
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_project
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_exist
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_concatenate_indexed_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_index
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_update_status
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_update_status
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_view_as_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_tbl_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_excludeHadoop20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_multi
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join19
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join24
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join25
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join28
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join33
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autogen_colalias
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_decimal
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_decimal_native
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_

[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join

2014-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201978#comment-14201978
 ] 

Hive QA commented on HIVE-8745:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12680049/HIVE-8745.3.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s),  tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_acid
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1684/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1684/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1684/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12680049 - PreCommit-HIVE-TRUNK-Build

> Joins on decimal keys return different results whether they are run as reduce 
> join or map join
> --
>
> Key: HIVE-8745
> URL: https://issues.apache.org/jira/browse/HIVE-8745
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Gunther Hagleitner
>Assignee: Jason Dere
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8745.1.patch, HIVE-8745.2.patch, HIVE-8745.3.patch, 
> join_test.q
>
>
> See attached .q file to reproduce. The difference seems to be whether 
> trailing 0s are considered the same value or not.
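The trailing-zeros question above comes down to standard Java decimal semantics, which a short example illustrates (this is general BigDecimal behavior, not the patch itself): equals() distinguishes values by scale, compareTo() does not, and hash codes follow equals(). A join that hashes keys one way on the map side and compares them another way on the reduce side can therefore produce different results.

```java
// BigDecimal: 1.0 and 1.00 are "equal" numerically but not by equals()/hashCode().
import java.math.BigDecimal;

public class DecimalKeySketch {
    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.0");
        BigDecimal b = new BigDecimal("1.00");
        System.out.println(a.equals(b));         // false: scales differ
        System.out.println(a.compareTo(b) == 0); // true: same numeric value
        // Hash codes also differ, so a hash-based map join can miss matches
        // that a sort/compare-based reduce join would find.
        System.out.println(a.hashCode() == b.hashCode()); // false
    }
}
```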





[jira] [Updated] (HIVE-8542) Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]

2014-11-07 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8542:
-
Attachment: HIVE-8542.1-spark.patch

Submitting a patch to let the tests run. I expect many of them to fail.

> Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]
> 
>
> Key: HIVE-8542
> URL: https://issues.apache.org/jira/browse/HIVE-8542
> Project: Hive
>  Issue Type: Test
>  Components: Spark
>Reporter: Chao
>Assignee: Rui Li
> Attachments: HIVE-8542.1-spark.patch
>
>
> Currently, in the Spark branch, results for these two test files are very 
> different from MR's. We need to find out the cause and identify any 
> potential bugs in our current implementation.





[jira] [Updated] (HIVE-8542) Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]

2014-11-07 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8542:
-
Issue Type: Bug  (was: Test)

> Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]
> 
>
> Key: HIVE-8542
> URL: https://issues.apache.org/jira/browse/HIVE-8542
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao
>Assignee: Rui Li
> Attachments: HIVE-8542.1-spark.patch
>
>
> Currently, in the Spark branch, results for these two test files are very 
> different from MR's. We need to find out the cause and identify any 
> potential bugs in our current implementation.





[jira] [Updated] (HIVE-8542) Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]

2014-11-07 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-8542:
-
Status: Patch Available  (was: Open)

> Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]
> 
>
> Key: HIVE-8542
> URL: https://issues.apache.org/jira/browse/HIVE-8542
> Project: Hive
>  Issue Type: Test
>  Components: Spark
>Reporter: Chao
>Assignee: Rui Li
> Attachments: HIVE-8542.1-spark.patch
>
>
> Currently, in the Spark branch, results for these two test files are very 
> different from MR's. We need to find out the cause and identify any 
> potential bugs in our current implementation.





[jira] [Commented] (HIVE-8778) ORC split elimination can cause NPE when column statistics is null

2014-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202023#comment-14202023
 ] 

Hive QA commented on HIVE-8778:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12680074/HIVE-8778.1.patch

{color:green}SUCCESS:{color} +1 6667 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1685/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1685/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1685/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12680074 - PreCommit-HIVE-TRUNK-Build

> ORC split elimination can cause NPE when column statistics is null
> --
>
> Key: HIVE-8778
> URL: https://issues.apache.org/jira/browse/HIVE-8778
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8778.1.patch
>
>
> Row group elimination has protection for NULL statistics values in 
> RecordReaderImpl.evaluatePredicate() which then calls 
> evaluatePredicateRange(). But split elimination directly calls 
> evaluatePredicateRange() without NULL protection. This can lead to 
> NullPointerException when a column is NULL in entire stripe. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
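
The NULL-protection gap described in HIVE-8778 above can be sketched in a few lines. This is a minimal illustration only — the class and method names below are hypothetical stand-ins, not Hive's actual RecordReaderImpl API.

```java
// Sketch of the guard that evaluatePredicate() has and the split-elimination
// path lacks. Names are illustrative, not Hive's real code.
public class SplitEliminationSketch {

    // Stand-in for column statistics that may be absent when a column is
    // entirely NULL in a stripe.
    static final class ColumnStatistics {
        final Integer min, max;
        ColumnStatistics(Integer min, Integer max) { this.min = min; this.max = max; }
    }

    // Safe path: check for missing statistics before the range check.
    static boolean canEliminate(ColumnStatistics stats, int predicateValue) {
        if (stats == null || stats.min == null || stats.max == null) {
            return false; // no stats: keep the stripe rather than risk an NPE
        }
        return evaluateRange(stats, predicateValue);
    }

    // Unsafe path: calling this directly with null stats throws
    // NullPointerException, which is the failure mode described above.
    static boolean evaluateRange(ColumnStatistics stats, int predicateValue) {
        return predicateValue < stats.min || predicateValue > stats.max;
    }

    public static void main(String[] args) {
        System.out.println(canEliminate(new ColumnStatistics(5, 10), 3)); // true
        System.out.println(canEliminate(null, 3));                        // false, guarded
    }
}
```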


[jira] [Commented] (HIVE-8542) Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]

2014-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202036#comment-14202036
 ] 

Hive QA commented on HIVE-8542:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12680146/HIVE-8542.1-spark.patch

{color:red}ERROR:{color} -1 due to 352 failed/errored test(s), 7125 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18_multi_distinct
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join19
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join22
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join24
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join26
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join27
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join28
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join31
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join32
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_reordering_values
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_tez1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_tez2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin9
org.apache.hadoop.hiv

[jira] [Commented] (HIVE-8779) Tez in-place progress UI can show wrong estimated time for sub-second queries

2014-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202086#comment-14202086
 ] 

Hive QA commented on HIVE-8779:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12680078/HIVE-8779.1.patch

{color:green}SUCCESS:{color} +1 6665 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1686/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1686/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1686/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12680078 - PreCommit-HIVE-TRUNK-Build

> Tez in-place progress UI can show wrong estimated time for sub-second queries
> -
>
> Key: HIVE-8779
> URL: https://issues.apache.org/jira/browse/HIVE-8779
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Trivial
> Attachments: HIVE-8779.1.patch
>
>
> The in-place progress update UI added as part of HIVE-8495 can show wrong 
> estimated time for AM only job which goes from INITED to SUCCEEDED DAG state 
> directly without going to RUNNING state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
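
The timing pitfall described in HIVE-8779 above can be illustrated with a small sketch: if the DAG jumps straight from INITED to SUCCEEDED, a "running since" timestamp that is only set on entering RUNNING is never initialized, and computing elapsed time from it produces a bogus estimate. The guard below is one plausible shape of a fix, with entirely made-up names — it is not the actual patch.

```java
// Hypothetical sketch: fall back to the submission time when the RUNNING
// state was never observed, so elapsed time is not computed from epoch 0.
public class ProgressTimerSketch {
    static long runningStartMillis = 0; // only set when the DAG enters RUNNING

    static long elapsedMillis(long nowMillis, long submitMillis) {
        // Guard: unset timestamp means the DAG skipped RUNNING entirely.
        long start = (runningStartMillis > 0) ? runningStartMillis : submitMillis;
        return Math.max(0, nowMillis - start);
    }

    public static void main(String[] args) {
        long submit = 1_000_000L;
        long now = 1_000_450L;
        // With runningStartMillis unset, elapsed is measured from submission.
        System.out.println(elapsedMillis(now, submit)); // 450, not ~1000450
    }
}
```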


Re: Review Request 27720: HIVE-8777 should only register used operator counters.[Spark Branch]

2014-11-07 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27720/#review60316
---


1. Looking at the diff, I was sure where we are removing unnecessary counter 
registrations.
2. It would be great if we can have some tests that are enabled with counter 
statistics collection, so that we know what kind of output we are expecting and 
avoid future breakage.

- Xuefu Zhang


On Nov. 7, 2014, 5:27 a.m., chengxiang li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27720/
> ---
> 
> (Updated Nov. 7, 2014, 5:27 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8777
> https://issues.apache.org/jira/browse/HIVE-8777
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently we register all Hive operator counters in SparkCounters, while 
> actually not all Hive operators are used in a SparkTask; we should iterate 
> the SparkTask's operators and only register the counters required.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java e955da3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 46b04bc 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java 
> bb3597a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 
> 
> Diff: https://reviews.apache.org/r/27720/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> chengxiang li
> 
>



[jira] [Commented] (HIVE-8548) Integrate with remote Spark context after HIVE-8528 [Spark Branch]

2014-11-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202098#comment-14202098
 ] 

Xuefu Zhang commented on HIVE-8548:
---

Hi [~chengxiang li], as to automatic testing of the remote Spark context, we will 
be switching our unit tests from "local" to "local-cluster", hoping that will 
cover it. For now, it seems that we don't need an extra configuration for that 
purpose.

> Integrate with remote Spark context after HIVE-8528 [Spark Branch]
> --
>
> Key: HIVE-8548
> URL: https://issues.apache.org/jira/browse/HIVE-8548
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
>
> With HIVE-8528, HiveServer2 should use the remote Spark context to submit jobs 
> and monitor progress, etc. This is necessary if Hive runs on a standalone 
> cluster, YARN, or Mesos. If Hive runs with spark.master=local, we should 
> continue using SparkContext in the current way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 27720: HIVE-8777 should only register used operator counters.[Spark Branch]

2014-11-07 Thread chengxiang li


> On Nov. 7, 2014, 2:36 p.m., Xuefu Zhang wrote:
> > 1. Looking at the diff, I was sure where we are removing unnecessary 
> > counter registrations.
> > 2. It would be great if we can have some tests that are enabled with 
> > counter statistics collection, so that we know what kind of output we are 
> > expecting and avoid future breakage.

Thanks, Xuefu. The Hive operator-level counters are used during Spark job 
execution, so our current tests should have covered this patch's changes. But for 
table statistics collection, as you know, we use "fs" as the default counter 
storage in tests. I ran the qtests locally before, but yes, the related change 
would not be reflected in the automated test results.


- chengxiang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27720/#review60316
---


On Nov. 7, 2014, 5:27 a.m., chengxiang li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27720/
> ---
> 
> (Updated Nov. 7, 2014, 5:27 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8777
> https://issues.apache.org/jira/browse/HIVE-8777
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently we register all Hive operator counters in SparkCounters, while 
> actually not all Hive operators are used in a SparkTask; we should iterate 
> the SparkTask's operators and only register the counters required.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java e955da3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 46b04bc 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java 
> bb3597a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 
> 
> Diff: https://reviews.apache.org/r/27720/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> chengxiang li
> 
>



Re: Review Request 27720: HIVE-8777 should only register used operator counters.[Spark Branch]

2014-11-07 Thread Xuefu Zhang


> On Nov. 7, 2014, 2:36 p.m., Xuefu Zhang wrote:
> > 1. Looking at the diff, I was sure where we are removing unnecessary 
> > counter registrations.
> > 2. It would be great if we can have some tests that are enabled with 
> > counter statistics collection, so that we know what kind of output we are 
> > expecting and avoid future breakage.
> 
> chengxiang li wrote:
> Thanks, Xuefu. The Hive operator-level counters are used during Spark job 
> execution, so our current tests should have covered this patch's changes. But 
> for table statistics collection, as you know, we use "fs" as the default 
> counter storage in tests. I ran the qtests locally before, but yes, the related 
> change would not be reflected in the automated test results.

Actually, for #1, I meant "I was NOT sure..."

Can we have some tests running with counters?


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27720/#review60316
---


On Nov. 7, 2014, 5:27 a.m., chengxiang li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27720/
> ---
> 
> (Updated Nov. 7, 2014, 5:27 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8777
> https://issues.apache.org/jira/browse/HIVE-8777
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently we register all Hive operator counters in SparkCounters, while 
> actually not all Hive operators are used in a SparkTask; we should iterate 
> the SparkTask's operators and only register the counters required.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java e955da3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 46b04bc 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java 
> bb3597a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 
> 
> Diff: https://reviews.apache.org/r/27720/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> chengxiang li
> 
>



[jira] [Commented] (HIVE-8561) Expose Hive optiq operator tree to be able to support other sql on hadoop query engines

2014-11-07 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202116#comment-14202116
 ] 

Laljo John Pullokkaran commented on HIVE-8561:
--

The Hive Optiq RelNode is going to be removed shortly as part of refactoring; 
Optiq/Calcite itself is also going through refactoring/renaming.

With respect to a public API, an API to get the logical plan with or without 
optimization may be what you want.
A public API needs to address:
 1. What sort of queries can this API handle (select, insert into, create 
table as)?
 2. What about security (will Hive enforce security, or will Drill)?
 3. What about views?

> Expose Hive optiq operator tree to be able to support other sql on hadoop 
> query engines
> ---
>
> Key: HIVE-8561
> URL: https://issues.apache.org/jira/browse/HIVE-8561
> Project: Hive
>  Issue Type: Task
>  Components: CBO
>Affects Versions: 0.14.0
>Reporter: Na Yang
>Assignee: Na Yang
> Attachments: HIVE-8561.2.patch, HIVE-8561.3.patch, HIVE-8561.patch
>
>
> Hive-0.14 added cost based optimization and optiq operator tree is created 
> for select queries. However, the optiq operator tree is not visible from 
> outside and hard to be used by other Sql on Hadoop query engine such as 
> apache Drill. To be able to allow drill to access the hive optiq operator 
> tree, we need to add a public api to return the hive optiq operator tree.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-07 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202126#comment-14202126
 ] 

Owen O'Malley commented on HIVE-8732:
-

I should also point out that I added a line about the writer version to the 
orcfiledump output. New files will get the line:

File Version: 0.12 with HIVE_8732

Files written by the old writer will say either:

File Version: 0.12 with ORIGINAL
or
File Version: 0.11 with ORIGINAL



> ORC string statistics are not merged correctly
> --
>
> Key: HIVE-8732
> URL: https://issues.apache.org/jira/browse/HIVE-8732
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: HIVE-8732.patch, HIVE-8732.patch, HIVE-8732.patch
>
>
> Currently ORC's string statistics do not merge correctly causing incorrect 
> maximum values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
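
The merge operation HIVE-8732 fixes can be sketched briefly: a correct merge of two string min/max statistics takes the lexicographically smaller of the minimums and the larger of the maximums, whereas the reported bug produced incorrect maximum values. The sketch below is illustrative only; the names are not ORC's actual StringStatisticsImpl API.

```java
// Minimal sketch of merging string min/max statistics across stripes.
public class StringStatsMergeSketch {
    static String minimum, maximum;

    static void reset() { minimum = null; maximum = null; }

    // Correct merge: keep the smaller minimum and the larger maximum.
    static void merge(String otherMin, String otherMax) {
        if (minimum == null || otherMin.compareTo(minimum) < 0) minimum = otherMin;
        if (maximum == null || otherMax.compareTo(maximum) > 0) maximum = otherMax;
    }

    public static void main(String[] args) {
        reset();
        merge("apple", "mango");
        merge("banana", "zebra");
        System.out.println(minimum + " .. " + maximum); // apple .. zebra
    }
}
```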


[jira] [Commented] (HIVE-8775) Merge from trunk 11/6/14 [SPARK BRANCH]

2014-11-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202129#comment-14202129
 ] 

Xuefu Zhang commented on HIVE-8775:
---

It looks like a lot of diffs on stats appeared.

> Merge from trunk 11/6/14 [SPARK BRANCH]
> ---
>
> Key: HIVE-8775
> URL: https://issues.apache.org/jira/browse/HIVE-8775
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-8775.1-spark.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 27720: HIVE-8777 should only register used operator counters.[Spark Branch]

2014-11-07 Thread chengxiang li


> On Nov. 7, 2014, 2:36 p.m., Xuefu Zhang wrote:
> > 1. Looking at the diff, I was sure where we are removing unnecessary 
> > counter registrations.
> > 2. It would be great if we can have some tests that are enabled with 
> > counter statistics collection, so that we know what kind of output we are 
> > expecting and avoid future breakage.
> 
> chengxiang li wrote:
> Thanks, Xuefu. The Hive operator-level counters are used during Spark job 
> execution, so our current tests should have covered this patch's changes. But 
> for table statistics collection, as you know, we use "fs" as the default 
> counter storage in tests. I ran the qtests locally before, but yes, the related 
> change would not be reflected in the automated test results.
> 
> Xuefu Zhang wrote:
> Actually, for #1, I meant "I was NOT sure..."
> 
> Can we have some tests running wiht counter?

Oh, previously we registered all possible counters in SparkCounters; in this 
patch, we get only the required counters with SparkTask::getOperatorCounters and 
register them in SparkClient. Maybe I didn't describe this clearly: this patch 
only includes changes related to Hive operator counter registration. Since all 
queries collect Hive operator statistics, all qtests would go through the changed 
logic. Table statistics collection is not related to this patch.


- chengxiang


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27720/#review60316
---


On Nov. 7, 2014, 5:27 a.m., chengxiang li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27720/
> ---
> 
> (Updated Nov. 7, 2014, 5:27 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8777
> https://issues.apache.org/jira/browse/HIVE-8777
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently we register all Hive operator counters in SparkCounters, while 
> actually not all Hive operators are used in a SparkTask; we should iterate 
> the SparkTask's operators and only register the counters required.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java e955da3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 46b04bc 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java 
> bb3597a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 
> 
> Diff: https://reviews.apache.org/r/27720/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> chengxiang li
> 
>



[jira] [Commented] (HIVE-8768) CBO: Fix filter selectivity for "in clause" & "<>"

2014-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202163#comment-14202163
 ] 

Hive QA commented on HIVE-8768:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12680082/HIVE-8768.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6665 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1687/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1687/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1687/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12680082 - PreCommit-HIVE-TRUNK-Build

> CBO: Fix filter selectivity for "in clause" & "<>" 
> ---
>
> Key: HIVE-8768
> URL: https://issues.apache.org/jira/browse/HIVE-8768
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-8768.1.patch, HIVE-8768.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8622) Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]

2014-11-07 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-8622:
---
Attachment: HIVE-8622.2-spark.patch

Attaching another patch with what is, in my opinion, a cleaner solution. I 
tested it with subquery_multiinsert.q and the result looks fine. Please give 
suggestions!

> Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
> 
>
> Key: HIVE-8622
> URL: https://issues.apache.org/jira/browse/HIVE-8622
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Suhas Satish
>Assignee: Chao
> Attachments: HIVE-8622.2-spark.patch, HIVE-8622.patch
>
>
> This is a sub-task of map-join for spark 
> https://issues.apache.org/jira/browse/HIVE-7613
> This can use the baseline patch for map-join
> https://issues.apache.org/jira/browse/HIVE-8616



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 27627: Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]

2014-11-07 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27627/
---

(Updated Nov. 7, 2014, 3:57 p.m.)


Review request for hive.


Changes
---

Attaching another patch with what is, in my opinion, a cleaner solution. I 
tested it with subquery_multiinsert.q and the result looks fine. Please give 
suggestions!


Bugs: HIVE-8622
https://issues.apache.org/jira/browse/HIVE-8622


Repository: hive-git


Description
---

This is a sub-task of map-join for spark 
https://issues.apache.org/jira/browse/HIVE-7613
This can use the baseline patch for map-join
https://issues.apache.org/jira/browse/HIVE-8616


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/27627/diff/


Testing
---


Thanks,

Chao Sun



[jira] [Updated] (HIVE-8758) Fix hadoop-1 build [Spark Branch]

2014-11-07 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-8758:
--
Status: Open  (was: Patch Available)

> Fix hadoop-1 build [Spark Branch]
> -
>
> Key: HIVE-8758
> URL: https://issues.apache.org/jira/browse/HIVE-8758
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-8758.1-spark.patch
>
>
> This may mean merging patches from trunk and fixing whatever problem specific 
> to Spark branch. Here are user reported problems:
> Problem 1:
> {code}
> Hive Serde . FAILURE [  2.357 s]
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
> on project hive-serde: Compilation failure: Compilation failure:
> [ERROR] 
> /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[27,24]
>  cannot find symbol
> [ERROR] symbol:   class Nullable
> [ERROR] location: package javax.annotation
> [ERROR] 
> /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[67,36]
>  cannot find symbol
> [ERROR] symbol:   class Nullable
> [ERROR] location: class org.apache.hadoop.hive.serde2.AbstractSerDe
> {code}
> My understanding: it looks like the Nullable annotation was recently added on 
> the branch. I added the below dependency to the hive-serde project:
> {code}
> <dependency>
>   <groupId>com.google.code.findbugs</groupId>
>   <artifactId>jsr305</artifactId>
>   <version>3.0.0</version>
> </dependency>
> {code}
> Problem 2:
> After adding the dependency for hive-serde, got the below compilation error
> {code}
> [INFO] Hive Query Language  FAILURE [01:35 
> min]
> /data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java:[35,39]
>  error: package org.apache.hadoop.mapreduce.util does not exist
> {code}
> The dependency jar for hadoop-1 (hadoop-core-1.2.1.jar) does not have the 
> package “org.apache.hadoop.mapreduce.util”. To circumvent this, I added the 
> below dependency, which does contain the package (not sure it is right – I 
> badly wanted to make the build successful):
> {code}
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-mapreduce-client-core</artifactId>
>   <version>0.23.11</version>
> </dependency>
> {code}
> Problem 3:
> After making the above change, the build failed again in the same project, in 
> the file 
> /data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java.
>  In the snippet below, taken from that file, “fileStatus.isFile()” is called, 
> which is not available in the hadoop-1 “org.apache.hadoop.fs.FileStatus” API.
> {code}
> for (FileStatus fileStatus : fs.listStatus(folder)) {
>   Path filePath = fileStatus.getPath();
>   if (!fileStatus.isFile()) {
>     throw new HiveException("Error, not a file: " + filePath);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
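
The hadoop-1 incompatibility in Problem 3 above can be worked around without isFile(): hadoop-1's FileStatus exposes isDir() (which hadoop-2 keeps, though deprecated), so negating it is a common portability shim. The sketch below uses a stub interface in place of org.apache.hadoop.fs.FileStatus so it runs without Hadoop on the classpath — treat it as an assumption-laden illustration, not the actual fix.

```java
// Portability sketch: a file check that compiles against both hadoop-1 and
// hadoop-2. FileStatusStub stands in for org.apache.hadoop.fs.FileStatus.
public class FileCheckSketch {
    interface FileStatusStub { boolean isDir(); }

    // Equivalent of fileStatus.isFile() using only the hadoop-1-era isDir().
    static boolean isRegularFile(FileStatusStub status) {
        return !status.isDir();
    }

    public static void main(String[] args) {
        FileStatusStub file = () -> false;  // a plain file
        FileStatusStub dir  = () -> true;   // a directory
        System.out.println(isRegularFile(file)); // true
        System.out.println(isRegularFile(dir));  // false
    }
}
```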


[jira] [Commented] (HIVE-8758) Fix hadoop-1 build [Spark Branch]

2014-11-07 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202195#comment-14202195
 ] 

Jimmy Xiang commented on HIVE-8758:
---

The trunk branch doesn't compile with hadoop-1:
{noformat}
[INFO] Hive HBase Handler  FAILURE [0.875s]
{noformat}


> Fix hadoop-1 build [Spark Branch]
> -
>
> Key: HIVE-8758
> URL: https://issues.apache.org/jira/browse/HIVE-8758
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-8758.1-spark.patch
>
>
> This may mean merging patches from trunk and fixing whatever problem specific 
> to Spark branch. Here are user reported problems:
> Problem 1:
> {code}
> Hive Serde . FAILURE [  2.357 s]
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
> on project hive-serde: Compilation failure: Compilation failure:
> [ERROR] 
> /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[27,24]
>  cannot find symbol
> [ERROR] symbol:   class Nullable
> [ERROR] location: package javax.annotation
> [ERROR] 
> /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[67,36]
>  cannot find symbol
> [ERROR] symbol:   class Nullable
> [ERROR] location: class org.apache.hadoop.hive.serde2.AbstractSerDe
> {code}
> My understanding: Looks the Nullable annotation was recently added in the 
> recent branch. Added the below dependency in the project hive-serde
> {code}
> 
> com.google.code.findbugs
> jsr305
> 3.0.0
> 
> {code}
> Problem 2:
> After adding the dependency for hive-serde, got the below compilation error
> {code}
> [INFO] Hive Query Language  FAILURE [01:35 
> min]
> /data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java:[35,39]
>  error: package org.apache.hadoop.mapreduce.util does not exist
> {code}
> In the dependency jar for hadoop-1 (hadoop-core-1.2.1.jar) - We do not have 
> the package “org.apache.hadoop.mapreduce.util” to circumvent it added the 
> below dependency where we had the package (not sure, it is right – I badly 
> wanted to make the build successful L)
> {code}
> 
> org.apache.hadoop
> hadoop-mapreduce-client-core
> 0.23.11
> 
>   
> {code}
> Problem 3:
> After making the above change, again failed in the same project @ file 
> /data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java.
>  In the snippet below taken from the file, we can see the 
> “fileStatus.isFile()” is called which is not available in the 
> “org.apache.hadoop.fs.FileStatus” hadoop1 api.
> {code}
>  for (FileStatus fileStatus: fs.listStatus(folder)) {
>Path filePath = fileStatus.getPath();
> if (!fileStatus.isFile()) {
>   throw new HiveException("Error, not a file: " + filePath);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 27720: HIVE-8777 should only register used operator counters.[Spark Branch]

2014-11-07 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27720/#review60333
---

Ship it!


Got it. Thanks for the explanation.

I'm going to create a separate JIRA requesting to have some tests with stats 
collection with counter.

- Xuefu Zhang


On Nov. 7, 2014, 5:27 a.m., chengxiang li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27720/
> ---
> 
> (Updated Nov. 7, 2014, 5:27 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8777
> https://issues.apache.org/jira/browse/HIVE-8777
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently we register all Hive operator counters in SparkCounters, but not 
> all Hive operators are used in a given SparkTask. We should iterate over the 
> SparkTask's operators and register only the counters that are required.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java e955da3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 46b04bc 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java 
> bb3597a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 
> 
> Diff: https://reviews.apache.org/r/27720/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> chengxiang li
> 
>



[jira] [Commented] (HIVE-8777) Should only register used counters in SparkCounters[Spark Branch]

2014-11-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202197#comment-14202197
 ] 

Xuefu Zhang commented on HIVE-8777:
---

+1.

It seems that we need some tests that collect stats with the counter 
mechanism. Currently all stats-collection tests use the default "fs". Maybe 
some tests can be switched to use "counter". Will create a separate JIRA for this.

> Should only register used counters in SparkCounters[Spark Branch]
> -
>
> Key: HIVE-8777
> URL: https://issues.apache.org/jira/browse/HIVE-8777
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M3
> Attachments: HIVE-8777.1-spark.patch
>
>
> Currently we register all Hive operator counters in SparkCounters, but not 
> all Hive operators are used in a given SparkTask. We should iterate over the 
> SparkTask's operators and register only the counters that are required.





[jira] [Created] (HIVE-8782) HBase handler doesn't compile with hadoop-1

2014-11-07 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-8782:
-

 Summary: HBase handler doesn't compile with hadoop-1
 Key: HIVE-8782
 URL: https://issues.apache.org/jira/browse/HIVE-8782
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang


{noformat}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-hbase-handler: Compilation failure
[ERROR] /home/jxiang/git-repos/apache/tmp/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java:[482,31] cannot find symbol
[ERROR] symbol:   method mergeAll(org.apache.hadoop.security.Credentials)
[ERROR] location: class org.apache.hadoop.security.Credentials
[ERROR] -> [Help 1]
[ERROR] 
{noformat}





[jira] [Commented] (HIVE-8780) insert1.q and ppd_join4.q hangs with hadoop-1 [Spark Branch]

2014-11-07 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202206#comment-14202206
 ] 

Jimmy Xiang commented on HIVE-8780:
---

I will look into it further once HIVE-8782 and HIVE-8758 are resolved.

> insert1.q and ppd_join4.q hangs with hadoop-1 [Spark Branch]
> 
>
> Key: HIVE-8780
> URL: https://issues.apache.org/jira/browse/HIVE-8780
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Jimmy Xiang
>
> While working on HIVE-8758, I found that these tests hang at:
> {noformat}
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor.startMonitor(SparkJobMonitor.java:129)
> at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:111)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1644)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1404)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1216)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1043)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1033)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:247)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:345)
> at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:832)
> at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:3706)
> at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join4(TestSparkCliDriver.java:2790)
> {noformat}
> Both tests hang at the same place. There could be other hanging tests.





[jira] [Commented] (HIVE-8782) HBase handler doesn't compile with hadoop-1

2014-11-07 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202214#comment-14202214
 ] 

Jimmy Xiang commented on HIVE-8782:
---

[~sseth], can you take a look? Do we have to use mergeAll instead of addAll?  
With addAll, both hadoop 1 and 2 should compile.
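For reference, the semantic difference as I understand it (an assumption to verify against the Hadoop 2 Credentials javadoc) is that addAll lets incoming entries overwrite existing ones, while mergeAll preserves existing entries. A toy, map-based sketch of that distinction, not the actual Credentials implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the assumed Credentials.addAll vs Credentials.mergeAll
// semantics: addAll overwrites entries that already exist in the
// destination, mergeAll keeps them.
public class CredentialsMergeSketch {

    // addAll-like: incoming entries win on key collisions
    static Map<String, String> addAll(Map<String, String> dst, Map<String, String> src) {
        dst.putAll(src);
        return dst;
    }

    // mergeAll-like: existing entries win on key collisions
    static Map<String, String> mergeAll(Map<String, String> dst, Map<String, String> src) {
        for (Map.Entry<String, String> e : src.entrySet()) {
            dst.putIfAbsent(e.getKey(), e.getValue());
        }
        return dst;
    }

    public static void main(String[] args) {
        Map<String, String> dst1 = new HashMap<>();
        dst1.put("token", "existing");
        Map<String, String> dst2 = new HashMap<>(dst1);
        Map<String, String> src = new HashMap<>();
        src.put("token", "incoming");

        System.out.println(addAll(dst1, src).get("token"));   // incoming
        System.out.println(mergeAll(dst2, src).get("token")); // existing
    }
}
```

If the two methods only differ in collision handling, switching to addAll for hadoop-1 compatibility would change which copy of a duplicate token survives.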

> HBase handler doesn't compile with hadoop-1
> ---
>
> Key: HIVE-8782
> URL: https://issues.apache.org/jira/browse/HIVE-8782
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>
> {noformat}
> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-hbase-handler: Compilation failure
> [ERROR] /home/jxiang/git-repos/apache/tmp/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java:[482,31] cannot find symbol
> [ERROR] symbol:   method mergeAll(org.apache.hadoop.security.Credentials)
> [ERROR] location: class org.apache.hadoop.security.Credentials
> [ERROR] -> [Help 1]
> [ERROR] 
> {noformat}





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-07 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202230#comment-14202230
 ] 

Prasanth J commented on HIVE-8732:
--

I have verified the file version in the file dump with old ORC formats.

> ORC string statistics are not merged correctly
> --
>
> Key: HIVE-8732
> URL: https://issues.apache.org/jira/browse/HIVE-8732
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Blocker
> Fix For: 0.14.0
>
> Attachments: HIVE-8732.patch, HIVE-8732.patch, HIVE-8732.patch
>
>
> Currently ORC's string statistics do not merge correctly causing incorrect 
> maximum values.





[jira] [Commented] (HIVE-8772) zookeeper info logs are always printed from beeline with service discovery mode

2014-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202240#comment-14202240
 ] 

Hive QA commented on HIVE-8772:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12680102/HIVE-8772.1.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6665 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1688/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1688/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1688/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12680102 - PreCommit-HIVE-TRUNK-Build

> zookeeper info logs are always printed from beeline with service discovery 
> mode
> ---
>
> Key: HIVE-8772
> URL: https://issues.apache.org/jira/browse/HIVE-8772
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 0.14.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.14.0
>
> Attachments: HIVE-8772.1.patch
>
>
> Log messages like the following are printed by ZooKeeper for Beeline 
> commands, and there is no way to suppress them using the Beeline command-line 
> options (--silent or --verbose).
> {noformat}
> 14/11/04 16:05:47 INFO zookeeper.ZooKeeper: Client 
> environment:java.vendor=Oracle Corporation
> 14/11/04 16:05:47 INFO zookeeper.ZooKeeper: Client 
> environment:java.home=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.71.x86_64/jre
> 14/11/04 16:05:47 INFO zookeeper.ZooKeeper: Client 
> environment:java.class.path=/usr/hdp/2.2.0.0-1756/hadoop/conf:/usr/hdp/2.2.0.0-1756/hadoop/lib/ranger-plugins-cred-0.4.0.2.2.0.0-1756.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/jetty-util-6.1.26.hwx
> .jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/ranger-hdfs-plugin-0.4.0.2.2.0.0-1756.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/commons-math3-3.1.1.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/mysql-connector-java.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/mockito-all-1.8
> .5.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/curator-framework-2.6.0.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/guava-11.0.2.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/java-xmlbuilder-0.4.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/ranger-plugins-audit-0.4.0.2.2.0.0-
> 1756.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/curator-client-2.6.0.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/commons-httpclient-3.1.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/junit-4.11.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/jersey-core-1.9.jar:/usr/hdp/2.2.0.
> 0-1756/hadoop/lib/jersey-json-1.9.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/jersey-server-1.9.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/api-asn1-api-1.0.0-M20.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/asm-3.2.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/protobuf-jav
> a-2.5.0.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/eclipselink-2.5.2-M1.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/curator-recipes-2.6.0.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/jettison-1.1.jar:/us
> r/hdp/2.2.0.0-1756/hadoop/lib/commons-digester-1.8.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/htrace-core-3.0.4.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/commons-compress-1.4.1.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/jackson-jaxrs-1.9.13.jar:/usr/hdp/2.2.0.
> 0-1
> {noformat}
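The usual fix for this kind of noise is a logger threshold rather than a Beeline flag. A minimal sketch, assuming the Beeline JVM reads a log4j.properties from its classpath (the exact file name and location are installation-specific assumptions):

```properties
# Raise the ZooKeeper client logger above INFO so it no longer floods
# Beeline output; all other loggers keep their configured levels.
log4j.logger.org.apache.zookeeper=WARN
```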





[jira] [Commented] (HIVE-8758) Fix hadoop-1 build [Spark Branch]

2014-11-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202269#comment-14202269
 ] 

Xuefu Zhang commented on HIVE-8758:
---

Please create a JIRA against trunk, referencing the JIRA that caused the issue. Thanks.

> Fix hadoop-1 build [Spark Branch]
> -
>
> Key: HIVE-8758
> URL: https://issues.apache.org/jira/browse/HIVE-8758
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-8758.1-spark.patch
>
>
> This may mean merging patches from trunk and fixing whatever problems are 
> specific to the Spark branch. Here are the user-reported problems:
> {code}
> Hive Serde . FAILURE [  2.357 s]
> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-serde: Compilation failure: Compilation failure:
> [ERROR] /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[27,24] cannot find symbol
> [ERROR] symbol:   class Nullable
> [ERROR] location: package javax.annotation
> [ERROR] /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[67,36] cannot find symbol
> [ERROR] symbol:   class Nullable
> [ERROR] location: class org.apache.hadoop.hive.serde2.AbstractSerDe
> {code}
> My understanding: it looks like the Nullable annotation was added recently on 
> the branch. I added the dependency below to the hive-serde project:
> {code}
> <dependency>
>   <groupId>com.google.code.findbugs</groupId>
>   <artifactId>jsr305</artifactId>
>   <version>3.0.0</version>
> </dependency>
> {code}
> Problem 2:
> After adding the dependency for hive-serde, got the below compilation error
> {code}
> [INFO] Hive Query Language  FAILURE [01:35 min]
> /data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java:[35,39] error: package org.apache.hadoop.mapreduce.util does not exist
> {code}
> The hadoop-1 dependency jar (hadoop-core-1.2.1.jar) does not contain the 
> package "org.apache.hadoop.mapreduce.util". To circumvent this, I added the 
> dependency below, which does provide the package (not sure this is right; I 
> just badly wanted to make the build succeed).
> {code}
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-mapreduce-client-core</artifactId>
>   <version>0.23.11</version>
> </dependency>
> {code}
> Problem 3:
> After making the above change, the build failed again in the same project, in the file 
> /data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java.
> In the snippet below, taken from that file, "fileStatus.isFile()" is called, 
> but that method is not available in the hadoop-1 
> "org.apache.hadoop.fs.FileStatus" API.
> {code}
> for (FileStatus fileStatus : fs.listStatus(folder)) {
>   Path filePath = fileStatus.getPath();
>   if (!fileStatus.isFile()) {
>     throw new HiveException("Error, not a file: " + filePath);
> {code}





[jira] [Updated] (HIVE-8777) Should only register used counters in SparkCounters[Spark Branch]

2014-11-07 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8777:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Patch committed to Spark branch. Thanks to Chengxiang for the contribution.

> Should only register used counters in SparkCounters[Spark Branch]
> -
>
> Key: HIVE-8777
> URL: https://issues.apache.org/jira/browse/HIVE-8777
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M3
> Fix For: spark-branch
>
> Attachments: HIVE-8777.1-spark.patch
>
>
> Currently we register all Hive operator counters in SparkCounters, but not 
> all Hive operators are used in a given SparkTask. We should iterate over the 
> SparkTask's operators and register only the counters that are required.





Re: Review Request 27720: HIVE-8777 should only register used operator counters.[Spark Branch]

2014-11-07 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27720/#review60343
---



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java


I'm wondering: if there are multiple instances of an operator, say 
FileSinkOperator, what's the effect on counter registration?


- Xuefu Zhang


On Nov. 7, 2014, 5:27 a.m., chengxiang li wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27720/
> ---
> 
> (Updated Nov. 7, 2014, 5:27 a.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8777
> https://issues.apache.org/jira/browse/HIVE-8777
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently we register all Hive operator counters in SparkCounters, but not 
> all Hive operators are used in a given SparkTask. We should iterate over the 
> SparkTask's operators and register only the counters that are required.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java e955da3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 46b04bc 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java 
> bb3597a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 
> 
> Diff: https://reviews.apache.org/r/27720/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> chengxiang li
> 
>



[jira] [Created] (HIVE-8783) Create some tests that use Spark counter for stats collection [Spark Branch]

2014-11-07 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-8783:
-

 Summary: Create some tests that use Spark counter for stats 
collection [Spark Branch]
 Key: HIVE-8783
 URL: https://issues.apache.org/jira/browse/HIVE-8783
 Project: Hive
  Issue Type: Test
  Components: Spark
Reporter: Xuefu Zhang


Currently, when .q tests are run with Spark, the default stats collection 
mechanism is "fs". We need some tests that use the Spark counter mechanism for 
stats collection to enhance coverage.
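As a sketch of what such a test would toggle (hedged: the property name and value are from memory of the Hive configuration of that era and should be verified against the branch), a .q file could switch the stats backend before the statements under test:

```sql
-- Use the counter-based stats backend instead of the default "fs"
set hive.stats.dbclass=counter;
```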





[jira] [Updated] (HIVE-8783) Create some tests that use Spark counter for stats collection [Spark Branch]

2014-11-07 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8783:
--
Issue Type: Sub-task  (was: Test)
Parent: HIVE-7292

> Create some tests that use Spark counter for stats collection [Spark Branch]
> 
>
> Key: HIVE-8783
> URL: https://issues.apache.org/jira/browse/HIVE-8783
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>
> Currently, when .q tests are run with Spark, the default stats collection 
> mechanism is "fs". We need some tests that use the Spark counter mechanism 
> for stats collection to enhance coverage.





[jira] [Commented] (HIVE-8777) Should only register used counters in SparkCounters[Spark Branch]

2014-11-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202282#comment-14202282
 ] 

Xuefu Zhang commented on HIVE-8777:
---

HIVE-8783 is created to add tests as mentioned above.

> Should only register used counters in SparkCounters[Spark Branch]
> -
>
> Key: HIVE-8777
> URL: https://issues.apache.org/jira/browse/HIVE-8777
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>  Labels: Spark-M3
> Fix For: spark-branch
>
> Attachments: HIVE-8777.1-spark.patch
>
>
> Currently we register all Hive operator counters in SparkCounters, but not 
> all Hive operators are used in a given SparkTask. We should iterate over the 
> SparkTask's operators and register only the counters that are required.





[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join

2014-11-07 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202289#comment-14202289
 ] 

Jason Dere commented on HIVE-8745:
--

Failures do not appear to be related to the patch.

> Joins on decimal keys return different results whether they are run as reduce 
> join or map join
> --
>
> Key: HIVE-8745
> URL: https://issues.apache.org/jira/browse/HIVE-8745
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Gunther Hagleitner
>Assignee: Jason Dere
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8745.1.patch, HIVE-8745.2.patch, HIVE-8745.3.patch, 
> join_test.q
>
>
> See attached .q file to reproduce. The difference seems to be whether 
> trailing 0s are considered the same value or not.





[jira] [Updated] (HIVE-8329) Enable postgres for storing stats

2014-11-07 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-8329:
---
Fix Version/s: (was: 0.14.0)
   0.15.0

> Enable postgres for storing stats
> -
>
> Key: HIVE-8329
> URL: https://issues.apache.org/jira/browse/HIVE-8329
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 0.14.0
>Reporter: Damien Carol
>Assignee: Damien Carol
> Fix For: 0.15.0
>
> Attachments: HIVE-8329.1.patch, HIVE-8329.1.patch, HIVE-8329.1.patch
>
>
> Simple patch to enable postgresql as JDBC publisher for statistics.





[jira] [Commented] (HIVE-8329) Enable postgres for storing stats

2014-11-07 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202307#comment-14202307
 ] 

Damien Carol commented on HIVE-8329:


I didn't have enough time to fix it for 0.14. Delayed to 0.15. Still a WIP.

> Enable postgres for storing stats
> -
>
> Key: HIVE-8329
> URL: https://issues.apache.org/jira/browse/HIVE-8329
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 0.14.0
>Reporter: Damien Carol
>Assignee: Damien Carol
> Fix For: 0.15.0
>
> Attachments: HIVE-8329.1.patch, HIVE-8329.1.patch, HIVE-8329.1.patch
>
>
> Simple patch to enable postgresql as JDBC publisher for statistics.





[jira] [Updated] (HIVE-8779) Tez in-place progress UI can show wrong estimated time for sub-second queries

2014-11-07 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8779:
-
   Resolution: Fixed
Fix Version/s: 0.15.0
   0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-0.14

> Tez in-place progress UI can show wrong estimated time for sub-second queries
> -
>
> Key: HIVE-8779
> URL: https://issues.apache.org/jira/browse/HIVE-8779
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Trivial
> Fix For: 0.14.0, 0.15.0
>
> Attachments: HIVE-8779.1.patch
>
>
> The in-place progress update UI added as part of HIVE-8495 can show a wrong 
> estimated time for an AM-only job whose DAG goes directly from the INITED to 
> the SUCCEEDED state without ever entering the RUNNING state.





[jira] [Updated] (HIVE-8778) ORC split elimination can cause NPE when column statistics is null

2014-11-07 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8778:
-
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-0.14

> ORC split elimination can cause NPE when column statistics is null
> --
>
> Key: HIVE-8778
> URL: https://issues.apache.org/jira/browse/HIVE-8778
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Critical
> Fix For: 0.14.0, 0.15.0
>
> Attachments: HIVE-8778.1.patch
>
>
> Row-group elimination has protection for NULL statistics values in 
> RecordReaderImpl.evaluatePredicate(), which then calls 
> evaluatePredicateRange(). But split elimination calls 
> evaluatePredicateRange() directly, without the NULL protection. This can 
> lead to a NullPointerException when a column is NULL in the entire stripe.





[jira] [Updated] (HIVE-8710) Add more tests for transactional inserts

2014-11-07 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8710:
-
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Patch committed to trunk.  Thank you Eugene for the review.

> Add more tests for transactional inserts
> 
>
> Key: HIVE-8710
> URL: https://issues.apache.org/jira/browse/HIVE-8710
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 0.15.0
>
> Attachments: HIVE-8710.2.patch, HIVE-8710.patch
>
>
> Test cases are needed for inserting the results of a join, and for reading 
> from a transactional table while inserting into a non-transactional table.





[jira] [Commented] (HIVE-8781) Nullsafe joins are busted on Tez

2014-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202353#comment-14202353
 ] 

Hive QA commented on HIVE-8781:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12680107/HIVE-8781.2.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s),  tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union_group_by
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1689/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1689/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1689/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12680107 - PreCommit-HIVE-TRUNK-Build

> Nullsafe joins are busted on Tez
> 
>
> Key: HIVE-8781
> URL: https://issues.apache.org/jira/browse/HIVE-8781
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 0.14.0
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8781.1.patch, HIVE-8781.2.patch
>
>






Re: Review Request 27627: Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]

2014-11-07 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27627/
---

(Updated Nov. 7, 2014, 6:07 p.m.)


Review request for hive.


Changes
---

Instead of using a Set, we should use a Map from a BaseWork with a MapJoin to 
all of its parent BaseWorks with HashTableSinks. The principle is that we 
cannot process the BaseWorks below this MapJoin until all of the 
HashTableSinks are processed.
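That scheduling rule can be sketched as a toy dependency-draining loop. The names and the String-based graph below are illustrative stand-ins, not Hive's actual BaseWork/HashTableSink types: a node becomes ready only once every one of its tracked parents has been processed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of "map each map-join work to its unprocessed parents;
// process it only once that set drains empty". Names are illustrative.
public class ReadyWhenParentsDone {

    static List<String> processOrder(Map<String, Set<String>> parents) {
        // pending: node -> parents not yet processed (defensive copies)
        Map<String, Set<String>> pending = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : parents.entrySet()) {
            pending.put(e.getKey(), new HashSet<>(e.getValue()));
        }
        // nodes with no pending parents are ready immediately
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Set<String>> e : pending.entrySet()) {
            if (e.getValue().isEmpty()) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String done = ready.poll();
            order.add(done);
            // a node whose last pending parent just finished becomes ready
            for (Map.Entry<String, Set<String>> e : pending.entrySet()) {
                if (e.getValue().remove(done) && e.getValue().isEmpty()) {
                    ready.add(e.getKey());
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> g = new HashMap<>();
        g.put("htsWork1", new HashSet<>());
        g.put("htsWork2", new HashSet<>());
        Set<String> mjParents = new HashSet<>();
        mjParents.add("htsWork1");
        mjParents.add("htsWork2");
        g.put("mapJoinWork", mjParents);

        // the map-join work comes last, after both hash-table sinks
        System.out.println(ReadyWhenParentsDone.processOrder(g).indexOf("mapJoinWork")); // 2
    }
}
```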


Bugs: HIVE-8622
https://issues.apache.org/jira/browse/HIVE-8622


Repository: hive-git


Description
---

This is a sub-task of map-join for spark 
https://issues.apache.org/jira/browse/HIVE-7613
This can use the baseline patch for map-join
https://issues.apache.org/jira/browse/HIVE-8616


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java 66fd6b6 

Diff: https://reviews.apache.org/r/27627/diff/


Testing
---


Thanks,

Chao Sun



[jira] [Updated] (HIVE-8622) Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]

2014-11-07 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-8622:
---
Attachment: HIVE-8622.3-spark.patch

Instead of using a Set, we should use a Map from a BaseWork with a MapJoin to 
all of its parent BaseWorks with HashTableSinks. The principle is that we 
cannot process the BaseWorks below this MapJoin until all of the 
HashTableSinks are processed.


> Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
> 
>
> Key: HIVE-8622
> URL: https://issues.apache.org/jira/browse/HIVE-8622
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Suhas Satish
>Assignee: Chao
> Attachments: HIVE-8622.2-spark.patch, HIVE-8622.3-spark.patch, 
> HIVE-8622.patch
>
>
> This is a sub-task of map-join for spark 
> https://issues.apache.org/jira/browse/HIVE-7613
> This can use the baseline patch for map-join
> https://issues.apache.org/jira/browse/HIVE-8616





[jira] [Updated] (HIVE-8764) Windows: HiveServer2 SSL cannot recognize localhost

2014-11-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-8764:
---
Summary: Windows: HiveServer2 SSL cannot recognize localhost  (was: 
HiveServer2 SSL cannot recognize localhost)

> Windows: HiveServer2 SSL cannot recognize localhost
> ---
>
> Key: HIVE-8764
> URL: https://issues.apache.org/jira/browse/HIVE-8764
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.14.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.14.0
>
>
> Seen on Windows with HS2 running in binary mode (HTTP mode works fine, as 
> does using dynamic service discovery). Previously, JDBC clients could use 
> localhost:port to connect to the server; now they explicitly need to specify 
> hostname:port. With the ZooKeeper indirection, however, this is not an issue 
> because URIs on ZK are added in the hostname:port format anyway.





[jira] [Commented] (HIVE-8764) Windows: HiveServer2 SSL cannot recognize localhost

2014-11-07 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202381#comment-14202381
 ] 

Vaibhav Gumashta commented on HIVE-8764:


[~gopalv] Just edited the title to be more specific. On Windows, in TCP mode 
with SSL sockets, HS2 is not binding to the wildcard IP (0.0.0.0). This 
causes localhost:1 not to resolve to 0.0.0.0:1.

> Windows: HiveServer2 SSL cannot recognize localhost
> ---
>
> Key: HIVE-8764
> URL: https://issues.apache.org/jira/browse/HIVE-8764
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.14.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.14.0
>
>
> Seen on Windows with HS2 running in binary mode (HTTP mode works fine, as 
> does using dynamic service discovery). Previously, JDBC clients could use 
> localhost:port to connect to the server; now they explicitly need to specify 
> hostname:port. With the ZooKeeper indirection, however, this is not an issue 
> because URIs on ZK are added in the hostname:port format anyway.





[jira] [Updated] (HIVE-8764) Windows: HiveServer2 TCP SSL cannot recognize localhost

2014-11-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-8764:
---
Summary: Windows: HiveServer2 TCP SSL cannot recognize localhost   (was: 
Windows: HiveServer2 SSL cannot recognize localhost)

> Windows: HiveServer2 TCP SSL cannot recognize localhost 
> 
>
> Key: HIVE-8764
> URL: https://issues.apache.org/jira/browse/HIVE-8764
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.14.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.14.0
>
>
> Seen on Windows with HS2 running in binary mode (HTTP mode works fine; so does 
> using dynamic service discovery). Previously JDBC clients could use 
> localhost:port to connect to the server; now they explicitly need to specify 
> hostname:port. With ZooKeeper indirection, however, this is not an issue because 
> URIs on ZK are added in the hostname:port format anyway.





[jira] [Commented] (HIVE-8758) Fix hadoop-1 build [Spark Branch]

2014-11-07 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202390#comment-14202390
 ] 

Jimmy Xiang commented on HIVE-8758:
---

Filed HIVE-8782 to track it for trunk.

> Fix hadoop-1 build [Spark Branch]
> -
>
> Key: HIVE-8758
> URL: https://issues.apache.org/jira/browse/HIVE-8758
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-8758.1-spark.patch
>
>
> This may mean merging patches from trunk and fixing whatever problems are 
> specific to the Spark branch. Here are user-reported problems:
> Problem 1:
> {code}
> Hive Serde . FAILURE [  2.357 s]
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
> on project hive-serde: Compilation failure: Compilation failure:
> [ERROR] 
> /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[27,24]
>  cannot find symbol
> [ERROR] symbol:   class Nullable
> [ERROR] location: package javax.annotation
> [ERROR] 
> /data/hive-spark/serde/src/java/org/apache/hadoop/hive/serde2/AbstractSerDe.java:[67,36]
>  cannot find symbol
> [ERROR] symbol:   class Nullable
> [ERROR] location: class org.apache.hadoop.hive.serde2.AbstractSerDe
> {code}
> My understanding: it looks like the Nullable annotation was recently added on 
> trunk. I added the below dependency to the hive-serde project:
> {code}
> <dependency>
>   <groupId>com.google.code.findbugs</groupId>
>   <artifactId>jsr305</artifactId>
>   <version>3.0.0</version>
> </dependency>
> {code}
> Problem 2:
> After adding the dependency for hive-serde, got the below compilation error
> {code}
> [INFO] Hive Query Language  FAILURE [01:35 
> min]
> /data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/counter/SparkCounters.java:[35,39]
>  error: package org.apache.hadoop.mapreduce.util does not exist
> {code}
> The dependency jar for hadoop-1 (hadoop-core-1.2.1.jar) does not contain the 
> package "org.apache.hadoop.mapreduce.util". To circumvent this, I added the 
> below dependency, which does contain the package (not sure it is right – I 
> badly wanted to make the build successful):
> {code}
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-mapreduce-client-core</artifactId>
>   <version>0.23.11</version>
> </dependency>
> {code}
> Problem 3:
> After making the above change, the build again failed in the same project, in 
> the file 
> /data/hive-spark/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java.
>  In the snippet below, taken from that file, "fileStatus.isFile()" is called, 
> which is not available in the hadoop-1 "org.apache.hadoop.fs.FileStatus" API.
> {code}
>  for (FileStatus fileStatus : fs.listStatus(folder)) {
>    Path filePath = fileStatus.getPath();
>    if (!fileStatus.isFile()) {
>      throw new HiveException("Error, not a file: " + filePath);
> {code}
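A common way to bridge this kind of API gap between Hadoop versions is a reflection-based shim: call `isFile()` where it exists, and fall back to negating `isDir()` on hadoop-1. The sketch below is a hypothetical illustration (`CompatDemo` and `OldStatus` stand in for the real classes), not the committed fix:

```java
public class CompatDemo {
    // Prefer isFile() when present (hadoop-2 FileStatus); fall back to
    // negating isDir() on hadoop-1, which lacks isFile().
    static boolean isFile(Object status) throws Exception {
        try {
            return (Boolean) status.getClass().getMethod("isFile").invoke(status);
        } catch (NoSuchMethodException e) {
            return !(Boolean) status.getClass().getMethod("isDir").invoke(status);
        }
    }

    // Stand-in for a hadoop-1 FileStatus: only isDir() is available.
    public static class OldStatus {
        public boolean isDir() { return false; }
    }

    public static void main(String[] args) throws Exception {
        // Falls back to !isDir() because OldStatus has no isFile() method.
        System.out.println(isFile(new OldStatus()));
    }
}
```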





[jira] [Assigned] (HIVE-8504) UT: fix bucket_num_reducers test

2014-11-07 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam reassigned HIVE-8504:
--

Assignee: Chinna Rao Lalam

> UT: fix bucket_num_reducers test
> 
>
> Key: HIVE-8504
> URL: https://issues.apache.org/jira/browse/HIVE-8504
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Thomas Friedrich
>Assignee: Chinna Rao Lalam
>Priority: Minor
>
> The test bucket_num_reducers fails with an error:
> Exception: Number of MapReduce jobs is incorrect expected:<1> but was:<0>
> junit.framework.AssertionFailedError: Number of MapReduce jobs is incorrect 
> expected:<1> but was:<0>
> at 
> org.apache.hadoop.hive.ql.hooks.VerifyNumReducersHook.run(VerifyNumReducersHook.java:46)





[jira] [Commented] (HIVE-6915) Hive Hbase queries fail on secure Tez cluster

2014-11-07 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202414#comment-14202414
 ] 

Jimmy Xiang commented on HIVE-6915:
---

[~szehon], I filed HIVE-8782 for the hadoop-1 compilation issue.

> Hive Hbase queries fail on secure Tez cluster
> -
>
> Key: HIVE-6915
> URL: https://issues.apache.org/jira/browse/HIVE-6915
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 0.13.0
> Environment: Kerberos secure Tez cluster
>Reporter: Deepesh Khandelwal
>Assignee: Siddharth Seth
> Fix For: 0.14.0
>
> Attachments: HIVE-6915.03.patch, HIVE-6915.1.patch, HIVE-6915.2.patch
>
>
> Hive queries reading and writing to HBase are currently failing with the 
> following exception in a secure Tez cluster:
> {noformat}
> 2014-04-14 13:47:05,644 FATAL [InputInitializer [Map 1] #0] 
> org.apache.hadoop.ipc.RpcClient: SASL authentication failed. The most likely 
> cause is missing or invalid credentials. Consider 'kinit'.
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>   at 
> org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:152)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:792)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$800(RpcClient.java:349)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:918)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:915)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:915)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.writeRequest(RpcClient.java:1065)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.tracedWriteRequest(RpcClient.java:1032)
>   at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1474)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1684)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1737)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.execService(ClientProtos.java:29288)
>   at 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.execService(ProtobufUtil.java:1562)
>   at 
> org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:87)
>   at 
> org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:84)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:121)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:97)
>   at 
> org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:90)
>   at 
> org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:67)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$AuthenticationService$BlockingStub.getAuthenticationToken(AuthenticationProtos.java:4512)
>   at 
> org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:60)
>   at 
> org.apache.hadoop.hbase.security.token.TokenUtil$3.run(TokenUtil.java:174)
>   at 
> org.apache.hadoop.hbase.security.token.TokenUtil$3.run(TokenUtil.java:172)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>   at 
> org.apache.hadoop.hbase.security.token.TokenUtil.obtainTokenForJob(TokenUtil.java:171)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at org.apache.hadoop.hbase.util.Methods.call(Methods.java:39)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.obtainAuthTokenForJob(User.java:334)
>   at 
> org.apache.hadoop.h

[jira] [Commented] (HIVE-8099) IN operator for partition column fails when the partition column type is DATE

2014-11-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202426#comment-14202426
 ] 

Sergey Shelukhin commented on HIVE-8099:


looks good to me; +1 if [~ashutoshc] has no reservations

> IN operator for partition column fails when the partition column type is DATE
> -
>
> Key: HIVE-8099
> URL: https://issues.apache.org/jira/browse/HIVE-8099
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0, 0.13.1
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
> Fix For: 0.14.0
>
> Attachments: HIVE-8099-ppr-fix.patch, HIVE-8099.1.patch, 
> HIVE-8099.2.patch, HIVE-8099.3.patch, HIVE-8099.4.patch
>
>
> Test table DLL:
> {code}
> CREATE TABLE testTbl(col1 string) PARTITIONED BY (date_prt date);
> {code}
> The following query used to work fine in Hive 0.12, as the constant types are 
> 'string' and the partition column type is considered 'string' throughout 
> planning and optimization (including partition pruning).
> {code}
> SELECT * FROM testTbl WHERE date_prt IN ('2014-08-09', '2014-08-08'); 
> {code}
> In trunk the above query fails with:
> {code}
> Line 1:33 Wrong arguments ''2014-08-08'': The arguments for IN should be the 
> same type! Types are: {date IN (string, string)}
> {code}
> HIVE-6642 changed SemanticAnalyzer.java to consider the partition type given 
> in the table definition instead of the hardcoded 'string' type. (Modified [Hive 0.12 
> code|https://github.com/apache/hive/blob/branch-0.12/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L7778]).
>  So the query was changed as follows to get past the above error:
> {code}
> SELECT * FROM testTbl WHERE date_prt IN (CAST('2014-08-09' AS DATE), 
> CAST('2014-08-08' AS DATE)); 
> {code}
> Now the query goes past the error in SemanticAnalyzer, but hits the same issue 
> (default 'string' type for partition columns) in the partition pruning 
> optimization. (Related code 
> [here|https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartExprEvalUtils.java#L110]).
> {code}
> 14/09/14 20:07:20 ERROR ql.Driver: FAILED: SemanticException 
> MetaException(message:The arguments for IN should be the same type! Types 
> are: {string IN (date, date)})
> {code}
> We need to change the partition pruning code to treat the partition column as 
> the type given in the table definition.
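The intended behavior amounts to coercing the IN-list string literals to the column's declared type before comparison, so both sides of the IN share one type. A minimal hypothetical sketch using java.sql.Date (not the actual planner code):

```java
import java.sql.Date;

public class PartTypeDemo {
    // Coerce a string literal to the partition column's declared type
    // (DATE here) so the IN comparison is date-vs-date, not string-vs-date.
    static Date coerceToDate(String literal) {
        return Date.valueOf(literal);  // parses "yyyy-MM-dd"
    }

    public static void main(String[] args) {
        Date d = coerceToDate("2014-08-09");
        // Date-to-date comparison succeeds where string-vs-date failed.
        System.out.println(d.equals(Date.valueOf("2014-08-09")));
    }
}
```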





[jira] [Created] (HIVE-8784) Querying partition does not work with JDO enabled against PostgreSQL

2014-11-07 Thread Chaoyu Tang (JIRA)
Chaoyu Tang created HIVE-8784:
-

 Summary: Querying partition does not work with JDO enabled against 
PostgreSQL
 Key: HIVE-8784
 URL: https://issues.apache.org/jira/browse/HIVE-8784
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.15.0
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
 Fix For: 0.15.0


Querying a partition in PostgreSQL fails when using JDO (with 
hive.metastore.try.direct.sql=false). Following is a reproducing example:
{code}
create table partition_test_multilevel (key string, value string) partitioned 
by (level1 string, level2 string, level3 string);

insert overwrite table partition_test_multilevel partition(level1='', 
level2='111', level3='11') select key, value from srcpart tablesample (11 rows);
insert overwrite table partition_test_multilevel partition(level1='', 
level2='222', level3='11') select key, value from srcpart tablesample (15 rows);
insert overwrite table partition_test_multilevel partition(level1='', 
level2='333', level3='11') select key, value from srcpart tablesample (20 rows);

select level1, level2, level3, count(*) from partition_test_multilevel where 
level2 <= '222' group by level1, level2, level3;
{code}
The query fails with following error:
{code}
  Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: 
MetaException(message:Invocation of method "substring" on "StringExpression" 
requires argument 1 of type "NumericExpression")
at 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getPartitionsFromServer(PartitionPruner.java:392)
at 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:215)
at 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:139)
at 
org.apache.hadoop.hive.ql.parse.ParseContext.getPrunedPartitions(ParseContext.java:619)
at 
org.apache.hadoop.hive.ql.optimizer.pcr.PcrOpProcFactory$FilterPCR.process(PcrOpProcFactory.java:110)
... 21 more
{code}

It is because the JDO pushdown filter generated for a query with an 
inequality/between partition predicate uses the DataNucleus (DN) indexOf 
function, which does not work properly with PostgreSQL (see 
http://www.datanucleus.org/servlet/jira/browse/NUCRDBMS-840).
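For illustration, a Hive partition name encodes values as key=value pairs separated by '/'. Extracting the value for the i-th key directly, instead of composing JDO substring/indexOf expressions over the name, is the kind of simpler alternative described here. A minimal hypothetical sketch (`PartNameDemo` is not Hive code):

```java
public class PartNameDemo {
    // Split "k1=v1/k2=v2/..." and return the value of the i-th partition key.
    static String valueAt(String partName, int i) {
        String pair = partName.split("/")[i];
        return pair.substring(pair.indexOf('=') + 1);
    }

    public static void main(String[] args) {
        // Second-level value of a three-level partition name.
        System.out.println(valueAt("level1=a/level2=222/level3=11", 1));
    }
}
```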





[jira] [Updated] (HIVE-8784) Querying partition does not work with JDO enabled against PostgreSQL

2014-11-07 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-8784:
--
Attachment: HIVE-8784.patch

Hive (see generateJDOFilterOverPartitions in 
org.apache.hadoop.hive.metastore.parser.ExpressionTree) currently uses some 
DataNucleus string functions (substring, indexOf) to get the value for a 
partition key. There is a more straightforward way (implemented in this patch) 
to get it, which also avoids the DN indexOf issue with PostgreSQL.

The test cases provided in this patch have been tested against various 
databases, including Derby, MySQL, and PostgreSQL.

> Querying partition does not work with JDO enabled against PostgreSQL
> 
>
> Key: HIVE-8784
> URL: https://issues.apache.org/jira/browse/HIVE-8784
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.15.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 0.15.0
>
> Attachments: HIVE-8784.patch
>
>
> Querying a partition in PostgreSQL fails when using JDO (with 
> hive.metastore.try.direct.sql=false). Following is a reproducing example:
> {code}
> create table partition_test_multilevel (key string, value string) partitioned 
> by (level1 string, level2 string, level3 string);
> insert overwrite table partition_test_multilevel partition(level1='', 
> level2='111', level3='11') select key, value from srcpart tablesample (11 
> rows);
> insert overwrite table partition_test_multilevel partition(level1='', 
> level2='222', level3='11') select key, value from srcpart tablesample (15 
> rows);
> insert overwrite table partition_test_multilevel partition(level1='', 
> level2='333', level3='11') select key, value from srcpart tablesample (20 
> rows);
> select level1, level2, level3, count(*) from partition_test_multilevel where 
> level2 <= '222' group by level1, level2, level3;
> {code}
> The query fails with following error:
> {code}
>   Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: 
> MetaException(message:Invocation of method "substring" on "StringExpression" 
> requires argument 1 of type "NumericExpression")
>   at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getPartitionsFromServer(PartitionPruner.java:392)
>   at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:215)
>   at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:139)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseContext.getPrunedPartitions(ParseContext.java:619)
>   at 
> org.apache.hadoop.hive.ql.optimizer.pcr.PcrOpProcFactory$FilterPCR.process(PcrOpProcFactory.java:110)
>   ... 21 more
> {code}
> It is because the JDO pushdown filter generated for a query with an 
> inequality/between partition predicate uses the DataNucleus indexOf function, 
> which does not work properly with PostgreSQL (see 
> http://www.datanucleus.org/servlet/jira/browse/NUCRDBMS-840)





[jira] [Updated] (HIVE-8556) introduce overflow control and sanity check to BytesBytesMapJoin

2014-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-8556:
---
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

committed to trunk

> introduce overflow control and sanity check to BytesBytesMapJoin
> 
>
> Key: HIVE-8556
> URL: https://issues.apache.org/jira/browse/HIVE-8556
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Minor
> Fix For: 0.15.0
>
> Attachments: HIVE-8556.patch
>
>
> When stats are incorrect, a negative or very large number can be passed to 
> the map.
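The kind of guard this describes can be sketched as clamping a stats-derived size estimate before it is used for allocation. A hypothetical illustration (`SizeCheckDemo` is not the patch itself):

```java
public class SizeCheckDemo {
    // Fall back to a safe cap when the estimate derived from statistics is
    // non-positive or implausibly large.
    static int sanitizeSize(long estimate, int cap) {
        if (estimate <= 0 || estimate > cap) {
            return cap;  // bad stats: use the cap instead of the estimate
        }
        return (int) estimate;
    }

    public static void main(String[] args) {
        System.out.println(sanitizeSize(-5L, 1 << 20));   // negative -> cap
        System.out.println(sanitizeSize(1024L, 1 << 20)); // sane value kept
    }
}
```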





[jira] [Updated] (HIVE-8784) Querying partition does not work with JDO enabled against PostgreSQL

2014-11-07 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-8784:
--
Status: Patch Available  (was: Open)

> Querying partition does not work with JDO enabled against PostgreSQL
> 
>
> Key: HIVE-8784
> URL: https://issues.apache.org/jira/browse/HIVE-8784
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.15.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 0.15.0
>
> Attachments: HIVE-8784.patch
>
>
> Querying a partition in PostgreSQL fails when using JDO (with 
> hive.metastore.try.direct.sql=false). Following is a reproducing example:
> {code}
> create table partition_test_multilevel (key string, value string) partitioned 
> by (level1 string, level2 string, level3 string);
> insert overwrite table partition_test_multilevel partition(level1='', 
> level2='111', level3='11') select key, value from srcpart tablesample (11 
> rows);
> insert overwrite table partition_test_multilevel partition(level1='', 
> level2='222', level3='11') select key, value from srcpart tablesample (15 
> rows);
> insert overwrite table partition_test_multilevel partition(level1='', 
> level2='333', level3='11') select key, value from srcpart tablesample (20 
> rows);
> select level1, level2, level3, count(*) from partition_test_multilevel where 
> level2 <= '222' group by level1, level2, level3;
> {code}
> The query fails with following error:
> {code}
>   Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: 
> MetaException(message:Invocation of method "substring" on "StringExpression" 
> requires argument 1 of type "NumericExpression")
>   at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getPartitionsFromServer(PartitionPruner.java:392)
>   at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:215)
>   at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:139)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseContext.getPrunedPartitions(ParseContext.java:619)
>   at 
> org.apache.hadoop.hive.ql.optimizer.pcr.PcrOpProcFactory$FilterPCR.process(PcrOpProcFactory.java:110)
>   ... 21 more
> {code}
> It is because the JDO pushdown filter generated for a query with an 
> inequality/between partition predicate uses the DataNucleus indexOf function, 
> which does not work properly with PostgreSQL (see 
> http://www.datanucleus.org/servlet/jira/browse/NUCRDBMS-840)





[jira] [Commented] (HIVE-6915) Hive Hbase queries fail on secure Tez cluster

2014-11-07 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202449#comment-14202449
 ] 

Szehon Ho commented on HIVE-6915:
-

Thanks a lot Jimmy, really appreciate someone taking a look.

> Hive Hbase queries fail on secure Tez cluster
> -
>
> Key: HIVE-6915
> URL: https://issues.apache.org/jira/browse/HIVE-6915
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 0.13.0
> Environment: Kerberos secure Tez cluster
>Reporter: Deepesh Khandelwal
>Assignee: Siddharth Seth
> Fix For: 0.14.0
>
> Attachments: HIVE-6915.03.patch, HIVE-6915.1.patch, HIVE-6915.2.patch
>
>
> Hive queries reading and writing to HBase are currently failing with the 
> following exception in a secure Tez cluster:
> {noformat}
> 2014-04-14 13:47:05,644 FATAL [InputInitializer [Map 1] #0] 
> org.apache.hadoop.ipc.RpcClient: SASL authentication failed. The most likely 
> cause is missing or invalid credentials. Consider 'kinit'.
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>   at 
> org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:152)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:792)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$800(RpcClient.java:349)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:918)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:915)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:915)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.writeRequest(RpcClient.java:1065)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.tracedWriteRequest(RpcClient.java:1032)
>   at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1474)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1684)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1737)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.execService(ClientProtos.java:29288)
>   at 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.execService(ProtobufUtil.java:1562)
>   at 
> org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:87)
>   at 
> org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:84)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:121)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:97)
>   at 
> org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.callExecService(RegionCoprocessorRpcChannel.java:90)
>   at 
> org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:67)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$AuthenticationService$BlockingStub.getAuthenticationToken(AuthenticationProtos.java:4512)
>   at 
> org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:60)
>   at 
> org.apache.hadoop.hbase.security.token.TokenUtil$3.run(TokenUtil.java:174)
>   at 
> org.apache.hadoop.hbase.security.token.TokenUtil$3.run(TokenUtil.java:172)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>   at 
> org.apache.hadoop.hbase.security.token.TokenUtil.obtainTokenForJob(TokenUtil.java:171)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at org.apache.hadoop.hbase.util.Methods.call(Methods.java:39)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.obtainAuthTokenForJob(User.java:334)
>   at 
> org.apache.hadoop.hbase.map

[jira] [Updated] (HIVE-8782) HBase handler doesn't compile with hadoop-1

2014-11-07 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-8782:

Priority: Blocker  (was: Major)

Should be a blocker for 0.14, as it cannot run on hadoop-1.

> HBase handler doesn't compile with hadoop-1
> ---
>
> Key: HIVE-8782
> URL: https://issues.apache.org/jira/browse/HIVE-8782
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Priority: Blocker
>
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
> on project hive-hbase-handler: Compilation failure
> [ERROR] 
> /home/jxiang/git-repos/apache/tmp/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java:[482,31]
>  cannot find symbol
> [ERROR] symbol:   method mergeAll(org.apache.hadoop.security.Credentials)
> [ERROR] location: class org.apache.hadoop.security.Credentials
> [ERROR] -> [Help 1]
> [ERROR] 
> {noformat}





Review Request 27737: HIVE-8784:Querying partition does not work with JDO enabled against PostgreSQL

2014-11-07 Thread Chaoyu Tang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27737/
---

Review request for hive.


Repository: hive-git


Description
---

Querying a partition in PostgreSQL fails when using JDO (with 
hive.metastore.try.direct.sql=false). This is because the JDO pushdown filter 
generated for a query with an inequality/between partition predicate uses the 
DataNucleus (DN) indexOf function, which does not work properly with 
PostgreSQL. Hive (see generateJDOFilterOverPartitions in 
org.apache.hadoop.hive.metastore.parser.ExpressionTree) uses some DN string 
functions (substring, indexOf) to get the value for a partition key. There is 
a more straightforward way (implemented in this patch) to get it, which also 
avoids the DN indexOf issue with PostgreSQL.


Diffs
-

  
metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java 
b8d1afc57642d9b07cb6b3b48c4ac9bcf6c76704 
  ql/src/test/queries/clientpositive/partition_multilevels.q PRE-CREATION 
  ql/src/test/results/clientpositive/partition_multilevels.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/27737/diff/


Testing
---

The test cases provided in this patch have been tested against various 
databases, including Derby, MySQL, and PostgreSQL.


Thanks,

Chaoyu Tang



[jira] [Commented] (HIVE-8784) Querying partition does not work with JDO enabled against PostgreSQL

2014-11-07 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202466#comment-14202466
 ] 

Chaoyu Tang commented on HIVE-8784:
---

Patch is posted on RB for review; see 
https://reviews.apache.org/r/27737/ . Thanks.

> Querying partition does not work with JDO enabled against PostgreSQL
> 
>
> Key: HIVE-8784
> URL: https://issues.apache.org/jira/browse/HIVE-8784
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.15.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 0.15.0
>
> Attachments: HIVE-8784.patch
>
>
> Querying a partition in PostgreSQL fails when using JDO (with 
> hive.metastore.try.direct.sql=false). Following is a reproducing example:
> {code}
> create table partition_test_multilevel (key string, value string) partitioned 
> by (level1 string, level2 string, level3 string);
> insert overwrite table partition_test_multilevel partition(level1='', 
> level2='111', level3='11') select key, value from srcpart tablesample (11 
> rows);
> insert overwrite table partition_test_multilevel partition(level1='', 
> level2='222', level3='11') select key, value from srcpart tablesample (15 
> rows);
> insert overwrite table partition_test_multilevel partition(level1='', 
> level2='333', level3='11') select key, value from srcpart tablesample (20 
> rows);
> select level1, level2, level3, count(*) from partition_test_multilevel where 
> level2 <= '222' group by level1, level2, level3;
> {code}
> The query fails with following error:
> {code}
>   Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: 
> MetaException(message:Invocation of method "substring" on "StringExpression" 
> requires argument 1 of type "NumericExpression")
>   at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getPartitionsFromServer(PartitionPruner.java:392)
>   at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:215)
>   at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:139)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseContext.getPrunedPartitions(ParseContext.java:619)
>   at 
> org.apache.hadoop.hive.ql.optimizer.pcr.PcrOpProcFactory$FilterPCR.process(PcrOpProcFactory.java:110)
>   ... 21 more
> {code}
> It is because the JDO pushdown filter generated for a query with an 
> inequality/between partition predicate uses the DataNucleus indexOf function, 
> which does not work properly with PostgreSQL (see 
> http://www.datanucleus.org/servlet/jira/browse/NUCRDBMS-840)





[jira] [Updated] (HIVE-8762) HiveMetaStore.BooleanPointer should be replaced with an AtomicBoolean

2014-11-07 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8762:
-
Status: Patch Available  (was: Open)

> HiveMetaStore.BooleanPointer should be replaced with an AtomicBoolean
> -
>
> Key: HIVE-8762
> URL: https://issues.apache.org/jira/browse/HIVE-8762
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-8762.patch
>
>
> AtomicBoolean will serve the same purpose, with the added bonus that it will 
> perform correctly if two threads try to write to it simultaneously.



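The thread-safety point in the description above can be sketched in a few lines of plain Java. This is an illustrative stand-in, not the Hive patch itself: {{StopFlagDemo}} and {{runAndStop}} are invented names, and the spinning worker is a toy model of the metastore threads that shared the old {{BooleanPointer}}.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class StopFlagDemo {
    // Signals a worker to stop through an AtomicBoolean. The old
    // HiveMetaStore.BooleanPointer pattern (a plain mutable boolean field)
    // gives no atomicity or visibility guarantee across threads.
    static boolean runAndStop() {
        AtomicBoolean stop = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            while (!stop.get()) { /* spin until the flag flips */ }
        });
        worker.start();
        stop.set(true);              // atomic write, guaranteed visible to the worker
        try {
            worker.join(2000);
        } catch (InterruptedException e) {
            return false;
        }
        return !worker.isAlive();    // true once the worker observed the flag
    }

    public static void main(String[] args) {
        System.out.println(runAndStop());
    }
}
```

With {{AtomicBoolean}}, the reader thread is guaranteed to observe the writer's update; a bare boolean field offers no such promise under the Java memory model.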


[jira] [Updated] (HIVE-8762) HiveMetaStore.BooleanPointer should be replaced with an AtomicBoolean

2014-11-07 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8762:
-
Attachment: HIVE-8762.patch

> HiveMetaStore.BooleanPointer should be replaced with an AtomicBoolean
> -
>
> Key: HIVE-8762
> URL: https://issues.apache.org/jira/browse/HIVE-8762
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-8762.patch
>
>
> AtomicBoolean will serve the same purpose, with the added bonus that it will 
> perform correctly if two threads try to write to it simultaneously.





[jira] [Commented] (HIVE-8772) zookeeper info logs are always printed from beeline with service discovery mode

2014-11-07 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202480#comment-14202480
 ] 

Vaibhav Gumashta commented on HIVE-8772:


+1

> zookeeper info logs are always printed from beeline with service discovery 
> mode
> ---
>
> Key: HIVE-8772
> URL: https://issues.apache.org/jira/browse/HIVE-8772
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 0.14.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.14.0
>
> Attachments: HIVE-8772.1.patch
>
>
> Log messages like following are being printed by zookeeper for beeline 
> commands, and there is no way to suppress that using beeline commandline 
> options (--silent or --verbose).
> {noformat}
> 14/11/04 16:05:47 INFO zookeeper.ZooKeeper: Client 
> environment:java.vendor=Oracle Corporation
> 14/11/04 16:05:47 INFO zookeeper.ZooKeeper: Client 
> environment:java.home=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.71.x86_64/jre
> 14/11/04 16:05:47 INFO zookeeper.ZooKeeper: Client 
> environment:java.class.path=/usr/hdp/2.2.0.0-1756/hadoop/conf:/usr/hdp/2.2.0.0-1756/hadoop/lib/ranger-plugins-cred-0.4.0.2.2.0.0-1756.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/jetty-util-6.1.26.hwx
> .jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/ranger-hdfs-plugin-0.4.0.2.2.0.0-1756.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/commons-math3-3.1.1.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/mysql-connector-java.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/mockito-all-1.8
> .5.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/curator-framework-2.6.0.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/guava-11.0.2.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/java-xmlbuilder-0.4.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/ranger-plugins-audit-0.4.0.2.2.0.0-
> 1756.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/curator-client-2.6.0.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/commons-httpclient-3.1.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/junit-4.11.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/jersey-core-1.9.jar:/usr/hdp/2.2.0.
> 0-1756/hadoop/lib/jersey-json-1.9.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/jersey-server-1.9.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/api-asn1-api-1.0.0-M20.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/asm-3.2.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/protobuf-jav
> a-2.5.0.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/eclipselink-2.5.2-M1.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/curator-recipes-2.6.0.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/jettison-1.1.jar:/us
> r/hdp/2.2.0.0-1756/hadoop/lib/commons-digester-1.8.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/htrace-core-3.0.4.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/commons-compress-1.4.1.jar:/usr/hdp/2.2.0.0-1756/hadoop/lib/jackson-jaxrs-1.9.13.jar:/usr/hdp/2.2.0.
> 0-1
> {noformat}



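Until a Beeline-side fix like the one attached lands, one possible client-side workaround is a log4j override for ZooKeeper's logger. This is a hedged sketch, not the committed patch; whether Beeline picks such a file up depends on how the installation places log4j.properties on the client classpath.

```properties
# Sketch of a client-side workaround (assumption: a log4j.properties on the
# Beeline classpath is honored). Raises the ZooKeeper client logger above INFO
# while leaving all other loggers untouched.
log4j.logger.org.apache.zookeeper=WARN
```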


[jira] [Commented] (HIVE-8622) Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]

2014-11-07 Thread Suhas Satish (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202496#comment-14202496
 ] 

Suhas Satish commented on HIVE-8622:


[~csun] - We already have a map from the BaseWork containing the map-join to its 
parent ReduceSinks. 
This exists as {{linkWorkWithReduceSinkMap}} in {{GenSparkProcContext}}.

Do you think we can leverage that in some way, or replace the RSs in that Map 
with the HashTableSinks that we introduced? It looks like we should still 
propagate the whole GenSparkProcContext to the {{SparkMapJoinResolver}} through 
the SparkCompiler.generateTaskTree(...) and {{SparkCompiler.optimizeTaskPlan}}  

All the state information stored there will make life a lot easier. 

> Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
> 
>
> Key: HIVE-8622
> URL: https://issues.apache.org/jira/browse/HIVE-8622
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Suhas Satish
>Assignee: Chao
> Attachments: HIVE-8622.2-spark.patch, HIVE-8622.3-spark.patch, 
> HIVE-8622.patch
>
>
> This is a sub-task of map-join for spark 
> https://issues.apache.org/jira/browse/HIVE-7613
> This can use the baseline patch for map-join
> https://issues.apache.org/jira/browse/HIVE-8616





[jira] [Commented] (HIVE-8622) Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]

2014-11-07 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202515#comment-14202515
 ] 

Chao commented on HIVE-8622:


[~suhassatish] Yes, not having to recompute the map would be nice. But I think 
computing it in {{SparkMapJoinResolver}} wouldn't cost much, given that we are 
dealing with BaseWorks in that class, and usually there won't be many of 
them.

> Split map-join plan into 2 SparkTasks in 3 stages [Spark Branch]
> 
>
> Key: HIVE-8622
> URL: https://issues.apache.org/jira/browse/HIVE-8622
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Suhas Satish
>Assignee: Chao
> Attachments: HIVE-8622.2-spark.patch, HIVE-8622.3-spark.patch, 
> HIVE-8622.patch
>
>
> This is a sub-task of map-join for spark 
> https://issues.apache.org/jira/browse/HIVE-7613
> This can use the baseline patch for map-join
> https://issues.apache.org/jira/browse/HIVE-8616





[jira] [Updated] (HIVE-8775) Merge from trunk 11/6/14 [SPARK BRANCH]

2014-11-07 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8775:
---
Attachment: HIVE-8775.2-spark.patch

I fixed the tests locally and committed the change. Just want to run tests one 
more time so I am using a dummy patch.

> Merge from trunk 11/6/14 [SPARK BRANCH]
> ---
>
> Key: HIVE-8775
> URL: https://issues.apache.org/jira/browse/HIVE-8775
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-8775.1-spark.patch, HIVE-8775.2-spark.patch
>
>






[jira] [Updated] (HIVE-8760) Pass a copy of HiveConf to hooks

2014-11-07 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-8760:
-
Assignee: Gunther Hagleitner  (was: Ashutosh Chauhan)

> Pass a copy of HiveConf to hooks
> 
>
> Key: HIVE-8760
> URL: https://issues.apache.org/jira/browse/HIVE-8760
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Ashutosh Chauhan
>Assignee: Gunther Hagleitner
> Attachments: HIVE-8760.patch
>
>
> because hadoop's {{Configuration}} is not thread-safe





[jira] [Updated] (HIVE-8334) Delete statement supports multi-table syntax

2014-11-07 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8334:
-
Assignee: (was: Alan Gates)

> Delete statement supports multi-table syntax
> 
>
> Key: HIVE-8334
> URL: https://issues.apache.org/jira/browse/HIVE-8334
> Project: Hive
>  Issue Type: Improvement
>  Components: SQL
>Reporter: John Scheibmeir
>Priority: Minor
>
> Desire to support syntax such as:
> delete t1
> from my_database.my_table1 t1
> ,my_database.my_table2 t2
> where t1.col1 = t2.col1
> and t1.col2 between t2.col2 and t2.col3





[jira] [Updated] (HIVE-8333) Update statement supports multi-table synatx

2014-11-07 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8333:
-
Assignee: (was: Alan Gates)

> Update statement supports multi-table synatx
> 
>
> Key: HIVE-8333
> URL: https://issues.apache.org/jira/browse/HIVE-8333
> Project: Hive
>  Issue Type: Improvement
>  Components: SQL
>Reporter: John Scheibmeir
>Priority: Minor
>
> Desire to support syntax such as:
> update t1 
> from my_database.my_table1 t1
> ,my_database.my_table2 t2
> set t1_col1 = t2.t2_col1
> where t1.t1_col2 = t2.t2_col2
> and t1.t1_col3 < t2.t2_col3





[jira] [Updated] (HIVE-8756) numRows and rawDataSize are not collected by the Spark stats [Spark Branch]

2014-11-07 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Yang updated HIVE-8756:
--
Attachment: HIVE-8756.2-spark.patch

> numRows and rawDataSize are not collected by the Spark stats [Spark Branch]
> ---
>
> Key: HIVE-8756
> URL: https://issues.apache.org/jira/browse/HIVE-8756
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Na Yang
>Assignee: Na Yang
> Attachments: HIVE-8756.1-spark.patch, HIVE-8756.2-spark.patch
>
>
> Run the following hive queries
> {noformat}
> set datanucleus.cache.collections=false;
> set hive.stats.autogather=true;
> set hive.merge.mapfiles=false;
> set hive.merge.mapredfiles=false;
> set hive.map.aggr=true;
> create table tmptable(key string, value string);
> INSERT OVERWRITE TABLE tmptable
> SELECT unionsrc.key, unionsrc.value 
> FROM (SELECT 'tst1' AS key, cast(count(1) AS string) AS value FROM src s1
>   UNION  ALL  
>   SELECT s2.key AS key, s2.value AS value FROM src1 s2) unionsrc;
> DESCRIBE FORMATTED tmptable;
> {noformat}
> The hive on spark prints the following table parameters:
> {noformat}
> COLUMN_STATS_ACCURATE true
>   numFiles2   
>   numRows 0   
>   rawDataSize 0   
>   totalSize   225
> {noformat}
> The hive on mr prints the following table parameters:
> {noformat}
> Table Parameters:   
>   COLUMN_STATS_ACCURATE   true
>   numFiles2   
>   numRows 26  
>   rawDataSize 199 
>   totalSize   225 
> {noformat}
> As shown above, numRows and rawDataSize are not collected by the Hive on 
> Spark stats.





[jira] [Commented] (HIVE-8784) Querying partition does not work with JDO enabled against PostgreSQL

2014-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202683#comment-14202683
 ] 

Hive QA commented on HIVE-8784:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12680221/HIVE-8784.patch

{color:green}SUCCESS:{color} +1 6670 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1690/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1690/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1690/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12680221 - PreCommit-HIVE-TRUNK-Build

> Querying partition does not work with JDO enabled against PostgreSQL
> 
>
> Key: HIVE-8784
> URL: https://issues.apache.org/jira/browse/HIVE-8784
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.15.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 0.15.0
>
> Attachments: HIVE-8784.patch
>
>
> Querying a partition in PostgreSQL fails when using JDO (with 
> hive.metastore.try.direct.sql=false) . Following is the reproduce example:
> {code}
> create table partition_test_multilevel (key string, value string) partitioned 
> by (level1 string, level2 string, level3 string);
> insert overwrite table partition_test_multilevel partition(level1='', 
> level2='111', level3='11') select key, value from srcpart tablesample (11 
> rows);
> insert overwrite table partition_test_multilevel partition(level1='', 
> level2='222', level3='11') select key, value from srcpart tablesample (15 
> rows);
> insert overwrite table partition_test_multilevel partition(level1='', 
> level2='333', level3='11') select key, value from srcpart tablesample (20 
> rows);
> select level1, level2, level3, count(*) from partition_test_multilevel where 
> level2 <= '222' group by level1, level2, level3;
> {code}
> The query fails with following error:
> {code}
>   Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: 
> MetaException(message:Invocation of method "substring" on "StringExpression" 
> requires argument 1 of type "NumericExpression")
>   at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.getPartitionsFromServer(PartitionPruner.java:392)
>   at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:215)
>   at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:139)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseContext.getPrunedPartitions(ParseContext.java:619)
>   at 
> org.apache.hadoop.hive.ql.optimizer.pcr.PcrOpProcFactory$FilterPCR.process(PcrOpProcFactory.java:110)
>   ... 21 more
> {code}
> It is because the JDO pushdown filter generated for a query having 
> inequality/between partition predicate uses DN indexOf function which is not 
> working properly with postgresql (see 
> http://www.datanucleus.org/servlet/jira/browse/NUCRDBMS-840) 





[jira] [Updated] (HIVE-8756) numRows and rawDataSize are not collected by the Spark stats [Spark Branch]

2014-11-07 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Yang updated HIVE-8756:
--
Attachment: (was: HIVE-8756.2-spark.patch)

> numRows and rawDataSize are not collected by the Spark stats [Spark Branch]
> ---
>
> Key: HIVE-8756
> URL: https://issues.apache.org/jira/browse/HIVE-8756
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Na Yang
>Assignee: Na Yang
> Attachments: HIVE-8756.1-spark.patch, HIVE-8756.2-spark.patch
>
>
> Run the following hive queries
> {noformat}
> set datanucleus.cache.collections=false;
> set hive.stats.autogather=true;
> set hive.merge.mapfiles=false;
> set hive.merge.mapredfiles=false;
> set hive.map.aggr=true;
> create table tmptable(key string, value string);
> INSERT OVERWRITE TABLE tmptable
> SELECT unionsrc.key, unionsrc.value 
> FROM (SELECT 'tst1' AS key, cast(count(1) AS string) AS value FROM src s1
>   UNION  ALL  
>   SELECT s2.key AS key, s2.value AS value FROM src1 s2) unionsrc;
> DESCRIBE FORMATTED tmptable;
> {noformat}
> The hive on spark prints the following table parameters:
> {noformat}
> COLUMN_STATS_ACCURATE true
>   numFiles2   
>   numRows 0   
>   rawDataSize 0   
>   totalSize   225
> {noformat}
> The hive on mr prints the following table parameters:
> {noformat}
> Table Parameters:   
>   COLUMN_STATS_ACCURATE   true
>   numFiles2   
>   numRows 26  
>   rawDataSize 199 
>   totalSize   225 
> {noformat}
> As shown above, numRows and rawDataSize are not collected by the Hive on 
> Spark stats.





[jira] [Updated] (HIVE-8756) numRows and rawDataSize are not collected by the Spark stats [Spark Branch]

2014-11-07 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Yang updated HIVE-8756:
--
Attachment: HIVE-8756.2-spark.patch

> numRows and rawDataSize are not collected by the Spark stats [Spark Branch]
> ---
>
> Key: HIVE-8756
> URL: https://issues.apache.org/jira/browse/HIVE-8756
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Na Yang
>Assignee: Na Yang
> Attachments: HIVE-8756.1-spark.patch, HIVE-8756.2-spark.patch
>
>
> Run the following hive queries
> {noformat}
> set datanucleus.cache.collections=false;
> set hive.stats.autogather=true;
> set hive.merge.mapfiles=false;
> set hive.merge.mapredfiles=false;
> set hive.map.aggr=true;
> create table tmptable(key string, value string);
> INSERT OVERWRITE TABLE tmptable
> SELECT unionsrc.key, unionsrc.value 
> FROM (SELECT 'tst1' AS key, cast(count(1) AS string) AS value FROM src s1
>   UNION  ALL  
>   SELECT s2.key AS key, s2.value AS value FROM src1 s2) unionsrc;
> DESCRIBE FORMATTED tmptable;
> {noformat}
> The hive on spark prints the following table parameters:
> {noformat}
> COLUMN_STATS_ACCURATE true
>   numFiles2   
>   numRows 0   
>   rawDataSize 0   
>   totalSize   225
> {noformat}
> The hive on mr prints the following table parameters:
> {noformat}
> Table Parameters:   
>   COLUMN_STATS_ACCURATE   true
>   numFiles2   
>   numRows 26  
>   rawDataSize 199 
>   totalSize   225 
> {noformat}
> As shown above, numRows and rawDataSize are not collected by the Hive on 
> Spark stats.





[jira] [Updated] (HIVE-8760) Pass a copy of HiveConf to hooks

2014-11-07 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-8760:
-
Attachment: HIVE-8760.2.patch

I'd like to propose a different change with smaller scope, so that it can at 
least make hive .14. I think that'd be safer for a few reasons:

- It only changes the ATSHook; other hooks will not see any change.
- It's faster if you run w/o the ATSHook, because the clone only happens there. 
Cloning is fairly expensive.
- It's a workaround for a hadoop issue. We should still try to get the hadoop 
problem fixed and then take the clone back out for perf reasons. That again 
limits the changes to the ATSHook only.

Note: ExecHooks and hadoop conf have been around unchanged for quite a while so 
leaving the code path alone as much as possible for .14 seems a safe choice.

> Pass a copy of HiveConf to hooks
> 
>
> Key: HIVE-8760
> URL: https://issues.apache.org/jira/browse/HIVE-8760
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Ashutosh Chauhan
>Assignee: Gunther Hagleitner
> Attachments: HIVE-8760.2.patch, HIVE-8760.patch
>
>
> because hadoop's {{Configuration}} is not thread-safe





Re: Review Request 27719: numRows and rawDataSize are not collected by the Spark stats [Spark Branch]

2014-11-07 Thread Na Yang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27719/
---

(Updated Nov. 7, 2014, 9:16 p.m.)


Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.


Changes
---

1. removed whitespace characters
2. handle operators which have multiple children
3. update stats config info for all cloned FileSinkOperators


Bugs: Hive-8756
https://issues.apache.org/jira/browse/Hive-8756


Repository: hive-git


Description
---

numRows and rawDataSize are not collected by the Spark stats. The cause is that 
the stats config is never set on the FileSinkOperator in the ReduceWork. In 
GenSparkUtils.removeUnionOperators, the operator tree gets cloned and a new 
FileSinkOperator is generated and set on the reduce work. However, during 
processFileSink, the collectStats tag is set on the original FileSinkOperator 
in GenMapRedUtils.addStatsTask, not on the new FileSinkOperator which is used 
in the ReduceWork.


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 79a0132 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
8290568 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java e8e18a7 
  ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 8d237c5 
  ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 4946815 
  ql/src/test/results/clientpositive/spark/semijoin.q.out 9b6802d 
  ql/src/test/results/clientpositive/spark/stats1.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/27719/diff/


Testing
---


Thanks,

Na Yang



[jira] [Updated] (HIVE-8395) CBO: enable by default

2014-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-8395:
---
Attachment: HIVE-8395.23.patch

temporarily include HIVE-8768

> CBO: enable by default
> --
>
> Key: HIVE-8395
> URL: https://issues.apache.org/jira/browse/HIVE-8395
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 0.15.0
>
> Attachments: HIVE-8395.01.patch, HIVE-8395.02.patch, 
> HIVE-8395.03.patch, HIVE-8395.04.patch, HIVE-8395.05.patch, 
> HIVE-8395.06.patch, HIVE-8395.07.patch, HIVE-8395.08.patch, 
> HIVE-8395.09.patch, HIVE-8395.10.patch, HIVE-8395.11.patch, 
> HIVE-8395.12.patch, HIVE-8395.12.patch, HIVE-8395.13.patch, 
> HIVE-8395.13.patch, HIVE-8395.14.patch, HIVE-8395.15.patch, 
> HIVE-8395.16.patch, HIVE-8395.17.patch, HIVE-8395.18.patch, 
> HIVE-8395.18.patch, HIVE-8395.19.patch, HIVE-8395.20.patch, 
> HIVE-8395.21.patch, HIVE-8395.22.patch, HIVE-8395.23.patch, HIVE-8395.patch
>
>






[jira] [Commented] (HIVE-8760) Pass a copy of HiveConf to hooks

2014-11-07 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202720#comment-14202720
 ] 

Gopal V commented on HIVE-8760:
---

bq. It's faster if you run w/o ATSHook, because the clone only happens there. 
Cloning is fairly expensive.

+1 - this is a more contained fix for 0.14. This does not change any existing 
behaviour or API contracts.

This will not break outside of ATSHook, which we can disable in case it shows 
issues.

Always copying would be slow, and it runs the risk of tightening the API 
contract too late for 0.14. Badly written hooks might fail if they use the 
common conf to pass variables between different hooks.

For a cleaner solution long-term (0.15+), I think we need a {{new 
HiveConf(conf).asReadonly()}}, to enforce the API contract for such badly 
behaving hooks.

> Pass a copy of HiveConf to hooks
> 
>
> Key: HIVE-8760
> URL: https://issues.apache.org/jira/browse/HIVE-8760
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Ashutosh Chauhan
>Assignee: Gunther Hagleitner
> Attachments: HIVE-8760.2.patch, HIVE-8760.patch
>
>
> because hadoop's {{Configuration}} is not thread-safe



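The defensive-copy idea discussed above can be modeled with stdlib types. This is a toy sketch, not Hive's code: {{HookConfCopyDemo}}, {{runHook}}, and the use of {{java.util.Properties}} in place of {{HiveConf}} are all assumptions for illustration; the actual patch constructs a {{new HiveConf(conf)}} inside the ATSHook.

```java
import java.util.Properties;
import java.util.function.Consumer;

public class HookConfCopyDemo {
    // Toy model of the HIVE-8760 idea: each hook receives its own copy of the
    // configuration, so a hook that mutates its conf argument cannot corrupt
    // the session's shared conf. Properties stands in for HiveConf here.
    static void runHook(Consumer<Properties> hook, Properties sharedConf) {
        Properties copy = new Properties();
        copy.putAll(sharedConf);     // the defensive copy
        hook.accept(copy);
    }

    // Returns true if the shared conf survives a hook that rewrites its copy.
    static boolean sharedConfSurvives() {
        Properties conf = new Properties();
        conf.setProperty("hive.exec.mode", "a");
        runHook(c -> c.setProperty("hive.exec.mode", "b"), conf);
        return "a".equals(conf.getProperty("hive.exec.mode"));
    }

    public static void main(String[] args) {
        System.out.println(sharedConfSurvives());
    }
}
```

The trade-off raised in the comments applies equally here: the copy isolates badly behaved hooks, but it costs an allocation per hook invocation, which is why the patch limits it to the ATSHook.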


[jira] [Updated] (HIVE-8621) Dump small table join data for map-join [Spark Branch]

2014-11-07 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-8621:
--
Attachment: HIVE-8621.2-spark.patch

> Dump small table join data for map-join [Spark Branch]
> --
>
> Key: HIVE-8621
> URL: https://issues.apache.org/jira/browse/HIVE-8621
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Suhas Satish
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-8621.1-spark.patch, HIVE-8621.2-spark.patch
>
>
> This jira aims to re-use a slightly modified approach of map-reduce 
> distributed cache in spark to dump map-joined small tables as hash tables 
> onto spark DFS cluster. 
> This is a sub-task of map-join for spark 
> https://issues.apache.org/jira/browse/HIVE-7613
> This can use the baseline patch for map-join
> https://issues.apache.org/jira/browse/HIVE-8616
> The original thought process was to use broadcast variable concept in spark, 
> for the small tables. 
> The number of broadcast variables that must be created is m x n where
> 'm' is  the number of small tables in the (m+1) way join and n is the number 
> of buckets of tables. If unbucketed, n=1
> But it was discovered that objects compressed with kryo serialization on 
> disk can occupy 20X or more memory when deserialized. For bucket join, 
> the spark Driver has to hold all the buckets (for bucketed tables) in-memory 
> (to provide for fault-tolerance against Executor failures) although the 
> executors only need individual buckets in their memory. So the broadcast 
> variable approach may not be the right approach. 





Review Request 27745: HIVE-8621 Dump small table join data for map-join [Spark Branch]

2014-11-07 Thread Jimmy Xiang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27745/
---

Review request for hive and Xuefu Zhang.


Bugs: HIVE-8621
https://issues.apache.org/jira/browse/HIVE-8621


Repository: hive-git


Description
---

In the case of Spark, HashTableSinkOperator should dump its files to the folder 
expected by HashTableLoader.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java f0e04e7 

Diff: https://reviews.apache.org/r/27745/diff/


Testing
---


Thanks,

Jimmy Xiang



[jira] [Commented] (HIVE-8621) Dump small table join data for map-join [Spark Branch]

2014-11-07 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202727#comment-14202727
 ] 

Jimmy Xiang commented on HIVE-8621:
---

Attached v2 which is on RB: https://reviews.apache.org/r/27745/

> Dump small table join data for map-join [Spark Branch]
> --
>
> Key: HIVE-8621
> URL: https://issues.apache.org/jira/browse/HIVE-8621
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Suhas Satish
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-8621.1-spark.patch, HIVE-8621.2-spark.patch
>
>
> This jira aims to re-use a slightly modified approach of map-reduce 
> distributed cache in spark to dump map-joined small tables as hash tables 
> onto spark DFS cluster. 
> This is a sub-task of map-join for spark 
> https://issues.apache.org/jira/browse/HIVE-7613
> This can use the baseline patch for map-join
> https://issues.apache.org/jira/browse/HIVE-8616
> The original thought process was to use broadcast variable concept in spark, 
> for the small tables. 
> The number of broadcast variables that must be created is m x n where
> 'm' is  the number of small tables in the (m+1) way join and n is the number 
> of buckets of tables. If unbucketed, n=1
> But it was discovered that objects compressed with kryo serialization on 
> disk can occupy 20X or more memory when deserialized. For bucket join, 
> the spark Driver has to hold all the buckets (for bucketed tables) in-memory 
> (to provide for fault-tolerance against Executor failures) although the 
> executors only need individual buckets in their memory. So the broadcast 
> variable approach may not be the right approach. 





Re: Review Request 27719: numRows and rawDataSize are not collected by the Spark stats [Spark Branch]

2014-11-07 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27719/#review60385
---



ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java


One thing I'm not clear on is why cloning the operator tree doesn't clone 
the missing stats flags. From Utilities.cloneOperatorTree(), it seems it should.


- Xuefu Zhang


On Nov. 7, 2014, 9:16 p.m., Na Yang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27719/
> ---
> 
> (Updated Nov. 7, 2014, 9:16 p.m.)
> 
> 
> Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: Hive-8756
> https://issues.apache.org/jira/browse/Hive-8756
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> numRows and rawDataSize are not collected by the Spark stats. The cause is 
> that the stats config is never set on the FileSinkOperator in the ReduceWork. 
> In GenSparkUtils.removeUnionOperators, the operator tree gets cloned and a 
> new FileSinkOperator is generated and set on the reduce work. However, during 
> processFileSink, the collectStats tag is set on the original FileSinkOperator 
> in GenMapRedUtils.addStatsTask, not on the new FileSinkOperator which is used 
> in the ReduceWork.  
> 
> 
> Diffs
> -
> 
>   itests/src/test/resources/testconfiguration.properties 79a0132 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
> 8290568 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
> e8e18a7 
>   ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 8d237c5 
>   ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 
> 4946815 
>   ql/src/test/results/clientpositive/spark/semijoin.q.out 9b6802d 
>   ql/src/test/results/clientpositive/spark/stats1.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/27719/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Na Yang
> 
>



[jira] [Commented] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join

2014-11-07 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202752#comment-14202752
 ] 

Gunther Hagleitner commented on HIVE-8745:
--

Not sure it's needed, because this is reverting, but: +1 for trunk and hive 0.14

> Joins on decimal keys return different results whether they are run as reduce 
> join or map join
> --
>
> Key: HIVE-8745
> URL: https://issues.apache.org/jira/browse/HIVE-8745
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Gunther Hagleitner
>Assignee: Jason Dere
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8745.1.patch, HIVE-8745.2.patch, HIVE-8745.3.patch, 
> join_test.q
>
>
> See attached .q file to reproduce. The difference seems to be whether 
> trailing 0s are considered the same value or not.



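A plausible mechanism for the trailing-zero discrepancy described above can be seen with {{java.math.BigDecimal}}: {{equals}} is scale-sensitive while {{compareTo}} is not, so a hash-keyed lookup (map join) and a sort-ordered comparison (reduce join) can disagree on whether two keys match. This illustrates the general pitfall, not a claim about the exact code path fixed in HIVE-8745.

```java
import java.math.BigDecimal;

public class DecimalKeyDemo {
    // equals() distinguishes 1.0 from 1.00 (different scales), while
    // compareTo() treats them as the same numeric value. A hash-based join
    // keyed on equals/hashCode can therefore miss matches that a sort-based
    // comparison finds.
    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.0");
        BigDecimal b = new BigDecimal("1.00");
        System.out.println(a.equals(b));      // false: scales differ
        System.out.println(a.compareTo(b));   // 0: numerically equal
    }
}
```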


[jira] [Assigned] (HIVE-8782) HBase handler doesn't compile with hadoop-1

2014-11-07 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang reassigned HIVE-8782:
-

Assignee: Jimmy Xiang

> HBase handler doesn't compile with hadoop-1
> ---
>
> Key: HIVE-8782
> URL: https://issues.apache.org/jira/browse/HIVE-8782
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Blocker
>
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
> on project hive-hbase-handler: Compilation failure
> [ERROR] 
> /home/jxiang/git-repos/apache/tmp/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java:[482,31]
>  cannot find symbol
> [ERROR] symbol:   method mergeAll(org.apache.hadoop.security.Credentials)
> [ERROR] location: class org.apache.hadoop.security.Credentials
> [ERROR] -> [Help 1]
> [ERROR] 
> {noformat}
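The compile failure above shows that Credentials.mergeAll does not exist in the hadoop-1 line. One conventional workaround for this kind of version skew (a sketch only, not the actual HIVE-8782 patch) is to guard the newer API behind a reflective lookup so the same source compiles against both Hadoop lines. The LegacyCredentials class below is a stand-in stub, not the real org.apache.hadoop.security.Credentials:

```java
import java.lang.reflect.Method;

public class MergeAllShim {

    // Stand-in stub for hadoop-1's Credentials, which lacks mergeAll(...).
    // This is NOT the real org.apache.hadoop.security.Credentials class.
    static class LegacyCredentials {
    }

    /**
     * Call mergeAll reflectively when the running Hadoop provides it;
     * otherwise report that the caller must take a manual-copy fallback path.
     */
    static String merge(Object dest, Object src) {
        try {
            Method m = dest.getClass().getMethod("mergeAll", dest.getClass());
            m.invoke(dest, src);
            return "mergeAll";
        } catch (NoSuchMethodException e) {
            // hadoop-1 path: the method does not exist on this class
            return "fallback";
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The stub has no mergeAll, so the guarded call takes the fallback path.
        System.out.println(merge(new LegacyCredentials(), new LegacyCredentials()));
    }
}
```

The reflective lookup is resolved once per call here; a real shim would typically cache the Method in a static field.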



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 27719: numRows and rawDataSize are not collected by the Spark stats [Spark Branch]

2014-11-07 Thread Na Yang


> On Nov. 7, 2014, 9:42 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java, line 
> > 228
> > 
> >
> > One thing I'm not clear on is why cloning the operator tree doesn't 
> > clone the missing stats flags. From Utilities.cloneOperatorTree(), it seems 
> > it should.

Hi Xuefu, the stats flag is set after the cloneOperatorTree happens, in the 
processFileSink step (GenMapRedUtils.isMergeRequired). Since we did not put the 
cloned FileSinks into the fileSinkSet, the stats flags are only set on the 
original FileSinkOperator. In the GenMapRedUtils.processFileSink API, I added 
the following to copy the stats flags from the original FileSinkOperator to the 
cloned ones.

// Set stats config for FileSinkOperators which are cloned from the fileSink
List<FileSinkOperator> fileSinkList = context.fileSinkMap.get(fileSink);
if (fileSinkList != null) {
  for (FileSinkOperator fsOp : fileSinkList) {
    fsOp.getConf().setGatherStats(fileSink.getConf().isGatherStats());
    fsOp.getConf().setStatsReliable(fileSink.getConf().isStatsReliable());
    fsOp.getConf().setMaxStatsKeyPrefixLength(fileSink.getConf().getMaxStatsKeyPrefixLength());
  }
}


- Na


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27719/#review60385
---


On Nov. 7, 2014, 9:16 p.m., Na Yang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27719/
> ---
> 
> (Updated Nov. 7, 2014, 9:16 p.m.)
> 
> 
> Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8756
> https://issues.apache.org/jira/browse/HIVE-8756
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> numRows and rawDataSize are not collected by the Spark stats. That is caused 
> by the FileSinkOperator in the ReduceWork not having its stats config set. In 
> GenSparkUtils.removeUnionOperators, the operator tree gets cloned and a new 
> FileSinkOperator is generated and set on the reduce work. However, during 
> processFileSink, the collectStats tag is set on the original FileSinkOperator 
> in GenMapRedUtils.addStatsTask, not on the new FileSinkOperator that is used 
> in the ReduceWork.
> 
> 
> Diffs
> -
> 
>   itests/src/test/resources/testconfiguration.properties 79a0132 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
> 8290568 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
> e8e18a7 
>   ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 8d237c5 
>   ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 
> 4946815 
>   ql/src/test/results/clientpositive/spark/semijoin.q.out 9b6802d 
>   ql/src/test/results/clientpositive/spark/stats1.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/27719/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Na Yang
> 
>



Re: Review Request 27745: HIVE-8621 Dump small table join data for map-join [Spark Branch]

2014-11-07 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27745/#review60386
---


If it's found that too much customization is needed for Spark, we might as well 
extend it from instead.


ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java


Don't we need this any more?



ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java


This doesn't seem to resolve conflicts for files generated by different 
partitions. These partitions can run on different nodes, so fileIndex might be 
the same.


- Xuefu Zhang
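Xuefu's collision concern above is about two sinks on different nodes producing the same fileIndex. One conventional remedy (a hypothetical naming sketch, not the patch under review) is to fold a per-partition or per-task identifier into the dump file name so indices from different partitions can never collide:

```java
public class DumpFileName {

    /**
     * Build a collision-free name for a dumped hashtable file by combining
     * the small-table alias, the Spark partition (task) id, and the local
     * fileIndex. Two sinks with the same fileIndex but different partition
     * ids still produce distinct names.
     */
    static String dumpFileName(String smallTableAlias, int partitionId, int fileIndex) {
        return String.format("%s-part%d-%d.hashtable", smallTableAlias, partitionId, fileIndex);
    }

    public static void main(String[] args) {
        // Same fileIndex, different partitions -> different file names.
        System.out.println(dumpFileName("b", 0, 1));
        System.out.println(dumpFileName("b", 1, 1));
    }
}
```

In a real Spark task the partition id would come from the task context; the format string here is purely illustrative.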


On Nov. 7, 2014, 9:34 p.m., Jimmy Xiang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27745/
> ---
> 
> (Updated Nov. 7, 2014, 9:34 p.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8621
> https://issues.apache.org/jira/browse/HIVE-8621
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In the case of Spark, HashTableSinkOperator should dump files to a folder 
> expected by HashTableLoader.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java 
> f0e04e7 
> 
> Diff: https://reviews.apache.org/r/27745/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>



Re: Review Request 27745: HIVE-8621 Dump small table join data for map-join [Spark Branch]

2014-11-07 Thread Suhas Satish

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27745/#review60388
---



ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java


What if there are 2 partitions for the big table? I guess they will then be 
processed on 2 separate Spark nodes, right?

In that case, 2 replicas are created for this HashTableSink. How do we ensure 
that these 2 replicas end up on the same nodes as the ones where the 2 
big-table partitions will be processing their map-joins?


- Suhas Satish


On Nov. 7, 2014, 9:34 p.m., Jimmy Xiang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27745/
> ---
> 
> (Updated Nov. 7, 2014, 9:34 p.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8621
> https://issues.apache.org/jira/browse/HIVE-8621
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In the case of Spark, HashTableSinkOperator should dump files to a folder 
> expected by HashTableLoader.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java 
> f0e04e7 
> 
> Diff: https://reviews.apache.org/r/27745/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>



[jira] [Commented] (HIVE-8618) Add SORT_QUERY_RESULTS for test that doesn't guarantee order #3

2014-11-07 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202764#comment-14202764
 ] 

Jason Dere commented on HIVE-8618:
--

This was not included in branch-0.14, but HIVE-8461 (which is in branch-0.14) 
added a Tez test for vector_decimal_mapjoin.q that was based on the changes 
from HIVE-8618. So tez/vector_decimal_mapjoin.q.out is not correct in 
branch-0.14. Also my patch for HIVE-8745 is not applying cleanly on branch-0.14 
and I think this is the reason why.
[~hagleitn] since this JIRA contains only test changes, I'd like to bring it 
into branch-0.14.

> Add SORT_QUERY_RESULTS for test that doesn't guarantee order #3
> ---
>
> Key: HIVE-8618
> URL: https://issues.apache.org/jira/browse/HIVE-8618
> Project: Hive
>  Issue Type: Test
>Reporter: Chao
>Assignee: Chao
>Priority: Minor
> Fix For: 0.15.0
>
> Attachments: HIVE-8618.1.patch
>
>
> We need to add {{SORT_QUERY_RESULTS}} to a few more tests:
> {noformat}
> auto_join26
> date_join1
> join40
> vector_decimal_mapjoin
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 27719: numRows and rawDataSize are not collected by the Spark stats [Spark Branch]

2014-11-07 Thread Xuefu Zhang


> On Nov. 7, 2014, 9:42 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java, line 
> > 228
> > 
> >
> > One thing I'm not clear on is why cloning the operator tree doesn't 
> > clone the missing stats flags. From Utilities.cloneOperatorTree(), it seems 
> > it should.
> 
> Na Yang wrote:
>     Hi Xuefu, the stats flag is set after the cloneOperatorTree happens, in 
>     the processFileSink step (GenMapRedUtils.isMergeRequired). Since we did 
>     not put the cloned FileSinks into the fileSinkSet, the stats flags are 
>     only set on the original FileSinkOperator. In the 
>     GenMapRedUtils.processFileSink API, I added the following to copy the 
>     stats flags from the original FileSinkOperator to the cloned ones. 
> 
>     // Set stats config for FileSinkOperators which are cloned from the fileSink
>     List<FileSinkOperator> fileSinkList = context.fileSinkMap.get(fileSink);
>     if (fileSinkList != null) {
>       for (FileSinkOperator fsOp : fileSinkList) {
>         fsOp.getConf().setGatherStats(fileSink.getConf().isGatherStats());
>         fsOp.getConf().setStatsReliable(fileSink.getConf().isStatsReliable());
>         fsOp.getConf().setMaxStatsKeyPrefixLength(fileSink.getConf().getMaxStatsKeyPrefixLength());
>       }
>     }

Thanks for the explanation. If we do put the cloned FileSinkOperators in 
fileSinkSet, is it true that we don't have to manually copy the flags over?


- Xuefu
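Xuefu's follow-up question can be illustrated with a toy model (stand-in types below, not Hive's actual classes): if the cloned sinks are registered in the same set that the flag-setting pass iterates over, that pass covers them automatically and no manual copy of the stats flags is needed.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class CloneRegistration {

    // Minimal stand-in for a FileSinkOperator carrying one stats flag.
    static class FileSink {
        boolean gatherStats = false;
    }

    // processFileSink-style pass: every sink registered in the set gets the flag.
    static void setStatsFlags(Set<FileSink> fileSinkSet) {
        for (FileSink fs : fileSinkSet) {
            fs.gatherStats = true;
        }
    }

    public static void main(String[] args) {
        Set<FileSink> fileSinkSet = new LinkedHashSet<>();
        FileSink original = new FileSink();
        FileSink clone = new FileSink();

        fileSinkSet.add(original);
        fileSinkSet.add(clone);   // key step: register the clone too

        setStatsFlags(fileSinkSet);
        // Both sinks now carry the flag; no copy-after step was required.
        System.out.println(original.gatherStats && clone.gatherStats);
    }
}
```

This only demonstrates the registration idea; whether adding the clones to Hive's real fileSinkSet has other side effects in processFileSink is exactly what the review thread is weighing.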


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27719/#review60385
---


On Nov. 7, 2014, 9:16 p.m., Na Yang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27719/
> ---
> 
> (Updated Nov. 7, 2014, 9:16 p.m.)
> 
> 
> Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8756
> https://issues.apache.org/jira/browse/HIVE-8756
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> numRows and rawDataSize are not collected by the Spark stats. That is caused 
> by the FileSinkOperator in the ReduceWork not having its stats config set. In 
> GenSparkUtils.removeUnionOperators, the operator tree gets cloned and a new 
> FileSinkOperator is generated and set on the reduce work. However, during 
> processFileSink, the collectStats tag is set on the original FileSinkOperator 
> in GenMapRedUtils.addStatsTask, not on the new FileSinkOperator that is used 
> in the ReduceWork.
> 
> 
> Diffs
> -
> 
>   itests/src/test/resources/testconfiguration.properties 79a0132 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
> 8290568 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 
> e8e18a7 
>   ql/src/test/results/clientpositive/spark/groupby_sort_1_23.q.out 8d237c5 
>   ql/src/test/results/clientpositive/spark/groupby_sort_skew_1_23.q.out 
> 4946815 
>   ql/src/test/results/clientpositive/spark/semijoin.q.out 9b6802d 
>   ql/src/test/results/clientpositive/spark/stats1.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/27719/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Na Yang
> 
>



[jira] [Created] (HIVE-8785) HiveServer2 LogDivertAppender should be more selective for beeline getLogs

2014-11-07 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-8785:
---

 Summary: HiveServer2 LogDivertAppender should be more selective 
for beeline getLogs
 Key: HIVE-8785
 URL: https://issues.apache.org/jira/browse/HIVE-8785
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V
Assignee: Thejas M Nair


A simple query run via beeline JDBC like {{explain select count(1) from 
testing.foo;}} produces 50 lines of output which looks like 

{code}
0: jdbc:hive2://localhost:10002> explain select count(1) from testing.foo;
14/11/06 00:35:59 INFO log.PerfLogger: 
14/11/06 00:35:59 INFO log.PerfLogger: 
14/11/06 00:35:59 INFO parse.ParseDriver: Parsing command: explain select 
count(1) from testing.foo
14/11/06 00:35:59 INFO parse.ParseDriver: Parse Completed
14/11/06 00:35:59 INFO log.PerfLogger: 
14/11/06 00:35:59 INFO log.PerfLogger: 
14/11/06 00:35:59 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
14/11/06 00:35:59 INFO parse.SemanticAnalyzer: Completed phase 1 of Semantic 
Analysis
14/11/06 00:35:59 INFO parse.SemanticAnalyzer: Get metadata for source tables
14/11/06 00:35:59 INFO parse.SemanticAnalyzer: Get metadata for subqueries
14/11/06 00:35:59 INFO parse.SemanticAnalyzer: Get metadata for destination 
tables
14/11/06 00:35:59 INFO ql.Context: New scratch dir is 
hdfs://cn041-10.l42scl.hortonworks.com:8020/tmp/hive/gopal/6b3980f6-3238-4e91-ae53-cb3f54092dab/hive_2014-11-06_00-35-59_379_317426424610374080-1
14/11/06 00:35:59 INFO parse.SemanticAnalyzer: Completed getting MetaData in 
Semantic Analysis
14/11/06 00:35:59 INFO parse.SemanticAnalyzer: Set stats collection dir : 
hdfs://cn041-10.l42scl.hortonworks.com:8020/tmp/hive/gopal/6b3980f6-3238-4e91-ae53-cb3f54092dab/hive_2014-11-06_00-35-59_379_317426424610374080-1/-ext-10002
14/11/06 00:35:59 INFO ppd.OpProcFactory: Processing for FS(16)
14/11/06 00:35:59 INFO ppd.OpProcFactory: Processing for SEL(15)
14/11/06 00:35:59 INFO ppd.OpProcFactory: Processing for GBY(14)
14/11/06 00:35:59 INFO ppd.OpProcFactory: Processing for RS(13)
14/11/06 00:35:59 INFO ppd.OpProcFactory: Processing for GBY(12)
14/11/06 00:35:59 INFO ppd.OpProcFactory: Processing for SEL(11)
14/11/06 00:35:59 INFO ppd.OpProcFactory: Processing for TS(10)
14/11/06 00:35:59 INFO optimizer.ColumnPrunerProcFactory: RS 13 oldColExprMap: 
{VALUE._col0=Column[_col0]}
14/11/06 00:35:59 INFO optimizer.ColumnPrunerProcFactory: RS 13 newColExprMap: 
{VALUE._col0=Column[_col0]}
14/11/06 00:35:59 INFO parse.SemanticAnalyzer: Completed plan generation
14/11/06 00:35:59 INFO ql.Driver: Semantic Analysis Completed
14/11/06 00:35:59 INFO log.PerfLogger: 
14/11/06 00:35:59 INFO ql.Driver: Returning Hive schema: 
Schema(fieldSchemas:[FieldSchema(name:Explain, type:string, comment:null)], 
properties:null)
14/11/06 00:35:59 INFO log.PerfLogger: 
14/11/06 00:35:59 INFO log.PerfLogger: 
++--+
|  Explain   |
++--+
| STAGE DEPENDENCIES:|
|   Stage-0 is a root stage  |
||
| STAGE PLANS:   |
|   Stage: Stage-0   |
| Fetch Operator |
|   limit: 1 |
|   Processor Tree:  |
| ListSink   |
||
++--+
10 rows selected (0.1 seconds)
14/11/06 00:35:59 INFO log.PerfLogger: 
14/11/06 00:35:59 INFO ql.Driver: Concurrency mode is disabled, not creating a 
lock manager
14/11/06 00:35:59 INFO log.PerfLogger: 
14/11/06 00:35:59 INFO ql.Driver: Starting command: explain select count(1) 
from testing.foo
14/11/06 00:35:59 INFO log.PerfLogger: 
14/11/06 00:35:59 INFO log.PerfLogger: 
14/11/06 00:35:59 INFO log.PerfLogger: 
14/11/06 00:35:59 INFO ql.Driver: Starting task [Stage-1:EXPLAIN] in serial mode
14/11/06 00:35:59 INFO log.PerfLogger: 
14/11/06 00:35:59 INFO log.PerfLogger: 
14/11/06 00:35:59 INFO ql.Driver: OK
14/11/06 00:35:59 INFO log.PerfLogger: 
14/11/06 00:35:59 INFO log.PerfLogger: 
14/11/06 00:35:59 INFO log.PerfLogger: 
{code}

A more complex query like Query27 produces 800+ lines of unnecessary logging.

This is unreadable and in fact slows down the beeline JDBC client.
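The selectivity the report asks for amounts to a whitelist over logger names before lines are diverted to the client. The sketch below is a toy string-level filter (not the real LogDivertAppender or log4j Filter API; the whitelist entries are illustrative) showing how most of the 50 lines above would be suppressed:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SelectiveDivert {

    // Hypothetical whitelist: only these logger-name fragments are considered
    // interesting enough to forward to the beeline client.
    static final List<String> WHITELIST = Arrays.asList("ql.Driver", "exec.Task");

    /** A log line is forwarded only if it mentions a whitelisted logger. */
    static boolean interesting(String line) {
        return WHITELIST.stream().anyMatch(line::contains);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "14/11/06 00:35:59 INFO log.PerfLogger: ...",
            "14/11/06 00:35:59 INFO ql.Driver: Semantic Analysis Completed",
            "14/11/06 00:35:59 INFO ppd.OpProcFactory: Processing for FS(16)");

        // Only the ql.Driver line survives the filter.
        List<String> kept = lines.stream()
                                 .filter(SelectiveDivert::interesting)
                                 .collect(Collectors.toList());
        System.out.println(kept.size());
    }
}
```

A production version would hook this decision into the appender's filtering mechanism rather than string-matching formatted lines.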



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8618) Add SORT_QUERY_RESULTS for test that doesn't guarantee order #3

2014-11-07 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202794#comment-14202794
 ] 

Jason Dere commented on HIVE-8618:
--

Ran the modified tests on branch-0.14 and they pass. I've committed this to 
branch-0.14.

> Add SORT_QUERY_RESULTS for test that doesn't guarantee order #3
> ---
>
> Key: HIVE-8618
> URL: https://issues.apache.org/jira/browse/HIVE-8618
> Project: Hive
>  Issue Type: Test
>Reporter: Chao
>Assignee: Chao
>Priority: Minor
> Fix For: 0.14.0
>
> Attachments: HIVE-8618.1.patch
>
>
> We need to add {{SORT_QUERY_RESULTS}} to a few more tests:
> {noformat}
> auto_join26
> date_join1
> join40
> vector_decimal_mapjoin
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8618) Add SORT_QUERY_RESULTS for test that doesn't guarantee order #3

2014-11-07 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-8618:
-
Fix Version/s: (was: 0.15.0)
   0.14.0

> Add SORT_QUERY_RESULTS for test that doesn't guarantee order #3
> ---
>
> Key: HIVE-8618
> URL: https://issues.apache.org/jira/browse/HIVE-8618
> Project: Hive
>  Issue Type: Test
>Reporter: Chao
>Assignee: Chao
>Priority: Minor
> Fix For: 0.14.0
>
> Attachments: HIVE-8618.1.patch
>
>
> We need to add {{SORT_QUERY_RESULTS}} to a few more tests:
> {noformat}
> auto_join26
> date_join1
> join40
> vector_decimal_mapjoin
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8775) Merge from trunk 11/6/14 [SPARK BRANCH]

2014-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202801#comment-14202801
 ] 

Hive QA commented on HIVE-8775:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12680258/HIVE-8775.2-spark.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7232 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_acid
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/328/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/328/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-328/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12680258 - PreCommit-HIVE-SPARK-Build

> Merge from trunk 11/6/14 [SPARK BRANCH]
> ---
>
> Key: HIVE-8775
> URL: https://issues.apache.org/jira/browse/HIVE-8775
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-8775.1-spark.patch, HIVE-8775.2-spark.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8745) Joins on decimal keys return different results whether they are run as reduce join or map join

2014-11-07 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-8745:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk/branch-0.14

> Joins on decimal keys return different results whether they are run as reduce 
> join or map join
> --
>
> Key: HIVE-8745
> URL: https://issues.apache.org/jira/browse/HIVE-8745
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Gunther Hagleitner
>Assignee: Jason Dere
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8745.1.patch, HIVE-8745.2.patch, HIVE-8745.3.patch, 
> join_test.q
>
>
> See attached .q file to reproduce. The difference seems to be whether 
> trailing 0s are considered the same value or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

