[jira] [Updated] (HIVE-11746) Connect command should not to be allowed from user[beeline-cli branch]

2015-09-08 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-11746:

Attachment: HIVE-11746.3-beeline-cli.patch

Thanks [~xuefuz] for the suggestion. That will be cleaner.

> Connect command should not to be allowed from user[beeline-cli branch]
> --
>
> Key: HIVE-11746
> URL: https://issues.apache.org/jira/browse/HIVE-11746
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Fix For: beeline-cli-branch
>
> Attachments: HIVE-11746.1-beeline-cli.patch, 
> HIVE-11746.2-beeline-cli.patch, HIVE-11746.3-beeline-cli.patch
>
>
> For the new CLI, the user should not be allowed to connect to a server or database.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-11768) java.io.DeleteOnExitHook leaks memory on long running Hive Server2 Instances

2015-09-08 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis reassigned HIVE-11768:


Assignee: Navis

> java.io.DeleteOnExitHook leaks memory on long running Hive Server2 Instances
> 
>
> Key: HIVE-11768
> URL: https://issues.apache.org/jira/browse/HIVE-11768
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
>Assignee: Navis
> Attachments: HIVE-11768.1.patch.txt
>
>
>   More than 490,000 paths were added to java.io.DeleteOnExitHook on one of our 
> long-running HiveServer2 instances, taking up more than 100 MB on the heap.
>   Most of the paths have a suffix of ".pipeout".





[jira] [Updated] (HIVE-11768) java.io.DeleteOnExitHook leaks memory on long running Hive Server2 Instances

2015-09-08 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-11768:
-
Attachment: HIVE-11768.1.patch.txt

> java.io.DeleteOnExitHook leaks memory on long running Hive Server2 Instances
> 
>
> Key: HIVE-11768
> URL: https://issues.apache.org/jira/browse/HIVE-11768
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
> Attachments: HIVE-11768.1.patch.txt
>
>
>   More than 490,000 paths were added to java.io.DeleteOnExitHook on one of our 
> long-running HiveServer2 instances, taking up more than 100 MB on the heap.
>   Most of the paths have a suffix of ".pipeout".





[jira] [Commented] (HIVE-11727) Hive on Tez through Oozie: Some queries fail with fnf exception

2015-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736296#comment-14736296
 ] 

Hive QA commented on HIVE-11727:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754071/HIVE-11727.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9422 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Delimited
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5207/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5207/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5207/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12754071 - PreCommit-HIVE-TRUNK-Build

> Hive on Tez through Oozie: Some queries fail with fnf exception
> ---
>
> Key: HIVE-11727
> URL: https://issues.apache.org/jira/browse/HIVE-11727
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-11727.1.patch
>
>
> When we read back row containers from disk, a misconfiguration causes us to 
> look for a non-existing file.
> {noformat}
> Caused by: java.io.FileNotFoundException: File 
> file:/grid/0/hadoop/yarn/local/usercache/appcache/application_1440685000561_0028/container_e26_1440685000561_0028_01_05/container_tokens
>  does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:608)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:821)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:598)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:414)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:140)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:169)
>   ... 31 more
> {noformat}





[jira] [Commented] (HIVE-11375) Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)

2015-09-08 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736287#comment-14736287
 ] 

Lefty Leverenz commented on HIVE-11375:
---

This was backported to branch-1.2 (commit 
300717b39428a7898e4228139fbb08ca5c425ca7) so the Fix Version/s should include 
1.2.2.

I don't see any commit to branch-1 for the 1.3.0 release.

> Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)
> --
>
> Key: HIVE-11375
> URL: https://issues.apache.org/jira/browse/HIVE-11375
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Mariusz Sakowski
>Assignee: Aihua Xu
> Fix For: 2.0.0
>
> Attachments: HIVE-11375.2.patch, HIVE-11375.3.patch, 
> HIVE-11375.4.patch, HIVE-11375.branch-1.patch, HIVE-11375.patch
>
>
> When running a query like this:
> {code}explain select * from test where (val is not null and val <> 0);{code}
> Hive will simplify the expression in parentheses and omit the IS NOT NULL check:
> {code}
>   Filter Operator
>     predicate: (val <> 0) (type: boolean)
> {code}
> which is fine.
> But if we negate the condition using the NOT operator:
> {code}explain select * from test where not (val is not null and val <> 
> 0);{code}
> Hive will also simplify the expression, but now it breaks the query:
> {code}
>   Filter Operator
>     predicate: (not (val <> 0)) (type: boolean)
> {code}
> The valid predicate should be *val == 0 or val is null*, while the one above 
> is equivalent to *val == 0* only, filtering away rows where val is null.
> A simple example:
> {code}
> CREATE TABLE example (
> val bigint
> );
> INSERT INTO example VALUES (1), (NULL), (0);
> -- returns 2 rows - NULL and 0
> select * from example where (val is null or val == 0);
> -- returns 1 row - 0
> select * from example where not (val is not null and val <> 0);
> {code}





[jira] [Commented] (HIVE-11301) thrift metastore issue when getting stats results in disconnect

2015-09-08 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736271#comment-14736271
 ] 

Lefty Leverenz commented on HIVE-11301:
---

Commit c2f5b3c5de4105a2008bf91da378a9581dbd6a89 put this in branch-1.2 so 
shouldn't the Fix Version/s include 1.2.2 now?

> thrift metastore issue when getting stats results in disconnect
> ---
>
> Key: HIVE-11301
> URL: https://issues.apache.org/jira/browse/HIVE-11301
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Sergey Shelukhin
>Assignee: Pengcheng Xiong
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11301.01.patch, HIVE-11301.02.patch
>
>
> On metastore side it looks like this:
> {noformat}
> 2015-07-17 20:32:27,795 ERROR [pool-3-thread-150]: server.TThreadPoolServer 
> (TThreadPoolServer.java:run(294)) - Thrift error occurred during processing 
> of message.
> org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is 
> unset! Struct:AggrStats(colStats:null, partsFound:0)
> at 
> org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> and then
> {noformat}
> 2015-07-17 20:32:27,796 WARN  [pool-3-thread-150]: 
> transport.TIOStreamTransport (TIOStreamTransport.java:close(112)) - Error 
> closing output stream.
> java.net.SocketException: Socket closed
> at 
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:116)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
> at 
> org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110)
> at org.apache.thrift.transport.TSocket.close(TSocket.java:196)
> at 
> org.apache.hadoop.hive.thrift.TFilterTransport.close(TFilterTransport.java:52)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:304)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Which on client manifests as
> {noformat}
> 2015-07-17 20:32:27,796 WARN  [main()]: metastore.RetryingMetaStoreClient 
> (RetryingMetaStoreClient.java:invoke(187)) - MetaStoreClient lost connection. 
> Attempting to reconnect.
> org.apache.thrift.transport.TTransportException
> at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TSer

[jira] [Commented] (HIVE-11329) Column prefix in key of hbase column prefix map

2015-09-08 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736258#comment-14736258
 ] 

Lefty Leverenz commented on HIVE-11329:
---

(1)  Version query:  Since this was committed to master, shouldn't Fix 
Version/s be 2.0.0?  Or was it previously committed to branch-1 (1.3.0)?

The commit to master (Tue Sept 8) is d51c62a455eb08ee49f10ea2e117ca90de0bf47b 
although patch 3 has a header dated Fri Jul 24 which gives commit ID 
a7a15acb58742bca61824d6221446ad1446d5ab7, so I'm confused.

(2)  Doc query:  Does this need any documentation?  In particular, should 
*hbase.columns.mapping.prefix.hide* be documented in the Hive HBase Integration 
wikidoc?

* [Hive HBase Integration | 
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration]

> Column prefix in key of hbase column prefix map
> ---
>
> Key: HIVE-11329
> URL: https://issues.apache.org/jira/browse/HIVE-11329
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.14.0
>Reporter: Wojciech Indyk
>Assignee: Wojciech Indyk
>Priority: Minor
> Fix For: 1.3.0
>
> Attachments: HIVE-11329.3.patch
>
>
> When I create a table with an HBase column prefix 
> (https://issues.apache.org/jira/browse/HIVE-3725), the prefix appears in the 
> result map in Hive. 
> E.g. a record in HBase:
> rowkey: 123
> column: tag_one, value: 0.5
> column: tag_two, value: 0.5
> representation in Hive via column prefix mapping "tag_.*":
> column: tag map
> key: tag_one, value: 0.5
> key: tag_two, value: 0.5
> should be:
> key: one, value: 0.5
> key: two, value: 0.5





[jira] [Updated] (HIVE-11706) Implement "show create database"

2015-09-08 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-11706:
-
Attachment: HIVE-11706.3.patch.txt

Fixed test failures.

> Implement "show create database"
> 
>
> Key: HIVE-11706
> URL: https://issues.apache.org/jira/browse/HIVE-11706
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-11706.1.patch.txt, HIVE-11706.2.patch.txt, 
> HIVE-11706.3.patch.txt
>
>
> HIVE-967 introduced "show create table". How about "show create database"?





[jira] [Updated] (HIVE-11590) AvroDeserializer is very chatty

2015-09-08 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni updated HIVE-11590:

Attachment: HIVE-11590.1.patch.txt

Patch attached.

> AvroDeserializer is very chatty
> ---
>
> Key: HIVE-11590
> URL: https://issues.apache.org/jira/browse/HIVE-11590
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
> Attachments: HIVE-11590.1.patch.txt
>
>
> It seems like AvroDeserializer is currently very chatty, logging tons of 
> messages at INFO level in the MapReduce logs. It would be helpful to push 
> some of these down to DEBUG level to keep the logs clean.





[jira] [Commented] (HIVE-11590) AvroDeserializer is very chatty

2015-09-08 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736171#comment-14736171
 ] 

Swarnim Kulkarni commented on HIVE-11590:
-

RB: https://reviews.apache.org/r/38204/

> AvroDeserializer is very chatty
> ---
>
> Key: HIVE-11590
> URL: https://issues.apache.org/jira/browse/HIVE-11590
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
> Attachments: HIVE-11590.1.patch.txt
>
>
> It seems like AvroDeserializer is currently very chatty, logging tons of 
> messages at INFO level in the MapReduce logs. It would be helpful to push 
> some of these down to DEBUG level to keep the logs clean.





[jira] [Commented] (HIVE-11755) Incorrect method called with Kerberos enabled in AccumuloStorageHandler

2015-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736166#comment-14736166
 ] 

Hive QA commented on HIVE-11755:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754743/HIVE-11755.003.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9425 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5206/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5206/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5206/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12754743 - PreCommit-HIVE-TRUNK-Build

> Incorrect method called with Kerberos enabled in AccumuloStorageHandler
> ---
>
> Key: HIVE-11755
> URL: https://issues.apache.org/jira/browse/HIVE-11755
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 1.2.2
>
> Attachments: HIVE-11755.001.patch, HIVE-11755.002.patch, 
> HIVE-11755.003.patch
>
>
> The following exception was noticed in testing out the 
> AccumuloStorageHandler's OutputFormat:
> {noformat}
> java.lang.IllegalStateException: Connector info for AccumuloOutputFormat can 
> only be set once per job
>   at 
> org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase.setConnectorInfo(ConfiguratorBase.java:146)
>   at 
> org.apache.accumulo.core.client.mapred.AccumuloOutputFormat.setConnectorInfo(AccumuloOutputFormat.java:125)
>   at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.configureAccumuloOutputFormat(HiveAccumuloTableOutputFormat.java:95)
>   at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.checkOutputSpecs(HiveAccumuloTableOutputFormat.java:51)
>   at 
> org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat.checkOutputSpecs(HivePassThroughOutputFormat.java:46)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.checkOutputSpecs(FileSinkOperator.java:1124)
>   at 
> org.apache.hadoop.hive.ql.io.HiveOutputFormatImpl.checkOutputSpecs(HiveOutputFormatImpl.java:67)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:431)
>   at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>   at org.apache.hadoop.hive.cli.CliD

[jira] [Commented] (HIVE-10708) Add SchemaCompatibility check to AvroDeserializer

2015-09-08 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736153#comment-14736153
 ] 

Swarnim Kulkarni commented on HIVE-10708:
-

RB: https://reviews.apache.org/r/38203/

> Add SchemaCompatibility check to AvroDeserializer
> -
>
> Key: HIVE-10708
> URL: https://issues.apache.org/jira/browse/HIVE-10708
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
> Attachments: HIVE-10708.1.patch.txt
>
>
> Avro provides a nice API [1] to check whether a given reader schema can be 
> used to deserialize data given its writer schema. I think it would be super 
> nice to integrate this into the AvroDeserializer so that we can fail fast and 
> gracefully when the schemas are incompatible.
> [1] 
> https://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/SchemaCompatibility.html





[jira] [Updated] (HIVE-10708) Add SchemaCompatibility check to AvroDeserializer

2015-09-08 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni updated HIVE-10708:

Attachment: HIVE-10708.1.patch.txt

Patch attached.

> Add SchemaCompatibility check to AvroDeserializer
> -
>
> Key: HIVE-10708
> URL: https://issues.apache.org/jira/browse/HIVE-10708
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
> Attachments: HIVE-10708.1.patch.txt
>
>
> Avro provides a nice API [1] to check whether a given reader schema can be 
> used to deserialize data given its writer schema. I think it would be super 
> nice to integrate this into the AvroDeserializer so that we can fail fast and 
> gracefully when the schemas are incompatible.
> [1] 
> https://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/SchemaCompatibility.html





[jira] [Commented] (HIVE-10708) Add SchemaCompatibility check to AvroDeserializer

2015-09-08 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736147#comment-14736147
 ] 

Swarnim Kulkarni commented on HIVE-10708:
-

Decided to add a simple flag to turn this compatibility check on. Keeping this 
flag off by default for backwards compatibility.

> Add SchemaCompatibility check to AvroDeserializer
> -
>
> Key: HIVE-10708
> URL: https://issues.apache.org/jira/browse/HIVE-10708
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
>
> Avro provides a nice API [1] to check whether a given reader schema can be 
> used to deserialize data given its writer schema. I think it would be super 
> nice to integrate this into the AvroDeserializer so that we can fail fast and 
> gracefully when the schemas are incompatible.
> [1] 
> https://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/SchemaCompatibility.html





[jira] [Updated] (HIVE-11752) Pre-materializing complex CTE queries

2015-09-08 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-11752:
-
Attachment: HIVE-11752.2.patch.txt

Fixed missing read/write entities

> Pre-materializing complex CTE queries
> -
>
> Key: HIVE-11752
> URL: https://issues.apache.org/jira/browse/HIVE-11752
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-11752.1.patch.txt, HIVE-11752.2.patch.txt
>
>
> Currently, Hive regards a CTE clause as a simple alias to the query block, 
> which causes redundant work if it's used multiple times in a query. This 
> introduces a reference threshold for pre-materializing the CTE clause as a 
> volatile table (which does not exist in any form in the metastore and is 
> accessible only from the QB).





[jira] [Commented] (HIVE-11768) java.io.DeleteOnExitHook leaks memory on long running Hive Server2 Instances

2015-09-08 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736138#comment-14736138
 ] 

Nemon Lou commented on HIVE-11768:
--

The java.io.File.deleteOnExit API is flawed and causes JVM crashes on 
long-running servers; see [this 
link|https://bugs.openjdk.java.net/browse/JDK-4872014].
The code path that produces the ".pipeout" file and invokes File.deleteOnExit() 
is [here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java#L802]:
{code:java}
File tmpFile = File.createTempFile(sessionID, ".pipeout", tmpDir);
tmpFile.deleteOnExit();
{code}
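A common way to avoid unbounded growth of the JVM-global DeleteOnExitHook map is to track temp files per session and delete them explicitly when the session closes, instead of calling File.deleteOnExit(). The sketch below is illustrative only; SessionTempFiles is a hypothetical helper, not Hive's actual SessionState code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: tracks temp files per session and deletes them in
// close(), so nothing accumulates in the JVM-global DeleteOnExitHook map.
class SessionTempFiles implements AutoCloseable {
    private final Set<Path> tracked = ConcurrentHashMap.newKeySet();

    Path createTempFile(String sessionId, String suffix) throws IOException {
        Path p = Files.createTempFile(sessionId, suffix);
        tracked.add(p);  // remembered here, not in DeleteOnExitHook
        return p;
    }

    @Override
    public void close() throws IOException {
        // Free the paths as soon as the session ends, not at JVM shutdown.
        for (Path p : tracked) {
            Files.deleteIfExists(p);
        }
        tracked.clear();
    }
}
```

With this pattern the paths are released when the session closes, so a long-running server's heap no longer grows with every session.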

> java.io.DeleteOnExitHook leaks memory on long running Hive Server2 Instances
> 
>
> Key: HIVE-11768
> URL: https://issues.apache.org/jira/browse/HIVE-11768
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
>
>   More than 490,000 paths were added to java.io.DeleteOnExitHook on one of our 
> long-running HiveServer2 instances, taking up more than 100 MB on the heap.
>   Most of the paths have a suffix of ".pipeout".





[jira] [Commented] (HIVE-11614) CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after order by has problem

2015-09-08 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736106#comment-14736106
 ] 

Pengcheng Xiong commented on HIVE-11614:


[~jpullokkaran], could you please take a look? The test failure is unrelated 
and it also appeared in the other pre-commit runs. Thanks.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after 
> order by has problem
> -
>
> Key: HIVE-11614
> URL: https://issues.apache.org/jira/browse/HIVE-11614
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11614.01.patch, HIVE-11614.02.patch, 
> HIVE-11614.03.patch, HIVE-11614.04.patch
>
>






[jira] [Commented] (HIVE-11746) Connect command should not to be allowed from user[beeline-cli branch]

2015-09-08 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736103#comment-14736103
 ] 

Xuefu Zhang commented on HIVE-11746:


[~Ferd], thanks for the explanation. The new changes seem to work, but they 
still seem a little confusing. The dispatch() method now takes a new param that 
is only needed by embeddedConnect(), yet we burden all the other callers with 
passing a value for it anyway.

Looking at embeddedConnect(), I'm wondering if we could just call 
execCommandWithPrefix() directly. When we connect in embedded mode, the command 
is a constant, so we don't need any of the processing in dispatch(). That way, 
the dispatch() method stays cleaner.

What do you think? 
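The two designs under discussion can be sketched as follows; the class and method names mirror the comment above but are hypothetical stand-ins, not Beeline's actual code:

```java
// Hypothetical sketch contrasting the two designs discussed above.
class Commands {
    String lastExecuted;

    // Variant 1 (current patch): every caller must thread the flag through,
    // even though only the embedded-connect path ever sets it.
    boolean dispatch(String line, boolean isEmbeddedConnect) {
        if (isEmbeddedConnect) {
            return execCommandWithPrefix(line);
        }
        // ...normal line processing (trimming, !-prefix handling, etc.)...
        return execCommandWithPrefix("!" + line);
    }

    // Variant 2 (suggested): embedded connect calls the helper directly.
    // The command is a constant here, so none of dispatch()'s processing
    // is needed, and dispatch() can keep its original signature.
    boolean embeddedConnect() {
        return execCommandWithPrefix("!connect embedded");
    }

    boolean execCommandWithPrefix(String cmd) {
        lastExecuted = cmd;  // stand-in for actually executing the command
        return true;
    }
}
```

In variant 2 the special case lives entirely in embeddedConnect(), so the common dispatch path stays unchanged for all other callers.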

> Connect command should not to be allowed from user[beeline-cli branch]
> --
>
> Key: HIVE-11746
> URL: https://issues.apache.org/jira/browse/HIVE-11746
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Fix For: beeline-cli-branch
>
> Attachments: HIVE-11746.1-beeline-cli.patch, 
> HIVE-11746.2-beeline-cli.patch
>
>
> For the new CLI, the user should not be allowed to connect to a server or database.





[jira] [Assigned] (HIVE-11724) WebHcat get jobs to order jobs on time order with latest at top

2015-09-08 Thread Kiran Kumar Kolli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kiran Kumar Kolli reassigned HIVE-11724:


Assignee: Kiran Kumar Kolli

> WebHcat get jobs to order jobs on time order with latest at top
> ---
>
> Key: HIVE-11724
> URL: https://issues.apache.org/jira/browse/HIVE-11724
> Project: Hive
>  Issue Type: Improvement
>  Components: WebHCat
>Affects Versions: 0.14.0
>Reporter: Kiran Kumar Kolli
>Assignee: Kiran Kumar Kolli
>
> HIVE-5519 added pagination support to WebHCat. This implementation returns 
> the jobs lexicographically, resulting in older jobs showing at the top.
> The improvement is to order them by time, with the latest at the top. 
> Typically the latest (or running) jobs are more relevant to the user, so 
> time-based ordering with pagination makes more sense.
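Time-based ordering with pagination could be sketched as below; the Job class and its fields are hypothetical placeholders, not WebHCat's actual types:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical job record: an id plus submission time in epoch millis.
class Job {
    final String id;
    final long startTime;
    Job(String id, long startTime) { this.id = id; this.startTime = startTime; }
}

class JobPager {
    // Sort newest-first, then return the requested page (0-based).
    static List<Job> page(List<Job> jobs, int pageSize, int pageNum) {
        List<Job> sorted = new ArrayList<>(jobs);
        sorted.sort(Comparator.comparingLong((Job j) -> j.startTime).reversed());
        int from = Math.min(pageNum * pageSize, sorted.size());
        int to = Math.min(from + pageSize, sorted.size());
        return sorted.subList(from, to);
    }
}
```

Sorting before slicing keeps the page boundaries stable for a fixed snapshot of the job list, so the first page always holds the most recent jobs.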





[jira] [Commented] (HIVE-11751) hive-exec-log4j2.xml settings causes DEBUG messages to be generated and ignored

2015-09-08 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736088#comment-14736088
 ] 

Prasanth Jayachandran commented on HIVE-11751:
--

What is the value for tez.am.log.level? 

> hive-exec-log4j2.xml settings causes DEBUG messages to be generated and 
> ignored
> ---
>
> Key: HIVE-11751
> URL: https://issues.apache.org/jira/browse/HIVE-11751
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Prasanth Jayachandran
> Attachments: hive-exec-log4j2.xml, hive-log4j2.xml, 
> hiveserver2_log4j.png
>
>
> Setting "INFO" in 
> dist/hive/conf/hive-exec-log4j2.xml fixes the problem. Should it be made the 
> default in hive-exec-log4j2.xml? "--hiveconf hive.log.level=INFO" from the 
> command line has no impact.





[jira] [Updated] (HIVE-11754) Not reachable code parts in StatsUtils

2015-09-08 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-11754:
-
Attachment: HIVE-11754.2.patch.txt

> Not reachable code parts in StatsUtils
> --
>
> Key: HIVE-11754
> URL: https://issues.apache.org/jira/browse/HIVE-11754
> Project: Hive
>  Issue Type: Task
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-11754.1.patch.txt, HIVE-11754.2.patch.txt
>
>
> No need to check "oi instanceof WritableConstantHiveCharObjectInspector" 
> after "oi instanceof ConstantObjectInspector".





[jira] [Commented] (HIVE-11201) HCatalog is ignoring user specified avro schema in the table definition

2015-09-08 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736076#comment-14736076
 ] 

Bing Li commented on HIVE-11201:


I submitted the review request manually.
The link is https://reviews.apache.org/r/34877/

> HCatalog  is ignoring user specified avro schema in the table definition
> 
>
> Key: HIVE-11201
> URL: https://issues.apache.org/jira/browse/HIVE-11201
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
>Priority: Critical
> Attachments: HIVE-11201.1.patch
>
>
> HCatalog ignores the user-specified Avro schema in the table definition and 
> instead generates its own Avro schema from the Hive metastore. 
> Generating its own schema results in mismatched names; for example, Avro 
> field names are case-sensitive, so the generated schema is written 
> incorrectly to the Avro file and a subsequent select fails on read. Also, 
> even if the user-specified schema does not allow null, data written through 
> HCatalog gets a schema that does allow null. 
> For example, the user specified all capital letters in the schema and the 
> record name LINEITEM. That schema should be written as-is; instead, HCatalog 
> ignores it and generates its own Avro schema from the Hive table's casing. 





[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2015-09-08 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736074#comment-14736074
 ] 

Bing Li commented on HIVE-4577:
---

I submitted a review request manually.
The link is https://reviews.apache.org/r/38199/

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch
>
>
> As designed, Hive supports hadoop dfs commands in the hive shell, e.g. 
> hive> dfs -mkdir /user/biadmin/mydir;
> but it behaves differently from hadoop when the path contains spaces or quotes:
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"



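The quoting bug in HIVE-4577 comes down to argument tokenization: the quotes should delimit arguments and be stripped, so "bei jing" stays one path component. The splitter below is a minimal illustrative sketch of that behavior, not the actual patch:

```java
import java.util.ArrayList;
import java.util.List;

// Quote-aware tokenizer: whitespace separates arguments except inside
// single or double quotes, and the quote characters themselves are dropped.
class DfsArgSplitter {
    static List<String> split(String cmd) {
        List<String> args = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char quote = 0;                        // 0 = not inside quotes
        for (char c : cmd.toCharArray()) {
            if (quote != 0) {
                if (c == quote) quote = 0;     // closing quote: drop it
                else cur.append(c);
            } else if (c == '"' || c == '\'') {
                quote = c;                     // opening quote: drop it
            } else if (Character.isWhitespace(c)) {
                if (cur.length() > 0) { args.add(cur.toString()); cur.setLength(0); }
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) args.add(cur.toString());
        return args;
    }

    public static void main(String[] args) {
        System.out.println(split("-mkdir \"bei jing\""));  // [-mkdir, bei jing]
    }
}
```

With this tokenization, `dfs -mkdir "bei jing";` would create a single directory named `bei jing` instead of the two quoted fragments shown in the report.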


[jira] [Commented] (HIVE-11745) Alter table Exchange partition with multiple partition_spec is not working

2015-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736069#comment-14736069
 ] 

Hive QA commented on HIVE-11745:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754734/HIVE-11745.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9423 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.hcatalog.streaming.TestStreaming.testTimeOutReaper
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5205/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5205/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5205/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12754734 - PreCommit-HIVE-TRUNK-Build

> Alter table Exchange partition with multiple partition_spec is not working
> --
>
> Key: HIVE-11745
> URL: https://issues.apache.org/jira/browse/HIVE-11745
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-11745.1.patch
>
>
> Exchanging a single partition works, but exchanging multiple partitions does not.
> Reproduce steps:
> {noformat}
> DROP TABLE IF EXISTS t1;
> DROP TABLE IF EXISTS t2;
> DROP TABLE IF EXISTS t3;
> DROP TABLE IF EXISTS t4;
> CREATE TABLE t1 (a int) PARTITIONED BY (d1 int);
> CREATE TABLE t2 (a int) PARTITIONED BY (d1 int);
> CREATE TABLE t3 (a int) PARTITIONED BY (d1 int, d2 int);
> CREATE TABLE t4 (a int) PARTITIONED BY (d1 int, d2 int);
> INSERT OVERWRITE TABLE t1 PARTITION (d1 = 1) SELECT salary FROM jsmall LIMIT 
> 10;
> INSERT OVERWRITE TABLE t3 PARTITION (d1 = 1, d2 = 1) SELECT salary FROM 
> jsmall LIMIT 10;
> SELECT * FROM t1;
> SELECT * FROM t3;
> ALTER TABLE t2 EXCHANGE PARTITION (d1 = 1) WITH TABLE t1;
> SELECT * FROM t1;
> SELECT * FROM t2;
> ALTER TABLE t4 EXCHANGE PARTITION (d1 = 1, d2 = 1) WITH TABLE t3;
> SELECT * FROM t3;
> SELECT * FROM t4;
> {noformat}
> The output:
> {noformat}
> 0: jdbc:hive2://10.17.74.148:1/default> SELECT * FROM t3;
> +-------+--------+--------+--+
> | t3.a  | t3.d1  | t3.d2  |
> +-------+--------+--------+--+
> +-------+--------+--------+--+
> No rows selected (0.227 seconds)
> 0: jdbc:hive2://10.17.74.148:1/default> SELECT * FROM t4;
> +-------+--------+--------+--+
> | t4.a  | t4.d1  | t4.d2  |
> +-------+--------+--------+--+
> +-------+--------+--------+--+
> No rows selected (0.266 seconds)
> {noformat}





[jira] [Commented] (HIVE-11756) Avoid redundant key serialization in RS for distinct query

2015-09-08 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736032#comment-14736032
 ] 

Swarnim Kulkarni commented on HIVE-11756:
-

[~navis] Mind doing a quick RB for this?

> Avoid redundant key serialization in RS for distinct query
> --
>
> Key: HIVE-11756
> URL: https://issues.apache.org/jira/browse/HIVE-11756
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-11756.1.patch.txt, HIVE-11756.2.patch.txt
>
>
> Currently Hive serializes the key twice to determine the length of the 
> distribution key for distinct queries. This introduces IndexedSerializer to 
> avoid the double serialization.





[jira] [Updated] (HIVE-11642) LLAP: make sure tests pass #3

2015-09-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11642:

Attachment: HIVE-11642.05.patch

Retry with the build fixed and branch-specific tests presumably enabled

> LLAP: make sure tests pass #3
> -
>
> Key: HIVE-11642
> URL: https://issues.apache.org/jira/browse/HIVE-11642
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, 
> HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, 
> HIVE-11642.patch
>
>
> Tests should pass against the most recent branch and Tez 0.8.





[jira] [Updated] (HIVE-11756) Avoid redundant key serialization in RS for distinct query

2015-09-08 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-11756:
-
Attachment: HIVE-11756.2.patch.txt

> Avoid redundant key serialization in RS for distinct query
> --
>
> Key: HIVE-11756
> URL: https://issues.apache.org/jira/browse/HIVE-11756
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-11756.1.patch.txt, HIVE-11756.2.patch.txt
>
>
> Currently Hive serializes the key twice to determine the length of the 
> distribution key for distinct queries. This introduces IndexedSerializer to 
> avoid the double serialization.



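The double serialization HIVE-11756 avoids can be illustrated as follows: serialize the key columns once and record the byte offset where the distribution key ends, rather than serializing the distribution key a second time just to learn its length. The class and field names below are hypothetical and do not reflect IndexedSerializer's real API:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch: one serialization pass, with the distribution-key length
// captured as the stream offset right after the distribution-key columns.
class KeyLengthSketch {
    static byte[] bytes;          // full serialized key
    static int distKeyLength;     // length of the distribution-key prefix

    static void serialize(String[] distKeyCols, String[] distinctCols) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DataOutputStream data = new DataOutputStream(out);
            for (String c : distKeyCols) data.writeUTF(c);
            distKeyLength = out.size();    // offset after the distribution key
            for (String c : distinctCols) data.writeUTF(c);
            bytes = out.toByteArray();
        } catch (IOException e) {          // cannot happen for an in-memory stream
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        serialize(new String[] {"k1"}, new String[] {"d1"});
        System.out.println("total=" + bytes.length + " distKeyLen=" + distKeyLength);
    }
}
```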


[jira] [Resolved] (HIVE-11767) LLAP: merge master into branch

2015-09-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-11767.
-
Resolution: Fixed

> LLAP: merge master into branch
> --
>
> Key: HIVE-11767
> URL: https://issues.apache.org/jira/browse/HIVE-11767
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: llap
>
>






[jira] [Commented] (HIVE-11614) CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after order by has problem

2015-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735962#comment-14735962
 ] 

Hive QA commented on HIVE-11614:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754729/HIVE-11614.04.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9424 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5204/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5204/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5204/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12754729 - PreCommit-HIVE-TRUNK-Build

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after 
> order by has problem
> -
>
> Key: HIVE-11614
> URL: https://issues.apache.org/jira/browse/HIVE-11614
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11614.01.patch, HIVE-11614.02.patch, 
> HIVE-11614.03.patch, HIVE-11614.04.patch
>
>






[jira] [Resolved] (HIVE-11732) LLAP: MiniLlapCluster integration broke hadoop-1 build

2015-09-08 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-11732.
--
Resolution: Fixed

Committed patch to llap branch.

> LLAP: MiniLlapCluster integration broke hadoop-1 build
> --
>
> Key: HIVE-11732
> URL: https://issues.apache.org/jira/browse/HIVE-11732
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: llap
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11732.1.patch, HIVE-11732.2.patch
>
>
> HIVE-9900 broke hadoop-1 build. Needs shimming for MiniLlapCluster.





[jira] [Updated] (HIVE-11751) hive-exec-log4j2.xml settings causes DEBUG messages to be generated and ignored

2015-09-08 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-11751:

Attachment: hive-exec-log4j2.xml
hive-log4j2.xml

Attaching the log4j2 configs from my setup (both of them are defaults and no 
changes were made).

"hive.tez.log.level" has not been changed, so it defaults to INFO. 


> hive-exec-log4j2.xml settings causes DEBUG messages to be generated and 
> ignored
> ---
>
> Key: HIVE-11751
> URL: https://issues.apache.org/jira/browse/HIVE-11751
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Prasanth Jayachandran
> Attachments: hive-exec-log4j2.xml, hive-log4j2.xml, 
> hiveserver2_log4j.png
>
>
> Setting "INFO" in 
> dist/hive/conf/hive-exec-log4j2.xml fixes the problem. Should it be made as 
> default in hive-exec-log4j2.xml? "--hiveconf hive.log.level=INFO" from 
> commandline does not have any impact.





[jira] [Commented] (HIVE-11732) LLAP: MiniLlapCluster integration broke hadoop-1 build

2015-09-08 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735951#comment-14735951
 ] 

Prasanth Jayachandran commented on HIVE-11732:
--

Created subtask HIVE-11766 and also linked the hadoop-1 removal JIRA.


> LLAP: MiniLlapCluster integration broke hadoop-1 build
> --
>
> Key: HIVE-11732
> URL: https://issues.apache.org/jira/browse/HIVE-11732
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: llap
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11732.1.patch, HIVE-11732.2.patch
>
>
> HIVE-9900 broke hadoop-1 build. Needs shimming for MiniLlapCluster.





[jira] [Commented] (HIVE-11732) LLAP: MiniLlapCluster integration broke hadoop-1 build

2015-09-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735933#comment-14735933
 ] 

Sergey Shelukhin commented on HIVE-11732:
-

OK, +1 ... is there a JIRA to remove hadoop-1? We could add a comment or 
sub-JIRA there.

> LLAP: MiniLlapCluster integration broke hadoop-1 build
> --
>
> Key: HIVE-11732
> URL: https://issues.apache.org/jira/browse/HIVE-11732
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: llap
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11732.1.patch, HIVE-11732.2.patch
>
>
> HIVE-9900 broke hadoop-1 build. Needs shimming for MiniLlapCluster.





[jira] [Commented] (HIVE-11732) LLAP: MiniLlapCluster integration broke hadoop-1 build

2015-09-08 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735919#comment-14735919
 ] 

Prasanth Jayachandran commented on HIVE-11732:
--

[~sershe] There are just 3 reflective invocations: one for getTotalMemory 
(which could be pulled into common utils with an added dependency, or 
duplicated), a second for the actual creation and launching of the daemon, and 
one for shutdown. Anyway, all of these will go away once we rip hadoop-1 out 
of master.

> LLAP: MiniLlapCluster integration broke hadoop-1 build
> --
>
> Key: HIVE-11732
> URL: https://issues.apache.org/jira/browse/HIVE-11732
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: llap
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11732.1.patch, HIVE-11732.2.patch
>
>
> HIVE-9900 broke hadoop-1 build. Needs shimming for MiniLlapCluster.





[jira] [Updated] (HIVE-11605) Incorrect results with bucket map join in tez.

2015-09-08 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-11605:
--
Attachment: (was: HIVE-11606.branch-1.patch)

> Incorrect results with bucket map join in tez.
> --
>
> Key: HIVE-11605
> URL: https://issues.apache.org/jira/browse/HIVE-11605
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.0.0, 1.2.0, 1.0.1
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>Priority: Critical
> Attachments: HIVE-11605.1.patch, HIVE-11606.2.patch
>
>
> In some cases, we aggressively try to convert to a bucket map join and this 
> ends up producing incorrect results.





[jira] [Updated] (HIVE-11606) Bucket map joins fail at hash table construction time

2015-09-08 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-11606:
--
Attachment: (was: HIVE-11606.branch-1.patch)

> Bucket map joins fail at hash table construction time
> -
>
> Key: HIVE-11606
> URL: https://issues.apache.org/jira/browse/HIVE-11606
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.0.1, 1.2.1
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-11606.1.patch, HIVE-11606.2.patch
>
>
> {code}
> info=[Error: Failure while running task:java.lang.RuntimeException: 
> java.lang.RuntimeException: java.lang.AssertionError: Capacity must be a 
> power of two
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: java.lang.AssertionError: Capacity 
> must be a power of two
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
>  
> {code}



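The "Capacity must be a power of two" assertion above is typical of hash tables that size their arrays to powers of two so a bit mask can replace the modulo when computing bucket indexes. A common guard, shown here as a hedged sketch rather than the actual Hive fix, rounds the requested capacity up before allocating:

```java
// Round a requested hash-table capacity up to the next power of two,
// so that (hash & (capacity - 1)) is a valid bucket index.
class CapacitySketch {
    static int nextPowerOfTwo(int n) {
        if (n <= 1) return 1;
        // highestOneBit(n - 1) is the largest power of two <= n - 1;
        // shifting left gives the smallest power of two >= n.
        return Integer.highestOneBit(n - 1) << 1;
    }

    public static void main(String[] args) {
        System.out.println(nextPowerOfTwo(1000));  // 1024
        System.out.println(nextPowerOfTwo(1024));  // 1024
    }
}
```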


[jira] [Commented] (HIVE-11751) hive-exec-log4j2.xml settings causes DEBUG messages to be generated and ignored

2015-09-08 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735891#comment-14735891
 ] 

Prasanth Jayachandran commented on HIVE-11751:
--

"--hiveconf hive.log.level=INFO" -> This sets the logging level only for the 
client. 

What is the value of the config hive.tez.log.level?

> hive-exec-log4j2.xml settings causes DEBUG messages to be generated and 
> ignored
> ---
>
> Key: HIVE-11751
> URL: https://issues.apache.org/jira/browse/HIVE-11751
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Prasanth Jayachandran
> Attachments: hiveserver2_log4j.png
>
>
> Setting "INFO" in 
> dist/hive/conf/hive-exec-log4j2.xml fixes the problem. Should it be made as 
> default in hive-exec-log4j2.xml? "--hiveconf hive.log.level=INFO" from 
> commandline does not have any impact.





[jira] [Comment Edited] (HIVE-11751) hive-exec-log4j2.xml settings causes DEBUG messages to be generated and ignored

2015-09-08 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735885#comment-14735885
 ] 

Prasanth Jayachandran edited comment on HIVE-11751 at 9/9/15 12:22 AM:
---

[~rajesh.balamohan] hive-log4j2.xml controls the client side logging (hive 
client, hadoop ipc client ) and hive-exec-log4j2.xml controls the server side 
logging. Can you try setting hive.log.level=INFO on both xml files and see if 
it solves the issue? With both being set, I am not seeing any DEBUG logs.


was (Author: prasanth_j):
[~rajesh.balamohan] hive-log4j2.xml controls the client side logging (hive 
client only) and hive-exec-log4j2.xml controls the server side logging. Can you 
try setting hive.log.level=INFO on both xml files and see if it solves the 
issue? With both being set, I am not seeing any DEBUG logs.

> hive-exec-log4j2.xml settings causes DEBUG messages to be generated and 
> ignored
> ---
>
> Key: HIVE-11751
> URL: https://issues.apache.org/jira/browse/HIVE-11751
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Prasanth Jayachandran
> Attachments: hiveserver2_log4j.png
>
>
> Setting "INFO" in 
> dist/hive/conf/hive-exec-log4j2.xml fixes the problem. Should it be made as 
> default in hive-exec-log4j2.xml? "--hiveconf hive.log.level=INFO" from 
> commandline does not have any impact.





[jira] [Comment Edited] (HIVE-11751) hive-exec-log4j2.xml settings causes DEBUG messages to be generated and ignored

2015-09-08 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735885#comment-14735885
 ] 

Prasanth Jayachandran edited comment on HIVE-11751 at 9/9/15 12:19 AM:
---

[~rajesh.balamohan] hive-log4j2.xml controls the client side logging (hive 
client only) and hive-exec-log4j2.xml controls the server side logging. Can you 
try setting hive.log.level=INFO on both xml files and see if it solves the 
issue? With both being set, I am not seeing any DEBUG logs.


was (Author: prasanth_j):
[~rajesh.balamohan] hive-log4j2.xml controls the client side logging (hive 
client, tez client, dfs client etc.) and hive-exec-log4j2.xml controls the 
server side logging. Can you try setting hive.log.level=INFO on both xml files 
and see if it solves the issue? With both being set, I am not seeing any DEBUG 
logs.

> hive-exec-log4j2.xml settings causes DEBUG messages to be generated and 
> ignored
> ---
>
> Key: HIVE-11751
> URL: https://issues.apache.org/jira/browse/HIVE-11751
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Prasanth Jayachandran
> Attachments: hiveserver2_log4j.png
>
>
> Setting "INFO" in 
> dist/hive/conf/hive-exec-log4j2.xml fixes the problem. Should it be made as 
> default in hive-exec-log4j2.xml? "--hiveconf hive.log.level=INFO" from 
> commandline does not have any impact.





[jira] [Commented] (HIVE-11751) hive-exec-log4j2.xml settings causes DEBUG messages to be generated and ignored

2015-09-08 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735885#comment-14735885
 ] 

Prasanth Jayachandran commented on HIVE-11751:
--

[~rajesh.balamohan] hive-log4j2.xml controls the client side logging (hive 
client, tez client, dfs client etc.) and hive-exec-log4j2.xml controls the 
server side logging. Can you try setting hive.log.level=INFO on both xml files 
and see if it solves the issue? With both being set, I am not seeing any DEBUG 
logs.

> hive-exec-log4j2.xml settings causes DEBUG messages to be generated and 
> ignored
> ---
>
> Key: HIVE-11751
> URL: https://issues.apache.org/jira/browse/HIVE-11751
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Prasanth Jayachandran
> Attachments: hiveserver2_log4j.png
>
>
> Setting "INFO" in 
> dist/hive/conf/hive-exec-log4j2.xml fixes the problem. Should it be made as 
> default in hive-exec-log4j2.xml? "--hiveconf hive.log.level=INFO" from 
> commandline does not have any impact.





[jira] [Commented] (HIVE-11732) LLAP: MiniLlapCluster integration broke hadoop-1 build

2015-09-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735864#comment-14735864
 ] 

Sergey Shelukhin commented on HIVE-11732:
-

Hmm it looks like the same or similar patch?

> LLAP: MiniLlapCluster integration broke hadoop-1 build
> --
>
> Key: HIVE-11732
> URL: https://issues.apache.org/jira/browse/HIVE-11732
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: llap
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11732.1.patch, HIVE-11732.2.patch
>
>
> HIVE-9900 broke hadoop-1 build. Needs shimming for MiniLlapCluster.





[jira] [Commented] (HIVE-4243) Fix column names in FileSinkOperator

2015-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735860#comment-14735860
 ] 

Hive QA commented on HIVE-4243:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754724/HIVE-4243.patch

{color:red}ERROR:{color} -1 due to 30 failed/errored test(s), 9420 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial_ndv
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_tez2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketsortoptimize_insert_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby6_map
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_casesensitive
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_dyn_part14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_outer_join4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_ppr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_decimal_mapjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_part
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_ptf
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5203/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5203/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5203/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 30 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12754724 - PreCommit-HIVE-TRUNK-Build

> Fix column names in FileSinkOperator
> 
>
> Key: HIVE-4243
> URL: https://issues.apache.org/jira/browse/HIVE-4243
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-4243.patch
>
>
> All of the ObjectInspectors given to SerDes by FileSinkOperator have virtual 
> column names. Since the files are part of tables, Hive knows the real column 
> names. For self-describing file formats like ORC, using the real column 
> names improves understandability.





[jira] [Updated] (HIVE-11711) Merge hbase-metastore branch to trunk

2015-09-08 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-11711:
--
Attachment: HIVE-11711.7.patch

> Merge hbase-metastore branch to trunk
> -
>
> Key: HIVE-11711
> URL: https://issues.apache.org/jira/browse/HIVE-11711
> Project: Hive
>  Issue Type: Sub-task
>  Components: HBase Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 2.0.0
>
> Attachments: HIVE-11711.1.patch, HIVE-11711.2.patch, 
> HIVE-11711.3.patch, HIVE-11711.4.patch, HIVE-11711.5.patch, 
> HIVE-11711.6.patch, HIVE-11711.7.patch
>
>
> Major development of hbase-metastore is done and it's time to merge the 
> branch back into master.
> Currently hbase-metastore is only invoked when running TestMiniTezCliDriver. 
> The instruction for setting up hbase-metastore is captured in 
> https://cwiki.apache.org/confluence/display/Hive/HBaseMetastoreDevelopmentGuide.





[jira] [Updated] (HIVE-11645) Add in-place updates for dynamic partitions loading

2015-09-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-11645:

Attachment: HIVE-11645.2.patch

Addressed review comments.

> Add in-place updates for dynamic partitions loading
> ---
>
> Key: HIVE-11645
> URL: https://issues.apache.org/jira/browse/HIVE-11645
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-11645.2.patch, HIVE-11645.2.patch, 
> HIVE-11645.3.patch, HIVE-11645.patch
>
>
> Currently, updates go to log file and on console there is no visible progress.





[jira] [Updated] (HIVE-11764) Verify the correctness of groupby_cube1.q with MR, Tez and Spark Mode with HIVE-11110

2015-09-08 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11764:
-
Summary: Verify the correctness of groupby_cube1.q with MR, Tez and Spark 
Mode with HIVE-11110  (was: Verify the correctness of groupby_cube1.q with MR, 
Tez and Spark Mode with HIVE-1110)

> Verify the correctness of groupby_cube1.q with MR, Tez and Spark Mode with 
> HIVE-11110
> -
>
> Key: HIVE-11764
> URL: https://issues.apache.org/jira/browse/HIVE-11764
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
>
> While working on HIVE-11110, I ran into the following wrong results:
> https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/spark/groupby_cube1.q.out#L478
> This happens in spark mode. The following is the diff.
> @@ -475,7 +525,6 @@ POSTHOOK: Input: default@t1
>  3  1
>  7  1
>  8  2
> -NULL   6
> The purpose of this jira is to see why the above query is failing.





[jira] [Updated] (HIVE-11705) refactor SARG stripe filtering for ORC into a method

2015-09-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11705:

Attachment: HIVE-11705.02.patch

Addressed the RB feedback.

> refactor SARG stripe filtering for ORC into a method
> 
>
> Key: HIVE-11705
> URL: https://issues.apache.org/jira/browse/HIVE-11705
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11705.01.patch, HIVE-11705.02.patch, 
> HIVE-11705.patch
>
>
> For footer cache PPD to metastore, we'd need a method to do the PPD. Tiny 
> item to create it on OrcInputFormat.
> For metastore path, these methods will be called from expression proxy 
> similar to current objectstore expr filtering; it will change to have 
> serialized sarg and column list to come from request instead of conf; 
> includedCols/etc. will also come from request instead of assorted java 
> objects. 
> The types and stripe stats will need to be extracted from HBase. This is a 
> little bit of a problem, since ideally we want to be inside HBase 
> filter/coprocessor; I'd need to take a look to see if this is possible... 
> since that filter would need to either deserialize orc, or we would need to 
> store types and stats information in some other, non-ORC manner on write. The 
> latter is probably a better idea, although it's dangerous because there's no 
> sync between this code and ORC itself.
> Meanwhile minimize dependencies for stripe picking to essentials (and conf 
> which is easy to remove).





[jira] [Updated] (HIVE-11765) SMB Join fails in Hive 1.2

2015-09-08 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Yang updated HIVE-11765:
---
Description: 
SMB join on Hive 1.2 fails with the following stack trace :
{code}
java.io.IOException: java.lang.reflect.InvocationTargetException
at
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:213)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:333)
at
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:719)
at
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:173)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:252)
... 11 more
Caused by: java.lang.IndexOutOfBoundsException: toIndex = 5
at java.util.ArrayList.subListRangeCheck(ArrayList.java:1004)
at java.util.ArrayList.subList(ArrayList.java:996)
at
org.apache.hadoop.hive.ql.io.orc.RecordReaderFactory.getSchemaOnRead(RecordReaderFactory.java:161)
at
org.apache.hadoop.hive.ql.io.orc.RecordReaderFactory.createTreeReader(RecordReaderFactory.java:66)
at
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:202)
at
org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:539)
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:230)
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.<init>(OrcInputFormat.java:163)
at
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1104)
at
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67)

{code}

This error happens after adding the patch of HIVE-10591. Reverting HIVE-10591 
fixes this exception. 

Steps to reproduce:
{code}
SET hive.enforce.sorting=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition=true;
SET mapreduce.reduce.import.limit=-1;
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;
SET hive.auto.convert.join=true;
SET hive.auto.convert.sortmerge.join=true;

create Table table1 (empID int, name varchar(64), email varchar(64), company 
varchar(64), age int) clustered by (age) sorted by (age ASC) INTO 384 buckets 
stored as ORC;

create Table table2 (empID int, name varchar(64), email varchar(64), company 
varchar(64), age int) clustered by (age) sorted by (age ASC) into 384 buckets 
stored as ORC;

create Table table_tmp (empID int, name varchar(64), email varchar(64), company 
varchar(64), age int);

load data local inpath '/tmp/employee.csv' into table table_tmp;

INSERT OVERWRITE table  table1 select * from table_tmp;
INSERT OVERWRITE table  table2 select * from table_tmp;

SELECT table1.age, table2.age from table1 inner join table2 on 
table1.age=table2.age;
{code}
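The "toIndex = 5" IndexOutOfBoundsException in the stack trace comes from Java's ArrayList.subList range check: the plan-side schema asks for more columns than the ORC file's type list actually contains. A toy sketch of that failure mode (illustrative code, not Hive's RecordReaderFactory):

```python
# Toy reproduction of the failure mode (not Hive code): the planner-side
# schema claims more columns than the ORC file's type list holds, and the
# equivalent of ArrayList.subList's range check rejects it with "toIndex = 5".

def schema_on_read(file_types, num_cols):
    if num_cols > len(file_types):
        raise IndexError("toIndex = %d" % num_cols)
    return file_types[:num_cols]

file_types = ["int", "varchar", "varchar"]  # columns actually in the file
try:
    schema_on_read(file_types, 5)           # plan expects 5 columns
except IndexError as e:
    print(e)                                # toIndex = 5
```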

  was:
SMB join on Hive 1.2 fails with the following stack trace :
{code}
java.io.IOException: java.lang.reflect.InvocationTargetException
at
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:266)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:213)
at
org.apache.hadoop.hive.shims.HadoopShimsSecure$Co

[jira] [Commented] (HIVE-11110) Reorder applyPreJoinOrderingTransforms, add NotNULL/FilterMerge rules, improve Filter selectivity estimation

2015-09-08 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735779#comment-14735779
 ] 

Laljo John Pullokkaran commented on HIVE-11110:
---

When a predicate involves deterministic & non deterministic udfs, the 
deterministic pieces needs to be pulled out.

Example: select a.* from srcpart a where rand(1) < 0.1 and a.ds = '2008-04-08' 
and not(key > 50 or key < 10) and a.hr like '%2';
This should be rewritten to push a.ds = '2008-04-08' and not(key > 50 or key < 
10) and a.hr like '%2';
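
A toy sketch of the conjunct-splitting idea (hypothetical helper, not Hive's optimizer code, which would consult the UDF's deterministic annotation): the deterministic conjuncts of an AND-ed predicate are safe to push down, while non-deterministic ones must stay where the user wrote them.

```python
# Illustrative only: Hive decides determinism from GenericUDF annotations;
# here a hard-coded set of non-deterministic UDF names stands in for that.
NON_DETERMINISTIC_UDFS = {"rand"}

def is_deterministic(conjunct):
    return not any(udf + "(" in conjunct for udf in NON_DETERMINISTIC_UDFS)

def split_conjuncts(conjuncts):
    pushable = [c for c in conjuncts if is_deterministic(c)]
    residual = [c for c in conjuncts if not is_deterministic(c)]
    return pushable, residual

pushable, residual = split_conjuncts([
    "rand(1) < 0.1",
    "a.ds = '2008-04-08'",
    "not(key > 50 or key < 10)",
    "a.hr like '%2'",
])
print("push down:", " and ".join(pushable))
print("keep in place:", " and ".join(residual))
```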

> Reorder applyPreJoinOrderingTransforms, add NotNULL/FilterMerge rules, 
> improve Filter selectivity estimation
> 
>
> Key: HIVE-11110
> URL: https://issues.apache.org/jira/browse/HIVE-11110
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-11110-branch-1.2.patch, HIVE-11110.1.patch, 
> HIVE-11110.2.patch, HIVE-11110.4.patch, HIVE-11110.5.patch, 
> HIVE-11110.6.patch, HIVE-11110.7.patch, HIVE-11110.8.patch, 
> HIVE-11110.9.patch, HIVE-11110.91.patch, HIVE-11110.patch
>
>
> Query
> {code}
> select  count(*)
>  from store_sales
>  ,store_returns
>  ,date_dim d1
>  ,date_dim d2
>  where d1.d_quarter_name = '2000Q1'
>and d1.d_date_sk = ss_sold_date_sk
>and ss_customer_sk = sr_customer_sk
>and ss_item_sk = sr_item_sk
>and ss_ticket_number = sr_ticket_number
>and sr_returned_date_sk = d2.d_date_sk
>and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3');
> {code}
> The store_sales table is partitioned on ss_sold_date_sk, which is also used 
> in a join clause. The join clause should add a filter "filterExpr: 
> ss_sold_date_sk is not null", which should get pushed to the MetaStore when 
> fetching the stats. Currently this is not done in CBO planning, which results 
> in the stats from __HIVE_DEFAULT_PARTITION__ to be fetched and considered in 
> the optimization phase. In particular, this increases the NDV for the join 
> columns and may result in wrong planning.
> Including HiveJoinAddNotNullRule in the optimization phase solves this issue.
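
A toy sketch (not Hive's rule implementation) of what HiveJoinAddNotNullRule contributes: derive an "is not null" filter for each equi-join key, so that stats fetching and partition pruning can skip NULL partitions such as __HIVE_DEFAULT_PARTITION__.

```python
# Illustrative: for each (lhs, rhs) equi-join key pair, emit the pair of
# "is not null" conjuncts that the rule would add below the join.
def not_null_filters(equi_join_keys):
    filters = []
    for lhs, rhs in equi_join_keys:
        filters.append(lhs + " is not null")
        filters.append(rhs + " is not null")
    return filters

keys = [("d1.d_date_sk", "ss_sold_date_sk"),
        ("ss_customer_sk", "sr_customer_sk")]
print(" and ".join(not_null_filters(keys)))
```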





[jira] [Assigned] (HIVE-11110) Reorder applyPreJoinOrderingTransforms, add NotNULL/FilterMerge rules, improve Filter selectivity estimation

2015-09-08 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran reassigned HIVE-11110:
-

Assignee: Laljo John Pullokkaran  (was: Hari Sankar Sivarama Subramaniyan)

> Reorder applyPreJoinOrderingTransforms, add NotNULL/FilterMerge rules, 
> improve Filter selectivity estimation
> 
>
> Key: HIVE-11110
> URL: https://issues.apache.org/jira/browse/HIVE-11110
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-11110-branch-1.2.patch, HIVE-11110.1.patch, 
> HIVE-11110.2.patch, HIVE-11110.4.patch, HIVE-11110.5.patch, 
> HIVE-11110.6.patch, HIVE-11110.7.patch, HIVE-11110.8.patch, 
> HIVE-11110.9.patch, HIVE-11110.91.patch, HIVE-11110.patch
>
>
> Query
> {code}
> select  count(*)
>  from store_sales
>  ,store_returns
>  ,date_dim d1
>  ,date_dim d2
>  where d1.d_quarter_name = '2000Q1'
>and d1.d_date_sk = ss_sold_date_sk
>and ss_customer_sk = sr_customer_sk
>and ss_item_sk = sr_item_sk
>and ss_ticket_number = sr_ticket_number
>and sr_returned_date_sk = d2.d_date_sk
>and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3');
> {code}
> The store_sales table is partitioned on ss_sold_date_sk, which is also used 
> in a join clause. The join clause should add a filter "filterExpr: 
> ss_sold_date_sk is not null", which should get pushed to the MetaStore when 
> fetching the stats. Currently this is not done in CBO planning, which results 
> in the stats from __HIVE_DEFAULT_PARTITION__ to be fetched and considered in 
> the optimization phase. In particular, this increases the NDV for the join 
> columns and may result in wrong planning.
> Including HiveJoinAddNotNullRule in the optimization phase solves this issue.





[jira] [Updated] (HIVE-11732) LLAP: MiniLlapCluster integration broke hadoop-1 build

2015-09-08 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11732:
-
Attachment: HIVE-11732.2.patch

Removed most of the reflections.

> LLAP: MiniLlapCluster integration broke hadoop-1 build
> --
>
> Key: HIVE-11732
> URL: https://issues.apache.org/jira/browse/HIVE-11732
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: llap
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11732.1.patch, HIVE-11732.2.patch
>
>
> HIVE-9900 broke hadoop-1 build. Needs shimming for MiniLlapCluster.





[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-09-08 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735774#comment-14735774
 ] 

Wei Zheng commented on HIVE-11587:
--

Agree. If anything pops up for regular mapjoin in the future, we can always 
adjust that param.

Can you please commit the patch to master? Thanks!

> Fix memory estimates for mapjoin hashtable
> --
>
> Key: HIVE-11587
> URL: https://issues.apache.org/jira/browse/HIVE-11587
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch, 
> HIVE-11587.03.patch, HIVE-11587.04.patch, HIVE-11587.05.patch, 
> HIVE-11587.06.patch, HIVE-11587.07.patch, HIVE-11587.08.patch
>
>
> Due to the legacy in in-memory mapjoin and conservative planning, the memory 
> estimation code for mapjoin hashtable is currently not very good. It 
> allocates the probe erring on the side of more memory, not taking data into 
> account because unlike the probe, it's free to resize, so it's better for 
> perf to allocate big probe and hope for the best with regard to future data 
> size. It is not true for hybrid case.
> There's code to cap the initial allocation based on memory available 
> (memUsage argument), but due to some code rot, the memory estimates from 
> planning are not even passed to hashtable anymore (there used to be two 
> config settings, hashjoin size fraction by itself, or hashjoin size fraction 
> for group by case), so it never caps the memory anymore below 1 Gb. 
> Initial capacity is estimated from input key count, and in hybrid join cache 
> can exceed Java memory due to number of segments.
> There needs to be a review and fix of all this code.
> Suggested improvements:
> 1) Make sure "initialCapacity" argument from Hybrid case is correct given the 
> number of segments. See how it's calculated from keys for regular case; it 
> needs to be adjusted accordingly for hybrid case if not done already.
> 1.5) Note that, knowing the number of rows, the maximum capacity one will 
> ever need for probe size (in longs) is row count (assuming key per row, i.e. 
> maximum possible number of keys) divided by load factor, plus some very small 
> number to round up. That is for flat case. For hybrid case it may be more 
> complex due to skew, but that is still a good upper bound for the total probe 
> capacity of all segments.
> 2) Rename memUsage to maxProbeSize, or something, make sure it's passed 
> correctly based on estimates that take into account both probe and data size, 
> esp. in hybrid case.
> 3) Make sure that memory estimation for hybrid case also doesn't come up with 
> numbers that are too small, like 1-byte hashtable. I am not very familiar 
> with that code but it has happened in the past.
> Other issues we have seen:
> 4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you 
> should not allocate large array in advance. Even if some estimate passes 
> 500Mb or 40Mb or whatever, it doesn't make sense to allocate that.
> 5) For hybrid, don't pre-allocate WBs - only allocate on write.
> 6) Change everywhere rounding up to power of two is used to rounding down, at 
> least for hybrid case (?)
> I wanted to put all of these items in single JIRA so we could keep track of 
> fixing all of them.
> I think there are JIRAs for some of these already, feel free to link them to 
> this one.
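
The bound in point 1.5 is simple arithmetic. A worked sketch with illustrative numbers (one key per row, probe entries stored as 8-byte longs):

```python
import math

# Upper bound on probe capacity from point 1.5: with at most one key per
# row, the probe array never needs more slots than ceil(rows / load_factor).
def max_probe_slots(row_count, load_factor=0.75):
    return math.ceil(row_count / load_factor)

rows = 1_000_000
slots = max_probe_slots(rows)
print(slots, "slots,", slots * 8, "bytes upper bound")  # 8 bytes per long
```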





[jira] [Commented] (HIVE-11727) Hive on Tez through Oozie: Some queries fail with fnf exception

2015-09-08 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735773#comment-14735773
 ] 

Vikram Dixit K commented on HIVE-11727:
---

LGTM +1.

> Hive on Tez through Oozie: Some queries fail with fnf exception
> ---
>
> Key: HIVE-11727
> URL: https://issues.apache.org/jira/browse/HIVE-11727
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-11727.1.patch
>
>
> When we read back row containers from disk, a misconfiguration causes us to 
> look for a non-existing file.
> {noformat}
> Caused by: java.io.FileNotFoundException: File 
> file:/grid/0/hadoop/yarn/local/usercache/appcache/application_1440685000561_0028/container_e26_1440685000561_0028_01_05/container_tokens
>  does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:608)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:821)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:598)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:414)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:169)
>   ... 31 more
> {noformat}





[jira] [Updated] (HIVE-11684) Implement limit pushdown through outer join in CBO

2015-09-08 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11684:
---
Attachment: HIVE-11684.04.patch

> Implement limit pushdown through outer join in CBO
> --
>
> Key: HIVE-11684
> URL: https://issues.apache.org/jira/browse/HIVE-11684
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11684.01.patch, HIVE-11684.02.patch, 
> HIVE-11684.03.patch, HIVE-11684.04.patch, HIVE-11684.patch
>
>






[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-09-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735732#comment-14735732
 ] 

Sergey Shelukhin commented on HIVE-11587:
-

Should be ok for regular join for now I guess

> Fix memory estimates for mapjoin hashtable
> --
>
> Key: HIVE-11587
> URL: https://issues.apache.org/jira/browse/HIVE-11587
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch, 
> HIVE-11587.03.patch, HIVE-11587.04.patch, HIVE-11587.05.patch, 
> HIVE-11587.06.patch, HIVE-11587.07.patch, HIVE-11587.08.patch
>
>
> Due to the legacy in in-memory mapjoin and conservative planning, the memory 
> estimation code for mapjoin hashtable is currently not very good. It 
> allocates the probe erring on the side of more memory, not taking data into 
> account because unlike the probe, it's free to resize, so it's better for 
> perf to allocate big probe and hope for the best with regard to future data 
> size. It is not true for hybrid case.
> There's code to cap the initial allocation based on memory available 
> (memUsage argument), but due to some code rot, the memory estimates from 
> planning are not even passed to hashtable anymore (there used to be two 
> config settings, hashjoin size fraction by itself, or hashjoin size fraction 
> for group by case), so it never caps the memory anymore below 1 Gb. 
> Initial capacity is estimated from input key count, and in hybrid join cache 
> can exceed Java memory due to number of segments.
> There needs to be a review and fix of all this code.
> Suggested improvements:
> 1) Make sure "initialCapacity" argument from Hybrid case is correct given the 
> number of segments. See how it's calculated from keys for regular case; it 
> needs to be adjusted accordingly for hybrid case if not done already.
> 1.5) Note that, knowing the number of rows, the maximum capacity one will 
> ever need for probe size (in longs) is row count (assuming key per row, i.e. 
> maximum possible number of keys) divided by load factor, plus some very small 
> number to round up. That is for flat case. For hybrid case it may be more 
> complex due to skew, but that is still a good upper bound for the total probe 
> capacity of all segments.
> 2) Rename memUsage to maxProbeSize, or something, make sure it's passed 
> correctly based on estimates that take into account both probe and data size, 
> esp. in hybrid case.
> 3) Make sure that memory estimation for hybrid case also doesn't come up with 
> numbers that are too small, like 1-byte hashtable. I am not very familiar 
> with that code but it has happened in the past.
> Other issues we have seen:
> 4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you 
> should not allocate large array in advance. Even if some estimate passes 
> 500Mb or 40Mb or whatever, it doesn't make sense to allocate that.
> 5) For hybrid, don't pre-allocate WBs - only allocate on write.
> 6) Change everywhere rounding up to power of two is used to rounding down, at 
> least for hybrid case (?)
> I wanted to put all of these items in single JIRA so we could keep track of 
> fixing all of them.
> I think there are JIRAs for some of these already, feel free to link them to 
> this one.





[jira] [Commented] (HIVE-4243) Fix column names in FileSinkOperator

2015-09-08 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735730#comment-14735730
 ] 

Owen O'Malley commented on HIVE-4243:
-

I've uploaded it to review board https://reviews.apache.org/r/38190/

> Fix column names in FileSinkOperator
> 
>
> Key: HIVE-4243
> URL: https://issues.apache.org/jira/browse/HIVE-4243
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-4243.patch
>
>
> All of the ObjectInspectors given to SerDe's by FileSinkOperator have virtual 
> column names. Since the files are part of tables, Hive knows the column 
> names. For self-describing file formats like ORC, having the real column 
> names will improve the understandability.





[jira] [Commented] (HIVE-11684) Implement limit pushdown through outer join in CBO

2015-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735688#comment-14735688
 ] 

Hive QA commented on HIVE-11684:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754712/HIVE-11684.03.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9424 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_tez_dynpart_hashjoin_3
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5202/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5202/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5202/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12754712 - PreCommit-HIVE-TRUNK-Build

> Implement limit pushdown through outer join in CBO
> --
>
> Key: HIVE-11684
> URL: https://issues.apache.org/jira/browse/HIVE-11684
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11684.01.patch, HIVE-11684.02.patch, 
> HIVE-11684.03.patch, HIVE-11684.patch
>
>






[jira] [Updated] (HIVE-11755) Incorrect method called with Kerberos enabled in AccumuloStorageHandler

2015-09-08 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-11755:
--
Attachment: HIVE-11755.003.patch

Extra testing noticed another case where the IllegalStateException was 
filtering up from the Accumulo API unexpectedly. Added some more unit tests for 
this scenario and reran unit and manual testing with success.
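
The "can only be set once per job" contract at the root of this bug can be sketched as a set-once guard (hypothetical class, not the Accumulo API): a second call against the same job configuration raises rather than silently overwriting credentials.

```python
# Illustrative set-once guard in the spirit of Accumulo's ConfiguratorBase;
# class and method names here are made up for the sketch.
class AccumuloJobConf:
    def __init__(self):
        self._connector_info_set = False

    def set_connector_info(self, principal, token):
        if self._connector_info_set:
            raise RuntimeError("Connector info for AccumuloOutputFormat "
                               "can only be set once per job")
        self._connector_info_set = True
        self._principal, self._token = principal, token

conf = AccumuloJobConf()
conf.set_connector_info("hive@EXAMPLE.COM", "token-bytes")   # first call: ok
try:
    conf.set_connector_info("hive@EXAMPLE.COM", "token-bytes")  # second: fails
except RuntimeError as e:
    print(e)
```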

> Incorrect method called with Kerberos enabled in AccumuloStorageHandler
> ---
>
> Key: HIVE-11755
> URL: https://issues.apache.org/jira/browse/HIVE-11755
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 1.2.2
>
> Attachments: HIVE-11755.001.patch, HIVE-11755.002.patch, 
> HIVE-11755.003.patch
>
>
> The following exception was noticed in testing out the 
> AccumuloStorageHandler's OutputFormat:
> {noformat}
> java.lang.IllegalStateException: Connector info for AccumuloOutputFormat can 
> only be set once per job
>   at 
> org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase.setConnectorInfo(ConfiguratorBase.java:146)
>   at 
> org.apache.accumulo.core.client.mapred.AccumuloOutputFormat.setConnectorInfo(AccumuloOutputFormat.java:125)
>   at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.configureAccumuloOutputFormat(HiveAccumuloTableOutputFormat.java:95)
>   at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.checkOutputSpecs(HiveAccumuloTableOutputFormat.java:51)
>   at 
> org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat.checkOutputSpecs(HivePassThroughOutputFormat.java:46)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.checkOutputSpecs(FileSinkOperator.java:1124)
>   at 
> org.apache.hadoop.hive.ql.io.HiveOutputFormatImpl.checkOutputSpecs(HiveOutputFormatImpl.java:67)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:431)
>   at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
>   at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
>   at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>   Job Submission failed with exception 
> 'java.lang.IllegalStateException(Connector info for AccumuloOutputFormat can 
>

[jira] [Updated] (HIVE-11745) Alter table Exchange partition with multiple partition_spec is not working

2015-09-08 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-11745:

Attachment: HIVE-11745.1.patch

> Alter table Exchange partition with multiple partition_spec is not working
> --
>
> Key: HIVE-11745
> URL: https://issues.apache.org/jira/browse/HIVE-11745
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-11745.1.patch
>
>
> Single partition works, but multiple partitions will not work.
> Reproduce steps:
> {noformat}
> DROP TABLE IF EXISTS t1;
> DROP TABLE IF EXISTS t2;
> DROP TABLE IF EXISTS t3;
> DROP TABLE IF EXISTS t4;
> CREATE TABLE t1 (a int) PARTITIONED BY (d1 int);
> CREATE TABLE t2 (a int) PARTITIONED BY (d1 int);
> CREATE TABLE t3 (a int) PARTITIONED BY (d1 int, d2 int);
> CREATE TABLE t4 (a int) PARTITIONED BY (d1 int, d2 int);
> INSERT OVERWRITE TABLE t1 PARTITION (d1 = 1) SELECT salary FROM jsmall LIMIT 
> 10;
> INSERT OVERWRITE TABLE t3 PARTITION (d1 = 1, d2 = 1) SELECT salary FROM 
> jsmall LIMIT 10;
> SELECT * FROM t1;
> SELECT * FROM t3;
> ALTER TABLE t2 EXCHANGE PARTITION (d1 = 1) WITH TABLE t1;
> SELECT * FROM t1;
> SELECT * FROM t2;
> ALTER TABLE t4 EXCHANGE PARTITION (d1 = 1, d2 = 1) WITH TABLE t3;
> SELECT * FROM t3;
> SELECT * FROM t4;
> {noformat}
> The output:
> {noformat}
> 0: jdbc:hive2://10.17.74.148:1/default> SELECT * FROM t3;
> +-------+--------+--------+
> | t3.a  | t3.d1  | t3.d2  |
> +-------+--------+--------+
> +-------+--------+--------+
> No rows selected (0.227 seconds)
> 0: jdbc:hive2://10.17.74.148:1/default> SELECT * FROM t4;
> +-------+--------+--------+
> | t4.a  | t4.d1  | t4.d2  |
> +-------+--------+--------+
> +-------+--------+--------+
> No rows selected (0.266 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11745) Alter table Exchange partition with multiple partition_spec is not working

2015-09-08 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-11745:

Affects Version/s: 2.0.0
   1.2.0

> Alter table Exchange partition with multiple partition_spec is not working
> --
>
> Key: HIVE-11745
> URL: https://issues.apache.org/jira/browse/HIVE-11745
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-11745.1.patch
>
>
> Single partition works, but multiple partitions will not work.
> Reproduce steps:
> {noformat}
> DROP TABLE IF EXISTS t1;
> DROP TABLE IF EXISTS t2;
> DROP TABLE IF EXISTS t3;
> DROP TABLE IF EXISTS t4;
> CREATE TABLE t1 (a int) PARTITIONED BY (d1 int);
> CREATE TABLE t2 (a int) PARTITIONED BY (d1 int);
> CREATE TABLE t3 (a int) PARTITIONED BY (d1 int, d2 int);
> CREATE TABLE t4 (a int) PARTITIONED BY (d1 int, d2 int);
> INSERT OVERWRITE TABLE t1 PARTITION (d1 = 1) SELECT salary FROM jsmall LIMIT 
> 10;
> INSERT OVERWRITE TABLE t3 PARTITION (d1 = 1, d2 = 1) SELECT salary FROM 
> jsmall LIMIT 10;
> SELECT * FROM t1;
> SELECT * FROM t3;
> ALTER TABLE t2 EXCHANGE PARTITION (d1 = 1) WITH TABLE t1;
> SELECT * FROM t1;
> SELECT * FROM t2;
> ALTER TABLE t4 EXCHANGE PARTITION (d1 = 1, d2 = 1) WITH TABLE t3;
> SELECT * FROM t3;
> SELECT * FROM t4;
> {noformat}
> The output:
> {noformat}
> 0: jdbc:hive2://10.17.74.148:1/default> SELECT * FROM t3;
> +-------+--------+--------+
> | t3.a  | t3.d1  | t3.d2  |
> +-------+--------+--------+
> +-------+--------+--------+
> No rows selected (0.227 seconds)
> 0: jdbc:hive2://10.17.74.148:1/default> SELECT * FROM t4;
> +-------+--------+--------+
> | t4.a  | t4.d1  | t4.d2  |
> +-------+--------+--------+
> +-------+--------+--------+
> No rows selected (0.266 seconds)
> {noformat}





[jira] [Commented] (HIVE-11745) Alter table Exchange partition with multiple partition_spec is not working

2015-09-08 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735603#comment-14735603
 ] 

Yongzhi Chen commented on HIVE-11745:
-

The issue cannot be reproduced in a local setup, so it works fine with the local 
file system. It can be reproduced in cluster mode: it seems HDFS cannot mkdir or 
rename a folder whose parent folder does not exist. The fix creates the needed 
parent folders first. Attached the patch; the minimr test fails without the fix. 
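A minimal local-filesystem sketch of that fix (class and paths are hypothetical; the real patch works against the HDFS FileSystem API, where a rename into a missing parent simply fails):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RenameParentDemo {

    // Mirror of the fix: create the destination's parent chain before the
    // rename, since renaming into a nonexistent parent directory fails.
    static boolean renameCreatingParents(Path src, Path dest) throws IOException {
        Files.createDirectories(dest.getParent());
        Files.move(src, dest);
        return Files.exists(dest);
    }

    // Builds a throwaway layout resembling the exchanged partition directories.
    static boolean demo() {
        try {
            Path tmp = Files.createTempDirectory("exchange-demo");
            Path src = Files.createDirectory(tmp.resolve("src-d1=1"));
            // Parent "t4/d1=1" of the destination does not exist yet.
            Path dest = tmp.resolve("t4").resolve("d1=1").resolve("d2=1");
            return renameCreatingParents(src, dest);
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("rename with parents created: " + demo());
    }
}
```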

> Alter table Exchange partition with multiple partition_spec is not working
> --
>
> Key: HIVE-11745
> URL: https://issues.apache.org/jira/browse/HIVE-11745
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.1.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>
> Single partition works, but multiple partitions will not work.
> Reproduce steps:
> {noformat}
> DROP TABLE IF EXISTS t1;
> DROP TABLE IF EXISTS t2;
> DROP TABLE IF EXISTS t3;
> DROP TABLE IF EXISTS t4;
> CREATE TABLE t1 (a int) PARTITIONED BY (d1 int);
> CREATE TABLE t2 (a int) PARTITIONED BY (d1 int);
> CREATE TABLE t3 (a int) PARTITIONED BY (d1 int, d2 int);
> CREATE TABLE t4 (a int) PARTITIONED BY (d1 int, d2 int);
> INSERT OVERWRITE TABLE t1 PARTITION (d1 = 1) SELECT salary FROM jsmall LIMIT 
> 10;
> INSERT OVERWRITE TABLE t3 PARTITION (d1 = 1, d2 = 1) SELECT salary FROM 
> jsmall LIMIT 10;
> SELECT * FROM t1;
> SELECT * FROM t3;
> ALTER TABLE t2 EXCHANGE PARTITION (d1 = 1) WITH TABLE t1;
> SELECT * FROM t1;
> SELECT * FROM t2;
> ALTER TABLE t4 EXCHANGE PARTITION (d1 = 1, d2 = 1) WITH TABLE t3;
> SELECT * FROM t3;
> SELECT * FROM t4;
> {noformat}
> The output:
> {noformat}
> 0: jdbc:hive2://10.17.74.148:1/default> SELECT * FROM t3;
> +-------+--------+--------+
> | t3.a  | t3.d1  | t3.d2  |
> +-------+--------+--------+
> +-------+--------+--------+
> No rows selected (0.227 seconds)
> 0: jdbc:hive2://10.17.74.148:1/default> SELECT * FROM t4;
> +-------+--------+--------+
> | t4.a  | t4.d1  | t4.d2  |
> +-------+--------+--------+
> +-------+--------+--------+
> No rows selected (0.266 seconds)
> {noformat}





[jira] [Commented] (HIVE-11645) Add in-place updates for dynamic partitions loading

2015-09-08 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735597#comment-14735597
 ] 

Ashutosh Chauhan commented on HIVE-11645:
-

I like the idea of printing partition specs when 
{{hive.tez.exec.print.summary}} is enabled. For the trash issue, filed HDFS-9037. 
For now I will just disable in-place updates if trash is on. 

> Add in-place updates for dynamic partitions loading
> ---
>
> Key: HIVE-11645
> URL: https://issues.apache.org/jira/browse/HIVE-11645
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-11645.2.patch, HIVE-11645.3.patch, HIVE-11645.patch
>
>
> Currently, updates go to log file and on console there is no visible progress.





[jira] [Commented] (HIVE-11762) TestHCatLoaderEncryption failures when using Hadoop 2.7

2015-09-08 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735588#comment-14735588
 ] 

Jason Dere commented on HIVE-11762:
---

This is used within Hive for testing as well (when setting up MiniMR tests), if 
that makes it any more acceptable to use within Hive.
Someone else might have to comment on whether it's possible to eliminate its use 
in Hive - [~spena]?

> TestHCatLoaderEncryption failures when using Hadoop 2.7
> ---
>
> Key: HIVE-11762
> URL: https://issues.apache.org/jira/browse/HIVE-11762
> Project: Hive
>  Issue Type: Bug
>  Components: Shims, Tests
>Reporter: Jason Dere
>
> When running TestHCatLoaderEncryption with -Dhadoop23.version=2.7.0, we get 
> the following error during setup():
> {noformat}
> testReadDataFromEncryptedHiveTableByPig[5](org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption)
>   Time elapsed: 3.648 sec  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.hdfs.DFSClient.setKeyProvider(Lorg/apache/hadoop/crypto/key/KeyProviderCryptoExtension;)V
>   at 
> org.apache.hadoop.hive.shims.Hadoop23Shims.getMiniDfs(Hadoop23Shims.java:534)
>   at 
> org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.initEncryptionShim(TestHCatLoaderEncryption.java:252)
>   at 
> org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.setup(TestHCatLoaderEncryption.java:200)
> {noformat}
> It looks like between Hadoop 2.6 and Hadoop 2.7, the argument to 
> DFSClient.setKeyProvider() changed:
> {noformat}
>@VisibleForTesting
> -  public void setKeyProvider(KeyProviderCryptoExtension provider) {
> -this.provider = provider;
> +  public void setKeyProvider(KeyProvider provider) {
> {noformat}





[jira] [Commented] (HIVE-11732) LLAP: MiniLlapCluster integration broke hadoop-1 build

2015-09-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735577#comment-14735577
 ] 

Sergey Shelukhin commented on HIVE-11732:
-

I wonder if so much reflection is needed. Could there be an interface plus a 
single create method that constructs the invisible impl, with start/stop/... 
called normally on the interface? For the absence case the factory could return 
null or a dummy.
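A sketch of the suggested shape (all names hypothetical): a small interface plus one factory method that hides the version-specific implementation, so any reflection lives in create() and callers just use start()/stop() on the interface.

```java
public class MiniClusterFactory {

    interface MiniCluster {
        void start();
        void stop();
    }

    /** Returns the real cluster when available, otherwise a no-op dummy. */
    static MiniCluster create(boolean implAvailable) {
        if (!implAvailable) {
            // Absence case (e.g. a hadoop-1 build): a dummy keeps call sites simple.
            return new MiniCluster() {
                public void start() { }
                public void stop() { }
            };
        }
        // Stand-in for the real MiniLlapCluster-backed implementation.
        return new MiniCluster() {
            public void start() { System.out.println("cluster started"); }
            public void stop()  { System.out.println("cluster stopped"); }
        };
    }

    public static void main(String[] args) {
        MiniCluster c = create(false);
        c.start();   // no-op on the dummy; no reflection at the call site
        c.stop();
    }
}
```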

> LLAP: MiniLlapCluster integration broke hadoop-1 build
> --
>
> Key: HIVE-11732
> URL: https://issues.apache.org/jira/browse/HIVE-11732
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: llap
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11732.1.patch
>
>
> HIVE-9900 broke hadoop-1 build. Needs shimming for MiniLlapCluster.





[jira] [Updated] (HIVE-11614) CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after order by has problem

2015-09-08 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11614:
---
Attachment: HIVE-11614.04.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after 
> order by has problem
> -
>
> Key: HIVE-11614
> URL: https://issues.apache.org/jira/browse/HIVE-11614
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11614.01.patch, HIVE-11614.02.patch, 
> HIVE-11614.03.patch, HIVE-11614.04.patch
>
>






[jira] [Updated] (HIVE-11614) CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after order by has problem

2015-09-08 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11614:
---
Attachment: (was: HIVE-11614.04.patch)

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after 
> order by has problem
> -
>
> Key: HIVE-11614
> URL: https://issues.apache.org/jira/browse/HIVE-11614
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11614.01.patch, HIVE-11614.02.patch, 
> HIVE-11614.03.patch, HIVE-11614.04.patch
>
>






[jira] [Commented] (HIVE-11762) TestHCatLoaderEncryption failures when using Hadoop 2.7

2015-09-08 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735570#comment-14735570
 ] 

Colin Patrick McCabe commented on HIVE-11762:
-

This is marked \@VisibleForTesting, so it seems like Hive should not be using 
this.  Is there a way to avoid using this?

> TestHCatLoaderEncryption failures when using Hadoop 2.7
> ---
>
> Key: HIVE-11762
> URL: https://issues.apache.org/jira/browse/HIVE-11762
> Project: Hive
>  Issue Type: Bug
>  Components: Shims, Tests
>Reporter: Jason Dere
>
> When running TestHCatLoaderEncryption with -Dhadoop23.version=2.7.0, we get 
> the following error during setup():
> {noformat}
> testReadDataFromEncryptedHiveTableByPig[5](org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption)
>   Time elapsed: 3.648 sec  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.hdfs.DFSClient.setKeyProvider(Lorg/apache/hadoop/crypto/key/KeyProviderCryptoExtension;)V
>   at 
> org.apache.hadoop.hive.shims.Hadoop23Shims.getMiniDfs(Hadoop23Shims.java:534)
>   at 
> org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.initEncryptionShim(TestHCatLoaderEncryption.java:252)
>   at 
> org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.setup(TestHCatLoaderEncryption.java:200)
> {noformat}
> It looks like between Hadoop 2.6 and Hadoop 2.7, the argument to 
> DFSClient.setKeyProvider() changed:
> {noformat}
>@VisibleForTesting
> -  public void setKeyProvider(KeyProviderCryptoExtension provider) {
> -this.provider = provider;
> +  public void setKeyProvider(KeyProvider provider) {
> {noformat}





[jira] [Updated] (HIVE-11614) CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after order by has problem

2015-09-08 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11614:
---
Attachment: HIVE-11614.04.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): ctas after 
> order by has problem
> -
>
> Key: HIVE-11614
> URL: https://issues.apache.org/jira/browse/HIVE-11614
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11614.01.patch, HIVE-11614.02.patch, 
> HIVE-11614.03.patch, HIVE-11614.04.patch
>
>






[jira] [Commented] (HIVE-11761) DoubleWritable hashcode for GroupBy is not properly generated

2015-09-08 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735564#comment-14735564
 ] 

Aihua Xu commented on HIVE-11761:
-

That's correct. It's only for performance reasons.

> DoubleWritable hashcode for GroupBy is not properly generated
> -
>
> Key: HIVE-11761
> URL: https://issues.apache.org/jira/browse/HIVE-11761
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11761.patch
>
>
> HIVE-11502 fixed the hashcode for LazyDouble. Additionally we should fix for 
> DoubleWritable as well due to HADOOP-12217 issue. In some cases such as 
> {{select avg(t) from (select * from over1k cross join src) t group by d;}} 
> where d is double type, the data is actually in DoubleWritable, not 
> LazyDouble. Thus, before HADOOP-12217 gets fixed, we need to fix hashcode for 
> LazyDouble as well as DoubleWritable.
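As a sketch of the distribution problem being fixed (assuming the pre-fix hash truncated the IEEE-754 bit pattern, which is what HADOOP-12217 describes), compare a truncating hash with the folded hash that java.lang.Double.hashCode uses:

```java
public class DoubleHashDemo {

    // Truncating hash: keeps only the low 32 bits of the bit pattern. For
    // many "round" doubles those bits are all zero, so the values collide
    // into a single GroupBy bucket.
    static int truncatingHash(double d) {
        return (int) Double.doubleToLongBits(d);
    }

    // Folded hash: XOR the high bits in, as java.lang.Double.hashCode does.
    static int foldedHash(double d) {
        long bits = Double.doubleToLongBits(d);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        for (double d = 1.0; d <= 4.0; d++) {
            System.out.println(d + " -> truncating=" + truncatingHash(d)
                    + " folded=" + foldedHash(d));
        }
    }
}
```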





[jira] [Commented] (HIVE-11762) TestHCatLoaderEncryption failures when using Hadoop 2.7

2015-09-08 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735538#comment-14735538
 ] 

Arun Suresh commented on HIVE-11762:


Hmmm.. that's funny... KeyProviderCryptoExtension extends KeyProviderExtension, 
which extends KeyProvider, so I don't see why there is a problem..
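One likely explanation: parameter subtyping only helps at compile time. Compiled bytecode references a method by its exact descriptor, so code built against Hadoop 2.6's setKeyProvider(KeyProviderCryptoExtension) fails to link against 2.7, which only declares setKeyProvider(KeyProvider). A small stand-alone demo of the same effect (Base/Sub/Client are hypothetical stand-ins), using reflection, which likewise resolves by exact parameter type:

```java
import java.lang.reflect.Method;

public class DescriptorDemo {

    static class Base { }               // stands in for KeyProvider
    static class Sub extends Base { }   // stands in for KeyProviderCryptoExtension

    static class Client {
        // Only the Base-typed overload exists, like Hadoop 2.7's setKeyProvider.
        public void setProvider(Base p) { }
    }

    // True iff Client declares setProvider with exactly this parameter type.
    static boolean hasExactMethod(Class<?> paramType) {
        try {
            Method m = Client.class.getMethod("setProvider", paramType);
            return m != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Lookup by the old, narrower type fails even though Sub extends Base.
        System.out.println("Base-typed lookup found: " + hasExactMethod(Base.class));
        System.out.println("Sub-typed lookup found:  " + hasExactMethod(Sub.class));
    }
}
```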

> TestHCatLoaderEncryption failures when using Hadoop 2.7
> ---
>
> Key: HIVE-11762
> URL: https://issues.apache.org/jira/browse/HIVE-11762
> Project: Hive
>  Issue Type: Bug
>  Components: Shims, Tests
>Reporter: Jason Dere
>
> When running TestHCatLoaderEncryption with -Dhadoop23.version=2.7.0, we get 
> the following error during setup():
> {noformat}
> testReadDataFromEncryptedHiveTableByPig[5](org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption)
>   Time elapsed: 3.648 sec  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.hdfs.DFSClient.setKeyProvider(Lorg/apache/hadoop/crypto/key/KeyProviderCryptoExtension;)V
>   at 
> org.apache.hadoop.hive.shims.Hadoop23Shims.getMiniDfs(Hadoop23Shims.java:534)
>   at 
> org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.initEncryptionShim(TestHCatLoaderEncryption.java:252)
>   at 
> org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.setup(TestHCatLoaderEncryption.java:200)
> {noformat}
> It looks like between Hadoop 2.6 and Hadoop 2.7, the argument to 
> DFSClient.setKeyProvider() changed:
> {noformat}
>@VisibleForTesting
> -  public void setKeyProvider(KeyProviderCryptoExtension provider) {
> -this.provider = provider;
> +  public void setKeyProvider(KeyProvider provider) {
> {noformat}





[jira] [Commented] (HIVE-11761) DoubleWritable hashcode for GroupBy is not properly generated

2015-09-08 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735530#comment-14735530
 ] 

Ashutosh Chauhan commented on HIVE-11761:
-

Just for clarification: HIVE-11502 was for performance, and I presume this one is 
also for performance; there is no functionality issue here. Is that true, 
[~aihuaxu]?
cc: [~gopalv]

> DoubleWritable hashcode for GroupBy is not properly generated
> -
>
> Key: HIVE-11761
> URL: https://issues.apache.org/jira/browse/HIVE-11761
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11761.patch
>
>
> HIVE-11502 fixed the hashcode for LazyDouble. Additionally we should fix for 
> DoubleWritable as well due to HADOOP-12217 issue. In some cases such as 
> {{select avg(t) from (select * from over1k cross join src) t group by d;}} 
> where d is double type, the data is actually in DoubleWritable, not 
> LazyDouble. Thus, before HADOOP-12217 gets fixed, we need to fix hashcode for 
> LazyDouble as well as DoubleWritable.





[jira] [Commented] (HIVE-11433) NPE for a multiple inner join query

2015-09-08 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735529#comment-14735529
 ] 

Wei Zheng commented on HIVE-11433:
--

[~xuefuz] Do you have a repro case for the problem? I'm doing some tests for 
multiple joins. Thanks.

> NPE for a multiple inner join query
> ---
>
> Key: HIVE-11433
> URL: https://issues.apache.org/jira/browse/HIVE-11433
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.0, 1.1.0, 2.0.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11433.patch, HIVE-11433.patch
>
>
> NullPointException is thrown for query that has multiple (greater than 3) 
> inner joins. Stacktrace for 1.1.0
> {code}
> NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.parse.ParseUtils.getIndex(ParseUtils.java:149)
> at 
> org.apache.hadoop.hive.ql.parse.ParseUtils.checkJoinFilterRefersOneAlias(ParseUtils.java:166)
> at 
> org.apache.hadoop.hive.ql.parse.ParseUtils.checkJoinFilterRefersOneAlias(ParseUtils.java:185)
> at 
> org.apache.hadoop.hive.ql.parse.ParseUtils.checkJoinFilterRefersOneAlias(ParseUtils.java:185)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.mergeJoins(SemanticAnalyzer.java:8257)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.mergeJoinTree(SemanticAnalyzer.java:8422)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9805)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9714)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10150)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10161)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10078)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1110)
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1104)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:101)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:172)
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:386)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:373)
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:271)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:486)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}.
> However, the problem can also be reproduced in latest master branch. Further 
> investigation shows that the following code (in ParseUtils.java) is 
> problematic:
> {code}
>   static int getIndex(String[] list, String elem) {
> for(int i=0; i < list.length; i++) {
>   if (list[i].toLowerCase().equals(elem)) {
> return i;
>   }
> }
> return -1;
>   }
> {code}
> The code assumes that every element in the list is not null, which isn't true 
> because of the following code in SemanticAnalyzer.java (method genJoinTree()):
> {code}
> if ((right.getToken().getType() == HiveParser.TOK_TABREF)
> || (right.getToken().getType() == HiveParser.TOK_SUBQUERY)
> || (right.getToken().getType() == HiveParser.TOK_PTBLFUNCTION)) {
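A null-safe variant of getIndex is one plausible shape for the fix (sketch only; the committed HIVE-11433 patch may differ):

```java
import java.util.Locale;

public class GetIndexFix {

    // Like ParseUtils.getIndex, but skips null aliases instead of
    // dereferencing them, which is what raised the NullPointerException.
    static int getIndex(String[] list, String elem) {
        for (int i = 0; i < list.length; i++) {
            if (list[i] != null && list[i].toLowerCase(Locale.ROOT).equals(elem)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] aliases = { "t1", null, "t2" };  // null entry, as genJoinTree can produce
        System.out.println(getIndex(aliases, "t2"));
        System.out.println(getIndex(aliases, "missing"));
    }
}
```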

[jira] [Commented] (HIVE-11762) TestHCatLoaderEncryption failures when using Hadoop 2.7

2015-09-08 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735526#comment-14735526
 ] 

Jason Dere commented on HIVE-11762:
---

[~asuresh] [~cmccabe] any suggestions on the right way to get the KeyProvider 
from KeyProviderCryptoExtension for Hadoop 2.7 onwards?

> TestHCatLoaderEncryption failures when using Hadoop 2.7
> ---
>
> Key: HIVE-11762
> URL: https://issues.apache.org/jira/browse/HIVE-11762
> Project: Hive
>  Issue Type: Bug
>  Components: Shims, Tests
>Reporter: Jason Dere
>
> When running TestHCatLoaderEncryption with -Dhadoop23.version=2.7.0, we get 
> the following error during setup():
> {noformat}
> testReadDataFromEncryptedHiveTableByPig[5](org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption)
>   Time elapsed: 3.648 sec  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.hdfs.DFSClient.setKeyProvider(Lorg/apache/hadoop/crypto/key/KeyProviderCryptoExtension;)V
>   at 
> org.apache.hadoop.hive.shims.Hadoop23Shims.getMiniDfs(Hadoop23Shims.java:534)
>   at 
> org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.initEncryptionShim(TestHCatLoaderEncryption.java:252)
>   at 
> org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.setup(TestHCatLoaderEncryption.java:200)
> {noformat}
> It looks like between Hadoop 2.6 and Hadoop 2.7, the argument to 
> DFSClient.setKeyProvider() changed:
> {noformat}
>@VisibleForTesting
> -  public void setKeyProvider(KeyProviderCryptoExtension provider) {
> -this.provider = provider;
> +  public void setKeyProvider(KeyProvider provider) {
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11761) DoubleWritable hashcode for GroupBy is not properly generated

2015-09-08 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735523#comment-14735523
 ] 

Chao Sun commented on HIVE-11761:
-

+1

> DoubleWritable hashcode for GroupBy is not properly generated
> -
>
> Key: HIVE-11761
> URL: https://issues.apache.org/jira/browse/HIVE-11761
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11761.patch
>
>
> HIVE-11502 fixed the hashcode for LazyDouble. Additionally we should fix for 
> DoubleWritable as well due to HADOOP-12217 issue. In some cases such as 
> {{select avg(t) from (select * from over1k cross join src) t group by d;}} 
> where d is double type, the data is actually in DoubleWritable, not 
> LazyDouble. Thus, before HADOOP-12217 gets fixed, we need to fix hashcode for 
> LazyDouble as well as DoubleWritable.





[jira] [Updated] (HIVE-4243) Fix column names in FileSinkOperator

2015-09-08 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-4243:

Attachment: HIVE-4243.patch

This patch:
* separates the type description functionality from object inspectors.
* for Hive record writers, uses the table column names and types.
* simplifies the code for finding columns for the bloom filters.
* takes a first pass at updating the ORC q-file results.

> Fix column names in FileSinkOperator
> 
>
> Key: HIVE-4243
> URL: https://issues.apache.org/jira/browse/HIVE-4243
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-4243.patch
>
>
> All of the ObjectInspectors given to SerDe's by FileSinkOperator have virtual 
> column names. Since the files are part of tables, Hive knows the column 
> names. For self-describing file formats like ORC, having the real column 
> names will improve the understandability.





[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-09-08 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735498#comment-14735498
 ] 

Wei Zheng commented on HIVE-11587:
--

The only thing that is left alone is the "memUsage" param passed to 
MapJoinBytesTableContainer. I didn't change that since the regular join doesn't 
have any problem with the ballpark max probe space. I'm afraid it may cause 
some potential issues if I adjust this number. If we do want to change this for 
regular join case too, then we'd better create a separate JIRA to track that. 
Let me know your opinion. Thanks!

> Fix memory estimates for mapjoin hashtable
> --
>
> Key: HIVE-11587
> URL: https://issues.apache.org/jira/browse/HIVE-11587
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch, 
> HIVE-11587.03.patch, HIVE-11587.04.patch, HIVE-11587.05.patch, 
> HIVE-11587.06.patch, HIVE-11587.07.patch, HIVE-11587.08.patch
>
>
> Due to the legacy in in-memory mapjoin and conservative planning, the memory 
> estimation code for mapjoin hashtable is currently not very good. It 
> allocates the probe erring on the side of more memory, not taking data into 
> account because unlike the probe, it's free to resize, so it's better for 
> perf to allocate big probe and hope for the best with regard to future data 
> size. It is not true for hybrid case.
> There's code to cap the initial allocation based on memory available 
> (memUsage argument), but due to some code rot, the memory estimates from 
> planning are not even passed to hashtable anymore (there used to be two 
> config settings, hashjoin size fraction by itself, or hashjoin size fraction 
> for group by case), so it never caps the memory anymore below 1 Gb. 
> Initial capacity is estimated from input key count, and in hybrid join cache 
> can exceed Java memory due to number of segments.
> There needs to be a review and fix of all this code.
> Suggested improvements:
> 1) Make sure "initialCapacity" argument from Hybrid case is correct given the 
> number of segments. See how it's calculated from keys for regular case; it 
> needs to be adjusted accordingly for hybrid case if not done already.
> 1.5) Note that, knowing the number of rows, the maximum capacity one will 
> ever need for probe size (in longs) is row count (assuming key per row, i.e. 
> maximum possible number of keys) divided by load factor, plus some very small 
> number to round up. That is for flat case. For hybrid case it may be more 
> complex due to skew, but that is still a good upper bound for the total probe 
> capacity of all segments.
> 2) Rename memUsage to maxProbeSize, or something, make sure it's passed 
> correctly based on estimates that take into account both probe and data size, 
> esp. in hybrid case.
> 3) Make sure that memory estimation for hybrid case also doesn't come up with 
> numbers that are too small, like 1-byte hashtable. I am not very familiar 
> with that code but it has happened in the past.
> Other issues we have seen:
> 4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you 
> should not allocate large array in advance. Even if some estimate passes 
> 500Mb or 40Mb or whatever, it doesn't make sense to allocate that.
> 5) For hybrid, don't pre-allocate WBs - only allocate on write.
> 6) Change everywhere rounding up to power of two is used to rounding down, at 
> least for hybrid case (?)
> I wanted to put all of these items in single JIRA so we could keep track of 
> fixing all of them.
> I think there are JIRAs for some of these already, feel free to link them to 
> this one.
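Item 1.5's upper bound is simple to compute; a minimal sketch (names hypothetical), assuming at most one key per row:

```java
public class ProbeSizeEstimate {

    // Upper bound on probe capacity in slots, per item 1.5: at most one key
    // per row, divided by the load factor, rounded up with a little slack.
    static long maxProbeCapacity(long rowCount, double loadFactor) {
        return (long) Math.ceil(rowCount / loadFactor) + 1;
    }

    public static void main(String[] args) {
        // e.g. 1M estimated rows at a typical 0.75 load factor
        System.out.println(maxProbeCapacity(1_000_000L, 0.75));
    }
}
```

For the hybrid case this bound covers the total probe capacity across all segments, even under skew.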





[jira] [Commented] (HIVE-11732) LLAP: MiniLlapCluster integration broke hadoop-1 build

2015-09-08 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735475#comment-14735475
 ] 

Prasanth Jayachandran commented on HIVE-11732:
--

[~sershe] Can you take a look at the mini llap cluster itest changes for 
hadoop-1?

> LLAP: MiniLlapCluster integration broke hadoop-1 build
> --
>
> Key: HIVE-11732
> URL: https://issues.apache.org/jira/browse/HIVE-11732
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: llap
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11732.1.patch
>
>
> HIVE-9900 broke hadoop-1 build. Needs shimming for MiniLlapCluster.





[jira] [Updated] (HIVE-11732) LLAP: MiniLlapCluster integration broke hadoop-1 build

2015-09-08 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11732:
-
Attachment: HIVE-11732.1.patch

> LLAP: MiniLlapCluster integration broke hadoop-1 build
> --
>
> Key: HIVE-11732
> URL: https://issues.apache.org/jira/browse/HIVE-11732
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: llap
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11732.1.patch
>
>
> HIVE-9900 broke hadoop-1 build. Needs shimming for MiniLlapCluster.





[jira] [Updated] (HIVE-11684) Implement limit pushdown through outer join in CBO

2015-09-08 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11684:
---
Attachment: HIVE-11684.03.patch

> Implement limit pushdown through outer join in CBO
> --
>
> Key: HIVE-11684
> URL: https://issues.apache.org/jira/browse/HIVE-11684
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11684.01.patch, HIVE-11684.02.patch, 
> HIVE-11684.03.patch, HIVE-11684.patch
>
>






[jira] [Commented] (HIVE-11755) Incorrect method called with Kerberos enabled in AccumuloStorageHandler

2015-09-08 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735465#comment-14735465
 ] 

Josh Elser commented on HIVE-11755:
---

Cancelling the patch for now. Noticed a related failure testing out some 
queries by hand. Will provide new patch today.

> Incorrect method called with Kerberos enabled in AccumuloStorageHandler
> ---
>
> Key: HIVE-11755
> URL: https://issues.apache.org/jira/browse/HIVE-11755
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 1.2.2
>
> Attachments: HIVE-11755.001.patch, HIVE-11755.002.patch
>
>
> The following exception was noticed in testing out the 
> AccumuloStorageHandler's OutputFormat:
> {noformat}
> java.lang.IllegalStateException: Connector info for AccumuloOutputFormat can 
> only be set once per job
>   at 
> org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase.setConnectorInfo(ConfiguratorBase.java:146)
>   at 
> org.apache.accumulo.core.client.mapred.AccumuloOutputFormat.setConnectorInfo(AccumuloOutputFormat.java:125)
>   at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.configureAccumuloOutputFormat(HiveAccumuloTableOutputFormat.java:95)
>   at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.checkOutputSpecs(HiveAccumuloTableOutputFormat.java:51)
>   at 
> org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat.checkOutputSpecs(HivePassThroughOutputFormat.java:46)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.checkOutputSpecs(FileSinkOperator.java:1124)
>   at 
> org.apache.hadoop.hive.ql.io.HiveOutputFormatImpl.checkOutputSpecs(HiveOutputFormatImpl.java:67)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:431)
>   at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
>   at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
>   at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>   Job Submission failed with exception 
> 'java.lang.IllegalStateException(Connector info for AccumuloOutputFormat can 
> only be set once per job)'
> {noformat}
> The OutputFormat implementation already had a method in 

[jira] [Commented] (HIVE-11755) Incorrect method called with Kerberos enabled in AccumuloStorageHandler

2015-09-08 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735442#comment-14735442
 ] 

Josh Elser commented on HIVE-11755:
---

{quote}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions
{quote}

These seem unrelated to me.

> Incorrect method called with Kerberos enabled in AccumuloStorageHandler
> ---
>
> Key: HIVE-11755
> URL: https://issues.apache.org/jira/browse/HIVE-11755
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 1.2.2
>
> Attachments: HIVE-11755.001.patch, HIVE-11755.002.patch
>
>
> The following exception was noticed in testing out the 
> AccumuloStorageHandler's OutputFormat:
> {noformat}
> java.lang.IllegalStateException: Connector info for AccumuloOutputFormat can 
> only be set once per job
>   at 
> org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase.setConnectorInfo(ConfiguratorBase.java:146)
>   at 
> org.apache.accumulo.core.client.mapred.AccumuloOutputFormat.setConnectorInfo(AccumuloOutputFormat.java:125)
>   at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.configureAccumuloOutputFormat(HiveAccumuloTableOutputFormat.java:95)
>   at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.checkOutputSpecs(HiveAccumuloTableOutputFormat.java:51)
>   at 
> org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat.checkOutputSpecs(HivePassThroughOutputFormat.java:46)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.checkOutputSpecs(FileSinkOperator.java:1124)
>   at 
> org.apache.hadoop.hive.ql.io.HiveOutputFormatImpl.checkOutputSpecs(HiveOutputFormatImpl.java:67)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:431)
>   at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
>   at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
>   at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>   Job Submission failed with exception 
> 'java.lang.IllegalStateException(Connector info for AccumuloOutputFormat can 
> only be set once per job)'
>

[jira] [Assigned] (HIVE-11742) last_value window specifier enforces ordering as a partition

2015-09-08 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-11742:
--

Assignee: Prateek Rungta

> last_value window specifier enforces ordering as a partition
> 
>
> Key: HIVE-11742
> URL: https://issues.apache.org/jira/browse/HIVE-11742
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Reporter: Prateek Rungta
>Assignee: Prateek Rungta
>
> [HIVE-4262|https://issues.apache.org/jira/browse/HIVE-4262] changed the 
> partitioning behavior of the last_value function. For a specified 
> last_value() OVER X. The ordering spec within X is used in addition to the 
> partition spec for partitioning. i.e. last_value(a) OVER (PARTITION BY i 
> ORDER BY j) operates last_value(a) on all rows within the unique combination 
> of (i,j). The behavior I'd expect is for PARTITION BY i to define the 
> partitioning, and ORDER BY to define the ordering within the PARTITION. i.e. 
> last_value(a) OVER (PARTITION BY i ORDER BY j) should operate last_value(a) 
> on all rows within the unique values of (i), ordered by j within the 
> partition. 
> This was changed to be consistent with how SQLServer handled such queries. 
> [SQLServer 
> Docs|https://msdn.microsoft.com/en-us/library/hh231517.aspx?f=255&MSPPError=-2147217396]
>  describe their example (which performs as Hive does): 
> {quote}
> The PARTITION BY clause partitions the employees by department and the 
> LAST_VALUE function is applied to each partition independently. The ORDER BY 
> clause specified in the OVER clause determines the logical order in which the 
> LAST_VALUE function is applied to the rows in each partition.
> {quote}
> To me, their behavior is inconsistent with their description. I've filed an 
> [upstream 
> bug|https://connect.microsoft.com/SQLServer/feedback/details/1753482] with 
> Microsoft for the same. 
> [Oracle|https://oracle-base.com/articles/misc/first-value-and-last-value-analytic-functions]
>  and 
> [Redshift|http://docs.aws.amazon.com/redshift/latest/dg/r_Examples_of_firstlast_WF.html]
>  both exhibit the behavior I'd expect.
> Considering Hive-4262 has been in core for 2+ years, I don't think we can 
> change the behavior without potentially impacting clients. But I would like a 
> way to enable the expected behavior at the least (behind a config flag 
> maybe?). What do you think?
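To make the contrast concrete, here is a small plain-Java sketch (an illustration only, not Hive code; all names are hypothetical) of the two semantics over one partition already sorted by j. `frameLast` models today's behavior — the default RANGE frame ending at the current row's peers, which acts like partitioning by (i, j) — while `partitionLast` models the behavior the reporter expects:

```java
// Illustrative sketch of the two last_value semantics discussed above.
// Rows are (j, a) pairs within one partition, already sorted by j.
public class LastValueDemo {
    // Default RANGE frame: last value among rows whose j <= current j.
    static String frameLast(int[] js, String[] as, int curJ) {
        String last = null;
        for (int k = 0; k < js.length; k++) {
            if (js[k] <= curJ) last = as[k];  // peers up to the current j
        }
        return last;
    }

    // Expected behavior: last value of the whole ordered partition.
    static String partitionLast(String[] as) {
        return as[as.length - 1];
    }

    public static void main(String[] args) {
        int[] js = {1, 2, 3};
        String[] as = {"x", "y", "z"};
        System.out.println(frameLast(js, as, 1));  // "x" (current behavior)
        System.out.println(partitionLast(as));     // "z" (expected behavior)
    }
}
```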





[jira] [Updated] (HIVE-11761) DoubleWritable hashcode for GroupBy is not properly generated

2015-09-08 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11761:

Summary: DoubleWritable hashcode for GroupBy is not properly generated  
(was: DoubleWritable hash code for GroupBy is not properly generated)

> DoubleWritable hashcode for GroupBy is not properly generated
> -
>
> Key: HIVE-11761
> URL: https://issues.apache.org/jira/browse/HIVE-11761
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11761.patch
>
>
> HIVE-11502 fixed the hashcode for LazyDouble. Additionally we should fix for 
> DoubleWritable as well due to HADOOP-12217 issue. In some cases such as 
> {{select avg(t) from (select * from over1k cross join src) t group by d;}} 
> where d is double type, the data is actually in DoubleWritable, not 
> LazyDouble. Thus, we need to fix hashcode for LazyDouble as well as 
> DoubleWritable.
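For reference, hashing a double from both halves of its IEEE-754 bit pattern looks like the sketch below (an illustration of the general technique, mirroring `java.lang.Double.hashCode`, not the actual HIVE-11761 patch). A plain `(int)` cast of `doubleToLongBits` keeps only the low 32 bits, which are all zero for round values like 1.0 or 2.0, so such GroupBy keys would all collide:

```java
// Sketch: fold both halves of the double's bit pattern into the hash,
// as java.lang.Double.hashCode does. This is illustrative, not Hive code.
public class DoubleHash {
    static int hash(double value) {
        long bits = Double.doubleToLongBits(value);
        return (int) (bits ^ (bits >>> 32));  // XOR high bits into low bits
    }

    public static void main(String[] args) {
        // 1.0 and 2.0 differ only in the high 32 bits of their bit patterns
        System.out.println(hash(1.0) == hash(2.0));  // false: distinct hashes
        System.out.println((int) Double.doubleToLongBits(1.0)
                == (int) Double.doubleToLongBits(2.0));  // true: both 0
    }
}
```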





[jira] [Updated] (HIVE-11761) DoubleWritable hashcode for GroupBy is not properly generated

2015-09-08 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11761:

Description: HIVE-11502 fixed the hashcode for LazyDouble. Additionally we 
should fix for DoubleWritable as well due to HADOOP-12217 issue. In some cases 
such as {{select avg(t) from (select * from over1k cross join src) t group by 
d;}} where d is double type, the data is actually in DoubleWritable, not 
LazyDouble. Thus, before HADOOP-12217 gets fixed, we need to fix hashcode for 
LazyDouble as well as DoubleWritable.  (was: HIVE-11502 fixed the hashcode for 
LazyDouble. Additionally we should fix for DoubleWritable as well due to 
HADOOP-12217 issue. In some cases such as {{select avg(t) from (select * from 
over1k cross join src) t group by d;}} where d is double type, the data is 
actually in DoubleWritable, not LazyDouble. Thus, we need to fix hashcode for 
LazyDouble as well as DoubleWritable.)

> DoubleWritable hashcode for GroupBy is not properly generated
> -
>
> Key: HIVE-11761
> URL: https://issues.apache.org/jira/browse/HIVE-11761
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11761.patch
>
>
> HIVE-11502 fixed the hashcode for LazyDouble. Additionally we should fix for 
> DoubleWritable as well due to HADOOP-12217 issue. In some cases such as 
> {{select avg(t) from (select * from over1k cross join src) t group by d;}} 
> where d is double type, the data is actually in DoubleWritable, not 
> LazyDouble. Thus, before HADOOP-12217 gets fixed, we need to fix hashcode for 
> LazyDouble as well as DoubleWritable.





[jira] [Updated] (HIVE-11761) DoubleWritable hash code for GroupBy is not properly generated

2015-09-08 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11761:

Attachment: HIVE-11761.patch

> DoubleWritable hash code for GroupBy is not properly generated
> --
>
> Key: HIVE-11761
> URL: https://issues.apache.org/jira/browse/HIVE-11761
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11761.patch
>
>
> HIVE-11502 fixed the hashcode for LazyDouble. Additionally we should fix for 
> DoubleWritable as well due to HADOOP-12217 issue. In some cases such as 
> {{select avg(t) from (select * from over1k cross join src) t group by d;}} 
> where d is double type, the data is actually in DoubleWritable, not 
> LazyDouble. Thus, we need to fix hashcode for LazyDouble as well as 
> DoubleWritable.





[jira] [Updated] (HIVE-11761) DoubleWritable hash code for GroupBy is not properly generated

2015-09-08 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11761:

Description: HIVE-11502 fixed the hashcode for LazyDouble. Additionally we 
should fix for DoubleWritable as well due to HADOOP-12217 issue. In some cases 
such as {{select avg(t) from (select * from over1k cross join src) t group by 
d;}} where d is double type, the data is actually in DoubleWritable, not 
LazyDouble. Thus, we need to fix hashcode for LazyDouble as well as 
DoubleWritable.  (was: HIVE-11502 fixed the hashcode for LazyDouble. 
Additionally we should fix for DoubleWritable as well due to HADOOP-12217 
issue. In some cases, the data is actually in DoubleWritable, not LazyDouble. 
Thus, we need to fix hashcode for LazyDouble as well as DoubleWritable.)

> DoubleWritable hash code for GroupBy is not properly generated
> --
>
> Key: HIVE-11761
> URL: https://issues.apache.org/jira/browse/HIVE-11761
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> HIVE-11502 fixed the hashcode for LazyDouble. Additionally we should fix for 
> DoubleWritable as well due to HADOOP-12217 issue. In some cases such as 
> {{select avg(t) from (select * from over1k cross join src) t group by d;}} 
> where d is double type, the data is actually in DoubleWritable, not 
> LazyDouble. Thus, we need to fix hashcode for LazyDouble as well as 
> DoubleWritable.





[jira] [Updated] (HIVE-11761) DoubleWritable hash code for GroupBy is not properly generated

2015-09-08 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11761:

Description: HIVE-11502 fixed the hashcode for LazyDouble. Additionally we 
should fix for DoubleWritable as well due to HADOOP-12217 issue. In some cases, 
the data is actually in DoubleWritable, not LazyDouble. Thus, we need to fix 
hashcode for LazyDouble as well as DoubleWritable.  (was: HIVE-11502 fixed the 
hashcode for LazyDouble. Additionally we should fix for DoubleWritable as well 
due to HADOOP-12217 issue. )

> DoubleWritable hash code for GroupBy is not properly generated
> --
>
> Key: HIVE-11761
> URL: https://issues.apache.org/jira/browse/HIVE-11761
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> HIVE-11502 fixed the hashcode for LazyDouble. Additionally we should fix for 
> DoubleWritable as well due to HADOOP-12217 issue. In some cases, the data is 
> actually in DoubleWritable, not LazyDouble. Thus, we need to fix hashcode for 
> LazyDouble as well as DoubleWritable.





[jira] [Commented] (HIVE-11587) Fix memory estimates for mapjoin hashtable

2015-09-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735346#comment-14735346
 ] 

Sergey Shelukhin commented on HIVE-11587:
-

+1... should we file a separate JIRA for any items from the description that 
are not done?

> Fix memory estimates for mapjoin hashtable
> --
>
> Key: HIVE-11587
> URL: https://issues.apache.org/jira/browse/HIVE-11587
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 1.2.1
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-11587.01.patch, HIVE-11587.02.patch, 
> HIVE-11587.03.patch, HIVE-11587.04.patch, HIVE-11587.05.patch, 
> HIVE-11587.06.patch, HIVE-11587.07.patch, HIVE-11587.08.patch
>
>
> Due to legacy code in the in-memory mapjoin and conservative planning, the 
> memory estimation code for the mapjoin hashtable is currently not very good. It 
> allocates the probe erring on the side of more memory, not taking data into 
> account because unlike the probe, it's free to resize, so it's better for 
> perf to allocate big probe and hope for the best with regard to future data 
> size. It is not true for hybrid case.
> There's code to cap the initial allocation based on memory available 
> (memUsage argument), but due to some code rot, the memory estimates from 
> planning are not even passed to hashtable anymore (there used to be two 
> config settings, hashjoin size fraction by itself, or hashjoin size fraction 
> for group by case), so it never caps the memory anymore below 1 Gb. 
> Initial capacity is estimated from input key count, and in hybrid join cache 
> can exceed Java memory due to number of segments.
> There needs to be a review and fix of all this code.
> Suggested improvements:
> 1) Make sure "initialCapacity" argument from Hybrid case is correct given the 
> number of segments. See how it's calculated from keys for regular case; it 
> needs to be adjusted accordingly for hybrid case if not done already.
> 1.5) Note that, knowing the number of rows, the maximum capacity one will 
> ever need for probe size (in longs) is row count (assuming key per row, i.e. 
> maximum possible number of keys) divided by load factor, plus some very small 
> number to round up. That is for flat case. For hybrid case it may be more 
> complex due to skew, but that is still a good upper bound for the total probe 
> capacity of all segments.
> 2) Rename memUsage to maxProbeSize, or something, make sure it's passed 
> correctly based on estimates that take into account both probe and data size, 
> esp. in hybrid case.
> 3) Make sure that memory estimation for hybrid case also doesn't come up with 
> numbers that are too small, like 1-byte hashtable. I am not very familiar 
> with that code but it has happened in the past.
> Other issues we have seen:
> 4) Cap single write buffer size to 8-16Mb. The whole point of WBs is that you 
> should not allocate large array in advance. Even if some estimate passes 
> 500Mb or 40Mb or whatever, it doesn't make sense to allocate that.
> 5) For hybrid, don't pre-allocate WBs - only allocate on write.
> 6) Change everywhere rounding up to power of two is used to rounding down, at 
> least for hybrid case (?)
> I wanted to put all of these items in single JIRA so we could keep track of 
> fixing all of them.
> I think there are JIRAs for some of these already, feel free to link them to 
> this one.
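The upper bound described in item 1.5 can be sketched as follows (an illustration only, not Hive's actual estimator; names are hypothetical): with rowCount rows and at most one key per row, the probe never needs more than ceil(rowCount / loadFactor) slots in total, which also bounds the summed probe capacity of all segments in the hybrid case.

```java
// Sketch of the item-1.5 upper bound on probe capacity (illustrative only).
public class ProbeEstimate {
    // Maximum probe slots ever needed: one key per row, divided by the
    // load factor, rounded up.
    static long maxProbeSlots(long rowCount, float loadFactor) {
        return (long) Math.ceil(rowCount / (double) loadFactor);
    }

    public static void main(String[] args) {
        // e.g. 1M rows at the common 0.75 load factor
        System.out.println(maxProbeSlots(1_000_000L, 0.75f));  // 1333334
    }
}
```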





[jira] [Updated] (HIVE-11761) DoubleWritable hash code for GroupBy is not properly generated

2015-09-08 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11761:

Description: HIVE-11502 fixed the hashcode for LazyDouble. Additionally we 
should fix for DoubleWritable as well due to HADOOP-12217 issue. 

> DoubleWritable hash code for GroupBy is not properly generated
> --
>
> Key: HIVE-11761
> URL: https://issues.apache.org/jira/browse/HIVE-11761
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> HIVE-11502 fixed the hashcode for LazyDouble. Additionally we should fix for 
> DoubleWritable as well due to HADOOP-12217 issue. 





[jira] [Commented] (HIVE-11755) Incorrect method called with Kerberos enabled in AccumuloStorageHandler

2015-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735315#comment-14735315
 ] 

Hive QA commented on HIVE-11755:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754665/HIVE-11755.002.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9423 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5201/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5201/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5201/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12754665 - PreCommit-HIVE-TRUNK-Build

> Incorrect method called with Kerberos enabled in AccumuloStorageHandler
> ---
>
> Key: HIVE-11755
> URL: https://issues.apache.org/jira/browse/HIVE-11755
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 1.2.2
>
> Attachments: HIVE-11755.001.patch, HIVE-11755.002.patch
>
>
> The following exception was noticed in testing out the 
> AccumuloStorageHandler's OutputFormat:
> {noformat}
> java.lang.IllegalStateException: Connector info for AccumuloOutputFormat can 
> only be set once per job
>   at 
> org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase.setConnectorInfo(ConfiguratorBase.java:146)
>   at 
> org.apache.accumulo.core.client.mapred.AccumuloOutputFormat.setConnectorInfo(AccumuloOutputFormat.java:125)
>   at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.configureAccumuloOutputFormat(HiveAccumuloTableOutputFormat.java:95)
>   at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.checkOutputSpecs(HiveAccumuloTableOutputFormat.java:51)
>   at 
> org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat.checkOutputSpecs(HivePassThroughOutputFormat.java:46)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.checkOutputSpecs(FileSinkOperator.java:1124)
>   at 
> org.apache.hadoop.hive.ql.io.HiveOutputFormatImpl.checkOutputSpecs(HiveOutputFormatImpl.java:67)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:431)
>   at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.ja

[jira] [Commented] (HIVE-11696) Exception when table-level serde is Parquet while partition-level serde is JSON

2015-09-08 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735253#comment-14735253
 ] 

Chao Sun commented on HIVE-11696:
-

+1

> Exception when table-level serde is Parquet while partition-level serde is 
> JSON
> ---
>
> Key: HIVE-11696
> URL: https://issues.apache.org/jira/browse/HIVE-11696
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11696.2.patch, HIVE-11696.patch
>
>
> Create a table with partitions and set the SerDe to be Json. The query 
> "select * from tbl1" works fine.
> Now set table level SerDe to be Parquet serde. Based on HIVE-6785, table 
> level and partition level can have different serdes. 
> {noformat}
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.UnsupportedOperationException: Cannot inspect java.util.ArrayList
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:154)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1764)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.UnsupportedOperationException: Cannot inspect java.util.ArrayList
> at 
> org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:93)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:812)
> at 
> org.apache.hadoop.hive.ql.exec.LimitOperator.process(LimitOperator.java:54)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:812)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:812)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:425)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:417)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
> ... 13 more
> Caused by: java.lang.UnsupportedOperationException: Cannot inspect 
> java.util.ArrayList
> at 
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:112)
> at 
> org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:310)
> at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:202)
> at 
> org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:61)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
> at 
> org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
> at 
> org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:71)
> at 
> org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:40)
> at 
> org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:90)
> {noformat}
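The trace above is an inspector/representation mismatch: the fetch path builds object inspectors from the table-level Parquet serde, while the rows were deserialized by the partition-level JSON serde into plain Java lists. A minimal Python sketch of that failure mode (hypothetical class names, not Hive's actual code):

```python
class ParquetArray:
    """Wrapper type the Parquet-side inspector expects (hypothetical)."""
    def __init__(self, items):
        self.items = items

class ParquetArrayInspector:
    def get_list(self, data):
        # Rejects anything that is not its own wrapper type -- the analogue
        # of "Cannot inspect java.util.ArrayList" in the stack trace above.
        if not isinstance(data, ParquetArray):
            raise TypeError("Cannot inspect " + type(data).__name__)
        return data.items

json_row = ["a", "b"]                  # a JSON serde yields a plain list
inspector = ParquetArrayInspector()    # but the inspector came from Parquet
try:
    inspector.get_list(json_row)
except TypeError as err:
    print(err)                         # Cannot inspect list
```

The fix direction implied by the report is to pick inspectors per partition serde rather than assuming the table-level one.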



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11696) Exception when table-level serde is Parquet while partition-level serde is JSON

2015-09-08 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735245#comment-14735245
 ] 

Aihua Xu commented on HIVE-11696:
-

Those failures are unrelated to the patch. 

> Exception when table-level serde is Parquet while partition-level serde is 
> JSON
> ---
>
> Key: HIVE-11696
> URL: https://issues.apache.org/jira/browse/HIVE-11696
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11696.2.patch, HIVE-11696.patch
>
>
> Create a table with partitions and set the SerDe to be Json. The query 
> "select * from tbl1" works fine.
> Now set table level SerDe to be Parquet serde. Based on HIVE-6785, table 
> level and partition level can have different serdes. 
> {noformat}
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.UnsupportedOperationException: Cannot inspect java.util.ArrayList
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:154)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1764)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.UnsupportedOperationException: Cannot inspect java.util.ArrayList
> at 
> org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:93)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:812)
> at 
> org.apache.hadoop.hive.ql.exec.LimitOperator.process(LimitOperator.java:54)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:812)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:812)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:425)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:417)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
> ... 13 more
> Caused by: java.lang.UnsupportedOperationException: Cannot inspect 
> java.util.ArrayList
> at 
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:112)
> at 
> org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:310)
> at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:202)
> at 
> org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:61)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
> at 
> org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
> at 
> org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:71)
> at 
> org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:40)
> at 
> org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:90)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11751) hive-exec-log4j2.xml settings causes DEBUG messages to be generated and ignored

2015-09-08 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735219#comment-14735219
 ] 

Gunther Hagleitner commented on HIVE-11751:
---

fyi [~prasanth_j]

> hive-exec-log4j2.xml settings causes DEBUG messages to be generated and 
> ignored
> ---
>
> Key: HIVE-11751
> URL: https://issues.apache.org/jira/browse/HIVE-11751
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Prasanth Jayachandran
> Attachments: hiveserver2_log4j.png
>
>
> Setting "INFO" in 
> dist/hive/conf/hive-exec-log4j2.xml fixes the problem. Should it be made as 
> default in hive-exec-log4j2.xml? "--hiveconf hive.log.level=INFO" from 
> commandline does not have any impact.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11751) hive-exec-log4j2.xml settings causes DEBUG messages to be generated and ignored

2015-09-08 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-11751:
--
Assignee: Prasanth Jayachandran

> hive-exec-log4j2.xml settings causes DEBUG messages to be generated and 
> ignored
> ---
>
> Key: HIVE-11751
> URL: https://issues.apache.org/jira/browse/HIVE-11751
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Prasanth Jayachandran
> Attachments: hiveserver2_log4j.png
>
>
> Setting "INFO" in 
> dist/hive/conf/hive-exec-log4j2.xml fixes the problem. Should it be made as 
> default in hive-exec-log4j2.xml? "--hiveconf hive.log.level=INFO" from 
> commandline does not have any impact.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11684) Implement limit pushdown through outer join in CBO

2015-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735102#comment-14735102
 ] 

Hive QA commented on HIVE-11684:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754641/HIVE-11684.02.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9424 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_tez_dynpart_hashjoin_3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.hcatalog.streaming.TestStreaming.testTimeOutReaper
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5200/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5200/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5200/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12754641 - PreCommit-HIVE-TRUNK-Build

> Implement limit pushdown through outer join in CBO
> --
>
> Key: HIVE-11684
> URL: https://issues.apache.org/jira/browse/HIVE-11684
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11684.01.patch, HIVE-11684.02.patch, 
> HIVE-11684.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11755) Incorrect method called with Kerberos enabled in AccumuloStorageHandler

2015-09-08 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-11755:
--
Attachment: HIVE-11755.002.patch

Fix the Hadoop 1 compatibility issues

> Incorrect method called with Kerberos enabled in AccumuloStorageHandler
> ---
>
> Key: HIVE-11755
> URL: https://issues.apache.org/jira/browse/HIVE-11755
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 1.2.2
>
> Attachments: HIVE-11755.001.patch, HIVE-11755.002.patch
>
>
> The following exception was noticed in testing out the 
> AccumuloStorageHandler's OutputFormat:
> {noformat}
> java.lang.IllegalStateException: Connector info for AccumuloOutputFormat can 
> only be set once per job
>   at 
> org.apache.accumulo.core.client.mapreduce.lib.impl.ConfiguratorBase.setConnectorInfo(ConfiguratorBase.java:146)
>   at 
> org.apache.accumulo.core.client.mapred.AccumuloOutputFormat.setConnectorInfo(AccumuloOutputFormat.java:125)
>   at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.configureAccumuloOutputFormat(HiveAccumuloTableOutputFormat.java:95)
>   at 
> org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat.checkOutputSpecs(HiveAccumuloTableOutputFormat.java:51)
>   at 
> org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat.checkOutputSpecs(HivePassThroughOutputFormat.java:46)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.checkOutputSpecs(FileSinkOperator.java:1124)
>   at 
> org.apache.hadoop.hive.ql.io.HiveOutputFormatImpl.checkOutputSpecs(HiveOutputFormatImpl.java:67)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:431)
>   at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
>   at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
>   at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>   Job Submission failed with exception 
> 'java.lang.IllegalStateException(Connector info for AccumuloOutputFormat can 
> only be set once per job)'
> {noformat}
> The OutputFormat implementation already had a method in place to account for 
> this exception but the method accidentally wasn't getting called.
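The pattern the guard method implements can be sketched as follows (hypothetical names; the real method lives in HiveAccumuloTableOutputFormat): configure connector info only if it has not already been set on this job, since Accumulo's setConnectorInfo throws IllegalStateException on a second call.

```python
def set_connector_info_if_needed(job_conf, user, token):
    """Idempotent configuration: a second call is a no-op instead of an
    error (a sketch of the pattern, not the actual Hive/Accumulo code)."""
    if job_conf.get("accumulo.connector.info.set"):
        return  # already configured for this job; setting again would throw
    job_conf["accumulo.connector.info.set"] = "true"
    job_conf["accumulo.user"] = user
    job_conf["accumulo.token"] = token

conf = {}
set_connector_info_if_needed(conf, "hive", "secret-token")
set_connector_info_if_needed(conf, "hive", "secret-token")  # no-op, no error
print(conf["accumulo.user"])  # hive
```

The patch's point is simply to route all call sites through a guard like this instead of calling the raw setter directly.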



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-11758) Querying nested parquet columns is case sensitive

2015-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734929#comment-14734929
 ] 

Hive QA commented on HIVE-11758:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754623/HIVE-11758.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9423 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5199/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5199/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5199/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12754623 - PreCommit-HIVE-TRUNK-Build

> Querying nested parquet columns is case sensitive
> -
>
> Key: HIVE-11758
> URL: https://issues.apache.org/jira/browse/HIVE-11758
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 1.1.0, 1.1.1
>Reporter: Jakub Kukul
>Priority: Minor
> Attachments: HIVE-11758.patch
>
>
> Querying nested parquet columns (columns within a {{STRUCT}}) is case 
> sensitive. It should be case insensitive, to be compatible with querying 
> non-nested columns and querying nested columns with other file formats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11684) Implement limit pushdown through outer join in CBO

2015-09-08 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11684:
---
Attachment: HIVE-11684.02.patch

> Implement limit pushdown through outer join in CBO
> --
>
> Key: HIVE-11684
> URL: https://issues.apache.org/jira/browse/HIVE-11684
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11684.01.patch, HIVE-11684.02.patch, 
> HIVE-11684.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11742) last_value window specifier enforces ordering as a partition

2015-09-08 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734784#comment-14734784
 ] 

Aihua Xu commented on HIVE-11742:
-

I didn't know that last_value behaves like that for {{last_value(x) over 
(partition by y order by z)}}. It behaves differently from the explicitly framed 
{{last_value(x) over (partition by y order by z rows between x preceding and y 
following)}}, which may confuse people. I think it makes sense to make the 
change. 

Of course it's an incompatible change. We may have to add a configuration to 
allow enabling the new behavior. 

> last_value window specifier enforces ordering as a partition
> 
>
> Key: HIVE-11742
> URL: https://issues.apache.org/jira/browse/HIVE-11742
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Reporter: Prateek Rungta
>
> [HIVE-4262|https://issues.apache.org/jira/browse/HIVE-4262] changed the 
> partitioning behavior of the last_value function. For a specified 
> last_value() OVER X. The ordering spec within X is used in addition to the 
> partition spec for partitioning. i.e. last_value(a) OVER (PARTITION BY i 
> ORDER BY j) operates last_value(a) on all rows within the unique combination 
> of (i,j). The behavior I'd expect is for PARTITION BY i to define the 
> partitioning, and ORDER BY to define the ordering within the PARTITION. i.e. 
> last_value(a) OVER (PARTITION BY i ORDER BY j) should operate last_value(a) 
> on all rows within the unique values of (i), ordered by j within the 
> partition. 
> This was changed to be consistent with how SQLServer handled such queries. 
> [SQLServer 
> Docs|https://msdn.microsoft.com/en-us/library/hh231517.aspx?f=255&MSPPError=-2147217396]
>  describe their example (which performs as Hive does): 
> {quote}
> The PARTITION BY clause partitions the employees by department and the 
> LAST_VALUE function is applied to each partition independently. The ORDER BY 
> clause specified in the OVER clause determines the logical order in which the 
> LAST_VALUE function is applied to the rows in each partition.
> {quote}
> To me, their behavior is inconsistent with their description. I've filled an 
> [upstream 
> bug|https://connect.microsoft.com/SQLServer/feedback/details/1753482] with 
> Microsoft for the same. 
> [Oracle|https://oracle-base.com/articles/misc/first-value-and-last-value-analytic-functions]
>  and 
> [Redshift|http://docs.aws.amazon.com/redshift/latest/dg/r_Examples_of_firstlast_WF.html]
>  both exhibit the behavior I'd expect.
> Considering Hive-4262 has been in core for 2+ years, I don't think we can 
> change the behavior without potentially impacting clients. But I would like a 
> way to enable the expected behavior at the least (behind a config flag 
> maybe?). What do you think?
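The two behaviors under discussion correspond to different window frames: with an ORDER BY, standard SQL's default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so last_value returns the current row's value, while an explicit full frame returns the partition's last value. A quick illustration using SQLite's standard window functions (SQLite 3.25+, not Hive itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (i INT, j INT, a TEXT);
INSERT INTO t VALUES (1, 1, 'x'), (1, 2, 'y'), (1, 3, 'z');
""")

rows = conn.execute("""
SELECT j,
       -- default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
       -- so last_value sees only rows up to the current one
       last_value(a) OVER (PARTITION BY i ORDER BY j) AS dflt,
       -- explicit full frame: last_value sees the whole partition
       last_value(a) OVER (PARTITION BY i ORDER BY j
                           ROWS BETWEEN UNBOUNDED PRECEDING
                                    AND UNBOUNDED FOLLOWING) AS full_frame
FROM t ORDER BY j
""").fetchall()

for r in rows:
    print(r)
# (1, 'x', 'z'), (2, 'y', 'z'), (3, 'z', 'z')
```

The "expected" Oracle/Redshift-style behavior the reporter describes matches the full-frame column; Hive's post-HIVE-4262 behavior matches the default-frame column.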



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11756) Avoid redundant key serialization in RS for distinct query

2015-09-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734778#comment-14734778
 ] 

Hive QA commented on HIVE-11756:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12754603/HIVE-11756.1.patch.txt

{color:red}ERROR:{color} -1 due to 1758 failed/errored test(s), 9422 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_project
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_exist
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_char2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_concatenate_indexed_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_index
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table2_h23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_numbuckets_partitioned_table_h23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_update_status
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_invalidate_column_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_update_status
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_view_as_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_tbl_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_deep_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_excludeHadoop20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_multi
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_create_temp_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join19
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20
org.apache.hadoop.hive.cli.TestCliDriver.tes

[jira] [Commented] (HIVE-11617) Explain plan for multiple lateral views is very slow

2015-09-08 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734754#comment-14734754
 ] 

Aihua Xu commented on HIVE-11617:
-

Thanks for reviewing and committing, [~jcamachorodriguez]

> Explain plan for multiple lateral views is very slow
> 
>
> Key: HIVE-11617
> URL: https://issues.apache.org/jira/browse/HIVE-11617
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Fix For: 2.0.0
>
> Attachments: HIVE-11617.2.patch, HIVE-11617.patch, HIVE-11617.patch
>
>
> The following explain job will be very slow or never finish if there are many 
> lateral views involved. High CPU usage is also noticed.
> {noformat}
> CREATE TABLE `t1`(`pattern` array<string>);
>   
> explain select * from t1 
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1;
> {noformat}
> From jstack, the job is busy with preorder tree traverse. 
> {noformat}
> at java.util.regex.Matcher.getTextLength(Matcher.java:1234)
> at java.util.regex.Matcher.reset(Matcher.java:308)
> at java.util.regex.Matcher.(Matcher.java:228)
> at java.util.regex.Pattern.matcher(Pattern.java:1088)
> at org.apache.hadoop.hive.ql.lib.RuleRegExp.cost(RuleRegExp.java:67)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:72)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(Pr
