date:20140513


 [ 
https://issues.apache.org/jira/browse/HIVE-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-7035:


   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks for the contribution, Eugene!


 Templeton returns 500 for user errors - when job cannot be found
 

 Key: HIVE-7035
 URL: https://issues.apache.org/jira/browse/HIVE-7035
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.13.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.14.0

 Attachments: HIVE-7035.patch


 curl -i 'http://localhost:50111/templeton/v1/jobs/job_139949638_00011?user.name=ekoifman'
  should return an HTTP 4xx status code when no such job exists; it currently 
 returns 500.
 {noformat}
 {error:org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_201304291205_0015' doesn't exist in RM.
     at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:247)
     at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)
     at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241)
     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:415)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)
 }
 {noformat}
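A minimal sketch of the intended behavior, assuming a hypothetical helper that maps the exception raised during a job lookup to an HTTP status: a missing job is a client error (404), anything else stays a 500. `JobStatusMapper` is illustrative and is not WebHCat's actual code; only the exception name comes from the trace above, and a local stand-in class replaces the YARN one so the sketch compiles without Hadoop jars.

```java
import java.util.Set;

// Hypothetical helper: pick an HTTP status for an exception raised while looking up a job.
// Not WebHCat's real implementation.
final class JobStatusMapper {
    // Simple class names treated as "the requested job does not exist".
    private static final Set<String> NOT_FOUND = Set.of("ApplicationNotFoundException");

    static int httpStatusFor(Throwable t) {
        // Walk the cause chain, since RPC layers often wrap the original exception.
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (NOT_FOUND.contains(cur.getClass().getSimpleName())) {
                return 404; // user error: no such job
            }
        }
        return 500; // genuine server-side failure
    }
}

// Local stand-in for org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException.
class ApplicationNotFoundException extends RuntimeException {
    ApplicationNotFoundException(String msg) { super(msg); }
}
```

Matching on the simple name keeps the sketch dependency-free; real code would more likely catch the concrete exception type.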
 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (HIVE-5733) Publish hive-exec artifact without all the dependencies

2014-05-13 Thread Sandy Ryza (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated HIVE-5733:
-

Description: 
Currently the artifact {{hive-exec}} that is available in
[maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar]
 is shading all the dependencies (= the jar contains all Hive's dependencies).
As other projects that depend on Hive might use slightly different
versions of the dependencies, it can easily happen that Hive's shaded version
will be used instead, which leads to very time-consuming debugging of what is
happening (for example SQOOP-1198).

Would it be feasible to publish a {{hive-exec}} jar that is built without
shading any dependency? For example,
[avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar]
 has the classifier nodeps, which represents the artifact without any
dependencies.

  was:
Currently the artifact {{hive-exec}} that is available in
[maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar]
 is shading all the dependencies (= the jar contains all Hive's dependencies).
As other projects that depend on Hive might use slightly different
versions of the dependencies, it can easily happen that Hive's shadowed version
will be used instead, which leads to very time-consuming debugging of what is
happening (for example SQOOP-1198).

Would it be feasible to publish a {{hive-exec}} jar that is built without
shadowing any dependency? For example,
[avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar]
 has the classifier nodeps, which represents the artifact without any
dependencies.


 Publish hive-exec artifact without all the dependencies
 ---

 Key: HIVE-5733
 URL: https://issues.apache.org/jira/browse/HIVE-5733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Jarek Jarcec Cecho

 Currently the artifact {{hive-exec}} that is available in
 [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar]
  is shading all the dependencies (= the jar contains all Hive's
 dependencies). As other projects that depend on Hive might use
 slightly different versions of the dependencies, it can easily happen that
 Hive's shaded version will be used instead, which leads to very time-consuming
 debugging of what is happening (for example SQOOP-1198).
 Would it be feasible to publish a {{hive-exec}} jar that is built without
 shading any dependency? For example,
 [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar]
  has the classifier nodeps, which represents the artifact without any
 dependencies.




[jira] [Updated] (HIVE-4719) EmbeddedLockManager should be shared to all clients

2014-05-13 Thread Navis (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-4719:


Attachment: HIVE-4719.6.patch.txt

 EmbeddedLockManager should be shared to all clients
 ---

 Key: HIVE-4719
 URL: https://issues.apache.org/jira/browse/HIVE-4719
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-4719.5.patch.txt, HIVE-4719.6.patch.txt, 
 HIVE-4719.D11229.1.patch, HIVE-4719.D11229.2.patch, HIVE-4719.D11229.3.patch, 
 HIVE-4719.D11229.4.patch


 Currently, EmbeddedLockManager is created per Driver instance, so locking has 
 no meaning.
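Locking only has an effect if every client consults the same lock table. A minimal sketch, assuming a hypothetical `SharedLockManager` kept in a static, lazily initialized holder so that every Driver-like caller in the JVM receives the same instance; `SharedLockManager` and `DriverSketch` are illustrative names, not Hive's `EmbeddedLockManager` API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical lock manager shared by all clients in one JVM.
final class SharedLockManager {
    // Initialization-on-demand holder: the JVM creates INSTANCE once, thread-safely.
    private static final class Holder {
        static final SharedLockManager INSTANCE = new SharedLockManager();
    }

    static SharedLockManager get() { return Holder.INSTANCE; }

    // Lock table: locked object name -> owning client id.
    private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

    private SharedLockManager() {}

    /** Returns true if the caller now holds the lock (or already held it). */
    boolean tryLock(String objectName, String clientId) {
        String prev = owners.putIfAbsent(objectName, clientId);
        return prev == null || prev.equals(clientId);
    }

    void unlock(String objectName, String clientId) {
        owners.remove(objectName, clientId); // removes only if clientId is the owner
    }
}

// Hypothetical stand-in for a per-query Driver: it fetches the shared manager
// instead of constructing a private one.
final class DriverSketch {
    private final String clientId;
    private final SharedLockManager locks = SharedLockManager.get();

    DriverSketch(String clientId) { this.clientId = clientId; }

    boolean lockTable(String table) { return locks.tryLock(table, clientId); }

    void unlockTable(String table) { locks.unlock(table, clientId); }
}
```

With one manager per Driver, a second client's `lockTable("t1")` would also succeed, which is the behavior described above.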




Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

2014-05-13 Thread Mohammad Islam


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/
---

(Updated May 9, 2014, 12:23 a.m.)


Review request for hive, Ashutosh Chauhan and Jakob Homan.


Changes
---

Rebased with the latest commit.


Bugs: HIVE-3159
https://issues.apache.org/jira/browse/HIVE-3159


Repository: hive-git


Description
---

Problem:
Hive doesn't support creating an Avro-based table using only the HQL CREATE TABLE
command; it currently requires specifying an Avro schema literal or schema file
name. In many cases this is very inconvenient for the user.
Some of the unsupported use cases:
1. Create table ... Avro-SERDE etc. as SELECT ... from NON-AVRO FILE
2. Create table ... Avro-SERDE etc. as SELECT from AVRO TABLE
3. Create table without specifying an Avro schema.


Diffs (updated)
-

  ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_nested_complex.q PRE-CREATION 
  ql/src/test/queries/clientpositive/avro_nullable_fields.q f90ceb9 
  ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
  ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_nested_complex.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/avro_nullable_fields.q.out 77a6a2e 
  ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 4564e75 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 67d5570 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION 

Diff: https://reviews.apache.org/r/11925/diff/


Testing
---

Wrote a new Java test class (TestTypeInfoToSchema) for the new TypeInfoToSchema
class, and added a new test case to an existing Java test class. In addition,
there are four .q files for testing multiple use cases.


Thanks,

Mohammad Islam

[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue


 [ 
https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7043:
-

Status: Open  (was: Patch Available)

 When using the tez session pool via hive, once sessions time out, all queries 
 go to the default queue
 -

 Key: HIVE-7043
 URL: https://issues.apache.org/jira/browse/HIVE-7043
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: 0.14.0

 Attachments: HIVE-7043.2.patch


 When using a tez session pool to run multiple queries, once the sessions time 
 out, we always end up using the default queue to launch queries. The load 
 balancing doesn't work in this case.
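One way to picture the expected behavior: when a pooled session has timed out, its replacement should be opened on the same queue rather than the default one. A minimal sketch with hypothetical `PooledSession`/`SessionPool` classes (not Hive's actual Tez session pool code), handing out sessions round-robin and reopening expired ones on their original queue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical session bound to the YARN queue it was opened on.
final class PooledSession {
    final String queue;
    private boolean open = true;

    PooledSession(String queue) { this.queue = queue; }

    boolean isOpen() { return open; }

    void expire() { open = false; } // simulates an idle timeout
}

// Hypothetical pool: round-robin over sessions; a timed-out session is reopened
// on the SAME queue instead of falling back to the default queue.
final class SessionPool {
    private final List<PooledSession> sessions = new ArrayList<>();
    private int next = 0;

    SessionPool(List<String> queues) {
        for (String q : queues) {
            sessions.add(new PooledSession(q));
        }
    }

    synchronized PooledSession acquire() {
        int i = next;
        next = (next + 1) % sessions.size();
        PooledSession s = sessions.get(i);
        if (!s.isOpen()) {
            s = new PooledSession(s.queue); // keep the original queue assignment
            sessions.set(i, s);
        }
        return s;
    }

    synchronized List<PooledSession> snapshot() {
        return List.copyOf(sessions); // same session objects, read-only list
    }
}
```

Carrying the queue name on the session object keeps load balancing intact across timeouts.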




[jira] [Commented] (HIVE-6374) Hive job submitted with non-default name node (fs.default.name) doesn't process locations properly


[ 
https://issues.apache.org/jira/browse/HIVE-6374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995914#comment-13995914
 ] 

Ashutosh Chauhan commented on HIVE-6374:


+1

 Hive job submitted with non-default name node (fs.default.name) doesn't 
 process locations properly 
 ---

 Key: HIVE-6374
 URL: https://issues.apache.org/jira/browse/HIVE-6374
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.11.0, 0.12.0, 0.13.0
 Environment: Any
Reporter: Benjamin Zhitomirsky
Assignee: Benjamin Zhitomirsky
 Attachments: Design of the fix HIVE-6374.docx, hive-6374.1.patch, 
 hive-6374.3.patch, hive-6374.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 Create table/index/database and add partition DDL doesn't work properly if 
 all of the following conditions are true:
 - Metastore service is used
 - fs.default.name is specified and it differs from the default one
 - Location is not specified or specified as a not fully qualified URI
 The root cause of this behavior is that the Hive client doesn't pass the 
 configuration context to the metastore service, which tries to resolve the 
 paths. The fix is to resolve the path in the Hive client if 
 fs.default.name is specified and differs from the default one (this is much 
 easier than starting to pass the context, which would be a major change).
 The CR will be submitted shortly after tests are done.
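The client-side fix amounts to qualifying a missing or relative location against the configured default file system before it reaches the metastore. A minimal sketch using only `java.net.URI`; `LocationQualifier` and the `workingDir` parameter are hypothetical, this is not the attached patch, and it assumes any location that already carries a scheme is left untouched.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: qualify a DDL location against fs.default.name on the client,
// so the metastore never resolves it without the client's configuration.
final class LocationQualifier {
    /**
     * @param defaultFs  value of fs.default.name, e.g. "hdfs://nn2:8020"
     * @param workingDir absolute directory used for relative locations
     * @param location   user-supplied location
     */
    static URI qualify(String defaultFs, String workingDir, String location) {
        URI loc = URI.create(location);
        if (loc.getScheme() != null) {
            return loc; // assumption: a location with a scheme is kept as written
        }
        String path = loc.getPath();
        if (!path.startsWith("/")) {
            // Relative location: resolve it under the working directory.
            path = workingDir.endsWith("/") ? workingDir + path : workingDir + "/" + path;
        }
        URI fs = URI.create(defaultFs);
        try {
            return new URI(fs.getScheme(), fs.getAuthority(), path, null, null).normalize();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot qualify location " + location, e);
        }
    }
}
```

For example, with `fs.default.name=hdfs://nn2:8020`, the location `/data/t1` becomes `hdfs://nn2:8020/data/t1` before it is sent to the metastore.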




[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue


 [ 
https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7043:
-

Status: Open  (was: Patch Available)

 When using the tez session pool via hive, once sessions time out, all queries 
 go to the default queue
 -

 Key: HIVE-7043
 URL: https://issues.apache.org/jira/browse/HIVE-7043
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: 0.14.0

 Attachments: HIVE-7043.2.patch, HIVE-7043.3.patch


 When using a tez session pool to run multiple queries, once the sessions time 
 out, we always end up using the default queue to launch queries. The load 
 balancing doesn't work in this case.




[jira] [Updated] (HIVE-5733) Publish hive-exec artifact without all the dependencies

2014-05-13 Thread Sandy Ryza (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated HIVE-5733:
-

Description: 
Currently the artifact {{hive-exec}} that is available in
[maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar]
 is shading all the dependencies (= the jar contains all Hive's dependencies).
As other projects that depend on Hive might use slightly different
versions of the dependencies, it can easily happen that Hive's shadowed version
will be used instead, which leads to very time-consuming debugging of what is
happening (for example SQOOP-1198).

Would it be feasible to publish a {{hive-exec}} jar that is built without
shadowing any dependency? For example,
[avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar]
 has the classifier nodeps, which represents the artifact without any
dependencies.

  was:
Currently the artifact {{hive-exec}} that is available in
[maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar]
 is shadowing all the dependencies (= the jar contains all Hive's
dependencies). As other projects that depend on Hive might use
slightly different versions of the dependencies, it can easily happen that
Hive's shadowed version will be used instead, which leads to very time-consuming
debugging of what is happening (for example SQOOP-1198).

Would it be feasible to publish a {{hive-exec}} jar that is built without
shadowing any dependency? For example,
[avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar]
 has the classifier nodeps, which represents the artifact without any
dependencies.


 Publish hive-exec artifact without all the dependencies
 ---

 Key: HIVE-5733
 URL: https://issues.apache.org/jira/browse/HIVE-5733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Jarek Jarcec Cecho

 Currently the artifact {{hive-exec}} that is available in
 [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar]
  is shading all the dependencies (= the jar contains all Hive's
 dependencies). As other projects that depend on Hive might use
 slightly different versions of the dependencies, it can easily happen that
 Hive's shadowed version will be used instead, which leads to very
 time-consuming debugging of what is happening (for example SQOOP-1198).
 Would it be feasible to publish a {{hive-exec}} jar that is built without
 shadowing any dependency? For example,
 [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar]
  has the classifier nodeps, which represents the artifact without any
 dependencies.




Re: Review Request 18492: HIVE-6473: Allow writing HFiles via HBaseStorageHandler table

2014-05-13 Thread nick dimiduk


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18492/
---

(Updated May 13, 2014, 4:07 a.m.)


Review request for hive.


Changes
---

Updating diff with HIVE-6473.1.patch.txt from JIRA.


Bugs: HIVE-6473
https://issues.apache.org/jira/browse/HIVE-6473


Repository: hive-git


Description
---

From the JIRA:

Generating HFiles for bulkload into HBase could be more convenient. Right now 
we require the user to register a new table with the appropriate output format. 
This patch allows the exact same functionality, but through an existing table 
managed by the HBaseStorageHandler.


Diffs (updated)
-

  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 4fe1b1b 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java be1210e 
  hbase-handler/src/test/queries/negative/generatehfiles_require_family_path.q PRE-CREATION 
  hbase-handler/src/test/queries/positive/hbase_bulk.m f8bb47d 
  hbase-handler/src/test/queries/positive/hbase_bulk.q PRE-CREATION 
  hbase-handler/src/test/queries/positive/hbase_handler_bulk.q PRE-CREATION 
  hbase-handler/src/test/results/negative/generatehfiles_require_family_path.q.out PRE-CREATION 
  hbase-handler/src/test/results/positive/hbase_bulk.q.out PRE-CREATION 
  hbase-handler/src/test/results/positive/hbase_handler_bulk.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/18492/diff/


Testing
---


Thanks,

nick dimiduk

[jira] [Commented] (HIVE-5342) Remove pre hadoop-0.20.0 related codes


[ 
https://issues.apache.org/jira/browse/HIVE-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996028#comment-13996028
 ] 

Hive QA commented on HIVE-5342:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12644454/HIVE-5342.2.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/184/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/184/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12644454

 Remove pre hadoop-0.20.0 related codes
 --

 Key: HIVE-5342
 URL: https://issues.apache.org/jira/browse/HIVE-5342
 Project: Hive
  Issue Type: Task
Reporter: Navis
Assignee: Jason Dere
Priority: Trivial
 Attachments: D13047.1.patch, HIVE-5342.1.patch, HIVE-5342.2.patch


 Recently, we discussed dropping support for hadoop-0.20.0. Whether or not 
 that happens, the 0.17-related code should be removed first.




[jira] [Commented] (HIVE-6601) alter database commands should support schema synonym keyword


[ 
https://issues.apache.org/jira/browse/HIVE-6601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995393#comment-13995393
 ] 

Thejas M Nair commented on HIVE-6601:
-

This is also the case with the other ALTER DATABASE command: ALTER 
DATABASE database_name SET DBPROPERTIES.


 alter database commands should support schema synonym keyword
 -

 Key: HIVE-6601
 URL: https://issues.apache.org/jira/browse/HIVE-6601
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair

 It should be possible to use ALTER SCHEMA as an alternative to ALTER 
 DATABASE, but the syntax is not currently supported.
 {code}
 alter schema db1 set owner user x;  
 NoViableAltException(215@[])
 FAILED: ParseException line 1:6 cannot recognize input near 'schema' 'db1' 
 'set' in alter statement
 {code}




[jira] [Created] (HIVE-7051) Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION

2014-05-13 Thread Prasanth J (JIRA)

Prasanth J created HIVE-7051:


 Summary: Display partition level column stats in DESCRIBE 
EXTENDED/FORMATTED PARTITION
 Key: HIVE-7051
 URL: https://issues.apache.org/jira/browse/HIVE-7051
 Project: Hive
  Issue Type: Bug
Reporter: Prasanth J


Same as HIVE-7050 but for partitions




[jira] [Updated] (HIVE-6957) SQL authorization does not work with HS2 binary mode and Kerberos auth

2014-05-13 Thread Sushanth Sowmyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-6957:
---

Fix Version/s: 0.13.1

 SQL authorization does not work with HS2 binary mode and Kerberos auth
 --

 Key: HIVE-6957
 URL: https://issues.apache.org/jira/browse/HIVE-6957
 Project: Hive
  Issue Type: Bug
  Components: Authorization, HiveServer2
Affects Versions: 0.13.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.14.0, 0.13.1

 Attachments: HIVE-6957.04-branch.0.13.patch, HIVE-6957.1.patch, 
 HIVE-6957.2.patch, HIVE-6957.3.patch, HIVE-6957.4.patch


 In HiveServer2, when Kerberos auth and binary transport mode are used, the 
 user name that gets passed on to authorization is the long Kerberos user name.
 The user names used in grant/revoke statements tend to be the short 
 user names.
 This also causes authorization to fail for statements that involve URIs, as 
 the authorization mode checks the file system permissions for the given user. 
 It does not recognize that the given long user name actually owns the file or 
 belongs to the group that owns the file.
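The mismatch is between a full principal such as `hive/host.example.com@EXAMPLE.COM` and the short name `hive` used in GRANT/REVOKE. A minimal sketch of the simplest mapping (keep the primary component, drop the host instance and the realm); `PrincipalNames` is hypothetical, and real deployments map principals through Hadoop's configurable `hadoop.security.auth_to_local` rules, which this sketch does not evaluate.

```java
// Hypothetical helper: reduce a Kerberos principal to a short user name using only the
// simplest rule (primary component, no instance, no realm). It does NOT apply
// auth_to_local rules.
final class PrincipalNames {
    static String shortName(String principal) {
        if (principal == null || principal.isEmpty()) {
            throw new IllegalArgumentException("empty principal");
        }
        String withoutRealm = principal;
        int at = principal.indexOf('@');
        if (at >= 0) {
            withoutRealm = principal.substring(0, at); // strip "@REALM"
        }
        int slash = withoutRealm.indexOf('/');
        return slash >= 0 ? withoutRealm.substring(0, slash) : withoutRealm; // strip "/host"
    }
}
```

Authorization checks (grant targets, file ownership) would then compare against `shortName(...)` rather than the full principal.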




[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue


 [ 
https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7043:
-

Status: Patch Available  (was: Open)

Re-uploading as Hive QA failed to run tests.

 When using the tez session pool via hive, once sessions time out, all queries 
 go to the default queue
 -

 Key: HIVE-7043
 URL: https://issues.apache.org/jira/browse/HIVE-7043
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: 0.14.0

 Attachments: HIVE-7043.2.patch, HIVE-7043.3.patch, HIVE-7043.4.patch


 When using a tez session pool to run multiple queries, once the sessions time 
 out, we always end up using the default queue to launch queries. The load 
 balancing doesn't work in this case.




[jira] [Updated] (HIVE-6394) Implement Timestamp in ParquetSerde

2014-05-13 Thread Szehon Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-6394:


Attachment: HIVE-6394.2.patch

Adding unit tests.

 Implement Timestamp in ParquetSerde
 ---

 Key: HIVE-6394
 URL: https://issues.apache.org/jira/browse/HIVE-6394
 Project: Hive
  Issue Type: Sub-task
  Components: Serializers/Deserializers
Reporter: Jarek Jarcec Cecho
Assignee: Szehon Ho
  Labels: Parquet
 Attachments: HIVE-6394.2.patch, HIVE-6394.patch


 This JIRA is to implement timestamp support in Parquet SerDe.
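For context, a layout commonly used for Parquet timestamps (the 12-byte INT96 form written by Impala, for example) stores a Julian day number plus nanoseconds within that day. A minimal sketch of that split using `java.time`, assuming UTC; `JulianTimestamp` is hypothetical and is not the code in the attached patches. The Julian day number of 1970-01-01 is 2440588.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Sketch: split a UTC instant into (Julian day, nanos of day) and back.
final class JulianTimestamp {
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L; // Julian day number of 1970-01-01

    final int julianDay;
    final long nanosOfDay;

    private JulianTimestamp(int julianDay, long nanosOfDay) {
        this.julianDay = julianDay;
        this.nanosOfDay = nanosOfDay;
    }

    static JulianTimestamp fromInstant(Instant t) {
        LocalDateTime utc = LocalDateTime.ofInstant(t, ZoneOffset.UTC);
        long epochDay = utc.toLocalDate().toEpochDay();
        long nanos = utc.toLocalTime().toNanoOfDay();
        return new JulianTimestamp(Math.toIntExact(epochDay + JULIAN_DAY_OF_EPOCH), nanos);
    }

    Instant toInstant() {
        long epochDay = julianDay - JULIAN_DAY_OF_EPOCH;
        // ofEpochSecond normalizes the nanosecond adjustment into seconds.
        return Instant.ofEpochSecond(Math.multiplyExact(epochDay, 86_400L), nanosOfDay);
    }
}
```

Splitting on the UTC calendar day keeps `nanosOfDay` non-negative even for instants before 1970.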




[jira] [Commented] (HIVE-3159) Update AvroSerde to determine schema of new tables

2014-05-13 Thread Mohammad Kamrul Islam (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995897#comment-13995897
 ] 

Mohammad Kamrul Islam commented on HIVE-3159:
-

The recently committed HIVE-5823 introduced a bug.
I created a separate JIRA (HIVE-7049) to address this and uploaded a patch for 
it.



 Update AvroSerde to determine schema of new tables
 --

 Key: HIVE-3159
 URL: https://issues.apache.org/jira/browse/HIVE-3159
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.12.0
Reporter: Jakob Homan
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-3159.10.patch, HIVE-3159.4.patch, 
 HIVE-3159.5.patch, HIVE-3159.6.patch, HIVE-3159.7.patch, HIVE-3159.9.patch, 
 HIVE-3159v1.patch


 Currently, when writing tables to Avro, one must manually provide an Avro 
 schema that matches what is being delivered by Hive. It'd be better to have 
 the serde infer this schema by converting the table's TypeInfo into an 
 appropriate Avro schema.
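A minimal sketch of the idea, assuming a hypothetical mapping from Hive primitive type names to Avro primitive types, with every column made nullable as a `["null", type]` union. It handles primitives only and builds the JSON by hand; the review diff shows the actual patch adds a `TypeInfoToSchema` class, whose API is not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical converter: Hive column types (primitives only) -> Avro record schema JSON.
final class HiveToAvroSketch {
    private static final Map<String, String> PRIMITIVES = Map.of(
        "boolean", "boolean",
        "tinyint", "int", // Avro has no 8- or 16-bit ints, so widen to int
        "smallint", "int",
        "int", "int",
        "bigint", "long",
        "float", "float",
        "double", "double",
        "string", "string",
        "binary", "bytes");

    /** columns: ordered map of column name -> Hive type name. */
    static String toRecordSchema(String recordName, LinkedHashMap<String, String> columns) {
        StringJoiner fields = new StringJoiner(",");
        for (Map.Entry<String, String> col : columns.entrySet()) {
            String avroType = PRIMITIVES.get(col.getValue().toLowerCase(Locale.ROOT));
            if (avroType == null) {
                throw new IllegalArgumentException("unsupported type: " + col.getValue());
            }
            // Hive columns may hold NULL, so each field is a ["null", T] union defaulting to null.
            fields.add("{\"name\":\"" + col.getKey() + "\",\"type\":[\"null\",\""
                + avroType + "\"],\"default\":null}");
        }
        return "{\"type\":\"record\",\"name\":\"" + recordName + "\",\"fields\":[" + fields + "]}";
    }
}
```

Putting `"null"` first in each union is what allows `"default": null` to be valid under the Avro specification.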




[jira] [Commented] (HIVE-6772) Virtual columns when used with Lateral View Explode results in SemanticException [Error 10004]

2014-05-13 Thread Navis (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992558#comment-13992558
 ] 

Navis commented on HIVE-6772:
-

I think this is fixed by HIVE-3226 and others.

 Virtual columns when used with Lateral View Explode results in 
 SemanticException [Error 10004]
 --

 Key: HIVE-6772
 URL: https://issues.apache.org/jira/browse/HIVE-6772
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.9.0
 Environment: Red Hat Enterprise Linux Server release 6.3 (Santiago)
 Hadoop 2.0.0-cdh4.1.2
 Hive 0.9.0
Reporter: Steve Ogden
Priority: Minor

 When using the virtual columns with 'lateral view explode', I get the 
 following error:
 FAILED: SemanticException [Error 10004]: Line 3:22 Invalid table alias or 
 column reference 'INPUT__FILE__NAME': (possible column names are: _col0, 
 _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, 
 _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, 
 _col20, _col21, _col22)
 Here is the query:
 select
   newMd5(concat(INPUT__FILE__NAME,BLOCK__OFFSET__INSIDE__FILE)) ukey,
   flat_ric_cd as ric_cd
 from edwpoc.ts_rtd_gs_stg
 lateral view explode(split(ric_cd,',')) subView as flat_ric_cd




[jira] [Updated] (HIVE-7033) grant statements should check if the role exists


 [ 
https://issues.apache.org/jira/browse/HIVE-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-7033:


Attachment: HIVE-7033.3.patch

Thanks for pointing that out, Ashutosh!
HIVE-7033.3.patch - changes to avoid a TOCTOU (time-of-check to time-of-use) issue.

 grant statements should check if the role exists
 

 Key: HIVE-7033
 URL: https://issues.apache.org/jira/browse/HIVE-7033
 Project: Hive
  Issue Type: Bug
  Components: Authorization, SQLStandardAuthorization
Affects Versions: 0.13.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-7033.1.patch, HIVE-7033.2.patch, HIVE-7033.3.patch


 The following statement, which grants to a role that does not exist, 
 succeeds, but it should result in an error:
  grant all on t1 to role nosuchrole;
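Checking that the role exists and then recording the grant as two separate steps leaves a time-of-check to time-of-use window in which the role can be dropped. A minimal sketch with a hypothetical in-memory `RoleStore`, where the existence check and the update happen in one atomic `ConcurrentHashMap.computeIfPresent` call; Hive's real fix works against the metastore, which this does not model.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical role store: role name -> privileges granted to that role.
final class RoleStore {
    private final ConcurrentHashMap<String, Set<String>> grants = new ConcurrentHashMap<>();

    void createRole(String role) {
        grants.putIfAbsent(role, ConcurrentHashMap.newKeySet());
    }

    void dropRole(String role) {
        grants.remove(role);
    }

    /** Grants a privilege, failing if the role does not exist; check and update are atomic. */
    void grant(String privilege, String role) {
        Set<String> updated = grants.computeIfPresent(role, (r, privs) -> {
            privs.add(privilege); // runs only while the mapping for `role` exists
            return privs;
        });
        if (updated == null) {
            throw new IllegalArgumentException("Role " + role + " does not exist");
        }
    }

    Set<String> privilegesOf(String role) {
        Set<String> privs = grants.get(role);
        return privs == null ? Set.of() : Set.copyOf(privs);
    }
}
```

A separate `containsKey` followed by an update would reintroduce the race that the .3 patch is meant to avoid.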




[jira] [Commented] (HIVE-7016) Hive returns wrong results when execute UDF on top of DISTINCT column


[ 
https://issues.apache.org/jira/browse/HIVE-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993829#comment-13993829
 ] 

Ashutosh Chauhan commented on HIVE-7016:


+1

 Hive returns wrong results when execute UDF on top of DISTINCT column
 -

 Key: HIVE-7016
 URL: https://issues.apache.org/jira/browse/HIVE-7016
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0, 0.13.1
Reporter: Selina Zhang
Assignee: Navis
 Fix For: 0.14.0

 Attachments: HIVE-7016.1.patch.txt


 The following query returns a wrong result:
 select hash(distinct value) from table;
 This kind of query should be rejected as a syntax error. However, Hive 
 ignores DISTINCT and returns a result.
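The expected behavior is a compile-time rejection. A minimal sketch of such a check, assuming a hypothetical semantic-analysis step that knows which function names are aggregates; the `DistinctCheck` class, the function set, and the error text are illustrative, not Hive's function registry or actual message.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical validation: DISTINCT is only meaningful inside an aggregate call.
final class DistinctCheck {
    // Illustrative subset of aggregates; a real analyzer would consult its function registry.
    private static final Set<String> AGGREGATES = Set.of("count", "sum", "avg", "min", "max");

    static void validate(String functionName, boolean hasDistinct) {
        if (hasDistinct && !AGGREGATES.contains(functionName.toLowerCase(Locale.ROOT))) {
            throw new IllegalArgumentException(
                "DISTINCT is not allowed in non-aggregate function " + functionName);
        }
    }
}
```

Under this check, `hash(distinct value)` fails at compile time instead of silently ignoring DISTINCT.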




[jira] [Commented] (HIVE-7013) Partition of type int has ambiguity for path like field=01


[ 
https://issues.apache.org/jira/browse/HIVE-7013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995373#comment-13995373
 ] 

Ashutosh Chauhan commented on HIVE-7013:


[~reno] There are a number of improvements in this area in later versions of 
Hive; 0.9 is too old. Can you try this with 0.13?



 Partition of type int has ambiguity for path like field=01
 --

 Key: HIVE-7013
 URL: https://issues.apache.org/jira/browse/HIVE-7013
 Project: Hive
  Issue Type: Bug
Reporter: Peng Zhang

 1. Store data in a path like /hive/table/year=2014/month=01/day=01
 2. Create a table with PARTITIONED BY (year int, month int, day int)
 3. add partition(year=2014, month=1, day=1)
 add partition(year=2014, month=01, day=01)
 This creates two partitions, with locations /year=2014/month=1/day=1 
 and year=2014/month=01/day=01 respectively.
 4. select   where month=1  = no data 
 select   where month=01 = no data
 select   where month=01  = OK
 I tested this scenario in 0.9: with add partition(year=2014, month=1), 
 select where month=1 works.
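The ambiguity comes from two different path strings (`month=1` and `month=01`) denoting the same `int` value. A minimal sketch of one way to remove it, assuming a hypothetical step that canonicalizes int partition values before building the partition path; `PartitionPaths` is illustrative and is not Hive's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: build a partition path with int values in canonical form.
final class PartitionPaths {
    /** spec: ordered map of int partition column -> value as written by the user. */
    static String canonicalPath(LinkedHashMap<String, String> spec) {
        StringJoiner path = new StringJoiner("/");
        for (Map.Entry<String, String> e : spec.entrySet()) {
            // Integer.parseInt("01") == 1, so "01" and "1" produce the same directory name.
            int v = Integer.parseInt(e.getValue().trim());
            path.add(e.getKey() + "=" + v);
        }
        return path.toString();
    }
}
```

With this normalization, both ADD PARTITION statements in step 3 map to `year=2014/month=1/day=1`; the trade-off is that data already stored under `month=01` would need an explicit LOCATION.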




[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table

2014-05-13 Thread Nick Dimiduk (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6473:
---

Attachment: HIVE-6473.1.patch.txt

Rebased to trunk, addressed RB comments, and fixed broken tests.

 Allow writing HFiles via HBaseStorageHandler table
 --

 Key: HIVE-6473
 URL: https://issues.apache.org/jira/browse/HIVE-6473
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch.txt


 Generating HFiles for bulkload into HBase could be more convenient. Right now 
 we require the user to register a new table with the appropriate output 
 format. This patch allows the exact same functionality, but through an 
 existing table managed by the HBaseStorageHandler.




[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table

2014-05-13 Thread Nick Dimiduk (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6473:
---

Status: Patch Available  (was: Open)

 Allow writing HFiles via HBaseStorageHandler table
 --

 Key: HIVE-6473
 URL: https://issues.apache.org/jira/browse/HIVE-6473
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch.txt


 Generating HFiles for bulkload into HBase could be more convenient. Right now 
 we require the user to register a new table with the appropriate output 
 format. This patch allows the exact same functionality, but through an 
 existing table managed by the HBaseStorageHandler.




How to remote debug WebHCat?

2014-05-13 Thread Na Yang

Hi Folks,

Is there a way to remote debug WebHCat? If so, how do I enable remote
debugging?

Thanks,
Na

[jira] [Updated] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-13 Thread Mohammad Kamrul Islam (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-7049:


Attachment: HIVE-7049.1.patch

patch uploaded

 Unable to deserialize AVRO data when file schema and record schema are 
 different and nullable
 -

 Key: HIVE-7049
 URL: https://issues.apache.org/jira/browse/HIVE-7049
 Project: Hive
  Issue Type: Bug
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-7049.1.patch


 It mainly happens when:
 1) the file schema and record schema are not the same, and
 2) the record schema is nullable but the file schema is not.
 The potential code location is in class AvroDeserializer:
 {noformat}
 if (AvroSerdeUtils.isNullableType(recordSchema)) {
   return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType);
 }
 {noformat}
 In the above code snippet, recordSchema is checked for nullability, but 
 fileSchema is not.
 I tested with these values:
 {noformat}
 recordSchema= [null,string]
 fileSchema= string
 {noformat}
 And I got the following exception (line numbers might not match because of my 
 debug version of the code):
 {noformat}
 org.apache.avro.AvroRuntimeException: Not a union: string
     at org.apache.avro.Schema.getTypes(Schema.java:272)
     at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275)
     at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205)
     at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188)
     at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174)
     at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487)
     at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407)
 {noformat}




[jira] [Commented] (HIVE-6910) Invalid column access info for partitioned table


[ 
https://issues.apache.org/jira/browse/HIVE-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995557#comment-13995557
 ] 

Ashutosh Chauhan commented on HIVE-6910:


Patch looks good, but it looks like there are a few changes that may not be
essential for this patch. I left comments on Review Board.

 Invalid column access info for partitioned table
 

 Key: HIVE-6910
 URL: https://issues.apache.org/jira/browse/HIVE-6910
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0, 0.12.0, 0.13.0
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-6910.1.patch.txt, HIVE-6910.2.patch.txt, 
 HIVE-6910.3.patch.txt, HIVE-6910.4.patch.txt


 From http://www.mail-archive.com/user@hive.apache.org/msg11324.html
 neededColumnIDs in the TableScan (TS) operator covers only non-partition
 columns, but ColumnAccessAnalyzer calculates it over all columns.




[jira] [Updated] (HIVE-7033) grant statements should check if the role exists


 [ 
https://issues.apache.org/jira/browse/HIVE-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7033:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Thejas!

 grant statements should check if the role exists
 

 Key: HIVE-7033
 URL: https://issues.apache.org/jira/browse/HIVE-7033
 Project: Hive
  Issue Type: Bug
  Components: Authorization, SQLStandardAuthorization
Affects Versions: 0.13.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.14.0

 Attachments: HIVE-7033.1.patch, HIVE-7033.2.patch, HIVE-7033.3.patch, 
 HIVE-7033.4.patch


 The following grant statement, which grants to a role that does not exist,
 succeeds, but it should result in an error:
  grant all on t1 to role nosuchrole;
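
 A minimal sketch of the requested check, assuming a hypothetical in-memory
 RoleRegistry that stands in for the metastore (it is not Hive's authorization
 API): the grant is rejected before anything is recorded when the target role
 does not exist.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the metastore's role and privilege tables. */
final class RoleRegistry {
    private final Set<String> roles = new HashSet<>();
    // role name -> objects on which that role holds ALL privileges
    private final Map<String, Set<String>> grants = new HashMap<>();

    void createRole(String role) {
        roles.add(role);
    }

    /** Mirrors "grant all on <object> to role <role>", failing for unknown roles. */
    void grantAll(String object, String role) {
        if (!roles.contains(role)) {
            throw new IllegalArgumentException("Role " + role + " does not exist");
        }
        grants.computeIfAbsent(role, r -> new HashSet<>()).add(object);
    }

    boolean hasAll(String object, String role) {
        return grants.getOrDefault(role, Collections.<String>emptySet()).contains(object);
    }
}
```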




[jira] [Commented] (HIVE-7031) Utilities.createEmptyFile uses File.separator instead of Path.SEPARATOR to create an empty file in HDFS


[ 
https://issues.apache.org/jira/browse/HIVE-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993499#comment-13993499
 ] 

Hive QA commented on HIVE-7031:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12643845/HIVE-7031.1.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5433 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/150/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/150/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12643845

 Utilities.createEmptyFile uses File.separator instead of Path.SEPARATOR to 
 create an empty file in HDFS
 ---

 Key: HIVE-7031
 URL: https://issues.apache.org/jira/browse/HIVE-7031
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Fix For: 0.14.0

 Attachments: HIVE-7031.1.patch


 On Windows, this leads to inconsistent HDFS naming for empty partitions/tables,
 where a file might be named
 hdfs://headnode0:9000/hive/scratch/hive_2014-04-07_22-39-52_649_4046112898053848089-1/-mr-10010\0
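
 The sketch below contrasts the two separators, assuming a hypothetical
 HdfsPaths helper (not Hive's Utilities class). Hadoop's
 org.apache.hadoop.fs.Path.SEPARATOR is always "/", while
 java.io.File.separator is a backslash on Windows, which is how a name ending
 in "\0" can appear.

```java
import java.io.File;

/** Hypothetical helper for building HDFS child paths; not Hive's Utilities class. */
final class HdfsPaths {
    // Same value as Hadoop's Path.SEPARATOR: HDFS paths use '/' on every client OS.
    static final String HDFS_SEPARATOR = "/";

    private HdfsPaths() {}

    /** Joins parent and child with the HDFS separator. */
    static String child(String parent, String name) {
        return parent.endsWith(HDFS_SEPARATOR) ? parent + name : parent + HDFS_SEPARATOR + name;
    }

    /** The problematic pattern: on Windows File.separator is a backslash. */
    static String childWithLocalSeparator(String parent, String name) {
        return parent + File.separator + name;
    }
}
```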




[jira] [Updated] (HIVE-6908) TestThriftBinaryCLIService.testExecuteStatementAsync has intermittent failures


 [ 
https://issues.apache.org/jira/browse/HIVE-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6908:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Szehon!

 TestThriftBinaryCLIService.testExecuteStatementAsync has intermittent failures
 --

 Key: HIVE-6908
 URL: https://issues.apache.org/jira/browse/HIVE-6908
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.13.0
Reporter: Szehon Ho
Assignee: Szehon Ho
 Fix For: 0.14.0

 Attachments: HIVE-6908.patch


 This test sometimes fails in the pre-commit runs.
 ThriftCLIServiceTest.testExecuteStatementAsync runs two statements. They are
 given a 100-second timeout in total; I'm not sure whether that is intentional.
 Because the first statement is a select query, it takes most of that time. The
 second statement (create table) should be quicker, but it sometimes fails
 because the timeout is already mostly used up.
 The timeout should probably be reset after the first statement. If an
 operation finishes before the timeout, the reset has no effect, because the
 wait loop breaks out as soon as the operation completes.
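
 A minimal sketch of the suggested fix, assuming a hypothetical StatementWaiter
 helper rather than the test's actual code: the deadline is computed on each
 call, so every statement gets the full timeout, and a statement that finishes
 early simply returns. The clock is injected so the behavior can be checked
 without real waiting.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Hypothetical polling helper; not the code in ThriftCLIServiceTest. */
final class StatementWaiter {
    private StatementWaiter() {}

    /**
     * Polls isDone until it returns true or timeoutMillis elapses. The deadline is
     * computed on entry, so each statement gets its own full budget instead of
     * sharing one budget across statements.
     */
    static boolean waitForCompletion(BooleanSupplier isDone, LongSupplier clockMillis,
                                     long timeoutMillis, long pollMillis) {
        long deadline = clockMillis.getAsLong() + timeoutMillis;
        while (clockMillis.getAsLong() < deadline) {
            if (isDone.getAsBoolean()) {
                return true; // finished early: the rest of the budget is unused
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return isDone.getAsBoolean(); // one last check at the deadline
    }
}
```

 With a shared deadline, the second statement in the usage below would have
 only 20 ms of fake time left; with a per-call deadline it gets a fresh 100 ms.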




[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable

2014-05-13 Thread Xuefu Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996660#comment-13996660
 ] 

Xuefu Zhang commented on HIVE-7049:
---

Thanks for bringing this up. I'm wondering if the situation you described is
an issue of schema incompatibility rather than a bug. The record schema says
that a field is a union (nullable), while the file schema says that the field
is not a union, which suggests that the data is not compatible with the schema.
While we may need to provide a better error message for this, ignoring the
file schema (by passing NULL down) will very likely break decimal support,
which needs the file schema to read data correctly.
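
One possible direction that keeps the file schema (so decimal handling is not
affected) follows Avro's own resolution rule for a union reader schema against
a non-union writer schema: resolve against the matching branch of the reader
union, and signal an error if none matches. The sketch below is only an
illustration, not the attached patch; SimpleSchema is a hypothetical stand-in
for org.apache.avro.Schema, and matching is simplified to exact type equality.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for org.apache.avro.Schema, covering only what this check needs. */
final class SimpleSchema {
    enum Type { NULL, STRING, INT, UNION }

    final Type type;
    final List<SimpleSchema> branches; // empty unless type == UNION

    private SimpleSchema(Type type, List<SimpleSchema> branches) {
        this.type = type;
        this.branches = branches;
    }

    static SimpleSchema of(Type type) {
        return new SimpleSchema(type, Collections.<SimpleSchema>emptyList());
    }

    static SimpleSchema union(SimpleSchema... branches) {
        return new SimpleSchema(Type.UNION, Arrays.asList(branches));
    }

    /** A two-branch union with one NULL branch, e.g. [null,string]. */
    boolean isNullable() {
        return type == Type.UNION && branches.size() == 2
                && (branches.get(0).type == Type.NULL || branches.get(1).type == Type.NULL);
    }

    /** The non-null branch of a nullable union. */
    SimpleSchema nonNullBranch() {
        return branches.get(0).type == Type.NULL ? branches.get(1) : branches.get(0);
    }

    @Override
    public String toString() {
        return type == Type.UNION ? branches.toString() : type.name().toLowerCase(Locale.ROOT);
    }
}

final class NullableResolution {
    private NullableResolution() {}

    /** Picks the record schema to read with; the file schema is always kept by the caller. */
    static SimpleSchema recordSchemaFor(SimpleSchema recordSchema, SimpleSchema fileSchema) {
        if (!recordSchema.isNullable() || fileSchema.type == SimpleSchema.Type.UNION) {
            return recordSchema; // unchanged path, e.g. union read against union
        }
        // Reader union vs. writer non-union: use the matching reader branch
        // (exact match only here; Avro itself also allows numeric promotions).
        SimpleSchema branch = recordSchema.nonNullBranch();
        if (branch.type != fileSchema.type) {
            throw new IllegalArgumentException("File schema " + fileSchema
                    + " does not match any non-null branch of record schema " + recordSchema);
        }
        return branch;
    }
}
```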

 Unable to deserialize AVRO data when file schema and record schema are 
 different and nullable
 -

 Key: HIVE-7049
 URL: https://issues.apache.org/jira/browse/HIVE-7049
 Project: Hive
  Issue Type: Bug
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-7049.1.patch


 It mainly happens when:
 1) the file schema and the record schema are not the same, and
 2) the record schema is nullable but the file schema is not.
 The potential code location is in class AvroDeserializer:
  
 {noformat}
 if (AvroSerdeUtils.isNullableType(recordSchema)) {
   return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType);
 }
 {noformat}
 In the above snippet, recordSchema is checked for nullability, but the file
 schema is not checked.
 I tested with these values:
 {noformat}
 recordSchema= [null,string]
 fileSchema= string
 {noformat}
 And I got the following exception (line numbers might not match because I was
 running a locally modified debug version of the code):
 {noformat}
 org.apache.avro.AvroRuntimeException: Not a union: string 
 at org.apache.avro.Schema.getTypes(Schema.java:272)
 at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275)
 at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205)
 at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188)
 at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174)
 at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487)
 at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407)
 {noformat}




[jira] [Updated] (HIVE-7012) Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer


 [ 
https://issues.apache.org/jira/browse/HIVE-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7012:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

 Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer
 

 Key: HIVE-7012
 URL: https://issues.apache.org/jira/browse/HIVE-7012
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Sun Rui
Assignee: Navis
 Fix For: 0.14.0

 Attachments: HIVE-7012.1.patch.txt, HIVE-7012.2.patch.txt


 With Hive 0.13.0, run the following test case:
 {code:sql}
 create table src(key bigint, value string);
 select  
count(distinct key) as col0
 from src
 order by col0;
 {code}
 The following exception will be thrown:
 {noformat}
 java.lang.RuntimeException: Error in configuring object
   at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
   at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
   at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
   ... 9 more
 Caused by: java.lang.RuntimeException: Reduce operator initialization failed
   at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:173)
   ... 14 more
 Caused by: java.lang.RuntimeException: cannot find field _col0 from [0:reducesinkkey0]
   at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
   at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
   at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:79)
   at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:288)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
   at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:166)
   ... 14 more
 {noformat}
 This issue is related to HIVE-6455. Setting hive.optimize.reducededuplication
 to false makes the issue go away.
 Logical plan when hive.optimize.reducededuplication=false:
 {noformat}
 src 
   TableScan (TS_0)
     alias: src
     Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
     Select Operator (SEL_1)
       expressions: key (type: bigint)
       outputColumnNames: key
       Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
       Group By Operator (GBY_2)
         aggregations: count(DISTINCT key)
         keys: key (type: bigint)
         mode: hash
         outputColumnNames: _col0, _col1
         Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
         Reduce Output Operator (RS_3)
           DistinctColumnIndices:
           key expressions: _col0 (type: bigint)
           DistributionKeys: 0
           sort order: +
           OutputKeyColumnNames: _col0
           Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
           Group By Operator (GBY_4)
             aggregations: count(DISTINCT KEY._col0:0._col0)
             mode: mergepartial
             outputColumnNames: _col0
             Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
             Select Operator (SEL_5)
               expressions: _col0 (type: bigint)
               outputColumnNames: _col0
               Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE
               Reduce Output Operator (RS_6)
                 key expressions: _col0 (type: bigint)

Re: Review Request 21138: Support more generic way of using composite key for HBaseHandler

2014-05-13 Thread Xuefu Zhang



 On May 12, 2014, 4:56 a.m., Swarnim Kulkarni wrote:
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/CompositeHBaseKeyFactory.java,
   line 132
  https://reviews.apache.org/r/21138/diff/1/?file=575776#file575776line132
 
  That said, I am also not 100% sure why Navis chose a
  FamilyFilter here. In my latest patch, I made the setupFilter method
  protected so that it can easily be overridden. I'll ask Navis about his
  choice of FamilyFilter here. If we don't get a response, my vote is to
  proceed with the protected scope of this method and log a follow-up JIRA
  to clean this up.

Okay, makes sense. Could you log the follow-up JIRA and link it to this issue?


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21138/#review42667
---


On May 8, 2014, 3:42 p.m., Swarnim Kulkarni wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/21138/
 ---
 
 (Updated May 8, 2014, 3:42 p.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-2599 introduced the use of a custom object for the row key, but it forces
 key objects to extend HBaseCompositeKey, which is itself an extension of
 LazyStruct. If the user provides a proper Object and ObjectInspector (OI), we
 can replace the internal key and keyOI with those.
 
 The initial implementation is based on a factory interface.
 {code}
 public interface HBaseKeyFactory {
   void init(SerDeParameters parameters, Properties properties) throws SerDeException;
   ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException;
   LazyObjectBase createObject(ObjectInspector inspector) throws SerDeException;
 }
 {code}
 
 
 Diffs
 -
 
   hbase-handler/pom.xml 132af43
   hbase-handler/src/java/org/apache/hadoop/hive/hbase/AbstractHBaseKeyFactory.java PRE-CREATION
   hbase-handler/src/java/org/apache/hadoop/hive/hbase/ColumnMappings.java PRE-CREATION
   hbase-handler/src/java/org/apache/hadoop/hive/hbase/CompositeHBaseKeyFactory.java PRE-CREATION
   hbase-handler/src/java/org/apache/hadoop/hive/hbase/DefaultHBaseKeyFactory.java PRE-CREATION
   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseCompositeKey.java 5008f15
   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseKeyFactory.java PRE-CREATION
   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseLazyObjectFactory.java PRE-CREATION
   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseRowSerializer.java PRE-CREATION
   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseScanRange.java PRE-CREATION
   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 5fe35a5
   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDeParameters.java b64590d
   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 4fe1b1b
   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 142bfd8
   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java fc40195
   hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestCompositeKey.java 13c344b
   hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory.java PRE-CREATION
   hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory2.java PRE-CREATION
   hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java 7c4fc9f
   hbase-handler/src/test/queries/positive/hbase_custom_key.q PRE-CREATION
   hbase-handler/src/test/queries/positive/hbase_custom_key2.q PRE-CREATION
   hbase-handler/src/test/results/positive/hbase_custom_key.q.out PRE-CREATION
   hbase-handler/src/test/results/positive/hbase_custom_key2.q.out PRE-CREATION
   itests/util/pom.xml e9720df
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 113227d
   ql/src/java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java d39ee2e
   ql/src/java/org/apache/hadoop/hive/ql/index/IndexSearchCondition.java 5f1329c
   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 4921966
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java 293b74e
   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java 2a7fdf9
   ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStoragePredicateHandler.java 9f35575
   ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java e50026b
   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java ecb82d7
   ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java c0a8269
   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 5f32f2d

[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue


 [ 
https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7043:
-

Attachment: HIVE-7043.4.patch

 When using the tez session pool via hive, once sessions time out, all queries 
 go to the default queue
 -

 Key: HIVE-7043
 URL: https://issues.apache.org/jira/browse/HIVE-7043
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: 0.14.0

 Attachments: HIVE-7043.2.patch, HIVE-7043.3.patch, HIVE-7043.4.patch


 When using a tez session pool to run multiple queries, once the sessions time 
 out, we always end up using the default queue to launch queries. The load 
 balancing doesn't work in this case.
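
 A hypothetical model of the behavior the issue expects, assuming (not
 confirmed here) that an expired session ends up recreated without its queue
 setting: in this SessionPool, an expired session is reopened on the queue it
 was originally assigned, so round-robin spreading across queues survives
 timeouts. PooledSession and SessionPool are illustrative names, not Hive's
 Tez session classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pooled session; not Hive's Tez session class. */
final class PooledSession {
    final String queueName; // assigned once, when the pool is built
    private boolean open = true;
    private int openCount = 1;

    PooledSession(String queueName) {
        this.queueName = queueName;
    }

    boolean isOpen() {
        return open;
    }

    void expire() {
        open = false; // e.g. the session timed out while idle
    }

    void reopen() {
        open = true; // reopened on the same queue it was created for
        openCount++;
    }

    int openCount() {
        return openCount;
    }
}

/** Round-robin pool that reopens expired sessions on their original queue. */
final class SessionPool {
    private final List<PooledSession> sessions = new ArrayList<>();
    private int next = 0;

    SessionPool(List<String> queues) {
        for (String queue : queues) {
            sessions.add(new PooledSession(queue));
        }
    }

    PooledSession acquire() {
        PooledSession session = sessions.get(next);
        next = (next + 1) % sessions.size();
        if (!session.isOpen()) {
            // Recreating the session from default settings would send the query to
            // the default queue; reusing the session's own queue keeps the spread.
            session.reopen();
        }
        return session;
    }
}
```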




[jira] [Commented] (HIVE-6846) allow safe set commands with sql standard authorization


[ 
https://issues.apache.org/jira/browse/HIVE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995446#comment-13995446
 ] 

Thejas M Nair commented on HIVE-6846:
-

This is the default list of safe set commands that this patch allows:

hive.exec.reducers.bytes.per.reducer
hive.exec.reducers.max
hive.map.aggr
hive.map.aggr.hash.percentmemory
hive.map.aggr.hash.force.flush.memory.threshold
hive.map.aggr.hash.min.reduction
hive.groupby.skewindata
hive.optimize.multigroupby.common.distincts
hive.optimize.index.groupby
hive.optimize.ppd
hive.optimize.ppd.storage
hive.ppd.recognizetransivity
hive.optimize.groupby
hive.optimize.sort.dynamic.partition
hive.optimize.skewjoin.compiletime
hive.optimize.union.remove
hive.multigroupby.singlereducer
hive.map.groupby.sorted
hive.map.groupby.sorted.testmode
hive.optimize.skewjoin
hive.mapred.mode
hive.enforce.bucketmapjoin
hive.exec.compress.output
hive.exec.compress.intermediate
hive.exec.parallel
hive.exec.parallel.thread.number
hive.exec.rowoffset
hive.merge.mapfiles
hive.merge.mapredfiles
hive.merge.tezfiles
hive.ignore.mapjoin.hint
hive.auto.convert.join
hive.auto.convert.join.noconditionaltask
hive.auto.convert.join.noconditionaltask.size
hive.auto.convert.join.use.nonstaged
hive.enforce.bucketing
hive.enforce.sorting
hive.enforce.sortmergebucketmapjoin
hive.auto.convert.sortmerge.join
hive.execution.engine
hive.vectorized.execution.enabled
hive.mapjoin.optimized.keys
hive.mapjoin.lazy.hashtable
hive.exec.check.crossproducts
hive.compat
hive.exec.dynamic.partition.mode
mapred.reduce.tasks
mapred.output.compression.codec
mapred.map.output.compression.codec
mapreduce.job.reduce.slowstart.completedmaps
mapreduce.job.queuename
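
The whitelist idea can be sketched as a set lookup on the parameter name of a
"set name=value" command. SafeSetCommands is a hypothetical class that holds
only a few names from the list above; it is not the patch's implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical whitelist check; the names are a subset of the list above. */
final class SafeSetCommands {
    private static final Set<String> SAFE = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "hive.exec.reducers.max",
            "hive.exec.parallel",
            "hive.execution.engine",
            "mapred.reduce.tasks",
            "mapreduce.job.queuename")));

    private SafeSetCommands() {}

    /** Accepts "set name=value" only when name is whitelisted. */
    static boolean isAllowed(String setCommand) {
        String body = setCommand.trim();
        if (body.regionMatches(true, 0, "set ", 0, 4)) {
            body = body.substring(4).trim(); // drop the SET keyword, case-insensitively
        }
        int eq = body.indexOf('=');
        String name = (eq < 0 ? body : body.substring(0, eq)).trim();
        return SAFE.contains(name);
    }
}
```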


 allow safe set commands with sql standard authorization
 ---

 Key: HIVE-6846
 URL: https://issues.apache.org/jira/browse/HIVE-6846
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Affects Versions: 0.13.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.13.0

 Attachments: HIVE-6846.1.patch, HIVE-6846.2.patch, HIVE-6846.3.patch


 HIVE-6827 disables all set commands when SQL standard authorization is turned 
 on, but not all set commands are unsafe. We should allow safe set commands.




[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead

2014-05-13 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996633#comment-13996633
 ] 

Sergey Shelukhin commented on HIVE-6430:


Will commit this evening.

 MapJoin hash table has large memory overhead
 

 Key: HIVE-6430
 URL: https://issues.apache.org/jira/browse/HIVE-6430
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, 
 HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, 
 HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, 
 HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.11.patch, 
 HIVE-6430.12.patch, HIVE-6430.12.patch, HIVE-6430.13.patch, HIVE-6430.patch


 Right now, in some queries, I see that storing e.g. 4 ints (2 for the key and
 2 for the row) can take several hundred bytes, which is ridiculous. I am
 reducing the size of MJKey and MJRowContainer in other JIRAs, but in general we
 don't need a Java hash table there. We can use a primitive-friendly hash table,
 like the one from HPPC (Apache-licensed), or some variation of it, to map
 primitive keys to a single row-storage structure without an object per row
 (similar to vectorization).
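
 A minimal sketch of the primitive-friendly structure described above, written
 from scratch rather than using HPPC: an open-addressing map from long keys to
 int offsets into some separate row storage, held in two primitive arrays so no
 per-entry objects or boxing are needed. LongIntOpenHashMap is a hypothetical
 name, and linear probing with a load factor of at most 0.5 is an assumed
 design choice.

```java
import java.util.Arrays;

/** Hypothetical open-addressing map from long keys to non-negative int row offsets. */
final class LongIntOpenHashMap {
    static final int MISSING = -1; // marks an empty slot; valid offsets are >= 0

    private long[] keys;
    private int[] values;
    private int size;

    LongIntOpenHashMap(int expectedSize) {
        int capacity = 16;
        while (capacity < expectedSize * 2) {
            capacity <<= 1; // power of two, so "& mask" replaces "% capacity"
        }
        allocate(capacity);
    }

    private void allocate(int capacity) {
        keys = new long[capacity];
        values = new int[capacity];
        Arrays.fill(values, MISSING);
    }

    private static int slot(long key, int mask) {
        long h = key * 0x9E3779B97F4A7C15L; // spread sequential keys across slots
        return (int) (h ^ (h >>> 32)) & mask;
    }

    void put(long key, int rowOffset) {
        if (rowOffset < 0) {
            throw new IllegalArgumentException("row offsets must be non-negative");
        }
        if ((size + 1) * 2 > keys.length) {
            grow(); // keep the load factor at or below 0.5
        }
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (values[i] != MISSING) {
            if (keys[i] == key) {
                values[i] = rowOffset; // overwrite the existing entry
                return;
            }
            i = (i + 1) & mask; // linear probing
        }
        keys[i] = key;
        values[i] = rowOffset;
        size++;
    }

    /** Returns the row offset for key, or MISSING if absent. */
    int get(long key) {
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (values[i] != MISSING) {
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1) & mask;
        }
        return MISSING;
    }

    int size() {
        return size;
    }

    private void grow() {
        long[] oldKeys = keys;
        int[] oldValues = values;
        allocate(oldKeys.length * 2);
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldValues[i] != MISSING) {
                put(oldKeys[i], oldValues[i]);
            }
        }
    }
}
```

 Linear probing keeps lookups to sequential array reads; the trade-off is that
 deletion (not shown) would need tombstones or backward-shift handling.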




[jira] [Updated] (HIVE-6411) Support more generic way of using composite key for HBaseHandler

2014-05-13 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-6411:
--

   Resolution: Fixed
Fix Version/s: 0.14.0
 Release Note: The new feature needs to be documented at Hive-HBase 
integration page.
   Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks to Navis and Swarnim for working on the patch.

 Support more generic way of using composite key for HBaseHandler
 

 Key: HIVE-6411
 URL: https://issues.apache.org/jira/browse/HIVE-6411
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-6411.1.patch.txt, HIVE-6411.10.patch.txt, 
 HIVE-6411.11.patch.txt, HIVE-6411.2.patch.txt, HIVE-6411.3.patch.txt, 
 HIVE-6411.4.patch.txt, HIVE-6411.5.patch.txt, HIVE-6411.6.patch.txt, 
 HIVE-6411.7.patch.txt, HIVE-6411.8.patch.txt, HIVE-6411.9.patch.txt


 HIVE-2599 introduced the use of a custom object for the row key, but it forces
 key objects to extend HBaseCompositeKey, which is itself an extension of
 LazyStruct. If the user provides a proper Object and ObjectInspector (OI), we
 can replace the internal key and keyOI with those.
 The initial implementation is based on a factory interface.
 {code}
 public interface HBaseKeyFactory {
   void init(SerDeParameters parameters, Properties properties) throws SerDeException;
   ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException;
   LazyObjectBase createObject(ObjectInspector inspector) throws SerDeException;
 }
 {code}




[jira] [Commented] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue


[ 
https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996038#comment-13996038
 ] 

Hive QA commented on HIVE-7043:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12644516/HIVE-7043.3.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/186/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/186/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12644516

 When using the tez session pool via hive, once sessions time out, all queries 
 go to the default queue
 -

 Key: HIVE-7043
 URL: https://issues.apache.org/jira/browse/HIVE-7043
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: 0.14.0

 Attachments: HIVE-7043.2.patch, HIVE-7043.3.patch


 When using a tez session pool to run multiple queries, once the sessions time 
 out, we always end up using the default queue to launch queries. The load 
 balancing doesn't work in this case.




[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue


 [ 
https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7043:
-

Attachment: HIVE-7043.2.patch

 When using the tez session pool via hive, once sessions time out, all queries 
 go to the default queue
 -

 Key: HIVE-7043
 URL: https://issues.apache.org/jira/browse/HIVE-7043
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: 0.14.0

 Attachments: HIVE-7043.2.patch


 When using a tez session pool to run multiple queries, once the sessions time 
 out, we always end up using the default queue to launch queries. The load 
 balancing doesn't work in this case.




[jira] [Updated] (HIVE-6965) Transaction manager should use RDBMS time instead of machine time

2014-05-13 Thread Alan Gates (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-6965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-6965:
-

Status: Patch Available  (was: Open)

This patch changes the code to ask the database for the time rather than 
calling currentTimeMillis().

 Transaction manager should use RDBMS time instead of machine time
 -

 Key: HIVE-6965
 URL: https://issues.apache.org/jira/browse/HIVE-6965
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-6965.patch


 Currently, TxnHandler and CompactionTxnHandler use System.currentTimeMillis()
 whenever they need the current time (for example, when heartbeating
 transactions). When there are multiple Thrift metastore services, or users
 are running an embedded metastore, the clocks on different machines can
 disagree, which will lead to issues. We should instead use the time from the
 RDBMS, which is guaranteed to be the same for all users.




Re: showing column stats

2014-05-13 Thread Prasanth Jayachandran

Created the JIRAs:
https://issues.apache.org/jira/browse/HIVE-7050
https://issues.apache.org/jira/browse/HIVE-7051

Thanks
Prasanth Jayachandran

On May 12, 2014, at 6:52 PM, Prasanth Jayachandran
pjayachand...@hortonworks.com wrote:

I have a basic patch that prints table-level column stats. I can put up the
patch for it today or tomorrow. But to display partition-level column stats,
we need to extend the “describe” statement to support column names:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DescribePartition
As you can see there, DESCRIBE PARTITION does not accept column names.

I can create JIRAs for the following tasks
1) Showing column stats in describe table
2) Showing column stats in describe partition

If you would like to take up 2) please feel free to do so.

Thanks
Prasanth Jayachandran

On May 12, 2014, at 5:45 PM, Xuefu Zhang xzh...@cloudera.com wrote:

Hi all,

I'm wondering if there is a simpler way to show column stats, such as a
command in the Hive CLI, than writing a Thrift client that calls the Thrift
API. I have tried desc extended as well as explain select, but neither of them
shows column stats.

Thanks,
Xuefu


[jira] [Updated] (HIVE-6187) Add test to verify that DESCRIBE TABLE works with quoted table names

2014-05-13 Thread Carl Steinbach (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-6187:
-

Status: Patch Available  (was: Open)

 Add test to verify that DESCRIBE TABLE works with quoted table names
 

 Key: HIVE-6187
 URL: https://issues.apache.org/jira/browse/HIVE-6187
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Andy Mok
 Attachments: HIVE-6187.1.patch


 Backticks around table names that are special keywords, such as items, allow
 us to create, drop, and alter the table. For example:
 {code:sql}
 CREATE TABLE foo.`items` (bar INT);
 DROP TABLE foo.`items`;
 ALTER TABLE `items` RENAME TO `items_`;
 {code}
 However, we cannot call
 {code:sql}
 DESCRIBE foo.`items`;
 DESCRIBE `items`;
 {code}
 The DESCRIBE statement does not accept backticks around table names. The
 error returned is:
 {code:sql}
 FAILED: SemanticException [Error 10001]: Table not found `items`
 {code} 
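
 The error text, which still contains the backticks, suggests the quoted name
 reached the table lookup without being unescaped. As an illustration only, a
 hypothetical QualifiedNames.parse helper can split db.table on dots outside
 backticks and drop the quote characters before the lookup.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits "db.table" names, honoring backtick quoting. */
final class QualifiedNames {
    private QualifiedNames() {}

    /** Splits on '.' outside backticks and strips the backticks from each part. */
    static List<String> parse(String name) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (char c : name.toCharArray()) {
            if (c == '`') {
                quoted = !quoted; // toggle quoting; the backtick itself is dropped
            } else if (c == '.' && !quoted) {
                parts.add(current.toString()); // unquoted dot separates db and table
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (quoted) {
            throw new IllegalArgumentException("Unbalanced backtick in " + name);
        }
        parts.add(current.toString());
        return parts;
    }
}
```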




[jira] [Updated] (HIVE-7042) Fix stats_partscan_1_23.q and orc_createas1.q for hadoop-2

2014-05-13 Thread Prasanth J (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7042:
-

Attachment: HIVE-7042.1.patch.txt

Not sure why this patch was not picked up by Hive QA for days. Re-uploading the
patch.

 Fix stats_partscan_1_23.q and orc_createas1.q for hadoop-2
 --

 Key: HIVE-7042
 URL: https://issues.apache.org/jira/browse/HIVE-7042
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7042.1.patch, HIVE-7042.1.patch.txt


 stats_partscan_1_23.q and orc_createas1.q should use HiveInputFormat as 
 opposed to CombineHiveInputFormat. RCFile uses DefaultCodec for compression 
 (uses DEFLATE) which is not splittable. Hence using CombineHiveIF will yield 
 different results for these tests. ORC should use HiveIF to generate ORC 
 splits.




Re: showing column stats

2014-05-13 Thread Xuefu Zhang

Thanks, Prasanth. I tried your patch in HIVE-7050, and it helped me
demonstrate another problem related to stats, HIVE-7053.

I can review your patches. Thanks again!

--Xuefu

On Mon, May 12, 2014 at 6:57 PM, Prasanth Jayachandran
pjayachand...@hortonworks.com wrote:

Created the JIRAs:
https://issues.apache.org/jira/browse/HIVE-7050
https://issues.apache.org/jira/browse/HIVE-7051

Thanks
Prasanth Jayachandran

On May 12, 2014, at 6:52 PM, Prasanth Jayachandran
pjayachand...@hortonworks.com wrote:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DescribePartition.
As you can see there, DESCRIBE PARTITION does not accept column names.

I can create JIRAs for the following tasks
1) Showing column stats in describe table
2) Showing column stats in describe partition

If you would like to take up 2) please feel free to do so.

Thanks
Prasanth Jayachandran

On May 12, 2014, at 5:45 PM, Xuefu Zhang xzh...@cloudera.com wrote:

Hi all,

I'm wondering if there is a simpler way to show column stats than
writing a thrift client that calls the thrift API, such as a command in the
Hive CLI. I have tried desc extended as well as explain select, but none of
them shows column stats.

Thanks,
Xuefu

[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue


 [ 
https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7043:
-

Attachment: HIVE-7043.1.patch

 When using the tez session pool via hive, once sessions time out, all queries 
 go to the default queue
 -

 Key: HIVE-7043
 URL: https://issues.apache.org/jira/browse/HIVE-7043
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: 0.14.0

 Attachments: HIVE-7043.1.patch


 When using a tez session pool to run multiple queries, once the sessions time 
 out, we always end up using the default queue to launch queries. The load 
 balancing doesn't work in this case.
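One way to picture the expected behavior is that a pooled session that has timed out gets reopened on the queue it was originally bound to, not on the default queue. The following is a hypothetical Java model. SessionPool, Session and the open flag are illustrative names and are not Hive's Tez session pool API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a round-robin session pool in which every session
// is bound to one queue. An expired session is replaced by a new session on
// the SAME queue, so load stays spread across queues after timeouts.
final class SessionPool {
    static final class Session {
        final String queue;
        boolean open = true; // set to false to simulate a timed-out session

        Session(String queue) {
            this.queue = queue;
        }
    }

    private final Deque<Session> idle = new ArrayDeque<>();

    SessionPool(String... queues) {
        for (String q : queues) {
            idle.addLast(new Session(q));
        }
    }

    /** Takes the next session, reopening it on its original queue if it expired. */
    Session acquire() {
        Session s = idle.pollFirst();
        if (s == null) {
            throw new IllegalStateException("no idle session available");
        }
        if (!s.open) {
            s = new Session(s.queue); // keep the queue instead of falling back to "default"
        }
        return s;
    }

    /** Returns a session to the back of the pool (round-robin order). */
    void release(Session s) {
        idle.addLast(s);
    }
}
```

The key design choice is that the queue belongs to the pool slot rather than to the live session, so recreating a session cannot lose it.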




[jira] [Updated] (HIVE-7033) grant statements should check if the role exists


 [ 
https://issues.apache.org/jira/browse/HIVE-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-7033:


Attachment: HIVE-7033.2.patch

HIVE-7033.2.patch - updating comment in .q file

 grant statements should check if the role exists
 

 Key: HIVE-7033
 URL: https://issues.apache.org/jira/browse/HIVE-7033
 Project: Hive
  Issue Type: Bug
  Components: Authorization, SQLStandardAuthorization
Affects Versions: 0.13.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-7033.1.patch, HIVE-7033.2.patch


 The following grant statement, which grants to a role that does not exist, 
 succeeds, but it should result in an error.
  grant all on t1 to role nosuchrole;
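Here is a minimal sketch of the missing check. It assumes a hypothetical in-memory role registry. RoleGrantValidator is illustrative and is not the metastore or SQLStandardAuthorization API. The grant is rejected before anything is recorded if the target role was never created.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not Hive's metastore API: a grant to a role is
// rejected unless the role exists.
final class RoleGrantValidator {
    private final Set<String> roles = new HashSet<>();
    private final List<String> grants = new ArrayList<>();

    void createRole(String role) {
        roles.add(role);
    }

    /** Records the grant, or fails if the target role does not exist. */
    void grantToRole(String privilege, String object, String role) {
        if (!roles.contains(role)) {
            // Validate first, so a failed grant leaves no partial state behind.
            throw new IllegalArgumentException("Role " + role + " does not exist");
        }
        grants.add(privilege + " on " + object + " to role " + role);
    }

    List<String> grants() {
        return grants;
    }
}
```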




[jira] [Updated] (HIVE-7055) cofig not propagating for PTFOperator


 [ 
https://issues.apache.org/jira/browse/HIVE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7055:
---

Attachment: HIVE-7055.patch

 cofig not propagating for PTFOperator
 -

 Key: HIVE-7055
 URL: https://issues.apache.org/jira/browse/HIVE-7055
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Affects Versions: 0.12.0, 0.13.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7055.patch


 For example, setting hive.join.cache.size has no effect; task nodes always get 
 the default value of 25000.




[jira] [Updated] (HIVE-7041) DoubleWritable/ByteWritable should extend their hadoop counterparts

2014-05-13 Thread Jason Dere (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-7041:
-

Attachment: HIVE-7041.1.patch

Tests didn't run for some reason; re-uploading the patch.

 DoubleWritable/ByteWritable should extend their hadoop counterparts
 ---

 Key: HIVE-7041
 URL: https://issues.apache.org/jira/browse/HIVE-7041
 Project: Hive
  Issue Type: Bug
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7041.1.patch, HIVE-7041.1.patch


 Hive has its own implementations of 
 ByteWritable/DoubleWritable/ShortWritable. We cannot replace usage of these 
 classes, since doing so would break 3rd-party UDFs/SerDes, but we can at least 
 extend the Hadoop versions of these classes where possible to avoid 
 duplicate code.
 When Hive finally moves to version 1.0 we might want to consider removing use 
 of these Hive-specific writables and switching over to using the Hadoop 
 version of these classes.
 ShortWritable didn't exist in Hadoop until 2.x so it looks like we can't do 
 it with this class until 0.20/1.x support is dropped from Hive.




[jira] [Assigned] (HIVE-7056) TestPig_11 fails with Pig 12.1 and earlier


 [ 
https://issues.apache.org/jira/browse/HIVE-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-7056:


Assignee: Eugene Koifman

 TestPig_11 fails with Pig 12.1 and earlier
 --

 Key: HIVE-7056
 URL: https://issues.apache.org/jira/browse/HIVE-7056
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.13.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


[jira] [Updated] (HIVE-7056) TestPig_11 fails with Pig 12.1 and earlier


 [ 
https://issues.apache.org/jira/browse/HIVE-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-7056:
-

Description: 
On trunk, the pig script (http://svn.apache.org/repos/asf/pig/trunk/bin/pig) 
looks for *hcatalog-core-*.jar etc. In Pig 12.1 it looks for 
hcatalog-core-*.jar, which doesn't work with Hive 0.13.

The TestPig_11 job fails with:
{noformat}
2014-05-13 17:47:10,760 [main] ERROR org.apache.pig.PigServer - exception 
during parsing: Error during parsing. Could not resolve 
org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., 
org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Failed to parse: Pig script failed to parse: 
file hcatloadstore.pig, line 19, column 34 pig script failed to validate: 
org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not 
resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., 
org.apache.pig.builtin., org.apache.pig.impl.builtin.]
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:196)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1678)
at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1411)
at org.apache.pig.PigServer.parseAndBuild(PigServer.java:344)
at org.apache.pig.PigServer.executeBatch(PigServer.java:369)
at org.apache.pig.PigServer.executeBatch(PigServer.java:355)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:202)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:478)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: 
file hcatloadstore.pig, line 19, column 34 pig script failed to validate: 
org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not 
resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., 
org.apache.pig.builtin., org.apache.pig.impl.builtin.]
at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1299)
at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:1284)
at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:5158)
at org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGenerator.java:7756)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1669)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
... 16 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: 
Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, 
java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:653)
at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1296)
... 24 more
{noformat}

The key to this is:
{noformat}
ls: 
/private/tmp/hadoop-ekoifman/nm-local-dir/usercache/ekoifman/appcache/application_1400018007772_0045/container_1400018007772_0045_01_02/apache-hive-0.14.0-SNAPSHOT-bin.tar.gz/apache-hive-0.14.0-SNAPSHOT-bin/lib/slf4j-api-*.jar:
 No such file or directory
ls: 
/private/tmp/hadoop-ekoifman/nm-local-dir/usercache/ekoifman/appcache/application_1400018007772_0045/container_1400018007772_0045_01_02/apache-hive-0.14.0-SNAPSHOT-bin.tar.gz/apache-hive-0.14.0-SNAPSHOT-bin/hcatalog/share/hcatalog/hcatalog-core-*.jar:
 No such file or directory
ls: 
/private/tmp/hadoop-ekoifman/nm-local-dir/usercache/ekoifman/appcache/application_1400018007772_0045/container_1400018007772_0045_01_02/apache-hive-0.14.0-SNAPSHOT-bin.tar.gz/apache-hive-0.14.0-SNAPSHOT-bin/hcatalog/share/hcatalog/hcatalog-*.jar:
 No such file or directory
ls:

[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-05-13 Thread Nick Dimiduk (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HIVE-6584:
---

Attachment: HIVE-6584.0.patch

Attaching a preliminary patch, based on the patch attached to HBASE-11137.

In order to test this properly, I need an existing HBase table snapshot. Short 
of exposing this through Hive SQL, how can I write a .q file test for this?

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
 Attachments: HIVE-6584.0.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows an MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.




Re: [VOTE] Apache Hive 0.13.1 Release Candidate 1

2014-05-13 Thread Eugene Koifman

I downloaded the src tarball, built it, and ran the WebHCat e2e tests.
I see 2 failures (which I don't see on trunk).

TestHive_7 fails with
got percentComplete map 100% reduce 0%, expected map 100% reduce 100%

TestHeartbeat_1 fails to even launch the job. This looks like the root cause:

ERROR | 13 May 2014 18:24:00,394 | org.apache.hive.hcatalog.templeton.CatchallExceptionMapper | java.lang.NullPointerException
at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:312)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:479)
at org.apache.hadoop.util.GenericOptionsParser.init(GenericOptionsParser.java:170)
at org.apache.hadoop.util.GenericOptionsParser.init(GenericOptionsParser.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hive.hcatalog.templeton.LauncherDelegator$1.run(LauncherDelegator.java:107)
at org.apache.hive.hcatalog.templeton.LauncherDelegator$1.run(LauncherDelegator.java:103)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hive.hcatalog.templeton.LauncherDelegator.queueAsUser(LauncherDelegator.java:103)
at org.apache.hive.hcatalog.templeton.LauncherDelegator.enqueueController(LauncherDelegator.java:81)
at org.apache.hive.hcatalog.templeton.JarDelegator.run(JarDelegator.java:55)
at org.apache.hive.hcatalog.templeton.Server.mapReduceJar(Server.java:711)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1480)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1411)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1360)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1350)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:538)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:716)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:565)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1360)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:392)
at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:87)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1331)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:477)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1031)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:406)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:965)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
at org.eclipse.jetty.server.Server.handle(Server.java:349)
at

[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead

2014-05-13 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996954#comment-13996954
 ] 

Gopal V commented on HIVE-6430:
---

It seems to break only with the JDK 7 javac.

And only on rebuilds with modifications, never on mvn clean package builds.

 MapJoin hash table has large memory overhead
 

 Key: HIVE-6430
 URL: https://issues.apache.org/jira/browse/HIVE-6430
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, 
 HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, 
 HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, 
 HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.11.patch, 
 HIVE-6430.12.patch, HIVE-6430.12.patch, HIVE-6430.13.patch, HIVE-6430.patch


 Right now, in some queries, I see that storing e.g. 4 ints (2 for the key and 2 
 for the row) can take several hundred bytes, which is ridiculous. I am reducing 
 the size of MJKey and MJRowContainer in other JIRAs, but in general we don't 
 need a Java hash table there. We can either use a primitive-friendly 
 hash table like the one from HPPC (Apache-licensed), or some variation, to map 
 primitive keys to a single row storage structure without an object per row 
 (similar to vectorization).
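The following small Java class illustrates the primitive-friendly approach described above. It is an open-addressing hash map from long keys to int row offsets, stored in parallel primitive arrays. This is an illustrative sketch and is not HPPC's or Hive's implementation. It assumes that the join key has already been packed into a single long (for example, from two int key columns) and that the int value is an offset into a separate row store.

```java
// Illustrative sketch (not HPPC or Hive code): open addressing with linear
// probing over parallel primitive arrays, so no per-entry objects (boxed
// keys, Map.Entry nodes) are allocated.
final class LongIntOpenHashMap {
    private long[] keys;
    private int[] values;
    private boolean[] used;
    private int size;

    LongIntOpenHashMap(int expected) {
        // Power-of-two capacity of at least 2 * expected keeps load <= 0.5.
        int cap = Integer.highestOneBit(Math.max(4, expected * 2) - 1) << 1;
        keys = new long[cap];
        values = new int[cap];
        used = new boolean[cap];
    }

    /** Inserts or overwrites the value for key. */
    void put(long key, int value) {
        if (2 * (size + 1) > keys.length) {
            resize();
        }
        int mask = keys.length - 1;
        int i = slot(key);
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask; // linear probing
        }
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    /** Returns the value for key, or missing if absent. */
    int get(long key, int missing) {
        int mask = keys.length - 1;
        int i = slot(key);
        while (used[i]) {
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1) & mask;
        }
        return missing;
    }

    int size() {
        return size;
    }

    private int slot(long key) {
        long h = key * 0x9E3779B97F4A7C15L; // multiplicative mixing
        return (int) (h ^ (h >>> 32)) & (keys.length - 1);
    }

    private void resize() {
        long[] oldKeys = keys;
        int[] oldValues = values;
        boolean[] oldUsed = used;
        keys = new long[oldKeys.length * 2];
        values = new int[oldKeys.length * 2];
        used = new boolean[oldKeys.length * 2];
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) {
                put(oldKeys[j], oldValues[j]);
            }
        }
    }
}
```

Each entry costs roughly 13 bytes of array storage at full load (8 for the key, 4 for the value and 1 for the flag), doubled by the 0.5 load factor. That is far below the several hundred bytes reported above for object-based entries.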




[jira] [Updated] (HIVE-7055) cofig not propagating for PTFOperator


 [ 
https://issues.apache.org/jira/browse/HIVE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7055:
---

Status: Patch Available  (was: Open)

 cofig not propagating for PTFOperator
 -

 Key: HIVE-7055
 URL: https://issues.apache.org/jira/browse/HIVE-7055
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Affects Versions: 0.13.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7055.patch


 For example, setting hive.join.cache.size has no effect; task nodes always get 
 the default value of 25000.




[jira] [Updated] (HIVE-6290) Add support for hbase filters for composite keys

2014-05-13 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-6290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-6290:
--

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Resolved via HIVE-6411.

 Add support for hbase filters for composite keys
 

 Key: HIVE-6290
 URL: https://issues.apache.org/jira/browse/HIVE-6290
 Project: Hive
  Issue Type: Sub-task
  Components: HBase Handler
Affects Versions: 0.12.0
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni
 Fix For: 0.14.0

 Attachments: HIVE-6290.1.patch.txt, HIVE-6290.2.patch.txt, 
 HIVE-6290.3.patch.txt


 Add support for filters to be provided via the composite key class




[jira] [Commented] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue

2014-05-13 Thread Gunther Hagleitner (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995747#comment-13995747
 ] 

Gunther Hagleitner commented on HIVE-7043:
--

+1

 When using the tez session pool via hive, once sessions time out, all queries 
 go to the default queue
 -

 Key: HIVE-7043
 URL: https://issues.apache.org/jira/browse/HIVE-7043
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: 0.14.0

 Attachments: HIVE-7043.1.patch


 When using a tez session pool to run multiple queries, once the sessions time 
 out, we always end up using the default queue to launch queries. The load 
 balancing doesn't work in this case.




[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead

2014-05-13 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997172#comment-13997172
 ] 

Sergey Shelukhin commented on HIVE-6430:


Hmm... I cannot reproduce this. I tried JDK 6 and 7, with and without a clean 
build, and with modifications. Can you make an addendum patch that fixes it, so 
I can apply it on top?

 MapJoin hash table has large memory overhead
 

 Key: HIVE-6430
 URL: https://issues.apache.org/jira/browse/HIVE-6430
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, 
 HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, 
 HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, 
 HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.11.patch, 
 HIVE-6430.12.patch, HIVE-6430.12.patch, HIVE-6430.13.patch, HIVE-6430.patch


 Right now, in some queries, I see that storing e.g. 4 ints (2 for the key and 2 
 for the row) can take several hundred bytes, which is ridiculous. I am reducing 
 the size of MJKey and MJRowContainer in other JIRAs, but in general we don't 
 need a Java hash table there. We can either use a primitive-friendly 
 hash table like the one from HPPC (Apache-licensed), or some variation, to map 
 primitive keys to a single row storage structure without an object per row 
 (similar to vectorization).




[jira] [Updated] (HIVE-7054) Support ELT UDF in vectorized mode

2014-05-13 Thread Deepesh Khandelwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepesh Khandelwal updated HIVE-7054:
-

Attachment: HIVE-7054.patch

Here is the review board entry:
https://reviews.apache.org/r/21416/
Please review.

 Support ELT UDF in vectorized mode
 --

 Key: HIVE-7054
 URL: https://issues.apache.org/jira/browse/HIVE-7054
 Project: Hive
  Issue Type: New Feature
  Components: Vectorization
Affects Versions: 0.14.0
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal
 Fix For: 0.14.0

 Attachments: HIVE-7054.patch


 Implement support for the ELT UDF in vectorized execution mode.




[jira] [Created] (HIVE-7046) Propagate addition of new columns to partition schema

2014-05-13 Thread Mariano Dominguez (JIRA)

Mariano Dominguez created HIVE-7046:
---

 Summary: Propagate addition of new columns to partition schema
 Key: HIVE-7046
 URL: https://issues.apache.org/jira/browse/HIVE-7046
 Project: Hive
  Issue Type: Improvement
  Components: Database/Schema
Affects Versions: 0.12.0
Reporter: Mariano Dominguez


Hive reads data according to the partition schema, not the table schema 
(because of HIVE-3833). ALTER TABLE only updates the table schema, and the 
changes are not propagated to partitions. Thus, the schema of a partition will 
differ from that of the table after altering the table schema; this is done to 
preserve the ability to read existing data, particularly when using binary 
formats such as RCFile. Binary formats do not allow changing the type of a 
field because of the way serialization works; a field serialized as a string 
will be displayed incorrectly if read as an integer.

Unfortunately, as a side effect, this behavior limits the ability to add new 
columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible 
workaround is to recreate the partitions, but this process could be 
unnecessarily cumbersome if the number of partitions is high. New columns 
should be propagated to existing partitions automatically instead.





[jira] [Updated] (HIVE-7057) webhcat e2e deployment scripts don't have x bit set


 [ 
https://issues.apache.org/jira/browse/HIVE-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-7057:
-

Attachment: HIVE-7057.patch

@Thejas, could you review this? When checking in, please run chmod u+x on all 
.sh files; patch files can't capture the permission change.

 webhcat e2e deployment scripts don't have x bit set
 ---

 Key: HIVE-7057
 URL: https://issues.apache.org/jira/browse/HIVE-7057
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-7057.patch


 Also, update env.sh to use the latest Pig release.
 NO PRECOMMIT TESTS




[jira] [Updated] (HIVE-7052) Optimize split calculation time

2014-05-13 Thread Rajesh Balamohan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-7052:
---

Attachment: HIVE-7052-profiler-2.png
HIVE-7052-profiler-1.png

 Optimize split calculation time
 ---

 Key: HIVE-7052
 URL: https://issues.apache.org/jira/browse/HIVE-7052
 Project: Hive
  Issue Type: Bug
 Environment: hive + tez
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: performance
 Attachments: HIVE-7052-profiler-1.png, HIVE-7052-profiler-2.png


 When running a TPC-DS query (query_27), a significant amount of time was spent 
 in split computation on a 200 GB dataset (ORC format).
 Profiling revealed that:
 1. A lot of time was spent in Configuration's substituteVars (regex) in the 
 HiveInputFormat.getSplits() method.
 2. A FileSystem was created repeatedly in OrcInputFormat.generateSplitsInfo().
 I will attach the profiler snapshots soon.
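The second finding is a classic memoization case. The sketch below assumes a hypothetical PerFsCache that builds an expensive per-filesystem object once per scheme and authority, rather than once per input path. It is not the OrcInputFormat code. Note also that Hadoop's FileSystem.get already keeps its own cache, so the actual fix may look different.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: memoize an expensive per-filesystem object by
// scheme + authority, so split generation over many files on the same
// namenode builds it only once.
final class PerFsCache<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<URI, T> factory;

    PerFsCache(Function<URI, T> factory) {
        this.factory = factory;
    }

    T get(URI path) {
        // Files on the same filesystem share scheme and authority.
        String key = path.getScheme() + "://" + path.getAuthority();
        return cache.computeIfAbsent(key, k -> factory.apply(path));
    }
}
```

The same idea applies to the first finding: hoisting a Configuration lookup out of the per-path loop, so that variable substitution runs once instead of once per file.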




[jira] [Updated] (HIVE-5268) HiveServer2 accumulates orphaned OperationHandle objects when a client fails while executing query

2014-05-13 Thread Navis (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5268:


Fix Version/s: (was: 0.13.0)
   0.14.0

 HiveServer2 accumulates orphaned OperationHandle objects when a client fails 
 while executing query
 --

 Key: HIVE-5268
 URL: https://issues.apache.org/jira/browse/HIVE-5268
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Vaibhav Gumashta
Assignee: Thiruvel Thirumoolan
 Fix For: 0.14.0

 Attachments: HIVE-5268_prototype.patch


 When queries are executed against HiveServer2, an OperationHandle object 
 is stored in the OperationManager.handleToOperation HashMap. Currently it is 
 the duty of the JDBC client to explicitly close the statement to clean up the 
 entry in the map. But if the client fails to close the statement, the 
 OperationHandle object is never cleaned up and accumulates in the server.
 This can potentially cause an OOM on the server over time. It can also be 
 used as a loophole by a malicious client to bring down the Hive server.
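One way to bound the growth described above is to expire operations that no client has touched for a while. The following is a minimal sketch under that assumption: `OperationRegistrySketch`, `touch` and `expireIdle` are hypothetical names, and the idle-timeout policy is an assumption, not something this issue specifies. A real fix would also have to release each operation's resources (results, temp files), which the sketch omits.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical handle registry with idle expiry; not HiveServer2's OperationManager.
public class OperationRegistrySketch {
    static final class Entry {
        final String statement;
        volatile long lastAccessMillis;

        Entry(String statement, long now) {
            this.statement = statement;
            this.lastAccessMillis = now;
        }
    }

    private final Map<UUID, Entry> handleToOperation = new ConcurrentHashMap<>();
    private final long idleTimeoutMillis;

    OperationRegistrySketch(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    // Registers a new operation and returns its handle.
    UUID open(String statement, long now) {
        UUID handle = UUID.randomUUID();
        handleToOperation.put(handle, new Entry(statement, now));
        return handle;
    }

    // Called on every client request that uses the handle; refreshes its idle clock.
    boolean touch(UUID handle, long now) {
        Entry e = handleToOperation.get(handle);
        if (e == null) {
            return false; // unknown or already expired
        }
        e.lastAccessMillis = now;
        return true;
    }

    // Explicit close by a well-behaved client.
    boolean close(UUID handle) {
        return handleToOperation.remove(handle) != null;
    }

    // Periodic sweep: removes operations idle for longer than the timeout.
    int expireIdle(long now) {
        int removed = 0;
        for (Map.Entry<UUID, Entry> e : handleToOperation.entrySet()) {
            boolean idle = now - e.getValue().lastAccessMillis > idleTimeoutMillis;
            // remove(key, value) only succeeds if the handle still maps to this entry
            if (idle && handleToOperation.remove(e.getKey(), e.getValue())) {
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return handleToOperation.size();
    }
}
```

In a server, `expireIdle(System.currentTimeMillis())` would typically be run periodically, for example from a `ScheduledExecutorService`; passing the clock in explicitly just keeps the sketch deterministic.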




[jira] [Commented] (HIVE-5733) Publish hive-exec artifact without all the dependencies

2014-05-13 Thread Amareshwari Sriramadasu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997276#comment-13997276
 ] 

Amareshwari Sriramadasu commented on HIVE-5733:
---

+1, this is much needed.
I agree it has become difficult to depend on the hive-exec jar, because the ql 
module shades all the dependencies.

I will try to put up a patch.

 Publish hive-exec artifact without all the dependencies
 ---

 Key: HIVE-5733
 URL: https://issues.apache.org/jira/browse/HIVE-5733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Jarek Jarcec Cecho

 Currently the artifact {{hive-exec}} that is available in 
 [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar]
  shades all the dependencies (= the jar contains all of Hive's 
 dependencies). As other projects that depend on Hive might use slightly 
 different versions of those dependencies, it can easily happen that Hive's 
 shaded version is used instead, which leads to very time-consuming 
 debugging of what is happening (for example SQOOP-1198).
 Would it be feasible to publish a {{hive-exec}} jar that is built without 
 shading any dependency? For example, 
 [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar]
  has a classifier nodeps that represents the artifact without any 
 dependencies.
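When debugging this kind of conflict, it helps to ask the JVM which jar a class was actually loaded from and whether other copies exist on the classpath. A small diagnostic sketch using only standard JDK calls (`ProtectionDomain.getCodeSource()` and `ClassLoader.getResources()`); `WhichJar` is a hypothetical helper name, not part of Hive or Sqoop.

```java
import java.io.IOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

// Diagnostic helper: which jar(s) provide a given class on the current classpath?
public class WhichJar {
    // Location the JVM actually loaded the class from; null for JDK bootstrap classes.
    static URL locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return src == null ? null : src.getLocation();
    }

    // Every classpath entry containing a copy of the class file, as the loader reports them.
    static List<URL> allCopiesOf(String className, ClassLoader loader) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws Exception {
        ClassLoader loader = WhichJar.class.getClassLoader();
        for (String name : args) {
            System.out.println(name + " loaded from " + locationOf(Class.forName(name)));
            for (URL copy : allCopiesOf(name, loader)) {
                System.out.println("  copy at " + copy);
            }
        }
    }
}
```

For example, running it with a dependency class name such as `com.google.common.collect.ImmutableList` on the failing classpath would print more than one "copy at" line if the class is present both inside a shaded jar and as a separate dependency; the class name here is only an illustration.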




[jira] [Commented] (HIVE-3276) optimize union sub-queries

2014-05-13 Thread Lefty Leverenz (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996244#comment-13996244
 ] 

Lefty Leverenz commented on HIVE-3276:
--

The configuration parameters are now documented in the wiki:

* [hive.optimize.union.remove 
|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.optimize.union.remove]
* [hive.mapred.supports.subdirectories 
|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.mapred.supports.subdirectories]

 optimize union sub-queries
 --

 Key: HIVE-3276
 URL: https://issues.apache.org/jira/browse/HIVE-3276
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.10.0

 Attachments: HIVE-3276.1.patch, hive.3276.10.patch, 
 hive.3276.11.patch, hive.3276.12.patch, hive.3276.13.patch, 
 hive.3276.14.patch, hive.3276.2.patch, hive.3276.3.patch, hive.3276.4.patch, 
 hive.3276.5.patch, hive.3276.6.patch, hive.3276.7.patch, hive.3276.8.patch, 
 hive.3276.9.patch


 It might be a good idea to optimize simple union queries containing 
 map-reduce jobs in at least one of the sub-queries.
 For example, a query like:
 insert overwrite table T1 partition P1
 select * from 
 (
   subq1
 union all
   subq2
 ) u;
 today creates 3 map-reduce jobs, one for subq1, another for subq2 and 
 the final one for the union. 
 It might be a good idea to optimize this. Instead of creating the union 
 task, it might be simpler to create a move task (or something like a move
 task), where the outputs of the two sub-queries will be moved to the final 
 directory. This can easily extend to more than 2 sub-queries in the union.
 This is very useful if there is a select * followed by filesink after the
 union. This can be independently useful, and also be used to optimize the
 skewed joins -- 
 https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization.
 If there is a select or filter between the union and the filesink, the 
 select and the filter can be moved before the union, and the follow-up job 
 can still be removed.
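The proposed move task replaces the final union job with plain renames of each sub-query's output into the target directory. A minimal local-filesystem sketch of that idea using `java.nio.file`; `UnionMoveSketch`, the branch-index file prefix and the flat directory layout are assumptions for illustration, not Hive's MoveTask or the layout the committed patch uses.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical "move task" for a UNION ALL whose branches have already
// written their output; not Hive's MoveTask.
public class UnionMoveSketch {
    // Moves every file written by each union branch into the final directory.
    // Names get a branch-index prefix so same-named part files don't collide.
    static int moveBranchOutputs(List<Path> branchDirs, Path finalDir) throws IOException {
        Files.createDirectories(finalDir);
        int moved = 0;
        for (int i = 0; i < branchDirs.size(); i++) {
            // List first, then move, so the directory isn't modified mid-iteration.
            List<Path> files = new ArrayList<>();
            try (DirectoryStream<Path> ds = Files.newDirectoryStream(branchDirs.get(i))) {
                for (Path p : ds) {
                    files.add(p);
                }
            }
            for (Path src : files) {
                Path dst = finalDir.resolve(i + "_" + src.getFileName());
                Files.move(src, dst); // a rename on the same file system; no data is rewritten
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("union");
        Path subq1 = Files.createDirectories(root.resolve("subq1"));
        Path subq2 = Files.createDirectories(root.resolve("subq2"));
        Files.write(subq1.resolve("000000_0"), "rows from subq1\n".getBytes());
        Files.write(subq2.resolve("000000_0"), "rows from subq2\n".getBytes());
        int n = moveBranchOutputs(Arrays.asList(subq1, subq2), root.resolve("T1"));
        System.out.println(n + " files moved into " + root.resolve("T1"));
    }
}
```

The cost is one rename per output file instead of a third job that rereads and rewrites all rows, and the loop extends naturally to more than two branches.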




[jira] [Updated] (HIVE-7056) TestPig_11 fails with Pig 12.1 and earlier


 [ 
https://issues.apache.org/jira/browse/HIVE-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-7056:
-

Assignee: (was: Eugene Koifman)

 TestPig_11 fails with Pig 12.1 and earlier
 --

 Key: HIVE-7056
 URL: https://issues.apache.org/jira/browse/HIVE-7056
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.13.0
Reporter: Eugene Koifman

 On trunk, the pig script (http://svn.apache.org/repos/asf/pig/trunk/bin/pig) 
 looks for \*hcatalog-core-\*.jar etc. In Pig 12.1 it looks for 
 hcatalog-core-\*.jar, which doesn't work with Hive 0.13.
 The TestPig_11 job fails with:
 {noformat}
 2014-05-13 17:47:10,760 [main] ERROR org.apache.pig.PigServer - exception 
 during parsing: Error during parsing. Could not resolve 
 org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., 
 org.apache.pig.builtin., org.apache.pig.impl.builtin.]
 Failed to parse: Pig script failed to parse: 
 file hcatloadstore.pig, line 19, column 34 pig script failed to validate: 
 org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not 
 resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., 
 org.apache.pig.builtin., org.apache.pig.impl.builtin.]
   at 
 org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:196)
   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1678)
   at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1411)
   at org.apache.pig.PigServer.parseAndBuild(PigServer.java:344)
   at org.apache.pig.PigServer.executeBatch(PigServer.java:369)
   at org.apache.pig.PigServer.executeBatch(PigServer.java:355)
   at 
 org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
   at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:202)
   at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
   at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
   at org.apache.pig.Main.run(Main.java:478)
   at org.apache.pig.Main.main(Main.java:156)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 Caused by: 
 file hcatloadstore.pig, line 19, column 34 pig script failed to validate: 
 org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not 
 resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., 
 org.apache.pig.builtin., org.apache.pig.impl.builtin.]
   at 
 org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1299)
   at 
 org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:1284)
   at 
 org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:5158)
   at 
 org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGenerator.java:7756)
   at 
 org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1669)
   at 
 org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
   at 
 org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
   at 
 org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
   at 
 org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
   ... 16 more
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: 
 Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, 
 java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
   at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:653)
   at 
 org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1296)
   ... 24 more
 {noformat}
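The failure comes down to the jar-name pattern. The following sketch uses Java's `PathMatcher` glob syntax, which behaves like the shell for these simple patterns, to show why the trunk pattern matches a prefixed jar name and the Pig 12.1 pattern does not. The file name `hive-hcatalog-core-0.13.0.jar` is an assumed example of the Hive 0.13 naming, not taken from the logs here, and `JarGlobSketch` is a hypothetical helper.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Compares the two jar-name globs against an assumed Hive 0.13 jar name.
public class JarGlobSketch {
    // True if the whole file name matches the glob (as a shell would match it).
    static boolean matches(String glob, String fileName) {
        PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return m.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        String jar = "hive-hcatalog-core-0.13.0.jar"; // assumed file name
        System.out.println("hcatalog-core-*.jar  -> " + matches("hcatalog-core-*.jar", jar));
        System.out.println("*hcatalog-core-*.jar -> " + matches("*hcatalog-core-*.jar", jar));
    }
}
```

Because a glob must match the entire name, the leading `*` in the trunk pattern is what allows a prefix such as `hive-` in front of `hcatalog-core-`.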
 The key to this is:
 {noformat}
 ls: 
 /private/tmp/hadoop-ekoifman/nm-local-dir/usercache/ekoifman/appcache/application_1400018007772_0045/container_1400018007772_0045_01_02/apache-hive-0.14.0-SNAPSHOT-bin.tar.gz/apache-hive-0.14.0-SNAPSHOT-bin/lib/slf4j-api-*.jar:
  No such file or directory
 ls: 
 /private/tmp/hadoop-ekoifman/nm-local-dir/usercache/ekoifman/appcache/application_1400018007772_0045/container_1400018007772_0045_01_02/apache-hive-0.14.0-SNAPSHOT-bin.tar.gz/apache-hive-0.14.0-SNAPSHOT-bin/hcatalog/share/hcatalog/hcatalog-core-*.jar:
  No such file or directory
 ls:

[jira] [Commented] (HIVE-2137) JDBC driver doesn't encode string properly.