[jira] [Commented] (HIVE-6415) Disallow transform clause in sql std authorization mode
[ https://issues.apache.org/jira/browse/HIVE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994845#comment-13994845 ]

Lefty Leverenz commented on HIVE-6415:
--------------------------------------

It's documented in the wiki now:
* [Language Manual -- Transform|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform#LanguageManualTransform-SQLStandardBasedAuthorizationDisallowsTRANSFORM]
* [SQL Standard Based Hive Authorization -- Restrictions|https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization#SQLStandardBasedHiveAuthorization-RestrictionsonHiveCommandsandStatements]

Disallow transform clause in sql std authorization mode
-------------------------------------------------------

Key: HIVE-6415
URL: https://issues.apache.org/jira/browse/HIVE-6415
Project: Hive
Issue Type: Sub-task
Components: Authorization
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Fix For: 0.13.0
Attachments: HIVE-6415.1.patch, HIVE-6415.2.patch, HIVE-6415.patch

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6783) Incompatible schema for maps between parquet-hive and parquet-pig
[ https://issues.apache.org/jira/browse/HIVE-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated HIVE-6783:
-----------------------------------
Fix Version/s: 0.13.1

Incompatible schema for maps between parquet-hive and parquet-pig
-----------------------------------------------------------------

Key: HIVE-6783
URL: https://issues.apache.org/jira/browse/HIVE-6783
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 0.13.0
Reporter: Tongjie Chen
Fix For: 0.14.0, 0.13.1
Attachments: HIVE-6783.1.patch.txt, HIVE-6783.2.patch.txt, HIVE-6783.3.patch.txt, HIVE-6783.4.patch.txt

See also the following parquet issue: https://github.com/Parquet/parquet-mr/issues/290

The schema written for maps isn't compatible between hive and pig. This means any files written in one cannot be properly read in the other. More specifically, for the same map column c1, parquet-pig generates the schema:
{noformat}
message pig_schema {
  optional group c1 (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required binary key (UTF8);
      optional binary value;
    }
  }
}
{noformat}
while parquet-hive generates the schema:
{noformat}
message hive_schema {
  optional group c1 (MAP_KEY_VALUE) {
    repeated group map {
      required binary key;
      optional binary value;
    }
  }
}
{noformat}

--
This message was sent by Atlassian JIRA (v6.2#6252)
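The incompatibility above comes down to where the MAP and MAP_KEY_VALUE annotations sit in the schema tree. A minimal sketch in plain Python (not the parquet-mr or Hive libraries; the `node` helper is hypothetical) that models the two trees from the report and shows a reader matching on annotations will see two different types:

```python
# Hypothetical minimal model of the two Parquet schema trees quoted above.
# This illustrates the structural mismatch only; it is not real parquet code.

def node(name, annotation=None, children=()):
    """Represent a Parquet group/field as a (name, annotation, children) tuple."""
    return (name, annotation, tuple(children))

# parquet-pig: MAP on the outer group, MAP_KEY_VALUE on the repeated group,
# and a UTF8 annotation on the key.
pig_c1 = node("c1", "MAP", [
    node("map", "MAP_KEY_VALUE", [node("key", "UTF8"), node("value")]),
])

# parquet-hive: MAP_KEY_VALUE on the outer group, no annotation on the
# repeated group or on the key.
hive_c1 = node("c1", "MAP_KEY_VALUE", [
    node("map", None, [node("key"), node("value")]),
])

# A reader that keys on annotation placement treats these as different types.
print(pig_c1 == hive_c1)  # -> False
```

Since neither layout matches the other, a file written by one serde fails to resolve the map column when read by the other, which is exactly the symptom described in the linked parquet-mr issue.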
[jira] [Commented] (HIVE-7012) Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer
[ https://issues.apache.org/jira/browse/HIVE-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994895#comment-13994895 ]

Navis commented on HIVE-7012:
-----------------------------

[~ashutoshc] Yes, it's intended. In the query ppd2.q
{code}
select a.* from (
  select key, count(value) as cc
  from srcpart a
  where a.ds = '2008-04-08' and a.hr = '11'
  group by key
) a
distribute by a.key sort by a.key, a.cc desc
{code}
cc is a field generated by the GBY operator, so it's semantically wrong to merge the RS for the GBY with the following RS. At the same time, the sort on a.cc is meaningless, so it could be removed in optimization, but not here (maybe in SemanticAnalyzer?).

[~sunrui] Yes, an RS for distinct should be excluded from any dedup process. Could you take this issue? I think you know it better than me.

Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer
----------------------------------------------------------------

Key: HIVE-7012
URL: https://issues.apache.org/jira/browse/HIVE-7012
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.13.0
Reporter: Sun Rui
Assignee: Navis
Attachments: HIVE-7012.1.patch.txt, HIVE-7012.2.patch.txt

With Hive 0.13.0, run the following test case:
{code:sql}
create table src(key bigint, value string);
select count(distinct key) as col0 from src order by col0;
{code}
The following exception will be thrown:
{noformat}
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 9 more
Caused by: java.lang.RuntimeException: Reduce operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:173)
... 14 more
Caused by: java.lang.RuntimeException: cannot find field _col0 from [0:reducesinkkey0]
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:79)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:288)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:166)
... 14 more
{noformat}
This issue is related to HIVE-6455. When hive.optimize.reducededuplication is set to false, this issue goes away.
Logical plan when hive.optimize.reducededuplication=false:
{noformat}
src
  TableScan (TS_0)
    alias: src
    Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
    Select Operator (SEL_1)
      expressions: key (type: bigint)
      outputColumnNames: key
      Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
      Group By Operator (GBY_2)
        aggregations: count(DISTINCT key)
        keys: key (type: bigint)
        mode: hash
        outputColumnNames: _col0, _col1
        Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
        Reduce Output Operator (RS_3)
          DistinctColumnIndices: key
          expressions: _col0 (type: bigint)
          DistributionKeys: 0
          sort order: +
          OutputKeyColumnNames: _col0
          Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
          Group By Operator (GBY_4)
            aggregations: count(DISTINCT
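The "cannot find field _col0" failure is easier to see with a toy model of the two reduce stages. Below is a plain-Python simulation (not Hive internals; the data is made up) of `select count(distinct key) as col0 from src order by col0`: the second shuffle sorts on `_col0`, a column that only exists in the output schema of the first (distinct-aggregation) stage, so the two ReduceSinks cannot be merged into one.

```python
# Conceptual two-stage simulation of: count(distinct key) as col0 ... order by col0.
# Assumed sample data; the point is the schema dependency between the stages.

rows = [{"key": 1}, {"key": 2}, {"key": 2}, {"key": 3}]

# Stage 1 (GBY over distinct keys): shuffle on `key`, then count the distinct keys.
distinct_keys = {r["key"] for r in rows}
stage1_output = [{"_col0": len(distinct_keys)}]

# Stage 2 (ORDER BY col0): shuffle/sort on `_col0`, which only exists in stage 1's
# output schema -- not in the original shuffle key schema [0:reducesinkkey0].
result = sorted(stage1_output, key=lambda r: r["_col0"])
print(result)  # -> [{'_col0': 3}]
```

If the deduplication optimizer collapses the two shuffles, stage 2's reducer is initialized against the original key schema, where `_col0` does not exist, matching the exception in the report.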
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mohammad Kamrul Islam updated HIVE-3159:
----------------------------------------
Status: Open (was: Patch Available)

Update AvroSerde to determine schema of new tables
--------------------------------------------------

Key: HIVE-3159
URL: https://issues.apache.org/jira/browse/HIVE-3159
Project: Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Affects Versions: 0.12.0
Reporter: Jakob Homan
Assignee: Mohammad Kamrul Islam
Attachments: HIVE-3159.10.patch, HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, HIVE-3159.7.patch, HIVE-3159.9.patch, HIVE-3159v1.patch

Currently when writing tables to Avro one must manually provide an Avro schema that matches what is being delivered by Hive. It'd be better to have the serde infer this schema by converting the table's TypeInfo into an appropriate AvroSchema.

--
This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 21138: Support more generic way of using composite key for HBaseHandler
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21138/
---

(Updated May 8, 2014, 3:42 p.m.)

Review request for hive.

Changes
-------
Updating RB with the latest patch.

Repository: hive-git

Description
-------
HIVE-2599 introduced using a custom object for the row key. But it forces key objects to extend HBaseCompositeKey, which is again an extension of LazyStruct. If the user provides a proper Object and OI, we can replace the internal key and keyOI with those. The initial implementation is based on a factory interface.
{code}
public interface HBaseKeyFactory {
  void init(SerDeParameters parameters, Properties properties) throws SerDeException;
  ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException;
  LazyObjectBase createObject(ObjectInspector inspector) throws SerDeException;
}
{code}

Diffs (updated)
-----
hbase-handler/pom.xml 132af43
hbase-handler/src/java/org/apache/hadoop/hive/hbase/AbstractHBaseKeyFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/ColumnMappings.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/CompositeHBaseKeyFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/DefaultHBaseKeyFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseCompositeKey.java 5008f15
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseKeyFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseLazyObjectFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseRowSerializer.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseScanRange.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 5fe35a5
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDeParameters.java b64590d
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 4fe1b1b
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 142bfd8
hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java fc40195
hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestCompositeKey.java 13c344b
hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory.java PRE-CREATION
hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory2.java PRE-CREATION
hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java 7c4fc9f
hbase-handler/src/test/queries/positive/hbase_custom_key.q PRE-CREATION
hbase-handler/src/test/queries/positive/hbase_custom_key2.q PRE-CREATION
hbase-handler/src/test/results/positive/hbase_custom_key.q.out PRE-CREATION
hbase-handler/src/test/results/positive/hbase_custom_key2.q.out PRE-CREATION
itests/util/pom.xml e9720df
ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 113227d
ql/src/java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java d39ee2e
ql/src/java/org/apache/hadoop/hive/ql/index/IndexSearchCondition.java 5f1329c
ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 4921966
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java 293b74e
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java 2a7fdf9
ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStoragePredicateHandler.java 9f35575
ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java e50026b
ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java ecb82d7
ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java c0a8269
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 5f32f2d
serde/src/java/org/apache/hadoop/hive/serde2/BaseStructObjectInspector.java PRE-CREATION
serde/src/java/org/apache/hadoop/hive/serde2/NullStructSerDe.java dba5e33
serde/src/java/org/apache/hadoop/hive/serde2/StructObject.java PRE-CREATION
serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java 1fd6853
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java 10f4c05
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObjectBase.java 3334dff
serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyStruct.java 8a1ea46
serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazySimpleStructObjectInspector.java 8a5386a
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryObject.java 598683f
serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryStruct.java caf3517
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ColumnarStructObjectInspector.java 7d0d91c
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/DelegatedStructObjectInspector.java 5e1a369
[jira] [Commented] (HIVE-6187) Cannot use backticks around table name when using DESCRIBE query
[ https://issues.apache.org/jira/browse/HIVE-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994965#comment-13994965 ]

Carl Steinbach commented on HIVE-6187:
--------------------------------------

I can confirm that this functionality is currently working on trunk, and also that it's broken in the 0.12.0 release. I'm not sure when it was fixed, and there doesn't appear to be any test coverage that will prevent someone from breaking it again in the future.

Cannot use backticks around table name when using DESCRIBE query
----------------------------------------------------------------

Key: HIVE-6187
URL: https://issues.apache.org/jira/browse/HIVE-6187
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Andy Mok

Backticks around tables named after special keywords, such as items, allow us to create, drop, and alter the table. For example:
{code:sql}
CREATE TABLE foo.`items` (bar INT);
DROP TABLE foo.`items`;
ALTER TABLE `items` RENAME TO `items_`;
{code}
However, we cannot call
{code:sql}
DESCRIBE foo.`items`;
DESCRIBE `items`;
{code}
The DESCRIBE query does not permit backticks to surround table names. The error returned is:
{code:sql}
FAILED: SemanticException [Error 10001]: Table not found `items`
{code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 21315: HIVE-6187: Add test to verify that DESCRIBE TABLE works with quoted table names
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21315/
---

Review request for hive.

Bugs: HIVE-6187
https://issues.apache.org/jira/browse/HIVE-6187

Repository: hive-git

Description
-------
commit df0e9255b23add069f3bcafd0fa14f4710723160
Author: Carl Steinbach cwsteinb...@gmail.com
Date: Mon May 12 02:35:08 2014 -0700

HADOOP-6187. Cannot use backticks around table name when using DESCRIBE query

ql/src/test/queries/clientpositive/describe_table.q | 12 +++
ql/src/test/results/clientpositive/describe_table.q.out | 170
2 files changed, 182 insertions(+)

Diffs
-----
ql/src/test/queries/clientpositive/describe_table.q f72cae9
ql/src/test/results/clientpositive/describe_table.q.out a8b2bec

Diff: https://reviews.apache.org/r/21315/diff/

Testing
-------

Thanks,
Carl Steinbach
[jira] [Updated] (HIVE-6187) Cannot use backticks around table name when using DESCRIBE query
[ https://issues.apache.org/jira/browse/HIVE-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-6187:
---------------------------------
Attachment: HIVE-6187.1.patch

Attaching a patch that adds several quoted testcases to describe_table.q.

Cannot use backticks around table name when using DESCRIBE query
----------------------------------------------------------------

Key: HIVE-6187
URL: https://issues.apache.org/jira/browse/HIVE-6187
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Andy Mok
Attachments: HIVE-6187.1.patch

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6187) Add test to verify that DESCRIBE TABLE works with quoted table names
[ https://issues.apache.org/jira/browse/HIVE-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-6187:
---------------------------------
Summary: Add test to verify that DESCRIBE TABLE works with quoted table names (was: Cannot use backticks around table name when using DESCRIBE query)

Add test to verify that DESCRIBE TABLE works with quoted table names
--------------------------------------------------------------------

Key: HIVE-6187
URL: https://issues.apache.org/jira/browse/HIVE-6187
Project: Hive
Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Andy Mok
Attachments: HIVE-6187.1.patch

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6440) sql std auth - add command to change owner of database
[ https://issues.apache.org/jira/browse/HIVE-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994903#comment-13994903 ]

Lefty Leverenz commented on HIVE-6440:
--------------------------------------

Added to the wiki here:
* [DDL -- Alter Database|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27362034#LanguageManualDDL-AlterDatabase]

But the two ALTER DB statements differ in the use of parentheses around the DATABASE keyword. Is that correct?

sql std auth - add command to change owner of database
------------------------------------------------------

Key: HIVE-6440
URL: https://issues.apache.org/jira/browse/HIVE-6440
Project: Hive
Issue Type: Sub-task
Components: Authorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.13.0
Attachments: HIVE-6440.1.patch, HIVE-6440.2.patch, HIVE-6440.3.patch

It should be possible to change the owner of a database once it is created.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7027) Hive job fails when referencing a view that explodes an array
[ https://issues.apache.org/jira/browse/HIVE-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-7027:
------------------------
Assignee: Navis
Status: Patch Available (was: Open)

Hive job fails when referencing a view that explodes an array
-------------------------------------------------------------

Key: HIVE-7027
URL: https://issues.apache.org/jira/browse/HIVE-7027
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Chaoyu Tang
Assignee: Navis
Attachments: HIVE-7027.1.patch.txt

For a table created with the following DDL:

CREATE TABLE test_issue (fileid int, infos ARRAY<STRUCT<user:INT>>, test_c STRUCT<user_c:STRUCT<age:INT>>);

create a view that lateral-view explodes the array column, like:

CREATE VIEW v_test_issue AS SELECT fileid, i.user, test_c.user_c.age FROM test_issue LATERAL VIEW explode(infos) info AS i;

Querying the view, such as:

SELECT * FROM v_test_issue WHERE age = 25;

will fail with the following errors:
{code}
java.lang.Exception: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:695)
Caused by:
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 11 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
... 16 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 19 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:154)
... 24 more
Caused by: java.lang.RuntimeException: cannot find field test_c from [0:_col0, 1:_col5]
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:53)
at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:53)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:934)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:960)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:65)
at
[jira] [Updated] (HIVE-7027) Hive job fails when referencing a view that explodes an array
[ https://issues.apache.org/jira/browse/HIVE-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-7027:
------------------------
Attachment: HIVE-7027.1.patch.txt

Hive job fails when referencing a view that explodes an array
-------------------------------------------------------------

Key: HIVE-7027
URL: https://issues.apache.org/jira/browse/HIVE-7027
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Chaoyu Tang
Assignee: Navis
Attachments: HIVE-7027.1.patch.txt
[jira] [Updated] (HIVE-4867) Deduplicate columns appearing in both the key list and value list of ReduceSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-4867:
------------------------
Status: Open (was: Patch Available)

Deduplicate columns appearing in both the key list and value list of ReduceSinkOperator
---------------------------------------------------------------------------------------

Key: HIVE-4867
URL: https://issues.apache.org/jira/browse/HIVE-4867
Project: Hive
Issue Type: Improvement
Reporter: Yin Huai
Assignee: Yin Huai

A ReduceSinkOperator emits data in the format of keys and values. Right now, a column may appear in both the key list and the value list, which results in unnecessary overhead for shuffling. Example: we have a query shown below ...
{code:sql}
explain select ss_ticket_number from store_sales cluster by ss_ticket_number;
{code}
The plan is ...
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias - Map Operator Tree:
        store_sales
          TableScan
            alias: store_sales
            Select Operator
              expressions:
                expr: ss_ticket_number
                type: int
              outputColumnNames: _col0
              Reduce Output Operator
                key expressions:
                  expr: _col0
                  type: int
                sort order: +
                Map-reduce partition columns:
                  expr: _col0
                  type: int
                tag: -1
                value expressions:
                  expr: _col0
                  type: int
      Reduce Operator Tree:
        Extract
          File Output Operator
            compressed: false
            GlobalTableId: 0
            table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  Stage: Stage-0
    Fetch Operator
      limit: -1
{code}
The column 'ss_ticket_number' is in both the key list and the value list of the ReduceSinkOperator. The type of ss_ticket_number is int. For this case, BinarySortableSerDe will introduce 1 byte more for every int in the key. LazyBinarySerDe will also introduce overhead when recording the length of an int. For every int, 10 bytes should be a rough estimate of the size of data emitted from the Map phase.

--
This message was sent by Atlassian JIRA (v6.2#6252)
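The shuffle overhead described above can be sketched with a back-of-the-envelope calculation in plain Python (not Hive's serdes; serde framing bytes are ignored, so the numbers are only illustrative): when the same int column sits in both the key list and the value list, every row ships its bytes twice.

```python
# Rough illustration of HIVE-4867's point: a column present in both the key
# and value lists of a ReduceSink is serialized twice per row in the shuffle.
import struct

tickets = list(range(1000))  # hypothetical ss_ticket_number values

# Duplicated: key = the int, value = the same int again (4 bytes each as a
# big-endian int; real serde framing would add more on top).
duplicated = sum(len(struct.pack(">i", t)) * 2 for t in tickets)

# Deduplicated: the value list is empty and the reducer reads the column
# back out of the key, so each row ships the int only once.
deduplicated = sum(len(struct.pack(">i", t)) for t in tickets)

print(duplicated, deduplicated)  # the duplicated layout ships twice the bytes
```

With real BinarySortableSerDe/LazyBinarySerDe framing the per-row cost is higher than 8 vs. 4 bytes (the description estimates roughly 10 bytes per int), but the factor-of-two duplication is the part the optimization removes.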
[jira] [Created] (HIVE-7045) Wrong results in multi-table insert aggregating without group by clause
dima machlin created HIVE-7045:
-------------------------------

Summary: Wrong results in multi-table insert aggregating without group by clause
Key: HIVE-7045
URL: https://issues.apache.org/jira/browse/HIVE-7045
Project: Hive
Issue Type: Bug
Affects Versions: 0.12.0, 0.10.0
Reporter: dima machlin

The scenario:

CREATE TABLE t1 (a int, b int);
CREATE TABLE t2 (cnt int) PARTITIONED BY (var_name string);
insert into table t1 select 1,1 from asd limit 1;
insert into table t1 select 2,2 from asd limit 1;

from t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt
insert overwrite table t2 partition(var_name='b') select count(b) cnt;

select * from t2;
returns:
2 a
2 b
as expected.

Setting the number of reducers higher than 1:

set mapred.reduce.tasks=2;
from t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt
insert overwrite table t2 partition(var_name='b') select count(b) cnt;

select * from t2;
1 a
1 a
1 b
1 b

Wrong results. This happens whenever t1 is big enough to automatically generate more than one reducer, without specifying it directly. Adding "group by 1" at the end of each insert solves the problem:

from t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt group by 1
insert overwrite table t2 partition(var_name='b') select count(b) cnt group by 1;

generates:
2 a
2 b

This should work without the group by... The number of rows for each partition will be the number of reducers: each reducer calculated a subtotal of the count.

--
This message was sent by Atlassian JIRA (v6.2#6252)
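The reporter's diagnosis (one output row per reducer, each holding a partial count) can be simulated in a few lines of plain Python. This is a conceptual sketch, not Hive's execution engine: each reducer computes its own count(...) over its partition of the rows and writes it straight to the table, with no final stage to merge the partials into one global count.

```python
# Simplified simulation of the multi-insert aggregation without GROUP BY:
# with N reducers, each non-empty reducer emits its own partial count as a row.

rows = [(1, 1), (2, 2)]  # t1 after the two single-row inserts

def run(num_reducers):
    # Hash-partition the rows across reducers, then let each reducer emit
    # count(*) over just its own partition (no merge of the partials).
    partitions = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        partitions[i % num_reducers].append(row)
    return [len(p) for p in partitions if p]

print(run(1))  # one reducer: the correct global count -> [2]
print(run(2))  # two reducers: two partial counts, one row each -> [1, 1]
```

Adding "group by 1" forces a shuffle on a single constant key, so all partial results land on one reducer and get merged, which is why the workaround in the report produces the correct 2 a / 2 b output.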
[jira] [Commented] (HIVE-6938) Add Support for Parquet Column Rename
[ https://issues.apache.org/jira/browse/HIVE-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995114#comment-13995114 ]

Brock Noland commented on HIVE-6938:
------------------------------------

[~dweeks-netflix] looks like one of the parquet tests failed. Can you look into that?

Add Support for Parquet Column Rename
-------------------------------------

Key: HIVE-6938
URL: https://issues.apache.org/jira/browse/HIVE-6938
Project: Hive
Issue Type: Improvement
Components: File Formats
Affects Versions: 0.13.0
Reporter: Daniel Weeks
Assignee: Daniel Weeks
Attachments: HIVE-6938.1.patch, HIVE-6938.2.patch, HIVE-6938.2.patch

Parquet was originally introduced without 'replace columns' support in ql. In addition, the default behavior for parquet is to access columns by name, as opposed to by index, in the SerDe. Parquet should allow either columnar (index-based) access or name-based access, because it can support either.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7015) Failing to inherit group/permission should not fail the operation
[ https://issues.apache.org/jira/browse/HIVE-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995142#comment-13995142 ]

Brock Noland commented on HIVE-7015:
------------------------------------

Committed to trunk! Thank you Szehon!

Failing to inherit group/permission should not fail the operation
-----------------------------------------------------------------

Key: HIVE-7015
URL: https://issues.apache.org/jira/browse/HIVE-7015
Project: Hive
Issue Type: Bug
Components: Security
Affects Versions: 0.14.0
Reporter: Szehon Ho
Assignee: Szehon Ho
Fix For: 0.14.0
Attachments: HIVE-7015.patch

In the previous changes, chgrp and chmod were put on the critical path of directory creation and file copy/mv. These should not be: for instance, existing users may not have hive users in the same group as the hive group, so chgrp would fail if they turn on the flag hive.warehouse.subdir.inherit.perms.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7045) Wrong results in multi-table insert aggregating without group by clause
[ https://issues.apache.org/jira/browse/HIVE-7045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dima machlin updated HIVE-7045: --- Description: The scenario : CREATE TABLE t1 (a int, b int); CREATE TABLE t2 (cnt int) PARTITIONED BY (var_name string); insert into table t1 select 1,1 from asd limit 1; insert into table t1 select 2,2 from asd limit 1; t1 contains : 1 1 2 2 from t1 insert overwrite table t2 partition(var_name='a') select count(a) cnt insert overwrite table t2 partition(var_name='b') select count(b) cnt; select * from t2; returns : 2 a 2 b as expected. Setting the number of reducers higher than 1 : set mapred.reduce.tasks=2; from t1 insert overwrite table t2 partition(var_name='a') select count(a) cnt insert overwrite table t2 partition(var_name='b') select count(b) cnt; select * from t2; returns : 1 a 1 a 1 b 1 b Wrong results. This happens whenever t1 is big enough to automatically generate more than one reducer, even without setting the number directly. Adding group by 1 at the end of each insert solves the problem : from t1 insert overwrite table t2 partition(var_name='a') select count(a) cnt group by 1 insert overwrite table t2 partition(var_name='b') select count(b) cnt group by 1; generates : 2 a 2 b This should work without the group by. The number of rows in each partition equals the number of reducers, because each reducer wrote a subtotal of the count.
Wrong results in multi-table insert aggregating without group by clause --- Key: HIVE-7045 URL: https://issues.apache.org/jira/browse/HIVE-7045 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.12.0 Reporter: dima machlin -- This message was sent by Atlassian JIRA (v6.2#6252)
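The reporter's explanation, that each reducer independently finalizes its own partial count when no GROUP BY forces a single merge, can be simulated directly. This is a toy model with assumed data, not Hive's execution engine:

```python
# Simulate why mapred.reduce.tasks=2 yields two rows per partition:
# without a GROUP BY, each reducer writes out its own subtotal instead of
# the subtotals being merged into one global count.
rows = [(1, 1), (2, 2)]  # t1 contents: (a, b)

def run(num_reducers):
    # Hash-partition rows across reducers; each reducer counts its share
    # and each non-empty reducer emits one output row per insert branch.
    buckets = [[] for _ in range(num_reducers)]
    for i, r in enumerate(rows):
        buckets[i % num_reducers].append(r)
    return [len(b) for b in buckets if b]

assert run(1) == [2]     # one reducer: the correct single count of 2
assert run(2) == [1, 1]  # two reducers: two subtotal rows, as in the bug
```

Adding `group by 1` fixes the real query because a constant grouping key forces all rows to one reduce group, reproducing the `run(1)` case.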
[jira] [Updated] (HIVE-7027) Hive job fails when referencing a view that explodes an array
[ https://issues.apache.org/jira/browse/HIVE-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7027: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! Hive job fails when referencing a view that explodes an array - Key: HIVE-7027 URL: https://issues.apache.org/jira/browse/HIVE-7027 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Chaoyu Tang Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-7027.1.patch.txt For a table created with the following DDL CREATE TABLE test_issue (fileid int, infos ARRAY<STRUCT<user:INT>>, test_c STRUCT<user_c:STRUCT<age:INT>>), create a view that lateral view explodes the array column, like CREATE VIEW v_test_issue AS SELECT fileid, i.user, test_c.user_c.age FROM test_issue LATERAL VIEW explode(infos) info AS i; Querying the view, such as SELECT * FROM v_test_issue WHERE age = 25; will fail with the following errors: {code} java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:695) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 11 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) ... 16 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 19 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:154) ... 
24 more Caused by: java.lang.RuntimeException: cannot find field test_c from [0:_col0, 1:_col5] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:53) at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:53) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:934) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:960) at
[jira] [Commented] (HIVE-7012) Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer
[ https://issues.apache.org/jira/browse/HIVE-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995204#comment-13995204 ] Ashutosh Chauhan commented on HIVE-7012: +1 Issue raised by [~sunrui] if exists will probably require a different fix, which we shall take up in separate jira. Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer Key: HIVE-7012 URL: https://issues.apache.org/jira/browse/HIVE-7012 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Sun Rui Assignee: Navis Attachments: HIVE-7012.1.patch.txt, HIVE-7012.2.patch.txt With HIVE 0.13.0, run the following test case: {code:sql} create table src(key bigint, value string); select count(distinct key) as col0 from src order by col0; {code} The following exception will be thrown: {noformat} java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 9 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:173) ... 14 more Caused by: java.lang.RuntimeException: cannot find field _col0 from [0:reducesinkkey0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:79) at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:288) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:166) ... 14 more {noformat} This issue is related to HIVE-6455. When hive.optimize.reducededuplication is set to false, then this issue will be gone. 
Logical plan when hive.optimize.reducededuplication=false; {noformat} src TableScan (TS_0) alias: src Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Select Operator (SEL_1) expressions: key (type: bigint) outputColumnNames: key Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Group By Operator (GBY_2) aggregations: count(DISTINCT key) keys: key (type: bigint) mode: hash outputColumnNames: _col0, _col1 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Reduce Output Operator (RS_3) DistinctColumnIndices: key expressions: _col0 (type: bigint) DistributionKeys: 0 sort order: + OutputKeyColumnNames: _col0 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Group By Operator (GBY_4) aggregations: count(DISTINCT KEY._col0:0._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE Select Operator (SEL_5) expressions: _col0 (type: bigint) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator (RS_6) key expressions: _col0 (type:
[jira] [Updated] (HIVE-7036) get_json_object bug when extract list of list with index
[ https://issues.apache.org/jira/browse/HIVE-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7036: Assignee: Navis Affects Version/s: 0.13.0 Status: Patch Available (was: Open) get_json_object bug when extract list of list with index Key: HIVE-7036 URL: https://issues.apache.org/jira/browse/HIVE-7036 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.12.0, 0.13.0 Environment: all Reporter: Ming Ma Assignee: Navis Priority: Minor Labels: udf Attachments: HIVE-7036.1.patch.txt https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFJson.java#L250 this line should be outside the for-loop. For example, json = '{h:[1, [2, 3], {i: 0}, [{p: 11}, {p: 12}, {pp: 13}]]}' get_json_object(json, '$.h[*][0]') should return the first node (if it exists) of every child of '$.h', which should be [2,{p:11}], but hive returns only 2. When hive picks the node '2' out, tmp_jsonList changes to a list that contains only the node '2': [2]. This is then assigned to the variable jsonList, so in the next iteration the value of i is 2, which is greater than the size (always 1) of jsonList, and the loop breaks out. -- This message was sent by Atlassian JIRA (v6.2#6252)
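The bug described above, assigning the result list back to the list being iterated so that it shrinks mid-loop, can be reproduced with a small sketch. This is a simplified Python analogy of the UDFJson loop, not the actual Java code:

```python
# '$.h[*][0]': take element 0 of every child of h that is itself a list.
doc = {"h": [1, [2, 3], {"i": 0}, [{"p": 11}, {"p": 12}, {"pp": 13}]]}

def extract_buggy(children):
    result = []
    json_list = children
    for i in range(len(children)):
        if i >= len(json_list):
            break  # loop exits early once json_list has been replaced
        node = json_list[i]
        if isinstance(node, list):
            result.append(node[0])
            json_list = result  # BUG: reassignment inside the loop
    return result

def extract_fixed(children):
    # Fix: keep iterating the original children; never reassign mid-loop.
    return [node[0] for node in children if isinstance(node, list)]

assert extract_buggy(doc["h"]) == [2]             # only the first match
assert extract_fixed(doc["h"]) == [2, {"p": 11}]  # both matches, as expected
```

The buggy version stops after the first hit exactly as the report explains: once `json_list` is replaced by the one-element result, the loop index outruns it and the iteration breaks.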
[jira] [Commented] (HIVE-7027) Hive job fails when referencing a view that explodes an array
[ https://issues.apache.org/jira/browse/HIVE-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13994246#comment-13994246 ] Hive QA commented on HIVE-7027: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12643906/HIVE-7027.1.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5504 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/164/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/164/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12643906 Hive job fails when referencing a view that explodes an array - Key: HIVE-7027 URL: https://issues.apache.org/jira/browse/HIVE-7027 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Chaoyu Tang Assignee: Navis Attachments: HIVE-7027.1.patch.txt For a table created with the following DDL CREATE TABLE test_issue (fileid int, infos ARRAY<STRUCT<user:INT>>, test_c STRUCT<user_c:STRUCT<age:INT>>), create a view that lateral view explodes the array column, like CREATE VIEW v_test_issue AS SELECT fileid, i.user, test_c.user_c.age FROM test_issue LATERAL VIEW explode(infos) info AS i; Querying the view, such as SELECT * FROM v_test_issue WHERE age = 25; will fail with the following errors: {code} java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:695) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 11 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) ... 16 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 19 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:154) ... 24 more Caused by:
[jira] [Updated] (HIVE-6994) parquet-hive createArray strips null elements
[ https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Coffey updated HIVE-6994: Attachment: HIVE-6994.2.patch Updated based on comments on review board and fixed to include the right extension for retesting :). parquet-hive createArray strips null elements - Key: HIVE-6994 URL: https://issues.apache.org/jira/browse/HIVE-6994 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0 Reporter: Justin Coffey Assignee: Justin Coffey Fix For: 0.14.0 Attachments: HIVE-6994-1.patch, HIVE-6994.2.patch, HIVE-6994.patch The createArray method in ParquetHiveSerDe strips null values from resultant ArrayWritables. tracked here as well: https://github.com/Parquet/parquet-mr/issues/377 -- This message was sent by Atlassian JIRA (v6.2#6252)
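The strip-nulls defect is easiest to see as a list conversion. A schematic Python analogy of the two behaviors (hypothetical function names, not the actual ParquetHiveSerDe.createArray code):

```python
def create_array_buggy(elements):
    # Dropping nulls shifts element positions and changes the array length.
    return [e for e in elements if e is not None]

def create_array_fixed(elements):
    # Nulls are legal array elements and must be kept in place.
    return list(elements)

data = ["a", None, "b"]
assert create_array_buggy(data) == ["a", "b"]        # null silently lost
assert create_array_fixed(data) == ["a", None, "b"]  # length, positions kept
```

Position and length both matter for round-tripping: a consumer that wrote a 3-element array with a null in the middle must read the same 3 elements back.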
[jira] [Updated] (HIVE-7012) Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer
[ https://issues.apache.org/jira/browse/HIVE-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7012: --- Status: Patch Available (was: Open) Please ignore my previous comment, it seems your new patch takes care of those failures. Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer Key: HIVE-7012 URL: https://issues.apache.org/jira/browse/HIVE-7012 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Sun Rui Assignee: Navis Attachments: HIVE-7012.1.patch.txt, HIVE-7012.2.patch.txt With HIVE 0.13.0, run the following test case: {code:sql} create table src(key bigint, value string); select count(distinct key) as col0 from src order by col0; {code} The following exception will be thrown: {noformat} java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 
9 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:173) ... 14 more Caused by: java.lang.RuntimeException: cannot find field _col0 from [0:reducesinkkey0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:79) at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:288) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:166) ... 14 more {noformat} This issue is related to HIVE-6455. When hive.optimize.reducededuplication is set to false, then this issue will be gone. 
Logical plan when hive.optimize.reducededuplication=false; {noformat} src TableScan (TS_0) alias: src Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Select Operator (SEL_1) expressions: key (type: bigint) outputColumnNames: key Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Group By Operator (GBY_2) aggregations: count(DISTINCT key) keys: key (type: bigint) mode: hash outputColumnNames: _col0, _col1 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Reduce Output Operator (RS_3) DistinctColumnIndices: key expressions: _col0 (type: bigint) DistributionKeys: 0 sort order: + OutputKeyColumnNames: _col0 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Group By Operator (GBY_4) aggregations: count(DISTINCT KEY._col0:0._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE Select Operator (SEL_5) expressions: _col0 (type: bigint) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator (RS_6) key expressions: _col0 (type: bigint) DistributionKeys: 1
[jira] [Reopened] (HIVE-7040) TCP KeepAlive for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Thiébaud reopened HIVE-7040: Closed by mistake TCP KeepAlive for HiveServer2 - Key: HIVE-7040 URL: https://issues.apache.org/jira/browse/HIVE-7040 Project: Hive Issue Type: Improvement Components: Server Infrastructure Reporter: Nicolas Thiébaud Attachments: HIVE-7040.patch Implement TCP KeepAlive for HiveServer2 to avoid half-open connections. A setting could be added: {code}
<property>
  <name>hive.server2.tcp.keepalive</name>
  <value>true</value>
  <description>Whether to enable TCP keepalive for Hive Server 2</description>
</property>
{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
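The proposed flag maps onto the standard SO_KEEPALIVE socket option: the kernel periodically probes idle connections and tears down half-open ones. A generic sketch of enabling it on a listening socket (illustrative only, not HiveServer2's Thrift server code):

```python
import socket

def make_server_socket(port: int, keepalive: bool = True) -> socket.socket:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if keepalive:
        # What a hive.server2.tcp.keepalive=true setting would turn on:
        # kernel-level probing of idle connections to detect dead peers.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    s.bind(("127.0.0.1", port))
    s.listen(5)
    return s

srv = make_server_socket(0)  # port 0: let the OS pick a free port
assert srv.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
srv.close()
```

Probe interval and retry counts are OS-level tunables (e.g. sysctls on Linux), which is why the JIRA only needs a boolean toggle on the Hive side.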
[jira] [Commented] (HIVE-6994) parquet-hive createArray strips null elements
[ https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995311#comment-13995311 ] Szehon Ho commented on HIVE-6994: - OK mostly looks good, but I think the latest review board is not updated so hard to read, can you also update it as well? parquet-hive createArray strips null elements - Key: HIVE-6994 URL: https://issues.apache.org/jira/browse/HIVE-6994 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0 Reporter: Justin Coffey Assignee: Justin Coffey Fix For: 0.14.0 Attachments: HIVE-6994-1.patch, HIVE-6994.2.patch, HIVE-6994.patch The createArray method in ParquetHiveSerDe strips null values from resultant ArrayWritables. tracked here as well: https://github.com/Parquet/parquet-mr/issues/377 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7038) Join of external tables of elasticsearch giving an error.
[ https://issues.apache.org/jira/browse/HIVE-7038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7038: -- Description: Select * is working while the Join of the tables is giving the following error: {code} hive select * from failedauth f, failedauth2 f1 where f.username=f1.username; Total jobs = 1 14/05/09 10:57:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/05/09 10:57:11 WARN conf.Configuration: file:/tmp/hduser/hive_2014-05-09_10-57-09_954_5441752347301140125-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 14/05/09 10:57:11 WARN conf.Configuration: file:/tmp/hduser/hive_2014-05-09_10-57-09_954_5441752347301140125-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.jobtracker.system.dir; Ignoring. 14/05/09 10:57:11 WARN conf.Configuration: file:/tmp/hduser/hive_2014-05-09_10-57-09_954_5441752347301140125-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring. 14/05/09 10:57:12 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 14/05/09 10:57:12 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize 14/05/09 10:57:12 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative 14/05/09 10:57:12 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node 14/05/09 10:57:12 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. 
Instead, use mapreduce.input.fileinputformat.input.dir.recursive 14/05/09 10:57:12 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack 14/05/09 10:57:12 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 14/05/09 10:57:12 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed Execution log at: /tmp/hduser/hduser_20140509105757_945cc986-7fb1-491e-9bc1-a17cc150c6c6.log 2014-05-09 10:57:12 Starting to launch local task to process map join; maximum memory = 503840768 Execution failed with exit status: 2 Obtaining error information Task failed! Task ID: Stage-4 Logs: /tmp/hduser/hive.log FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask {code} The Following exception was seen in /tmp/hduser/hive.log {code} 2014-05-07 15:31:58,942 INFO mr.ExecDriver (SessionState.java:printInfo(410)) - Execution log at: /tmp/hduser/.log 2014-05-07 15:31:59,016 WARN conf.Configuration (Configuration.java:loadProperty(2172)) - file:/tmp/hduser/hive_2014-05-07_15-31-57_274_2380982880290259806-1/-local-10005/jobconf.xml:an attempt to override final parameter: dfs.datanode.data.dir; Ignoring. 2014-05-07 15:31:59,017 WARN conf.Configuration (Configuration.java:loadProperty(2172)) - file:/tmp/hduser/hive_2014-05-07_15-31-57_274_2380982880290259806-1/-local-10005/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring. 2014-05-07 15:31:59,019 WARN conf.Configuration (Configuration.java:loadProperty(2172)) - file:/tmp/hduser/hive_2014-05-07_15-31-57_274_2380982880290259806-1/-local-10005/jobconf.xml:an attempt to override final parameter: dfs.namenode.name.dir; Ignoring. 
2014-05-07 15:31:59,020 WARN conf.Configuration (Configuration.java:loadProperty(2172)) - file:/tmp/hduser/hive_2014-05-07_15-31-57_274_2380982880290259806-1/-local-10005/jobconf.xml:an attempt to override final parameter: dfs.namenode.name.dir; Ignoring. 2014-05-07 15:31:59,020 WARN conf.Configuration (Configuration.java:loadProperty(2172)) - file:/tmp/hduser/hive_2014-05-07_15-31-57_274_2380982880290259806-1/-local-10005/jobconf.xml:an attempt to override final parameter: mapreduce.jobtracker.system.dir; Ignoring. 2014-05-07 15:31:59,021 WARN conf.Configuration (Configuration.java:loadProperty(2172)) - file:/tmp/hduser/hive_2014-05-07_15-31-57_274_2380982880290259806-1/-local-10005/jobconf.xml:an attempt to override final parameter: dfs.datanode.data.dir; Ignoring. 2014-05-07 15:31:59,021 WARN conf.Configuration (Configuration.java:loadProperty(2172)) - file:/tmp/hduser/hive_2014-05-07_15-31-57_274_2380982880290259806-1/-local-10005/jobconf.xml:an attempt to override final parameter: dfs.tmp.dir; Ignoring. 2014-05-07 15:31:59,022 WARN conf.Configuration
[jira] [Commented] (HIVE-5664) Drop cascade database fails when the db has any tables with indexes
[ https://issues.apache.org/jira/browse/HIVE-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995390#comment-13995390 ] Selina Zhang commented on HIVE-5664: We had the same issue. If we cannot move the drop db cascade to the server side for now, a simple fix could be for the client to request the table/index name list again after each drop request. It is not a perfect solution, but it is simple and more general. Drop cascade database fails when the db has any tables with indexes --- Key: HIVE-5664 URL: https://issues.apache.org/jira/browse/HIVE-5664 Project: Hive Issue Type: Bug Components: Indexing, Metastore Affects Versions: 0.10.0, 0.11.0, 0.12.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: 0.14.0 Attachments: HIVE-5664.1.patch.txt {code} CREATE DATABASE db2; USE db2; CREATE TABLE tab1 (id int, name string); CREATE INDEX idx1 ON TABLE tab1(id) as 'COMPACT' with DEFERRED REBUILD IN TABLE tab1_indx; DROP DATABASE db2 CASCADE; {code} Last DDL fails with the following error: {code} FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
Database does not exist: db2
{code}

Hive.log has the following exception:

{code}
2013-10-27 20:46:16,629 ERROR exec.DDLTask (DDLTask.java:execute(434)) - org.apache.hadoop.hive.ql.metadata.HiveException: Database does not exist: db2
	at org.apache.hadoop.hive.ql.exec.DDLTask.dropDatabase(DDLTask.java:3473)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:231)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1441)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1219)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1047)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:915)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: NoSuchObjectException(message:db2.tab1_indx table not found)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1376)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
	at com.sun.proxy.$Proxy7.get_table(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:890)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:660)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:652)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropDatabase(HiveMetaStoreClient.java:546)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
	at com.sun.proxy.$Proxy8.dropDatabase(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.dropDatabase(Hive.java:284)
	at org.apache.hadoop.hive.ql.exec.DDLTask.dropDatabase(DDLTask.java:3470)
	... 18 more
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
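The client-side workaround suggested above (re-fetching the table list after each drop) can be sketched with a toy metastore model. All class and method names here are hypothetical, not Hive's actual client API; the point is only the re-fetch loop versus iterating a stale snapshot:

```java
import java.util.*;

// Toy model of the suggested client-side fix: instead of iterating a snapshot
// of the table list, re-fetch the remaining tables after every drop, so tables
// already removed as a side effect (e.g. index tables dropped along with their
// base table) are never dropped twice.
public class DropCascadeSketch {
    // tableName -> dependent tables (e.g. index tables) that vanish with it
    static Map<String, List<String>> dependents = new HashMap<>();
    static Set<String> tables = new LinkedHashSet<>();

    static void dropTable(String name) {
        if (!tables.remove(name)) {
            throw new NoSuchElementException(name + " table not found");
        }
        for (String dep : dependents.getOrDefault(name, Collections.emptyList())) {
            tables.remove(dep); // cascades to index tables
        }
    }

    static void dropDatabaseCascade() {
        // Re-fetch the remaining table list after each drop.
        while (!tables.isEmpty()) {
            dropTable(tables.iterator().next());
        }
    }

    public static void main(String[] args) {
        tables.addAll(Arrays.asList("tab1", "tab1_indx"));
        dependents.put("tab1", Arrays.asList("tab1_indx"));
        dropDatabaseCascade(); // succeeds; no second drop of tab1_indx
        System.out.println(tables.isEmpty()); // true
    }
}
```

Iterating a pre-fetched snapshot instead would attempt to drop tab1_indx a second time after the cascade already removed it, which is the NoSuchObjectException in the stack trace above.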
[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995217#comment-13995217 ] Sergey Shelukhin commented on HIVE-6430: ping?

MapJoin hash table has large memory overhead
Key: HIVE-6430
URL: https://issues.apache.org/jira/browse/HIVE-6430
Project: Hive
Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.11.patch, HIVE-6430.12.patch, HIVE-6430.12.patch, HIVE-6430.13.patch, HIVE-6430.patch

Right now, in some queries, I see that storing e.g. 4 ints (2 for the key and 2 for the row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other JIRAs, but in general we don't need to have a Java hash table there. We can either use a primitive-friendly hashtable like the one from HPPC (Apache-licensed), or some variation, to map primitive keys to a single row-storage structure without an object per row (similar to vectorization).

-- This message was sent by Atlassian JIRA (v6.2#6252)
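The "primitive-friendly hashtable" idea mentioned above can be sketched as an open-addressing map over parallel primitive arrays, in the spirit of HPPC. This is a minimal illustration of the memory layout only (no resizing, no deletion, and it assumes the map never fills up); it is not the data structure from the patch:

```java
// Sketch: keys and values live in parallel primitive arrays, so there is no
// Entry object allocated per row, unlike java.util.HashMap.
public class LongIntOpenMap {
    private final long[] keys;
    private final int[] values;
    private final boolean[] used;

    public LongIntOpenMap(int capacity) {
        // power-of-two capacity keeps the index mask cheap
        int cap = Integer.highestOneBit(Math.max(2, capacity - 1)) << 1;
        keys = new long[cap];
        values = new int[cap];
        used = new boolean[cap];
    }

    private int slot(long key) {
        int i = (int) (key ^ (key >>> 32)) & (keys.length - 1);
        while (used[i] && keys[i] != key) {
            i = (i + 1) & (keys.length - 1); // linear probing on collision
        }
        return i;
    }

    public void put(long key, int value) {
        int i = slot(key);
        used[i] = true;
        keys[i] = key;
        values[i] = value;
    }

    public int get(long key, int missing) {
        int i = slot(key);
        return used[i] ? values[i] : missing;
    }

    public static void main(String[] args) {
        LongIntOpenMap m = new LongIntOpenMap(16);
        m.put(42L, 7);
        m.put(-1L, 3);
        System.out.println(m.get(42L, -999)); // 7
    }
}
```

Per entry this costs 13 bytes of array space rather than a ~32-byte Entry object plus boxed key and value, which is the overhead the JIRA is attacking.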
Re: Review Request 21138: Support more generic way of using composite key for HBaseHandler
On May 12, 2014, 4:53 a.m., Swarnim Kulkarni wrote: hbase-handler/src/java/org/apache/hadoop/hive/hbase/CompositeHBaseKeyFactory.java, line 132 https://reviews.apache.org/r/21138/diff/1/?file=575776#file575776line132

Sure. Basically, the current implementation in Hive supports filter pushdown, but not for complex keys like structs; because a composite key is represented as a struct, this functionality was needed to push down any queries run on composite keys. The change for this was pretty simple. If you look into the ExprNodeDescUtils class, there is an extractFields method in it. When dealing with a struct, it gets represented as an ExprNodeDesc object. For instance, for a struct with definition test:struct<a:int,b:string,c:string>, when we do test.a for the key, in order to behave like the traditional pushdown of a primitive type we need to extract the field a from the given ExprNodeDesc. The validator will check that this is the first field in the struct, or else it won't push down anything. So if the user did something like test.a=5, we also push the value 5 down to the custom implementation, so that the user can choose to convert it into an HBase scan filter the way he wants, which would then get applied back onto the HBase scan. This is pretty much what this patch attempts to do. Please let me know if there is something else that you would want an explanation on. Thanks.

That's pretty much what I had in mind. FamilyFilter really got me confused. - Xuefu

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21138/#review42666 ---

On May 8, 2014, 3:42 p.m., Swarnim Kulkarni wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21138/ --- (Updated May 8, 2014, 3:42 p.m.) Review request for hive. Repository: hive-git

Description
---
HIVE-2599 introduced using a custom object for the row key.
But it forces key objects to extend HBaseCompositeKey, which is again an extension of LazyStruct. If the user provides a proper Object and OI, we can replace the internal key and keyOI with those. The initial implementation is based on a factory interface.

{code}
public interface HBaseKeyFactory {
  void init(SerDeParameters parameters, Properties properties) throws SerDeException;
  ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException;
  LazyObjectBase createObject(ObjectInspector inspector) throws SerDeException;
}
{code}

Diffs
-----
hbase-handler/pom.xml 132af43
hbase-handler/src/java/org/apache/hadoop/hive/hbase/AbstractHBaseKeyFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/ColumnMappings.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/CompositeHBaseKeyFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/DefaultHBaseKeyFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseCompositeKey.java 5008f15
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseKeyFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseLazyObjectFactory.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseRowSerializer.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseScanRange.java PRE-CREATION
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 5fe35a5
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDeParameters.java b64590d
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 4fe1b1b
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 142bfd8
hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java fc40195
hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestCompositeKey.java 13c344b
hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory.java PRE-CREATION
hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory2.java PRE-CREATION
hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java 7c4fc9f
hbase-handler/src/test/queries/positive/hbase_custom_key.q PRE-CREATION
hbase-handler/src/test/queries/positive/hbase_custom_key2.q PRE-CREATION
hbase-handler/src/test/results/positive/hbase_custom_key.q.out PRE-CREATION
hbase-handler/src/test/results/positive/hbase_custom_key2.q.out PRE-CREATION
itests/util/pom.xml e9720df
ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
[jira] [Commented] (HIVE-6411) Support more generic way of using composite key for HBaseHandler
[ https://issues.apache.org/jira/browse/HIVE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995423#comment-13995423 ] Xuefu Zhang commented on HIVE-6411: --- +1 @Swarnim K Please log a followup JIRA to track the FamilyFilter issue and link it here.

Support more generic way of using composite key for HBaseHandler
Key: HIVE-6411
URL: https://issues.apache.org/jira/browse/HIVE-6411
Project: Hive
Issue Type: Improvement
Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: HIVE-6411.1.patch.txt, HIVE-6411.10.patch.txt, HIVE-6411.11.patch.txt, HIVE-6411.2.patch.txt, HIVE-6411.3.patch.txt, HIVE-6411.4.patch.txt, HIVE-6411.5.patch.txt, HIVE-6411.6.patch.txt, HIVE-6411.7.patch.txt, HIVE-6411.8.patch.txt, HIVE-6411.9.patch.txt

HIVE-2599 introduced using a custom object for the row key. But it forces key objects to extend HBaseCompositeKey, which is again an extension of LazyStruct. If the user provides a proper Object and OI, we can replace the internal key and keyOI with those. The initial implementation is based on a factory interface.

{code}
public interface HBaseKeyFactory {
  void init(SerDeParameters parameters, Properties properties) throws SerDeException;
  ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException;
  LazyObjectBase createObject(ObjectInspector inspector) throws SerDeException;
}
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
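The validation rule described in the review discussion above — only push a predicate on a composite key down when it targets the first field of the struct — reduces to a prefix check, since HBase row keys are ordered by their leading bytes. A rough illustration with hypothetical names (this is not Hive's ExprNodeDescUtils API):

```java
import java.util.Arrays;
import java.util.List;

// Illustrative prefix rule for composite-key pushdown: a predicate like
// test.a = 5 is only pushed down when "a" is the first field of the struct
// key; predicates on later fields are evaluated in Hive instead.
public class CompositeKeyPushdown {
    static boolean canPushDown(List<String> structFields, String predicateField) {
        return !structFields.isEmpty() && structFields.get(0).equals(predicateField);
    }

    public static void main(String[] args) {
        // models test:struct<a:int,b:string,c:string>
        List<String> key = Arrays.asList("a", "b", "c");
        System.out.println(canPushDown(key, "a")); // true  -> handed to the key factory
        System.out.println(canPushDown(key, "b")); // false -> evaluated in Hive
    }
}
```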
[jira] [Updated] (HIVE-5342) Remove pre hadoop-0.20.0 related codes
[ https://issues.apache.org/jira/browse/HIVE-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-5342: - Attachment: HIVE-5342.2.patch

Incorporating Ashutosh's feedback.

Remove pre hadoop-0.20.0 related codes
Key: HIVE-5342
URL: https://issues.apache.org/jira/browse/HIVE-5342
Project: Hive
Issue Type: Task
Reporter: Navis
Assignee: Jason Dere
Priority: Trivial
Attachments: D13047.1.patch, HIVE-5342.1.patch, HIVE-5342.2.patch

Recently we discussed dropping support for hadoop-0.20.0. Whether or not that happens, the 0.17-related code could be removed first.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7046) Propagate addition of new columns to partition schema
[ https://issues.apache.org/jira/browse/HIVE-7046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995384#comment-13995384 ] Szehon Ho commented on HIVE-7046: - This is also related to HIVE-6131.

Propagate addition of new columns to partition schema
Key: HIVE-7046
URL: https://issues.apache.org/jira/browse/HIVE-7046
Project: Hive
Issue Type: Improvement
Components: Database/Schema
Affects Versions: 0.12.0
Reporter: Mariano Dominguez

Hive reads data according to the partition schema, not the table schema (because of HIVE-3833). ALTER TABLE only updates the table schema, and the changes are not propagated to partitions. Thus, the schema of a partition will differ from that of the table after altering the table schema; this is done to preserve the ability to read existing data, particularly when using binary formats such as RCFile. Binary formats do not allow changing the type of a field because of the way serialization works; a field serialized as a string will be displayed incorrectly if read as an integer. Unfortunately, as a side effect, this behavior limits the ability to add new columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible workaround is to manually recreate the partitions, but this process could be unnecessarily cumbersome if the number of partitions is high. New columns should be propagated to existing partitions automatically instead.

-- This message was sent by Atlassian JIRA (v6.2#6252)
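The propagation proposed above amounts to appending columns that exist in the table schema but not yet in a partition's schema, while leaving the partition's existing columns untouched so old data stays readable. A hedged sketch of that merge (illustrative only, not Hive's metastore logic):

```java
import java.util.*;

// Sketch: append table columns missing from the partition schema, without
// reordering or retyping existing partition columns (which binary formats
// like RCFile cannot tolerate). Only trailing additions are safe.
public class PropagateColumns {
    static List<String> propagate(List<String> tableCols, List<String> partCols) {
        List<String> merged = new ArrayList<>(partCols);
        for (String col : tableCols) {
            if (!merged.contains(col)) {
                merged.add(col); // new column appended at the end
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("id:int", "name:string", "added:string");
        List<String> part = Arrays.asList("id:int", "name:string");
        System.out.println(propagate(table, part)); // [id:int, name:string, added:string]
    }
}
```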
[jira] [Updated] (HIVE-7042) Fix stats_partscan_1_23.q and orc_createas1.q for hadoop-2
[ https://issues.apache.org/jira/browse/HIVE-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7042: --- Status: Patch Available (was: Open) Fix stats_partscan_1_23.q and orc_createas1.q for hadoop-2 -- Key: HIVE-7042 URL: https://issues.apache.org/jira/browse/HIVE-7042 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7042.1.patch stats_partscan_1_23.q and orc_createas1.q should use HiveInputFormat as opposed to CombineHiveInputFormat. RCFile uses DefaultCodec for compression (uses DEFLATE) which is not splittable. Hence using CombineHiveIF will yield different results for these tests. ORC should use HiveIF to generate ORC splits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HIVE-7012) Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer
[ https://issues.apache.org/jira/browse/HIVE-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994895#comment-13994895 ] Navis edited comment on HIVE-7012 at 5/12/14 7:42 AM: -- [~ashutoshc] Yes, it's intended. In the query ppd2.q

{code}
select a.* from (
  select key, count(value) as cc
  from srcpart a
  where a.ds = '2008-04-08' and a.hr = '11'
  group by key
) a
distribute by a.key
sort by a.key, a.cc desc
{code}

cc is a field generated by the GBY operator, so it's semantically wrong to merge the RS for the GBY with any following RS. But at the same time, the sort on a.cc is meaningless, so it could be removed during optimization — just not here (maybe in SemanticAnalyzer?). [~sunrui] Yes, the RS for distinct should be excluded from any dedup process. Could you take this issue? I think you know it better than me.

was (Author: navis): [~ashutoshc] Yes, it's intended. In the query ppd2.q {code} select a.* from ( select key, count(value) as cc from srcpart a where a.ds = '2008-04-08' and a.hr = '11' group by key )a distribute by a.key sort by a.key,a.cc desc {code} cc is generated field by GBY operator, so It's semantically wrong to merged RS for GBY with following RS. But the same time, sort on a.cc is meaningless so it can be removed in optimizing, but not in here (maybe in SemanticAnalyzer?). [~sunrui] Yes, RS for distinct should be avoided from any dedup process. Could you take this issue? I think you knows better than me.
Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer
Key: HIVE-7012
URL: https://issues.apache.org/jira/browse/HIVE-7012
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.13.0
Reporter: Sun Rui
Assignee: Navis
Attachments: HIVE-7012.1.patch.txt, HIVE-7012.2.patch.txt

With HIVE 0.13.0, run the following test case:

{code:sql}
create table src(key bigint, value string);
select count(distinct key) as col0 from src order by col0;
{code}

The following exception will be thrown:

{noformat}
java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
	... 9 more
Caused by: java.lang.RuntimeException: Reduce operator initialization failed
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:173)
	... 14 more
Caused by: java.lang.RuntimeException: cannot find field _col0 from [0:reducesinkkey0]
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
	at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:79)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:288)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:166)
	... 14 more
{noformat}

This issue is related to HIVE-6455. When hive.optimize.reducededuplication is set to false, this issue goes away.

Logical plan when hive.optimize.reducededuplication=false:

{noformat}
src
  TableScan (TS_0)
    alias: src
    Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
    Select Operator (SEL_1)
      expressions: key (type: bigint)
      outputColumnNames: key
      Statistics:
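The failure mode discussed in this issue can be pictured as a missing merge guard: deduplicating a ReduceSink with the one feeding a GroupBy is only safe when every key of the downstream ReduceSink maps to a column that already exists before the GroupBy. Aggregation outputs like the count(distinct ...) alias col0 do not, hence "cannot find field _col0". A toy version of such a check, with illustrative names only (not the real ReduceSinkDeDuplication code):

```java
import java.util.*;

// Toy merge guard: refuse to merge when any child ReduceSink key is a column
// that only comes into existence as a GroupBy aggregation output.
public class DedupGuard {
    static boolean safeToMerge(Set<String> columnsBeforeGroupBy, List<String> childKeys) {
        return columnsBeforeGroupBy.containsAll(childKeys);
    }

    public static void main(String[] args) {
        Set<String> before = new HashSet<>(Arrays.asList("key", "value"));
        System.out.println(safeToMerge(before, Arrays.asList("key")));  // true
        System.out.println(safeToMerge(before, Arrays.asList("col0"))); // false: generated by GBY
    }
}
```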
[jira] [Updated] (HIVE-7042) Fix stats_partscan_1_23.q and orc_createas1.q for hadoop-2
[ https://issues.apache.org/jira/browse/HIVE-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7042: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available)

Tested this manually on Mac OS and on Ubuntu. Results are consistent. Committed to trunk. Thanks, Prasanth!

Fix stats_partscan_1_23.q and orc_createas1.q for hadoop-2
Key: HIVE-7042
URL: https://issues.apache.org/jira/browse/HIVE-7042
Project: Hive
Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Fix For: 0.14.0
Attachments: HIVE-7042.1.patch, HIVE-7042.1.patch.txt

stats_partscan_1_23.q and orc_createas1.q should use HiveInputFormat as opposed to CombineHiveInputFormat. RCFile uses DefaultCodec for compression (which uses DEFLATE), which is not splittable. Hence, using CombineHiveIF will yield different results for these tests. ORC should use HiveIF to generate ORC splits.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7046) Propagate addition of new columns to partition schema
[ https://issues.apache.org/jira/browse/HIVE-7046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mariano Dominguez updated HIVE-7046:

Description:
Hive reads data according to the partition schema, not the table schema (because of HIVE-3833). ALTER TABLE only updates the table schema, and the changes are not propagated to partitions. Thus, the schema of a partition will differ from that of the table after altering the table schema; this is done to preserve the ability to read existing data, particularly when using binary formats such as RCFile. Binary formats do not allow changing the type of a field because of the way serialization works; a field serialized as a string will be displayed incorrectly if read as an integer. Unfortunately, as a side effect, this behavior limits the ability to add new columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible workaround is to manually recreate the partitions, but this process could be unnecessarily cumbersome if the number of partitions is high. New columns should be propagated to existing partitions automatically instead.

was:
Hive reads data according to the partition schema, not the table schema (because of HIVE-3833). ALTER TABLE only updates the table schema, and the changes are not propagated to partitions. Thus, the schema of a partition will differ from that of the table after altering the table schema; this is done to preserve the ability to read existing data, particularly when using binary formats such as RCFile. Binary formats do not allow changing the type of a field because of the way serialization works; a field serialized as a string will be displayed incorrectly if read as an integer. Unfortunately, as a side effect, this behavior limits the ability to add new columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible workaround is to recreate the partitions, but this process could be unnecessarily cumbersome if the number of partitions is high. New columns should be propagated to existing partitions automatically instead.

Propagate addition of new columns to partition schema
Key: HIVE-7046
URL: https://issues.apache.org/jira/browse/HIVE-7046
Project: Hive
Issue Type: Improvement
Components: Database/Schema
Affects Versions: 0.12.0
Reporter: Mariano Dominguez

Hive reads data according to the partition schema, not the table schema (because of HIVE-3833). ALTER TABLE only updates the table schema, and the changes are not propagated to partitions. Thus, the schema of a partition will differ from that of the table after altering the table schema; this is done to preserve the ability to read existing data, particularly when using binary formats such as RCFile. Binary formats do not allow changing the type of a field because of the way serialization works; a field serialized as a string will be displayed incorrectly if read as an integer. Unfortunately, as a side effect, this behavior limits the ability to add new columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible workaround is to manually recreate the partitions, but this process could be unnecessarily cumbersome if the number of partitions is high. New columns should be propagated to existing partitions automatically instead.

-- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 20399: Invalid column access info for partitioned table
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/20399/#review42627 ---

Patch looks good, but there are a few changes which may not be essential for the patch.

ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
https://reviews.apache.org/r/20399/#comment76596
It's not clear what the difference between neededColumns and referencedColumns is. If there is none, can we just use neededColumns? If there is a difference, it would be good to add a comment on why neededColumns is not sufficient here.

ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
https://reviews.apache.org/r/20399/#comment76454
An Operator should not contain any compile-time info, only runtime info. Compile-time info belongs to the Desc classes. So, move this field to the TableScanDesc class.

ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
https://reviews.apache.org/r/20399/#comment76455
In line with the above comment, this should then be scanOp.getConf().setReferencedColumns().

ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
https://reviews.apache.org/r/20399/#comment76598
It's not clear how referredColumns is used. It's populated, but it seems no one is making use of it.

- Ashutosh Chauhan

On May 7, 2014, 4:06 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/20399/ --- (Updated May 7, 2014, 4:06 a.m.) Review request for hive. Bugs: HIVE-6910 https://issues.apache.org/jira/browse/HIVE-6910 Repository: hive-git

Description
---
From http://www.mail-archive.com/user@hive.apache.org/msg11324.html

neededColumnIDs in TS is only for non-partition columns, but ColumnAccessAnalyzer is calculating it on all columns.
Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 58ed550 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 6a4dc9b ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 8c4b891 ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java f285312 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 6bdf394 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnAccessAnalyzer.java 74b595a ql/src/java/org/apache/hadoop/hive/ql/parse/ProcessAnalyzeTable.java c26be3c ql/src/java/org/apache/hadoop/hive/ql/parse/PrunedPartitionList.java d3268dd ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java a7cec5d ql/src/test/queries/clientpositive/column_access_stats.q fbf8bba ql/src/test/results/clientpositive/column_access_stats.q.out 7eee4ba Diff: https://reviews.apache.org/r/20399/diff/ Testing --- Thanks, Navis Ryu
[jira] [Updated] (HIVE-6820) HiveServer(2) ignores HIVE_OPTS
[ https://issues.apache.org/jira/browse/HIVE-6820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-6820: Resolution: Fixed Status: Resolved (was: Patch Available)

Patch committed to trunk. Thanks for the contribution [~libing]

HiveServer(2) ignores HIVE_OPTS
---
Key: HIVE-6820
URL: https://issues.apache.org/jira/browse/HIVE-6820
Project: Hive
Issue Type: Bug
Components: HiveServer2
Affects Versions: 0.12.0
Reporter: Richard Ding
Assignee: Bing Li
Priority: Minor
Fix For: 0.14.0
Attachments: HIVE-6820.1.patch

In hiveserver2.sh:

{code}
exec $HADOOP jar $JAR $CLASS $@
{code}

while cli.sh has:

{code}
exec $HADOOP jar ${HIVE_LIB}/hive-cli-*.jar $CLASS $HIVE_OPTS $@
{code}

hiveserver2.sh does not pass $HIVE_OPTS; hence some Hive commands that run properly in the Hive shell fail in HiveServer.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7033) grant statements should check if the role exists
[ https://issues.apache.org/jira/browse/HIVE-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7033: Status: Patch Available (was: Open) grant statements should check if the role exists Key: HIVE-7033 URL: https://issues.apache.org/jira/browse/HIVE-7033 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7033.1.patch, HIVE-7033.2.patch, HIVE-7033.3.patch, HIVE-7033.4.patch The following grant statement that grants to a role that does not exist succeeds, but it should result in an error. grant all on t1 to role nosuchrole; -- This message was sent by Atlassian JIRA (v6.2#6252)
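The behavior this patch adds can be modeled as a check-then-grant with case-normalized role names. All names below are hypothetical, not the real ObjectStore code, and note the TOCTOU caveat from the review: in the real fix the lookup and the grant must happen in one metastore transaction, which this toy version glosses over:

```java
import java.util.*;

// Sketch: normalize the role name's case, verify the role exists before
// recording the grant, and fail instead of silently succeeding.
public class GrantRoleCheck {
    private final Set<String> roles = new HashSet<>();

    void createRole(String name) {
        roles.add(name.toLowerCase(Locale.ROOT));
    }

    void grantToRole(String roleName, String privilege) {
        String normalized = roleName.toLowerCase(Locale.ROOT);
        if (!roles.contains(normalized)) {
            throw new IllegalArgumentException("Role " + roleName + " does not exist");
        }
        // ... record the privilege for the role (elided) ...
    }

    public static void main(String[] args) {
        GrantRoleCheck store = new GrantRoleCheck();
        store.createRole("Analysts");
        store.grantToRole("ANALYSTS", "ALL ON t1"); // ok: names are case-insensitive
        try {
            store.grantToRole("nosuchrole", "ALL ON t1");
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected"); // the new behavior from this patch
        }
    }
}
```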
Re: Review Request 21289: HIVE-7033 : grant statements should check if the role exists
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21289/ --- (Updated May 12, 2014, 8:25 p.m.) Review request for hive and Ashutosh Chauhan. Changes --- Fix possibility of TOCTOU issue. Bugs: HIVE-7033 https://issues.apache.org/jira/browse/HIVE-7033 Repository: hive-git Description --- The following grant statement that grants to a role that does not exist succeeds, but it should result in an error. grant all on t1 to role nosuchrole; Patch also fixes the handling of role names in some cases to be case insensitive. Diffs (updated) - metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 4b4f4f2 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrincipal.java 62b8994 ql/src/test/queries/clientnegative/authorization_role_grant_nosuchrole.q PRE-CREATION ql/src/test/queries/clientnegative/authorization_table_grant_nosuchrole.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_1_sql_std.q 79ae17a ql/src/test/queries/clientpositive/authorization_role_grant1.q f89d0dc ql/src/test/queries/clientpositive/authorization_role_grant2.q 984d7ed ql/src/test/results/clientnegative/authorization_role_grant_nosuchrole.q.out PRE-CREATION ql/src/test/results/clientnegative/authorization_table_grant_nosuchrole.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_1_sql_std.q.out 718ff31 ql/src/test/results/clientpositive/authorization_role_grant1.q.out 3c846eb ql/src/test/results/clientpositive/authorization_role_grant2.q.out 1e8f88a Diff: https://reviews.apache.org/r/21289/diff/ Testing --- New tests included Thanks, Thejas Nair
Re: Review Request 20899: HIVE-6994 - parquet-hive createArray strips null elements
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/20899/ --- (Updated May 12, 2014, 2:18 p.m.) Review request for hive. Changes --- added back finals and cleaned up commentary. Repository: hive-git Description --- - Fix for bug in createArray() that strips null elements. - In the process refactored serde for simplification purposes. - Refactored tests for better regression testing. Diffs (updated) - data/files/parquet_create.txt ccd48ee ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java b689336 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 3b56fc7 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetHiveSerDe.java PRE-CREATION ql/src/test/queries/clientpositive/parquet_create.q 0b976bd ql/src/test/results/clientpositive/parquet_create.q.out 3220be5 Diff: https://reviews.apache.org/r/20899/diff/ Testing --- Thanks, justin coffey
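The essence of the createArray fix described above is that the converted list must keep null elements in place rather than skipping them. A sketch of that intent (this mirrors the behavior, not the actual ParquetHiveSerDe code):

```java
import java.util.*;

// Sketch: copy list elements verbatim, preserving nulls. The bug was that
// null elements were stripped during conversion, shifting later elements.
public class CreateArraySketch {
    static List<Object> createArray(List<Object> source) {
        List<Object> result = new ArrayList<>(source.size());
        for (Object element : source) {
            result.add(element); // null elements are preserved, not dropped
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> in = Arrays.asList("a", null, "b");
        System.out.println(createArray(in)); // [a, null, b]
    }
}
```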
[jira] [Commented] (HIVE-7037) Add additional tests for transform clauses with Tez
[ https://issues.apache.org/jira/browse/HIVE-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993092#comment-13993092 ] Vikram Dixit K commented on HIVE-7037: -- LGTM +1.

Add additional tests for transform clauses with Tez
---
Key: HIVE-7037
URL: https://issues.apache.org/jira/browse/HIVE-7037
Project: Hive
Issue Type: Bug
Components: Tez
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Attachments: HIVE-7037.1.patch

Enabling some q tests for Tez with respect to ScriptOperator/Stream/Transform.

-- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 21289: HIVE-7033 : grant statements should check if the role exists
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21289/ --- (Updated May 12, 2014, 8:50 p.m.) Review request for hive and Ashutosh Chauhan. Changes --- HIVE-7033.4.patch - q.out files didn't have the comment update. Bugs: HIVE-7033 https://issues.apache.org/jira/browse/HIVE-7033 Repository: hive-git Description --- The following grant statement that grants to a role that does not exist succeeds, but it should result in an error. grant all on t1 to role nosuchrole; Patch also fixes the handling of role names in some cases to be case insensitive. Diffs (updated) - metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 4b4f4f2 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrincipal.java 62b8994 ql/src/test/queries/clientnegative/authorization_role_grant_nosuchrole.q PRE-CREATION ql/src/test/queries/clientnegative/authorization_table_grant_nosuchrole.q PRE-CREATION ql/src/test/queries/clientpositive/authorization_1_sql_std.q 79ae17a ql/src/test/queries/clientpositive/authorization_role_grant1.q f89d0dc ql/src/test/queries/clientpositive/authorization_role_grant2.q 984d7ed ql/src/test/results/clientnegative/authorization_role_grant_nosuchrole.q.out PRE-CREATION ql/src/test/results/clientnegative/authorization_table_grant_nosuchrole.q.out PRE-CREATION ql/src/test/results/clientpositive/authorization_1_sql_std.q.out 718ff31 ql/src/test/results/clientpositive/authorization_role_grant1.q.out 3c846eb ql/src/test/results/clientpositive/authorization_role_grant2.q.out 1e8f88a Diff: https://reviews.apache.org/r/21289/diff/ Testing --- New tests included Thanks, Thejas Nair
[jira] [Updated] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6430: --- Attachment: HIVE-6430.13.patch

CR feedback. The RB was never posted in the JIRA, apparently... it's at https://reviews.apache.org/r/18936/

MapJoin hash table has large memory overhead
Key: HIVE-6430
URL: https://issues.apache.org/jira/browse/HIVE-6430
Project: Hive
Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.11.patch, HIVE-6430.12.patch, HIVE-6430.12.patch, HIVE-6430.13.patch, HIVE-6430.patch

Right now, in some queries, I see that storing e.g. 4 ints (2 for the key and 2 for the row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other JIRAs, but in general we don't need to have a Java hash table there. We can either use a primitive-friendly hashtable like the one from HPPC (Apache-licensed), or some variation, to map primitive keys to a single row-storage structure without an object per row (similar to vectorization).

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7037) Add additional tests for transform clauses with Tez
[ https://issues.apache.org/jira/browse/HIVE-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995712#comment-13995712 ] Gunther Hagleitner commented on HIVE-7037: -- Test failures are unrelated. Add additional tests for transform clauses with Tez --- Key: HIVE-7037 URL: https://issues.apache.org/jira/browse/HIVE-7037 Project: Hive Issue Type: Bug Components: Tez Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7037.1.patch Enabling some q tests for Tez wrt to ScriptOperator/Stream/Transform. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
Mohammad Kamrul Islam created HIVE-7049: --- Summary: Unable to deserialize AVRO data when file schema and record schema are different and nullable Key: HIVE-7049 URL: https://issues.apache.org/jira/browse/HIVE-7049 Project: Hive Issue Type: Bug Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam It mainly happens when 1) file schema and record schema are not the same, and 2) record schema is nullable but file schema is not. The potential code location is in class AvroDeserializer {noformat} if(AvroSerdeUtils.isNullableType(recordSchema)) { return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType); } {noformat} In the above code snippet, recordSchema is checked for nullability, but the file schema is not checked. I tested with these values: {noformat} recordSchema= [null,string] fileSchema= string {noformat} And I got the following exception (line numbers might not be the same due to my debugged code version): {noformat} org.apache.avro.AvroRuntimeException: Not a union: string at org.apache.avro.Schema.getTypes(Schema.java:272) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
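The guard the report points at checks only the record schema, so a non-union file schema still falls into union-handling code. A plain-Java sketch of the corrected condition, with schemas modeled as lists of type names and a hypothetical stand-in for `AvroSerdeUtils.isNullableType` (the real fix operates on Avro `Schema` objects):

```java
import java.util.List;

public class NullableUnionGuard {
    // Hypothetical stand-in for AvroSerdeUtils.isNullableType: a schema is a
    // nullable union when it is a union whose branches include "null".
    static boolean isNullable(List<String> schemaTypes) {
        return schemaTypes.size() > 1 && schemaTypes.contains("null");
    }

    // The reported bug: only recordSchema was checked, so recordSchema =
    // [null,string] paired with fileSchema = string sent the plain string
    // schema into deserializeNullableUnion, which threw
    // "Not a union: string". Guarding on both schemas avoids that path.
    static boolean needsUnionHandling(List<String> recordSchema, List<String> fileSchema) {
        return isNullable(recordSchema) && isNullable(fileSchema);
    }

    public static void main(String[] args) {
        List<String> record = List.of("null", "string"); // [null,string]
        List<String> file = List.of("string");           // plain string
        System.out.println(needsUnionHandling(record, file)); // false: skip union path
    }
}
```

With the reporter's test values the guard now returns false and the deserializer can treat the file datum as a plain (non-union) value.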
[jira] [Commented] (HIVE-6989) Error with arithmetic operators with javaXML serialization
[ https://issues.apache.org/jira/browse/HIVE-6989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995568#comment-13995568 ] Ashutosh Chauhan commented on HIVE-6989: +1 Error with arithmetic operators with javaXML serialization -- Key: HIVE-6989 URL: https://issues.apache.org/jira/browse/HIVE-6989 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6989.1.patch, HIVE-6989.2.patch A couple of members in GenericUDFBaseNumeric do not have getters/setters, which prevents them from being serialized as part of the query plan when using javaXML serialization. As a result, the following query: {noformat} select key + key from src limit 5; {noformat} fails with the following error: {noformat} java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:401) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:425) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:233) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:680) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 11 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) ... 16 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 19 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:154) ... 
24 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBaseNumeric.initialize(GenericUDFBaseNumeric.java:109) at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:116) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:127) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorHead.initialize(ExprNodeEvaluatorHead.java:39) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:931) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:957) at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:65) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:189)
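The root cause described above is specific to how "javaXML" plan serialization works: `java.beans.XMLEncoder` persists only JavaBean properties, i.e. fields reachable through a public getter/setter pair on a class with a no-arg constructor. A field without accessors is silently dropped on write and comes back as its default (null) on read, which is how `GenericUDFBaseNumeric` lost state and hit the NPE in `initialize`. A minimal illustrative bean (names are made up for the example, not Hive classes):

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Demonstrates XMLEncoder's getter/setter requirement. The "mode" field
// survives a round trip only because getMode/setMode exist; delete them and
// the decoded object comes back with mode == null -- the HIVE-6989 symptom.
public class PlanBean {
    private String mode;

    public PlanBean() {}
    public String getMode() { return mode; }
    public void setMode(String mode) { this.mode = mode; }

    public static PlanBean roundTrip(PlanBean in) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (XMLEncoder enc = new XMLEncoder(bos)) {
            enc.writeObject(in);
        }
        try (XMLDecoder dec = new XMLDecoder(new ByteArrayInputStream(bos.toByteArray()))) {
            return (PlanBean) dec.readObject();
        }
    }
}
```

The patch's fix of adding getters/setters to the affected members follows directly from this contract.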
[jira] [Commented] (HIVE-6976) Show query id only when there's jobs on the cluster
[ https://issues.apache.org/jira/browse/HIVE-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995683#comment-13995683 ] Gunther Hagleitner commented on HIVE-6976: -- Failures unrelated. Happened in the run before as well. Show query id only when there's jobs on the cluster --- Key: HIVE-6976 URL: https://issues.apache.org/jira/browse/HIVE-6976 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Minor Attachments: HIVE-6976.1.patch No need to print the query id for local-only execution. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5342) Remove pre hadoop-0.20.0 related codes
[ https://issues.apache.org/jira/browse/HIVE-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995573#comment-13995573 ] Ashutosh Chauhan commented on HIVE-5342: +1 Remove pre hadoop-0.20.0 related codes -- Key: HIVE-5342 URL: https://issues.apache.org/jira/browse/HIVE-5342 Project: Hive Issue Type: Task Reporter: Navis Assignee: Jason Dere Priority: Trivial Attachments: D13047.1.patch, HIVE-5342.1.patch, HIVE-5342.2.patch Recently, we discussed not supporting hadoop-0.20.0. If it would be done like that or not, 0.17 related codes would be removed before that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue
[ https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7043: - Status: Patch Available (was: Open) Resubmitting so that hive-qa picks it up. When using the tez session pool via hive, once sessions time out, all queries go to the default queue - Key: HIVE-7043 URL: https://issues.apache.org/jira/browse/HIVE-7043 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.14.0 Attachments: HIVE-7043.2.patch When using a tez session pool to run multiple queries, once the sessions time out, we always end up using the default queue to launch queries. The load balancing doesn't work in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue
[ https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7043: - Status: Patch Available (was: Open) When using the tez session pool via hive, once sessions time out, all queries go to the default queue - Key: HIVE-7043 URL: https://issues.apache.org/jira/browse/HIVE-7043 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.14.0 Attachments: HIVE-7043.2.patch, HIVE-7043.3.patch When using a tez session pool to run multiple queries, once the sessions time out, we always end up using the default queue to launch queries. The load balancing doesn't work in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7048) CompositeKeyHBaseFactory should not use FamilyFilter
Swarnim Kulkarni created HIVE-7048: -- Summary: CompositeKeyHBaseFactory should not use FamilyFilter Key: HIVE-7048 URL: https://issues.apache.org/jira/browse/HIVE-7048 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Swarnim Kulkarni HIVE-6411 introduced a more generic way to provide composite key implementations via custom factory implementations. However it seems like the CompositeHBaseKeyFactory implementation uses a FamilyFilter for row key scans which doesn't seem appropriate. This should be investigated further and if possible replaced with a RowRangeScanFilter. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue
[ https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7043: - Attachment: HIVE-7043.3.patch Minor nit fixed. Queue name could potentially be null. When using the tez session pool via hive, once sessions time out, all queries go to the default queue - Key: HIVE-7043 URL: https://issues.apache.org/jira/browse/HIVE-7043 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.14.0 Attachments: HIVE-7043.2.patch, HIVE-7043.3.patch When using a tez session pool to run multiple queries, once the sessions time out, we always end up using the default queue to launch queries. The load balancing doesn't work in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue
[ https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7043: - Attachment: (was: HIVE-7043.1.patch) When using the tez session pool via hive, once sessions time out, all queries go to the default queue - Key: HIVE-7043 URL: https://issues.apache.org/jira/browse/HIVE-7043 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.14.0 Attachments: HIVE-7043.2.patch When using a tez session pool to run multiple queries, once the sessions time out, we always end up using the default queue to launch queries. The load balancing doesn't work in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
[ https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-7049: Status: Patch Available (was: Open) Unable to deserialize AVRO data when file schema and record schema are different and nullable - Key: HIVE-7049 URL: https://issues.apache.org/jira/browse/HIVE-7049 Project: Hive Issue Type: Bug Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-7049.1.patch It mainly happens when 1) file schema and record schema are not the same, and 2) record schema is nullable but file schema is not. The potential code location is in class AvroDeserializer {noformat} if(AvroSerdeUtils.isNullableType(recordSchema)) { return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType); } {noformat} In the above code snippet, recordSchema is checked for nullability, but the file schema is not checked. I tested with these values: {noformat} recordSchema= [null,string] fileSchema= string {noformat} And I got the following exception (line numbers might not be the same due to my debugged code version): {noformat} org.apache.avro.AvroRuntimeException: Not a union: string at org.apache.avro.Schema.getTypes(Schema.java:272) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 21353: Unable to deserialize AVRO data when file schema and record schema are different and nullable
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21353/ --- Review request for hive, Ashutosh Chauhan and Xuefu Zhang. Bugs: HIVE-7049 https://issues.apache.org/jira/browse/HIVE-7049 Repository: hive-git Description --- See https://issues.apache.org/jira/browse/HIVE-7049 Diffs - serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java ce933ff serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java 198bd24 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java 76c1940 Diff: https://reviews.apache.org/r/21353/diff/ Testing --- Thanks, Mohammad Islam
Re: Review Request 18492: HIVE-6473: Allow writing HFiles via HBaseStorageHandler table
On March 7, 2014, 9:01 p.m., Swarnim Kulkarni wrote: hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java, line 351 https://reviews.apache.org/r/18492/diff/1/?file=503857#file503857line351 Do we have bugs logged for this or it would be covered in future revisions on the same patch? My intention is to address this in a future patch. Nothing filed as of yet. Let me refresh my memory here and log them. Do you want this patch to refer to those issue numbers? - nick --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18492/#review36574 --- On Feb. 26, 2014, 12:07 a.m., nick dimiduk wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18492/ --- (Updated Feb. 26, 2014, 12:07 a.m.) Review request for hive. Bugs: HIVE-6473 https://issues.apache.org/jira/browse/HIVE-6473 Repository: hive-git Description --- From the JIRA: Generating HFiles for bulkload into HBase could be more convenient. Right now we require the user to register a new table with the appropriate output format. This patch allows the exact same functionality, but through an existing table managed by the HBaseStorageHandler. Diffs - hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 8cd594b hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java 6d383b5 hbase-handler/src/test/queries/negative/generatehfiles_require_family_path.q PRE-CREATION hbase-handler/src/test/queries/positive/hbase_bulk.m f8bb47d hbase-handler/src/test/queries/positive/hbase_bulk.q PRE-CREATION hbase-handler/src/test/queries/positive/hbase_handler_bulk.q PRE-CREATION hbase-handler/src/test/results/negative/generatehfiles_require_family_path.q.out PRE-CREATION hbase-handler/src/test/results/positive/hbase_handler_bulk.q.out PRE-CREATION Diff: https://reviews.apache.org/r/18492/diff/ Testing --- Thanks, nick dimiduk
[jira] [Resolved] (HIVE-5803) Support CTAS from a non-avro table to an avro table
[ https://issues.apache.org/jira/browse/HIVE-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam resolved HIVE-5803. - Resolution: Won't Fix Support CTAS from a non-avro table to an avro table --- Key: HIVE-5803 URL: https://issues.apache.org/jira/browse/HIVE-5803 Project: Hive Issue Type: Task Reporter: Mohammad Kamrul Islam Assignee: Carl Steinbach Hive currently does not work with HQL like: CREATE TABLE AVRO-BASE-TABLE as SELECT * from NON_AVRO_TABLE; Actually, it works successfully. But when I run SELECT * from AVRO-BASED-TABLE, it fails. This JIRA depends on HIVE-3159 that translates TypeInfo to Avro schema. Findings so far: CTAS uses internal column names (in place of using the column names provided in select) when creating the AVRO data file. In other words, the avro data file has column names of the form _col0, _col1, whereas the table column names are different. I tested with the following test cases and it failed: - verify 1) can create table using create table as select from non-avro table 2) LOAD avro data into new table and read data from the new table CREATE TABLE simple_kv_txt (key STRING, value STRING) STORED AS TEXTFILE; DESCRIBE simple_kv_txt; LOAD DATA LOCAL INPATH '../data/files/kv1.txt' INTO TABLE simple_kv_txt; SELECT * FROM simple_kv_txt ORDER BY KEY; CREATE TABLE copy_doctors ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' as SELECT key as key, value as value FROM simple_kv_txt; DESCRIBE copy_doctors; SELECT * FROM copy_doctors; -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
[ https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995896#comment-13995896 ] Mohammad Kamrul Islam commented on HIVE-7049: - RB at: https://reviews.apache.org/r/21353/ Unable to deserialize AVRO data when file schema and record schema are different and nullable - Key: HIVE-7049 URL: https://issues.apache.org/jira/browse/HIVE-7049 Project: Hive Issue Type: Bug Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-7049.1.patch It mainly happens when 1) file schema and record schema are not the same, and 2) record schema is nullable but file schema is not. The potential code location is in class AvroDeserializer {noformat} if(AvroSerdeUtils.isNullableType(recordSchema)) { return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType); } {noformat} In the above code snippet, recordSchema is checked for nullability, but the file schema is not checked. I tested with these values: {noformat} recordSchema= [null,string] fileSchema= string {noformat} And I got the following exception (line numbers might not be the same due to my debugged code version): {noformat} org.apache.avro.AvroRuntimeException: Not a union: string at org.apache.avro.Schema.getTypes(Schema.java:272) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6187) Add test to verify that DESCRIBE TABLE works with quoted table names
[ https://issues.apache.org/jira/browse/HIVE-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995197#comment-13995197 ] Ashutosh Chauhan commented on HIVE-6187: +1 Add test to verify that DESCRIBE TABLE works with quoted table names Key: HIVE-6187 URL: https://issues.apache.org/jira/browse/HIVE-6187 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Andy Mok Attachments: HIVE-6187.1.patch Backticks around tables named after special keywords, such as items, allow us to create, drop, and alter the table. For example {code:sql} CREATE TABLE foo.`items` (bar INT); DROP TABLE foo.`items`; ALTER TABLE `items` RENAME TO `items_`; {code} However, we cannot call {code:sql} DESCRIBE foo.`items`; DESCRIBE `items`; {code} The DESCRIBE query does not permit backticks to surround table names. The error returned is {code:sql} FAILED: SemanticException [Error 10001]: Table not found `items` {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5831) filter input files for bucketed tables
[ https://issues.apache.org/jira/browse/HIVE-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995905#comment-13995905 ] Hive QA commented on HIVE-5831: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12614033/hive-5831.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/180/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/180/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-180/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . 
Reverted 'data/files/parquet_create.txt' Reverted 'ql/src/test/results/clientpositive/parquet_create.q.out' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java' Reverted 'ql/src/test/queries/clientpositive/parquet_create.q' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetHiveSerDe.java + svn update Uql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java Uql/src/java/org/apache/hadoop/hive/ql/Driver.java Fetching external item into 'hcatalog/src/test/e2e/harness' Updated external to revision 1594111. Updated to revision 1594111. 
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12614033 filter input files for bucketed tables -- Key: HIVE-5831 URL: https://issues.apache.org/jira/browse/HIVE-5831 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Rui Li Attachments: hive-5831.patch When the users query a bucketed table and use the bucketed column in the predicate, only the buckets that satisfy the predicate need to be scanned, thus improving the performance. Given a table test: CREATE TABLE test (x INT, y STRING) CLUSTERED BY ( x ) INTO 10 BUCKETS; The following query only has to scan bucket 5: SELECT * FROM test WHERE x=5; -- This message was sent by Atlassian JIRA (v6.2#6252)
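The HIVE-5831 description hinges on how Hive assigns rows to buckets: bucket number = hash(clustering column) mod numBuckets, so an equality predicate on that column pins matching rows to exactly one bucket file. For Hive INT columns the default hash is the value itself, which is why `WHERE x=5` on a 10-bucket table touches only bucket 5. A hedged sketch of that arithmetic (not Hive's actual `ObjectInspectorUtils` code):

```java
// Bucket assignment as described in HIVE-5831: non-negative hash of the
// clustering column, modulo the bucket count. For int values the hash is
// assumed to be the value itself (as it is for Hive INTs).
public class BucketPruning {
    static int bucketFor(int x, int numBuckets) {
        return (x & Integer.MAX_VALUE) % numBuckets; // clear sign bit, then mod
    }

    public static void main(String[] args) {
        // CREATE TABLE test (x INT, y STRING) CLUSTERED BY (x) INTO 10 BUCKETS;
        // SELECT * FROM test WHERE x=5;  -- only one bucket file qualifies
        System.out.println(bucketFor(5, 10)); // prints 5
    }
}
```

The proposed improvement is exactly this computation applied at split-generation time: evaluate `bucketFor` on the predicate constant and skip the other nine bucket files.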
[jira] [Commented] (HIVE-6901) Explain plan doesn't show operator tree for the fetch operator
[ https://issues.apache.org/jira/browse/HIVE-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995639#comment-13995639 ] Ashutosh Chauhan commented on HIVE-6901: +1 Explain plan doesn't show operator tree for the fetch operator -- Key: HIVE-6901 URL: https://issues.apache.org/jira/browse/HIVE-6901 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Priority: Minor Attachments: HIVE-6109.10.patch, HIVE-6901.1.patch, HIVE-6901.2.patch, HIVE-6901.3.patch, HIVE-6901.4.patch, HIVE-6901.5.patch, HIVE-6901.6.patch, HIVE-6901.7.patch, HIVE-6901.8.patch, HIVE-6901.9.patch, HIVE-6901.patch Explaining a simple select query that involves a MR phase doesn't show the processor tree for the fetch operator. {code} hive> explain select d from test; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: ... Stage: Stage-0 Fetch Operator limit: -1 {code} It would be nice if the operator tree is shown even if there is only one node. Please note that in local execution, the operator tree is complete: {code} hive> explain select * from test; OK STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: test Statistics: Num rows: 8 Data size: 34 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: d (type: int) outputColumnNames: _col0 Statistics: Num rows: 8 Data size: 34 Basic stats: COMPLETE Column stats: NONE ListSink {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6986) MatchPath fails with small resultExprString
[ https://issues.apache.org/jira/browse/HIVE-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995594#comment-13995594 ] Ashutosh Chauhan commented on HIVE-6986: Thanks [~fpin] for the patch. I think a better fix might be to just do r.startsWith("select") Can you try that? MatchPath fails with small resultExprString --- Key: HIVE-6986 URL: https://issues.apache.org/jira/browse/HIVE-6986 Project: Hive Issue Type: Bug Components: UDF Reporter: Furcy Pin Priority: Trivial Attachments: HIVE-6986.1.patch When using MatchPath, a query like this: select year from matchpath(on flights_tiny sort by fl_num, year, month, day_of_month arg1('LATE.LATE+'), arg2('LATE'), arg3(arr_delay > 15), arg4('year') ) ; will fail with error message FAILED: StringIndexOutOfBoundsException String index out of range: 6 -- This message was sent by Atlassian JIRA (v6.2#6252)
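The "String index out of range: 6" error is the classic failure mode of testing a prefix via `substring`: taking the first six characters of a result expression shorter than "select" (such as "year") throws, while `startsWith` simply returns false. An illustrative comparison of the two approaches (not the actual MatchPath source):

```java
// Contrast of the failing pattern with the fix suggested in the comment.
public class PrefixCheck {
    // Fails for any expr shorter than "select": substring(0, 6) throws
    // StringIndexOutOfBoundsException, e.g. for expr = "year" (length 4).
    static boolean isSelectUnsafe(String expr) {
        return expr.substring(0, 6).equalsIgnoreCase("select");
    }

    // startsWith never throws, regardless of the input length.
    static boolean isSelectSafe(String expr) {
        return expr.toLowerCase().startsWith("select");
    }
}
```

This is why the suggested one-line change to `startsWith` resolves the bug for small resultExprString values.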
[jira] [Reopened] (HIVE-5342) Remove pre hadoop-0.20.0 related codes
[ https://issues.apache.org/jira/browse/HIVE-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere reopened HIVE-5342: -- Assignee: Jason Dere (was: Navis) Found some miscellaneous references to hadoop 0.17 workarounds in the code, will look into trying to remove some of those. Remove pre hadoop-0.20.0 related codes -- Key: HIVE-5342 URL: https://issues.apache.org/jira/browse/HIVE-5342 Project: Hive Issue Type: Task Reporter: Navis Assignee: Jason Dere Priority: Trivial Attachments: D13047.1.patch, HIVE-5342.1.patch Recently, we discussed not supporting hadoop-0.20.0. If it would be done like that or not, 0.17 related codes would be removed before that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6440) sql std auth - add command to change owner of database
[ https://issues.apache.org/jira/browse/HIVE-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995820#comment-13995820 ] Lefty Leverenz commented on HIVE-6440: -- Thanks for fixing my errors, Thejas. (It's good to know you've got my back.) sql std auth - add command to change owner of database -- Key: HIVE-6440 URL: https://issues.apache.org/jira/browse/HIVE-6440 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.13.0 Attachments: HIVE-6440.1.patch, HIVE-6440.2.patch, HIVE-6440.3.patch It should be possible to change the owner of a database once it is created. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: showing column stats
I have a basic patch which prints table level column stats. I can put up the patch for it today/tomorrow. But for displaying partition level column stats we need to extend the "describe" statement to support column names: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DescribePartition. As you can see from that DDL, describe partition does not accept column names. I can create JIRAs for the following tasks: 1) Showing column stats in describe table 2) Showing column stats in describe partition If you would like to take up 2) please feel free to do so. Thanks Prasanth Jayachandran On May 12, 2014, at 5:45 PM, Xuefu Zhang xzh...@cloudera.com wrote: Hi all, I'm wondering if there is a simpler way to show column stats than writing a thrift client calling the thrift API, such as commands in Hive CLI. I have tried desc extended as well as explain select, but none of them shows column stats. Thanks, Xuefu
[jira] [Created] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
Prasanth J created HIVE-7050: Summary: Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE Key: HIVE-7050 URL: https://issues.apache.org/jira/browse/HIVE-7050 Project: Hive Issue Type: Bug Reporter: Prasanth J There is currently no way to display column-level stats from the Hive CLI. It would be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
[ https://issues.apache.org/jira/browse/HIVE-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J reassigned HIVE-7050: Assignee: Prasanth J Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE - Key: HIVE-7050 URL: https://issues.apache.org/jira/browse/HIVE-7050 Project: Hive Issue Type: Bug Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7050.1.patch There is currently no way to display column-level stats from the Hive CLI. It would be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7052) Optimize split calculation time
Rajesh Balamohan created HIVE-7052: -- Summary: Optimize split calculation time Key: HIVE-7052 URL: https://issues.apache.org/jira/browse/HIVE-7052 Project: Hive Issue Type: Bug Environment: hive + tez Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan When running a TPC-DS query (query_27), a significant amount of time was spent in split computation on a dataset of size 200 GB (ORC format). Profiling revealed that: 1. A lot of time was spent in Config's substituteVars (regex) in the HiveInputFormat.getSplits() method. 2. A FileSystem was created repeatedly in OrcInputFormat.generateSplitsInfo(). I will attach the profiler snapshots soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
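The second finding in HIVE-7052 — a FileSystem handle created repeatedly during split generation — boils down to caching an expensive-to-construct object per URI. A minimal Python sketch of that idea (the `FileSystem` class below is a hypothetical stand-in, not Hadoop's actual `org.apache.hadoop.fs.FileSystem`):

```python
from functools import lru_cache

# Hypothetical stand-in for a Hadoop FileSystem handle; the real fix would
# reuse FileSystem instances inside OrcInputFormat.generateSplitsInfo()
# instead of constructing one per split.
class FileSystem:
    instances_created = 0

    def __init__(self, uri):
        FileSystem.instances_created += 1
        self.uri = uri

@lru_cache(maxsize=None)
def get_filesystem(uri):
    # Cache by URI so repeated split calculations share one handle.
    return FileSystem(uri)

# Simulate computing splits for many files on the same filesystem.
for _ in range(1000):
    fs = get_filesystem("hdfs://namenode:8020")

print(FileSystem.instances_created)  # 1
```

Even with a cheap constructor the cache avoids 999 redundant allocations here; with a real FileSystem (which may open connections and parse config) the savings during split computation are correspondingly larger.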
[jira] [Updated] (HIVE-7042) Fix stats_partscan_1_23.q and orc_createas1.q for hadoop-2
[ https://issues.apache.org/jira/browse/HIVE-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7042: --- Status: Open (was: Patch Available) Fix stats_partscan_1_23.q and orc_createas1.q for hadoop-2 -- Key: HIVE-7042 URL: https://issues.apache.org/jira/browse/HIVE-7042 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7042.1.patch stats_partscan_1_23.q and orc_createas1.q should use HiveInputFormat as opposed to CombineHiveInputFormat. RCFile uses DefaultCodec for compression (uses DEFLATE) which is not splittable. Hence using CombineHiveIF will yield different results for these tests. ORC should use HiveIF to generate ORC splits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5733) Publish hive-exec artifact without all the dependencies
[ https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996000#comment-13996000 ] Patrick Wendell commented on HIVE-5733: --- Hey, just wanted to add a +1 and say that the current approach makes depending on Hive difficult or impossible for certain Hadoop versions due to conflicts with the protobuf library. Publish hive-exec artifact without all the dependencies --- Key: HIVE-5733 URL: https://issues.apache.org/jira/browse/HIVE-5733 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Jarek Jarcec Cecho Currently the artifact {{hive-exec}} that is available in [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar] shades all of its dependencies (= the jar contains all of Hive's dependencies). As other projects that depend on Hive might use slightly different versions of those dependencies, it can easily happen that Hive's shaded version is used instead, which leads to very time-consuming debugging of what is happening (for example SQOOP-1198). Would it be feasible to publish a {{hive-exec}} jar that is built without shading any dependencies? For example, [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar] has a nodeps classifier that represents the artifact without any dependencies. -- This message was sent by Atlassian JIRA (v6.2#6252)
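One common way Maven projects achieve what HIVE-5733 asks for — a sketch, not Hive's actual build configuration — is the maven-shade-plugin's `shadedArtifactAttached` option, which keeps the plain (unshaded) jar as the main artifact and attaches the shaded jar under a classifier, so downstream consumers get unshaded dependencies by default:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <!-- Keep the plain jar as the main artifact and attach the
         shaded (dependency-bundling) jar under a classifier. -->
    <shadedArtifactAttached>true</shadedArtifactAttached>
    <shadedClassifierName>shaded</shadedClassifierName>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
    </execution>
  </executions>
</plugin>
```

With this layout, a project that needs the self-contained jar can still depend on it explicitly with `<classifier>shaded</classifier>`, mirroring the avro-tools nodeps approach mentioned in the issue (inverted: there the nodeps jar is the classified one).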
[jira] [Updated] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
[ https://issues.apache.org/jira/browse/HIVE-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7050: - Attachment: HIVE-7050.1.patch Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE - Key: HIVE-7050 URL: https://issues.apache.org/jira/browse/HIVE-7050 Project: Hive Issue Type: Bug Reporter: Prasanth J Attachments: HIVE-7050.1.patch There is currently no way to display column-level stats from the Hive CLI. It would be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6440) sql std auth - add command to change owner of database
[ https://issues.apache.org/jira/browse/HIVE-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995424#comment-13995424 ] Thejas M Nair commented on HIVE-6440: - The alter-database set properties command also does not support the SCHEMA keyword (the parenthetical note also goes away without the optional keyword). Though I added this command as part of sql std auth, it works even without sql std auth enabled. I have made the edits in the wiki. Thanks for bringing it up! sql std auth - add command to change owner of database -- Key: HIVE-6440 URL: https://issues.apache.org/jira/browse/HIVE-6440 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.13.0 Attachments: HIVE-6440.1.patch, HIVE-6440.2.patch, HIVE-6440.3.patch It should be possible to change the owner of a database once it is created. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-6974) Make Metastore Version Check work with Custom version suffixes
[ https://issues.apache.org/jira/browse/HIVE-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach resolved HIVE-6974. -- Resolution: Duplicate Make Metastore Version Check work with Custom version suffixes -- Key: HIVE-6974 URL: https://issues.apache.org/jira/browse/HIVE-6974 Project: Hive Issue Type: Bug Components: Metastore Reporter: Carl Steinbach HIVE-3764 added support for doing a version consistency check between the Hive JARs on the classpath and the metastore schema in the backend database. This is a nice feature, but it currently doesn't work well for folks who append their own suffixes to the release version, e.g. 0.12.0.li_20. We can fix this problem by modifying MetaStoreSchemaInfo.getHiveSchemaVersion() to match against ^\d+\.\d+\.\d+ and ignore anything that remains. -- This message was sent by Atlassian JIRA (v6.2#6252)
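The matching rule proposed in HIVE-6974 is easy to sketch. This is a Python illustration of the regex behavior, not the actual Java code in MetaStoreSchemaInfo (the function name here is made up for the example):

```python
import re

def get_canonical_version(version):
    r"""Strip custom suffixes like '0.12.0.li_20' down to '0.12.0'.

    Sketch of the rule proposed for
    MetaStoreSchemaInfo.getHiveSchemaVersion(): match ^\d+\.\d+\.\d+
    and ignore anything that remains.
    """
    m = re.match(r"^(\d+\.\d+\.\d+)", version)
    return m.group(1) if m else version

print(get_canonical_version("0.12.0.li_20"))  # 0.12.0
print(get_canonical_version("0.13.1"))        # 0.13.1
```

Comparing canonicalized versions lets the JAR/schema consistency check pass for vendor-suffixed builds while still catching genuine major/minor/patch mismatches.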
[jira] [Commented] (HIVE-6994) parquet-hive createArray strips null elements
[ https://issues.apache.org/jira/browse/HIVE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995901#comment-13995901 ] Hive QA commented on HIVE-6994: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12644415/HIVE-6994.2.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 5446 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_parquet
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testListPartitions
org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testNameMethods
org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testPartition
org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testListPartitions
org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testNameMethods
org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testPartition
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHadoopVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHiveVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getPigVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getStatus
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.invalidPath
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/179/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/179/console Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12644415 parquet-hive createArray strips null elements - Key: HIVE-6994 URL: https://issues.apache.org/jira/browse/HIVE-6994 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.14.0 Reporter: Justin Coffey Assignee: Justin Coffey Fix For: 0.14.0 Attachments: HIVE-6994-1.patch, HIVE-6994.2.patch, HIVE-6994.patch The createArray method in ParquetHiveSerDe strips null values from resultant ArrayWritables. tracked here as well: https://github.com/Parquet/parquet-mr/issues/377 -- This message was sent by Atlassian JIRA (v6.2#6252)
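The HIVE-6994 bug is simple to illustrate. A Python sketch (not the actual Java ParquetHiveSerDe code) of the difference between the buggy and fixed behavior of createArray — the bug dropped null elements, silently changing both the values and the length of the array:

```python
def create_array_buggy(elements):
    # Strips nulls: readers see [1, 3] instead of the stored data.
    return [e for e in elements if e is not None]

def create_array_fixed(elements):
    # Preserves nulls so element positions and array length survive
    # the round trip through the SerDe.
    return list(elements)

stored = [1, None, 3]
print(create_array_buggy(stored))  # [1, 3]
print(create_array_fixed(stored))  # [1, None, 3]
```

Dropping nulls is especially dangerous for arrays because consumers often rely on positional alignment with other columns or array fields.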
[jira] [Updated] (HIVE-7033) grant statements should check if the role exists
[ https://issues.apache.org/jira/browse/HIVE-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7033: Attachment: HIVE-7033.4.patch HIVE-7033.4.patch - q.out files didn't have the comment update. grant statements should check if the role exists Key: HIVE-7033 URL: https://issues.apache.org/jira/browse/HIVE-7033 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7033.1.patch, HIVE-7033.2.patch, HIVE-7033.3.patch, HIVE-7033.4.patch The following grant statement that grants to a role that does not exist succeeds, but it should result in an error. grant all on t1 to role nosuchrole; -- This message was sent by Atlassian JIRA (v6.2#6252)
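The validation HIVE-7033 adds amounts to a fail-fast existence check before the grant is recorded. A minimal Python sketch of that check (hypothetical names; not Hive's actual authorization code):

```python
class HiveAuthzError(Exception):
    """Stand-in for the error a grant statement should raise."""

def grant(privilege, obj, role, existing_roles):
    # Fail fast if the target role does not exist, instead of
    # silently recording a grant to a nonexistent role.
    if role not in existing_roles:
        raise HiveAuthzError("Role %s does not exist" % role)
    return (privilege, obj, role)

roles = {"admin", "public", "analyst"}
grant("ALL", "t1", "analyst", roles)         # succeeds
try:
    grant("ALL", "t1", "nosuchrole", roles)  # raises instead of succeeding
except HiveAuthzError as e:
    print(e)
```

Without the check, a typo in the role name produces a grant nobody can ever use, which is exactly the silent success the issue describes for `grant all on t1 to role nosuchrole;`.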
[jira] [Updated] (HIVE-6549) remove templeton.jar from webhcat-default.xml, remove hcatalog/bin/hive-config.sh
[ https://issues.apache.org/jira/browse/HIVE-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-6549: Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks for the contribution Eugene! remove templeton.jar from webhcat-default.xml, remove hcatalog/bin/hive-config.sh - Key: HIVE-6549 URL: https://issues.apache.org/jira/browse/HIVE-6549 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Priority: Minor Fix For: 0.14.0 Attachments: HIVE-6549.2.patch, HIVE-6549.patch This property is no longer used; the corresponding AppConfig.TEMPLETON_JAR_NAME is also removed. hcatalog/bin/hive-config.sh is not used. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.2#6252)
[VOTE] Apache Hive 0.13.1 Release Candidate 1
Apache Hive 0.13.1 Release Candidate 1 is available here: http://people.apache.org/~khorgath/releases/0.13.1_RC1/artifacts/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1013/ Source tag for RC1 is at : https://svn.apache.org/viewvc/hive/tags/release-0.13.1-rc1/ Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks. -Sushanth
[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996014#comment-13996014 ] Gunther Hagleitner commented on HIVE-6430: -- +1 looks good! MapJoin hash table has large memory overhead Key: HIVE-6430 URL: https://issues.apache.org/jira/browse/HIVE-6430 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.11.patch, HIVE-6430.12.patch, HIVE-6430.12.patch, HIVE-6430.13.patch, HIVE-6430.patch Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 for row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other jiras, but in general we don't need to have java hash table there. We can either use primitive-friendly hashtable like the one from HPPC (Apache-licenced), or some variation, to map primitive keys to single row storage structure without an object per row (similar to vectorization). -- This message was sent by Atlassian JIRA (v6.2#6252)
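The idea in the HIVE-6430 description — replace an object-per-row Java hash table with a primitive-friendly structure (like HPPC's) that maps primitive keys into flat storage — can be sketched in Python. This is an illustration of the layout, not Hive's actual MapJoin container; a real implementation would also open-address the index instead of using a dict:

```python
import array

class FlatIntMap:
    """Maps an int key pair to an int value pair using flat primitive arrays,
    avoiding a per-row object for each entry."""

    def __init__(self):
        self._keys = array.array("q")  # packed (k1, k2) per entry
        self._vals = array.array("q")  # packed (v1, v2) per entry
        self._index = {}               # key -> slot; a real impl open-addresses

    def put(self, k1, k2, v1, v2):
        self._index[(k1, k2)] = len(self._keys) // 2
        self._keys.extend((k1, k2))
        self._vals.extend((v1, v2))

    def get(self, k1, k2):
        slot = self._index[(k1, k2)]
        return (self._vals[2 * slot], self._vals[2 * slot + 1])

m = FlatIntMap()
m.put(1, 2, 10, 20)
print(m.get(1, 2))  # (10, 20)
```

The point of the flat layout is that each entry costs a fixed number of machine words in a contiguous array rather than several boxed objects with headers and pointers, which is where the "several hundred bytes for 4 ints" overhead comes from.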
[jira] [Commented] (HIVE-7033) grant statements should check if the role exists
[ https://issues.apache.org/jira/browse/HIVE-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995616#comment-13995616 ] Ashutosh Chauhan commented on HIVE-7033: +1 grant statements should check if the role exists Key: HIVE-7033 URL: https://issues.apache.org/jira/browse/HIVE-7033 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7033.1.patch, HIVE-7033.2.patch, HIVE-7033.3.patch, HIVE-7033.4.patch The following grant statement that grants to a role that does not exist succeeds, but it should result in an error. grant all on t1 to role nosuchrole; -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
[ https://issues.apache.org/jira/browse/HIVE-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7050: - Status: Patch Available (was: Open) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE - Key: HIVE-7050 URL: https://issues.apache.org/jira/browse/HIVE-7050 Project: Hive Issue Type: Bug Reporter: Prasanth J Attachments: HIVE-7050.1.patch There is currently no way to display column-level stats from the Hive CLI. It would be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7052) Optimize split calculation time
[ https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-7052: --- Attachment: HIVE-7052-v3.patch Optimize split calculation time --- Key: HIVE-7052 URL: https://issues.apache.org/jira/browse/HIVE-7052 Project: Hive Issue Type: Bug Environment: hive + tez Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Labels: performance Attachments: HIVE-7052-profiler-1.png, HIVE-7052-profiler-2.png, HIVE-7052-v3.patch When running a TPC-DS query (query_27), a significant amount of time was spent in split computation on a dataset of size 200 GB (ORC format). Profiling revealed that: 1. A lot of time was spent in Config's substituteVars (regex) in the HiveInputFormat.getSplits() method. 2. A FileSystem was created repeatedly in OrcInputFormat.generateSplitsInfo(). I will attach the profiler snapshots soon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7053) Unable to fetch column stats from decimal columns
Xuefu Zhang created HIVE-7053: - Summary: Unable to fetch column stats from decimal columns Key: HIVE-7053 URL: https://issues.apache.org/jira/browse/HIVE-7053 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.0 Reporter: Xuefu Zhang After HIVE-6701, column stats for decimal columns can be computed. However, when the stats are fetched, nothing is returned. The problem was originally reproducible using the thrift API. With the patch in HIVE-7050, the problem can also be reproduced using desc formatted table column.
{code}
hive> desc formatted dec i;
OK
# col_name  data_type     min  max  num_nulls  distinct_count  avg_col_len  max_col_len  num_trues  num_falses  comment
i           int           0    4    0          5               null         null         null       null        from deserializer
hive> desc formatted dec d;
OK
# col_name  data_type     min  max  num_nulls  distinct_count  avg_col_len  max_col_len  num_trues  num_falses  comment
d           decimal(5,2)                                                                                        from deserializer
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7037) Add additional tests for transform clauses with Tez
[ https://issues.apache.org/jira/browse/HIVE-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7037: - Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks Vikram for the review! Add additional tests for transform clauses with Tez --- Key: HIVE-7037 URL: https://issues.apache.org/jira/browse/HIVE-7037 Project: Hive Issue Type: Bug Components: Tez Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.14.0 Attachments: HIVE-7037.1.patch Enabling some q tests for Tez with respect to ScriptOperator/Stream/Transform. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7000) Several issues with javadoc generation
[ https://issues.apache.org/jira/browse/HIVE-7000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995433#comment-13995433 ] Ashutosh Chauhan commented on HIVE-7000: +1 Several issues with javadoc generation -- Key: HIVE-7000 URL: https://issues.apache.org/jira/browse/HIVE-7000 Project: Hive Issue Type: Improvement Reporter: Harish Butani Attachments: HIVE-7000.1.patch 1. Ran 'mvn javadoc:javadoc -Phadoop-2'. Encountered several issues: - Generated classes are included in the javadoc - Generation fails in the top-level hcatalog folder because its src folder contains no java files. Patch attached to fix these issues. 2. Tried 'mvn javadoc:aggregate -Phadoop-2' - Cannot get an aggregated javadoc for all of hive - Tried setting the 'aggregate' parameter to true. Didn't work. There are several questions on StackOverflow about multi-project javadoc. Seems like this is broken. -- This message was sent by Atlassian JIRA (v6.2#6252)