[jira] [Commented] (HIVE-6806) CREATE TABLE should support STORED AS AVRO
[ https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092487#comment-14092487 ] Lefty Leverenz commented on HIVE-6806:
--------------------------------------
Thanks Ashish, your doc changes look good. I'm just making a few minor edits. This sentence in the Avro SerDe doc is out of date: "The AvroSerde has been built and tested against Hive 0.9.1 and Avro 1.5."
# Can I change it to "tested against Hive 0.9.1 and later"?
# What Avro versions have been tested? (Their latest is 1.7.7: http://avro.apache.org/releases.html.)

CREATE TABLE should support STORED AS AVRO
------------------------------------------
Key: HIVE-6806
URL: https://issues.apache.org/jira/browse/HIVE-6806
Project: Hive
Issue Type: New Feature
Components: Serializers/Deserializers
Affects Versions: 0.12.0
Reporter: Jeremy Beard
Assignee: Ashish Kumar Singh
Priority: Minor
Labels: Avro, TODOC14
Fix For: 0.14.0
Attachments: HIVE-6806.1.patch, HIVE-6806.2.patch, HIVE-6806.3.patch, HIVE-6806.patch

Avro is well established and widely used within Hive, but creating Avro-backed tables currently requires messily listing the SerDe, InputFormat, and OutputFormat classes. As with HIVE-5783 for Parquet, Hive would be easier to use if it had native Avro support.
--
This message was sent by Atlassian JIRA (v6.2#6252)
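For reference, here is a sketch of the contrast this issue asks for: the verbose pre-HIVE-6806 DDL versus the proposed shorthand. The table name, columns, and schema URL below are illustrative assumptions; the SerDe, InputFormat, and OutputFormat class names are the standard Avro classes shipped with Hive.
{code:sql}
-- Without native support: every Avro table must spell out all three classes.
CREATE TABLE episodes_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/episodes.avsc');

-- With the feature from this issue (Hive 0.14.0+): same storage, one clause,
-- and the Avro schema is derived from the declared columns.
CREATE TABLE episodes_avro (title STRING, air_date STRING, doctor INT)
STORED AS AVRO;
{code}
The shorthand also removes a class of copy-paste errors, since mismatched SerDe/InputFormat/OutputFormat combinations can no longer be declared.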
[jira] [Assigned] (HIVE-7669) parallel order by clause on a string column fails with IOException: Split points are out of order
[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis reassigned HIVE-7669:
---------------------------
Assignee: Navis

parallel order by clause on a string column fails with IOException: Split points are out of order
-------------------------------------------------------------------------------------------------
Key: HIVE-7669
URL: https://issues.apache.org/jira/browse/HIVE-7669
Project: Hive
Issue Type: Bug
Components: HiveServer2, Query Processor, SQL
Affects Versions: 0.12.0
Environment: Hive 0.12.0-cdh5.0.0, OS: Red Hat Linux
Reporter: Vishal Kamath
Assignee: Navis
Labels: orderby
Attachments: HIVE-7669.1.patch.txt

The source table has 600 million rows and a string column l_shipinstruct with 4 unique values (i.e., these 4 values are repeated across all 600 million rows). We sort on this string column l_shipinstruct, as shown in the HiveQL below, with the following parameters.
{code:sql}
set hive.optimize.sampling.orderby=true;
set hive.optimize.sampling.orderby.number=1000;
set hive.optimize.sampling.orderby.percent=0.1f;

insert overwrite table lineitem_temp_report
select
  l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice,
  l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate,
  l_receiptdate, l_shipinstruct, l_shipmode, l_comment
from lineitem
order by l_shipinstruct;
{code}
Stack trace (diagnostic messages for this task):
{noformat}
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 10 more
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
    at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42)
    at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37)
    ... 15 more
Caused by: java.io.IOException: Split points are out of order
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96)
    ... 17 more
{noformat}
--
This message was sent by Atlassian JIRA (v6.2#6252)
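A note on the failure mode, plus a hedged workaround sketch (an inference from the stack trace, not something proposed in this issue): with only 4 distinct values of l_shipinstruct, the 1000 sampled keys cannot yield distinct split points, and TotalOrderPartitioner rejects a partition file whose split points are not strictly increasing. Until a patch lands, a sort on such a low-cardinality key could be expressed without the sampler:
{code:sql}
-- Hypothetical workaround 1: disable the sampled (parallel) order-by and
-- fall back to the single-reducer total order.
set hive.optimize.sampling.orderby=false;

-- Hypothetical workaround 2: accept a per-reducer order instead of a total
-- order. Each reducer receives all rows for its key(s) and sorts locally,
-- which stays parallel and never builds a partition file.
insert overwrite table lineitem_temp_report
select
  l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice,
  l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate,
  l_receiptdate, l_shipinstruct, l_shipmode, l_comment
from lineitem
distribute by l_shipinstruct
sort by l_shipinstruct;
{code}
With only 4 distinct keys, distribute by can keep at most 4 reducers busy, and the output is ordered within each reducer's files rather than globally; whether that is acceptable depends on the downstream consumer.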
[jira] [Updated] (HIVE-7669) parallel order by clause on a string column fails with IOException: Split points are out of order
[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7669:
------------------------
Attachment: HIVE-7669.1.patch.txt

parallel order by clause on a string column fails with IOException: Split points are out of order
-------------------------------------------------------------------------------------------------
Key: HIVE-7669
URL: https://issues.apache.org/jira/browse/HIVE-7669
Project: Hive
Issue Type: Bug
Components: HiveServer2, Query Processor, SQL
Affects Versions: 0.12.0
Environment: Hive 0.12.0-cdh5.0.0, OS: Red Hat Linux
Reporter: Vishal Kamath
Labels: orderby
Attachments: HIVE-7669.1.patch.txt

The source table has 600 million rows and a string column l_shipinstruct with 4 unique values (i.e., these 4 values are repeated across all 600 million rows). We sort on this string column l_shipinstruct, as shown in the HiveQL below, with the following parameters.
{code:sql}
set hive.optimize.sampling.orderby=true;
set hive.optimize.sampling.orderby.number=1000;
set hive.optimize.sampling.orderby.percent=0.1f;

insert overwrite table lineitem_temp_report
select
  l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice,
  l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate,
  l_receiptdate, l_shipinstruct, l_shipmode, l_comment
from lineitem
order by l_shipinstruct;
{code}
Stack trace (diagnostic messages for this task):
{noformat}
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 10 more
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
    at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42)
    at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37)
    ... 15 more
Caused by: java.io.IOException: Split points are out of order
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96)
    ... 17 more
{noformat}
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7669) parallel order by clause on a string column fails with IOException: Split points are out of order
[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7669:
------------------------
Status: Patch Available (was: Open)

Running preliminary test

parallel order by clause on a string column fails with IOException: Split points are out of order
-------------------------------------------------------------------------------------------------
Key: HIVE-7669
URL: https://issues.apache.org/jira/browse/HIVE-7669
Project: Hive
Issue Type: Bug
Components: HiveServer2, Query Processor, SQL
Affects Versions: 0.12.0
Environment: Hive 0.12.0-cdh5.0.0, OS: Red Hat Linux
Reporter: Vishal Kamath
Assignee: Navis
Labels: orderby
Attachments: HIVE-7669.1.patch.txt

The source table has 600 million rows and a string column l_shipinstruct with 4 unique values (i.e., these 4 values are repeated across all 600 million rows). We sort on this string column l_shipinstruct, as shown in the HiveQL below, with the following parameters.
{code:sql}
set hive.optimize.sampling.orderby=true;
set hive.optimize.sampling.orderby.number=1000;
set hive.optimize.sampling.orderby.percent=0.1f;

insert overwrite table lineitem_temp_report
select
  l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice,
  l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate,
  l_receiptdate, l_shipinstruct, l_shipmode, l_comment
from lineitem
order by l_shipinstruct;
{code}
Stack trace (diagnostic messages for this task):
{noformat}
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 10 more
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
    at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42)
    at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37)
    ... 15 more
Caused by: java.io.IOException: Split points are out of order
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96)
    ... 17 more
{noformat}
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7675) Implement native HiveMapFunction
[ https://issues.apache.org/jira/browse/HIVE-7675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7675:
--------------------------------
Issue Type: Sub-task (was: New Feature)
Parent: HIVE-7292

Implement native HiveMapFunction
--------------------------------
Key: HIVE-7675
URL: https://issues.apache.org/jira/browse/HIVE-7675
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Chengxiang Li

Currently, Hive on Spark depends on ExecMapper to execute operator logic; the full stack looks like: Spark framework => HiveMapFunction => ExecMapper => Hive operators. HiveMapFunction is just a thin wrapper around ExecMapper, which introduces several problems:
# ExecMapper is designed for MR's single-process task mode; it does not work well on Spark's multi-threaded task nodes.
# ExecMapper introduces extra API-level restrictions.
We need to implement a native HiveMapFunction as the bridge between the Spark framework and Hive operators.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7675) Implement native HiveMapFunction
Chengxiang Li created HIVE-7675:
-------------------------------
Summary: Implement native HiveMapFunction
Key: HIVE-7675
URL: https://issues.apache.org/jira/browse/HIVE-7675
Project: Hive
Issue Type: New Feature
Components: Spark
Reporter: Chengxiang Li

Currently, Hive on Spark depends on ExecMapper to execute operator logic; the full stack looks like: Spark framework => HiveMapFunction => ExecMapper => Hive operators. HiveMapFunction is just a thin wrapper around ExecMapper, which introduces several problems:
# ExecMapper is designed for MR's single-process task mode; it does not work well on Spark's multi-threaded task nodes.
# ExecMapper introduces extra API-level restrictions.
We need to implement a native HiveMapFunction as the bridge between the Spark framework and Hive operators.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7676) Support more methods in DatabaseMetaData
Alexander Pivovarov created HIVE-7676:
-------------------------------------
Summary: Support more methods in DatabaseMetaData
Key: HIVE-7676
URL: https://issues.apache.org/jira/browse/HIVE-7676
Project: Hive
Issue Type: Improvement
Components: JDBC
Reporter: Alexander Pivovarov

I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirrel SQL show databases only if supportsSchemasInTableDefinitions returns true. Also, Hive 0.13.1 supports UNION ALL but not UNION; we can indicate this in HiveDatabaseMetaData instead of throwing a "Method not supported" exception. getIdentifierQuoteString should return a space if identifier quoting is not supported: http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7676) Support more methods in DatabaseMetaData
[ https://issues.apache.org/jira/browse/HIVE-7676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7676:
--------------------------------------
Description:
I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirreL SQL shows databases only if supportsSchemasInTableDefinitions returns true. Also, Hive 0.13.1 supports UNION ALL but not UNION; we can indicate this in HiveDatabaseMetaData instead of throwing a "Method not supported" exception. getIdentifierQuoteString should return a space if identifier quoting is not supported: http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29

was:
I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirrel SQL show databases only if supportsSchemasInTableDefinitions returns true. Also, Hive 0.13.1 supports UNION ALL but not UNION; we can indicate this in HiveDatabaseMetaData instead of throwing a "Method not supported" exception. getIdentifierQuoteString should return a space if identifier quoting is not supported: http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29

Support more methods in DatabaseMetaData
----------------------------------------
Key: HIVE-7676
URL: https://issues.apache.org/jira/browse/HIVE-7676
Project: Hive
Issue Type: Improvement
Components: JDBC
Reporter: Alexander Pivovarov

I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirreL SQL shows databases only if supportsSchemasInTableDefinitions returns true. Also, Hive 0.13.1 supports UNION ALL but not UNION; we can indicate this in HiveDatabaseMetaData instead of throwing a "Method not supported" exception. getIdentifierQuoteString should return a space if identifier quoting is not supported: http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7677) Implement native HiveReduceFunction
Chengxiang Li created HIVE-7677:
-------------------------------
Summary: Implement native HiveReduceFunction
Key: HIVE-7677
URL: https://issues.apache.org/jira/browse/HIVE-7677
Project: Hive
Issue Type: New Feature
Components: Spark
Reporter: Chengxiang Li

Similar to HiveMapFunction, we need to implement a native HiveReduceFunction as the bridge between the Spark framework and Hive operators.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7677) Implement native HiveReduceFunction
[ https://issues.apache.org/jira/browse/HIVE-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7677:
--------------------------------
Issue Type: Sub-task (was: New Feature)
Parent: HIVE-7292

Implement native HiveReduceFunction
-----------------------------------
Key: HIVE-7677
URL: https://issues.apache.org/jira/browse/HIVE-7677
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Chengxiang Li

Similar to HiveMapFunction, we need to implement a native HiveReduceFunction as the bridge between the Spark framework and Hive operators.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7643) ExecMapper static states lead to unpredictable query result.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7643:
--------------------------------
Summary: ExecMapper static states lead to unpredictable query result.[Spark Branch] (was: ExecMapper statis states lead to unpredictable query result.[Spark Branch])

ExecMapper static states lead to unpredictable query result.[Spark Branch]
--------------------------------------------------------------------------
Key: HIVE-7643
URL: https://issues.apache.org/jira/browse/HIVE-7643
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li

ExecMapper contains static state (the static variable done, for example). A Spark executor may execute multiple tasks concurrently, so ExecMapper static state updated by one task can influence the logic of another task, which may lead to unpredictable results. To reproduce, execute
{code:sql}
SELECT COUNT(*) FROM TEST TABLESAMPLE(1 ROWS) s
{code}
where TEST is a table whose source data spans several blocks.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7675) Implement native HiveMapFunction
[ https://issues.apache.org/jira/browse/HIVE-7675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7675:
--------------------------------
Description:
Currently, Hive on Spark depends on ExecMapper to execute operator logic; the full stack looks like: Spark framework => HiveMapFunction => ExecMapper => Hive operators. HiveMapFunction is just a thin wrapper around ExecMapper, which introduces several problems:
# ExecMapper is designed for MR's single-process task mode; it does not work well on Spark's multi-threaded task nodes.
# ExecMapper introduces extra API-level restrictions and processing logic.
We need to implement a native HiveMapFunction as the bridge between the Spark framework and Hive operators.

was:
Currently, Hive on Spark depends on ExecMapper to execute operator logic; the full stack looks like: Spark framework => HiveMapFunction => ExecMapper => Hive operators. HiveMapFunction is just a thin wrapper around ExecMapper, which introduces several problems:
# ExecMapper is designed for MR's single-process task mode; it does not work well on Spark's multi-threaded task nodes.
# ExecMapper introduces extra API-level restrictions.
We need to implement a native HiveMapFunction as the bridge between the Spark framework and Hive operators.

Implement native HiveMapFunction
--------------------------------
Key: HIVE-7675
URL: https://issues.apache.org/jira/browse/HIVE-7675
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Chengxiang Li

Currently, Hive on Spark depends on ExecMapper to execute operator logic; the full stack looks like: Spark framework => HiveMapFunction => ExecMapper => Hive operators. HiveMapFunction is just a thin wrapper around ExecMapper, which introduces several problems:
# ExecMapper is designed for MR's single-process task mode; it does not work well on Spark's multi-threaded task nodes.
# ExecMapper introduces extra API-level restrictions and processing logic.
We need to implement a native HiveMapFunction as the bridge between the Spark framework and Hive operators.
--
This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 23674: Handle db qualified names consistently across all HiveQL statements
On Aug. 11, 2014, 3:51 a.m., Lefty Leverenz wrote:
ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 2071
https://reviews.apache.org/r/23674/diff/3/?file=657207#file657207line2071
"which in the form" should be "which is in the form"

fixed.

On Aug. 11, 2014, 3:51 a.m., Lefty Leverenz wrote:
ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 2086
https://reviews.apache.org/r/23674/diff/3/?file=657207#file657207line2086
"which in the form" should be "which is in the form"

fixed.

On Aug. 11, 2014, 3:51 a.m., Lefty Leverenz wrote:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java, lines 103-105
https://reviews.apache.org/r/23674/diff/3/?file=657209#file657209line103
Should "does contain database name" be "does not contain database name"?

You are right. Fixed again. Thanks.

- Navis

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23674/#review50132
---

On Aug. 11, 2014, 12:53 a.m., Navis Ryu wrote:
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23674/
---
(Updated Aug. 11, 2014, 12:53 a.m.)

Review request for hive and Thejas Nair.
Bugs: HIVE-4064
https://issues.apache.org/jira/browse/HIVE-4064
Repository: hive-git

Description
---
Hive doesn't consistently handle db qualified names across all HiveQL statements. While some HiveQL statements such as SELECT support DB qualified names, others such as CREATE INDEX don't.
Diffs - itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveAuthorizerCheckInvocation.java c91b15c itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/CheckColumnAccessHook.java 14fc430 metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java ea866c5 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 6e689d0 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 5a56ced metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 760777a metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 74b1432 ql/src/java/org/apache/hadoop/hive/ql/Driver.java ea6ddbf ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 376e040 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java d22b1f6 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 39b032e ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java 2e32fee ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteGBUsingIndex.java 989d0b5 ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 22945e3 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnAccessInfo.java 939dc65 ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 67a3aa7 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g ab1188a ql/src/java/org/apache/hadoop/hive/ql/parse/IndexUpdater.java 856ec2f ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 7b86414 ql/src/java/org/apache/hadoop/hive/ql/parse/authorization/HiveAuthorizationTaskFactoryImpl.java 826bdf3 ql/src/java/org/apache/hadoop/hive/ql/plan/AlterIndexDesc.java 0318e4b ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableAlterPartDesc.java cf67e16 ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableSimpleDesc.java 541675c ql/src/java/org/apache/hadoop/hive/ql/plan/PrivilegeObjectDesc.java 9417220 ql/src/java/org/apache/hadoop/hive/ql/plan/RenamePartitionDesc.java 1b5fb9e 
ql/src/java/org/apache/hadoop/hive/ql/plan/ShowColumnsDesc.java fe6a91e ql/src/java/org/apache/hadoop/hive/ql/plan/ShowGrantDesc.java aa88153 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationUtils.java 5c94217 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrivilegeObject.java 9e9ef71 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HiveV1Authorizer.java fbc0090 ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java 98c2924 ql/src/test/org/apache/hadoop/hive/ql/parse/TestQBCompact.java 5f32d5f ql/src/test/org/apache/hadoop/hive/ql/parse/authorization/PrivilegesTestBase.java 93901ec ql/src/test/org/apache/hadoop/hive/ql/parse/authorization/TestHiveAuthorizationTaskFactory.java ab0d80e ql/src/test/org/apache/hadoop/hive/ql/parse/authorization/TestPrivilegesV1.java fd827ad ql/src/test/org/apache/hadoop/hive/ql/parse/authorization/TestPrivilegesV2.java 9499986 ql/src/test/queries/clientpositive/alter_rename_table.q PRE-CREATION
[jira] [Updated] (HIVE-7648) authorization api should provide table/db object for create table/dbname
[ https://issues.apache.org/jira/browse/HIVE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7648:
--------------------------------
Status: Patch Available (was: Open)

authorization api should provide table/db object for create table/dbname
------------------------------------------------------------------------
Key: HIVE-7648
URL: https://issues.apache.org/jira/browse/HIVE-7648
Project: Hive
Issue Type: Bug
Components: Authorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Attachments: HIVE-7648.1.patch

For create table, the Authorizer.checkPrivileges call provides only the database name. If the table name is also passed, it will be possible for the authorization API implementation to appropriately set the permissions of the new table. Similarly, in the case of create-database, the API call should provide a database object for the database being created.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7648) authorization api should provide table/db object for create table/dbname
[ https://issues.apache.org/jira/browse/HIVE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7648:
--------------------------------
Attachment: HIVE-7648.1.patch

HIVE-7648.1.patch - initial patch; more q.out files need to be updated. It also provides the database name in the case of 'use db;', and propagates base table information in index commands.

authorization api should provide table/db object for create table/dbname
------------------------------------------------------------------------
Key: HIVE-7648
URL: https://issues.apache.org/jira/browse/HIVE-7648
Project: Hive
Issue Type: Bug
Components: Authorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Attachments: HIVE-7648.1.patch

For create table, the Authorizer.checkPrivileges call provides only the database name. If the table name is also passed, it will be possible for the authorization API implementation to appropriately set the permissions of the new table. Similarly, in the case of create-database, the API call should provide a database object for the database being created.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4064) Handle db qualified names consistently across all HiveQL statements
[ https://issues.apache.org/jira/browse/HIVE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092514#comment-14092514 ] Navis commented on HIVE-4064:
-----------------------------
input3.q.out needed to be updated, but I cannot reproduce the failures in schemeAuthority.q and ql_rewrite_gbtoidx.q (tried with hadoop-1 and hadoop-2).

Handle db qualified names consistently across all HiveQL statements
-------------------------------------------------------------------
Key: HIVE-4064
URL: https://issues.apache.org/jira/browse/HIVE-4064
Project: Hive
Issue Type: Bug
Components: SQL
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Navis
Attachments: HIVE-4064-1.patch, HIVE-4064.1.patch.txt, HIVE-4064.2.patch.txt, HIVE-4064.3.patch.txt, HIVE-4064.4.patch.txt, HIVE-4064.5.patch.txt, HIVE-4064.6.patch.txt, HIVE-4064.7.patch.txt, HIVE-4064.8.patch.txt

Hive doesn't consistently handle db qualified names across all HiveQL statements. While some HiveQL statements such as SELECT support DB qualified names, others such as CREATE INDEX don't.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-4064) Handle db qualified names consistently across all HiveQL statements
[ https://issues.apache.org/jira/browse/HIVE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-4064:
------------------------
Attachment: HIVE-4064.8.patch.txt

Handle db qualified names consistently across all HiveQL statements
-------------------------------------------------------------------
Key: HIVE-4064
URL: https://issues.apache.org/jira/browse/HIVE-4064
Project: Hive
Issue Type: Bug
Components: SQL
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Navis
Attachments: HIVE-4064-1.patch, HIVE-4064.1.patch.txt, HIVE-4064.2.patch.txt, HIVE-4064.3.patch.txt, HIVE-4064.4.patch.txt, HIVE-4064.5.patch.txt, HIVE-4064.6.patch.txt, HIVE-4064.7.patch.txt, HIVE-4064.8.patch.txt

Hive doesn't consistently handle db qualified names across all HiveQL statements. While some HiveQL statements such as SELECT support DB qualified names, others such as CREATE INDEX don't.
--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6099) Multi insert does not work properly with distinct count
[ https://issues.apache.org/jira/browse/HIVE-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092516#comment-14092516 ] Navis commented on HIVE-6099:
-----------------------------
[~leftylev] You are right, as always. I've confirmed that it's included in hive-0.11.0.
[~ashutoshc] I cannot be sure, but the optimization does not seem valid. If it will not be fixed before the next release, we should disable it by default.

Multi insert does not work properly with distinct count
-------------------------------------------------------
Key: HIVE-6099
URL: https://issues.apache.org/jira/browse/HIVE-6099
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0
Reporter: Pavan Gadam Manohar
Assignee: Navis
Labels: count, distinct, insert, multi-insert
Attachments: explain_hive_0.10.0.txt, with_disabled.txt, with_enabled.txt

Two rows are enough to reproduce this bug. Here are the steps.

Step 1) Create a table Table_A:
{code:sql}
CREATE EXTERNAL TABLE Table_A
(
  user string,
  type int
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS RCFILE
LOCATION '/hive/path/Table_A';
{code}

Step 2) Scenario: let us say user tommy belongs to both user types 111 and 123. Insert 2 records into the table created above, then:
{noformat}
hive> select * from table_a;
OK
tommy 123 2013-12-02
tommy 111 2013-12-02
{noformat}

Step 3) Create 2 destination tables to simulate multi-insert:
{code:sql}
CREATE EXTERNAL TABLE dest_Table_A
(
  p_date string,
  Distinct_Users int,
  Type111Users int,
  Type123Users int
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS RCFILE
LOCATION '/hive/path/dest_Table_A';

CREATE EXTERNAL TABLE dest_Table_B
(
  p_date string,
  Distinct_Users int,
  Type111Users int,
  Type123Users int
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS RCFILE
LOCATION '/hive/path/dest_Table_B';
{code}

Step 4) Multi-insert statement:
{code:sql}
from Table_A a
INSERT OVERWRITE TABLE dest_Table_A PARTITION(dt='2013-12-02')
select a.dt,
  count(distinct a.user) as AllDist,
  count(distinct case when a.type = 111 then a.user else null end) as Type111User,
  count(distinct case when a.type != 111 then a.user else null end) as Type123User
group by a.dt
INSERT OVERWRITE TABLE dest_Table_B PARTITION(dt='2013-12-02')
select a.dt,
  count(distinct a.user) as AllDist,
  count(distinct case when a.type = 111 then a.user else null end) as Type111User,
  count(distinct case when a.type != 111 then a.user else null end) as Type123User
group by a.dt;
{code}

Step 5) Verify results:
{noformat}
hive> select * from dest_table_a;
OK
2013-12-02 2 1 1 2013-12-02
Time taken: 0.116 seconds
hive> select * from dest_table_b;
OK
2013-12-02 2 1 1 2013-12-02
Time taken: 0.13 seconds
{noformat}

Conclusion: Hive gives a count of 2 for distinct users although there is only one distinct user. After trying many datasets, we observed that Hive computes Type111Users + Type123Users = DistinctUsers, which is wrong:
{noformat}
hive> select count(distinct a.user) from table_a a;
Total MapReduce CPU Time Spent: 4 seconds 350 msec
OK
1
{noformat}
--
This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24377: HIVE-7142 Hive multi serialization encoding support
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24377/ --- (Updated Aug. 11, 2014, 7:30 a.m.) Review request for hive. Bugs: HIVE-7142 https://issues.apache.org/jira/browse/HIVE-7142 Repository: hive-git Description --- Currently Hive only supports serializing data into UTF-8 bytes and deserializing from UTF-8 bytes; real-world users may want to load differently encoded data into Hive directly. This JIRA is dedicated to supporting serialization/deserialization of data in any encoding at the SerDe layer. Users only need to configure the serialization encoding at the table level by setting it through a SerDe parameter, for example: CREATE TABLE person(id INT, name STRING, desc STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='GBK'); or ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); LIMITATIONS: Only LazySimpleSerDe supports the serialization.encoding property in this patch. Diffs (updated) - serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java 515cf25 serde/src/java/org/apache/hadoop/hive/serde2/AbstractEncodingAwareSerDe.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/DelimitedJSONSerDe.java 179f9b5 serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java b7fb048 serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java fb55c70 Diff: https://reviews.apache.org/r/24377/diff/ Testing --- Thanks, chengxiang li
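The transcoding this patch describes can be sketched outside Hive: bytes stored in the table's declared charset (GBK in the example) are decoded with that charset instead of being assumed UTF-8. The class and method names below are illustrative, not the patch's actual API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: decode raw on-disk bytes with the table's declared
// charset, as an encoding-aware SerDe would on deserialization.
public class EncodingSketch {
    static String decode(byte[] raw, Charset tableCharset) {
        return new String(raw, tableCharset);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        byte[] raw = "\u4f60\u597d".getBytes(gbk); // non-ASCII text as stored under GBK
        String correct = decode(raw, gbk);          // decoded with the declared charset
        String garbled = new String(raw, StandardCharsets.UTF_8); // what UTF-8-only decoding yields
        System.out.println(correct.equals(garbled)); // false: the charsets disagree on these bytes
    }
}
```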
[jira] [Updated] (HIVE-7676) Support more methods in DatabaseMetaData
[ https://issues.apache.org/jira/browse/HIVE-7676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7676: -- Attachment: HIVE-7676.patch Support more methods in DatabaseMetaData Key: HIVE-7676 URL: https://issues.apache.org/jira/browse/HIVE-7676 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Alexander Pivovarov Attachments: HIVE-7676.patch I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirreL SQL shows databases only if supportsSchemasInTableDefinitions returns true. Also, Hive 0.13.1 supports UNION ALL but does not support UNION; we can indicate this in HiveDatabaseMetaData instead of throwing a "Method not supported" exception. getIdentifierQuoteString should return a space if identifier quoting is not supported. http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29 -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24377: HIVE-7142 Hive multi serialization encoding support
On Aug. 11, 2014, 4:52 a.m., Brock Noland wrote: serde/src/java/org/apache/hadoop/hive/serde2/AbstractEncodingAwareSerDe.java, line 43 https://reviews.apache.org/r/24377/diff/3/?file=653662#file653662line43 Can we make these constants? serialization.encoding is probably already available somewhere. add serialization.encoding to serdeConstant class if that's what you mean here. - chengxiang --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24377/#review50145 --- On Aug. 6, 2014, 9:11 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24377/ --- (Updated Aug. 6, 2014, 9:11 a.m.) Review request for hive. Bugs: HIVE-7142 https://issues.apache.org/jira/browse/HIVE-7142 Repository: hive-git Description --- Currently Hive only support serialize data into UTF-8 charset bytes or deserialize from UTF-8 bytes, real world users may want to load different kinds of encoded data into hive directly. This jira is dedicated to support serialize/deserialize all kinds of encoded data in SerDe layer. For user, only need to configure serialization encoding on table level by set serialization encoding through serde parameter, for example: CREATE TABLE person(id INT, name STRING, desc STRING)ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES(serialization.encoding='GBK'); or ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); LIMITATIONS: Only LazySimpleSerDe support serialization.encoding property in this patch. Diffs - serde/src/java/org/apache/hadoop/hive/serde2/AbstractEncodingAwareSerDe.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/DelimitedJSONSerDe.java 179f9b5 serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java b7fb048 serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java fb55c70 Diff: https://reviews.apache.org/r/24377/diff/ Testing --- Thanks, chengxiang li
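The reviewer's suggestion, sketched: hoist the property key into a shared constants class so callers share one definition instead of repeating the string literal. The class and field names here are hypothetical; the real key lives in serdeConstants.

```java
// Hypothetical sketch of the review comment: define the property key once.
public final class SerdeConstantsSketch {
    public static final String SERIALIZATION_ENCODING = "serialization.encoding";

    private SerdeConstantsSketch() {} // constants holder; no instances
}
```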
[jira] [Updated] (HIVE-7142) Hive multi serialization encoding support
[ https://issues.apache.org/jira/browse/HIVE-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7142: Attachment: HIVE-7142.3.patch Hive multi serialization encoding support - Key: HIVE-7142 URL: https://issues.apache.org/jira/browse/HIVE-7142 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7142.1.patch.txt, HIVE-7142.2.patch, HIVE-7142.3.patch Currently Hive only supports serializing data into UTF-8 bytes and deserializing from UTF-8 bytes; real-world users may want to load differently encoded data into Hive directly. This JIRA is dedicated to supporting serialization/deserialization of data in any encoding at the SerDe layer. Users only need to configure the serialization encoding at the table level by setting it through a SerDe parameter, for example: {code:sql} CREATE TABLE person(id INT, name STRING, desc STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='GBK'); {code} or {code:sql} ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); {code} LIMITATIONS: Only LazySimpleSerDe supports the serialization.encoding property in this patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7676) Support more methods in DatabaseMetaData
[ https://issues.apache.org/jira/browse/HIVE-7676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7676: -- Description: I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirreL SQL shows databases only if supportsSchemasInTableDefinitions returns true. Also, Hive 0.14.0 supports UNION ALL but does not support UNION; we can indicate this in HiveDatabaseMetaData instead of throwing a "Method not supported" exception. getIdentifierQuoteString should return a space if identifier quoting is not supported. http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29 was: I noticed that some methods in HiveDatabaseMetaData throws exceptions instead of returning true/false. Many JDBC clients expects implementations for particular methods in order to work. E.g. SQuirreL SQL shows databases only if supportsSchemasInTableDefinitions returns true. Also hive 0.13.1 supports UNION ALL and does not support UNION we can indicate this in HiveDatabaseMetaData instead of throwing Method Not supported exception. getIdentifierQuoteString should return space if not supported. http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29 Support more methods in DatabaseMetaData Key: HIVE-7676 URL: https://issues.apache.org/jira/browse/HIVE-7676 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Alexander Pivovarov Attachments: HIVE-7676.patch I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirreL SQL shows databases only if supportsSchemasInTableDefinitions returns true.
Also, Hive 0.14.0 supports UNION ALL but does not support UNION; we can indicate this in HiveDatabaseMetaData instead of throwing a "Method not supported" exception. getIdentifierQuoteString should return a space if identifier quoting is not supported. http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7676) Support more methods in HiveDatabaseMetaData
[ https://issues.apache.org/jira/browse/HIVE-7676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7676: -- Summary: Support more methods in HiveDatabaseMetaData (was: Support more methods in DatabaseMetaData) Support more methods in HiveDatabaseMetaData Key: HIVE-7676 URL: https://issues.apache.org/jira/browse/HIVE-7676 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Alexander Pivovarov Attachments: HIVE-7676.patch I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirreL SQL shows databases only if supportsSchemasInTableDefinitions returns true. Also, Hive 0.14.0 supports UNION ALL but does not support UNION; we can indicate this in HiveDatabaseMetaData instead of throwing a "Method not supported" exception. getIdentifierQuoteString should return a space if identifier quoting is not supported. http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29 -- This message was sent by Atlassian JIRA (v6.2#6252)
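A minimal sketch of the behavior the issue asks for: capability methods return flags instead of throwing "Method not supported". The values mirror the description (UNION ALL supported, UNION not; a space when identifier quoting is unsupported); the class is illustrative, not Hive's actual HiveDatabaseMetaData.

```java
// Illustrative sketch: report capabilities as booleans rather than throwing.
public class MetaDataSketch {
    public boolean supportsSchemasInTableDefinitions() { return true; } // lets SQuirreL SQL list schemas
    public boolean supportsUnionAll() { return true; }                  // Hive supports UNION ALL
    public boolean supportsUnion() { return false; }                    // plain UNION unsupported here
    // Per the JDBC javadoc, return a single space when quoting is unsupported.
    public String getIdentifierQuoteString() { return " "; }
}
```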
[jira] [Updated] (HIVE-7532) allow disabling direct sql per query with external metastore
[ https://issues.apache.org/jira/browse/HIVE-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7532: Attachment: HIVE-7532.6.patch.txt allow disabling direct sql per query with external metastore Key: HIVE-7532 URL: https://issues.apache.org/jira/browse/HIVE-7532 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Navis Attachments: HIVE-7532.1.patch.txt, HIVE-7532.2.nogen, HIVE-7532.2.patch.txt, HIVE-7532.3.patch.txt, HIVE-7532.4.patch.txt, HIVE-7532.5.patch.txt, HIVE-7532.6.patch.txt Currently with external metastore, direct sql can only be disabled via metastore config globally. Perhaps it makes sense to have the ability to propagate the setting per query from client to override the metastore setting, e.g. if one particular query causes it to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24137: allow disabling direct sql per query with external metastore
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24137/ --- (Updated Aug. 11, 2014, 7:37 a.m.) Review request for hive. Changes --- added getMetaConf(), which shows current value of the meta variable. Bugs: HIVE-7532 https://issues.apache.org/jira/browse/HIVE-7532 Repository: hive-git Description --- Currently with external metastore, direct sql can only be disabled via metastore config globally. Perhaps it makes sense to have the ability to propagate the setting per query from client to override the metastore setting, e.g. if one particular query causes it to fail. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8490558 common/src/java/org/apache/hadoop/hive/conf/SystemVariables.java ee98d17 itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java 9e416b5 metastore/if/hive_metastore.thrift 9e93b95 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 6e689d0 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 85a77d9 metastore/src/java/org/apache/hadoop/hive/metastore/IHMSHandler.java 1675751 metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 8746c37 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java c28c46a metastore/src/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java 86172b9 metastore/src/java/org/apache/hadoop/hive/metastore/events/ConfigChangeEvent.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 39b032e ql/src/java/org/apache/hadoop/hive/ql/processors/SetProcessor.java 2baa24a ql/src/test/queries/clientpositive/set_metaconf.q PRE-CREATION ql/src/test/results/clientpositive/set_metaconf.q.out PRE-CREATION service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 4c3164e service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java b39d64d 
service/src/java/org/apache/hive/service/cli/session/SessionManager.java c2f0495 Diff: https://reviews.apache.org/r/24137/diff/ Testing --- Thanks, Navis Ryu
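With this patch applied, overriding a metastore setting for a single session might look like the following. Hedged: the metaconf: prefix and the property name are inferred from this patch's SetProcessor/HiveConf changes; verify against the committed syntax.

```sql
-- Per-session override instead of the global metastore config:
set metaconf:hive.metastore.try.direct.sql=false;
-- Inspect the current value (getMetaConf() added in this update):
set metaconf:hive.metastore.try.direct.sql;
```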
[jira] [Commented] (HIVE-6806) CREATE TABLE should support STORED AS AVRO
[ https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092534#comment-14092534 ] Lefty Leverenz commented on HIVE-6806: -- [~singhashish], why did you outdent union1 to bytes1 in the examples? I aligned them with the rest of the data types, then indented all of them two more spaces to make STORED AS AVRO stand out -- but if you wanted the outdent, please revert my changes or ask me to do it. Also, your example in Hive 0.14 and later versions under Creating Avro-backed Hive tables is identical to the one you added to the code block in All Hive versions just before it -- was that deliberate, or an editing artifact? It seems to me the Hive 0.14 example in All Hive versions isn't necessary, but I left it in for now. Please review my changes, because I moved some information around. * [Avro SerDe | https://cwiki.apache.org/confluence/display/Hive/AvroSerDe] CREATE TABLE should support STORED AS AVRO -- Key: HIVE-6806 URL: https://issues.apache.org/jira/browse/HIVE-6806 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Affects Versions: 0.12.0 Reporter: Jeremy Beard Assignee: Ashish Kumar Singh Priority: Minor Labels: Avro, TODOC14 Fix For: 0.14.0 Attachments: HIVE-6806.1.patch, HIVE-6806.2.patch, HIVE-6806.3.patch, HIVE-6806.patch Avro is well established and widely used within Hive, however creating Avro-backed tables requires the messy listing of the SerDe, InputFormat and OutputFormat classes. Similarly to HIVE-5783 for Parquet, Hive would be easier to use if it had native Avro support. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24445: HIVE-7642, Set hive input format by configuration.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24445/ --- (Updated Aug. 11, 2014, 7:45 a.m.) Review request for hive, Brock Noland and Szehon Ho. Bugs: HIVE-7642 https://issues.apache.org/jira/browse/HIVE-7642 Repository: hive-git Description --- Currently the Hive input format is hard-coded as HiveInputFormat; we should set this parameter from configuration. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 45eff67 Diff: https://reviews.apache.org/r/24445/diff/ Testing --- Thanks, chengxiang li
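The change can be sketched as reading the input-format class name from configuration with a fallback, rather than hard-coding it. The hive.input.format property name is Hive's standard one; the Map stands in for a Hadoop Configuration, and the class name here is illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: resolve the input format from configuration instead of hard-coding.
public class InputFormatSketch {
    static String inputFormatClass(Map<String, String> conf) {
        // Fall back to HiveInputFormat only when the property is unset.
        return conf.getOrDefault("hive.input.format",
            "org.apache.hadoop.hive.ql.io.HiveInputFormat");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(inputFormatClass(conf)); // default when unset
        conf.put("hive.input.format", "org.apache.hadoop.hive.ql.io.CombineHiveInputFormat");
        System.out.println(inputFormatClass(conf)); // configured value wins
    }
}
```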
[jira] [Updated] (HIVE-7642) Set hive input format by configuration.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7642: Attachment: HIVE-7642.2-spark.patch Set hive input format by configuration.[Spark Branch] - Key: HIVE-7642 URL: https://issues.apache.org/jira/browse/HIVE-7642 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7642.1-spark.patch, HIVE-7642.2-spark.patch Currently the Hive input format is hard-coded as HiveInputFormat; we should set this parameter from configuration. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6329) Support column level encryption/decryption
[ https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-6329: Description: We have been receiving some requirements for encryption recently, but Hive does not support it. Before the full implementation via HIVE-5207, this might be useful for some cases. {noformat} hive> create table encode_test(id int, name STRING, phone STRING, address STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='phone,address', 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') STORED AS TEXTFILE; OK Time taken: 0.584 seconds hive> insert into table encode_test select 100,'navis','010--','Seoul, Seocho' from src tablesample (1 rows); .. OK Time taken: 5.121 seconds hive> select * from encode_test; OK 100 navis MDEwLTAwMDAtMDAwMA== U2VvdWwsIFNlb2Nobw== Time taken: 0.078 seconds, Fetched: 1 row(s) hive> {noformat} was: Receiving some requirements on encryption recently but hive is not supporting it. Before the full implementation via HIVE-5207, this might be useful for some cases. {noformat} hive> create table encode_test(id int, name STRING, phone STRING, address STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.indices'='2,3', 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') STORED AS TEXTFILE; OK Time taken: 0.584 seconds hive> insert into table encode_test select 100,'navis','010--','Seoul, Seocho' from src tablesample (1 rows); ..
OK Time taken: 5.121 seconds hive> select * from encode_test; OK 100 navis MDEwLTAwMDAtMDAwMA== U2VvdWwsIFNlb2Nobw== Time taken: 0.078 seconds, Fetched: 1 row(s) hive> {noformat} Support column level encryption/decryption -- Key: HIVE-6329 URL: https://issues.apache.org/jira/browse/HIVE-6329 Project: Hive Issue Type: New Feature Components: Security, Serializers/Deserializers Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-6329.1.patch.txt, HIVE-6329.2.patch.txt, HIVE-6329.3.patch.txt, HIVE-6329.4.patch.txt, HIVE-6329.5.patch.txt, HIVE-6329.6.patch.txt, HIVE-6329.7.patch.txt, HIVE-6329.8.patch.txt We have been receiving some requirements for encryption recently, but Hive does not support it. Before the full implementation via HIVE-5207, this might be useful for some cases. {noformat} hive> create table encode_test(id int, name STRING, phone STRING, address STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='phone,address', 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') STORED AS TEXTFILE; OK Time taken: 0.584 seconds hive> insert into table encode_test select 100,'navis','010--','Seoul, Seocho' from src tablesample (1 rows); .. OK Time taken: 5.121 seconds hive> select * from encode_test; OK 100 navis MDEwLTAwMDAtMDAwMA== U2VvdWwsIFNlb2Nobw== Time taken: 0.078 seconds, Fetched: 1 row(s) hive> {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
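The Base64WriteOnly output in the session above can be reproduced with plain java.util.Base64; the encoded address column decodes back to the inserted value. This sketch only mirrors the encoding step, not Hive's SerDe plumbing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Reproduce the encoded column value shown in the example session.
public class Base64Sketch {
    static String encode(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(encode("Seoul, Seocho")); // U2VvdWwsIFNlb2Nobw==
    }
}
```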
[jira] [Commented] (HIVE-7142) Hive multi serialization encoding support
[ https://issues.apache.org/jira/browse/HIVE-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092539#comment-14092539 ] Navis commented on HIVE-7142: - I think this can be implemented on top of HIVE-6329. It seems to need some more checking (decoding should be applied to strings only, for example). Hive multi serialization encoding support - Key: HIVE-7142 URL: https://issues.apache.org/jira/browse/HIVE-7142 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7142.1.patch.txt, HIVE-7142.2.patch, HIVE-7142.3.patch Currently Hive only supports serializing data into UTF-8 bytes and deserializing from UTF-8 bytes; real-world users may want to load differently encoded data into Hive directly. This JIRA is dedicated to supporting serialization/deserialization of data in any encoding at the SerDe layer. Users only need to configure the serialization encoding at the table level by setting it through a SerDe parameter, for example: {code:sql} CREATE TABLE person(id INT, name STRING, desc STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='GBK'); {code} or {code:sql} ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); {code} LIMITATIONS: Only LazySimpleSerDe supports the serialization.encoding property in this patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7623) hive partition rename fails if filesystem cache is disabled
[ https://issues.apache.org/jira/browse/HIVE-7623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7623: Attachment: HIVE-7623.1.patch.txt hive partition rename fails if filesystem cache is disabled --- Key: HIVE-7623 URL: https://issues.apache.org/jira/browse/HIVE-7623 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.0, 0.13.1 Reporter: agate Attachments: HIVE-7623.1.patch.txt Seems to be an issue similar to https://issues.apache.org/jira/browse/HIVE-3815, hit when calling alterPartition (when renaming partitions). Setting fs.hdfs.impl.disable.cache=false and fs.file.impl.disable.cache=false works around this problem. Error: = 2014-08-05 21:46:14,522 ERROR [pool-3-thread-1]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(143)) - InvalidOperationException(message:table new location hdfs://hadoop-namenode:8020/user/hive/warehouse/sample_logs/XX=AA/YY=123 is on a different file system than the old location hdfs://hadoop-namenode:8020/user/hive/warehouse/sample_logs/XX=AA/YY=456.
This operation is not supported) at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartition(HiveAlterHandler.java:361) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rename_partition(HiveMetaStore.java:2629) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rename_partition(HiveMetaStore.java:2602) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:622) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105) at com.sun.proxy.$Proxy5.rename_partition(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$rename_partition.getResult(ThriftHiveMetastore.java:9057) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$rename_partition.getResult(ThriftHiveMetastore.java:9041) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) Looking at the code in apache-hive-0.13.1-src/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java on line 361, we can see that it is using != to compare FileSystem objects: // check that src and dest are on the same file system if (srcFs != destFs) { throw new InvalidOperationException("table new location " + destPath + " is on a different file system than the old location " + srcPath + ". This operation is not supported"); } -- This message was sent by Atlassian JIRA (v6.2#6252)
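One way to avoid the identity comparison: compare the filesystems' URIs (scheme and authority) instead of the objects themselves, so two cache-bypassing FileSystem instances for the same cluster still match. This sketch uses plain java.net.URI and hypothetical names; a real fix would work through FileSystem.getUri().

```java
import java.net.URI;

// With the FS cache disabled, srcFs != destFs is true even for the same
// cluster, so compare by scheme/authority rather than object identity.
public class FsCompareSketch {
    static boolean sameFileSystem(URI src, URI dest) {
        return eq(src.getScheme(), dest.getScheme())
            && eq(src.getAuthority(), dest.getAuthority());
    }

    private static boolean eq(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        URI a = URI.create("hdfs://hadoop-namenode:8020/user/hive/warehouse/sample_logs/p1");
        URI b = URI.create("hdfs://hadoop-namenode:8020/user/hive/warehouse/sample_logs/p2");
        System.out.println(sameFileSystem(a, b)); // true: same cluster, different paths
    }
}
```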
[jira] [Commented] (HIVE-4064) Handle db qualified names consistently across all HiveQL statements
[ https://issues.apache.org/jira/browse/HIVE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092540#comment-14092540 ] Lefty Leverenz commented on HIVE-4064: -- +1 for the javadocs and code comments Handle db qualified names consistently across all HiveQL statements --- Key: HIVE-4064 URL: https://issues.apache.org/jira/browse/HIVE-4064 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Navis Attachments: HIVE-4064-1.patch, HIVE-4064.1.patch.txt, HIVE-4064.2.patch.txt, HIVE-4064.3.patch.txt, HIVE-4064.4.patch.txt, HIVE-4064.5.patch.txt, HIVE-4064.6.patch.txt, HIVE-4064.7.patch.txt, HIVE-4064.8.patch.txt Hive doesn't consistently handle db-qualified names across all HiveQL statements. While some HiveQL statements such as SELECT support db-qualified names, others such as CREATE INDEX don't. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7623) hive partition rename fails if filesystem cache is disabled
[ https://issues.apache.org/jira/browse/HIVE-7623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7623: Assignee: Navis Status: Patch Available (was: Open) Missed this in HIVE-3815 hive partition rename fails if filesystem cache is disabled --- Key: HIVE-7623 URL: https://issues.apache.org/jira/browse/HIVE-7623 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.1, 0.13.0 Reporter: agate Assignee: Navis Attachments: HIVE-7623.1.patch.txt Seems to be an issue similar to https://issues.apache.org/jira/browse/HIVE-3815, hit when calling alterPartition (when renaming partitions). Setting fs.hdfs.impl.disable.cache=false and fs.file.impl.disable.cache=false works around this problem. Error: = 2014-08-05 21:46:14,522 ERROR [pool-3-thread-1]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(143)) - InvalidOperationException(message:table new location hdfs://hadoop-namenode:8020/user/hive/warehouse/sample_logs/XX=AA/YY=123 is on a different file system than the old location hdfs://hadoop-namenode:8020/user/hive/warehouse/sample_logs/XX=AA/YY=456.
This operation is not supported) at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartition(HiveAlterHandler.java:361) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rename_partition(HiveMetaStore.java:2629) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rename_partition(HiveMetaStore.java:2602) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:622) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105) at com.sun.proxy.$Proxy5.rename_partition(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$rename_partition.getResult(ThriftHiveMetastore.java:9057) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$rename_partition.getResult(ThriftHiveMetastore.java:9041) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) Looking at the code in apache-hive-0.13.1-src/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java on line 361, we can see that it is using != to compare FileSystem objects: // check that src and dest are on the same file system if (srcFs != destFs) { throw new InvalidOperationException("table new location " + destPath + " is on a different file system than the old location " + srcPath + ". This operation is not supported"); } -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7390) Make quote character optional and configurable in BeeLine CSV/TSV output
[ https://issues.apache.org/jira/browse/HIVE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092547#comment-14092547 ] Hive QA commented on HIVE-7390: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660905/HIVE-7390.9.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5873 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/248/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/248/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-248/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12660905 Make quote character optional and configurable in BeeLine CSV/TSV output Key: HIVE-7390 URL: https://issues.apache.org/jira/browse/HIVE-7390 Project: Hive Issue Type: New Feature Components: Clients Affects Versions: 0.13.1 Reporter: Jim Halfpenny Assignee: Ferdinand Xu Attachments: HIVE-7390.1.patch, HIVE-7390.2.patch, HIVE-7390.3.patch, HIVE-7390.4.patch, HIVE-7390.5.patch, HIVE-7390.6.patch, HIVE-7390.7.patch, HIVE-7390.8.patch, HIVE-7390.9.patch, HIVE-7390.patch Currently when either the CSV or TSV output formats are used in beeline each column is wrapped in single quotes. Quote wrapping of columns should be optional and the user should be able to choose the character used to wrap the columns. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7641) INSERT ... SELECT with no source table leads to NPE
[ https://issues.apache.org/jira/browse/HIVE-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7641: Assignee: Navis Status: Patch Available (was: Open) INSERT ... SELECT with no source table leads to NPE --- Key: HIVE-7641 URL: https://issues.apache.org/jira/browse/HIVE-7641 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Lenni Kuff Assignee: Navis Attachments: HIVE-7641.1.patch.txt When no source table is provided for an INSERT statement, Hive fails with an NPE. {code} 0: jdbc:hive2://localhost:11050/default> create table test_tbl(i int); No rows affected (0.333 seconds) 0: jdbc:hive2://localhost:11050/default> insert into table test_tbl select 1; Error: Error while compiling statement: FAILED: NullPointerException null (state=42000,code=4) -- Get an NPE even when using incorrect syntax (no TABLE keyword) 0: jdbc:hive2://localhost:11050/default> insert into test_tbl select 1; Error: Error while compiling statement: FAILED: NullPointerException null (state=42000,code=4) -- Works when a source table is provided 0: jdbc:hive2://localhost:11050/default> insert into table test_tbl select 1 from foo; No rows affected (5.751 seconds) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7641) INSERT ... SELECT with no source table leads to NPE
[ https://issues.apache.org/jira/browse/HIVE-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7641: Attachment: HIVE-7641.1.patch.txt INSERT ... SELECT with no source table leads to NPE --- Key: HIVE-7641 URL: https://issues.apache.org/jira/browse/HIVE-7641 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Lenni Kuff Attachments: HIVE-7641.1.patch.txt When no source table is provided for an INSERT statement, Hive fails with an NPE. {code} 0: jdbc:hive2://localhost:11050/default> create table test_tbl(i int); No rows affected (0.333 seconds) 0: jdbc:hive2://localhost:11050/default> insert into table test_tbl select 1; Error: Error while compiling statement: FAILED: NullPointerException null (state=42000,code=4) -- Get an NPE even when using incorrect syntax (no TABLE keyword) 0: jdbc:hive2://localhost:11050/default> insert into test_tbl select 1; Error: Error while compiling statement: FAILED: NullPointerException null (state=42000,code=4) -- Works when a source table is provided 0: jdbc:hive2://localhost:11050/default> insert into table test_tbl select 1 from foo; No rows affected (5.751 seconds) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-7624: - Attachment: HIVE-7624.5-spark.patch Reduce operator initialization failed when running multiple MR query on spark - Key: HIVE-7624 URL: https://issues.apache.org/jira/browse/HIVE-7624 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, HIVE-7624.4-spark.patch, HIVE-7624.5-spark.patch, HIVE-7624.patch The following error occurs when I try to run a query with multiple reduce works (M-R-R): {quote} 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1) java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at 
java.lang.Thread.run(Thread.java:744) Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) … {quote} I suspect we're applying the reduce functions in the wrong order. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7653) Hive AvroSerDe does not support circular references in Schema
[ https://issues.apache.org/jira/browse/HIVE-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092591#comment-14092591 ] Hive QA commented on HIVE-7653: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660923/HIVE-7653.2.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5889 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/249/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/249/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-249/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12660923 Hive AvroSerDe does not support circular references in Schema - Key: HIVE-7653 URL: https://issues.apache.org/jira/browse/HIVE-7653 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Sachin Goyal Attachments: HIVE-7653.1.patch, HIVE-7653.2.patch Avro allows nullable circular references but Hive AvroSerDe does not. 
Example of circular references (passing in Avro but failing in AvroSerDe):
{code}
class AvroCycleParent {
  AvroCycleChild child;
  public AvroCycleChild getChild() { return child; }
  public void setChild(AvroCycleChild child) { this.child = child; }
}
class AvroCycleChild {
  AvroCycleParent parent;
  public AvroCycleParent getParent() { return parent; }
  public void setParent(AvroCycleParent parent) { this.parent = parent; }
}
{code}
Due to this discrepancy, Hive is unable to read Avro records that contain circular references. For third-party code with such references, it becomes very hard to serialize the objects directly with Avro and use them in Hive. I have a patch for this with a unit test and I will submit it shortly. -- This message was sent by Atlassian JIRA (v6.2#6252)
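For context, a nullable circular reference of this shape is legal in Avro: the recursive field is a union of null and a named-type reference back to an already-defined record. The sketch below builds such a schema as JSON in plain Python; the record and field names are taken from the example classes above, and the exact schema shape is an assumption, not taken from the patch.

```python
# Sketch of a self-referential Avro schema: AvroCycleChild refers back to
# AvroCycleParent by name instead of nesting another full definition.
import json

parent_schema = {
    "type": "record",
    "name": "AvroCycleParent",
    "fields": [{
        "name": "child",
        "type": ["null", {
            "type": "record",
            "name": "AvroCycleChild",
            "fields": [
                # A named-type reference, not a nested definition: this is
                # what makes the schema recursive (and nullable via the union).
                {"name": "parent", "type": ["null", "AvroCycleParent"]}
            ],
        }],
        "default": None,
    }],
}
print(json.dumps(parent_schema, indent=2))
```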
[jira] [Commented] (HIVE-7661) Observed performance issues while sorting using Hive's Parallel Order by clause while retaining pre-existing sort order.
[ https://issues.apache.org/jira/browse/HIVE-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092595#comment-14092595 ] Navis commented on HIVE-7661: - [~vishal.kamath] Thinking of implementing something like an InputSampler.SplitSampler. Would it be helpful for this case? Observed performance issues while sorting using Hive's Parallel Order by clause while retaining pre-existing sort order. Key: HIVE-7661 URL: https://issues.apache.org/jira/browse/HIVE-7661 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.12.0 Environment: Cloudera 5.0 hive-0.12.0-cdh5.0.0 Red Hat Linux Reporter: Vishal Kamath Labels: performance Fix For: 0.12.1 Improve Hive's sampling logic to accommodate use cases that require retaining the pre-existing sort order of the underlying source table. To support the parallel ORDER BY clause, Hive samples the source table based on the values provided to hive.optimize.sampling.orderby.number and hive.optimize.sampling.orderby.percent. This works with reasonable performance when sorting on columns with a random distribution of data, but it has severe performance issues when the pre-existing sort order is retained. Let us try to understand this with an example. insert overwrite table lineitem_temp_report select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment from lineitem order by l_orderkey, l_partkey, l_suppkey; Sample data set for the lineitem table. The first column represents l_orderkey and is sorted.
l_orderkey|l_partkey|l_suppkey|l_linenumber|l_quantity|l_extendedprice|l_discount|l_tax|l_returnflag|l_linestatus|l_shipdate|l_commitdate|l_receiptdate|l_shipinstruct|l_shipmode|l_comment
197|1771022|96040|2|8|8743.52|0.09|0.02|A|F|1995-04-17|1995-07-01|1995-04-27|DELIVER IN PERSON|SHIP|y blithely even deposits. blithely fina|
197|1558290|83306|3|17|22919.74|0.06|0.02|N|O|1995-08-02|1995-06-23|1995-08-03|COLLECT COD|REG AIR|ts. careful|
197|179355|29358|4|25|35858.75|0.04|0.01|N|F|1995-06-13|1995-05-23|1995-06-24|TAKE BACK RETURN|FOB|s-- quickly final accounts|
197|414653|39658|5|14|21946.82|0.09|0.01|R|F|1995-05-08|1995-05-24|1995-05-12|TAKE BACK RETURN|RAIL|use slyly slyly silent depo|
197|1058800|8821|6|1|1758.75|0.07|0.05|N|O|1995-07-15|1995-06-21|1995-08-11|COLLECT COD|RAIL| even, thin dependencies sno|
198|560609|60610|1|33|55096.14|0.07|0.02|N|O|1998-01-05|1998-03-20|1998-01-10|TAKE BACK RETURN|TRUCK|carefully caref|
198|152287|77289|2|20|26785.60|0.03|0.00|N|O|1998-01-15|1998-03-31|1998-01-25|DELIVER IN PERSON|FOB|carefully final escapades a|
224|1899665|74720|3|41|68247.37|0.07|0.04|A|F|1994-09-01|1994-09-15|1994-09-02|TAKE BACK RETURN|SHIP|after the furiou|
When we sort on a presorted column, or do a multi-column sort that retains the sort order of the source table (lineitem, with 600 million rows), we don't see an equal distribution of data to the reducers. Out of 100 reducers, 99 complete in less than 40 seconds. The last reducer does the bulk of the work, processing nearly 570 million rows. So let us understand what is going wrong here: on a table having 600 million records with the orderkey column sorted, I created a temp table with 10% sampling.
insert overwrite table sampTempTbl (select * from lineitem tablesample (10 percent) t); select min(l_orderkey), max(l_orderkey) from sampTempTbl; 12306309,142321700 whereas on the source table, the orderkey range (select min(l_orderkey), max(l_orderkey) from lineitem) is 1 and 6 So naturally the bulk of the records will be directed towards a single reducer. One way to work around this problem is to increase hive.optimize.sampling.orderby.number to a larger value (close to the number of rows in the input source table). But then we will have to provide a higher heap (hive-env.sh) for Hive, otherwise it will fail while creating the sampling data. With larger data volumes, it is not practical to sample the entire data set. -- This message was sent by Atlassian JIRA (v6.2#6252)
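The skew described above can be reproduced outside Hive. The following simulation is illustrative Python, not Hive's actual partitioning code (the function names are invented): it derives split points from a sample taken from the head of a sorted key space, roughly what sampling a pre-sorted table yields, and shows the resulting partitioning sends almost every row to the last reducer, while a sample spread across the key space balances them.

```python
# Simulate total-order partitioning: split points come from a sample, and
# each key is routed to the reducer whose range contains it.
import bisect

def split_points(sample, num_reducers):
    # Pick num_reducers - 1 evenly spaced cut points from the sorted sample.
    sample = sorted(sample)
    step = len(sample) / num_reducers
    return [sample[int(step * i)] for i in range(1, num_reducers)]

def reducer_counts(keys, cuts):
    # Count how many keys land in each reducer's key range.
    counts = [0] * (len(cuts) + 1)
    for k in keys:
        counts[bisect.bisect_right(cuts, k)] += 1
    return counts

keys = list(range(600_000))   # already sorted, like l_orderkey
head_sample = keys[:600]      # a 0.1% sample drawn from the head of the table
skewed = reducer_counts(keys, split_points(head_sample, 4))

random_sample = keys[::1000]  # a sample spread evenly across the key space
balanced = reducer_counts(keys, split_points(random_sample, 4))

print(skewed)    # last reducer receives almost everything
print(balanced)  # each reducer receives an equal share
```

With four reducers, the head sample yields split points of 150, 300, and 450, so the last reducer processes 599,550 of the 600,000 rows; the spread sample gives each reducer exactly 150,000.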
[jira] [Commented] (HIVE-7642) Set hive input format by configuration.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092598#comment-14092598 ] Hive QA commented on HIVE-7642: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660946/HIVE-7642.2-spark.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5856 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/27/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/27/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-27/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12660946 Set hive input format by configuration.[Spark Branch] - Key: HIVE-7642 URL: https://issues.apache.org/jira/browse/HIVE-7642 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7642.1-spark.patch, HIVE-7642.2-spark.patch Currently the Hive input format is hard-coded as HiveInputFormat; we should set this parameter from configuration. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092606#comment-14092606 ] Hive QA commented on HIVE-7624: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660957/HIVE-7624.5-spark.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/28/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/28/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-28/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd 
/data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-SPARK-Build-28/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-spark-source ]] + [[ ! -d apache-svn-spark-source/.svn ]] + [[ ! -d apache-svn-spark-source ]] + cd apache-svn-spark-source + svn revert -R . Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java' ++ svn status --no-ignore ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target cli/target odbc/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1617233. At revision 1617233. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12660957 Reduce operator initialization failed when running multiple MR query on spark - Key: HIVE-7624 URL: https://issues.apache.org/jira/browse/HIVE-7624 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, HIVE-7624.4-spark.patch, HIVE-7624.5-spark.patch, HIVE-7624.patch The following error occurs when I try to run a query with multiple reduce works (M-R-R): {quote} 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1) java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170) at
[jira] [Commented] (HIVE-7142) Hive multi serialization encoding support
[ https://issues.apache.org/jira/browse/HIVE-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092610#comment-14092610 ] Chengxiang Li commented on HIVE-7142: - Hi, [~navis], this jira is trying to support table-level configurable encoding. I took a look at HIVE-6329; do you mean you want to implement column-level configurable encoding? If yes, that would be a quite different implementation, but it would be valuable as well, and I'm glad to see it. Hive multi serialization encoding support - Key: HIVE-7142 URL: https://issues.apache.org/jira/browse/HIVE-7142 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7142.1.patch.txt, HIVE-7142.2.patch, HIVE-7142.3.patch Currently Hive only supports serializing data into UTF-8 bytes and deserializing from UTF-8 bytes; real-world users may want to load differently encoded data into Hive directly. This jira is dedicated to supporting serialization/deserialization of all kinds of encoded data in the SerDe layer. Users only need to configure the serialization encoding at the table level by setting it through a SerDe parameter, for example: {code:sql} CREATE TABLE person(id INT, name STRING, desc STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='GBK'); {code} or {code:sql} ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); {code} LIMITATIONS: Only LazySimpleSerDe supports the serialization.encoding property in this patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
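What the serialization.encoding property controls can be illustrated outside Hive: the same text maps to different byte sequences under GBK and UTF-8, so a reader must decode with the charset the bytes were written in. A minimal plain-Python sketch of the charset round-trip (not LazySimpleSerDe code):

```python
# Round-trip the same text through two charsets; a reader must use the
# charset the bytes were written with, which is what a table-level
# serialization.encoding setting tells the SerDe.
text = "\u4e2d\u6587"  # two CJK characters

gbk_bytes = text.encode("gbk")     # GBK: 2 bytes per CJK character
utf8_bytes = text.encode("utf-8")  # UTF-8: 3 bytes per CJK character

assert gbk_bytes != utf8_bytes
# Correct round-trips recover the original text:
assert gbk_bytes.decode("gbk") == text
assert utf8_bytes.decode("utf-8") == text
print(len(gbk_bytes), len(utf8_bytes))  # 4 6
```

Decoding GBK bytes as UTF-8 either fails or produces mojibake, which is why GBK-encoded files cannot simply be loaded into a UTF-8-only SerDe.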
[jira] [Assigned] (HIVE-7675) Implement native HiveMapFunction
[ https://issues.apache.org/jira/browse/HIVE-7675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li reassigned HIVE-7675: --- Assignee: Chengxiang Li Implement native HiveMapFunction Key: HIVE-7675 URL: https://issues.apache.org/jira/browse/HIVE-7675 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Currently, Hive on Spark depends on ExecMapper to execute operator logic; the full stack looks like: Spark framework => HiveMapFunction => ExecMapper => Hive operators. HiveMapFunction is just a thin wrapper around ExecMapper, which introduces several problems: # ExecMapper is designed for MR's single-process task mode; it does not work well on Spark's multi-threaded task nodes. # ExecMapper introduces extra API-level restrictions and processing logic. We need to implement a native HiveMapFunction as the bridge between the Spark framework and Hive operators. -- This message was sent by Atlassian JIRA (v6.2#6252)
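The single-process vs multi-threaded mismatch in point 1 can be sketched in a few lines. The example below is an illustrative Python analogy, not Hive code (the class and method names are invented): per-task state is held in thread-local storage, so concurrent tasks that share one function object do not clobber each other's state the way they would with plain instance fields.

```python
import threading

class ThreadConfinedMapper:
    """Analogy: per-task state kept in thread-local storage."""
    def __init__(self):
        self._local = threading.local()

    def configure(self):
        # Per-thread initialization, analogous to operator setup per task.
        self._local.rows_seen = 0

    def process(self, row):
        self._local.rows_seen += 1
        return row * 2

    def rows_seen(self):
        return self._local.rows_seen

mapper = ThreadConfinedMapper()  # one shared object, many task threads
results = {}

def run_task(task_id, num_rows):
    mapper.configure()
    for i in range(num_rows):
        mapper.process(i)
    results[task_id] = mapper.rows_seen()

threads = [threading.Thread(target=run_task, args=(t, 100)) for t in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(results)  # each task counted exactly its own 100 rows
```

With a plain instance field instead of threading.local, the four tasks would interleave increments on one shared counter, which is the kind of breakage a single-process design hits on a multi-threaded executor.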
[jira] [Commented] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup
[ https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092622#comment-14092622 ] Vaibhav Gumashta commented on HIVE-6847: I'll update the tests - errors seem related. The TestScratchDir tests are for the older HS2 scratch dir code. Improve / fix bugs in Hive scratch dir setup Key: HIVE-6847 URL: https://issues.apache.org/jira/browse/HIVE-6847 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch Currently, the Hive server creates the scratch directory and changes its permissions to 777; however, this is not great with respect to security. We need to create user-specific scratch directories instead. Also refer to the 1st iteration of the patch in HIVE-6782 for the approach. -- This message was sent by Atlassian JIRA (v6.2#6252)
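The direction described above, per-user directories with owner-only permissions instead of one world-writable (777) directory, can be sketched as follows. This is an illustrative Python sketch with a hypothetical helper name and a temporary root, not the patch itself:

```python
import os
import stat
import tempfile

def make_user_scratch(root, user):
    """Create a per-user scratch dir readable/writable only by its owner."""
    path = os.path.join(root, user)
    os.makedirs(path, exist_ok=True)
    os.chmod(path, 0o700)  # rwx for the owner only, instead of 0o777
    return path

root = tempfile.mkdtemp()  # stand-in for the scratch-dir root
path = make_user_scratch(root, "alice")
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))
```

Because each user owns their own subdirectory, no directory needs to be world-writable for sessions of different users to coexist under one root.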
Re: Review Request 23320: HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23320/#review50174 --- service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java https://reviews.apache.org/r/23320/#comment87765 I'll move this back to HiveSessionImpl#open as this won't pick the doAs setting since open goes through the appropriate proxy (which has UGI.doAs). - Vaibhav Gumashta On Aug. 6, 2014, 4:11 p.m., Vaibhav Gumashta wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23320/ --- (Updated Aug. 6, 2014, 4:11 p.m.) Review request for hive, Navis Ryu, Sushanth Sowmyan, Szehon Ho, and Thejas Nair. Bugs: HIVE-7353 https://issues.apache.org/jira/browse/HIVE-7353 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-7353 Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8490558 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ff282c5 metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 760777a ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ebf2443 service/src/java/org/apache/hive/service/cli/CLIService.java 80d7b82 service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java de54ca1 service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java b39d64d service/src/java/org/apache/hive/service/cli/session/SessionManager.java c2f0495 service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java b009a88 service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java be2eb01 service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 98d75b5 service/src/java/org/apache/hive/service/server/ThreadFactoryWithGarbageCleanup.java PRE-CREATION service/src/java/org/apache/hive/service/server/ThreadWithGarbageCleanup.java PRE-CREATION Diff: https://reviews.apache.org/r/23320/diff/ Testing --- Manual testing using Yourkit. Thanks, Vaibhav Gumashta
[jira] [Commented] (HIVE-7669) parallel order by clause on a string column fails with IOException: Split points are out of order
[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092649#comment-14092649 ] Hive QA commented on HIVE-7669: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660933/HIVE-7669.1.patch.txt {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5888 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/250/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/250/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-250/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12660933 parallel order by clause on a string column fails with IOException: Split points are out of order - Key: HIVE-7669 URL: https://issues.apache.org/jira/browse/HIVE-7669 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor, SQL Affects Versions: 0.12.0 Environment: Hive 0.12.0-cdh5.0.0 OS: Redhat linux Reporter: Vishal Kamath Assignee: Navis Labels: orderby Attachments: HIVE-7669.1.patch.txt The source table has 600 Million rows and it has a String column l_shipinstruct which has 4 unique values. (Ie. these 4 values are repeated across the 600 million rows) We are sorting it based on this string column l_shipinstruct as shown in the below HiveQL with the following parameters. {code:sql} set hive.optimize.sampling.orderby=true; set hive.optimize.sampling.orderby.number=1000; set hive.optimize.sampling.orderby.percent=0.1f; insert overwrite table lineitem_temp_report select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment from lineitem order by l_shipinstruct; {code} Stack Trace Diagnostic Messages for this Task: {noformat} Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) 
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 10 more Caused by: java.lang.IllegalArgumentException: Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116) at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42) at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37) ... 15 more Caused by: java.io.IOException: Split points are out of order at
[jira] [Commented] (HIVE-4064) Handle db qualified names consistently across all HiveQL statements
[ https://issues.apache.org/jira/browse/HIVE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092739#comment-14092739 ] Hive QA commented on HIVE-4064: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660939/HIVE-4064.8.patch.txt {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5874 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/251/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/251/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-251/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12660939 Handle db qualified names consistently across all HiveQL statements --- Key: HIVE-4064 URL: https://issues.apache.org/jira/browse/HIVE-4064 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Navis Attachments: HIVE-4064-1.patch, HIVE-4064.1.patch.txt, HIVE-4064.2.patch.txt, HIVE-4064.3.patch.txt, HIVE-4064.4.patch.txt, HIVE-4064.5.patch.txt, HIVE-4064.6.patch.txt, HIVE-4064.7.patch.txt, HIVE-4064.8.patch.txt Hive doesn't consistently handle db qualified names across all HiveQL statements. 
While some HiveQL statements such as SELECT support DB qualified names, other such as CREATE INDEX doesn't. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7648) authorization api should provide table/db object for create table/dbname
[ https://issues.apache.org/jira/browse/HIVE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092793#comment-14092793 ] Hive QA commented on HIVE-7648: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660937/HIVE-7648.1.patch {color:red}ERROR:{color} -1 due to 1074 failed/errored test(s), 5890 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketizedhiveinputformat_auto org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin_negative org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin_negative2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin_negative3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_3 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_case_sensitivity org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cast1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_join1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_nested_types org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_pad_convert org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_serde org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_udf1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_union1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_varchar_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_colstats_all_nulls org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_column_access_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnarserde_create_shortcut org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnstats_partlvl org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnstats_partlvl_dp org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnstats_tbllvl org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_compute_stats_binary org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_compute_stats_boolean 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_compute_stats_decimal org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_compute_stats_double org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_compute_stats_empty_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_compute_stats_long org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_compute_stats_string org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_concatenate_inherit_table_location org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constprog_dp org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constprog_type org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_convert_enum_to_string org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer5
[jira] [Commented] (HIVE-7532) allow disabling direct sql per query with external metastore
[ https://issues.apache.org/jira/browse/HIVE-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092873#comment-14092873 ] Hive QA commented on HIVE-7532: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660945/HIVE-7532.6.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5889 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/253/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/253/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-253/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12660945 allow disabling direct sql per query with external metastore Key: HIVE-7532 URL: https://issues.apache.org/jira/browse/HIVE-7532 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Navis Attachments: HIVE-7532.1.patch.txt, HIVE-7532.2.nogen, HIVE-7532.2.patch.txt, HIVE-7532.3.patch.txt, HIVE-7532.4.patch.txt, HIVE-7532.5.patch.txt, HIVE-7532.6.patch.txt Currently with external metastore, direct sql can only be disabled via metastore config globally. Perhaps it makes sense to have the ability to propagate the setting per query from client to override the metastore setting, e.g. if one particular query causes it to fail. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4629: --- Assignee: Dong Chen (was: Shreepadma Venugopalan) HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Dong Chen Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, HIVE-4629.2.patch, HIVE-4629.3.patch.txt, HIVE-4629.4.patch, HIVE-4629.5.patch, HIVE-4629.6.patch HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092876#comment-14092876 ] Brock Noland commented on HIVE-4629: Nice work [~dongc]!! [~thejas] [~cwsteinbach] you two had some good feedback on the earlier design. Can you take a look at the latest patch? [~romainr], I know Hue uses this API, do you want to take a look? HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Dong Chen Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, HIVE-4629.2.patch, HIVE-4629.3.patch.txt, HIVE-4629.4.patch, HIVE-4629.5.patch, HIVE-4629.6.patch HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092900#comment-14092900 ] Brock Noland commented on HIVE-7624: Nice work!! bq. The patch does not appear to apply with p0, p1, or p2 Looks like the patch needs to be rebased. Reduce operator initialization failed when running multiple MR query on spark - Key: HIVE-7624 URL: https://issues.apache.org/jira/browse/HIVE-7624 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, HIVE-7624.4-spark.patch, HIVE-7624.5-spark.patch, HIVE-7624.patch The following error occurs when I try to run a query with multiple reduce works (M-R-R): {quote} 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1) java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) … {quote} I suspect we're applying the reduce function in the wrong order. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24445: HIVE-7642, Set hive input format by configuration.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24445/#review50178 --- Thank you so much! We can commit this very soon. Just two small nits below and then we'll commit this. ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java https://reviews.apache.org/r/24445/#comment87773 nit: Can the right-hand side here use StringUtils.isBlank so we can avoid the double negative? https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringUtils.html#isBlank(java.lang.String) ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java https://reviews.apache.org/r/24445/#comment87774 nit: How about changing this to: String msg = "Failed to load specified input format class: " + inpFormat; LOG.error(msg, e); throw new HiveException(msg, e); which might provide better information to our users? - Brock Noland On Aug. 11, 2014, 7:45 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24445/ --- (Updated Aug. 11, 2014, 7:45 a.m.) Review request for hive, Brock Noland and Szehon Ho. Bugs: HIVE-7642 https://issues.apache.org/jira/browse/HIVE-7642 Repository: hive-git Description --- Currently the Hive input format is hard-coded as HiveInputFormat; we should set this parameter from configuration. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 45eff67 Diff: https://reviews.apache.org/r/24445/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 24497: HIVE-7629 - Map joins between two parquet tables failing
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24497/#review50182 --- Thank you very much! Two comments below. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java https://reviews.apache.org/r/24497/#comment87783 nit: Missing space between if and ( ql/src/test/queries/clientpositive/parquet_join.q https://reviews.apache.org/r/24497/#comment87782 Can you add comments (start with --) which describe how this reproduces the bug? - Brock Noland On Aug. 8, 2014, 6:21 a.m., Suma Shivaprasad wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24497/ --- (Updated Aug. 8, 2014, 6:21 a.m.) Review request for hive. Bugs: HIVE-7629 https://issues.apache.org/jira/browse/HIVE-7629 Repository: hive-git Description --- Map Joins between 2 parquet tables are failing since the Mapper is trying to access the columns of the first table (the bigger table) while trying to load the second table (the smaller map-join table). Fixed this by adding a guard on the column indexes passed by Hive. Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java 2f155f6 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java d6be4bd ql/src/test/queries/clientpositive/parquet_join.q PRE-CREATION ql/src/test/results/clientpositive/parquet_join.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24497/diff/ Testing --- parquet_join.q covers most types of joins between 2 parquet tables - Normal, Map join, SMB join Thanks, Suma Shivaprasad
[jira] [Commented] (HIVE-7160) Vectorization Udf: GenericUDFConcat for non-string columns input, is not supported
[ https://issues.apache.org/jira/browse/HIVE-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092991#comment-14092991 ] Ashutosh Chauhan commented on HIVE-7160: I think the design issue here is the one raised in HIVE-7632. {{Vectorizer}} currently inserts casts and then evaluates them so that the types of all operands match for the UDF. It does so because Hive currently doesn't upcast operands while it does semantic checking, and leaves this to runtime, where it is achieved mainly via the logic in {{GenericUDFBaseNumeric}}. Instead of delegating type casting to runtime, this should happen at compile time, when we are doing type checking, which should upcast operands as necessary. Once we do this in {{TypeCheckProcFactory}} there will be no need to insert and evaluate casts later in compilation (like the vectorizer) or at runtime (GenericUDFOpNumeric). Vectorization Udf: GenericUDFConcat for non-string columns input, is not supported -- Key: HIVE-7160 URL: https://issues.apache.org/jira/browse/HIVE-7160 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Navis Priority: Minor Attachments: HIVE-7160.1.patch.txt Simple UDF missing vectorization - a simple example would be: hive> explain select concat( l_orderkey, ' msecs') from lineitem; is not vectorized, while hive> explain select concat(cast(l_orderkey as string), ' msecs') from lineitem; can be vectorized. {code} 14/05/31 15:28:59 [main]: DEBUG vector.VectorizationContext: No vector udf found for GenericUDFConcat, descriptor: Argument Count = 2, mode = PROJECTION, Argument Types = {LONG, STRING}, Input Expression Types = {COLUMN,COLUMN} 14/05/31 15:28:59 [main]: DEBUG physical.Vectorizer: Failed to vectorize org.apache.hadoop.hive.ql.metadata.HiveException: Udf: GenericUDFConcat, is not supported at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:918) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
Need help on HIVE-7653 (AvroSerde)
Hi, I submitted a patch for the following issue: https://issues.apache.org/jira/browse/HIVE-7653 But the build is failing due to some other issue. It's been failing for the past 70 builds or so and I don't think it's related to my change. Also, my local build is passing. Can someone please help me override/fix this test failure? Thanks Sachin
[jira] [Updated] (HIVE-4064) Handle db qualified names consistently across all HiveQL statements
[ https://issues.apache.org/jira/browse/HIVE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4064: Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks for the contribution [~navis]! This was some long-pending cleanup! Handle db qualified names consistently across all HiveQL statements --- Key: HIVE-4064 URL: https://issues.apache.org/jira/browse/HIVE-4064 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.10.0 Reporter: Shreepadma Venugopalan Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-4064-1.patch, HIVE-4064.1.patch.txt, HIVE-4064.2.patch.txt, HIVE-4064.3.patch.txt, HIVE-4064.4.patch.txt, HIVE-4064.5.patch.txt, HIVE-4064.6.patch.txt, HIVE-4064.7.patch.txt, HIVE-4064.8.patch.txt Hive doesn't consistently handle db qualified names across all HiveQL statements. While some HiveQL statements such as SELECT support DB qualified names, others such as CREATE INDEX don't. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7678) add more test cases for tables qualified with database/schema name
[ https://issues.apache.org/jira/browse/HIVE-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7678: Description: HIVE-4064 fixed many cases where table names qualified with database names could not be used (eg db1.table1). The fix needs more test cases. add more test cases for tables qualified with database/schema name -- Key: HIVE-7678 URL: https://issues.apache.org/jira/browse/HIVE-7678 Project: Hive Issue Type: Bug Components: Tests Reporter: Thejas M Nair Assignee: Thejas M Nair HIVE-4064 fixed many cases where table names qualified with database names could not be used (eg db1.table1). The fix needs more test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7678) add more test cases for tables qualified with database/schema name
Thejas M Nair created HIVE-7678: --- Summary: add more test cases for tables qualified with database/schema name Key: HIVE-7678 URL: https://issues.apache.org/jira/browse/HIVE-7678 Project: Hive Issue Type: Bug Components: Tests Reporter: Thejas M Nair Assignee: Thejas M Nair -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7658) Hive search order for hive-site.xml when using --config option
[ https://issues.apache.org/jira/browse/HIVE-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093032#comment-14093032 ] Venki Korukanti commented on HIVE-7658: --- Hive uses ClassLoader.getResource("hive-site.xml") to find the path to the hive-site.xml file. The ClassLoader is retrieved using Thread.currentThread().getContextClassLoader(), which returns a chain of class loaders. One of the ClassLoaders in the chain is sun.misc.Launcher$AppClassLoader. This particular ClassLoader treats an empty entry in the classpath (example: /path/to/jar1.jar::/path/to/jar2) as the current working directory of the process (see [here|https://community.oracle.com/thread/2456122?start=0tstart=0]). If you look at the classpath of the Hive process, there is one such empty entry after the hadoop jars and before the hive conf dir and hive jars. As the empty entry comes before the hive conf directory, the ClassLoader picks up the first occurrence of hive-site.xml in the current working directory. Looking at the Hive scripts, the empty path is introduced by the hive scripts themselves. The following line in the {{bin/hive}} script causes an extra ':' before the hive-constructed classpath when HADOOP_CLASSPATH is empty; the hadoop script then adds another ':' to its classpath and appends the given hive classpath.
{code} export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${CLASSPATH} {code} Hive search order for hive-site.xml when using --config option -- Key: HIVE-7658 URL: https://issues.apache.org/jira/browse/HIVE-7658 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Environment: Red Hat Enterprise Linux Server release 5.9 (Tikanga) Hive 0.13.0-mapr-1406 Subversion git://rhbuild/root/builds/opensource/node/ecosystem/dl/hive -r 4ff8f8b4a8fc4862727108204399710ef7ee7abc Compiled by root on Tue Jul 1 14:18:09 PDT 2014 From source with checksum 208afc25260342b51aefd2e0edf4c9d6 Reporter: James Spurin Priority: Minor When using the hive cli, the tool appears to favour a hive-site.xml file in the current working directory even if the --config option is used with a valid directory containing a hive-site.xml file. I would have expected the directory specified with --config to take precedence in the CLASSPATH search order. Here's an example -

/home/spurija/hive-site.xml:
<configuration>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/example1</value>
  </property>
</configuration>

/tmp/hive/hive-site.xml:
<configuration>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/example2</value>
  </property>
</configuration>

-bash-4.1$ diff /home/spurija/hive-site.xml /tmp/hive/hive-site.xml
23c23
< <value>/tmp/example1</value>
---
> <value>/tmp/example2</value>

{ check the value of scratchdir, should be example1 }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1

{ run with a specified config, check the value of scratchdir, should be example2 … still reported as example1 }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive --config /tmp/hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1

{ remove the local config, check the value of scratchdir, should be example2 … now correct }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ rm hive-site.xml
-bash-4.1$ hive --config /tmp/hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example2

Is this expected behavior or should it use the directory supplied with --config as the preferred configuration? -- This message was sent by Atlassian JIRA (v6.2#6252)
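The empty-classpath-entry problem described in the comment above, and the guard against it, can be sketched in plain shell. This is an illustrative sketch only: the jar paths below are made up, and the real bin/hive and hadoop scripts do more than this.

```shell
# Mimic the offending line from bin/hive when HADOOP_CLASSPATH is empty.
CLASSPATH="/path/to/hive-conf:/path/to/hive-exec.jar"
HADOOP_CLASSPATH=""

# Unconditional concatenation leaves a leading ':' -- an empty classpath
# entry, which the JVM's AppClassLoader resolves to the current directory.
broken="${HADOOP_CLASSPATH}:${CLASSPATH}"
echo "$broken"    # :/path/to/hive-conf:/path/to/hive-exec.jar

# Guarded version: only prepend HADOOP_CLASSPATH when it is non-empty,
# so no empty entry is ever produced.
if [ -n "${HADOOP_CLASSPATH}" ]; then
  fixed="${HADOOP_CLASSPATH}:${CLASSPATH}"
else
  fixed="${CLASSPATH}"
fi
echo "$fixed"     # /path/to/hive-conf:/path/to/hive-exec.jar
```

With the guard in place, the hive conf directory is the first place the class loader finds hive-site.xml, rather than the process's working directory.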
[jira] [Updated] (HIVE-7541) Support union all on Spark
[ https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Na Yang updated HIVE-7541: -- Attachment: HIVE-7541.2-spark.patch Support union all on Spark -- Key: HIVE-7541 URL: https://issues.apache.org/jira/browse/HIVE-7541 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Na Yang Attachments: HIVE-7541.1-spark.patch, HIVE-7541.2-spark.patch, Hive on Spark Union All design.pdf For union all operator, we will use Spark's union transformation. Refer to the design doc on wiki for more information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7658) Hive search order for hive-site.xml when using --config option
[ https://issues.apache.org/jira/browse/HIVE-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti updated HIVE-7658: -- Attachment: HIVE-7658.1.patch Attached patch resolves the issue by checking whether HADOOP_CLASSPATH is non-empty before using it. Hive search order for hive-site.xml when using --config option -- Key: HIVE-7658 URL: https://issues.apache.org/jira/browse/HIVE-7658 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Environment: Red Hat Enterprise Linux Server release 5.9 (Tikanga) Hive 0.13.0-mapr-1406 Subversion git://rhbuild/root/builds/opensource/node/ecosystem/dl/hive -r 4ff8f8b4a8fc4862727108204399710ef7ee7abc Compiled by root on Tue Jul 1 14:18:09 PDT 2014 From source with checksum 208afc25260342b51aefd2e0edf4c9d6 Reporter: James Spurin Priority: Minor Attachments: HIVE-7658.1.patch When using the hive cli, the tool appears to favour a hive-site.xml file in the current working directory even if the --config option is used with a valid directory containing a hive-site.xml file. I would have expected the directory specified with --config to take precedence in the CLASSPATH search order. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7658) Hive search order for hive-site.xml when using --config option
[ https://issues.apache.org/jira/browse/HIVE-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti reassigned HIVE-7658: - Assignee: Venki Korukanti Hive search order for hive-site.xml when using --config option -- Key: HIVE-7658 URL: https://issues.apache.org/jira/browse/HIVE-7658 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Environment: Red Hat Enterprise Linux Server release 5.9 (Tikanga) Hive 0.13.0-mapr-1406 Subversion git://rhbuild/root/builds/opensource/node/ecosystem/dl/hive -r 4ff8f8b4a8fc4862727108204399710ef7ee7abc Compiled by root on Tue Jul 1 14:18:09 PDT 2014 From source with checksum 208afc25260342b51aefd2e0edf4c9d6 Reporter: James Spurin Assignee: Venki Korukanti Priority: Minor Attachments: HIVE-7658.1.patch When using the hive cli, the tool appears to favour a hive-site.xml file in the current working directory even if the --config option is used with a valid directory containing a hive-site.xml file. I would have expected the directory specified with --config to take precedence in the CLASSPATH search order. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7541) Support union all on Spark
[ https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093038#comment-14093038 ] Na Yang commented on HIVE-7541: --- Hi Szehon, Thank you for the comments. Please review the new patch. Thanks, Na Support union all on Spark -- Key: HIVE-7541 URL: https://issues.apache.org/jira/browse/HIVE-7541 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Na Yang Attachments: HIVE-7541.1-spark.patch, HIVE-7541.2-spark.patch, Hive on Spark Union All design.pdf For union all operator, we will use Spark's union transformation. Refer to the design doc on wiki for more information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7646) Modify parser to support new grammar for Insert,Update,Delete
[ https://issues.apache.org/jira/browse/HIVE-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7646: - Attachment: delete.patch Changes to parse DELETE. Modify parser to support new grammar for Insert,Update,Delete - Key: HIVE-7646 URL: https://issues.apache.org/jira/browse/HIVE-7646 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: 0.13.1 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: delete.patch Need the parser to recognize constructs such as INSERT INTO Cust (Customer_Number, Balance, Address) VALUES (101, 50.00, '123 Main Street'), (102, 75.00, '123 Pine Ave'); DELETE FROM Cust WHERE Balance > 5.0; UPDATE Cust SET column1=value1,column2=value2,... WHERE some_column=some_value; -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7142) Hive multi serialization encoding support
[ https://issues.apache.org/jira/browse/HIVE-7142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093069#comment-14093069 ] Hive QA commented on HIVE-7142: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660943/HIVE-7142.3.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5873 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/254/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/254/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-254/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12660943 Hive multi serialization encoding support - Key: HIVE-7142 URL: https://issues.apache.org/jira/browse/HIVE-7142 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7142.1.patch.txt, HIVE-7142.2.patch, HIVE-7142.3.patch Currently Hive only supports serializing data into UTF-8 bytes and deserializing from UTF-8 bytes; real-world users may want to load other kinds of encoded data into Hive directly. This jira is dedicated to supporting serialization/deserialization of all kinds of encoded data in the SerDe layer. 
Users only need to configure the serialization encoding at the table level by setting it through a SerDe parameter, for example: {code:sql} CREATE TABLE person(id INT, name STRING, desc STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES('serialization.encoding'='GBK'); {code} or {code:sql} ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); {code} LIMITATIONS: Only LazySimpleSerDe supports the serialization.encoding property in this patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
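As context for why a table-level encoding setting matters: the same characters occupy different byte sequences in GBK and UTF-8, so a SerDe that assumes UTF-8 would misinterpret GBK-encoded rows. A small demonstration of the byte-level difference, assuming an iconv build with GBK support (standard on glibc systems):

```shell
# Two Chinese characters: 3 bytes each in UTF-8, 2 bytes each in GBK.
utf8_len=$(printf '你好' | wc -c)
gbk_len=$(printf '你好' | iconv -f UTF-8 -t GBK | wc -c)
echo "UTF-8 bytes: $utf8_len"   # 6
echo "GBK bytes:   $gbk_len"    # 4
```

A charset-aware SerDe has to perform the equivalent of the iconv step when reading or writing table data; with 'serialization.encoding'='GBK', LazySimpleSerDe decodes the 4-byte form instead of rejecting it as invalid UTF-8.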
[jira] [Updated] (HIVE-7541) Support union all on Spark
[ https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Na Yang updated HIVE-7541: -- Attachment: (was: HIVE-7541.2-spark.patch) Support union all on Spark -- Key: HIVE-7541 URL: https://issues.apache.org/jira/browse/HIVE-7541 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Na Yang Attachments: HIVE-7541.1-spark.patch, HIVE-7541.2-spark.patch, Hive on Spark Union All design.pdf For union all operator, we will use Spark's union transformation. Refer to the design doc on wiki for more information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7541) Support union all on Spark
[ https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Na Yang updated HIVE-7541: -- Attachment: HIVE-7541.2-spark.patch Support union all on Spark -- Key: HIVE-7541 URL: https://issues.apache.org/jira/browse/HIVE-7541 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Na Yang Attachments: HIVE-7541.1-spark.patch, HIVE-7541.2-spark.patch, Hive on Spark Union All design.pdf For union all operator, we will use Spark's union transformation. Refer to the design doc on wiki for more information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4806) Add more implementations of JDBC API methods to Hive and Hive2 drivers
[ https://issues.apache.org/jira/browse/HIVE-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093076#comment-14093076 ] Hive QA commented on HIVE-4806: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12590514/HIVE-4806.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/255/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/255/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-255/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee 
/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-255/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java' Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/DelimitedJSONSerDe.java' Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java' Reverted 'serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen service/target contrib/target serde/target serde/src/java/org/apache/hadoop/hive/serde2/AbstractEncodingAwareSerDe.java beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Umetastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java Umetastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java Umetastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java Umetastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
Umetastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java Uql/src/java/org/apache/hadoop/hive/ql/plan/ShowGrantDesc.java Uql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableAlterPartDesc.java Uql/src/java/org/apache/hadoop/hive/ql/plan/AlterIndexDesc.java Uql/src/java/org/apache/hadoop/hive/ql/plan/PrivilegeObjectDesc.java Uql/src/java/org/apache/hadoop/hive/ql/plan/RenamePartitionDesc.java Uql/src/java/org/apache/hadoop/hive/ql/plan/ShowColumnsDesc.java Uql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableSimpleDesc.java Uql/src/java/org/apache/hadoop/hive/ql/parse/ColumnAccessInfo.java Uql/src/java/org/apache/hadoop/hive/ql/parse/IndexUpdater.java Uql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java Uql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java U ql/src/java/org/apache/hadoop/hive/ql/parse/authorization/HiveAuthorizationTaskFactoryImpl.java U
[jira] [Commented] (HIVE-7651) Investigate why union two RDDs generated from two MapTrans does not get the right result
[ https://issues.apache.org/jira/browse/HIVE-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093083#comment-14093083 ] Na Yang commented on HIVE-7651: --- This issue was caused by a single jobConf instance being used by multiple MapTrans. The fix is included in the patch of HIVE-7541. Investigate why union two RDDs generated from two MapTrans does not get the right result Key: HIVE-7651 URL: https://issues.apache.org/jira/browse/HIVE-7651 Project: Hive Issue Type: Bug Components: Spark Reporter: Na Yang If the SparkWork has two map works as root and the current generate(basework) API is used to generate two MapTrans, then unioning the RDDs processed by the two MapTrans does not produce the correct result. If the two input RDDs come from different data tables, the union result is empty; if they come from the same data table, the union result is incorrect: the same row of data appears 4 times in the union result. Need to investigate why this happens and how to fix it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7651) Investigate why union two RDDs generated from two MapTrans does not get the right result
[ https://issues.apache.org/jira/browse/HIVE-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Na Yang resolved HIVE-7651. --- Resolution: Implemented Assignee: Na Yang Investigate why union two RDDs generated from two MapTrans does not get the right result Key: HIVE-7651 URL: https://issues.apache.org/jira/browse/HIVE-7651 Project: Hive Issue Type: Bug Components: Spark Reporter: Na Yang Assignee: Na Yang If the SparkWork has two map works as root and the current generate(basework) API is used to generate two MapTrans, then unioning the RDDs processed by the two MapTrans does not produce the correct result. If the two input RDDs come from different data tables, the union result is empty; if they come from the same data table, the union result is incorrect: the same row of data appears 4 times in the union result. Need to investigate why this happens and how to fix it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4806) Add more implementations of JDBC API methods to Hive and Hive2 drivers
[ https://issues.apache.org/jira/browse/HIVE-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093105#comment-14093105 ] Alexander Pivovarov commented on HIVE-4806: --- I found at least 2 issues with HIVE-4806.patch. 1. getIdentifierQuoteString returns ' (a single quote). Actually there is only limited support for quoting identifiers in Hive: you can quote column names but not database or table names. That means identifier quoting is not fully supported, and getIdentifierQuoteString should most probably return a space (according to the JDBC spec, this method returns a space if identifier quoting is not supported). In that case SQL clients will generate correct SQL statements (without quotes for column, table and database names). 2. isReadOnly returns true, and the method description says "Returns a true as the database meta data is readonly." In fact the JDBC spec defines this method as "Retrieves whether this database is in read-only mode." So it is about the database, not about the metadata. In most cases Hive databases are NOT read-only: we can run CREATE TABLE AS SELECT, INSERT INTO TABLE, INSERT OVERWRITE. I think isReadOnly should return false. Look at my patch HIVE-7676. Add more implementations of JDBC API methods to Hive and Hive2 drivers -- Key: HIVE-4806 URL: https://issues.apache.org/jira/browse/HIVE-4806 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.11.0 Reporter: Matt Burgess Assignee: Matt Burgess Attachments: HIVE-4806.patch Third-party client software such as Pentaho Data Integration (PDI) uses many different JDBC API calls when interacting with JDBC data sources. Several of these calls have not yet been implemented in the Hive and Hive 2 drivers and by default will throw "Method not supported" SQLExceptions when there could be default implementations instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
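To see why the returned quote string matters, here is a hedged sketch of how a generic JDBC client typically uses getIdentifierQuoteString: per the JDBC spec, a single space means identifier quoting is unsupported, so the client must emit the identifier bare. The helper below is illustrative and not part of any driver.

```java
// Illustrative sketch of client-side identifier quoting driven by
// DatabaseMetaData.getIdentifierQuoteString(). A single space (" ")
// signals "quoting not supported" per the JDBC specification.
class QuoteSketch {
    static String quoteIdentifier(String identifier, String quoteString) {
        if (" ".equals(quoteString)) {
            return identifier;                      // quoting not supported: emit bare
        }
        return quoteString + identifier + quoteString;
    }

    public static void main(String[] args) {
        System.out.println(quoteIdentifier("col", "`")); // quoted form
        System.out.println(quoteIdentifier("col", " ")); // bare form
    }
}
```

This is why returning ' from a driver that cannot actually quote table or database names leads clients to generate invalid SQL, while returning a space keeps the generated statements unquoted and valid.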
[jira] [Created] (HIVE-7679) JOIN operator should update the column stats when number of rows changes
Prasanth J created HIVE-7679: Summary: JOIN operator should update the column stats when number of rows changes Key: HIVE-7679 URL: https://issues.apache.org/jira/browse/HIVE-7679 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Minor JOIN operator does not update the column stats when the number of rows changes. All other operators scale up/down the column statistics when the number of rows changes. The same should be done for the JOIN operator as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7679) JOIN operator should update the column stats when number of rows changes
[ https://issues.apache.org/jira/browse/HIVE-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7679: - Description: JOIN operator does not update the column stats when the number of rows changes. All other operators scale up/down the column statistics when the number of rows changes. The same should be done for the JOIN operator as well. Because of this, dataSize might become negative, as numNulls can get bigger than numRows (if scaling down of column stats is not done). (was: JOIN operator does not update the column stats when the number of rows changes. All other operators scales up/down the column statistics when the number of rows changes. Same should be done for JOIN operator as well. ) JOIN operator should update the column stats when number of rows changes Key: HIVE-7679 URL: https://issues.apache.org/jira/browse/HIVE-7679 Project: Hive Issue Type: Sub-task Components: Query Processor, Statistics Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Minor Fix For: 0.13.0 JOIN operator does not update the column stats when the number of rows changes. All other operators scale up/down the column statistics when the number of rows changes. The same should be done for the JOIN operator as well. Because of this, dataSize might become negative, as numNulls can get bigger than numRows (if scaling down of column stats is not done). -- This message was sent by Atlassian JIRA (v6.2#6252)
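A minimal sketch of the scaling rule described above, under the assumption that numNulls is scaled proportionally with the row count and capped at numRows, so that a derived dataSize can never go negative. The method name is hypothetical, not Hive's actual StatsUtils API.

```java
// Illustrative sketch: when an operator changes the estimated row count,
// per-column numNulls must be rescaled too, and capped at the new row
// count so numNulls can never exceed numRows.
class StatsScaleSketch {
    static long scaleNumNulls(long numNulls, long oldNumRows, long newNumRows) {
        if (oldNumRows <= 0) {
            return 0;                                   // no basis for scaling
        }
        long scaled = (long) Math.ceil((double) numNulls * newNumRows / oldNumRows);
        return Math.min(scaled, newNumRows);            // cap at numRows
    }

    public static void main(String[] args) {
        // 100 nulls out of 1000 rows, scaled down to 10 rows.
        System.out.println(scaleNumNulls(100, 1000, 10));
    }
}
```

Without the proportional scaling (or the cap), a JOIN that reduces numRows while keeping the old numNulls produces exactly the negative-dataSize symptom the issue describes.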
[jira] [Commented] (HIVE-7366) getDatabase using direct sql
[ https://issues.apache.org/jira/browse/HIVE-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093124#comment-14093124 ] Sergey Shelukhin commented on HIVE-7366: The comment correction for isConfigEnabled is not quite correct; we still use it in transactions if enabled. That code checks 2 config settings. Can you post an RB? getDatabase using direct sql Key: HIVE-7366 URL: https://issues.apache.org/jira/browse/HIVE-7366 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-7366.patch Given that get_database is easily one of the most frequent calls made on the metastore, we should have the ability to bypass datanucleus for that, and use direct SQL instead. This was something that I did initially as part of debugging HIVE-7368, but I think that given the frequency of this call, it's useful to have it in mainline direct sql. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7205) Wrong results when union all of grouping followed by group by with correlation optimization
[ https://issues.apache.org/jira/browse/HIVE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093129#comment-14093129 ] Yin Huai commented on HIVE-7205: Yeah, fixing correctness bug is very important. However, the current patch also introduces a significant refactoring of the query evaluation path. I am not sure if this refactoring will not break other things. [~navis] Can you post a summary of how those operators work with your refactoring? Wrong results when union all of grouping followed by group by with correlation optimization --- Key: HIVE-7205 URL: https://issues.apache.org/jira/browse/HIVE-7205 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: dima machlin Assignee: Navis Priority: Critical Attachments: HIVE-7205.1.patch.txt, HIVE-7205.2.patch.txt, HIVE-7205.3.patch.txt use case : table TBL (a string,b string) contains single row : 'a','a' the following query : {code:sql} select b, sum(cc) from ( select b,count(1) as cc from TBL group by b union all select a as b,count(1) as cc from TBL group by a ) z group by b {code} returns a 1 a 1 while set hive.optimize.correlation=true; if we change set hive.optimize.correlation=false; it returns correct results : a 2 The plan with correlation optimization : {code:sql} ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL b (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME DB TBL))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a) b) (TOK_SELEXPR (TOK_FUNCTION count 1) cc)) (TOK_GROUPBY (TOK_TABLE_OR_COL a) z)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL b)) (TOK_SELEXPR (TOK_FUNCTION sum (TOK_TABLE_OR_COL cc 
(TOK_GROUPBY (TOK_TABLE_OR_COL b STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: null-subquery1:z-subquery1:TBL TableScan alias: TBL Select Operator expressions: expr: b type: string outputColumnNames: b Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: b type: string mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 0 value expressions: expr: _col1 type: bigint null-subquery2:z-subquery2:TBL TableScan alias: TBL Select Operator expressions: expr: a type: string outputColumnNames: a Group By Operator aggregations: expr: count(1) bucketGroup: false keys: expr: a type: string mode: hash outputColumnNames: _col0, _col1 Reduce Output Operator key expressions: expr: _col0 type: string sort order: + Map-reduce partition columns: expr: _col0 type: string tag: 1 value expressions: expr: _col1 type: bigint Reduce Operator Tree: Demux Operator Group By Operator aggregations: expr: count(VALUE._col0) bucketGroup: false keys: expr: KEY._col0 type: string mode: mergepartial outputColumnNames: _col0, _col1 Select Operator expressions: expr: _col0 type: string
[jira] [Commented] (HIVE-6486) Support secure Subject.doAs() in HiveServer2 JDBC client.
[ https://issues.apache.org/jira/browse/HIVE-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093143#comment-14093143 ] Shivaraju Gowda commented on HIVE-6486: --- Should that closing curly bracket be included in the doc? Yes, that would help by making the method self-contained. Thanks for checking and documenting it. Support secure Subject.doAs() in HiveServer2 JDBC client. - Key: HIVE-6486 URL: https://issues.apache.org/jira/browse/HIVE-6486 Project: Hive Issue Type: Improvement Components: Authentication, HiveServer2, JDBC Affects Versions: 0.11.0, 0.12.0 Reporter: Shivaraju Gowda Assignee: Shivaraju Gowda Fix For: 0.13.0 Attachments: HIVE-6486.1.patch, HIVE-6486.2.patch, HIVE-6486.3.patch, HIVE-6486_Hive0.11.patch, TestCase_HIVE-6486.java HIVE-5155 addresses the problem of kerberos authentication in multi-user middleware server using proxy user. In this mode the principal used by the middle ware server has privileges to impersonate selected users in Hive/Hadoop. This enhancement is to support Subject.doAs() authentication in Hive JDBC layer so that the end users Kerberos Subject is passed through in the middle ware server. With this improvement there won't be any additional setup in the server to grant proxy privileges to some users and there won't be need to specify a proxy user in the JDBC client. This version should also be more secure since it won't require principals with the privileges to impersonate other users in Hive/Hadoop setup. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7366) getDatabase using direct sql
[ https://issues.apache.org/jira/browse/HIVE-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-7366: --- Status: Open (was: Patch Available) Unsetting patch-available, since some of the errors reported are relevant to this patch. Looking into it. getDatabase using direct sql Key: HIVE-7366 URL: https://issues.apache.org/jira/browse/HIVE-7366 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-7366.patch Given that get_database is easily one of the most frequent calls made on the metastore, we should have the ability to bypass datanucleus for that, and use direct SQL instead. This was something that I did initially as part of debugging HIVE-7368, but I think that given the frequency of this call, it's useful to have it in mainline direct sql. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7616) pre-size mapjoin hashtable based on statistics
[ https://issues.apache.org/jira/browse/HIVE-7616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-7616: --- Attachment: HIVE-7616.07.patch fix forgotten test output pre-size mapjoin hashtable based on statistics -- Key: HIVE-7616 URL: https://issues.apache.org/jira/browse/HIVE-7616 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-7616.01.patch, HIVE-7616.02.patch, HIVE-7616.03.patch, HIVE-7616.04.patch, HIVE-7616.05.patch, HIVE-7616.06.patch, HIVE-7616.07.patch, HIVE-7616.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7366) getDatabase using direct sql
[ https://issues.apache.org/jira/browse/HIVE-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093156#comment-14093156 ] Sushanth Sowmyan commented on HIVE-7366: Will do. I still need to update the patch a bit, and will upload it to rb with that. getDatabase using direct sql Key: HIVE-7366 URL: https://issues.apache.org/jira/browse/HIVE-7366 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-7366.patch Given that get_database is easily one of the most frequent calls made on the metastore, we should have the ability to bypass datanucleus for that, and use direct SQL instead. This was something that I did initially as part of debugging HIVE-7368, but I think that given the frequency of this call, it's useful to have it in mainline direct sql. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7532) allow disabling direct sql per query with external metastore
[ https://issues.apache.org/jira/browse/HIVE-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093157#comment-14093157 ] Sergey Shelukhin commented on HIVE-7532: +1 allow disabling direct sql per query with external metastore Key: HIVE-7532 URL: https://issues.apache.org/jira/browse/HIVE-7532 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Navis Attachments: HIVE-7532.1.patch.txt, HIVE-7532.2.nogen, HIVE-7532.2.patch.txt, HIVE-7532.3.patch.txt, HIVE-7532.4.patch.txt, HIVE-7532.5.patch.txt, HIVE-7532.6.patch.txt Currently with external metastore, direct sql can only be disabled via metastore config globally. Perhaps it makes sense to have the ability to propagate the setting per query from client to override the metastore setting, e.g. if one particular query causes it to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7541) Support union all on Spark
[ https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093168#comment-14093168 ] Szehon Ho commented on HIVE-7541: - Thanks, can you upload the new patch to the review board too so it's easier to look at? Support union all on Spark -- Key: HIVE-7541 URL: https://issues.apache.org/jira/browse/HIVE-7541 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Na Yang Attachments: HIVE-7541.1-spark.patch, HIVE-7541.2-spark.patch, Hive on Spark Union All design.pdf For union all operator, we will use Spark's union transformation. Refer to the design doc on wiki for more information. -- This message was sent by Atlassian JIRA (v6.2#6252)
ArrayWritableGroupConverter
Hi, I was just wondering how come the field count has to be either 1 or 2? I'm trying to read a column where the number of fields is 3, and I'm getting an invalid parquet hive schema error (in Hive 0.12) when I try to do so. It looks like it links back to here: https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java Thanks, -Raymond
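For context, a hedged reading of why the field count is restricted to 1 or 2 in that converter: a repeated group with one field is treated as a list element, and one with two fields as a map key/value pair, so any other count has no mapping to a Hive type and is rejected. This is an interpretation of the linked code, sketched with illustrative names, not the converter's actual implementation.

```java
// Illustrative sketch of the field-count constraint in the Hive Parquet
// array/map group converter: 1 field -> list element, 2 fields -> map
// key/value pair, anything else -> invalid schema.
class GroupFieldSketch {
    static String interpretFieldCount(int fieldCount) {
        if (fieldCount == 1) {
            return "array element";
        }
        if (fieldCount == 2) {
            return "map key/value pair";
        }
        throw new IllegalStateException(
            "Invalid parquet hive schema: repeated group has " + fieldCount + " fields");
    }

    public static void main(String[] args) {
        System.out.println(interpretFieldCount(1));
        System.out.println(interpretFieldCount(2));
    }
}
```

Under this reading, a 3-field repeated group would need to be modeled as a list of structs rather than flattened fields for the converter to accept it.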
[jira] [Updated] (HIVE-7658) Hive search order for hive-site.xml when using --config option
[ https://issues.apache.org/jira/browse/HIVE-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti updated HIVE-7658: -- Status: Patch Available (was: Open) Hive search order for hive-site.xml when using --config option -- Key: HIVE-7658 URL: https://issues.apache.org/jira/browse/HIVE-7658 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Environment: Red Hat Enterprise Linux Server release 5.9 (Tikanga) Hive 0.13.0-mapr-1406 Subversion git://rhbuild/root/builds/opensource/node/ecosystem/dl/hive -r 4ff8f8b4a8fc4862727108204399710ef7ee7abc Compiled by root on Tue Jul 1 14:18:09 PDT 2014 From source with checksum 208afc25260342b51aefd2e0edf4c9d6 Reporter: James Spurin Assignee: Venki Korukanti Priority: Minor Attachments: HIVE-7658.1.patch When using the hive cli, the tool appears to favour a hive-site.xml file in the current working directory even if the --config option is used with a valid directory containing a hive-site.xml file. I would have expected the directory specified with --config to take precedence in the CLASSPATH search order. 
Here's an example -

/home/spurija/hive-site.xml =
<configuration>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/example1</value>
  </property>
</configuration>

/tmp/hive/hive-site.xml =
<configuration>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/example2</value>
  </property>
</configuration>

-bash-4.1$ diff /home/spurija/hive-site.xml /tmp/hive/hive-site.xml
23c23
< <value>/tmp/example1</value>
---
> <value>/tmp/example2</value>

{ check the value of scratchdir, should be example1 }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1

{ run with a specified config, check the value of scratchdir, should be example2 … still reported as example1 }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ hive --config /tmp/hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example1

{ remove the local config, check the value of scratchdir, should be example2 … now correct }
-bash-4.1$ pwd
/home/spurija
-bash-4.1$ rm hive-site.xml
-bash-4.1$ hive --config /tmp/hive
Logging initialized using configuration in jar:file:/opt/mapr/hive/hive-0.13/lib/hive-common-0.13.0-mapr-1405.jar!/hive-log4j.properties
hive> set hive.exec.local.scratchdir;
hive.exec.local.scratchdir=/tmp/example2

Is this expected behavior or should it use the directory supplied with --config as the preferred configuration? -- This message was sent by Atlassian JIRA (v6.2#6252)
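The behavior in the report is consistent with first-match-wins classpath resolution: whichever directory containing a hive-site.xml appears first on the effective classpath supplies the configuration, so a copy in the current working directory can shadow the one passed via --config. A small sketch of that resolution order, with hypothetical names (this is not Hive's actual loading code):

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of first-match-wins config resolution: walk the
// classpath entries in order and return the first hive-site.xml found.
class ConfigOrderSketch {
    static String resolveHiveSite(List<String> classpath, List<String> dirsWithHiveSite) {
        for (String dir : classpath) {
            if (dirsWithHiveSite.contains(dir)) {
                return dir + "/hive-site.xml";   // first match wins
            }
        }
        return null;                             // no hive-site.xml anywhere
    }

    public static void main(String[] args) {
        // The cwd precedes the --config directory on the classpath.
        List<String> cp = Arrays.asList("/home/spurija", "/tmp/hive");
        System.out.println(resolveHiveSite(cp, Arrays.asList("/home/spurija", "/tmp/hive")));
        System.out.println(resolveHiveSite(cp, Arrays.asList("/tmp/hive")));
    }
}
```

Removing the copy in the current working directory removes the earlier match, which matches the reporter's observation that the --config value only takes effect after `rm hive-site.xml`.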
Re: Review Request 24377: HIVE-7142 Hive multi serialization encoding support
On Aug. 11, 2014, 4:52 a.m., Brock Noland wrote: serde/src/java/org/apache/hadoop/hive/serde2/AbstractEncodingAwareSerDe.java, line 43 https://reviews.apache.org/r/24377/diff/3/?file=653662#file653662line43 Can we make these constants? serialization.encoding is probably already available somewhere. chengxiang li wrote: add serialization.encoding to serdeConstant class if that's what you mean here. That file is auto-generated. In order to add the constant there, you'll have to edit serde/if/serde.thrift and then re-generate. - Brock --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24377/#review50145 --- On Aug. 11, 2014, 7:30 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24377/ --- (Updated Aug. 11, 2014, 7:30 a.m.) Review request for hive. Bugs: HIVE-7142 https://issues.apache.org/jira/browse/HIVE-7142 Repository: hive-git Description --- Currently Hive only supports serializing data into UTF-8 bytes and deserializing from UTF-8 bytes; real-world users may want to load differently encoded data into Hive directly. This jira is dedicated to supporting serialization/deserialization of all kinds of encoded data in the SerDe layer. Users only need to configure the serialization encoding at the table level by setting it through a SerDe property, for example: CREATE TABLE person(id INT, name STRING, desc STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES('serialization.encoding'='GBK'); or ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); LIMITATIONS: Only LazySimpleSerDe supports the serialization.encoding property in this patch. 
Diffs - serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java 515cf25 serde/src/java/org/apache/hadoop/hive/serde2/AbstractEncodingAwareSerDe.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/DelimitedJSONSerDe.java 179f9b5 serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java b7fb048 serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java fb55c70 Diff: https://reviews.apache.org/r/24377/diff/ Testing --- Thanks, chengxiang li
[jira] [Updated] (HIVE-7678) add more test cases for tables qualified with database/schema name
[ https://issues.apache.org/jira/browse/HIVE-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7678: Attachment: HIVE-7678.1.patch add more test cases for tables qualified with database/schema name -- Key: HIVE-7678 URL: https://issues.apache.org/jira/browse/HIVE-7678 Project: Hive Issue Type: Bug Components: Tests Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7678.1.patch HIVE-4064 fixed many cases where table names qualified with database names could not be used (eg db1.table1). The fix needs more test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7678) add more test cases for tables qualified with database/schema name
[ https://issues.apache.org/jira/browse/HIVE-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7678: Status: Patch Available (was: Open) add more test cases for tables qualified with database/schema name -- Key: HIVE-7678 URL: https://issues.apache.org/jira/browse/HIVE-7678 Project: Hive Issue Type: Bug Components: Tests Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7678.1.patch HIVE-4064 fixed many cases where table names qualified with database names could not be used (eg db1.table1). The fix needs more test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7678) add more test cases for tables qualified with database/schema name
[ https://issues.apache.org/jira/browse/HIVE-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093201#comment-14093201 ] Thejas M Nair commented on HIVE-7678: - Added test cases for 'show partition', 'show table properties', msck. But I found parse issues in several alter table commands when qualified table names are used, I will open another jira for that. add more test cases for tables qualified with database/schema name -- Key: HIVE-7678 URL: https://issues.apache.org/jira/browse/HIVE-7678 Project: Hive Issue Type: Bug Components: Tests Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7678.1.patch HIVE-4064 fixed many cases where table names qualified with database names could not be used (eg db1.table1). The fix needs more test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7678) add more test cases for tables qualified with database/schema name
[ https://issues.apache.org/jira/browse/HIVE-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093203#comment-14093203 ] Thejas M Nair commented on HIVE-7678: - Some of the tests are taken from HIVE-3589. add more test cases for tables qualified with database/schema name -- Key: HIVE-7678 URL: https://issues.apache.org/jira/browse/HIVE-7678 Project: Hive Issue Type: Bug Components: Tests Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7678.1.patch HIVE-4064 fixed many cases where table names qualified with database names could not be used (eg db1.table1). The fix needs more test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7680) Do not throw SQLException for HiveStatement getMoreResults and setEscapeProcessing(false)
Alexander Pivovarov created HIVE-7680: - Summary: Do not throw SQLException for HiveStatement getMoreResults and setEscapeProcessing(false) Key: HIVE-7680 URL: https://issues.apache.org/jira/browse/HIVE-7680 Project: Hive Issue Type: Bug Components: JDBC Reporter: Alexander Pivovarov Priority: Minor 1. Some JDBC clients call the method setEscapeProcessing(false) (e.g. SQL Workbench). It looks like setEscapeProcessing(false) should do nothing, so let's do nothing instead of throwing SQLException. 2. getMoreResults is needed in case a Statement returns several ResultSets. Hive does not support multiple ResultSets, so this method can safely always return false. 3. getUpdateCount. Currently this method always returns 0. Hive cannot tell us how many rows were inserted. According to the JDBC spec it should return -1 if the current result is a ResultSet object or there are no more results. If this method returns 0, then after executing an insert statement the JDBC client shows that 0 rows were inserted, which is not true. If this method returns -1, then the JDBC client runs insert statements and shows that they executed successfully with no results returned. I think the latter behaviour is more correct. 4. Some methods in the Statement class should throw SQLFeatureNotSupportedException if they are not supported. The current implementation throws SQLException instead, which indicates a database access error. -- This message was sent by Atlassian JIRA (v6.2#6252)
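The getUpdateCount semantics argued for above can be sketched as pure logic: -1 signals "no count available" (the current result is a ResultSet, or there are no more results), which a client treats differently from "0 rows affected". Names below are illustrative, not HiveStatement's actual code.

```java
// Illustrative sketch of how a JDBC client interprets the pair
// (execute() returned a ResultSet?, getUpdateCount() value) per the
// JDBC spec: -1 means "no count", which is distinct from 0.
class UpdateCountSketch {
    static String interpret(boolean isResultSet, int updateCount) {
        if (isResultSet) {
            return "result set";            // query produced rows to fetch
        }
        if (updateCount == -1) {
            return "no more results";       // e.g. Hive cannot report a count
        }
        return updateCount + " rows affected";
    }

    public static void main(String[] args) {
        System.out.println(interpret(false, -1));
        System.out.println(interpret(false, 0));
    }
}
```

This is why returning 0 from HiveStatement.getUpdateCount makes clients report "0 rows inserted" for a successful insert, while -1 lets them report success without a (wrong) count.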
[jira] [Created] (HIVE-7681) qualified tablenames usage does not work with several alter-table commands
Thejas M Nair created HIVE-7681: --- Summary: qualified tablenames usage does not work with several alter-table commands Key: HIVE-7681 URL: https://issues.apache.org/jira/browse/HIVE-7681 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Changes were made in HIVE-4064 for use of qualified table names in more types of queries. But several alter table commands don't work with qualified names: - alter table default.tmpfoo set tblproperties (bar = bar value) - ALTER TABLE default.kv_rename_test CHANGE a a STRING - add/drop partition - alter index rebuild -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7627) FSStatsPublisher does not fit into Spark multi-thread task mode [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7627: --- Description: Hive table statistic failed on FSStatsPublisher mode, with the following exception in Spark executor side: {noformat} 14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception java.io.FileNotFoundException: ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1442) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525) Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy19.addBlock(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
[jira] [Updated] (HIVE-7680) Do not throw SQLException for HiveStatement getMoreResults and setEscapeProcessing(false)
[ https://issues.apache.org/jira/browse/HIVE-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7680: -- Attachment: HIVE-7680.patch Do not throw SQLException for HiveStatement getMoreResults and setEscapeProcessing(false) - Key: HIVE-7680 URL: https://issues.apache.org/jira/browse/HIVE-7680 Project: Hive Issue Type: Bug Components: JDBC Reporter: Alexander Pivovarov Priority: Minor Attachments: HIVE-7680.patch 1. Some JDBC clients (e.g. SQL Workbench) call the method setEscapeProcessing(false). It looks like setEscapeProcessing(false) should do nothing, so let's do nothing instead of throwing SQLException. 2. getMoreResults is needed in case a Statement returns several ResultSets. Hive does not support multiple ResultSets, so this method can safely always return false. 3. getUpdateCount. Currently this method always returns 0. Hive cannot tell us how many rows were inserted. According to the JDBC spec it should return -1 if the current result is a ResultSet object or there are no more results. If this method returns 0, then after executing an insert statement the JDBC client shows that 0 rows were inserted, which is not true. If this method returns -1, the JDBC client runs the insert statement and shows that it was executed successfully with no results returned. I think the latter behaviour is more correct. 4. Some methods in the Statement class should throw SQLFeatureNotSupportedException if they are not supported. The current implementation throws SQLException instead, which indicates a database access error. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7680) Do not throw SQLException for HiveStatement getMoreResults and setEscapeProcessing(false)
[ https://issues.apache.org/jira/browse/HIVE-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7680: -- Status: Patch Available (was: Open) Do not throw SQLException for HiveStatement getMoreResults and setEscapeProcessing(false) - Key: HIVE-7680 URL: https://issues.apache.org/jira/browse/HIVE-7680 Project: Hive Issue Type: Bug Components: JDBC Reporter: Alexander Pivovarov Priority: Minor Attachments: HIVE-7680.patch 1. Some JDBC clients (e.g. SQL Workbench) call the method setEscapeProcessing(false). It looks like setEscapeProcessing(false) should do nothing, so let's do nothing instead of throwing SQLException. 2. getMoreResults is needed in case a Statement returns several ResultSets. Hive does not support multiple ResultSets, so this method can safely always return false. 3. getUpdateCount. Currently this method always returns 0. Hive cannot tell us how many rows were inserted. According to the JDBC spec it should return -1 if the current result is a ResultSet object or there are no more results. If this method returns 0, then after executing an insert statement the JDBC client shows that 0 rows were inserted, which is not true. If this method returns -1, the JDBC client runs the insert statement and shows that it was executed successfully with no results returned. I think the latter behaviour is more correct. 4. Some methods in the Statement class should throw SQLFeatureNotSupportedException if they are not supported. The current implementation throws SQLException instead, which indicates a database access error. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7676) Support more methods in HiveDatabaseMetaData
[ https://issues.apache.org/jira/browse/HIVE-7676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-7676: -- Status: Patch Available (was: Open) Support more methods in HiveDatabaseMetaData Key: HIVE-7676 URL: https://issues.apache.org/jira/browse/HIVE-7676 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Alexander Pivovarov Attachments: HIVE-7676.patch I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirreL SQL shows databases only if supportsSchemasInTableDefinitions returns true. Also, Hive 0.14.0 supports UNION ALL and does not support UNION. We can indicate this in HiveDatabaseMetaData instead of throwing a Method Not supported exception. getIdentifierQuoteString should return a space if identifier quoting is not supported. http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7366) getDatabase using direct sql
[ https://issues.apache.org/jira/browse/HIVE-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-7366: --- Attachment: HIVE-7366.2.patch Updated patch to fix test failures - the test failures were due to my change not taking into account recent role/owner changes. getDatabase using direct sql Key: HIVE-7366 URL: https://issues.apache.org/jira/browse/HIVE-7366 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-7366.2.patch, HIVE-7366.patch Given that get_database is easily one of the most frequent calls made on the metastore, we should have the ability to bypass datanucleus for that, and use direct SQL instead. This was something that I did initially as part of debugging HIVE-7368, but I think that given the frequency of this call, it's useful to have it in mainline direct sql. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7366) getDatabase using direct sql
[ https://issues.apache.org/jira/browse/HIVE-7366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093230#comment-14093230 ] Sushanth Sowmyan commented on HIVE-7366: [~sershe], I've created a reviewboard link for the latest patch : https://reviews.apache.org/r/24574/ getDatabase using direct sql Key: HIVE-7366 URL: https://issues.apache.org/jira/browse/HIVE-7366 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-7366.2.patch, HIVE-7366.patch Given that get_database is easily one of the most frequent calls made on the metastore, we should have the ability to bypass datanucleus for that, and use direct SQL instead. This was something that I did initially as part of debugging HIVE-7368, but I think that given the frequency of this call, it's useful to have it in mainline direct sql. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7676) Support more methods in HiveDatabaseMetaData
[ https://issues.apache.org/jira/browse/HIVE-7676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093234#comment-14093234 ] Alexander Pivovarov commented on HIVE-7676: --- patch content: 1. getIdentifierQuoteString returns space 2. getIndexInfo returns empty ResultSet similar to getPrimaryKeys 3.1 supportsFullOuterJoins = true 3.2 supportsLimitedOuterJoins = true 4.1 supportsSchemasInDataManipulation = true 4.2 supportsSchemasInTableDefinitions = true 5.1 supportsUnion = false 5.2 supportsUnionAll = true 6. HiveResultSetMetaData.isReadOnly = true Support more methods in HiveDatabaseMetaData Key: HIVE-7676 URL: https://issues.apache.org/jira/browse/HIVE-7676 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.13.1 Reporter: Alexander Pivovarov Attachments: HIVE-7676.patch I noticed that some methods in HiveDatabaseMetaData throw exceptions instead of returning true/false. Many JDBC clients expect implementations of particular methods in order to work. E.g. SQuirreL SQL shows databases only if supportsSchemasInTableDefinitions returns true. Also, Hive 0.14.0 supports UNION ALL and does not support UNION. We can indicate this in HiveDatabaseMetaData instead of throwing a Method Not supported exception. getIdentifierQuoteString should return a space if identifier quoting is not supported. http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIdentifierQuoteString%28%29 -- This message was sent by Atlassian JIRA (v6.2#6252)
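The capability flags listed in the HIVE-7676 patch content above can be sketched as follows. The method names mirror java.sql.DatabaseMetaData, but this is an illustrative standalone class, not the actual HiveDatabaseMetaData source:

```java
// Illustrative sketch of the capability flags the HIVE-7676 patch sets;
// not the real org.apache.hive.jdbc.HiveDatabaseMetaData implementation.
class MetaDataCapabilitiesSketch {
    boolean supportsFullOuterJoins()            { return true;  }
    boolean supportsLimitedOuterJoins()         { return true;  }
    boolean supportsSchemasInDataManipulation() { return true;  }
    boolean supportsSchemasInTableDefinitions() { return true;  }
    // Per the ticket: Hive supports UNION ALL but not plain UNION.
    boolean supportsUnion()                     { return false; }
    boolean supportsUnionAll()                  { return true;  }

    // Per the DatabaseMetaData javadoc, return a single space when
    // identifier quoting is not supported.
    String getIdentifierQuoteString()           { return " ";   }

    public static void main(String[] args) {
        MetaDataCapabilitiesSketch m = new MetaDataCapabilitiesSketch();
        System.out.println(m.supportsUnionAll());              // true
        System.out.println(m.getIdentifierQuoteString().length()); // 1
    }
}
```

Returning accurate flags instead of throwing matters because clients such as SQuirreL SQL probe these methods up front and hide whole features (e.g. the schema list) when a probe throws.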
[jira] [Created] (HIVE-7682) HadoopThriftAuthBridge20S should not reset configuration unless required
Brock Noland created HIVE-7682: -- Summary: HadoopThriftAuthBridge20S should not reset configuration unless required Key: HIVE-7682 URL: https://issues.apache.org/jira/browse/HIVE-7682 Project: Hive Issue Type: Bug Reporter: Brock Noland In the HadoopThriftAuthBridge20S methods createClientWithConf and getCurrentUGIWithConf we create new Configuration objects so we can set the authentication type. When the new Configuration object is loaded, it looks for the core-site.xml of the cluster it's connected to. This causes issues for Oozie, since Oozie does not have access to that core-site.xml, as it is cluster-agnostic. -- This message was sent by Atlassian JIRA (v6.2#6252)
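The construction-pattern difference behind HIVE-7682 can be sketched with a hypothetical stand-in class (ConfSketch is not Hadoop's real org.apache.hadoop.conf.Configuration; it only models the two constructors): building a configuration from scratch pulls in classpath defaults such as core-site.xml, while copying the caller's existing configuration avoids that re-read.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration class, used only to
// illustrate the two construction patterns discussed in the ticket.
class ConfSketch {
    private final Map<String, String> props = new HashMap<>();
    private final boolean loadedClusterDefaults;

    // Mirrors `new Configuration()`: loads classpath resources such as
    // core-site.xml -- the behavior that breaks cluster-agnostic Oozie.
    ConfSketch() {
        this.loadedClusterDefaults = true;
    }

    // Mirrors `new Configuration(existing)`: copies the caller's settings
    // without re-reading cluster files.
    ConfSketch(ConfSketch existing) {
        this.props.putAll(existing.props);
        this.loadedClusterDefaults = false;
    }

    void set(String key, String value) { props.put(key, value); }
    String get(String key)             { return props.get(key); }
    boolean loadedClusterDefaults()    { return loadedClusterDefaults; }
}
```

Under this sketch, the fix the ticket asks for amounts to preferring the copy constructor, setting the authentication type on the copy, rather than constructing a fresh configuration that resets everything from cluster files.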