[jira] [Created] (HIVE-7178) Table alias cannot be used in GROUPING SETS clause if there is more than one column in it

2014-06-05 Thread Yibing Shi (JIRA)
Yibing Shi created HIVE-7178:


 Summary: Table alias cannot be used in GROUPING SETS clause if 
there is more than one column in it
 Key: HIVE-7178
 URL: https://issues.apache.org/jira/browse/HIVE-7178
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.13.0
Reporter: Yibing Shi


The following SQL doesn't work: 
EXPLAIN 
SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) 
FROM table_name alias 
GROUP BY alias.a, alias.b, alias.c 
GROUPING SETS( (alias.a), (alias.b, alias.a) ); 
FAILED: ParseException line 15:34 missing ) at ',' near 'EOF' 
line 16:0 extraneous input ')' expecting EOF near 'EOF' 

The following SQL works (without alias in grouping set): 
EXPLAIN 
SELECT a, b, c, COUNT(DISTINCT d) 
FROM table_name 
GROUP BY a, b, c 
GROUPING SETS( (a), (b, a) ); 

An alias works for just one column: 
EXPLAIN 
SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d) 
FROM table_name alias 
GROUP BY alias.a, alias.b, alias.c 
GROUPING SETS( (alias.a) ); 

Using aliases in GROUPING SETS could be very useful when multiple tables are 
involved in the SELECT (via JOIN).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3

2014-06-05 Thread Thejas Nair
Yu,
I am not able to reproduce this issue with hadoop 2.4.0 and tez 0.4.
I tried it with both the file system cache enabled and disabled.
What are the non-default configurations you have on your machine?

Thanks,
Thejas



On Wed, Jun 4, 2014 at 10:56 PM, Yu Azuryy azur...@outlook.com wrote:
 Hive-0.13.0 works well in my test cluster.





 -1

 Verified with hadoop-2.4.0 and tez-0.5-snapshot,
 Hive cannot start?

 And I also built hive branch-0.13, the same error.

 [test@vm-10-154-** tmp]$ hive
 14/06/05 13:53:45 INFO Configuration.deprecation: mapred.input.dir.recursive 
 is deprecated. Instead, use 
 mapreduce.input.fileinputformat.input.dir.recursive
 14/06/05 13:53:45 INFO Configuration.deprecation: mapred.max.split.size is 
 deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
 14/06/05 13:53:45 INFO Configuration.deprecation: mapred.min.split.size is 
 deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
 14/06/05 13:53:45 INFO Configuration.deprecation: 
 mapred.min.split.size.per.node is deprecated. Instead, use 
 mapreduce.input.fileinputformat.split.minsize.per.node
 14/06/05 13:53:45 INFO Configuration.deprecation: 
 mapred.min.split.size.per.rack is deprecated. Instead, use 
 mapreduce.input.fileinputformat.split.minsize.per.rack
 14/06/05 13:53:45 INFO Configuration.deprecation: mapred.reduce.tasks is 
 deprecated. Instead, use mapreduce.job.reduces
 14/06/05 13:53:45 INFO Configuration.deprecation: 
 mapred.reduce.tasks.speculative.execution is deprecated. Instead, use 
 mapreduce.reduce.speculative
 14/06/05 13:53:45 INFO Configuration.deprecation: 
 mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use 
 mapreduce.job.committer.setup.cleanup.needed
 14/06/05 13:53:45 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* 
 no longer has any effect.  Use hive.hmshandler.retry.* instead
 Logging initialized using configuration in 
 jar:file:/letv/hive-0.13.0/lib/hive-common-0.13.1.jar!/hive-log4j.properties
 Exception in thread main java.lang.RuntimeException: java.io.IOException: 
 Filesystem closed
  at 
 org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:357)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 Caused by: java.io.IOException: Filesystem closed
  at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
  at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
  at 
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1066)
  at 
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1062)
  at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1062)
  at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
  at org.apache.hadoop.fs.FileSystem.deleteOnExit(FileSystem.java:1353)
  at 
 org.apache.hadoop.hive.ql.exec.tez.TezSessionState.createTezDir(TezSessionState.java:297)
  at 
 org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:144)
  at 
 org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:123)
  at 
 org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:355)
  ... 7 more

 Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
 From: hbut...@hortonworks.com
 Date: Wed, 4 Jun 2014 22:42:32 -0700
 To: dev@hive.apache.org

 +1

 - Verified signature and checksum
 - Checked release notes
 - built source
 - ran a few unit tests
 - ran a few queries in local mode

 On Jun 2, 2014, at 7:20 PM, Thejas Nair the...@hortonworks.com wrote:

  +1
 
  - Verified signatures and checksum of both packages
  - Checked release notes
  - Build source package
  - Ran simple hive queries on newly built package in local mode
  - Ran simple queries on package on single node cluster with both tez
  and mr as execution engines
  - Ran unit tests
 
 
  On Mon, Jun 2, 2014 at 1:02 PM, Sushanth Sowmyan khorg...@apache.org 
  wrote:
  Apache Hive 0.13.1 Release Candidate 3 is available here:
 
  http://people.apache.org/~khorgath/releases/0.13.1_RC3/artifacts/
 
  Maven artifacts are available here:
 
  https://repository.apache.org/content/repositories/orgapachehive-1015
 
  Source tag for RC3 is at:
 
  http://svn.apache.org/viewvc/hive/tags/release-0.13.1-rc3/
 
  Voting will remain open for 72 hours.
 
  Hive PMC Members: Please test and vote.
 
  Thanks,
  -Sushanth
 

[jira] [Commented] (HIVE-1019) java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)

2014-06-05 Thread xuanjinlee (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018530#comment-14018530
 ] 

xuanjinlee commented on HIVE-1019:
--

Hello, how do I use this patch?

 java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
 

 Key: HIVE-1019
 URL: https://issues.apache.org/jira/browse/HIVE-1019
 Project: Hive
  Issue Type: Bug
  Components: Server Infrastructure
Affects Versions: 0.6.0
Reporter: Bennie Schut
Assignee: Bennie Schut
Priority: Minor
 Attachments: HIVE-1019-1.patch, HIVE-1019-2.patch, HIVE-1019-3.patch, 
 HIVE-1019-4.patch, HIVE-1019-5.patch, HIVE-1019-6.patch, HIVE-1019-7.patch, 
 HIVE-1019-8.patch, HIVE-1019.patch, stacktrace2.txt


 I keep getting errors like this:
 java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
 and :
 java.io.IOException: cannot find dir = 
 hdfs://victoria.ebuddy.com:9000/tmp/hive-dwh/801467596/10002 in 
 partToPartitionInfo!
 when running multiple threads with roughly similar queries.
 I have a patch for this which works for me.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7166) Vectorization with UDFs returns incorrect results

2014-06-05 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018536#comment-14018536
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-7166:
-

I looked at this issue. It seems that vectorization cannot be performed 
trivially for the above example, because constant folding is currently supported 
only for unary expressions in vectorization. Once HIVE-5771 is committed, this 
query can be vectorized. The current fix is to disable vectorization in such a 
scenario so that we fall back to row mode.
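
To sketch the idea (schematic only; ExprNode here is a hypothetical stand-in for 
Hive's expression descriptors, not the actual patch code):

{code}
import java.util.List;

// Hypothetical expression node, standing in for Hive's expression descriptors.
interface ExprNode {
    boolean isConstantFoldable();
    List<ExprNode> children();
}

final class VectorizationGuard {
    // Schematic version of the fix described above: decline vectorization when a
    // foldable sub-expression is non-unary, so the query falls back to row mode.
    static boolean canVectorize(ExprNode expr) {
        if (expr.isConstantFoldable() && expr.children().size() > 1) {
            return false;
        }
        for (ExprNode child : expr.children()) {
            if (!canVectorize(child)) {
                return false;
            }
        }
        return true;
    }
}
{code}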

cc-ing [~jnp] and [~ehans] for reviewing the patch.

 Vectorization with UDFs returns incorrect results
 -

 Key: HIVE-7166
 URL: https://issues.apache.org/jira/browse/HIVE-7166
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, UDF, Vectorization
Affects Versions: 0.13.0
 Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster 
Reporter: Benjamin Bowman
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Minor

 Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect 
 query results. 
 Example Query:  SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - 
 X) and UDF_1
 The following test scenario will reproduce the problem:
 TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 1):
 package com.test;
 import org.apache.hadoop.hive.ql.exec.Description;
 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import java.lang.String;
 import java.lang.*;
 public class tenThousand extends UDF {
   private final LongWritable result = new LongWritable();
   public LongWritable evaluate() {
     result.set(1);
     return result;
   }
 }
 TEST DATA (test.input):
 1|CBCABC|12
 2|DBCABC|13
 3|EBCABC|14
 4|ABCABC|15
 5|BBCABC|16
 6|CBCABC|17
 CREATING ORC TABLE:
 0: jdbc:hive2://server:10002/db> create table testTabOrc (first bigint, 
 second varchar(20), third int) partitioned by (range int) clustered by 
 (first) sorted by (first) into 8 buckets stored as orc tblproperties 
 ("orc.compress" = "SNAPPY", "orc.index" = "true");
 CREATE LOADING TABLE:
 0: jdbc:hive2://server:10002/db> create table loadingDir (first bigint, 
 second varchar(20), third int) partitioned by (range int) row format 
 delimited fields terminated by '|' stored as textfile;
 COPY IN DATA:
 [root@server]# hadoop fs -copyFromLocal /tmp/test.input /db/loading/.
 ORC DATA:
 [root@server]# beeline -u jdbc:hive2://server:10002/db -n root --hiveconf 
 hive.exec.dynamic.partition.mode=nonstrict --hiveconf 
 hive.enforce.sorting=true -e "insert into table testTabOrc partition(range) 
 select * from loadingDir;"
 LOAD TEST FUNCTION:
 0: jdbc:hive2://server:10002/db> add jar /opt/hadoop/lib/testFunction.jar
 0: jdbc:hive2://server:10002/db> create temporary function ten_thousand as 
 'com.test.tenThousand';
 TURN OFF VECTORIZATION:
 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=false;
 QUERY (RESULTS AS EXPECTED):
 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first 
 between ten_thousand()-1 and ten_thousand()-9995;
 +--------+
 | first  |
 +--------+
 | 1      |
 | 2      |
 | 3      |
 +--------+
 3 rows selected (15.286 seconds)
 TURN ON VECTORIZATION:
 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=true;
 QUERY AGAIN (WRONG RESULTS):
 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first 
 between ten_thousand()-1 and ten_thousand()-9995;
 +--------+
 | first  |
 +--------+
 +--------+
 No rows selected (17.763 seconds)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7166) Vectorization with UDFs returns incorrect results

2014-06-05 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-7166:


Status: Patch Available  (was: Open)

 Vectorization with UDFs returns incorrect results
 -

 Key: HIVE-7166
 URL: https://issues.apache.org/jira/browse/HIVE-7166
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.13.0
 Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster 
Reporter: Benjamin Bowman
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Minor
 Attachments: HIVE-7166.1.patch


 Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect 
 query results. 
 Example Query:  SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - 
 X) and UDF_1
 The following test scenario will reproduce the problem:
 TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 1):
 package com.test;
 import org.apache.hadoop.hive.ql.exec.Description;
 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import java.lang.String;
 import java.lang.*;
 public class tenThousand extends UDF {
   private final LongWritable result = new LongWritable();
   public LongWritable evaluate() {
     result.set(1);
     return result;
   }
 }
 TEST DATA (test.input):
 1|CBCABC|12
 2|DBCABC|13
 3|EBCABC|14
 4|ABCABC|15
 5|BBCABC|16
 6|CBCABC|17
 CREATING ORC TABLE:
 0: jdbc:hive2://server:10002/db> create table testTabOrc (first bigint, 
 second varchar(20), third int) partitioned by (range int) clustered by 
 (first) sorted by (first) into 8 buckets stored as orc tblproperties 
 ("orc.compress" = "SNAPPY", "orc.index" = "true");
 CREATE LOADING TABLE:
 0: jdbc:hive2://server:10002/db> create table loadingDir (first bigint, 
 second varchar(20), third int) partitioned by (range int) row format 
 delimited fields terminated by '|' stored as textfile;
 COPY IN DATA:
 [root@server]# hadoop fs -copyFromLocal /tmp/test.input /db/loading/.
 ORC DATA:
 [root@server]# beeline -u jdbc:hive2://server:10002/db -n root --hiveconf 
 hive.exec.dynamic.partition.mode=nonstrict --hiveconf 
 hive.enforce.sorting=true -e "insert into table testTabOrc partition(range) 
 select * from loadingDir;"
 LOAD TEST FUNCTION:
 0: jdbc:hive2://server:10002/db> add jar /opt/hadoop/lib/testFunction.jar
 0: jdbc:hive2://server:10002/db> create temporary function ten_thousand as 
 'com.test.tenThousand';
 TURN OFF VECTORIZATION:
 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=false;
 QUERY (RESULTS AS EXPECTED):
 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first 
 between ten_thousand()-1 and ten_thousand()-9995;
 +--------+
 | first  |
 +--------+
 | 1      |
 | 2      |
 | 3      |
 +--------+
 3 rows selected (15.286 seconds)
 TURN ON VECTORIZATION:
 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=true;
 QUERY AGAIN (WRONG RESULTS):
 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first 
 between ten_thousand()-1 and ten_thousand()-9995;
 +--------+
 | first  |
 +--------+
 +--------+
 No rows selected (17.763 seconds)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7166) Vectorization with UDFs returns incorrect results

2014-06-05 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-7166:


Component/s: (was: HiveServer2)
 (was: UDF)

 Vectorization with UDFs returns incorrect results
 -

 Key: HIVE-7166
 URL: https://issues.apache.org/jira/browse/HIVE-7166
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.13.0
 Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster 
Reporter: Benjamin Bowman
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Minor
 Attachments: HIVE-7166.1.patch


 Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect 
 query results. 
 Example Query:  SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - 
 X) and UDF_1
 The following test scenario will reproduce the problem:
 TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 1):
 package com.test;
 import org.apache.hadoop.hive.ql.exec.Description;
 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import java.lang.String;
 import java.lang.*;
 public class tenThousand extends UDF {
   private final LongWritable result = new LongWritable();
   public LongWritable evaluate() {
     result.set(1);
     return result;
   }
 }
 TEST DATA (test.input):
 1|CBCABC|12
 2|DBCABC|13
 3|EBCABC|14
 4|ABCABC|15
 5|BBCABC|16
 6|CBCABC|17
 CREATING ORC TABLE:
 0: jdbc:hive2://server:10002/db> create table testTabOrc (first bigint, 
 second varchar(20), third int) partitioned by (range int) clustered by 
 (first) sorted by (first) into 8 buckets stored as orc tblproperties 
 ("orc.compress" = "SNAPPY", "orc.index" = "true");
 CREATE LOADING TABLE:
 0: jdbc:hive2://server:10002/db> create table loadingDir (first bigint, 
 second varchar(20), third int) partitioned by (range int) row format 
 delimited fields terminated by '|' stored as textfile;
 COPY IN DATA:
 [root@server]# hadoop fs -copyFromLocal /tmp/test.input /db/loading/.
 ORC DATA:
 [root@server]# beeline -u jdbc:hive2://server:10002/db -n root --hiveconf 
 hive.exec.dynamic.partition.mode=nonstrict --hiveconf 
 hive.enforce.sorting=true -e "insert into table testTabOrc partition(range) 
 select * from loadingDir;"
 LOAD TEST FUNCTION:
 0: jdbc:hive2://server:10002/db> add jar /opt/hadoop/lib/testFunction.jar
 0: jdbc:hive2://server:10002/db> create temporary function ten_thousand as 
 'com.test.tenThousand';
 TURN OFF VECTORIZATION:
 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=false;
 QUERY (RESULTS AS EXPECTED):
 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first 
 between ten_thousand()-1 and ten_thousand()-9995;
 +--------+
 | first  |
 +--------+
 | 1      |
 | 2      |
 | 3      |
 +--------+
 3 rows selected (15.286 seconds)
 TURN ON VECTORIZATION:
 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=true;
 QUERY AGAIN (WRONG RESULTS):
 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first 
 between ten_thousand()-1 and ten_thousand()-9995;
 +--------+
 | first  |
 +--------+
 +--------+
 No rows selected (17.763 seconds)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7166) Vectorization with UDFs returns incorrect results

2014-06-05 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-7166:


Attachment: HIVE-7166.1.patch

 Vectorization with UDFs returns incorrect results
 -

 Key: HIVE-7166
 URL: https://issues.apache.org/jira/browse/HIVE-7166
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.13.0
 Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster 
Reporter: Benjamin Bowman
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Minor
 Attachments: HIVE-7166.1.patch


 Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect 
 query results. 
 Example Query:  SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - 
 X) and UDF_1
 The following test scenario will reproduce the problem:
 TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 1):
 package com.test;
 import org.apache.hadoop.hive.ql.exec.Description;
 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import java.lang.String;
 import java.lang.*;
 public class tenThousand extends UDF {
   private final LongWritable result = new LongWritable();
   public LongWritable evaluate() {
     result.set(1);
     return result;
   }
 }
 TEST DATA (test.input):
 1|CBCABC|12
 2|DBCABC|13
 3|EBCABC|14
 4|ABCABC|15
 5|BBCABC|16
 6|CBCABC|17
 CREATING ORC TABLE:
 0: jdbc:hive2://server:10002/db> create table testTabOrc (first bigint, 
 second varchar(20), third int) partitioned by (range int) clustered by 
 (first) sorted by (first) into 8 buckets stored as orc tblproperties 
 ("orc.compress" = "SNAPPY", "orc.index" = "true");
 CREATE LOADING TABLE:
 0: jdbc:hive2://server:10002/db> create table loadingDir (first bigint, 
 second varchar(20), third int) partitioned by (range int) row format 
 delimited fields terminated by '|' stored as textfile;
 COPY IN DATA:
 [root@server]# hadoop fs -copyFromLocal /tmp/test.input /db/loading/.
 ORC DATA:
 [root@server]# beeline -u jdbc:hive2://server:10002/db -n root --hiveconf 
 hive.exec.dynamic.partition.mode=nonstrict --hiveconf 
 hive.enforce.sorting=true -e "insert into table testTabOrc partition(range) 
 select * from loadingDir;"
 LOAD TEST FUNCTION:
 0: jdbc:hive2://server:10002/db> add jar /opt/hadoop/lib/testFunction.jar
 0: jdbc:hive2://server:10002/db> create temporary function ten_thousand as 
 'com.test.tenThousand';
 TURN OFF VECTORIZATION:
 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=false;
 QUERY (RESULTS AS EXPECTED):
 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first 
 between ten_thousand()-1 and ten_thousand()-9995;
 +--------+
 | first  |
 +--------+
 | 1      |
 | 2      |
 | 3      |
 +--------+
 3 rows selected (15.286 seconds)
 TURN ON VECTORIZATION:
 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=true;
 QUERY AGAIN (WRONG RESULTS):
 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first 
 between ten_thousand()-1 and ten_thousand()-9995;
 +--------+
 | first  |
 +--------+
 +--------+
 No rows selected (17.763 seconds)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-1019) java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)

2014-06-05 Thread Bennie Schut (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018555#comment-14018555
 ] 

Bennie Schut commented on HIVE-1019:


xuanjinlee, this is a somewhat prehistoric patch which I forgot to close. Most 
people have moved to HiveServer2, which doesn't suffer from these threading 
issues. Unless anyone objects, I would actually like to close this issue.

 java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
 

 Key: HIVE-1019
 URL: https://issues.apache.org/jira/browse/HIVE-1019
 Project: Hive
  Issue Type: Bug
  Components: Server Infrastructure
Affects Versions: 0.6.0
Reporter: Bennie Schut
Assignee: Bennie Schut
Priority: Minor
 Attachments: HIVE-1019-1.patch, HIVE-1019-2.patch, HIVE-1019-3.patch, 
 HIVE-1019-4.patch, HIVE-1019-5.patch, HIVE-1019-6.patch, HIVE-1019-7.patch, 
 HIVE-1019-8.patch, HIVE-1019.patch, stacktrace2.txt


 I keep getting errors like this:
 java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
 and :
 java.io.IOException: cannot find dir = 
 hdfs://victoria.ebuddy.com:9000/tmp/hive-dwh/801467596/10002 in 
 partToPartitionInfo!
 when running multiple threads with roughly similar queries.
 I have a patch for this which works for me.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 22174: HIVE-6394 Implement Timestamp in ParquetSerde

2014-06-05 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22174/
---

(Updated June 5, 2014, 7:33 a.m.)


Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.


Changes
---

Cache the Calendar in thread-local as suggested for performance.


Bugs: HIVE-6394
https://issues.apache.org/jira/browse/HIVE-6394


Repository: hive-git


Description
---

This uses the Jodd library to convert the java.sql.Timestamp type used by Hive 
into the {julian-day:nanos} format expected by Parquet, and vice versa.
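
For reference, the conversion itself is just a day/nanos decomposition relative 
to the Unix epoch. A minimal standalone sketch of the arithmetic (UTC only, 
plain JDK; illustrative, not the patch code, which goes through Jodd):

import java.sql.Timestamp;
import java.util.concurrent.TimeUnit;

public class JulianNanosSketch {
    // Well-known Julian day number of the Unix epoch (1970-01-01).
    private static final long EPOCH_JULIAN_DAY = 2440588L;
    private static final long NANOS_PER_DAY = TimeUnit.DAYS.toNanos(1);

    // Decompose a java.sql.Timestamp into {julian-day, nanos-of-day} (UTC).
    public static long[] toJulianNanos(Timestamp ts) {
        long seconds = Math.floorDiv(ts.getTime(), 1000L); // whole seconds since epoch
        long nanosSinceEpoch = seconds * 1_000_000_000L + ts.getNanos();
        long julianDay = EPOCH_JULIAN_DAY + Math.floorDiv(nanosSinceEpoch, NANOS_PER_DAY);
        long nanosOfDay = Math.floorMod(nanosSinceEpoch, NANOS_PER_DAY);
        return new long[] { julianDay, nanosOfDay };
    }

    // And back again: rebuild the Timestamp from {julian-day, nanos-of-day}.
    public static Timestamp fromJulianNanos(long julianDay, long nanosOfDay) {
        long nanosSinceEpoch = (julianDay - EPOCH_JULIAN_DAY) * NANOS_PER_DAY + nanosOfDay;
        Timestamp ts = new Timestamp(Math.floorDiv(nanosSinceEpoch, 1_000_000_000L) * 1000L);
        ts.setNanos((int) Math.floorMod(nanosSinceEpoch, 1_000_000_000L));
        return ts;
    }
}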


Diffs (updated)
-

  data/files/parquet_types.txt 0be390b 
  pom.xml 4bb8880 
  ql/pom.xml 13c477a 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 
4da0d30 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java
 29f7e11 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java
 57161d8 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 
fb2f5a8 
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/utils/NanoTimeUtils.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 
3490061 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java
 PRE-CREATION 
  ql/src/test/queries/clientpositive/parquet_types.q 5d6333c 
  ql/src/test/results/clientpositive/parquet_types.q.out c23f7f1 

Diff: https://reviews.apache.org/r/22174/diff/


Testing
---

Unit tests the new libraries, and also added timestamp data in the 
parquet_types q-test.


Thanks,

Szehon Ho



[jira] [Updated] (HIVE-6394) Implement Timestamp in ParquetSerde

2014-06-05 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-6394:


Attachment: HIVE-6394.5.patch

I don't think so, as I am modifying the values with the given timestamp.  I 
added a thread-local cache of the Calendar that is lazily created.
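
A rough shape of that cache (illustrative only; the class and method names here 
are not from the patch):

{code}
import java.util.Calendar;
import java.util.TimeZone;

public final class CalendarCache {
    // One lazily created Calendar per thread: Calendar is not thread-safe,
    // and allocating a fresh instance per value is expensive.
    private static final ThreadLocal<Calendar> CALENDAR = new ThreadLocal<Calendar>() {
        @Override
        protected Calendar initialValue() {
            return Calendar.getInstance(TimeZone.getTimeZone("GMT"));
        }
    };

    public static Calendar get() {
        Calendar calendar = CALENDAR.get();
        calendar.clear();  // wipe state left from the previous use on this thread
        return calendar;
    }
}
{code}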

 Implement Timestamp in ParquetSerde
 ---

 Key: HIVE-6394
 URL: https://issues.apache.org/jira/browse/HIVE-6394
 Project: Hive
  Issue Type: Sub-task
  Components: Serializers/Deserializers
Reporter: Jarek Jarcec Cecho
Assignee: Szehon Ho
  Labels: Parquet
 Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, 
 HIVE-6394.5.patch, HIVE-6394.patch


 This JIRA is to implement timestamp support in Parquet SerDe.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4867) Deduplicate columns appearing in both the key list and value list of ReduceSinkOperator

2014-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018563#comment-14018563
 ] 

Hive QA commented on HIVE-4867:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12648427/HIVE-4867.5.patch.txt

{color:red}ERROR:{color} -1 due to 43 failed/errored test(s), 5510 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join28
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_nulls
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_rearrange
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters_overlap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_reorder4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_filter_on_outerjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_subquery2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_test_outer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multiMapJoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_join_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_reduce_deduplicate_exclude_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_nested_mapjoin
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_sortmerge_mapjoin_mismatch_1
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/390/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/390/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-390/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 43 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12648427

 Deduplicate columns appearing in both the key list and value list of 
 ReduceSinkOperator
 ---

 Key: HIVE-4867
 URL: https://issues.apache.org/jira/browse/HIVE-4867
 Project: Hive
  Issue Type: Improvement
Reporter: Yin Huai
Assignee: Navis
 Attachments: HIVE-4867.1.patch.txt, HIVE-4867.2.patch.txt, 
 HIVE-4867.3.patch.txt, HIVE-4867.4.patch.txt, HIVE-4867.5.patch.txt, 
 source_only.txt


 A ReduceSinkOperator emits data in the format of keys and values. Right now, 
 a column may appear in both the key list and value 

[jira] [Commented] (HIVE-7110) TestHCatPartitionPublish test failure: No FileSystem or scheme: pfile

2014-06-05 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018562#comment-14018562
 ] 

Szehon Ho commented on HIVE-7110:
-

I mean, this test is working for me and also on the main build, so I don't see 
a need for a fix.  These were indeed using maven.

You can check on-going builds here: 
[http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build|http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build]


 TestHCatPartitionPublish test failure: No FileSystem or scheme: pfile
 -

 Key: HIVE-7110
 URL: https://issues.apache.org/jira/browse/HIVE-7110
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7110.1.patch, HIVE-7110.2.patch, HIVE-7110.3.patch, 
 HIVE-7110.4.patch


 I got the following TestHCatPartitionPublish test failure when running all 
 unit tests against Hadoop 1. This also appears when testing against Hadoop 2.
 {code}
  Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 26.06 sec 
  FAILURE! - in org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish
 testPartitionPublish(org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish)
   Time elapsed: 1.361 sec   ERROR!
 org.apache.hive.hcatalog.common.HCatException: 
 org.apache.hive.hcatalog.common.HCatException : 2001 : Error setting output 
 information. Cause : java.io.IOException: No FileSystem for scheme: pfile
 at 
 org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1443)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
 at 
 org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:212)
 at 
 org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:70)
 at 
 org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish.runMRCreateFail(TestHCatPartitionPublish.java:191)
 at 
 org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish.testPartitionPublish(TestHCatPartitionPublish.java:155)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-1539) Concurrent metastore threading problem

2014-06-05 Thread Bennie Schut (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bennie Schut resolved HIVE-1539.


  Resolution: Fixed
Release Note: We switched to a datanucleus version >= 2.2 a long time ago, 
so this is fixed.

 Concurrent metastore threading problem 
 ---

 Key: HIVE-1539
 URL: https://issues.apache.org/jira/browse/HIVE-1539
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.7.0
Reporter: Bennie Schut
Assignee: Bennie Schut
 Attachments: ClassLoaderResolver.patch, HIVE-1539-1.patch, 
 HIVE-1539.patch, thread_dump_hanging.txt


 When running hive as a service and running a high number of queries 
 concurrently I end up with multiple threads running at 100% cpu without any 
 progress.
 Looking at these threads I notice this thread(484e):
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:598)
 But on a different thread(63a2):
 at 
 org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoReplaceField(MStorageDescriptor.java)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7179) hive connect to hbase cause select results error

2014-06-05 Thread zhengzhuangjie (JIRA)
zhengzhuangjie created HIVE-7179:


 Summary: hive connect to hbase cause select results error
 Key: HIVE-7179
 URL: https://issues.apache.org/jira/browse/HIVE-7179
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.12.0
 Environment: Hadoop 1.0.4, HBase 0.94.6.1, Hive 0.12.0
Reporter: zhengzhuangjie


CREATE EXTERNAL TABLE hb_test(key string, `_plat` int, `_uid` int) STORED BY 
'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES 
("hbase.columns.mapping" = ":key,d:_plat#b,d:_uid#b") 
TBLPROPERTIES("hbase.table.name" = "test");
insert overwrite local directory '/data/mf/app/data/check' ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' select key, `_plat`, `_uid` from hb_test where 
key>'00604' and key<'00605';

After the query finishes, eight files are produced under the folder 
/data/mf/app/data/check, and each file contains the same duplicated data.




--
This message was sent by Atlassian JIRA
(v6.2#6252)


RE: [VOTE] Apache Hive 0.13.1 Release Candidate 3

2014-06-05 Thread Yu Azuryy
I just changed ConnectionURL and warehouse in hive-site.xml, 
 
As for tez-0.5-snapshot, 
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/data/tez/lib</value>
  </property>
  <property>
    <name>tez.am.resource.memory.mb</name>
    <value>1280</value>
  </property>
 
 
None others.

 
 Date: Wed, 4 Jun 2014 23:28:07 -0700
 Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
 From: the...@hortonworks.com
 To: dev@hive.apache.org
 
 Yu,
 I am not able to reproduce this issue with hadoop 2.4.0 and tez 0.4 .
 I tried it with both file system cache enabled and disabled.
 What are the non default configurations you have on your machine ?
 
 Thanks,
 Thejas
 
 
 
 On Wed, Jun 4, 2014 at 10:56 PM, Yu Azuryy azur...@outlook.com wrote:
  Hive-0.13.0 works well in my test cluster.
 
 
 
 
 
  -1
 
  Verified with hadoop-2.4.0 and tez-0.5-snapshot,
  Hive cannot start?
 
  And I also built hive branch-0.13, the same error.
 
  [test@vm-10-154-** tmp]$ hive
  14/06/05 13:53:45 INFO Configuration.deprecation: 
  mapred.input.dir.recursive is deprecated. Instead, use 
  mapreduce.input.fileinputformat.input.dir.recursive
  14/06/05 13:53:45 INFO Configuration.deprecation: mapred.max.split.size is 
  deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
  14/06/05 13:53:45 INFO Configuration.deprecation: mapred.min.split.size is 
  deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
  14/06/05 13:53:45 INFO Configuration.deprecation: 
  mapred.min.split.size.per.node is deprecated. Instead, use 
  mapreduce.input.fileinputformat.split.minsize.per.node
  14/06/05 13:53:45 INFO Configuration.deprecation: 
  mapred.min.split.size.per.rack is deprecated. Instead, use 
  mapreduce.input.fileinputformat.split.minsize.per.rack
  14/06/05 13:53:45 INFO Configuration.deprecation: mapred.reduce.tasks is 
  deprecated. Instead, use mapreduce.job.reduces
  14/06/05 13:53:45 INFO Configuration.deprecation: 
  mapred.reduce.tasks.speculative.execution is deprecated. Instead, use 
  mapreduce.reduce.speculative
  14/06/05 13:53:45 INFO Configuration.deprecation: 
  mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use 
  mapreduce.job.committer.setup.cleanup.needed
  14/06/05 13:53:45 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* 
  no longer has any effect.  Use hive.hmshandler.retry.* instead
  Logging initialized using configuration in 
  jar:file:/letv/hive-0.13.0/lib/hive-common-0.13.1.jar!/hive-log4j.properties
  Exception in thread main java.lang.RuntimeException: java.io.IOException: 
  Filesystem closed
   at 
  org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:357)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
  sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
  Caused by: java.io.IOException: Filesystem closed
   at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
   at 
  org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1066)
   at 
  org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1062)
   at 
  org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
   at 
  org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1062)
   at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
   at org.apache.hadoop.fs.FileSystem.deleteOnExit(FileSystem.java:1353)
   at 
  org.apache.hadoop.hive.ql.exec.tez.TezSessionState.createTezDir(TezSessionState.java:297)
   at 
  org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:144)
   at 
  org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:123)
   at 
  org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:355)
   ... 7 more
 
  Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
  From: hbut...@hortonworks.com
  Date: Wed, 4 Jun 2014 22:42:32 -0700
  To: dev@hive.apache.org
 
  +1
 
  - Verified signature and checksum
  - Checked release notes
  - built source
  - ran a few unit tests
  - ran a few queries in local mode
 
  On Jun 2, 2014, at 7:20 PM, Thejas Nair the...@hortonworks.com wrote:
 
   +1
  
   - Verified signatures and checksum of both packages
   - Checked release notes
   - Build source package
   - Ran simple hive queries on newly built package in local mode
   - Ran simple queries on package on single node cluster with both tez
   and mr as execution engines
   - Ran unit tests
  
  
   On Mon, Jun 2, 

Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3

2014-06-05 Thread Vaibhav Gumashta
Yu,

I don't think tez-0.5 has been released yet. Can you try with tez-0.4 (
http://www.apache.org/dyn/closer.cgi/incubator/tez/tez-0.4.0-incubating/)?

Thanks,
--Vaibhav


On Thu, Jun 5, 2014 at 1:16 AM, Yu Azuryy azur...@outlook.com wrote:

 I just changed ConnectionURL and warehouse in hive-site.xml,

 As for tez-0.5-snapshot,
   <property>
     <name>tez.lib.uris</name>
     <value>${fs.defaultFS}/data/tez/lib</value>
   </property>
   <property>
     <name>tez.am.resource.memory.mb</name>
     <value>1280</value>
   </property>


 None others.


  Date: Wed, 4 Jun 2014 23:28:07 -0700
  Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
  From: the...@hortonworks.com
  To: dev@hive.apache.org
 
  Yu,
  I am not able to reproduce this issue with hadoop 2.4.0 and tez 0.4 .
  I tried it with both file system cache enabled and disabled.
  What are the non default configurations you have on your machine ?
 
  Thanks,
  Thejas
 
 
 
  On Wed, Jun 4, 2014 at 10:56 PM, Yu Azuryy azur...@outlook.com wrote:
   Hive-0.13.0 works well in my test cluster.
  
  
  
  
  
   -1
  
   Verified with hadoop-2.4.0 and tez-0.5-snapshot,
   Hive cannot start?
  
   And I also built hive branch-0.13, the same error.
  
   [test@vm-10-154-** tmp]$ hive
   14/06/05 13:53:45 INFO Configuration.deprecation:
 mapred.input.dir.recursive is deprecated. Instead, use
 mapreduce.input.fileinputformat.input.dir.recursive
   14/06/05 13:53:45 INFO Configuration.deprecation:
 mapred.max.split.size is deprecated. Instead, use
 mapreduce.input.fileinputformat.split.maxsize
   14/06/05 13:53:45 INFO Configuration.deprecation:
 mapred.min.split.size is deprecated. Instead, use
 mapreduce.input.fileinputformat.split.minsize
   14/06/05 13:53:45 INFO Configuration.deprecation:
 mapred.min.split.size.per.node is deprecated. Instead, use
 mapreduce.input.fileinputformat.split.minsize.per.node
   14/06/05 13:53:45 INFO Configuration.deprecation:
 mapred.min.split.size.per.rack is deprecated. Instead, use
 mapreduce.input.fileinputformat.split.minsize.per.rack
   14/06/05 13:53:45 INFO Configuration.deprecation: mapred.reduce.tasks
 is deprecated. Instead, use mapreduce.job.reduces
   14/06/05 13:53:45 INFO Configuration.deprecation:
 mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
 mapreduce.reduce.speculative
   14/06/05 13:53:45 INFO Configuration.deprecation:
 mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use
 mapreduce.job.committer.setup.cleanup.needed
   14/06/05 13:53:45 WARN conf.HiveConf: DEPRECATED:
 hive.metastore.ds.retry.* no longer has any effect.  Use
 hive.hmshandler.retry.* instead
   Logging initialized using configuration in
 jar:file:/letv/hive-0.13.0/lib/hive-common-0.13.1.jar!/hive-log4j.properties
   Exception in thread main java.lang.RuntimeException:
 java.io.IOException: Filesystem closed
at
 org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:357)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
   Caused by: java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
at
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1066)
at
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1062)
at
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1062)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
at org.apache.hadoop.fs.FileSystem.deleteOnExit(FileSystem.java:1353)
at
 org.apache.hadoop.hive.ql.exec.tez.TezSessionState.createTezDir(TezSessionState.java:297)
at
 org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:144)
at
 org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:123)
at
 org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:355)
... 7 more
  
   Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
   From: hbut...@hortonworks.com
   Date: Wed, 4 Jun 2014 22:42:32 -0700
   To: dev@hive.apache.org
  
   +1
  
   - Verified signature and checksum
   - Checked release notes
   - built source
   - ran a few unit tests
   - ran a few queries in local mode
  
   On Jun 2, 2014, at 7:20 PM, Thejas Nair the...@hortonworks.com
 wrote:
  
+1
   
- Verified signatures and checksum of both 

RE: [VOTE] Apache Hive 0.13.1 Release Candidate 3

2014-06-05 Thread Yu Azuryy
OK, I'll try it today and post back.
 
 Date: Thu, 5 Jun 2014 01:20:10 -0700
 Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
 From: vgumas...@hortonworks.com
 To: dev@hive.apache.org
 
 Yu,
 
 I don't think tez-0.5 has been released yet. Can you try with tez-0.4 (
 http://www.apache.org/dyn/closer.cgi/incubator/tez/tez-0.4.0-incubating/)?
 
 Thanks,
 --Vaibhav
 
 
 On Thu, Jun 5, 2014 at 1:16 AM, Yu Azuryy azur...@outlook.com wrote:
 
  I just changed ConnectionURL and warehouse in hive-site.xml,
 
  As for tez-0.5-snapshot,
    <property>
      <name>tez.lib.uris</name>
      <value>${fs.defaultFS}/data/tez/lib</value>
    </property>
    <property>
      <name>tez.am.resource.memory.mb</name>
      <value>1280</value>
    </property>
 
 
  None others.
 
 
   Date: Wed, 4 Jun 2014 23:28:07 -0700
   Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
   From: the...@hortonworks.com
   To: dev@hive.apache.org
  
   Yu,
   I am not able to reproduce this issue with hadoop 2.4.0 and tez 0.4 .
   I tried it with both file system cache enabled and disabled.
   What are the non default configurations you have on your machine ?
  
   Thanks,
   Thejas
  
  
  
   On Wed, Jun 4, 2014 at 10:56 PM, Yu Azuryy azur...@outlook.com wrote:
Hive-0.13.0 works well in my test cluster.
   
   
   
   
   
-1
   
Verified with hadoop-2.4.0 and tez-0.5-snapshot,
Hive cannot start?
   
And I also built hive branch-0.13, the same error.
   
[test@vm-10-154-** tmp]$ hive
14/06/05 13:53:45 INFO Configuration.deprecation:
  mapred.input.dir.recursive is deprecated. Instead, use
  mapreduce.input.fileinputformat.input.dir.recursive
14/06/05 13:53:45 INFO Configuration.deprecation:
  mapred.max.split.size is deprecated. Instead, use
  mapreduce.input.fileinputformat.split.maxsize
14/06/05 13:53:45 INFO Configuration.deprecation:
  mapred.min.split.size is deprecated. Instead, use
  mapreduce.input.fileinputformat.split.minsize
14/06/05 13:53:45 INFO Configuration.deprecation:
  mapred.min.split.size.per.node is deprecated. Instead, use
  mapreduce.input.fileinputformat.split.minsize.per.node
14/06/05 13:53:45 INFO Configuration.deprecation:
  mapred.min.split.size.per.rack is deprecated. Instead, use
  mapreduce.input.fileinputformat.split.minsize.per.rack
14/06/05 13:53:45 INFO Configuration.deprecation: mapred.reduce.tasks
  is deprecated. Instead, use mapreduce.job.reduces
14/06/05 13:53:45 INFO Configuration.deprecation:
  mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
  mapreduce.reduce.speculative
14/06/05 13:53:45 INFO Configuration.deprecation:
  mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use
  mapreduce.job.committer.setup.cleanup.needed
14/06/05 13:53:45 WARN conf.HiveConf: DEPRECATED:
  hive.metastore.ds.retry.* no longer has any effect.  Use
  hive.hmshandler.retry.* instead
Logging initialized using configuration in
  jar:file:/letv/hive-0.13.0/lib/hive-common-0.13.1.jar!/hive-log4j.properties
Exception in thread main java.lang.RuntimeException:
  java.io.IOException: Filesystem closed
 at
  org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:357)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
  sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.io.IOException: Filesystem closed
 at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
 at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
 at
  org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1066)
 at
  org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1062)
 at
  org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at
  org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1062)
 at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
 at org.apache.hadoop.fs.FileSystem.deleteOnExit(FileSystem.java:1353)
 at
  org.apache.hadoop.hive.ql.exec.tez.TezSessionState.createTezDir(TezSessionState.java:297)
 at
  org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:144)
 at
  org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:123)
 at
  org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:355)
 ... 7 more
   
Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
From: hbut...@hortonworks.com
Date: Wed, 4 Jun 2014 22:42:32 -0700
  

Re: Hive 0.13/Hcatalog : Mapreduce Exception : java.lang.IncompatibleClassChangeError

2014-06-05 Thread Navis류승우
I don't have an environment to confirm this. But if this happens, we
should include HIVE-6432 in Hive 0.13.1.


2014-06-05 12:44 GMT+09:00 Navis류승우 navis@nexr.com:

 It's fixed in HIVE-6432. I think you should rebuild your own hcatalog from
 source with profile -Phadoop-1.


 2014-06-05 9:08 GMT+09:00 Sundaramoorthy, Malliyanathan 
 malliyanathan.sundaramoor...@citi.com:

   Hi,

 I am using Hadoop 2.4.0 with Hive 0.13 and its included HCatalog package.
 I wrote a simple map-reduce job from the example, and when running the code
 below I get "Exception in thread main java.lang.IncompatibleClassChangeError:
 Found interface org.apache.hadoop.mapreduce.JobContext, but class was
 expected". I am not sure what error I am making.

 I am not sure if there is a compatibility issue .. please help.



 boolean success = true;
 try {
     Configuration conf = getConf();
     args = new GenericOptionsParser(conf, args).getRemainingArgs();

     // Hive table details
     String dbName = args[0];
     String inputTableName = args[1];
     String outputTableName = args[2];

     // Job input
     Job job = new Job(conf, "Scenarios");

     // Initialize mapper/reducer input/output
     HCatInputFormat.setInput(job, dbName, inputTableName);
     //HCatInputFormat.setInput(job, InputJobInfo.create(dbName, inputTableName, null));
     job.setInputFormatClass(HCatInputFormat.class);
     job.setJarByClass(MainRunner.class);
     job.setMapperClass(ScenarioMapper.class);
     job.setReducerClass(ScenarioReducer.class);
     job.setMapOutputKeyClass(IntWritable.class);
     job.setMapOutputValueClass(IntWritable.class);

     job.setOutputKeyClass(WritableComparable.class);
     job.setOutputValueClass(DefaultHCatRecord.class);

     HCatOutputFormat.setOutput(job, OutputJobInfo.create(dbName, outputTableName, null));
     HCatSchema outSchema = HCatOutputFormat.getTableSchema(conf);
     System.err.println("INFO: output schema explicitly set for writing: " + outSchema);
     HCatOutputFormat.setSchema(job, outSchema);
     job.setOutputFormatClass(HCatOutputFormat.class);





 14/06/02 18:52:57 INFO client.RMProxy: Connecting to ResourceManager at
 localhost/00.04.07.174:8040

 Exception in thread main java.lang.IncompatibleClassChangeError: Found
 interface org.apache.hadoop.mapreduce.JobContext, but class was expected
 at org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:104)
 at org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.getOutputFormat(HCatBaseOutputFormat.java:84)
 at org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.checkOutputSpecs(HCatBaseOutputFormat.java:73)
 at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458)
 at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
 at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
 at com.citi.aqua.snu.hdp.clar.mra.service.MainRunner.run(MainRunner.java:79)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
 at com.citi.aqua.snu.hdp.clar.mra.service.MainRunner.main(MainRunner.java:89)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)



 Regards,

 Malli







[jira] [Commented] (HIVE-7062) Support Streaming mode in Windowing

2014-06-05 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018584#comment-14018584
 ] 

Lefty Leverenz commented on HIVE-7062:
--

Okay, thanks [~rhbutani].  I've put this with my doc-by-0.14 tasks.

 Support Streaming mode in Windowing
 ---

 Key: HIVE-7062
 URL: https://issues.apache.org/jira/browse/HIVE-7062
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Fix For: 0.14.0

 Attachments: HIVE-7062.1.patch, HIVE-7062.4.patch, HIVE-7062.5.patch, 
 HIVE-7062.6.patch


 1. Have the Windowing Table Function support streaming mode.
 2. Have special handling for Ranking UDAFs.
 3. Have special handling for Sum/Avg for fixed size Wdws.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7160) Vectorization Udf: GenericUDFConcat for non-string columns input, is not supported

2014-06-05 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7160:


Attachment: HIVE-7160.1.patch.txt

 Vectorization Udf: GenericUDFConcat for non-string columns input, is not 
 supported
 --

 Key: HIVE-7160
 URL: https://issues.apache.org/jira/browse/HIVE-7160
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Gopal V
Priority: Minor
 Attachments: HIVE-7160.1.patch.txt


 A simple UDF missing vectorization; a simple example would be 
 hive> explain select concat(l_orderkey, ' msecs') from lineitem;
 which is not vectorized, while
 hive> explain select concat(cast(l_orderkey as string), ' msecs') from 
 lineitem;
 can be vectorized.
 {code}
 14/05/31 15:28:59 [main]: DEBUG vector.VectorizationContext: No vector udf 
 found for GenericUDFConcat, descriptor: Argument Count = 2, mode = 
 PROJECTION, Argument Types = {LONG, STRING}, Input Expression Types = 
 {COLUMN,COLUMN}
 14/05/31 15:28:59 [main]: DEBUG physical.Vectorizer: Failed to vectorize
 org.apache.hadoop.hive.ql.metadata.HiveException: Udf: GenericUDFConcat, is 
 not supported
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:918)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7160) Vectorization Udf: GenericUDFConcat for non-string columns input, is not supported

2014-06-05 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7160:


Summary: Vectorization Udf: GenericUDFConcat for non-string columns input, 
is not supported  (was: Vectorization Udf: GenericUDFConcat, is not supported)

 Vectorization Udf: GenericUDFConcat for non-string columns input, is not 
 supported
 --

 Key: HIVE-7160
 URL: https://issues.apache.org/jira/browse/HIVE-7160
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Gopal V
Priority: Minor
 Attachments: HIVE-7160.1.patch.txt


 simple UDF missing vectorization - simple example would be 
 hive> explain select concat( l_orderkey, ' msecs') from lineitem;
 is not vectorized while
 hive> explain select concat(cast(l_orderkey as string), ' msecs') from 
 lineitem;
 can be vectorized.
 {code}
 14/05/31 15:28:59 [main]: DEBUG vector.VectorizationContext: No vector udf 
 found for GenericUDFConcat, descriptor: Argument Count = 2, mode = 
 PROJECTION, Argument Types = {LONG, STRING}, Input Expression Types = 
 {COLUMN,COLUMN}
 14/05/31 15:28:59 [main]: DEBUG physical.Vectorizer: Failed to vectorize
 org.apache.hadoop.hive.ql.metadata.HiveException: Udf: GenericUDFConcat, is 
 not supported
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:918)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7160) Vectorization Udf: GenericUDFConcat for non-string columns input, is not supported

2014-06-05 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7160:


Assignee: Navis
  Status: Patch Available  (was: Open)

There might be some design issue. Added preferredType for the VectorizedExpressions 
annotation. Will try argument conversion if there is no proper 
VectorExpression. 

In this case, concat(column<int>/column<string>) is not supported, but adding 
preferredType=String... makes a final try with the first argument cast to string 
type.
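
A rough sketch of the idea (hedged: the attribute name preferredType comes from 
the comment above, but the annotation shape below is an illustrative guess, not 
the actual HIVE-7160 patch):

{code}
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical sketch: a preferredType hint on the vectorization annotation
// lets the vectorizer make one last attempt with the arguments cast to that
// type when no VectorExpression matches the raw types (e.g. {LONG, STRING}).
@Retention(RetentionPolicy.RUNTIME)
@interface VectorizedExpressions {
  Class<?>[] value();                   // candidate VectorExpression classes
  String[] preferredType() default {};  // e.g. "String" for GenericUDFConcat
}
{code}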

 Vectorization Udf: GenericUDFConcat for non-string columns input, is not 
 supported
 --

 Key: HIVE-7160
 URL: https://issues.apache.org/jira/browse/HIVE-7160
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Navis
Priority: Minor
 Attachments: HIVE-7160.1.patch.txt


 simple UDF missing vectorization - simple example would be 
 hive> explain select concat( l_orderkey, ' msecs') from lineitem;
 is not vectorized while
 hive> explain select concat(cast(l_orderkey as string), ' msecs') from 
 lineitem;
 can be vectorized.
 {code}
 14/05/31 15:28:59 [main]: DEBUG vector.VectorizationContext: No vector udf 
 found for GenericUDFConcat, descriptor: Argument Count = 2, mode = 
 PROJECTION, Argument Types = {LONG, STRING}, Input Expression Types = 
 {COLUMN,COLUMN}
 14/05/31 15:28:59 [main]: DEBUG physical.Vectorizer: Failed to vectorize
 org.apache.hadoop.hive.ql.metadata.HiveException: Udf: GenericUDFConcat, is 
 not supported
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:918)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 22174: HIVE-6394 Implement Timestmap in ParquetSerde

2014-06-05 Thread justin coffey

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22174/#review44805
---



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java
https://reviews.apache.org/r/22174/#comment79343

A stupid question perhaps, but is INT96 reserved for timestamps in parquet?

I dug this up, but not sure if it's definitive: 
https://github.com/Parquet/parquet-mr/issues/101


- justin coffey


On June 5, 2014, 7:33 a.m., Szehon Ho wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/22174/
 ---
 
 (Updated June 5, 2014, 7:33 a.m.)
 
 
 Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
 
 
 Bugs: HIVE-6394
 https://issues.apache.org/jira/browse/HIVE-6394
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 This uses the Jodd library to convert java.sql.Timestamp type used by Hive 
 into the {julian-day:nanos} format expected by parquet, and vice-versa.
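 
 For reference, a minimal standalone sketch of the julian-day/nanos arithmetic 
 (hedged: the patch itself delegates this to Jodd, and the class below is ours. 
 The constant 2440588 is the Julian day of 1970-01-01; INT96 packs 8 bytes of 
 nanos-of-day plus 4 bytes of julian day):
 
 {code}
 import java.sql.Timestamp;
 import java.util.concurrent.TimeUnit;
 
 // Sketch only: converts between java.sql.Timestamp and the
 // (julianDay, nanosOfDay) pair that Parquet's INT96 timestamps expect.
 final class NanoTimeSketch {
   static final long JULIAN_DAY_OF_EPOCH = 2440588L;       // 1970-01-01
   static final long NANOS_PER_DAY = TimeUnit.DAYS.toNanos(1);
 
   static long[] toJulian(Timestamp ts) {
     // whole seconds since epoch (floored) * 1e9 + nanos within the second
     long nanos = Math.floorDiv(ts.getTime(), 1000L) * 1_000_000_000L + ts.getNanos();
     return new long[] {
         Math.floorDiv(nanos, NANOS_PER_DAY) + JULIAN_DAY_OF_EPOCH,
         Math.floorMod(nanos, NANOS_PER_DAY) };
   }
 
   static Timestamp fromJulian(long julianDay, long nanosOfDay) {
     long nanos = (julianDay - JULIAN_DAY_OF_EPOCH) * NANOS_PER_DAY + nanosOfDay;
     Timestamp ts = new Timestamp(Math.floorDiv(nanos, 1_000_000_000L) * 1000L);
     ts.setNanos((int) Math.floorMod(nanos, 1_000_000_000L));
     return ts;
   }
 }
 {code}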
 
 
 Diffs
 -
 
   data/files/parquet_types.txt 0be390b 
   pom.xml 4bb8880 
   ql/pom.xml 13c477a 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 
 4da0d30 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java
  29f7e11 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java
  57161d8 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 
 fb2f5a8 
   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/utils/NanoTimeUtils.java 
 PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
  3490061 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java
  PRE-CREATION 
   ql/src/test/queries/clientpositive/parquet_types.q 5d6333c 
   ql/src/test/results/clientpositive/parquet_types.q.out c23f7f1 
 
 Diff: https://reviews.apache.org/r/22174/diff/
 
 
 Testing
 ---
 
 Unit tests the new libraries, and also added timestamp data in the 
 parquet_types q-test.
 
 
 Thanks,
 
 Szehon Ho
 




RE: [VOTE] Apache Hive 0.13.1 Release Candidate 3

2014-06-05 Thread Yu Azuryy
Unfortunately,   I got the same error.
 
Hadoop-2.4.0(HA enabled),  Hive-0.13.1-rc3,  tez-0.4
 
If I set 'hive.execution.engine' to mr by default, then Hive does work, but if 
I change it to 'tez', I get the following error.
 
please look at here for hive and tez configuration:
http://pastebin.com/hid1AA83
 

 
 From: azur...@outlook.com
 To: dev@hive.apache.org
 Subject: RE: [VOTE] Apache Hive 0.13.1 Release Candidate 3
 Date: Thu, 5 Jun 2014 08:27:27 +
 
 Ok, I'll try it today. and post back.  
  
  Date: Thu, 5 Jun 2014 01:20:10 -0700
  Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
  From: vgumas...@hortonworks.com
  To: dev@hive.apache.org
  
  Yu,
  
  I don't think tez-0.5 has been released yet. Can you try with tez-0.4 (
  http://www.apache.org/dyn/closer.cgi/incubator/tez/tez-0.4.0-incubating/)?
  
  Thanks,
  --Vaibhav
  
  
  On Thu, Jun 5, 2014 at 1:16 AM, Yu Azuryy azur...@outlook.com wrote:
  
   I just changed ConnectionURL and warehouse in hive-site.xml,
  
   As for tez-0.5-snapshot,
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/data/tez/lib</value>
  </property>
  <property>
    <name>tez.am.resource.memory.mb</name>
    <value>1280</value>
  </property>
  
  
   None others.
  
  
Date: Wed, 4 Jun 2014 23:28:07 -0700
Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
From: the...@hortonworks.com
To: dev@hive.apache.org
   
Yu,
I am not able to reproduce this issue with hadoop 2.4.0 and tez 0.4 .
I tried it with both file system cache enabled and disabled.
What are the non default configurations you have on your machine ?
   
Thanks,
Thejas
   
   
   
On Wed, Jun 4, 2014 at 10:56 PM, Yu Azuryy azur...@outlook.com wrote:
 Hive-0.13.0 works well in my test cluster.





 -1

 Verified with hadoop-2.4.0 and tez-0.5-snapshot,
 Hive cannot start?

 And I also built hive branch-0.13, the same error.

 [test@vm-10-154-** tmp]$ hive
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.input.dir.recursive is deprecated. Instead, use
   mapreduce.input.fileinputformat.input.dir.recursive
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.max.split.size is deprecated. Instead, use
   mapreduce.input.fileinputformat.split.maxsize
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.min.split.size is deprecated. Instead, use
   mapreduce.input.fileinputformat.split.minsize
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.min.split.size.per.node is deprecated. Instead, use
   mapreduce.input.fileinputformat.split.minsize.per.node
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.min.split.size.per.rack is deprecated. Instead, use
   mapreduce.input.fileinputformat.split.minsize.per.rack
 14/06/05 13:53:45 INFO Configuration.deprecation: mapred.reduce.tasks
   is deprecated. Instead, use mapreduce.job.reduces
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
   mapreduce.reduce.speculative
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use
   mapreduce.job.committer.setup.cleanup.needed
 14/06/05 13:53:45 WARN conf.HiveConf: DEPRECATED:
   hive.metastore.ds.retry.* no longer has any effect.  Use
   hive.hmshandler.retry.* instead
 Logging initialized using configuration in
   jar:file:/letv/hive-0.13.0/lib/hive-common-0.13.1.jar!/hive-log4j.properties
 Exception in thread main java.lang.RuntimeException:
   java.io.IOException: Filesystem closed
  at
   org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:357)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
   sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at
   sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 Caused by: java.io.IOException: Filesystem closed
  at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
  at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
  at
   org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1066)
  at
   org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1062)
  at
   org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at
   org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1062)
  at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
  at 

RE: [VOTE] Apache Hive 0.13.1 Release Candidate 3

2014-06-05 Thread Yu Azuryy
Hive-0.13.0 works well under both tez and mr.
 
 
 From: azur...@outlook.com
 To: dev@hive.apache.org
 Subject: RE: [VOTE] Apache Hive 0.13.1 Release Candidate 3
 Date: Thu, 5 Jun 2014 09:07:41 +
 
 Unfortunately,   I got the same error.
  
 Hadoop-2.4.0(HA enabled),  Hive-0.13.1-rc3,  tez-0.4
  
 If I set 'hive.execution.engine' to mr by default, then Hive does work, but 
 if I change it to 'tez', I get the following error.
  
 please look at here for hive and tez configuration:
 http://pastebin.com/hid1AA83
  
 
  
  From: azur...@outlook.com
  To: dev@hive.apache.org
  Subject: RE: [VOTE] Apache Hive 0.13.1 Release Candidate 3
  Date: Thu, 5 Jun 2014 08:27:27 +
  
  Ok, I'll try it today. and post back.  
   
   Date: Thu, 5 Jun 2014 01:20:10 -0700
   Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
   From: vgumas...@hortonworks.com
   To: dev@hive.apache.org
   
   Yu,
   
   I don't think tez-0.5 has been released yet. Can you try with tez-0.4 (
   http://www.apache.org/dyn/closer.cgi/incubator/tez/tez-0.4.0-incubating/)?
   
   Thanks,
   --Vaibhav
   
   
   On Thu, Jun 5, 2014 at 1:16 AM, Yu Azuryy azur...@outlook.com wrote:
   
I just changed ConnectionURL and warehouse in hive-site.xml,
   
As for tez-0.5-snapshot,
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/data/tez/lib</value>
  </property>
  <property>
    <name>tez.am.resource.memory.mb</name>
    <value>1280</value>
  </property>
   
   
None others.
   
   
 Date: Wed, 4 Jun 2014 23:28:07 -0700
 Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
 From: the...@hortonworks.com
 To: dev@hive.apache.org

 Yu,
 I am not able to reproduce this issue with hadoop 2.4.0 and tez 0.4 .
 I tried it with both file system cache enabled and disabled.
 What are the non default configurations you have on your machine ?

 Thanks,
 Thejas



 On Wed, Jun 4, 2014 at 10:56 PM, Yu Azuryy azur...@outlook.com 
 wrote:
  Hive-0.13.0 works well in my test cluster.
 
 
 
 
 
  -1
 
  Verified with hadoop-2.4.0 and tez-0.5-snapshot,
  Hive cannot start?
 
  And I also built hive branch-0.13, the same error.
 
  [test@vm-10-154-** tmp]$ hive
  14/06/05 13:53:45 INFO Configuration.deprecation:
mapred.input.dir.recursive is deprecated. Instead, use
mapreduce.input.fileinputformat.input.dir.recursive
  14/06/05 13:53:45 INFO Configuration.deprecation:
mapred.max.split.size is deprecated. Instead, use
mapreduce.input.fileinputformat.split.maxsize
  14/06/05 13:53:45 INFO Configuration.deprecation:
mapred.min.split.size is deprecated. Instead, use
mapreduce.input.fileinputformat.split.minsize
  14/06/05 13:53:45 INFO Configuration.deprecation:
mapred.min.split.size.per.node is deprecated. Instead, use
mapreduce.input.fileinputformat.split.minsize.per.node
  14/06/05 13:53:45 INFO Configuration.deprecation:
mapred.min.split.size.per.rack is deprecated. Instead, use
mapreduce.input.fileinputformat.split.minsize.per.rack
  14/06/05 13:53:45 INFO Configuration.deprecation: 
  mapred.reduce.tasks
is deprecated. Instead, use mapreduce.job.reduces
  14/06/05 13:53:45 INFO Configuration.deprecation:
mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
mapreduce.reduce.speculative
  14/06/05 13:53:45 INFO Configuration.deprecation:
mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use
mapreduce.job.committer.setup.cleanup.needed
  14/06/05 13:53:45 WARN conf.HiveConf: DEPRECATED:
hive.metastore.ds.retry.* no longer has any effect.  Use
hive.hmshandler.retry.* instead
  Logging initialized using configuration in
jar:file:/letv/hive-0.13.0/lib/hive-common-0.13.1.jar!/hive-log4j.properties
  Exception in thread main java.lang.RuntimeException:
java.io.IOException: Filesystem closed
   at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:357)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
  Caused by: java.io.IOException: Filesystem closed
   at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
   at 
  org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
   at
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1066)
   at

[jira] [Commented] (HIVE-6625) HiveServer2 running in http mode should support trusted proxy access

2014-06-05 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018611#comment-14018611
 ] 

Lefty Leverenz commented on HIVE-6625:
--

Just for the record, [~vaibhavgumashta] updated these user docs on the wiki:

* [Admin Manual:  Setting Up HiveServer2 | 
https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2]
** diffs: 
[https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=30758712&selectedPageVersions=25&selectedPageVersions=16]
* [HiveServer2 Clients | 
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients]
** diffs: 
[https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=30758725&selectedPageVersions=43&selectedPageVersions=39]

 HiveServer2 running in http mode should support trusted proxy access
 

 Key: HIVE-6625
 URL: https://issues.apache.org/jira/browse/HIVE-6625
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.13.0

 Attachments: HIVE-6625.1.patch, HIVE-6625.2.patch


 HIVE-5155 adds trusted proxy access to HiveServer2. This patch is a minor change 
 to have it used when running HiveServer2 in http mode. Patch to be applied on 
 top of HIVE-4764 & HIVE-5155.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7166) Vectorization with UDFs returns incorrect results

2014-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018651#comment-14018651
 ] 

Hive QA commented on HIVE-7166:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12648433/HIVE-7166.1.patch

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 5585 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_between_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_schema_evolution
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
org.apache.hadoop.hive.ql.exec.vector.TestVectorizationContext.testBetweenFilters
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/392/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/392/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-392/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12648433

 Vectorization with UDFs returns incorrect results
 -

 Key: HIVE-7166
 URL: https://issues.apache.org/jira/browse/HIVE-7166
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.13.0
 Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster 
Reporter: Benjamin Bowman
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Minor
 Attachments: HIVE-7166.1.patch


 Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect 
 query results. 
 Example Query:  SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - 
 X) and UDF_1
 The following test scenario will reproduce the problem:
 TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 1):  
 package com.test;
 import org.apache.hadoop.hive.ql.exec.Description;
 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import java.lang.String;
 import java.lang.*;
 public class tenThousand extends UDF {
   private final LongWritable result = new LongWritable();
   public LongWritable evaluate() {
     result.set(1);
     return result;
   }
 }
 TEST DATA (test.input):
 1|CBCABC|12
 2|DBCABC|13
 3|EBCABC|14
 4|ABCABC|15
 5|BBCABC|16
 6|CBCABC|17
 CREATING ORC TABLE:
 0: jdbc:hive2://server:10002/db> create table testTabOrc (first bigint, 
 second varchar(20), third int) partitioned by (range int) clustered by 
 (first) sorted by (first) into 8 buckets stored as orc tblproperties 
 ("orc.compress" = "SNAPPY", "orc.index" = "true");
 CREATE LOADING TABLE:
 0: jdbc:hive2://server:10002/db> create table loadingDir (first bigint, 
 second varchar(20), third int) partitioned by (range int) row format 
 delimited fields terminated by '|' stored as textfile;
 COPY IN DATA:
 [root@server]#  hadoop fs -copyFromLocal /tmp/test.input /db/loading/.
 ORC DATA:
 [root@server]#  beeline -u jdbc:hive2://server:10002/db -n root --hiveconf 
 hive.exec.dynamic.partition.mode=nonstrict --hiveconf 
 hive.enforce.sorting=true -e "insert into table testTabOrc partition(range) 
 select * from loadingDir;"
 LOAD TEST FUNCTION:
 0: jdbc:hive2://server:10002/db> add jar /opt/hadoop/lib/testFunction.jar
 0: jdbc:hive2://server:10002/db> create temporary function ten_thousand as 
 'com.test.tenThousand';
 TURN OFF VECTORIZATION:
 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=false;
 QUERY (RESULTS AS EXPECTED):
 0: 

RE: [VOTE] Apache Hive 0.13.1 Release Candidate 3

2014-06-05 Thread Yu Azuryy
+1 now.
 
Sorry, it's my fault. I had hacked the Hadoop major version in my test cluster to 
1.3.0, while actually it was 2.4.0.
But ShimLoader parsed the major version as 1 and so got the HDFSv1 FileSystem; 
that is why I got this error.
 
 

 
 Date: Thu, 5 Jun 2014 01:20:10 -0700
 Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
 From: vgumas...@hortonworks.com
 To: dev@hive.apache.org
 
 Yu,
 
 I don't think tez-0.5 has been released yet. Can you try with tez-0.4 (
 http://www.apache.org/dyn/closer.cgi/incubator/tez/tez-0.4.0-incubating/)?
 
 Thanks,
 --Vaibhav
 
 
 On Thu, Jun 5, 2014 at 1:16 AM, Yu Azuryy azur...@outlook.com wrote:
 
  I just changed ConnectionURL and warehouse in hive-site.xml,
 
  As for tez-0.5-snapshot,
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/data/tez/lib</value>
  </property>
  <property>
    <name>tez.am.resource.memory.mb</name>
    <value>1280</value>
  </property>
 
 
  None others.
 
 
   Date: Wed, 4 Jun 2014 23:28:07 -0700
   Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
   From: the...@hortonworks.com
   To: dev@hive.apache.org
  
   Yu,
   I am not able to reproduce this issue with hadoop 2.4.0 and tez 0.4 .
   I tried it with both file system cache enabled and disabled.
   What are the non default configurations you have on your machine ?
  
   Thanks,
   Thejas
  
  
  
   On Wed, Jun 4, 2014 at 10:56 PM, Yu Azuryy azur...@outlook.com wrote:
Hive-0.13.0 works well in my test cluster.
   
   
   
   
   
-1
   
Verified with hadoop-2.4.0 and tez-0.5-snapshot,
Hive cannot start?
   
And I also built hive branch-0.13, the same error.
   
[test@vm-10-154-** tmp]$ hive
14/06/05 13:53:45 INFO Configuration.deprecation:
  mapred.input.dir.recursive is deprecated. Instead, use
  mapreduce.input.fileinputformat.input.dir.recursive
14/06/05 13:53:45 INFO Configuration.deprecation:
  mapred.max.split.size is deprecated. Instead, use
  mapreduce.input.fileinputformat.split.maxsize
14/06/05 13:53:45 INFO Configuration.deprecation:
  mapred.min.split.size is deprecated. Instead, use
  mapreduce.input.fileinputformat.split.minsize
14/06/05 13:53:45 INFO Configuration.deprecation:
  mapred.min.split.size.per.node is deprecated. Instead, use
  mapreduce.input.fileinputformat.split.minsize.per.node
14/06/05 13:53:45 INFO Configuration.deprecation:
  mapred.min.split.size.per.rack is deprecated. Instead, use
  mapreduce.input.fileinputformat.split.minsize.per.rack
14/06/05 13:53:45 INFO Configuration.deprecation: mapred.reduce.tasks
  is deprecated. Instead, use mapreduce.job.reduces
14/06/05 13:53:45 INFO Configuration.deprecation:
  mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
  mapreduce.reduce.speculative
14/06/05 13:53:45 INFO Configuration.deprecation:
  mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use
  mapreduce.job.committer.setup.cleanup.needed
14/06/05 13:53:45 WARN conf.HiveConf: DEPRECATED:
  hive.metastore.ds.retry.* no longer has any effect.  Use
  hive.hmshandler.retry.* instead
Logging initialized using configuration in
  jar:file:/letv/hive-0.13.0/lib/hive-common-0.13.1.jar!/hive-log4j.properties
Exception in thread main java.lang.RuntimeException:
  java.io.IOException: Filesystem closed
 at
  org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:357)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
  sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.io.IOException: Filesystem closed
 at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
 at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
 at
  org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1066)
 at
  org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1062)
 at
  org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at
  org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1062)
 at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
 at org.apache.hadoop.fs.FileSystem.deleteOnExit(FileSystem.java:1353)
 at
  org.apache.hadoop.hive.ql.exec.tez.TezSessionState.createTezDir(TezSessionState.java:297)
 at
  org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:144)
 at
  org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:123)
 at
  

[jira] [Commented] (HIVE-6394) Implement Timestmap in ParquetSerde

2014-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018764#comment-14018764
 ] 

Hive QA commented on HIVE-6394:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12648439/HIVE-6394.5.patch

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 5589 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_predicate_pushdown
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_schema_evolution
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_parquet_timestamp
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/393/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/393/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-393/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12648439

 Implement Timestmap in ParquetSerde
 ---

 Key: HIVE-6394
 URL: https://issues.apache.org/jira/browse/HIVE-6394
 Project: Hive
  Issue Type: Sub-task
  Components: Serializers/Deserializers
Reporter: Jarek Jarcec Cecho
Assignee: Szehon Ho
  Labels: Parquet
 Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, 
 HIVE-6394.5.patch, HIVE-6394.patch


 This JIRA is to implement timestamp support in Parquet SerDe.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7117) Partitions not inheriting table permissions after alter rename partition

2014-06-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018776#comment-14018776
 ] 

Xuefu Zhang commented on HIVE-7117:
---

+1

 Partitions not inheriting table permissions after alter rename partition
 

 Key: HIVE-7117
 URL: https://issues.apache.org/jira/browse/HIVE-7117
 Project: Hive
  Issue Type: Bug
  Components: Security
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Attachments: HIVE-7117.2.patch, HIVE-7117.3.patch, HIVE-7117.4.patch, 
 HIVE-7117.5.patch, HIVE-7117.6.patch, HIVE-7117.7.patch, HIVE-7117.8.patch, 
 HIVE-7117.patch


 On altering/renaming a partition it must inherit permission of the parent 
 directory, if the flag hive.warehouse.subdir.inherit.perms is set.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3

2014-06-05 Thread Ashutosh Chauhan
+1


On Thu, Jun 5, 2014 at 5:32 AM, Yu Azuryy azur...@outlook.com wrote:

 +1 now.

 Sorry, it's my fault. I had hacked the Hadoop major version in my test cluster
 to 1.3.0, while actually it was 2.4.0.
 But ShimLoader parsed the major version as 1 and so got the HDFSv1 FileSystem;
 that is why I got this error.




  Date: Thu, 5 Jun 2014 01:20:10 -0700
  Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
  From: vgumas...@hortonworks.com
  To: dev@hive.apache.org
 
  Yu,
 
  I don't think tez-0.5 has been released yet. Can you try with tez-0.4 (
  http://www.apache.org/dyn/closer.cgi/incubator/tez/tez-0.4.0-incubating/
 )?
 
  Thanks,
  --Vaibhav
 
 
  On Thu, Jun 5, 2014 at 1:16 AM, Yu Azuryy azur...@outlook.com wrote:
 
   I just changed ConnectionURL and warehouse in hive-site.xml,
  
   As for tez-0.5-snapshot,
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/data/tez/lib</value>
  </property>
  <property>
    <name>tez.am.resource.memory.mb</name>
    <value>1280</value>
  </property>
  
  
   None others.
  
  
Date: Wed, 4 Jun 2014 23:28:07 -0700
Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
From: the...@hortonworks.com
To: dev@hive.apache.org
   
Yu,
I am not able to reproduce this issue with hadoop 2.4.0 and tez 0.4 .
I tried it with both file system cache enabled and disabled.
What are the non default configurations you have on your machine ?
   
Thanks,
Thejas
   
   
   
On Wed, Jun 4, 2014 at 10:56 PM, Yu Azuryy azur...@outlook.com
 wrote:
 Hive-0.13.0 works well in my test cluster.





 -1

 Verified with hadoop-2.4.0 and tez-0.5-snapshot,
 Hive cannot start?

 And I also built hive branch-0.13, the same error.

 [test@vm-10-154-** tmp]$ hive
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.input.dir.recursive is deprecated. Instead, use
   mapreduce.input.fileinputformat.input.dir.recursive
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.max.split.size is deprecated. Instead, use
   mapreduce.input.fileinputformat.split.maxsize
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.min.split.size is deprecated. Instead, use
   mapreduce.input.fileinputformat.split.minsize
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.min.split.size.per.node is deprecated. Instead, use
   mapreduce.input.fileinputformat.split.minsize.per.node
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.min.split.size.per.rack is deprecated. Instead, use
   mapreduce.input.fileinputformat.split.minsize.per.rack
 14/06/05 13:53:45 INFO Configuration.deprecation:
 mapred.reduce.tasks
   is deprecated. Instead, use mapreduce.job.reduces
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
   mapreduce.reduce.speculative
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use
   mapreduce.job.committer.setup.cleanup.needed
 14/06/05 13:53:45 WARN conf.HiveConf: DEPRECATED:
   hive.metastore.ds.retry.* no longer has any effect.  Use
   hive.hmshandler.retry.* instead
 Logging initialized using configuration in
  
 jar:file:/letv/hive-0.13.0/lib/hive-common-0.13.1.jar!/hive-log4j.properties
 Exception in thread main java.lang.RuntimeException:
   java.io.IOException: Filesystem closed
  at
  
 org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:357)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
  
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at
  
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 Caused by: java.io.IOException: Filesystem closed
  at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
  at
 org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
  at
  
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1066)
  at
  
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1062)
  at
  
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at
  
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1062)
  at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
  at
 org.apache.hadoop.fs.FileSystem.deleteOnExit(FileSystem.java:1353)
  at
  
 org.apache.hadoop.hive.ql.exec.tez.TezSessionState.createTezDir(TezSessionState.java:297)
  at
  
 

[jira] [Commented] (HIVE-7155) WebHCat controller job exceeds container memory limit

2014-06-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018909#comment-14018909
 ] 

Eugene Koifman commented on HIVE-7155:
--

[~shanyu] I can't comment on RB.  Did you perhaps not publish it?

 WebHCat controller job exceeds container memory limit
 -

 Key: HIVE-7155
 URL: https://issues.apache.org/jira/browse/HIVE-7155
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.13.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HIVE-7155.1.patch, HIVE-7155.patch


 Submitting a Hive query on a large table via WebHCat results in failure because 
 the WebHCat controller job is killed by Yarn since it exceeds the memory 
 limit (set by mapreduce.map.memory.mb, defaults to 1GB):
 {code}
  INSERT OVERWRITE TABLE Temp_InjusticeEvents_2014_03_01_00_00 SELECT * from 
 Stage_InjusticeEvents where LogTimestamp > '2014-03-01 00:00:00' and 
 LogTimestamp <= '2014-03-01 01:00:00';
 {code}
 We could increase mapreduce.map.memory.mb to solve this problem, but that 
 would change the setting system-wide.
 We need to provide a WebHCat configuration to overwrite 
 mapreduce.map.memory.mb when submitting the controller job.
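 
 As a sketch of what such a knob could look like (hedged: the property name 
 below is a hypothetical placeholder, not something confirmed in this thread):
 
 {code}
 import org.apache.hadoop.conf.Configuration;
 
 final class ControllerMemorySketch {
   // Copy a WebHCat-scoped setting onto the controller job's MR configuration
   // so the cluster-wide mapreduce.map.memory.mb default stays untouched.
   static void apply(Configuration webhcatConf, Configuration jobConf) {
     String mb = webhcatConf.get("templeton.controller.map.memory.mb"); // hypothetical key
     if (mb != null && !mb.isEmpty()) {
       jobConf.set("mapreduce.map.memory.mb", mb);  // affects only this job
     }
   }
 }
 {code}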



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor

2014-06-05 Thread Ted Yu (JIRA)
Ted Yu created HIVE-7180:


 Summary: BufferedReader is not closed in MetaStoreSchemaInfo ctor
 Key: HIVE-7180
 URL: https://issues.apache.org/jira/browse/HIVE-7180
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


Here is related code:
{code}
  BufferedReader bfReader =
      new BufferedReader(new FileReader(upgradeListFile));
  String currSchemaVersion;
  while ((currSchemaVersion = bfReader.readLine()) != null) {
    upgradeOrderList.add(currSchemaVersion.trim());
{code}
BufferedReader / FileReader should be closed upon return from ctor.
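
A minimal sketch of the conventional fix, assuming Java 7 try-with-resources is 
acceptable (illustrative only, not the committed patch):

{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class UpgradeListReader {
  // try-with-resources closes the BufferedReader -- and the FileReader it
  // wraps -- on every exit path, including when readLine() throws.
  static List<String> readUpgradeOrder(String upgradeListFile) throws IOException {
    List<String> upgradeOrderList = new ArrayList<String>();
    try (BufferedReader bfReader =
             new BufferedReader(new FileReader(upgradeListFile))) {
      String currSchemaVersion;
      while ((currSchemaVersion = bfReader.readLine()) != null) {
        upgradeOrderList.add(currSchemaVersion.trim());
      }
    }
    return upgradeOrderList;
  }
}
{code}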



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7160) Vectorization Udf: GenericUDFConcat for non-string columns input, is not supported

2014-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018973#comment-14018973
 ] 

Hive QA commented on HIVE-7160:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12648447/HIVE-7160.1.patch.txt

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 5511 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testDropTable
org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testListPartitionNames
org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testListPartitions
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/394/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/394/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-394/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12648447

 Vectorization Udf: GenericUDFConcat for non-string columns input, is not 
 supported
 --

 Key: HIVE-7160
 URL: https://issues.apache.org/jira/browse/HIVE-7160
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Navis
Priority: Minor
 Attachments: HIVE-7160.1.patch.txt


 simple UDF missing vectorization - simple example would be 
 hive> explain select concat( l_orderkey, ' msecs') from lineitem;
 is not vectorized while
 hive> explain select concat(cast(l_orderkey as string), ' msecs') from 
 lineitem;
 can be vectorized.
 {code}
 14/05/31 15:28:59 [main]: DEBUG vector.VectorizationContext: No vector udf 
 found for GenericUDFConcat, descriptor: Argument Count = 2, mode = 
 PROJECTION, Argument Types = {LONG, STRING}, Input Expression Types = 
 {COLUMN,COLUMN}
 14/05/31 15:28:59 [main]: DEBUG physical.Vectorizer: Failed to vectorize
 org.apache.hadoop.hive.ql.metadata.HiveException: Udf: GenericUDFConcat, is 
 not supported
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:918)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7176) FileInputStream is not closed in Commands#properties()

2014-06-05 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019005#comment-14019005
 ] 

Ashutosh Chauhan commented on HIVE-7176:


+1

 FileInputStream is not closed in Commands#properties()
 --

 Key: HIVE-7176
 URL: https://issues.apache.org/jira/browse/HIVE-7176
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-7176.1.patch.txt


 NO PRECOMMIT TESTS
 In beeline.Commands, around line 834:
 {code}
   props.load(new FileInputStream(parts[i]));
 {code}
 The FileInputStream is not closed upon return from the method.
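 
 The usual remedy looks like the sketch below (hedged: the surrounding beeline 
 code is simplified away, and the class is ours; only the try-with-resources 
 pattern is the point):
 
 {code}
 import java.io.FileInputStream;
 import java.io.IOException;
 import java.io.InputStream;
 import java.util.Properties;
 
 final class PropsLoaderSketch {
   // Loading through try-with-resources guarantees the stream is closed even
   // if Properties.load() throws, unlike the bare new FileInputStream(...).
   static Properties load(String path) throws IOException {
     Properties props = new Properties();
     try (InputStream in = new FileInputStream(path)) {
       props.load(in);
     }
     return props;
   }
 }
 {code}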



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7075) JsonSerde raises NullPointerException when object key is not lower case

2014-06-05 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019003#comment-14019003
 ] 

Ashutosh Chauhan commented on HIVE-7075:


cc: [~sushanth] Would you like to review this one?

 JsonSerde raises NullPointerException when object key is not lower case
 ---

 Key: HIVE-7075
 URL: https://issues.apache.org/jira/browse/HIVE-7075
 Project: Hive
  Issue Type: Bug
  Components: HCatalog, Serializers/Deserializers
Affects Versions: 0.12.0
Reporter: Yibing Shi
Assignee: Navis
 Attachments: HIVE-7075.1.patch.txt, HIVE-7075.2.patch.txt, 
 HIVE-7075.3.patch.txt


 We have noticed that the JsonSerde produces a NullPointerException if a JSON 
 object has a key that is not lower case. For example, assume we have 
 the file one.json: 
 { "empId" : 123, "name" : "John" } 
 { "empId" : 456, "name" : "Jane" } 
 hive> CREATE TABLE emps (empId INT, name STRING) 
 ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'; 
 hive> LOAD DATA LOCAL INPATH 'one.json' INTO TABLE emps; 
 hive> SELECT * FROM emps; 
 Failed with exception java.io.IOException:java.lang.NullPointerException 
  
 Notice, it seems to work if the keys are lower case. Assume we have the file 
 'two.json': 
 { "empid" : 123, "name" : "John" } 
 { "empid" : 456, "name" : "Jane" } 
 hive> DROP TABLE emps; 
 hive> CREATE TABLE emps (empId INT, name STRING) 
 ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'; 
 hive> LOAD DATA LOCAL INPATH 'two.json' INTO TABLE emps;
 hive> SELECT * FROM emps; 
 OK 
 123   John 
 456   Jane
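 
 The general shape of a fix would be to normalize field names before the schema 
 lookup, since Hive column names are case-insensitive (a sketch of the idea; the 
 attached patches may differ):
 
 {code}
 import java.util.Locale;
 
 final class JsonKeySketch {
   // Map an incoming JSON key such as "empId" to the lower-cased Hive column
   // name "empid" before looking it up, instead of failing with a null field.
   static String toHiveColumn(String jsonKey) {
     return jsonKey == null ? null : jsonKey.toLowerCase(Locale.ROOT);
   }
 }
 {code}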



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7131) Dependencies of fetch task for tez are not shown properly

2014-06-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7131:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

 Dependencies of fetch task for tez are not shown properly
 -

 Key: HIVE-7131
 URL: https://issues.apache.org/jira/browse/HIVE-7131
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Fix For: 0.14.0

 Attachments: HIVE-7131.1.patch.txt, HIVE-7131.2.patch.txt, 
 HIVE-7131.3.patch.txt


 HIVE-3925 added dependencies for the fetch task, but missed doing so for Tez tasks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7135) Fix test fail of TestTezTask.testSubmit

2014-06-05 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019023#comment-14019023
 ] 

Ashutosh Chauhan commented on HIVE-7135:


+1

 Fix test fail of TestTezTask.testSubmit
 ---

 Key: HIVE-7135
 URL: https://issues.apache.org/jira/browse/HIVE-7135
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 0.14.0
Reporter: Vikram Dixit K
Assignee: Navis
 Fix For: 0.14.0

 Attachments: HIVE-7135.1.patch, HIVE-7135.2.patch.txt


 HIVE-7043 broke a tez test case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7086) TestHiveServer2.testConnection is failing on trunk

2014-06-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7086:
---

Fix Version/s: (was: 0.14.0)
   Status: Open  (was: Patch Available)

Seems like further investigation is required for this one.

 TestHiveServer2.testConnection is failing on trunk
 --

 Key: HIVE-7086
 URL: https://issues.apache.org/jira/browse/HIVE-7086
 Project: Hive
  Issue Type: Test
  Components: HiveServer2, JDBC
Affects Versions: 0.14.0
Reporter: Ashutosh Chauhan
Assignee: Vaibhav Gumashta
 Attachments: HIVE-7086.1.patch


 Able to repro locally on fresh checkout



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-06-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7136:
---

Status: Open  (was: Patch Available)

Our ptest framework accepts patches named only in a certain format. Please upload 
the patch with a filename in one of the supported formats: 
https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing

 Allow Hive to read hive scripts from any of the supported file systems in 
 hadoop eco-system
 ---

 Key: HIVE-7136
 URL: https://issues.apache.org/jira/browse/HIVE-7136
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.13.0
Reporter: Sumit Kumar
Assignee: Sumit Kumar
Priority: Minor
 Attachments: HIVE-7136-1.patch, HIVE-7136.patch


 Current hive cli assumes that the source file (hive script) is always on the 
 local file system. This patch implements support for reading source files 
 from other file systems in the hadoop eco-system (hdfs, s3, etc.) while 
 keeping the default behavior intact: the script is read from the default 
 (local) filesystem when no scheme is provided in the url for the source file.
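 
 A hedged sketch of the approach described (not the attached patch; the class 
 and method names are ours):
 
 {code}
 import java.io.BufferedReader;
 import java.io.IOException;
 import java.io.InputStreamReader;
 import java.net.URI;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 
 final class ScriptOpenerSketch {
   // Resolve the script through Hadoop's FileSystem API so hdfs:// and s3://
   // URIs work, while a bare path keeps the old local-filesystem behavior.
   static BufferedReader open(String script, Configuration conf) throws IOException {
     URI uri = URI.create(script);
     FileSystem fs = (uri.getScheme() == null)
         ? FileSystem.getLocal(conf)    // no scheme: default to local FS
         : FileSystem.get(uri, conf);   // scheme-specific FS (hdfs, s3, ...)
     return new BufferedReader(new InputStreamReader(fs.open(new Path(script))));
   }
 }
 {code}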



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor

2014-06-05 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni reassigned HIVE-7180:
--

Assignee: Swarnim Kulkarni

 BufferedReader is not closed in MetaStoreSchemaInfo ctor
 

 Key: HIVE-7180
 URL: https://issues.apache.org/jira/browse/HIVE-7180
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Swarnim Kulkarni
Priority: Minor

 Here is related code:
 {code}
   BufferedReader bfReader =
       new BufferedReader(new FileReader(upgradeListFile));
   String currSchemaVersion;
   while ((currSchemaVersion = bfReader.readLine()) != null) {
     upgradeOrderList.add(currSchemaVersion.trim());
 {code}
 BufferedReader / FileReader should be closed upon return from ctor.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-4867) Deduplicate columns appearing in both the key list and value list of ReduceSinkOperator

2014-06-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4867:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed the .4 patch since it has far fewer failures. Let's take up removal of 
values for the mapjoin smaller table in HIVE-7173, since it looks like there are 
some additional test failures which we need to work through.

 Deduplicate columns appearing in both the key list and value list of 
 ReduceSinkOperator
 ---

 Key: HIVE-4867
 URL: https://issues.apache.org/jira/browse/HIVE-4867
 Project: Hive
  Issue Type: Improvement
Reporter: Yin Huai
Assignee: Navis
 Fix For: 0.14.0

 Attachments: HIVE-4867.1.patch.txt, HIVE-4867.2.patch.txt, 
 HIVE-4867.3.patch.txt, HIVE-4867.4.patch.txt, HIVE-4867.5.patch.txt, 
 source_only.txt


 A ReduceSinkOperator emits data in the format of keys and values. Right now, 
 a column may appear in both the key list and the value list, which results in 
 unnecessary overhead for shuffling. 
 Example:
 We have a query shown below ...
 {code:sql}
 explain select ss_ticket_number from store_sales cluster by ss_ticket_number;
 {code}
 The plan is ...
 {code}
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Alias - Map Operator Tree:
 store_sales 
   TableScan
 alias: store_sales
 Select Operator
   expressions:
 expr: ss_ticket_number
 type: int
   outputColumnNames: _col0
   Reduce Output Operator
 key expressions:
   expr: _col0
   type: int
 sort order: +
 Map-reduce partition columns:
   expr: _col0
   type: int
 tag: -1
 value expressions:
   expr: _col0
   type: int
   Reduce Operator Tree:
 Extract
   File Output Operator
 compressed: false
 GlobalTableId: 0
 table:
 input format: org.apache.hadoop.mapred.TextInputFormat
 output format: 
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   Stage: Stage-0
 Fetch Operator
   limit: -1
 {code}
 The column 'ss_ticket_number' is in both the key list and value list of the 
 ReduceSinkOperator. The type of ss_ticket_number is int. For this case, 
 BinarySortableSerDe will introduce 1 byte more for every int in the key. 
 LazyBinarySerDe will also introduce overhead when recording the length of an 
 int. For every int, 10 bytes should be a rough estimate of the size of data 
 emitted from the Map phase. 
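 
 To make the intent concrete, a toy sketch of the deduplication (identifiers are 
 illustrative, not Hive's planner code):
 
 {code}
 import java.util.ArrayList;
 import java.util.LinkedHashSet;
 import java.util.List;
 import java.util.Set;
 
 final class RsDedupSketch {
   // Any value expression already present in the ReduceSink key list can be
   // dropped from the value list and re-read from the key on the reduce side;
   // in the plan above, _col0 would disappear from the value expressions.
   static List<String> dedupValues(List<String> keyExprs, List<String> valueExprs) {
     Set<String> keys = new LinkedHashSet<String>(keyExprs);
     List<String> kept = new ArrayList<String>();
     for (String v : valueExprs) {
       if (!keys.contains(v)) {
         kept.add(v);
       }
     }
     return kept;
   }
 }
 {code}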



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7155) WebHCat controller job exceeds container memory limit

2014-06-05 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019089#comment-14019089
 ] 

shanyu zhao commented on HIVE-7155:
---

[~ekoifman] I did publish it. I can add comments to it. Can you please double 
check? Thx.

 WebHCat controller job exceeds container memory limit
 -

 Key: HIVE-7155
 URL: https://issues.apache.org/jira/browse/HIVE-7155
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.13.0
Reporter: shanyu zhao
Assignee: shanyu zhao
 Attachments: HIVE-7155.1.patch, HIVE-7155.patch


 Submitting a Hive query on a large table via WebHCat results in failure because 
 the WebHCat controller job is killed by Yarn since it exceeds the memory 
 limit (set by mapreduce.map.memory.mb, defaults to 1GB):
 {code}
  INSERT OVERWRITE TABLE Temp_InjusticeEvents_2014_03_01_00_00 SELECT * from 
 Stage_InjusticeEvents where LogTimestamp > '2014-03-01 00:00:00' and 
 LogTimestamp <= '2014-03-01 01:00:00';
 {code}
 We could increase mapreduce.map.memory.mb to solve this problem, but that 
 would change the setting system-wide.
 We need to provide a WebHCat configuration to overwrite 
 mapreduce.map.memory.mb when submitting the controller job.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7181) Beginner User On Apache Jira

2014-06-05 Thread Nishant Kelkar (JIRA)
Nishant Kelkar created HIVE-7181:


 Summary: Beginner User On Apache Jira
 Key: HIVE-7181
 URL: https://issues.apache.org/jira/browse/HIVE-7181
 Project: Hive
  Issue Type: Wish
Reporter: Nishant Kelkar
Priority: Minor


Hi All! 

I've just started to use Apache's Jira board (I registered today). I've used 
Jira for my work before, so I know how to navigate within Jira. But my main 
question is understanding how issues are handled in the open source community 
(to which I want to contribute, but I'm a noob here too). So basically, a 
person files a ticket when he/she thinks that the issue they are 
facing is a bug/improvement. 

Questions:
1. Whom am I supposed to assign the ticket to? (myself?)
2. Who would be the QA assignee? 
3. If addressing the issue requires looking at the code, how am I supposed to 
change the code and bring into effect those changes? (At work, we maintain a 
Git repo on our private server. So everyone always has access to the latest 
code).
4. Where can I find a list of all the people who are active on this project 
(Hive)? It would be nice if I could tag people by their names in my ticket 
comments. 
5. Where can I find well-formatted documentation about how to take issues from 
discovery to fix on Apache Jira? 

I apologize in advance, if my questions are too simple.

Thanks, and any/all help is appreciated! 




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE

2014-06-05 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019133#comment-14019133
 ] 

Ashutosh Chauhan commented on HIVE-7050:


[~prasanth_j] Does this also support displaying column stats for a particular 
partition of a table? The test case doesn't cover it, so I'm not sure. I was 
hoping the following syntax would work, but it seems it is not supported yet.
{code}
describe formatted T partition (k1=v1) c1;
{code}

 Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
 -

 Key: HIVE-7050
 URL: https://issues.apache.org/jira/browse/HIVE-7050
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.14.0

 Attachments: HIVE-7050.1.patch, HIVE-7050.2.patch, HIVE-7050.3.patch, 
 HIVE-7050.4.patch, HIVE-7050.5.patch, HIVE-7050.6.patch


 There is currently no way to display the column level stats from hive CLI. It 
 will be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 22170: analyze table T compute statistics for columns; will now compute stats for all columns.

2014-06-05 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22170/
---

(Updated June 5, 2014, 7:37 p.m.)


Review request for hive and Prasanth_J.


Changes
---

Incorporated Prasanth's suggestion for displaying column stats.


Bugs: HIVE-7168
https://issues.apache.org/jira/browse/HIVE-7168


Repository: hive-git


Description
---

analyze table T compute statistics for columns; will now compute stats for all 
columns.


Diffs (updated)
-

  
metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java
 1245d80 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 
5b77e6f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 6d958fd 
  ql/src/test/queries/clientpositive/columnstats_partlvl.q 9dfe8ff 
  ql/src/test/queries/clientpositive/columnstats_tbllvl.q 170fbc5 
  ql/src/test/results/clientpositive/columnstats_partlvl.q.out d91be8d 
  ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 3d3d0e2 

Diff: https://reviews.apache.org/r/22170/diff/


Testing
---

Added new tests.


Thanks,

Ashutosh Chauhan



[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns

2014-06-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7168:
---

Attachment: HIVE-7168.1.patch

Incorporated Prasanth's feedback.

 Don't require to name all columns in analyze statements if stats collection 
 is for all columns
 --

 Key: HIVE-7168
 URL: https://issues.apache.org/jira/browse/HIVE-7168
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7168.1.patch, HIVE-7168.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns

2014-06-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7168:
---

Status: Patch Available  (was: Open)

 Don't require to name all columns in analyze statements if stats collection 
 is for all columns
 --

 Key: HIVE-7168
 URL: https://issues.apache.org/jira/browse/HIVE-7168
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7168.1.patch, HIVE-7168.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns

2014-06-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7168:
---

Status: Open  (was: Patch Available)

 Don't require to name all columns in analyze statements if stats collection 
 is for all columns
 --

 Key: HIVE-7168
 URL: https://issues.apache.org/jira/browse/HIVE-7168
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7168.1.patch, HIVE-7168.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE

2014-06-05 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019147#comment-14019147
 ] 

Prasanth J commented on HIVE-7050:
--

No it is not supported yet. HIVE-7051 is created to support it. 

 Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
 -

 Key: HIVE-7050
 URL: https://issues.apache.org/jira/browse/HIVE-7050
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.14.0

 Attachments: HIVE-7050.1.patch, HIVE-7050.2.patch, HIVE-7050.3.patch, 
 HIVE-7050.4.patch, HIVE-7050.5.patch, HIVE-7050.6.patch


 There is currently no way to display the column level stats from hive CLI. It 
 will be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-06-05 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HIVE-7136:
--

Attachment: (was: HIVE-7136-1.patch)

 Allow Hive to read hive scripts from any of the supported file systems in 
 hadoop eco-system
 ---

 Key: HIVE-7136
 URL: https://issues.apache.org/jira/browse/HIVE-7136
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.13.0
Reporter: Sumit Kumar
Assignee: Sumit Kumar
Priority: Minor
 Attachments: HIVE-7136.01.patch, HIVE-7136.patch


 The current Hive CLI assumes that the source file (hive script) is always on the 
 local file system. This patch implements support for reading source files 
 from other file systems in the hadoop eco-system (HDFS, S3, etc.) as well, 
 keeping the default behavior intact: the script is read from the default 
 (local) filesystem when no scheme is provided in its URL.
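 For illustration, hypothetical invocations once this is in place (host names and 
 paths are placeholders; whether both {{hive -f}} and the {{source}} command share 
 this code path is an assumption here, not something stated above):
 {code}
 hive -f hdfs://namenode:8020/scripts/report.hql   # script read from HDFS
 hive -f s3://mybucket/scripts/report.hql          # script read from S3
 hive -f /tmp/report.hql                           # no scheme: local FS, as before
 {code}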



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-06-05 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HIVE-7136:
--

Attachment: HIVE-7136.01.patch

Renaming the patch file to meet ptest requirements.

 Allow Hive to read hive scripts from any of the supported file systems in 
 hadoop eco-system
 ---

 Key: HIVE-7136
 URL: https://issues.apache.org/jira/browse/HIVE-7136
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.13.0
Reporter: Sumit Kumar
Assignee: Sumit Kumar
Priority: Minor
 Attachments: HIVE-7136.01.patch, HIVE-7136.patch


 The current Hive CLI assumes that the source file (hive script) is always on the 
 local file system. This patch implements support for reading source files 
 from other file systems in the hadoop eco-system (HDFS, S3, etc.) as well, 
 keeping the default behavior intact: the script is read from the default 
 (local) filesystem when no scheme is provided in its URL.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-06-05 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HIVE-7136:
--

Status: Patch Available  (was: Open)

 Allow Hive to read hive scripts from any of the supported file systems in 
 hadoop eco-system
 ---

 Key: HIVE-7136
 URL: https://issues.apache.org/jira/browse/HIVE-7136
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.13.0
Reporter: Sumit Kumar
Assignee: Sumit Kumar
Priority: Minor
 Attachments: HIVE-7136.01.patch, HIVE-7136.patch


 The current Hive CLI assumes that the source file (hive script) is always on the 
 local file system. This patch implements support for reading source files 
 from other file systems in the hadoop eco-system (HDFS, S3, etc.) as well, 
 keeping the default behavior intact: the script is read from the default 
 (local) filesystem when no scheme is provided in its URL.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3

2014-06-05 Thread Sushanth Sowmyan
Thanks for testing it out, folks. With 3 PMC +1s, no outstanding -1s, and
72 hours having passed, the vote passes. I will proceed with the release
process and send out an announcement mail shortly.

Thanks to all the others that helped out with the previous RCs as well! :)

On Thu, Jun 5, 2014 at 9:09 AM, Ashutosh Chauhan hashut...@apache.org wrote:
 +1


 On Thu, Jun 5, 2014 at 5:32 AM, Yu Azuryy azur...@outlook.com wrote:

 +1 now.

  Sorry, it's my fault. I had hacked the Hadoop major version in my test cluster
  to 1.3.0, when it was actually 2.4.0.
  But ShimLoader parsed the major version as 1 and loaded the HDFS v1 FileSystem,
  which is why I got this error.




  Date: Thu, 5 Jun 2014 01:20:10 -0700
  Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
  From: vgumas...@hortonworks.com
  To: dev@hive.apache.org
 
  Yu,
 
  I don't think tez-0.5 has been released yet. Can you try with tez-0.4 (
  http://www.apache.org/dyn/closer.cgi/incubator/tez/tez-0.4.0-incubating/
 )?
 
  Thanks,
  --Vaibhav
 
 
  On Thu, Jun 5, 2014 at 1:16 AM, Yu Azuryy azur...@outlook.com wrote:
 
   I just changed ConnectionURL and warehouse in hive-site.xml,
  
   As for tez-0.5-snapshot,
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/data/tez/lib/</value>
  </property>
  <property>
    <name>tez.am.resource.memory.mb</name>
    <value>1280</value>
  </property>
  
  
   None others.
  
  
Date: Wed, 4 Jun 2014 23:28:07 -0700
Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
From: the...@hortonworks.com
To: dev@hive.apache.org
   
Yu,
I am not able to reproduce this issue with hadoop 2.4.0 and tez 0.4 .
I tried it with both file system cache enabled and disabled.
What are the non default configurations you have on your machine ?
   
Thanks,
Thejas
   
   
   
On Wed, Jun 4, 2014 at 10:56 PM, Yu Azuryy azur...@outlook.com
 wrote:
 Hive-0.13.0 works well in my test cluster.





 -1

 Verified with hadoop-2.4.0 and tez-0.5-snapshot,
 Hive cannot start?

 And I also built hive branch-0.13, the same error.

 [test@vm-10-154-** tmp]$ hive
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.input.dir.recursive is deprecated. Instead, use
   mapreduce.input.fileinputformat.input.dir.recursive
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.max.split.size is deprecated. Instead, use
   mapreduce.input.fileinputformat.split.maxsize
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.min.split.size is deprecated. Instead, use
   mapreduce.input.fileinputformat.split.minsize
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.min.split.size.per.node is deprecated. Instead, use
   mapreduce.input.fileinputformat.split.minsize.per.node
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.min.split.size.per.rack is deprecated. Instead, use
   mapreduce.input.fileinputformat.split.minsize.per.rack
 14/06/05 13:53:45 INFO Configuration.deprecation:
 mapred.reduce.tasks
   is deprecated. Instead, use mapreduce.job.reduces
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
   mapreduce.reduce.speculative
 14/06/05 13:53:45 INFO Configuration.deprecation:
   mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use
   mapreduce.job.committer.setup.cleanup.needed
 14/06/05 13:53:45 WARN conf.HiveConf: DEPRECATED:
   hive.metastore.ds.retry.* no longer has any effect.  Use
   hive.hmshandler.retry.* instead
 Logging initialized using configuration in
  
 jar:file:/letv/hive-0.13.0/lib/hive-common-0.13.1.jar!/hive-log4j.properties
 Exception in thread main java.lang.RuntimeException:
   java.io.IOException: Filesystem closed
  at
  
 org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:357)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
  
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at
  
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 Caused by: java.io.IOException: Filesystem closed
  at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
  at
 org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
  at
  
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1066)
  at
  
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1062)
  at
  
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at
  
 

[jira] [Created] (HIVE-7182) ResultSet is not closed in JDBCStatsPublisher#init()

2014-06-05 Thread Ted Yu (JIRA)
Ted Yu created HIVE-7182:


 Summary: ResultSet is not closed in JDBCStatsPublisher#init()
 Key: HIVE-7182
 URL: https://issues.apache.org/jira/browse/HIVE-7182
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


{code}
ResultSet rs = dbm.getTables(null, null, 
JDBCStatsUtils.getStatTableName(), null);
boolean tblExists = rs.next();
{code}
{{rs}} is not closed upon return from {{init()}}.

If {{stmt.executeUpdate()}} throws an exception, {{stmt.close()}} would be skipped; the 
{{close()}} call should be placed in a finally block.
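
A minimal sketch of the suggested fix, with finally blocks so both resources are 
released even when an exception is thrown (the table name and DDL below are 
placeholders, not Hive's actual statements):
{code}
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StatsTableInitSketch {
  // Returns whether the stats table already existed; creates it otherwise.
  static boolean ensureStatsTable(Connection conn) throws SQLException {
    DatabaseMetaData dbm = conn.getMetaData();
    ResultSet rs = dbm.getTables(null, null, "STATS_TABLE", null);
    boolean tblExists;
    try {
      tblExists = rs.next();
    } finally {
      rs.close();               // was leaked in the snippet above
    }
    if (!tblExists) {
      Statement stmt = conn.createStatement();
      try {
        stmt.executeUpdate("CREATE TABLE STATS_TABLE (ID VARCHAR(255))");
      } finally {
        stmt.close();           // runs even if executeUpdate() throws
      }
    }
    return tblExists;
  }
}
{code}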



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7181) Beginner User On Apache Jira

2014-06-05 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019346#comment-14019346
 ] 

Lefty Leverenz commented on HIVE-7181:
--

Welcome to the community, [~nkelkar]!

1. In general, you would only assign a Jira ticket to yourself if you intended 
to fix it.  You can leave it unassigned if you don't know who will work on it.  
For example, this ticket wouldn't be assigned to you ... but actually questions 
like these don't belong in the Jira, they should be sent to 
dev@hive.apache.org.  Have you joined the Hive mailing lists yet?  (See link 
below.)
2.  We don't have QA assignees, or if we do it's news to me.
3.  See the contributor documentation in the wiki (How to Contribute).
4.  Good question -- I keep a list of Jira usernames which I'll post in a 
separate comment, but it's far from complete.  The wiki has a People page which 
links to a chart & list of contributors.  But you can just type @firstName 
lastName in the comment box and a list of possibilities will appear, then 
click on one of them to insert the tag.
5.  See the contributor documentation in the wiki.

* [Hive mailing lists | http://hive.apache.org/mailing_lists.html]
* [People page | http://hive.apache.org/people.html]
** [Chart & list of contributors | 
https://issues.apache.org/jira/secure/ConfigureReport.jspa?projectOrFilterId=project-12310843&statistictype=assignees&selectedProjectId=12310843&reportKey=com.atlassian.jira.plugin.system.reports%3Apie-report&Next=Next]
* [Hive Wiki:  Resources for Contributors | 
https://cwiki.apache.org/confluence/display/Hive/Home#Home-ResourcesforContributors]
** [How to Contribute | 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute]
** [Developer Guide | 
https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide]
** [Building Hive | 
https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-Building]


 Beginner User On Apache Jira
 

 Key: HIVE-7181
 URL: https://issues.apache.org/jira/browse/HIVE-7181
 Project: Hive
  Issue Type: Wish
Reporter: Nishant Kelkar
Priority: Minor
  Labels: documentation, newbie

 Hi All! 
 I've just started to use Apache's Jira board (I registered today). I've used 
 Jira for my work before, so I know how to navigate within Jira. But my main 
 question is understanding how issues are handled in the open source 
 community (to which I want to contribute, but I'm a noob here too). So 
 basically, a person files a ticket when he/she thinks that the issue 
 they are facing is a bug/improvement. 
 Questions:
 1. Whom am I supposed to assign the ticket to? (myself?)
 2. Who would be the QA assignee? 
 3. If addressing the issue requires looking at the code, how am I supposed to 
 change the code and bring into effect those changes? (At work, we maintain a 
 Git repo on our private server. So everyone always has access to the latest 
 code).
 4. Where can I find a list of all the people who are active on this project 
 (Hive)? It would be nice if I could tag people by their names in my ticket 
 comments. 
 5. Where can I find well-formatted documentation about how to take issues 
 from discovery to fix on Apache Jira? 
 I apologize in advance, if my questions are too simple.
 Thanks, and any/all help is appreciated! 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7181) Beginner User On Apache Jira

2014-06-05 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019361#comment-14019361
 ] 

Lefty Leverenz commented on HIVE-7181:
--

Here's an incomplete list of Hive contributors alphabetized by username, with 
-- (duplicates) for first-name lookup:

alangates --  Alan Gates 
amalakar -- Arup Malakar 
apivovarov -- Alexander Pivovarov
appodictic -- Edward Capriolo 
ashutoshc -- Ashutosh Chauhan 
brocknoland -- Brock Noland 
busbey -- Sean Busbey 
cdrome -- Chris Drome 
chouhan -- Rakesh Chouhan 
cos -- Konstantin Boudnik 
cwsteinbach -- Carl Steinbach 
deepesh -- Deepesh Khandelwal 
drankye -- Kai Zheng 
dschorow -- David Schorow  
-- (appodictic) -- Edward Capriolo 
ehans -- Eric Hanson 
ekoifman -- Eugene Koifman 
-- (toffer) -- Francis Liu 
-- (wangfsh) -- Fusheng Wang 
kevinwilfong -- Kevin Wilfong  
-- (cos) -- Konstantin Boudnik 
fwiffo -- Joey Echeverria  
gopalv -- Gopal V 
hagleitn -- Gunther Hagleitner 
-- (rhbutani) -- Harish Butani  
hsubramaniyan -- Hari Sankar Sivarama Subramaniyan  
-- (qwertymaniac) -- Harsh J 
jarcec -- Jarek Jarcec Cecho
jdere -- Jason Dere 
jnp -- Jitendra Nath Pandey 
-- (fwiffo) -- Joey Echeverria  
jcoffey -- Justin Coffey 
-- (drankye) -- Kai Zheng 
lars_francke -- Lars Francke 
leftylev -- Lefty Leverenz 
mattf -- Matt Foley 
namit -- Namit Jain 
navis -- Navis Ryu 
ndimiduk -- Nick Dimiduk 
nitinpawar432 -- Nitin Pawar 
owen.omalley -- Owen O'Malley 
prasadm -- Prasad Mujumdar 
prasanth_j -- Prasanth Jayachandran 
rhbutani -- Harish Butani 
-- (chouhan) -- Rakesh Chouhan 
roshan_naik -- Roshan Naik 
rusanu -- Remus Rusanu 
qwertymaniac -- Harsh J 
sershe -- Sergey Shelukhin 
shivshi -- Shivaraju Gowda  
shuainie -- Shuaishuai Nie 
subrotosanyal -- Subroto Sanyal 
sushanth -- Sushanth Sowmyan 
sxyuan -- Samuel Yuan 
-- (busbey) -- Sean Busbey 
szehon -- Szehon Ho 
teddy.choi -- Teddy Choi  
thejas -- Thejas Nair 
thiruvel -- Thiruvel Thirumoolan 
toffer -- Francis Liu 
vgumashta -- Vaibhav Gumashta 
vikram.dixit -- Vikram Dixit Kumaraswamy 
vikramsi -- Vikram S  
vinodkv -- Vinod Kumar Vavilapalli 
viraj -- Viraj Bhat
xuefuz -- Xuefu Zhang 
wangfsh -- Fusheng Wang 
wzc1989 -- Zhichun Wu 
yhuai -- Yin Huai
-- (wzc1989) -- Zhichun Wu 


 Beginner User On Apache Jira
 

 Key: HIVE-7181
 URL: https://issues.apache.org/jira/browse/HIVE-7181
 Project: Hive
  Issue Type: Wish
Reporter: Nishant Kelkar
Priority: Minor
  Labels: documentation, newbie

 Hi All! 
 I've just started to use Apache's Jira board (I registered today). I've used 
 Jira for my work before, so I know how to navigate within Jira. But my main 
 question is understanding how issues are handled in the open source 
 community (to which I want to contribute, but I'm a noob here too). So 
 basically, a person files a ticket when he/she thinks that the issue 
 they are facing is a bug/improvement. 
 Questions:
 1. Whom am I supposed to assign the ticket to? (myself?)
 2. Who would be the QA assignee? 
 3. If addressing the issue requires looking at the code, how am I supposed to 
 change the code and bring into effect those changes? (At work, we maintain a 
 Git repo on our private server. So everyone always has access to the latest 
 code).
 4. Where can I find a list of all the people who are active on this project 
 (Hive)? It would be nice if I could tag people by their names in my ticket 
 comments. 
 5. Where can I find well-formatted documentation about how to take issues 
 from discovery to fix on Apache Jira? 
 I apologize in advance, if my questions are too simple.
 Thanks, and any/all help is appreciated! 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7183) Size of partColumnGrants should be checked in ObjectStore#removeRole()

2014-06-05 Thread Ted Yu (JIRA)
Ted Yu created HIVE-7183:


 Summary: Size of partColumnGrants should be checked in 
ObjectStore#removeRole()
 Key: HIVE-7183
 URL: https://issues.apache.org/jira/browse/HIVE-7183
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


Here is related code:
{code}
List<MPartitionColumnPrivilege> partColumnGrants = 
listPrincipalAllPartitionColumnGrants(
mRol.getRoleName(), PrincipalType.ROLE);
if (tblColumnGrants.size() > 0) {
  pm.deletePersistentAll(partColumnGrants);
{code}
Size of tblColumnGrants is currently checked.
Size of partColumnGrants should be checked instead.
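
For clarity, a sketch of the corrected fragment (same surrounding method; only the 
condition changes):
{code}
List<MPartitionColumnPrivilege> partColumnGrants = listPrincipalAllPartitionColumnGrants(
    mRol.getRoleName(), PrincipalType.ROLE);
if (partColumnGrants.size() > 0) {   // check the list that is actually deleted
  pm.deletePersistentAll(partColumnGrants);
}
{code}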



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7181) Beginner User On Apache Jira

2014-06-05 Thread Nishant Kelkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019393#comment-14019393
 ] 

Nishant Kelkar commented on HIVE-7181:
--

[~leftylev], thanks so much! These links are really helpful! Especially the 
People page and the How to Contribute pages. 

Also, I've sent a request email at user-subscr...@hive.apache.org and 
dev-subscr...@hive.apache.org, so I guess I should be hearing soon. 

Thanks again! 

 Beginner User On Apache Jira
 

 Key: HIVE-7181
 URL: https://issues.apache.org/jira/browse/HIVE-7181
 Project: Hive
  Issue Type: Wish
Reporter: Nishant Kelkar
Priority: Minor
  Labels: documentation, newbie

 Hi All! 
 I've just started to use Apache's Jira board (I registered today). I've used 
 Jira for my work before, so I know how to navigate within Jira. But my main 
 question is understanding how issues are handled in the open source 
 community (to which I want to contribute, but I'm a noob here too). So 
 basically, a person files a ticket when he/she thinks that the issue 
 they are facing is a bug/improvement. 
 Questions:
 1. Whom am I supposed to assign the ticket to? (myself?)
 2. Who would be the QA assignee? 
 3. If addressing the issue requires looking at the code, how am I supposed to 
 change the code and bring into effect those changes? (At work, we maintain a 
 Git repo on our private server. So everyone always has access to the latest 
 code).
 4. Where can I find a list of all the people who are active on this project 
 (Hive)? It would be nice if I could tag people by their names in my ticket 
 comments. 
 5. Where can I find well-formatted documentation about how to take issues 
 from discovery to fix on Apache Jira? 
 I apologize in advance, if my questions are too simple.
 Thanks, and any/all help is appreciated! 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7185) KeyWrapperFactory#TextKeyWrapper#equals() extracts Text incorrectly when isCopy is false

2014-06-05 Thread Ted Yu (JIRA)
Ted Yu created HIVE-7185:


 Summary: KeyWrapperFactory#TextKeyWrapper#equals() extracts Text 
incorrectly when isCopy is false
 Key: HIVE-7185
 URL: https://issues.apache.org/jira/browse/HIVE-7185
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


{code}
  } else {
t1 = soi_new.getPrimitiveWritableObject(key);
t2 = soi_copy.getPrimitiveWritableObject(obj);
{code}
{{t2}} should be assigned {{soi_new.getPrimitiveWritableObject(obj)}} instead.
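
For clarity, a sketch of the corrected branch (a fragment of {{equals()}}, not a 
standalone patch):
{code}
} else {
  t1 = soi_new.getPrimitiveWritableObject(key);
  t2 = soi_new.getPrimitiveWritableObject(obj);  // both sides via soi_new when isCopy is false
}
{code}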



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7184) TestHadoop20SAuthBridge no longer compiles after HADOOP-10448

2014-06-05 Thread Jason Dere (JIRA)
Jason Dere created HIVE-7184:


 Summary: TestHadoop20SAuthBridge no longer compiles after 
HADOOP-10448
 Key: HIVE-7184
 URL: https://issues.apache.org/jira/browse/HIVE-7184
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.14.0
Reporter: Jason Dere


HADOOP-10448 moves a couple of methods which were being used by the 
TestHadoop20SAuthBridge test. If/when the Hive build uses Hadoop 2.5 as a 
dependency, this will cause compilation errors.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7184) TestHadoop20SAuthBridge no longer compiles after HADOOP-10448

2014-06-05 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-7184:
-

Attachment: HIVE-7184.1.patch

Attaching a patch that should allow the test to compile once Hive starts 
compiling against Hadoop 2.5.

 TestHadoop20SAuthBridge no longer compiles after HADOOP-10448
 -

 Key: HIVE-7184
 URL: https://issues.apache.org/jira/browse/HIVE-7184
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.14.0
Reporter: Jason Dere
 Attachments: HIVE-7184.1.patch


 HADOOP-10448 moves a couple of methods which were being used by the 
 TestHadoop20SAuthBridge test. If/when the Hive build uses Hadoop 2.5 as a 
 dependency, this will cause compilation errors.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns

2014-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019430#comment-14019430
 ] 

Hive QA commented on HIVE-7168:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12648534/HIVE-7168.1.patch

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 5510 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_display_colstats_tbllvl
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_collect_set
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_top_level
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.metastore.TestMetaStoreAuthorization.testMetaStoreAuthorization
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/396/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/396/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-396/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12648534

 Don't require to name all columns in analyze statements if stats collection 
 is for all columns
 --

 Key: HIVE-7168
 URL: https://issues.apache.org/jira/browse/HIVE-7168
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-7168.1.patch, HIVE-7168.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7166) Vectorization with UDFs returns incorrect results

2014-06-05 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-7166:


Status: Open  (was: Patch Available)

 Vectorization with UDFs returns incorrect results
 -

 Key: HIVE-7166
 URL: https://issues.apache.org/jira/browse/HIVE-7166
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.13.0
 Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster 
Reporter: Benjamin Bowman
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Minor
 Attachments: HIVE-7166.1.patch


 Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect 
 query results. 
 Example Query:  SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - 
 X) and UDF_1
 The following test scenario will reproduce the problem:
 TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 1):  
 package com.test;
 import org.apache.hadoop.hive.ql.exec.Description;
 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import java.lang.String;
 import java.lang.*;
 public class tenThousand extends UDF {
   private final LongWritable result = new LongWritable();
   public LongWritable evaluate() {
 result.set(1);
 return result;
   }
 }
 TEST DATA (test.input):
 1|CBCABC|12
 2|DBCABC|13
 3|EBCABC|14
 4|ABCABC|15
 5|BBCABC|16
 6|CBCABC|17
 CREATING ORC TABLE:
 0: jdbc:hive2://server:10002/db> create table testTabOrc (first bigint, 
 second varchar(20), third int) partitioned by (range int) clustered by 
 (first) sorted by (first) into 8 buckets stored as orc tblproperties 
 ("orc.compress" = "SNAPPY", "orc.index" = "true");
 CREATE LOADING TABLE:
 0: jdbc:hive2://server:10002/db> create table loadingDir (first bigint, 
 second varchar(20), third int) partitioned by (range int) row format 
 delimited fields terminated by '|' stored as textfile;
 COPY IN DATA:
 [root@server]#  hadoop fs -copyFromLocal /tmp/test.input /db/loading/.
 ORC DATA:
 [root@server]#  beeline -u jdbc:hive2://server:10002/db -n root --hiveconf 
 hive.exec.dynamic.partition.mode=nonstrict --hiveconf 
 hive.enforce.sorting=true -e "insert into table testTabOrc partition(range) 
 select * from loadingDir;"
 LOAD TEST FUNCTION:
 0: jdbc:hive2://server:10002/db> add jar /opt/hadoop/lib/testFunction.jar
 0: jdbc:hive2://server:10002/db> create temporary function ten_thousand as 
 'com.test.tenThousand';
 TURN OFF VECTORIZATION:
 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=false;
 QUERY (RESULTS AS EXPECTED):
 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first 
 between ten_thousand()-1 and ten_thousand()-9995;
 ++
 | first  |
 ++
 | 1  |
 | 2  |
 | 3  |
 ++
 3 rows selected (15.286 seconds)
 TURN ON VECTORIZATION:
 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=true;
 QUERY AGAIN (WRONG RESULTS):
 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first 
 between ten_thousand()-1 and ten_thousand()-9995;
 ++
 | first  |
 ++
 ++
 No rows selected (17.763 seconds)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7166) Vectorization with UDFs returns incorrect results

2014-06-05 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-7166:


Attachment: HIVE-7166.2.patch

 Vectorization with UDFs returns incorrect results
 -

 Key: HIVE-7166
 URL: https://issues.apache.org/jira/browse/HIVE-7166
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 0.13.0
 Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster 
Reporter: Benjamin Bowman
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Minor
 Attachments: HIVE-7166.1.patch, HIVE-7166.2.patch


 Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect 
 query results. 
 Example Query:  SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - 
 X) and UDF_1
 The following test scenario will reproduce the problem:
 TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 1):  
 package com.test;
 import org.apache.hadoop.hive.ql.exec.Description;
 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import java.lang.String;
 import java.lang.*;
 public class tenThousand extends UDF {
   private final LongWritable result = new LongWritable();
   public LongWritable evaluate() {
 result.set(1);
 return result;
   }
 }
 TEST DATA (test.input):
 1|CBCABC|12
 2|DBCABC|13
 3|EBCABC|14
 4|ABCABC|15
 5|BBCABC|16
 6|CBCABC|17
 CREATING ORC TABLE:
 0: jdbc:hive2://server:10002/db> create table testTabOrc (first bigint, 
 second varchar(20), third int) partitioned by (range int) clustered by 
 (first) sorted by (first) into 8 buckets stored as orc tblproperties 
 ("orc.compress" = "SNAPPY", "orc.index" = "true");
 CREATE LOADING TABLE:
 0: jdbc:hive2://server:10002/db> create table loadingDir (first bigint, 
 second varchar(20), third int) partitioned by (range int) row format 
 delimited fields terminated by '|' stored as textfile;
 COPY IN DATA:
 [root@server]#  hadoop fs -copyFromLocal /tmp/test.input /db/loading/.
 ORC DATA:
 [root@server]#  beeline -u jdbc:hive2://server:10002/db -n root --hiveconf 
 hive.exec.dynamic.partition.mode=nonstrict --hiveconf 
 hive.enforce.sorting=true -e "insert into table testTabOrc partition(range) 
 select * from loadingDir;"
 LOAD TEST FUNCTION:
 0: jdbc:hive2://server:10002/db> add jar /opt/hadoop/lib/testFunction.jar
 0: jdbc:hive2://server:10002/db> create temporary function ten_thousand as 
 'com.test.tenThousand';
 TURN OFF VECTORIZATION:
 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=false;
 QUERY (RESULTS AS EXPECTED):
 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first 
 between ten_thousand()-1 and ten_thousand()-9995;
 ++
 | first  |
 ++
 | 1  |
 | 2  |
 | 3  |
 ++
 3 rows selected (15.286 seconds)
 TURN ON VECTORIZATION:
 0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=true;
 QUERY AGAIN (WRONG RESULTS):
 0: jdbc:hive2://server:10002/db> select first from testTabOrc where first 
 between ten_thousand()-1 and ten_thousand()-9995;
 ++
 | first  |
 ++
 ++
 No rows selected (17.763 seconds)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-2777) ability to add and drop partitions atomically

2014-06-05 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HIVE-2777:
--

Status: Open  (was: Patch Available)

 ability to add and drop partitions atomically
 -

 Key: HIVE-2777
 URL: https://issues.apache.org/jira/browse/HIVE-2777
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.13.0
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2777.D2271.1.patch, 
 hive-2777.patch


 Hive should have the ability to atomically add and drop partitions. This way 
 admins can change partitions atomically without breaking running jobs, and it 
 allows an admin to merge several partitions into one.
 Essentially, we would like to have an API: add_drop_partitions(String db, 
 String tbl_name, List<Partition> addParts, List<List<String>> dropParts, 
 boolean deleteData);
 This jira covers the changes required for the metastore and thrift.
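 For illustration, a hedged sketch of what the corresponding Thrift definition 
 might look like (parameter order follows the signature above; everything else, 
 including the exception clause, is an assumption rather than part of the patch):
 {code}
 // hypothetical addition to hive_metastore.thrift
 void add_drop_partitions(1: string db_name,
                          2: string tbl_name,
                          3: list<Partition> add_parts,
                          4: list<list<string>> drop_parts,
                          5: bool delete_data)
   throws (1: MetaException o1)
 {code}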



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-06-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019514#comment-14019514
 ] 

Hive QA commented on HIVE-7136:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12648539/HIVE-7136.01.patch

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 5585 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats16
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testListPartitionNames
org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testListPartitions
org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testPartition
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/397/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/397/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-397/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12648539

 Allow Hive to read hive scripts from any of the supported file systems in 
 hadoop eco-system
 ---

 Key: HIVE-7136
 URL: https://issues.apache.org/jira/browse/HIVE-7136
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.13.0
Reporter: Sumit Kumar
Assignee: Sumit Kumar
Priority: Minor
 Attachments: HIVE-7136.01.patch, HIVE-7136.patch


 The current Hive CLI assumes that the source file (hive script) is always on the 
 local file system. This patch implements support for reading source files 
 from other file systems in the hadoop eco-system (HDFS, S3, etc.) as well, 
 keeping the default behavior intact: the script is read from the default 
 (local) filesystem when no scheme is provided in its URL.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7174) Do not accept string as scale and precision when reading Avro schema

2014-06-05 Thread Jarek Jarcec Cecho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jarek Jarcec Cecho updated HIVE-7174:
-

Attachment: dec.avro

 Do not accept string as scale and precision when reading Avro schema
 

 Key: HIVE-7174
 URL: https://issues.apache.org/jira/browse/HIVE-7174
 Project: Hive
  Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Assignee: Jarek Jarcec Cecho
 Fix For: 0.14.0

 Attachments: HIVE-7174.patch, dec.avro


 I've noticed that the current AvroSerde will happily accept a schema that uses 
 strings instead of integers for scale and precision, e.g. the fragment 
 {{"precision":"4","scale":"1"}} from the following table:
 {code}
 CREATE TABLE `avro_dec1`(
   `name` string COMMENT 'from deserializer',
   `value` decimal(4,1) COMMENT 'from deserializer')
 COMMENT 'just drop the schema right into the HQL'
 ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED AS INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 TBLPROPERTIES (
   'numFiles'='1',
 'avro.schema.literal'='{\"namespace\":\"com.howdy\",\"name\":\"some_schema\",\"type\":\"record\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"value\",\"type\":{\"type\":\"bytes\",\"logicalType\":\"decimal\",\"precision\":\"4\",\"scale\":\"1\"}}]}'
 );
 {code}
 However, the decimal spec defined in AVRO-1402 requires these to be integers, 
 and hence allows only the following fragment instead: 
 {{"precision":4,"scale":1}} (i.e. no double quotes around the numbers).
 As Hive can propagate this incorrect schema to new files, thereby creating 
 files with an invalid schema, I think we should alter the behavior and 
 insist on the correct schema.
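 For illustration, the two fragments side by side; per AVRO-1402 only the second 
 form is valid:
 {code}
 // rejected after this change: precision/scale as strings
 {"type":"bytes","logicalType":"decimal","precision":"4","scale":"1"}
 // valid: precision/scale as integers
 {"type":"bytes","logicalType":"decimal","precision":4,"scale":1}
 {code}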



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7174) Do not accept string as scale and precision when reading Avro schema

2014-06-05 Thread Jarek Jarcec Cecho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019542#comment-14019542
 ] 

Jarek Jarcec Cecho commented on HIVE-7174:
--

I've noticed that the file {{dec.avro}} had been created with an incorrect schema, 
so I've attached a fixed version.  The attached file should replace the one in 
{{data/files/dec.avro}}.

 Do not accept string as scale and precision when reading Avro schema
 

 Key: HIVE-7174
 URL: https://issues.apache.org/jira/browse/HIVE-7174
 Project: Hive
  Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Assignee: Jarek Jarcec Cecho
 Fix For: 0.14.0

 Attachments: HIVE-7174.patch, dec.avro


 I've noticed that the current AvroSerde will happily accept a schema that uses 
 strings instead of integers for scale and precision, e.g. the fragment 
 {{"precision":"4","scale":"1"}} from the following table:
 {code}
 CREATE TABLE `avro_dec1`(
   `name` string COMMENT 'from deserializer',
   `value` decimal(4,1) COMMENT 'from deserializer')
 COMMENT 'just drop the schema right into the HQL'
 ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED AS INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 TBLPROPERTIES (
   'numFiles'='1',
 'avro.schema.literal'='{\"namespace\":\"com.howdy\",\"name\":\"some_schema\",\"type\":\"record\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"value\",\"type\":{\"type\":\"bytes\",\"logicalType\":\"decimal\",\"precision\":\"4\",\"scale\":\"1\"}}]}'
 );
 {code}
 However, the decimal spec defined in AVRO-1402 requires these to be integers, 
 and hence allows only the following fragment instead: 
 {{"precision":4,"scale":1}} (i.e. no double quotes around the numbers).
 As Hive can propagate this incorrect schema to new files, thereby creating 
 files with an invalid schema, I think we should alter the behavior and 
 insist on the correct schema.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE

2014-06-05 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019549#comment-14019549
 ] 

Ashutosh Chauhan commented on HIVE-7050:


Ok..cool

 Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
 -

 Key: HIVE-7050
 URL: https://issues.apache.org/jira/browse/HIVE-7050
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.14.0

 Attachments: HIVE-7050.1.patch, HIVE-7050.2.patch, HIVE-7050.3.patch, 
 HIVE-7050.4.patch, HIVE-7050.5.patch, HIVE-7050.6.patch


 There is currently no way to display the column level stats from hive CLI. It 
 will be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE



--
This message was sent by Atlassian JIRA
(v6.2#6252)


TestAvroSerdeUtils#determineSchemaCanReadSchemaFromHDFS fails on hive-13 for hadoop-2

2014-06-05 Thread pankit thapar
Hi,

I am trying to build hive on my local desktop.
I am facing an issue with the test case:
TestAvroSerdeUtils#determineSchemaCanReadSchemaFromHDFS

The issue occurs only with hadoop-2 and not with hadoop-1.

Has anyone been able to run this test case?

Trace :
org.apache.hadoop.ipc.RemoteException: File /path/to/schema/schema.avsc
could only be replicated to 0 nodes instead of minReplication (=1).  There
are 1 datanode(s) running and no node(s) are excluded in this operation.
at
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1406)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2596)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:563)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:407)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:592)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)

at org.apache.hadoop.ipc.Client.call(Client.java:1406)
at org.apache.hadoop.ipc.Client.call(Client.java:1359)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:211)
at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:348)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1275)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1123)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:527)


Thanks,
Pankit


[jira] [Commented] (HIVE-7174) Do not accept string as scale and precision when reading Avro schema

2014-06-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019552#comment-14019552
 ] 

Xuefu Zhang commented on HIVE-7174:
---

+1

 Do not accept string as scale and precision when reading Avro schema
 

 Key: HIVE-7174
 URL: https://issues.apache.org/jira/browse/HIVE-7174
 Project: Hive
  Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Assignee: Jarek Jarcec Cecho
 Fix For: 0.14.0

 Attachments: HIVE-7174.patch, dec.avro


 I've noticed that the current AvroSerde will happily accept a schema that uses 
 strings instead of integers for scale and precision, e.g. the fragment 
 {{"precision":"4","scale":"1"}} from the following table:
 {code}
 CREATE TABLE `avro_dec1`(
   `name` string COMMENT 'from deserializer',
   `value` decimal(4,1) COMMENT 'from deserializer')
 COMMENT 'just drop the schema right into the HQL'
 ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED AS INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
 OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 TBLPROPERTIES (
   'numFiles'='1',
 'avro.schema.literal'='{\"namespace\":\"com.howdy\",\"name\":\"some_schema\",\"type\":\"record\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"value\",\"type\":{\"type\":\"bytes\",\"logicalType\":\"decimal\",\"precision\":\"4\",\"scale\":\"1\"}}]}'
 );
 {code}
 However, the decimal spec defined in AVRO-1402 requires these to be integers, 
 and hence allows only the following fragment instead: 
 {{"precision":4,"scale":1}} (i.e. no double quotes around the numbers).
 As Hive can propagate this incorrect schema to new files, thereby creating 
 files with an invalid schema, I think we should alter the behavior and 
 insist on the correct schema.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 22174: HIVE-6394 Implement Timestmap in ParquetSerde

2014-06-05 Thread Szehon Ho


 On June 5, 2014, 8:43 a.m., justin coffey wrote:
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java,
   line 165
  https://reviews.apache.org/r/22174/diff/3/?file=603954#file603954line165
 
  A stupid question perhaps, but is INT96 reserved for timestamps in 
  parquet?
  
  I dug this up, but not sure if it's definitive: 
  https://github.com/Parquet/parquet-mr/issues/101

Yea, I don't think it's reserved, but parquet is missing an OriginalType annotation 
called 'Timestamp' for applications to recognize, which will require yet another 
parquet version bump.

Do you think we can go ahead with it now and then add it later in a follow-up 
JIRA?  Or wait for that to be added first?


- Szehon


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22174/#review44805
---


On June 5, 2014, 7:33 a.m., Szehon Ho wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/22174/
 ---
 
 (Updated June 5, 2014, 7:33 a.m.)
 
 
 Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.
 
 
 Bugs: HIVE-6394
 https://issues.apache.org/jira/browse/HIVE-6394
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 This uses the Jodd library to convert java.sql.Timestamp type used by Hive 
 into the {julian-day:nanos} format expected by parquet, and vice-versa.
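 
 For readers following along, a hedged sketch of the Timestamp <-> {julian-day, 
 nanos-of-day} mapping described above (the actual patch delegates to the Jodd 
 library, so the names and rounding details here are illustrative only):
 {code}
 import java.sql.Timestamp;
 import java.util.concurrent.TimeUnit;
 
 public class JulianNanosSketch {
   private static final long JULIAN_DAY_OF_EPOCH = 2440588L;   // 1970-01-01
   private static final long NANOS_PER_DAY = TimeUnit.DAYS.toNanos(1);
 
   static long[] toJulianDayAndNanos(Timestamp ts) {
     long seconds = Math.floorDiv(ts.getTime(), 1000L);        // whole seconds since epoch
     long nanosSinceEpoch = seconds * 1_000_000_000L + ts.getNanos();
     long julianDay = JULIAN_DAY_OF_EPOCH + Math.floorDiv(nanosSinceEpoch, NANOS_PER_DAY);
     long nanosOfDay = Math.floorMod(nanosSinceEpoch, NANOS_PER_DAY);
     return new long[] { julianDay, nanosOfDay };
   }
 
   static Timestamp fromJulianDayAndNanos(long julianDay, long nanosOfDay) {
     long nanosSinceEpoch = (julianDay - JULIAN_DAY_OF_EPOCH) * NANOS_PER_DAY + nanosOfDay;
     Timestamp ts = new Timestamp(Math.floorDiv(nanosSinceEpoch, 1_000_000_000L) * 1000L);
     ts.setNanos((int) Math.floorMod(nanosSinceEpoch, 1_000_000_000L));
     return ts;
   }
 }
 {code}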
 
 
 Diffs
 -
 
   data/files/parquet_types.txt 0be390b 
   pom.xml 4bb8880 
   ql/pom.xml 13c477a 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 
 4da0d30 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java
  29f7e11 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java
  57161d8 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 
 fb2f5a8 
   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/utils/NanoTimeUtils.java 
 PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java
  3490061 
   
 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java
  PRE-CREATION 
   ql/src/test/queries/clientpositive/parquet_types.q 5d6333c 
   ql/src/test/results/clientpositive/parquet_types.q.out c23f7f1 
 
 Diff: https://reviews.apache.org/r/22174/diff/
 
 
 Testing
 ---
 
 Unit tests the new libraries, and also added timestamp data in the 
 parquet_types q-test.
 
 
 Thanks,
 
 Szehon Ho