[jira] [Created] (HIVE-7178) Table alias cannot be used in GROUPING SETS clause if there is more than one column in it
Yibing Shi created HIVE-7178:
--------------------------------

             Summary: Table alias cannot be used in GROUPING SETS clause if there is more than one column in it
                 Key: HIVE-7178
                 URL: https://issues.apache.org/jira/browse/HIVE-7178
             Project: Hive
          Issue Type: Bug
          Components: SQL
    Affects Versions: 0.13.0
            Reporter: Yibing Shi

The following SQL doesn't work:

EXPLAIN
SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d)
FROM table_name alias
GROUP BY alias.a, alias.b, alias.c
GROUPING SETS ( (alias.a), (alias.b, alias.a) );

FAILED: ParseException line 15:34 missing ) at ',' near 'EOF'
line 16:0 extraneous input ')' expecting EOF near 'EOF'

The following SQL works (without aliases in the grouping sets):

EXPLAIN
SELECT a, b, c, COUNT(DISTINCT d)
FROM table_name
GROUP BY a, b, c
GROUPING SETS ( (a), (b, a) );

An alias also works when the grouping set contains just one column:

EXPLAIN
SELECT alias.a, alias.b, alias.c, COUNT(DISTINCT d)
FROM table_name alias
GROUP BY alias.a, alias.b, alias.c
GROUPING SETS ( (alias.a) );

Using aliases in GROUPING SETS could be very useful when multiple tables are involved in the SELECT (via JOIN).

--
This message was sent by Atlassian JIRA
(v6.2#6252)
Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
Yu, I am not able to reproduce this issue with hadoop 2.4.0 and tez 0.4. I tried it with both file system cache enabled and disabled. What are the non-default configurations you have on your machine?
Thanks,
Thejas

On Wed, Jun 4, 2014 at 10:56 PM, Yu Azuryy azur...@outlook.com wrote:

Hive-0.13.0 works well in my test cluster.

-1. Verified with hadoop-2.4.0 and tez-0.5-snapshot; Hive cannot start. And I also built hive branch-0.13, with the same error.

[test@vm-10-154-** tmp]$ hive
14/06/05 13:53:45 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/06/05 13:53:45 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/06/05 13:53:45 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/06/05 13:53:45 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/06/05 13:53:45 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/06/05 13:53:45 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/06/05 13:53:45 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/06/05 13:53:45 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
14/06/05 13:53:45 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead

Logging initialized using configuration in jar:file:/letv/hive-0.13.0/lib/hive-common-0.13.1.jar!/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Filesystem closed
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:357)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.io.IOException: Filesystem closed
	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1066)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1062)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1062)
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
	at org.apache.hadoop.fs.FileSystem.deleteOnExit(FileSystem.java:1353)
	at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.createTezDir(TezSessionState.java:297)
	at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:144)
	at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:123)
	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:355)
	... 7 more

Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
From: hbut...@hortonworks.com
Date: Wed, 4 Jun 2014 22:42:32 -0700
To: dev@hive.apache.org

+1
- Verified signature and checksum
- Checked release notes
- Built source
- Ran a few unit tests
- Ran a few queries in local mode

On Jun 2, 2014, at 7:20 PM, Thejas Nair the...@hortonworks.com wrote:

+1
- Verified signatures and checksums of both packages
- Checked release notes
- Built source package
- Ran simple hive queries on newly built package in local mode
- Ran simple queries on package on single node cluster with both tez and mr as execution engines
- Ran unit tests

On Mon, Jun 2, 2014 at 1:02 PM, Sushanth Sowmyan khorg...@apache.org wrote:

Apache Hive 0.13.1 Release Candidate 3 is available here:
http://people.apache.org/~khorgath/releases/0.13.1_RC3/artifacts/

Maven artifacts are available here:
https://repository.apache.org/content/repositories/orgapachehive-1015

Source tag for RC3 is at:
http://svn.apache.org/viewvc/hive/tags/release-0.13.1-rc3/

Voting will remain open for 72 hours.

Hive PMC Members: Please test and vote.

Thanks,
-Sushanth
[jira] [Commented] (HIVE-1019) java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
[ https://issues.apache.org/jira/browse/HIVE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018530#comment-14018530 ]

xuanjinlee commented on HIVE-1019:
----------------------------------
Hello, how do I use this patch?

java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
--------------------------------------------------------------------
                 Key: HIVE-1019
                 URL: https://issues.apache.org/jira/browse/HIVE-1019
             Project: Hive
          Issue Type: Bug
          Components: Server Infrastructure
    Affects Versions: 0.6.0
            Reporter: Bennie Schut
            Assignee: Bennie Schut
            Priority: Minor
         Attachments: HIVE-1019-1.patch, HIVE-1019-2.patch, HIVE-1019-3.patch, HIVE-1019-4.patch, HIVE-1019-5.patch, HIVE-1019-6.patch, HIVE-1019-7.patch, HIVE-1019-8.patch, HIVE-1019.patch, stacktrace2.txt

I keep getting errors like this:
java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
and:
java.io.IOException: cannot find dir = hdfs://victoria.ebuddy.com:9000/tmp/hive-dwh/801467596/10002 in partToPartitionInfo!
when running multiple threads with roughly similar queries. I have a patch for this which works for me.
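As a rough illustration of the failure class described above (concurrent queries racing on a shared scratch file), here is a minimal Python sketch. This is illustrative only, not the HIVE-1019 patch; the path names and helper functions are hypothetical.

```python
# Illustrative sketch (NOT the actual HIVE-1019 patch): why a fixed
# scratch-file name breaks under concurrency, and how a per-query unique
# name avoids the collision.
import uuid


def plan_path_fixed(tmp_dir):
    # Every concurrent query gets the same path, so threads can overwrite or
    # delete each other's plan files; the losing thread then sees
    # FileNotFoundException when it tries to read its plan back.
    return f"{tmp_dir}/HIVE_PLAN"


def plan_path_unique(tmp_dir):
    # A unique suffix per query keeps concurrent plan files from colliding.
    return f"{tmp_dir}/HIVE_PLAN_{uuid.uuid4().hex}"


# Two "concurrent queries" with the fixed scheme collide; with the unique
# scheme they never share a path.
assert plan_path_fixed("/tmp") == plan_path_fixed("/tmp")
assert plan_path_unique("/tmp") != plan_path_unique("/tmp")
```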
[jira] [Commented] (HIVE-7166) Vectorization with UDFs returns incorrect results
[ https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018536#comment-14018536 ]

Hari Sankar Sivarama Subramaniyan commented on HIVE-7166:
---------------------------------------------------------
I looked at this issue. Vectorization cannot be performed trivially for the above example because, as of now, constant folding in vectorization is supported only for unary expressions. Once HIVE-5771 is committed, this query can be vectorized. The current fix is to disable vectorization in such a scenario so that we fall back to row mode. cc-ing [~jnp] and [~ehans] for reviewing the patch.

Vectorization with UDFs returns incorrect results
-------------------------------------------------
                 Key: HIVE-7166
                 URL: https://issues.apache.org/jira/browse/HIVE-7166
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2, UDF, Vectorization
    Affects Versions: 0.13.0
         Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster
            Reporter: Benjamin Bowman
            Assignee: Hari Sankar Sivarama Subramaniyan
            Priority: Minor

Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect query results.
Example Query:
SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - X) AND UDF_1

The following test scenario will reproduce the problem:

TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 1):

package com.test;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.LongWritable;

public class tenThousand extends UDF {
  private final LongWritable result = new LongWritable();

  public LongWritable evaluate() {
    result.set(1);
    return result;
  }
}

TEST DATA (test.input):
1|CBCABC|12
2|DBCABC|13
3|EBCABC|14
4|ABCABC|15
5|BBCABC|16
6|CBCABC|17

CREATING ORC TABLE:
0: jdbc:hive2://server:10002/db> create table testTabOrc (first bigint, second varchar(20), third int) partitioned by (range int) clustered by (first) sorted by (first) into 8 buckets stored as orc tblproperties ("orc.compress" = "SNAPPY", "orc.index" = "true");

CREATE LOADING TABLE:
0: jdbc:hive2://server:10002/db> create table loadingDir (first bigint, second varchar(20), third int) partitioned by (range int) row format delimited fields terminated by '|' stored as textfile;

COPY IN DATA:
[root@server]# hadoop fs -copyFromLocal /tmp/test.input /db/loading/.

ORC DATA:
[root@server]# beeline -u jdbc:hive2://server:10002/db -n root --hiveconf hive.exec.dynamic.partition.mode=nonstrict --hiveconf hive.enforce.sorting=true -e "insert into table testTabOrc partition(range) select * from loadingDir;"

LOAD TEST FUNCTION:
0: jdbc:hive2://server:10002/db> add jar /opt/hadoop/lib/testFunction.jar;
0: jdbc:hive2://server:10002/db> create temporary function ten_thousand as 'com.test.tenThousand';

TURN OFF VECTORIZATION:
0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=false;

QUERY (RESULTS AS EXPECTED):
0: jdbc:hive2://server:10002/db> select first from testTabOrc where first between ten_thousand()-1 and ten_thousand()-9995;
+--------+
| first  |
+--------+
| 1      |
| 2      |
| 3      |
+--------+
3 rows selected (15.286 seconds)

TURN ON VECTORIZATION:
0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=true;

QUERY AGAIN (WRONG RESULTS):
0: jdbc:hive2://server:10002/db> select first from testTabOrc where first between ten_thousand()-1 and ten_thousand()-9995;
+--------+
| first  |
+--------+
+--------+
No rows selected (17.763 seconds)
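Hari's comment above hinges on constant folding: bounds like ten_thousand()-1 are non-unary constant subexpressions, and a vectorized filter wants to fold each bound to a single scalar before scanning a whole column at once, whereas row mode just evaluates the full predicate per row. A rough Python sketch of the two evaluation modes (illustrative only, not Hive's implementation; the invariant is that both modes must return the same rows):

```python
# Illustrative sketch (not Hive code): row-mode vs. vectorized evaluation of
#   first BETWEEN ten_thousand()-1 AND ten_thousand()-9995
# The bug in HIVE-7166 was that the two modes disagreed; correct engines
# must always agree.


def ten_thousand():
    # Stand-in for the deterministic, zero-argument UDF from the report.
    return 1


def row_mode(column):
    # Row mode: evaluate the whole predicate independently for every row.
    return [v for v in column
            if ten_thousand() - 1 <= v <= ten_thousand() - 9995]


def vectorized(column):
    # Vectorized mode: fold the constant bounds to scalars once, then scan
    # the column in a single pass with those scalars.
    lo = ten_thousand() - 1
    hi = ten_thousand() - 9995
    return [v for v in column if lo <= v <= hi]


column = [1, 2, 3, 4, 5, 6]
# Both modes must agree on the result set.
assert row_mode(column) == vectorized(column)
```

(With this simplified UDF the range is inverted, so both modes correctly return no rows; the point is only that folding the bounds once must not change the answer.)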
[jira] [Updated] (HIVE-7166) Vectorization with UDFs returns incorrect results
[ https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sankar Sivarama Subramaniyan updated HIVE-7166:
----------------------------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-7166) Vectorization with UDFs returns incorrect results
[ https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sankar Sivarama Subramaniyan updated HIVE-7166:
----------------------------------------------------
    Component/s:     (was: HiveServer2)
                     (was: UDF)
[jira] [Updated] (HIVE-7166) Vectorization with UDFs returns incorrect results
[ https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sankar Sivarama Subramaniyan updated HIVE-7166:
----------------------------------------------------
    Attachment: HIVE-7166.1.patch
[jira] [Commented] (HIVE-1019) java.io.FileNotFoundException: HIVE_PLAN (No such file or directory)
[ https://issues.apache.org/jira/browse/HIVE-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018555#comment-14018555 ]

Bennie Schut commented on HIVE-1019:
------------------------------------
xuanjinlee, this is a somewhat prehistoric patch which I forgot to close. Most people have moved to HiveServer2, which doesn't suffer from these threading issues. Unless anyone objects, I would like to close this issue.
Re: Review Request 22174: HIVE-6394 Implement Timestamp in ParquetSerde
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22174/
-----------------------------------------------------------

(Updated June 5, 2014, 7:33 a.m.)

Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.

Changes
-------
Cache the Calendar in a thread-local, as suggested, for performance.

Bugs: HIVE-6394
    https://issues.apache.org/jira/browse/HIVE-6394

Repository: hive-git

Description
-------
This uses the Jodd library to convert the java.sql.Timestamp type used by Hive into the {julian-day:nanos} format expected by Parquet, and vice versa.

Diffs (updated)
-----
  data/files/parquet_types.txt 0be390b
  pom.xml 4bb8880
  ql/pom.xml 13c477a
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 4da0d30
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 29f7e11
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java 57161d8
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java fb2f5a8
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/utils/NanoTimeUtils.java PRE-CREATION
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 3490061
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java PRE-CREATION
  ql/src/test/queries/clientpositive/parquet_types.q 5d6333c
  ql/src/test/results/clientpositive/parquet_types.q.out c23f7f1

Diff: https://reviews.apache.org/r/22174/diff/

Testing
-------
Unit tests for the new utilities, plus added timestamp data in the parquet_types q-test.

Thanks,
Szehon Ho
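For reference, the {julian-day:nanos} layout mentioned in the description is Parquet's INT96 timestamp encoding: a Julian day number plus nanoseconds within that day. A small Python sketch of the forward conversion (an approximation for illustration only; the patch itself uses the Jodd library, not this code):

```python
# Illustrative sketch of Parquet's {julian-day:nanos} timestamp layout
# (not the Jodd-based code in the patch).
from datetime import datetime

# Offset between Python's proleptic-Gregorian ordinal (1 = 0001-01-01) and
# the Julian day number; the JDN of 1970-01-01 is 2,440,588.
ORDINAL_TO_JDN = 1_721_425


def to_julian_nanos(ts: datetime) -> tuple:
    """Split a timestamp into (julian_day, nanos_of_day)."""
    julian_day = ts.toordinal() + ORDINAL_TO_JDN
    nanos = ((ts.hour * 3600 + ts.minute * 60 + ts.second) * 1_000_000_000
             + ts.microsecond * 1_000)
    return julian_day, nanos


# The Unix epoch lands on Julian day 2,440,588 at nanosecond 0 of the day.
assert to_julian_nanos(datetime(1970, 1, 1)) == (2_440_588, 0)
```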
[jira] [Updated] (HIVE-6394) Implement Timestamp in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szehon Ho updated HIVE-6394:
----------------------------
    Attachment: HIVE-6394.5.patch

I don't think so, as I am modifying the values with the given timestamp. I added a thread-local cache of the calendar that is lazily created.

Implement Timestamp in ParquetSerde
-----------------------------------
                 Key: HIVE-6394
                 URL: https://issues.apache.org/jira/browse/HIVE-6394
             Project: Hive
          Issue Type: Sub-task
          Components: Serializers/Deserializers
            Reporter: Jarek Jarcec Cecho
            Assignee: Szehon Ho
              Labels: Parquet
         Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, HIVE-6394.5.patch, HIVE-6394.patch

This JIRA is to implement timestamp support in the Parquet SerDe.
[jira] [Commented] (HIVE-4867) Deduplicate columns appearing in both the key list and value list of ReduceSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018563#comment-14018563 ]

Hive QA commented on HIVE-4867:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12648427/HIVE-4867.5.patch.txt

{color:red}ERROR:{color} -1 due to 43 failed/errored test(s), 5510 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join28
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_nulls
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_rearrange
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters_overlap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_reorder4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_filter_on_outerjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_subquery2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_test_outer
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multiMapJoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multi_join_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_reduce_deduplicate_exclude_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_ppr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_nested_mapjoin
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_sortmerge_mapjoin_mismatch_1
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/390/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/390/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-390/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 43 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12648427

Deduplicate columns appearing in both the key list and value list of ReduceSinkOperator
---------------------------------------------------------------------------------------
                 Key: HIVE-4867
                 URL: https://issues.apache.org/jira/browse/HIVE-4867
             Project: Hive
          Issue Type: Improvement
            Reporter: Yin Huai
            Assignee: Navis
         Attachments: HIVE-4867.1.patch.txt, HIVE-4867.2.patch.txt, HIVE-4867.3.patch.txt, HIVE-4867.4.patch.txt, HIVE-4867.5.patch.txt, source_only.txt

A ReduceSinkOperator emits data in the format of keys and values. Right now, a column may appear in both the key list and value
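The idea behind HIVE-4867 can be sketched in a few lines: a column already shipped in the ReduceSinkOperator's key list does not need to be shipped again in its value list, because the consumer can read it back from the key. An illustrative Python sketch only (hypothetical names, not Hive's planner code):

```python
# Illustrative sketch of the HIVE-4867 deduplication idea (not Hive code):
# drop value columns that the ReduceSinkOperator already emits as keys.


def dedup_value_columns(key_cols, value_cols):
    """Return the value columns with any key-list duplicates removed,
    preserving the original value-list order."""
    key_set = set(key_cols)
    return [c for c in value_cols if c not in key_set]


key_cols = ["a", "b"]
value_cols = ["b", "c", "a", "d"]
# "b" and "a" ride along in the key, so only "c" and "d" need to be values.
assert dedup_value_columns(key_cols, value_cols) == ["c", "d"]
```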
[jira] [Commented] (HIVE-7110) TestHCatPartitionPublish test failure: No FileSystem for scheme: pfile
[ https://issues.apache.org/jira/browse/HIVE-7110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018562#comment-14018562 ]

Szehon Ho commented on HIVE-7110:
---------------------------------
I mean, this test is working for me and also on the main build, so I don't see a need for a fix. These runs were indeed using Maven. You can check ongoing builds here: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build

TestHCatPartitionPublish test failure: No FileSystem for scheme: pfile
----------------------------------------------------------------------
                 Key: HIVE-7110
                 URL: https://issues.apache.org/jira/browse/HIVE-7110
             Project: Hive
          Issue Type: Bug
          Components: HCatalog
            Reporter: David Chen
            Assignee: David Chen
         Attachments: HIVE-7110.1.patch, HIVE-7110.2.patch, HIVE-7110.3.patch, HIVE-7110.4.patch

I got the following TestHCatPartitionPublish test failure when running all unit tests against Hadoop 1. This also appears when testing against Hadoop 2.

{code}
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 26.06 sec FAILURE! - in org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish
testPartitionPublish(org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish)  Time elapsed: 1.361 sec  ERROR!
org.apache.hive.hcatalog.common.HCatException: org.apache.hive.hcatalog.common.HCatException : 2001 : Error setting output information.
Cause : java.io.IOException: No FileSystem for scheme: pfile
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1443)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
	at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:212)
	at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:70)
	at org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish.runMRCreateFail(TestHCatPartitionPublish.java:191)
	at org.apache.hive.hcatalog.mapreduce.TestHCatPartitionPublish.testPartitionPublish(TestHCatPartitionPublish.java:155)
{code}
[jira] [Resolved] (HIVE-1539) Concurrent metastore threading problem
[ https://issues.apache.org/jira/browse/HIVE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bennie Schut resolved HIVE-1539.
--------------------------------
      Resolution: Fixed
    Release Note: We switched to a datanucleus version >= 2.2 a long time ago, so this is fixed.

Concurrent metastore threading problem
--------------------------------------
                 Key: HIVE-1539
                 URL: https://issues.apache.org/jira/browse/HIVE-1539
             Project: Hive
          Issue Type: Bug
          Components: Metastore
    Affects Versions: 0.7.0
            Reporter: Bennie Schut
            Assignee: Bennie Schut
         Attachments: ClassLoaderResolver.patch, HIVE-1539-1.patch, HIVE-1539.patch, thread_dump_hanging.txt

When running hive as a service and running a high number of queries concurrently, I end up with multiple threads running at 100% CPU without making any progress. Looking at these threads, I notice this thread (484e):
	at org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:598)
But on a different thread (63a2):
	at org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoReplaceField(MStorageDescriptor.java)
[jira] [Created] (HIVE-7179) hive connect to hbase cause select results error
zhengzhuangjie created HIVE-7179: Summary: hive connect to hbase cause select results error Key: HIVE-7179 URL: https://issues.apache.org/jira/browse/HIVE-7179 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.12.0 Environment: Hadoop 1.0.4, HBase 0.94.6.1, Hive 0.12.0 Reporter: zhengzhuangjie

    CREATE EXTERNAL TABLE hb_test(key string, `_plat` int, `_uid` int)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,d:_plat#b,d:_uid#b")
    TBLPROPERTIES ("hbase.table.name" = "test");

    insert overwrite local directory '/data/mf/app/data/check'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    select key, `_plat`, `_uid` from hb_test where key > '00604' and key < '00605';

After the query finishes, eight files are written under /data/mf/app/data/check, and each file contains the same duplicated data.
RE: [VOTE] Apache Hive 0.13.1 Release Candidate 3
I just changed ConnectionURL and warehouse in hive-site.xml. As for tez-0.5-snapshot:

    <property>
      <name>tez.lib.uris</name>
      <value>${fs.defaultFS}/data/tez/lib</value>
    </property>
    <property>
      <name>tez.am.resource.memory.mb</name>
      <value>1280</value>
    </property>

None others.

Date: Wed, 4 Jun 2014 23:28:07 -0700 Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3 From: the...@hortonworks.com To: dev@hive.apache.org

Yu, I am not able to reproduce this issue with hadoop 2.4.0 and tez 0.4. I tried it with both the file system cache enabled and disabled. What are the non-default configurations you have on your machine? Thanks, Thejas

On Wed, Jun 4, 2014 at 10:56 PM, Yu Azuryy azur...@outlook.com wrote:

Hive-0.13.0 works well in my test cluster. -1. Verified with hadoop-2.4.0 and tez-0.5-snapshot: Hive cannot start. I also built hive branch-0.13, with the same error.

    [test@vm-10-154-** tmp]$ hive
    14/06/05 13:53:45 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
    14/06/05 13:53:45 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
    14/06/05 13:53:45 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
    14/06/05 13:53:45 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
    14/06/05 13:53:45 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
    14/06/05 13:53:45 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
    14/06/05 13:53:45 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
    14/06/05 13:53:45 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
    14/06/05 13:53:45 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
    Logging initialized using configuration in jar:file:/letv/hive-0.13.0/lib/hive-common-0.13.1.jar!/hive-log4j.properties
    Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Filesystem closed
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:357)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
    Caused by: java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1066)
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1062)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1062)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
        at org.apache.hadoop.fs.FileSystem.deleteOnExit(FileSystem.java:1353)
        at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.createTezDir(TezSessionState.java:297)
        at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:144)
        at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:123)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:355)
        ... 7 more

Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3 From: hbut...@hortonworks.com Date: Wed, 4 Jun 2014 22:42:32 -0700 To: dev@hive.apache.org

+1 - Verified signature and checksum - Checked release notes - built source - ran a few unit tests - ran a few queries in local mode

On Jun 2, 2014, at 7:20 PM, Thejas Nair the...@hortonworks.com wrote: +1 - Verified signatures and checksum of both packages - Checked release notes - Built source package - Ran simple hive queries on newly built package in local mode - Ran simple queries on package on single node cluster with both tez and mr as execution engines - Ran unit tests On Mon, Jun 2,
Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
Yu, I don't think tez-0.5 has been released yet. Can you try with tez-0.4 (http://www.apache.org/dyn/closer.cgi/incubator/tez/tez-0.4.0-incubating/)? Thanks, --Vaibhav

On Thu, Jun 5, 2014 at 1:16 AM, Yu Azuryy azur...@outlook.com wrote: [quoted text trimmed]
RE: [VOTE] Apache Hive 0.13.1 Release Candidate 3
Ok, I'll try it today and post back.

Date: Thu, 5 Jun 2014 01:20:10 -0700 Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3 From: vgumas...@hortonworks.com To: dev@hive.apache.org [quoted text trimmed]
Re: Hive 0.13/Hcatalog : Mapreduce Exception : java.lang.IncompatibleClassChangeError
I don't have an environment to confirm this, but if this happens, we should include HIVE-6432 in Hive 0.13.1.

2014-06-05 12:44 GMT+09:00 Navis류승우 navis@nexr.com: It's fixed in HIVE-6432. I think you should rebuild your own hcatalog from source with profile -Phadoop-1.

2014-06-05 9:08 GMT+09:00 Sundaramoorthy, Malliyanathan malliyanathan.sundaramoor...@citi.com: Hi, I am using Hadoop 2.4.0 with Hive 0.13 and its included HCatalog package. I wrote a simple map-reduce job from the example, and running the code below I get "Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected". Not sure what mistake I am making, or whether this is a compatibility issue. Please help.

    boolean success = true;
    try {
        Configuration conf = getConf();
        args = new GenericOptionsParser(conf, args).getRemainingArgs();
        // Hive table details
        String dbName = args[0];
        String inputTableName = args[1];
        String outputTableName = args[2];
        // Job input
        Job job = new Job(conf, "Scenarios");
        // Initialize mapper/reducer input/output
        HCatInputFormat.setInput(job, dbName, inputTableName);
        // HCatInputFormat.setInput(job, InputJobInfo.create(dbName, inputTableName, null));
        job.setInputFormatClass(HCatInputFormat.class);
        job.setJarByClass(MainRunner.class);
        job.setMapperClass(ScenarioMapper.class);
        job.setReducerClass(ScenarioReducer.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(WritableComparable.class);
        job.setOutputValueClass(DefaultHCatRecord.class);
        HCatOutputFormat.setOutput(job, OutputJobInfo.create(dbName, outputTableName, null));
        HCatSchema outSchema = HCatOutputFormat.getTableSchema(conf);
        System.err.println("INFO: output schema explicitly set for writing: " + outSchema);
        HCatOutputFormat.setSchema(job, outSchema);
        job.setOutputFormatClass(HCatOutputFormat.class);

14/06/02 18:52:57 INFO
client.RMProxy: Connecting to ResourceManager at localhost/00.04.07.174:8040 Exception in thread main java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected at org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:104) at org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.getOutputFormat(HCatBaseOutputFormat.java:84) at org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.checkOutputSpecs(HCatBaseOutputFormat.java:73) at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303) at com.citi.aqua.snu.hdp.clar.mra.service.MainRunner.run(MainRunner.java:79) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at com.citi.aqua.snu.hdp.clar.mra.service.MainRunner.main(MainRunner.java:89) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Regards, Malli
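For context on the IncompatibleClassChangeError above: in Hadoop 1.x, org.apache.hadoop.mapreduce.JobContext is a concrete class, while in Hadoop 2.x it became an interface, so bytecode compiled against one cannot run against the other. The class-vs-interface distinction that trips this up can be probed at runtime; the sketch below uses JDK stand-in types (java.util.List / java.util.ArrayList) rather than the Hadoop classes, which are assumed absent here:

```java
// Probe whether a named type is loaded as an interface or a class at runtime.
// Code compiled expecting a class fails with IncompatibleClassChangeError when
// the runtime instead supplies an interface of the same fully-qualified name.
public class ApiShapeProbe {
    static boolean isLoadedAsInterface(String className) throws ClassNotFoundException {
        return Class.forName(className).isInterface();
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for the Hadoop 2.x (interface) vs 1.x (class) JobContext:
        System.out.println(isLoadedAsInterface("java.util.List"));      // prints true
        System.out.println(isLoadedAsInterface("java.util.ArrayList")); // prints false
    }
}
```

Running the same probe against org.apache.hadoop.mapreduce.JobContext on a real cluster classpath would tell you which Hadoop generation your jars were loaded against.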
[jira] [Commented] (HIVE-7062) Support Streaming mode in Windowing
[ https://issues.apache.org/jira/browse/HIVE-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018584#comment-14018584 ] Lefty Leverenz commented on HIVE-7062: -- Okay, thanks [~rhbutani]. I've put this with my doc-by-0.14 tasks. Support Streaming mode in Windowing --- Key: HIVE-7062 URL: https://issues.apache.org/jira/browse/HIVE-7062 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.14.0 Attachments: HIVE-7062.1.patch, HIVE-7062.4.patch, HIVE-7062.5.patch, HIVE-7062.6.patch 1. Have the Windowing Table Function support streaming mode. 2. Have special handling for Ranking UDAFs. 3. Have special handling for Sum/Avg for fixed-size windows.
[jira] [Updated] (HIVE-7160) Vectorization Udf: GenericUDFConcat for non-string columns input, is not supported
[ https://issues.apache.org/jira/browse/HIVE-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7160: Attachment: HIVE-7160.1.patch.txt Vectorization Udf: GenericUDFConcat for non-string columns input, is not supported -- Key: HIVE-7160 URL: https://issues.apache.org/jira/browse/HIVE-7160 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Gopal V Priority: Minor Attachments: HIVE-7160.1.patch.txt A simple UDF is missing vectorization. A simple example:

    hive> explain select concat(l_orderkey, ' msecs') from lineitem;

is not vectorized, while

    hive> explain select concat(cast(l_orderkey as string), ' msecs') from lineitem;

can be vectorized. {code} 14/05/31 15:28:59 [main]: DEBUG vector.VectorizationContext: No vector udf found for GenericUDFConcat, descriptor: Argument Count = 2, mode = PROJECTION, Argument Types = {LONG, STRING}, Input Expression Types = {COLUMN,COLUMN} 14/05/31 15:28:59 [main]: DEBUG physical.Vectorizer: Failed to vectorize org.apache.hadoop.hive.ql.metadata.HiveException: Udf: GenericUDFConcat, is not supported at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:918) {code}
[jira] [Updated] (HIVE-7160) Vectorization Udf: GenericUDFConcat for non-string columns input, is not supported
[ https://issues.apache.org/jira/browse/HIVE-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7160: Summary: Vectorization Udf: GenericUDFConcat for non-string columns input, is not supported (was: Vectorization Udf: GenericUDFConcat, is not supported)
[jira] [Updated] (HIVE-7160) Vectorization Udf: GenericUDFConcat for non-string columns input, is not supported
[ https://issues.apache.org/jira/browse/HIVE-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7160: Assignee: Navis Status: Patch Available (was: Open) There might be some design issue here. Added preferredType to the VectorizedExpressions annotation; it will try argument conversion if there is no matching VectorExpression. In this case, concat(column<int>/column<string>) is not supported, but adding preferredType=String... makes a final try with the first argument cast to string type.
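The cast-then-retry fallback described in the status update can be sketched roughly as follows. This is an illustrative model with hypothetical names (REGISTRY, resolve), not Hive's actual VectorizationContext API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Rough sketch of the preferredType fallback: if no vectorized expression
// matches the raw argument types, cast every non-matching argument to the
// UDF's preferred type and retry the lookup.
public class PreferredTypeFallback {
    // Pretend only the all-string form of concat has a vectorized expression.
    static final Set<List<String>> REGISTRY = Set.of(List.of("STRING", "STRING"));

    static Optional<List<String>> resolve(List<String> argTypes, String preferredType) {
        if (REGISTRY.contains(argTypes)) {
            return Optional.of(argTypes);  // direct match on the raw types
        }
        // Final try: cast mismatched arguments to the preferred type.
        List<String> casted = new ArrayList<>();
        for (String t : argTypes) {
            casted.add(t.equals(preferredType) ? t : preferredType);
        }
        return REGISTRY.contains(casted) ? Optional.of(casted) : Optional.empty();
    }

    public static void main(String[] args) {
        // concat(bigint, string): no direct match, but casting the LONG
        // argument to STRING resolves to the STRING,STRING expression.
        System.out.println(resolve(List.of("LONG", "STRING"), "STRING"));
    }
}
```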
Re: Review Request 22174: HIVE-6394 Implement Timestmap in ParquetSerde
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22174/#review44805 --- ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java https://reviews.apache.org/r/22174/#comment79343 A stupid question perhaps, but is INT96 reserved for timestamps in parquet? I dug this up, but not sure if it's definitive: https://github.com/Parquet/parquet-mr/issues/101 - justin coffey On June 5, 2014, 7:33 a.m., Szehon Ho wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22174/ --- (Updated June 5, 2014, 7:33 a.m.) Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang. Bugs: HIVE-6394 https://issues.apache.org/jira/browse/HIVE-6394 Repository: hive-git Description --- This uses the Jodd library to convert java.sql.Timestamp type used by Hive into the {julian-day:nanos} format expected by parquet, and vice-versa. Diffs - data/files/parquet_types.txt 0be390b pom.xml 4bb8880 ql/pom.xml 13c477a ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 4da0d30 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 29f7e11 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java 57161d8 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java fb2f5a8 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/utils/NanoTimeUtils.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 3490061 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java PRE-CREATION ql/src/test/queries/clientpositive/parquet_types.q 5d6333c ql/src/test/results/clientpositive/parquet_types.q.out c23f7f1 Diff: https://reviews.apache.org/r/22174/diff/ Testing --- Unit tests the new libraries, and also added timestamp data in the parquet_types q-test. Thanks, Szehon Ho
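The Timestamp-to-{julian-day:nanos} conversion the review discusses can be sketched as below. The constants and method names are illustrative, not the patch's actual NanoTimeUtils API, and timezone handling is ignored for simplicity:

```java
import java.sql.Timestamp;

// Illustrative split of a java.sql.Timestamp into the (julian day,
// nanos-of-day) pair that Parquet's INT96 NanoTime encoding expects.
public class NanoTimeSketch {
    // Julian day number corresponding to the Unix epoch, 1970-01-01.
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    static final long MILLIS_PER_DAY = 86_400_000L;

    static long[] toJulian(Timestamp ts) {
        long epochMillis = ts.getTime();  // whole milliseconds since the epoch
        long julianDay = Math.floorDiv(epochMillis, MILLIS_PER_DAY) + JULIAN_DAY_OF_EPOCH;
        // getNanos() carries the full sub-second fraction, so take only whole
        // seconds from the millisecond clock before adding it back.
        long secondsOfDay = Math.floorMod(epochMillis, MILLIS_PER_DAY) / 1000;
        long nanosOfDay = secondsOfDay * 1_000_000_000L + ts.getNanos();
        return new long[] { julianDay, nanosOfDay };
    }

    public static void main(String[] args) {
        long[] r = toJulian(new Timestamp(0L));  // 1970-01-01T00:00:00Z
        System.out.println(r[0] + " " + r[1]);   // prints 2440588 0
    }
}
```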
RE: [VOTE] Apache Hive 0.13.1 Release Candidate 3
Unfortunately, I got the same error. Hadoop-2.4.0 (HA enabled), Hive-0.13.1-rc3, tez-0.4. If I leave 'hive.execution.engine' at the default mr, Hive does work, but when I change it to 'tez' I get the same "Filesystem closed" error as before. Please look here for the hive and tez configuration: http://pastebin.com/hid1AA83

From: azur...@outlook.com To: dev@hive.apache.org Subject: RE: [VOTE] Apache Hive 0.13.1 Release Candidate 3 Date: Thu, 5 Jun 2014 08:27:27 + [quoted text trimmed]
RE: [VOTE] Apache Hive 0.13.1 Release Candidate 3
Hive-0.13.0 works well under both tez and mr. From: azur...@outlook.com To: dev@hive.apache.org Subject: RE: [VOTE] Apache Hive 0.13.1 Release Candidate 3 Date: Thu, 5 Jun 2014 09:07:41 + Unfortunately, I got the same error. Hadoop-2.4.0(HA enabled), Hive-0.13.1-rc3, tez-0.4 If I set 'hive.execution.engine' to mr by default, then Hive does work. but if changed it to 'tez', I got the followig error. please look at here for hive and tez configuration: http://pastebin.com/hid1AA83 From: azur...@outlook.com To: dev@hive.apache.org Subject: RE: [VOTE] Apache Hive 0.13.1 Release Candidate 3 Date: Thu, 5 Jun 2014 08:27:27 + Ok, I'll try it today. and post back. Date: Thu, 5 Jun 2014 01:20:10 -0700 Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3 From: vgumas...@hortonworks.com To: dev@hive.apache.org Yu, I don't think tez-0.5 has been released yet. Can you try with tez-0.4 ( http://www.apache.org/dyn/closer.cgi/incubator/tez/tez-0.4.0-incubating/)? Thanks, --Vaibhav On Thu, Jun 5, 2014 at 1:16 AM, Yu Azuryy azur...@outlook.com wrote: I just changed ConnectionURL and warehouse in hive-site.xml, As for tez-0.5-snapshot, property nametez.lib.uris/name value${fs.defaultFS}/data/tez/lib/value /property property nametez.am.resource.memory.mb/name value1280/value /property None others. Date: Wed, 4 Jun 2014 23:28:07 -0700 Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3 From: the...@hortonworks.com To: dev@hive.apache.org Yu, I am not able to reproduce this issue with hadoop 2.4.0 and tez 0.4 . I tried it with both file system cache enabled and disabled. What are the non default configurations you have on your machine ? Thanks, Thejas On Wed, Jun 4, 2014 at 10:56 PM, Yu Azuryy azur...@outlook.com wrote: Hive-0.13.0 works well in my test cluster. -1 Verified with hadoop-2.4.0 and tez-0.5-snapshot, Hive cannot start? And I also built hive branch-0.13, the same error. 
[jira] [Commented] (HIVE-6625) HiveServer2 running in http mode should support trusted proxy access
[ https://issues.apache.org/jira/browse/HIVE-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018611#comment-14018611 ] Lefty Leverenz commented on HIVE-6625: -- Just for the record, [~vaibhavgumashta] updated these user docs on the wiki: * [Admin Manual: Setting Up HiveServer2 | https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2] ** diffs: [https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=30758712&selectedPageVersions=25&selectedPageVersions=16] * [HiveServer2 Clients | https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients] ** diffs: [https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=30758725&selectedPageVersions=43&selectedPageVersions=39] HiveServer2 running in http mode should support trusted proxy access Key: HIVE-6625 URL: https://issues.apache.org/jira/browse/HIVE-6625 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.13.0 Attachments: HIVE-6625.1.patch, HIVE-6625.2.patch HIVE-5155 adds trusted proxy access to HiveServer2. This patch is a minor change to have it used when running HiveServer2 in http mode. The patch is to be applied on top of HIVE-4764 and HIVE-5155. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7166) Vectorization with UDFs returns incorrect results
[ https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018651#comment-14018651 ] Hive QA commented on HIVE-7166: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12648433/HIVE-7166.1.patch {color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 5585 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_between_in org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_ptf org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_schema_evolution org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit org.apache.hadoop.hive.ql.exec.vector.TestVectorizationContext.testBetweenFilters org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/392/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/392/console Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-392/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 16 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12648433 Vectorization with UDFs returns incorrect results - Key: HIVE-7166 URL: https://issues.apache.org/jira/browse/HIVE-7166 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.13.0 Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster Reporter: Benjamin Bowman Assignee: Hari Sankar Sivarama Subramaniyan Priority: Minor Attachments: HIVE-7166.1.patch Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect query results. Example Query: SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - X) and UDF_1 The following test scenario will reproduce the problem:

TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 1):

package com.test;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import java.lang.String;
import java.lang.*;

public class tenThousand extends UDF {
  private final LongWritable result = new LongWritable();
  public LongWritable evaluate() {
    result.set(1);
    return result;
  }
}

TEST DATA (test.input):
1|CBCABC|12
2|DBCABC|13
3|EBCABC|14
4|ABCABC|15
5|BBCABC|16
6|CBCABC|17

CREATING ORC TABLE:
0: jdbc:hive2://server:10002/db> create table testTabOrc (first bigint, second varchar(20), third int) partitioned by (range int) clustered by (first) sorted by (first) into 8 buckets stored as orc tblproperties ("orc.compress" = "SNAPPY", "orc.index" = "true");

CREATE LOADING TABLE:
0: jdbc:hive2://server:10002/db> create table loadingDir (first bigint, second varchar(20), third int) partitioned by (range int) row format delimited fields terminated by '|' stored as textfile;

COPY IN DATA:
[root@server]# hadoop fs -copyFromLocal /tmp/test.input /db/loading/.

ORC DATA:
[root@server]# beeline -u jdbc:hive2://server:10002/db -n root --hiveconf hive.exec.dynamic.partition.mode=nonstrict --hiveconf hive.enforce.sorting=true -e "insert into table testTabOrc partition(range) select * from loadingDir;"

LOAD TEST FUNCTION:
0: jdbc:hive2://server:10002/db> add jar /opt/hadoop/lib/testFunction.jar
0: jdbc:hive2://server:10002/db> create temporary function ten_thousand as 'com.test.tenThousand';

TURN OFF VECTORIZATION:
0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=false;

QUERY (RESULTS AS EXPECTED): 0:
RE: [VOTE] Apache Hive 0.13.1 Release Candidate 3
+1 now. Sorry, it's my fault. I had hacked the Hadoop major version in my test cluster to 1.3.0 when it was actually 2.4.0, so ShimLoader parsed the major version as 1 and loaded the HDFSv1 FileSystem, which produced this error.
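For context, the failure mode Yu describes comes from shim selection keying off the first component of the Hadoop version string. The following is an illustrative reconstruction, not Hive's actual ShimLoader code; the class, method, and shim names are hypothetical.

```java
// Hypothetical sketch of shim selection from a Hadoop version string.
// Names are illustrative; Hive's real ShimLoader is more involved.
public class VersionShims {
    // Returns the major version component, e.g. "2.4.0" -> 2.
    public static int majorVersion(String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        return Integer.parseInt(parts[0]);
    }

    // A version string hacked to "1.3.0" selects the Hadoop-1 (HDFSv1)
    // shim even on a 2.4.0 cluster, which is consistent with the
    // "Filesystem closed" error reported above.
    public static String shimFor(String hadoopVersion) {
        return majorVersion(hadoopVersion) >= 2 ? "hadoop-2-shims" : "hadoop-1-shims";
    }
}
```

This also explains why rebuilding branch-0.13 did not help: the mismatch was in the cluster's reported version, not in Hive.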
[jira] [Commented] (HIVE-6394) Implement Timestmap in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018764#comment-14018764 ] Hive QA commented on HIVE-6394: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12648439/HIVE-6394.5.patch {color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 5589 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_predicate_pushdown org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_ptf org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_schema_evolution org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_parquet_timestamp org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProvider.testSimplePrivileges org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/393/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/393/console Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-393/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 16 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12648439 Implement Timestmap in ParquetSerde --- Key: HIVE-6394 URL: https://issues.apache.org/jira/browse/HIVE-6394 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Jarek Jarcec Cecho Assignee: Szehon Ho Labels: Parquet Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, HIVE-6394.5.patch, HIVE-6394.patch This JIRA is to implement timestamp support in Parquet SerDe. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7117) Partitions not inheriting table permissions after alter rename partition
[ https://issues.apache.org/jira/browse/HIVE-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018776#comment-14018776 ] Xuefu Zhang commented on HIVE-7117: --- +1 Partitions not inheriting table permissions after alter rename partition Key: HIVE-7117 URL: https://issues.apache.org/jira/browse/HIVE-7117 Project: Hive Issue Type: Bug Components: Security Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Attachments: HIVE-7117.2.patch, HIVE-7117.3.patch, HIVE-7117.4.patch, HIVE-7117.5.patch, HIVE-7117.6.patch, HIVE-7117.7.patch, HIVE-7117.8.patch, HIVE-7117.patch On altering/renaming a partition it must inherit permission of the parent directory, if the flag hive.warehouse.subdir.inherit.perms is set. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
+1 On Thu, Jun 5, 2014 at 5:32 AM, Yu Azuryy azur...@outlook.com wrote: +1 now. Sorry, It's my fault. I hacked Hadoop major version in my test cluster as 1.3.0, actually it was 2.4.0. But ShimLoader parse major version as 1, so get HDFSv1 FileSystem. then I get this error.
[jira] [Commented] (HIVE-7155) WebHCat controller job exceeds container memory limit
[ https://issues.apache.org/jira/browse/HIVE-7155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018909#comment-14018909 ] Eugene Koifman commented on HIVE-7155: -- [~shanyu] I can't comment on RB. Did you perhaps not publish it? WebHCat controller job exceeds container memory limit - Key: HIVE-7155 URL: https://issues.apache.org/jira/browse/HIVE-7155 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.0 Reporter: shanyu zhao Assignee: shanyu zhao Attachments: HIVE-7155.1.patch, HIVE-7155.patch Submitting a Hive query on a large table via WebHCat fails because the WebHCat controller job is killed by Yarn when it exceeds the memory limit (set by mapreduce.map.memory.mb, which defaults to 1GB): {code} INSERT OVERWRITE TABLE Temp_InjusticeEvents_2014_03_01_00_00 SELECT * from Stage_InjusticeEvents where LogTimestamp > '2014-03-01 00:00:00' and LogTimestamp <= '2014-03-01 01:00:00'; {code} We could increase mapreduce.map.memory.mb to solve this problem, but that would change the setting system-wide. We need to provide a WebHCat configuration that overrides mapreduce.map.memory.mb when submitting the controller job. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor
Ted Yu created HIVE-7180: Summary: BufferedReader is not closed in MetaStoreSchemaInfo ctor Key: HIVE-7180 URL: https://issues.apache.org/jira/browse/HIVE-7180 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Here is related code: {code}
BufferedReader bfReader =
    new BufferedReader(new FileReader(upgradeListFile));
String currSchemaVersion;
while ((currSchemaVersion = bfReader.readLine()) != null) {
  upgradeOrderList.add(currSchemaVersion.trim());
{code} BufferedReader / FileReader should be closed upon return from ctor. -- This message was sent by Atlassian JIRA (v6.2#6252)
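A hedged sketch of the kind of fix this calls for: try-with-resources closes the reader (and the underlying FileReader) on every exit path. The method below takes a Reader so it is easy to exercise; the real constructor opens a FileReader on upgradeListFile, and the names here mirror the snippet above but are otherwise illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public class SchemaUpgradeList {
    // Reads one schema version per line, trimming whitespace.
    // try-with-resources guarantees the reader is closed even if
    // readLine() throws, unlike the ctor code quoted above.
    public static List<String> readUpgradeOrder(Reader upgradeListFile) throws IOException {
        List<String> upgradeOrderList = new ArrayList<>();
        try (BufferedReader bfReader = new BufferedReader(upgradeListFile)) {
            String currSchemaVersion;
            while ((currSchemaVersion = bfReader.readLine()) != null) {
                upgradeOrderList.add(currSchemaVersion.trim());
            }
        }
        return upgradeOrderList;
    }
}
```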
[jira] [Commented] (HIVE-7160) Vectorization Udf: GenericUDFConcat for non-string columns input, is not supported
[ https://issues.apache.org/jira/browse/HIVE-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018973#comment-14018973 ] Hive QA commented on HIVE-7160: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12648447/HIVE-7160.1.patch.txt {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 5511 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testDropTable org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testListPartitionNames org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testListPartitions org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/394/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/394/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-394/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12648447 Vectorization Udf: GenericUDFConcat for non-string columns input, is not supported -- Key: HIVE-7160 URL: https://issues.apache.org/jira/browse/HIVE-7160 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Navis Priority: Minor Attachments: HIVE-7160.1.patch.txt simple UDF missing vectorization - simple example would be hive explain select concat( l_orderkey, ' msecs') from lineitem; is not vectorized while hive explain select concat(cast(l_orderkey as string), ' msecs') from lineitem; can be vectorized. {code} 14/05/31 15:28:59 [main]: DEBUG vector.VectorizationContext: No vector udf found for GenericUDFConcat, descriptor: Argument Count = 2, mode = PROJECTION, Argument Types = {LONG, STRING}, Input Expression Types = {COLUMN,COLUMN} 14/05/31 15:28:59 [main]: DEBUG physical.Vectorizer: Failed to vectorize org.apache.hadoop.hive.ql.metadata.HiveException: Udf: GenericUDFConcat, is not supported at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:918) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7176) FileInputStream is not closed in Commands#properties()
[ https://issues.apache.org/jira/browse/HIVE-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019005#comment-14019005 ] Ashutosh Chauhan commented on HIVE-7176: +1 FileInputStream is not closed in Commands#properties() -- Key: HIVE-7176 URL: https://issues.apache.org/jira/browse/HIVE-7176 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7176.1.patch.txt NO PRECOMMIT TESTS In beeline.Commands, around line 834: {code} props.load(new FileInputStream(parts[i])); {code} The FileInputStream is not closed upon return from the method. -- This message was sent by Atlassian JIRA (v6.2#6252)
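A hedged sketch of the fix: wrap the stream in try-with-resources so it is closed whether props.load() returns or throws. The class and method names below are illustrative, not the actual beeline Commands code.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class PropsLoader {
    // Loads a properties file, closing the stream on every exit path.
    // In the quoted Commands#properties() code the FileInputStream is
    // never closed; try-with-resources fixes the leak.
    public static Properties load(String path) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        return props;
    }
}
```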
[jira] [Commented] (HIVE-7075) JsonSerde raises NullPointerException when object key is not lower case
[ https://issues.apache.org/jira/browse/HIVE-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019003#comment-14019003 ] Ashutosh Chauhan commented on HIVE-7075: cc: [~sushanth] Would you like to review this one? JsonSerde raises NullPointerException when object key is not lower case --- Key: HIVE-7075 URL: https://issues.apache.org/jira/browse/HIVE-7075 Project: Hive Issue Type: Bug Components: HCatalog, Serializers/Deserializers Affects Versions: 0.12.0 Reporter: Yibing Shi Assignee: Navis Attachments: HIVE-7075.1.patch.txt, HIVE-7075.2.patch.txt, HIVE-7075.3.patch.txt We have noticed that the JsonSerde produces a NullPointerException if a JSON object has a key that is not lower case. For example, assume we have the file one.json:

{ "empId" : 123, "name" : "John" }
{ "empId" : 456, "name" : "Jane" }

hive> CREATE TABLE emps (empId INT, name STRING) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
hive> LOAD DATA LOCAL INPATH 'one.json' INTO TABLE emps;
hive> SELECT * FROM emps;
Failed with exception java.io.IOException:java.lang.NullPointerException

Notice, it seems to work if the keys are lower case. Assume we have the file 'two.json':

{ "empid" : 123, "name" : "John" }
{ "empid" : 456, "name" : "Jane" }

hive> DROP TABLE emps;
hive> CREATE TABLE emps (empId INT, name STRING) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
hive> LOAD DATA LOCAL INPATH 'two.json' INTO TABLE emps;
hive> SELECT * FROM emps;
OK
123 John
456 Jane

-- This message was sent by Atlassian JIRA (v6.2#6252)
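The symptom is consistent with the SerDe matching incoming JSON keys against Hive's lower-cased column names. A hypothetical sketch of the normalization a fix could apply (this is not the actual HIVE-7075 patch): lower-case keys on the way in so lookups become case-insensitive.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class JsonKeyNormalizer {
    // Hive stores column names lower-cased, so a JSON key like "empId"
    // never matches the column "empid". Normalizing keys before the
    // column lookup makes the match case-insensitive.
    public static <V> Map<String, V> lowercaseKeys(Map<String, V> record) {
        Map<String, V> out = new HashMap<>();
        for (Map.Entry<String, V> e : record.entrySet()) {
            out.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        return out;
    }
}
```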
[jira] [Updated] (HIVE-7131) Dependencies of fetch task for tez are not shown properly
[ https://issues.apache.org/jira/browse/HIVE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7131: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! Dependencies of fetch task for tez are not shown properly - Key: HIVE-7131 URL: https://issues.apache.org/jira/browse/HIVE-7131 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Fix For: 0.14.0 Attachments: HIVE-7131.1.patch.txt, HIVE-7131.2.patch.txt, HIVE-7131.3.patch.txt HIVE-3925 made dependencies for fetch task. But missed that for Tez tasks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7135) Fix test fail of TestTezTask.testSubmit
[ https://issues.apache.org/jira/browse/HIVE-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019023#comment-14019023 ] Ashutosh Chauhan commented on HIVE-7135: +1 Fix test fail of TestTezTask.testSubmit --- Key: HIVE-7135 URL: https://issues.apache.org/jira/browse/HIVE-7135 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-7135.1.patch, HIVE-7135.2.patch.txt HIVE-7043 broke a tez test case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7086) TestHiveServer2.testConnection is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7086: --- Fix Version/s: (was: 0.14.0) Status: Open (was: Patch Available) Seems like further investigation is required for this one. TestHiveServer2.testConnection is failing on trunk -- Key: HIVE-7086 URL: https://issues.apache.org/jira/browse/HIVE-7086 Project: Hive Issue Type: Test Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Vaibhav Gumashta Attachments: HIVE-7086.1.patch Able to repro locally on fresh checkout -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7136: --- Status: Open (was: Patch Available) Our ptest framework accepts only patches named in a certain format. Please upload the patch with a filename in one of the supported formats: https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system --- Key: HIVE-7136 URL: https://issues.apache.org/jira/browse/HIVE-7136 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.13.0 Reporter: Sumit Kumar Assignee: Sumit Kumar Priority: Minor Attachments: HIVE-7136-1.patch, HIVE-7136.patch The current hive cli assumes that the source file (hive script) is always on the local file system. This patch implements support for reading source files from other file systems in the hadoop eco-system (hdfs, s3, etc.), while keeping the default behavior intact: when no scheme is provided in the source file URL, the script is read from the default (local) file system. -- This message was sent by Atlassian JIRA (v6.2#6252)
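The scheme-defaulting behavior described can be sketched as follows. This is illustrative only; the actual patch would resolve the path through Hadoop's FileSystem API rather than inspecting the URI directly, and the class name is hypothetical.

```java
import java.net.URI;

public class ScriptSource {
    // Returns the file-system scheme for a script path, defaulting to
    // the local file system when the path carries no scheme. This
    // preserves the existing CLI behavior for plain local paths.
    public static String schemeOf(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme == null ? "file" : scheme;
    }
}
```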
[jira] [Assigned] (HIVE-7180) BufferedReader is not closed in MetaStoreSchemaInfo ctor
[ https://issues.apache.org/jira/browse/HIVE-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni reassigned HIVE-7180: -- Assignee: Swarnim Kulkarni BufferedReader is not closed in MetaStoreSchemaInfo ctor Key: HIVE-7180 URL: https://issues.apache.org/jira/browse/HIVE-7180 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: Swarnim Kulkarni Priority: Minor Here is related code: {code} BufferedReader bfReader = new BufferedReader(new FileReader(upgradeListFile)); String currSchemaVersion; while ((currSchemaVersion = bfReader.readLine()) != null) { upgradeOrderList.add(currSchemaVersion.trim()); {code} BufferedReader / FileReader should be closed upon return from ctor. -- This message was sent by Atlassian JIRA (v6.2#6252)
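A minimal sketch of the fix implied by the report above, using try-with-resources so the reader is closed even if readLine() throws. The class and method names here are illustrative, not the actual MetaStoreSchemaInfo code:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class UpgradeListReader {
    // Reads one schema version per line, trimming whitespace.
    // try-with-resources guarantees the BufferedReader (and the
    // underlying Reader) is closed on every exit path, including
    // when readLine() throws an IOException.
    static List<String> readUpgradeOrder(Reader source) throws IOException {
        List<String> upgradeOrderList = new ArrayList<>();
        try (BufferedReader bfReader = new BufferedReader(source)) {
            String currSchemaVersion;
            while ((currSchemaVersion = bfReader.readLine()) != null) {
                upgradeOrderList.add(currSchemaVersion.trim());
            }
        }
        return upgradeOrderList;
    }

    public static void main(String[] args) throws IOException {
        // prints [0.12.0, 0.13.0]
        System.out.println(readUpgradeOrder(new StringReader("0.12.0\n0.13.0\n")));
    }
}
```

On pre-Java-7 code the same effect needs an explicit finally block; either way the point is that close() runs on every exit path of the constructor.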
[jira] [Updated] (HIVE-4867) Deduplicate columns appearing in both the key list and value list of ReduceSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4867: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed .4 patch since that has far fewer failures. Let's take up removal of values for mapjoin for the smaller table in HIVE-7173, since it looks like there are some additional test failures which we need to work through. Deduplicate columns appearing in both the key list and value list of ReduceSinkOperator --- Key: HIVE-4867 URL: https://issues.apache.org/jira/browse/HIVE-4867 Project: Hive Issue Type: Improvement Reporter: Yin Huai Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-4867.1.patch.txt, HIVE-4867.2.patch.txt, HIVE-4867.3.patch.txt, HIVE-4867.4.patch.txt, HIVE-4867.5.patch.txt, source_only.txt A ReduceSinkOperator emits data in the format of keys and values. Right now, a column may appear in both the key list and value list, which results in unnecessary overhead for shuffling. Example: We have a query shown below ... {code:sql} explain select ss_ticket_number from store_sales cluster by ss_ticket_number; {code} The plan is ... {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: store_sales TableScan alias: store_sales Select Operator expressions: expr: ss_ticket_number type: int outputColumnNames: _col0 Reduce Output Operator key expressions: expr: _col0 type: int sort order: + Map-reduce partition columns: expr: _col0 type: int tag: -1 value expressions: expr: _col0 type: int Reduce Operator Tree: Extract File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-0 Fetch Operator limit: -1 {code} The column 'ss_ticket_number' is in both the key list and value list of the ReduceSinkOperator. 
The type of ss_ticket_number is int. For this case, BinarySortableSerDe will introduce 1 byte more for every int in the key. LazyBinarySerDe will also introduce overhead when recording the length of an int. For every int, 10 bytes should be a rough estimate of the size of data emitted from the Map phase. -- This message was sent by Atlassian JIRA (v6.2#6252)
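The estimate above lends itself to quick arithmetic. A small sketch, with the per-row byte counts taken as the report's rough figures rather than measured serde output:

```java
public class ShuffleOverheadEstimate {
    // Per-row map-output bytes for one int column: roughly 5 bytes in the
    // key (BinarySortableSerDe: 4-byte int plus 1 extra byte) and roughly
    // 5 more in the value (LazyBinarySerDe: int plus length overhead),
    // giving the ~10 bytes per row cited in the report when the column is
    // duplicated in both lists.
    static long shuffleBytes(long rows, int keyBytesPerRow, int valueBytesPerRow,
                             boolean deduplicated) {
        int perRow = deduplicated ? keyBytesPerRow : keyBytesPerRow + valueBytesPerRow;
        return rows * perRow;
    }

    public static void main(String[] args) {
        long rows = 1_000_000_000L;  // a billion-row store_sales scan
        System.out.println(shuffleBytes(rows, 5, 5, false)); // duplicated: ~10 GB shuffled
        System.out.println(shuffleBytes(rows, 5, 5, true));  // deduplicated: ~5 GB shuffled
    }
}
```

So for a large fact-table scan, dropping the duplicated column from the value list roughly halves the bytes shuffled for that column.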
[jira] [Commented] (HIVE-7155) WebHCat controller job exceeds container memory limit
[ https://issues.apache.org/jira/browse/HIVE-7155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019089#comment-14019089 ] shanyu zhao commented on HIVE-7155: --- [~ekoifman] I did publish it. I can add comments to it. Can you please double check? Thx. WebHCat controller job exceeds container memory limit - Key: HIVE-7155 URL: https://issues.apache.org/jira/browse/HIVE-7155 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.0 Reporter: shanyu zhao Assignee: shanyu zhao Attachments: HIVE-7155.1.patch, HIVE-7155.patch Submitting a Hive query on a large table via WebHCat results in failure because the WebHCat controller job is killed by Yarn since it exceeds the memory limit (set by mapreduce.map.memory.mb, defaults to 1GB): {code} INSERT OVERWRITE TABLE Temp_InjusticeEvents_2014_03_01_00_00 SELECT * from Stage_InjusticeEvents where LogTimestamp > '2014-03-01 00:00:00' and LogTimestamp <= '2014-03-01 01:00:00'; {code} We could increase mapreduce.map.memory.mb to solve this problem, but that way we are changing this setting system-wide. We need to provide a WebHCat configuration to override mapreduce.map.memory.mb when submitting the controller job. -- This message was sent by Atlassian JIRA (v6.2#6252)
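A sketch of the kind of knob the reporter proposes, as a site-config fragment. The property name below is hypothetical; it only illustrates scoping the override to the controller job rather than changing mapreduce.map.memory.mb cluster-wide:

```xml
<!-- webhcat-site.xml: hypothetical property name, for illustration only -->
<property>
  <name>templeton.controller.map.memory.mb</name>
  <value>2048</value>
  <description>Map-task memory (MB) for the WebHCat controller job;
    overrides mapreduce.map.memory.mb for that one job only.</description>
</property>
```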
[jira] [Created] (HIVE-7181) Beginner User On Apache Jira
Nishant Kelkar created HIVE-7181: Summary: Beginner User On Apache Jira Key: HIVE-7181 URL: https://issues.apache.org/jira/browse/HIVE-7181 Project: Hive Issue Type: Wish Reporter: Nishant Kelkar Priority: Minor Hi All! I've just started to use Apache's Jira board (I registered today). I've used Jira for my work before, so I know how to navigate within Jira. But my main question, was understanding how issues are handled in the open source community (to which I want to contribute, but I'm a noob here too). So basically, a person comes up with a ticket when he/she thinks that the issue they are facing, is a bug/improvement. Questions: 1. Whom am I supposed to assign the ticket to? (myself?) 2. Who would be the QA assignee? 3. If addressing the issue requires looking at the code, how am I supposed to change the code and bring into effect those changes? (At work, we maintain a Git repo on our private server. So everyone always has access to the latest code). 4. Where can I find a list of all the people who are active on this project (Hive)? It would be nice if I could tag people by their names in my ticket comments. 5. Where can I find well formatted documentation about how to take issues from discovery to fixture on Apache Jira? I apologize in advance, if my questions are too simple. Thanks, and any/all help is appreciated! -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
[ https://issues.apache.org/jira/browse/HIVE-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019133#comment-14019133 ] Ashutosh Chauhan commented on HIVE-7050: [~prasanth_j] Does this also support display of column stats for a particular partition of a table? The test case doesn't cover it, so I'm not sure. I was hoping the following syntax would work, but it seems it's not supported yet. {code} describe formatted T partition (k1=v1) c1; {code} Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE - Key: HIVE-7050 URL: https://issues.apache.org/jira/browse/HIVE-7050 Project: Hive Issue Type: Bug Components: Statistics Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.14.0 Attachments: HIVE-7050.1.patch, HIVE-7050.2.patch, HIVE-7050.3.patch, HIVE-7050.4.patch, HIVE-7050.5.patch, HIVE-7050.6.patch There is currently no way to display the column level stats from hive CLI. It will be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 22170: analyze table T compute statistics for columns; will now compute stats for all columns.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22170/ --- (Updated June 5, 2014, 7:37 p.m.) Review request for hive and Prasanth_J. Changes --- Incorporated Prashanth's suggestion for displaying column stats. Bugs: HIVE-7168 https://issues.apache.org/jira/browse/HIVE-7168 Repository: hive-git Description --- analyze table T compute statistics for columns; will now compute stats for all columns. Diffs (updated) - metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java 1245d80 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 5b77e6f ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 6d958fd ql/src/test/queries/clientpositive/columnstats_partlvl.q 9dfe8ff ql/src/test/queries/clientpositive/columnstats_tbllvl.q 170fbc5 ql/src/test/results/clientpositive/columnstats_partlvl.q.out d91be8d ql/src/test/results/clientpositive/columnstats_tbllvl.q.out 3d3d0e2 Diff: https://reviews.apache.org/r/22170/diff/ Testing --- Added new tests. Thanks, Ashutosh Chauhan
[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns
[ https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7168: --- Attachment: HIVE-7168.1.patch Incorporated Prashanth's feedback. Don't require to name all columns in analyze statements if stats collection is for all columns -- Key: HIVE-7168 URL: https://issues.apache.org/jira/browse/HIVE-7168 Project: Hive Issue Type: Improvement Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7168.1.patch, HIVE-7168.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns
[ https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7168: --- Status: Patch Available (was: Open) Don't require to name all columns in analyze statements if stats collection is for all columns -- Key: HIVE-7168 URL: https://issues.apache.org/jira/browse/HIVE-7168 Project: Hive Issue Type: Improvement Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7168.1.patch, HIVE-7168.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns
[ https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7168: --- Status: Open (was: Patch Available) Don't require to name all columns in analyze statements if stats collection is for all columns -- Key: HIVE-7168 URL: https://issues.apache.org/jira/browse/HIVE-7168 Project: Hive Issue Type: Improvement Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7168.1.patch, HIVE-7168.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
[ https://issues.apache.org/jira/browse/HIVE-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019147#comment-14019147 ] Prasanth J commented on HIVE-7050: -- No it is not supported yet. HIVE-7051 is created to support it. Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE - Key: HIVE-7050 URL: https://issues.apache.org/jira/browse/HIVE-7050 Project: Hive Issue Type: Bug Components: Statistics Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.14.0 Attachments: HIVE-7050.1.patch, HIVE-7050.2.patch, HIVE-7050.3.patch, HIVE-7050.4.patch, HIVE-7050.5.patch, HIVE-7050.6.patch There is currently no way to display the column level stats from hive CLI. It will be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Kumar updated HIVE-7136: -- Attachment: (was: HIVE-7136-1.patch) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system --- Key: HIVE-7136 URL: https://issues.apache.org/jira/browse/HIVE-7136 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.13.0 Reporter: Sumit Kumar Assignee: Sumit Kumar Priority: Minor Attachments: HIVE-7136.01.patch, HIVE-7136.patch Current hive cli assumes that the source file (hive script) is always on the local file system. This patch implements support for reading source files from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping the default behavior intact to be reading from default filesystem (local) in case scheme is not provided in the url for the source file. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Kumar updated HIVE-7136: -- Attachment: HIVE-7136.01.patch Renaming the patch file name to meet ptest requirements Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system --- Key: HIVE-7136 URL: https://issues.apache.org/jira/browse/HIVE-7136 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.13.0 Reporter: Sumit Kumar Assignee: Sumit Kumar Priority: Minor Attachments: HIVE-7136.01.patch, HIVE-7136.patch Current hive cli assumes that the source file (hive script) is always on the local file system. This patch implements support for reading source files from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping the default behavior intact to be reading from default filesystem (local) in case scheme is not provided in the url for the source file. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Kumar updated HIVE-7136: -- Status: Patch Available (was: Open) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system --- Key: HIVE-7136 URL: https://issues.apache.org/jira/browse/HIVE-7136 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.13.0 Reporter: Sumit Kumar Assignee: Sumit Kumar Priority: Minor Attachments: HIVE-7136.01.patch, HIVE-7136.patch Current hive cli assumes that the source file (hive script) is always on the local file system. This patch implements support for reading source files from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping the default behavior intact to be reading from default filesystem (local) in case scheme is not provided in the url for the source file. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3
Thanks for testing it out, folks. With 3 PMC +1s, and no outstanding -1s, and 72 hours having passed, the vote passes. I will proceed with the release process and send out an announce mail shortly. Thanks to all the others that helped out with the previous RCs as well! :) On Thu, Jun 5, 2014 at 9:09 AM, Ashutosh Chauhan hashut...@apache.org wrote: +1 On Thu, Jun 5, 2014 at 5:32 AM, Yu Azuryy azur...@outlook.com wrote: +1 now. Sorry, it's my fault. I had hacked the Hadoop major version in my test cluster to 1.3.0 when it was actually 2.4.0, so ShimLoader parsed the major version as 1 and loaded the HDFSv1 FileSystem; that's when I got this error. Date: Thu, 5 Jun 2014 01:20:10 -0700 Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3 From: vgumas...@hortonworks.com To: dev@hive.apache.org Yu, I don't think tez-0.5 has been released yet. Can you try with tez-0.4 ( http://www.apache.org/dyn/closer.cgi/incubator/tez/tez-0.4.0-incubating/ )? Thanks, --Vaibhav On Thu, Jun 5, 2014 at 1:16 AM, Yu Azuryy azur...@outlook.com wrote: I just changed ConnectionURL and warehouse in hive-site.xml. As for tez-0.5-snapshot: <property> <name>tez.lib.uris</name> <value>${fs.defaultFS}/data/tez/lib</value> </property> <property> <name>tez.am.resource.memory.mb</name> <value>1280</value> </property> None others. Date: Wed, 4 Jun 2014 23:28:07 -0700 Subject: Re: [VOTE] Apache Hive 0.13.1 Release Candidate 3 From: the...@hortonworks.com To: dev@hive.apache.org Yu, I am not able to reproduce this issue with hadoop 2.4.0 and tez 0.4 . I tried it with both file system cache enabled and disabled. What are the non default configurations you have on your machine ? Thanks, Thejas On Wed, Jun 4, 2014 at 10:56 PM, Yu Azuryy azur...@outlook.com wrote: Hive-0.13.0 works well in my test cluster. -1 Verified with hadoop-2.4.0 and tez-0.5-snapshot, Hive cannot start? And I also built hive branch-0.13, the same error. [test@vm-10-154-** tmp]$ hive 14/06/05 13:53:45 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. 
Instead, use mapreduce.input.fileinputformat.input.dir.recursive 14/06/05 13:53:45 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 14/06/05 13:53:45 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize 14/06/05 13:53:45 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node 14/06/05 13:53:45 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack 14/06/05 13:53:45 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 14/06/05 13:53:45 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative 14/06/05 13:53:45 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed 14/06/05 13:53:45 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. 
Use hive.hmshandler.retry.* instead Logging initialized using configuration in jar:file:/letv/hive-0.13.0/lib/hive-common-0.13.1.jar!/hive-log4j.properties Exception in thread main java.lang.RuntimeException: java.io.IOException: Filesystem closed at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:357) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1066) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1062) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at
[jira] [Created] (HIVE-7182) ResultSet is not closed in JDBCStatsPublisher#init()
Ted Yu created HIVE-7182: Summary: ResultSet is not closed in JDBCStatsPublisher#init() Key: HIVE-7182 URL: https://issues.apache.org/jira/browse/HIVE-7182 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor {code} ResultSet rs = dbm.getTables(null, null, JDBCStatsUtils.getStatTableName(), null); boolean tblExists = rs.next(); {code} rs is not closed upon return from init(). If stmt.executeUpdate() throws an exception, stmt.close() would be skipped; the close() call should be placed in a finally block. -- This message was sent by Atlassian JIRA (v6.2#6252)
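The pattern being asked for can be shown with a toy AutoCloseable rather than a live JDBC connection. This is not the JDBCStatsPublisher code itself; it just demonstrates why the close belongs in a finally block (or, equivalently, try-with-resources) when the operation can throw:

```java
public class CloseOnFailureDemo {
    // Stand-in for a Statement/ResultSet: records whether close() ran.
    static class TrackingResource implements AutoCloseable {
        boolean closed = false;
        void use(boolean fail) {
            if (fail) throw new IllegalStateException("executeUpdate failed");
        }
        @Override public void close() { closed = true; }
    }

    // try-with-resources closes the resource even when use() throws,
    // which is exactly what a plain "stmt.executeUpdate(); stmt.close();"
    // sequence misses.
    static boolean runAndReportClosed(boolean fail) {
        TrackingResource r = new TrackingResource();
        try (TrackingResource res = r) {
            res.use(fail);
        } catch (IllegalStateException ignored) {
            // the exception propagates only after close() has already run
        }
        return r.closed;
    }

    public static void main(String[] args) {
        System.out.println(runAndReportClosed(true));  // true: closed despite the failure
        System.out.println(runAndReportClosed(false)); // true: closed on the happy path too
    }
}
```

With real JDBC objects the same shape is `try (Statement stmt = conn.createStatement(); ResultSet rs = ...) { ... }`, since Statement and ResultSet are AutoCloseable on Java 7+; both are closed even when executeUpdate() throws.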
[jira] [Commented] (HIVE-7181) Beginner User On Apache Jira
[ https://issues.apache.org/jira/browse/HIVE-7181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019346#comment-14019346 ] Lefty Leverenz commented on HIVE-7181: -- Welcome to the community, [~nkelkar]! 1. In general, you would only assign a Jira ticket to yourself if you intended to fix it. You can leave it unassigned if you don't know who will work on it. For example, this ticket wouldn't be assigned to you ... but actually questions like these don't belong in the Jira, they should be sent to dev@hive.apache.org. Have you joined the Hive mailing lists yet? (See link below.) 2. We don't have QA assignees, or if we do it's news to me. 3. See the contributor documentation in the wiki (How to Contribute). 4. Good question -- I keep a list of Jira usernames which I'll post in a separate comment, but it's far from complete. The wiki has a People page which links to a chart list of contributors. But you can just type @firstName lastName in the comment box and a list of possibilities will appear, then click on one of them to insert the tag. 5. See the contributor documentation in the wiki. 
* [Hive mailing lists | http://hive.apache.org/mailing_lists.html] * [People page | http://hive.apache.org/people.html] ** [Chart list of contributors | https://issues.apache.org/jira/secure/ConfigureReport.jspa?projectOrFilterId=project-12310843statistictype=assigneesselectedProjectId=12310843reportKey=com.atlassian.jira.plugin.system.reports%3Apie-reportNext=Next] * [Hive Wiki: Resources for Contributors | https://cwiki.apache.org/confluence/display/Hive/Home#Home-ResourcesforContributors] ** [How to Contribute | https://cwiki.apache.org/confluence/display/Hive/HowToContribute] ** [Developer Guide | https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide] ** [Building Hive | https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-Building] Beginner User On Apache Jira Key: HIVE-7181 URL: https://issues.apache.org/jira/browse/HIVE-7181 Project: Hive Issue Type: Wish Reporter: Nishant Kelkar Priority: Minor Labels: documentation, newbie Hi All! I've just started to use Apache's Jira board (I registered today). I've used Jira for my work before, so I know how to navigate within Jira. But my main question, was understanding how issues are handled in the open source community (to which I want to contribute, but I'm a noob here too). So basically, a person comes up with a ticket when he/she thinks that the issue they are facing, is a bug/improvement. Questions: 1. Whom am I supposed to assign the ticket to? (myself?) 2. Who would be the QA assignee? 3. If addressing the issue requires looking at the code, how am I supposed to change the code and bring into effect those changes? (At work, we maintain a Git repo on our private server. So everyone always has access to the latest code). 4. Where can I find a list of all the people who are active on this project (Hive)? It would be nice if I could tag people by their names in my ticket comments. 5. 
Where can I find well formatted documentation about how to take issues from discovery to fixture on Apache Jira? I apologize in advance, if my questions are too simple. Thanks, and any/all help is appreciated! -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7181) Beginner User On Apache Jira
[ https://issues.apache.org/jira/browse/HIVE-7181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019361#comment-14019361 ] Lefty Leverenz commented on HIVE-7181: -- Here's an incomplete list of Hive contributors alphabetized by username, with -- (duplicates) for first-name lookup: alangates -- Alan Gates amalakar -- Arup Malakar apivovarov -- Alexander Pivovarov appodictic -- Edward Capriolo ashutoshc -- Ashutosh Chauhan brocknoland -- Brock Noland busbey -- Sean Busbey cdrome -- Chris Drome chouhan -- Rakesh Chouhan cos -- Konstantin Boudnik cwsteinbach -- Carl Steinbach deepesh -- Deepesh Khandelwal drankye -- Kai Zheng dschorow -- David Schorow -- (appodictic) -- Edward Capriolo ehans -- Eric Hanson ekoifman -- Eugene Koifman -- (toffer) -- Francis Liu -- (wangfsh) -- Fusheng Wang kevinwilfong -- Kevin Wilfong -- (cos) -- Konstantin Boudnik fwiffo -- Joey Echeverria gopalv -- Gopal V hagleitn -- Gunther Hagleitner -- (rhbutani) -- Harish Butani hsubramaniyan -- Hari Sankar Sivarama Subramaniyan -- (qwertymaniac) -- Harsh J jarcec -- Jarek Jarcec Cecho jdere -- Jason Dere jnp -- Jitendra Nath Pandey -- (fwiffo) -- Joey Echeverria jcoffey -- Justin Coffey -- (drankye) -- Kai Zheng lars_francke -- Lars Francke leftylev -- Lefty Leverenz mattf -- Matt Foley namit -- Namit Jain navis -- Navis Ryu ndimiduk -- Nick Dimiduk nitinpawar432 -- Nitin Pawar owen.omalley -- Owen O'Malley prasadm -- Prasad Mujumdar prasanth_j -- Prasanth Jayachandran rhbutani -- Harish Butani -- (chouhan) -- Rakesh Chouhan roshan_naik -- Roshan Naik rusanu -- Remus Rusanu qwertymaniac -- Harsh J sershe -- Sergey Shelukhin shivshi -- Shivaraju Gowda shuainie -- Shuaishuai Nie subrotosanyal -- Subroto Sanyal sushanth -- Sushanth Sowmyan sxyuan -- Samuel Yuan -- (busbey) -- Sean Busbey szehon -- Szehon Ho teddy.choi -- Teddy Choi thejas -- Thejas Nair thiruvel -- Thiruvel Thirumoolan toffer -- Francis Liu vgumashta -- Vaibhav Gumashta vikram.dixit -- 
Vikram Dixit Kumaraswamy vikramsi -- Vikram S vinodkv -- Vinod Kumar Vavilapalli viraj -- Viraj Bhat xuefuz -- Xuefu Zhang wangfsh -- Fusheng Wang wzc1989 -- Zhichun Wu yhuai -- Yin Huai -- (wzc1989) -- Zhichun Wu Beginner User On Apache Jira Key: HIVE-7181 URL: https://issues.apache.org/jira/browse/HIVE-7181 Project: Hive Issue Type: Wish Reporter: Nishant Kelkar Priority: Minor Labels: documentation, newbie Hi All! I've just started to use Apache's Jira board (I registered today). I've used Jira for my work before, so I know how to navigate within Jira. But my main question, was understanding how issues are handled in the open source community (to which I want to contribute, but I'm a noob here too). So basically, a person comes up with a ticket when he/she thinks that the issue they are facing, is a bug/improvement. Questions: 1. Whom am I supposed to assign the ticket to? (myself?) 2. Who would be the QA assignee? 3. If addressing the issue requires looking at the code, how am I supposed to change the code and bring into effect those changes? (At work, we maintain a Git repo on our private server. So everyone always has access to the latest code). 4. Where can I find a list of all the people who are active on this project (Hive)? It would be nice if I could tag people by their names in my ticket comments. 5. Where can I find well formatted documentation about how to take issues from discovery to fixture on Apache Jira? I apologize in advance, if my questions are too simple. Thanks, and any/all help is appreciated! -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7183) Size of partColumnGrants should be checked in ObjectStore#removeRole()
Ted Yu created HIVE-7183: Summary: Size of partColumnGrants should be checked in ObjectStore#removeRole() Key: HIVE-7183 URL: https://issues.apache.org/jira/browse/HIVE-7183 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Here is related code: {code} List<MPartitionColumnPrivilege> partColumnGrants = listPrincipalAllPartitionColumnGrants( mRol.getRoleName(), PrincipalType.ROLE); if (tblColumnGrants.size() > 0) { pm.deletePersistentAll(partColumnGrants); {code} Size of tblColumnGrants is currently checked. Size of partColumnGrants should be checked instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7181) Beginner User On Apache Jira
[ https://issues.apache.org/jira/browse/HIVE-7181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019393#comment-14019393 ] Nishant Kelkar commented on HIVE-7181: -- [~leftylev], thanks so much! These links are really helpful! Specially, the People page and the How to Contribute pages. Also, I've sent a request email at user-subscr...@hive.apache.org and dev-subscr...@hive.apache.org, so I guess I should be hearing soon. Thanks again! Beginner User On Apache Jira Key: HIVE-7181 URL: https://issues.apache.org/jira/browse/HIVE-7181 Project: Hive Issue Type: Wish Reporter: Nishant Kelkar Priority: Minor Labels: documentation, newbie Hi All! I've just started to use Apache's Jira board (I registered today). I've used Jira for my work before, so I know how to navigate within Jira. But my main question, was understanding how issues are handled in the open source community (to which I want to contribute, but I'm a noob here too). So basically, a person comes up with a ticket when he/she thinks that the issue they are facing, is a bug/improvement. Questions: 1. Whom am I supposed to assign the ticket to? (myself?) 2. Who would be the QA assignee? 3. If addressing the issue requires looking at the code, how am I supposed to change the code and bring into effect those changes? (At work, we maintain a Git repo on our private server. So everyone always has access to the latest code). 4. Where can I find a list of all the people who are active on this project (Hive)? It would be nice if I could tag people by their names in my ticket comments. 5. Where can I find well formatted documentation about how to take issues from discovery to fixture on Apache Jira? I apologize in advance, if my questions are too simple. Thanks, and any/all help is appreciated! -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7185) KeyWrapperFactory#TextKeyWrapper#equals() extracts Text incorrectly when isCopy is false
Ted Yu created HIVE-7185: Summary: KeyWrapperFactory#TextKeyWrapper#equals() extracts Text incorrectly when isCopy is false Key: HIVE-7185 URL: https://issues.apache.org/jira/browse/HIVE-7185 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor {code} } else { t1 = soi_new.getPrimitiveWritableObject(key); t2 = soi_copy.getPrimitiveWritableObject(obj); {code} t2 should be assigned soi_new.getPrimitiveWritableObject(obj) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7184) TestHadoop20SAuthBridge no longer compiles after HADOOP-10448
Jason Dere created HIVE-7184: Summary: TestHadoop20SAuthBridge no longer compiles after HADOOP-10448 Key: HIVE-7184 URL: https://issues.apache.org/jira/browse/HIVE-7184 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.14.0 Reporter: Jason Dere HADOOP-10448 moves a couple of methods which were being used by the TestHadoop20SAuthBridge test. If/when Hive build uses Hadoop 2.5 as a dependency, this will cause compilation errors. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7184) TestHadoop20SAuthBridge no longer compiles after HADOOP-10448
[ https://issues.apache.org/jira/browse/HIVE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-7184: - Attachment: HIVE-7184.1.patch Attaching patch that should allow test to compile once Hive starts compiling with Hadoop 2.5 TestHadoop20SAuthBridge no longer compiles after HADOOP-10448 - Key: HIVE-7184 URL: https://issues.apache.org/jira/browse/HIVE-7184 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.14.0 Reporter: Jason Dere Attachments: HIVE-7184.1.patch HADOOP-10448 moves a couple of methods which were being used by the TestHadoop20SAuthBridge test. If/when Hive build uses Hadoop 2.5 as a dependency, this will cause compilation errors. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7168) Don't require to name all columns in analyze statements if stats collection is for all columns
[ https://issues.apache.org/jira/browse/HIVE-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019430#comment-14019430 ] Hive QA commented on HIVE-7168: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12648534/HIVE-7168.1.patch {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 5510 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_display_colstats_tbllvl org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_collect_set org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_top_level org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hadoop.hive.metastore.TestMetaStoreAuthorization.testMetaStoreAuthorization org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/396/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/396/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-396/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12648534

Don't require to name all columns in analyze statements if stats collection is for all columns
----------------------------------------------------------------------------------------------
Key: HIVE-7168
URL: https://issues.apache.org/jira/browse/HIVE-7168
Project: Hive
Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Attachments: HIVE-7168.1.patch, HIVE-7168.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)
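The improvement above is a defaulting rule: when an ANALYZE statement names no columns, stats collection should cover every column of the table. A minimal sketch in plain Java (illustrative only; `columnsToAnalyze` is a hypothetical helper, not Hive's planner code):

```java
import java.util.List;

// Illustrative sketch of the HIVE-7168 defaulting rule (not Hive's code):
// an empty requested-column list means "collect stats for all columns".
public class ColumnStatsTarget {
    static List<String> columnsToAnalyze(List<String> requested,
                                         List<String> tableColumns) {
        // Fall back to the full column list when none were named.
        return requested.isEmpty() ? tableColumns : requested;
    }

    public static void main(String[] args) {
        List<String> cols = List.of("a", "b", "c");
        System.out.println(columnsToAnalyze(List.of(), cols));    // all columns
        System.out.println(columnsToAnalyze(List.of("a"), cols)); // only "a"
    }
}
```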
[jira] [Updated] (HIVE-7166) Vectorization with UDFs returns incorrect results
[ https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7166:
Status: Open (was: Patch Available)

Vectorization with UDFs returns incorrect results
-------------------------------------------------
Key: HIVE-7166
URL: https://issues.apache.org/jira/browse/HIVE-7166
Project: Hive
Issue Type: Bug
Components: Vectorization
Affects Versions: 0.13.0
Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster
Reporter: Benjamin Bowman
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Minor
Attachments: HIVE-7166.1.patch

Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect query results. Example query:

SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - X) AND UDF_1

The following test scenario will reproduce the problem:

TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 1):
{code}
package com.test;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.LongWritable;

public class tenThousand extends UDF {
  private final LongWritable result = new LongWritable();

  public LongWritable evaluate() {
    result.set(1);
    return result;
  }
}
{code}

TEST DATA (test.input):
1|CBCABC|12
2|DBCABC|13
3|EBCABC|14
4|ABCABC|15
5|BBCABC|16
6|CBCABC|17

CREATING ORC TABLE:
0: jdbc:hive2://server:10002/db> create table testTabOrc (first bigint, second varchar(20), third int) partitioned by (range int) clustered by (first) sorted by (first) into 8 buckets stored as orc tblproperties ('orc.compress'='SNAPPY', 'orc.index'='true');

CREATE LOADING TABLE:
0: jdbc:hive2://server:10002/db> create table loadingDir (first bigint, second varchar(20), third int) partitioned by (range int) row format delimited fields terminated by '|' stored as textfile;

COPY IN DATA:
[root@server]# hadoop fs -copyFromLocal /tmp/test.input /db/loading/.

ORC DATA:
[root@server]# beeline -u jdbc:hive2://server:10002/db -n root --hiveconf hive.exec.dynamic.partition.mode=nonstrict --hiveconf hive.enforce.sorting=true -e "insert into table testTabOrc partition(range) select * from loadingDir;"

LOAD TEST FUNCTION:
0: jdbc:hive2://server:10002/db> add jar /opt/hadoop/lib/testFunction.jar
0: jdbc:hive2://server:10002/db> create temporary function ten_thousand as 'com.test.tenThousand';

TURN OFF VECTORIZATION:
0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=false;

QUERY (RESULTS AS EXPECTED):
0: jdbc:hive2://server:10002/db> select first from testTabOrc where first between ten_thousand()-1 and ten_thousand()-9995;
+--------+
| first  |
+--------+
| 1      |
| 2      |
| 3      |
+--------+
3 rows selected (15.286 seconds)

TURN ON VECTORIZATION:
0: jdbc:hive2://server:10002/db> set hive.vectorized.execution.enabled=true;

QUERY AGAIN (WRONG RESULTS):
0: jdbc:hive2://server:10002/db> select first from testTabOrc where first between ten_thousand()-1 and ten_thousand()-9995;
+--------+
| first  |
+--------+
+--------+
No rows selected (17.763 seconds)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HIVE-7166) Vectorization with UDFs returns incorrect results
[ https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7166:
Attachment: HIVE-7166.2.patch

Vectorization with UDFs returns incorrect results
-------------------------------------------------
Key: HIVE-7166
URL: https://issues.apache.org/jira/browse/HIVE-7166
Project: Hive
Issue Type: Bug
Components: Vectorization
Affects Versions: 0.13.0
Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster
Reporter: Benjamin Bowman
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Minor
Attachments: HIVE-7166.1.patch, HIVE-7166.2.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HIVE-2777) ability to add and drop partitions atomically
[ https://issues.apache.org/jira/browse/HIVE-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Kumar updated HIVE-2777:
Status: Open (was: Patch Available)

ability to add and drop partitions atomically
---------------------------------------------
Key: HIVE-2777
URL: https://issues.apache.org/jira/browse/HIVE-2777
Project: Hive
Issue Type: New Feature
Components: Metastore
Affects Versions: 0.13.0
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2777.D2271.1.patch, hive-2777.patch

Hive should have the ability to atomically add and drop partitions. This way admins can change partitions atomically without breaking running jobs, and it allows an admin to merge several partitions into one. Essentially, we would like to have an API:

add_drop_partitions(String db, String tbl_name, List<Partition> addParts, List<List<String>> dropParts, boolean deleteData);

This jira covers the changes required for the metastore and thrift.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
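The all-or-nothing semantics requested above can be sketched with a small in-memory model: build the new partition set off to the side and publish it in one step, so readers never see the drops without the adds. This is an illustration of the proposed behavior only (`AtomicPartitionSwap` and its method names are hypothetical, not the metastore implementation):

```java
import java.util.*;

// Illustrative in-memory sketch of HIVE-2777's atomic add/drop semantics
// (not the real metastore code). The partition set is replaced in a single
// reference swap, and a failed drop leaves the old state untouched.
public class AtomicPartitionSwap {
    private volatile Set<String> partitions = new HashSet<>();

    public Set<String> list() {
        return partitions;
    }

    public synchronized void addDropPartitions(List<String> addParts,
                                               List<String> dropParts) {
        // Stage the change on a copy first...
        Set<String> next = new HashSet<>(partitions);
        for (String p : dropParts) {
            if (!next.remove(p)) {
                throw new NoSuchElementException("unknown partition: " + p);
            }
        }
        next.addAll(addParts);
        // ...then publish the whole new set at once.
        partitions = next;
    }
}
```

Merging several partitions into one is then a single call, e.g. `addDropPartitions(List.of("merged"), List.of("p1", "p2"))`.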
[jira] [Commented] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019514#comment-14019514 ] Hive QA commented on HIVE-7136:
---

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12648539/HIVE-7136.01.patch

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 5585 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats16
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testListPartitionNames
org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testListPartitions
org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testPartition
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/397/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/397/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-397/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12648539

Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
-------------------------------------------------------------------------------------------
Key: HIVE-7136
URL: https://issues.apache.org/jira/browse/HIVE-7136
Project: Hive
Issue Type: Improvement
Components: CLI
Affects Versions: 0.13.0
Reporter: Sumit Kumar
Assignee: Sumit Kumar
Priority: Minor
Attachments: HIVE-7136.01.patch, HIVE-7136.patch

The current hive CLI assumes that the source file (hive script) is always on the local file system. This patch implements support for reading source files from other file systems in the hadoop eco-system (hdfs, s3, etc.) as well, while keeping the default behavior intact: the script is read from the default (local) filesystem when no scheme is given in the source file's URL.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
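The scheme-based dispatch the description talks about can be sketched in plain Java: parse the script path as a URI and fall back to the local filesystem when it carries no scheme. This is an assumption-laden illustration, not Hive's actual CLI code (`filesystemFor` is a hypothetical helper):

```java
import java.net.URI;

// Sketch of the HIVE-7136 idea (illustrative, not Hive's implementation):
// choose a filesystem by the URI scheme of the script path, defaulting to
// the local filesystem when the path has no scheme.
public class ScriptSource {
    static String filesystemFor(String path) {
        String scheme = URI.create(path).getScheme();
        // A bare path like /tmp/query.hql has no scheme -> local filesystem.
        return scheme == null ? "local" : scheme;
    }

    public static void main(String[] args) {
        System.out.println(filesystemFor("/tmp/query.hql"));       // local
        System.out.println(filesystemFor("hdfs://nn:8020/q.hql")); // hdfs
        System.out.println(filesystemFor("s3://bucket/q.hql"));    // s3
    }
}
```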
[jira] [Updated] (HIVE-7174) Do not accept string as scale and precision when reading Avro schema
[ https://issues.apache.org/jira/browse/HIVE-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarek Jarcec Cecho updated HIVE-7174:
Attachment: dec.avro

Do not accept string as scale and precision when reading Avro schema
--------------------------------------------------------------------
Key: HIVE-7174
URL: https://issues.apache.org/jira/browse/HIVE-7174
Project: Hive
Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Assignee: Jarek Jarcec Cecho
Fix For: 0.14.0
Attachments: HIVE-7174.patch, dec.avro

I've noticed that the current AvroSerde will happily accept a schema that uses strings instead of integers for scale and precision, e.g. the fragment {{"precision":"4","scale":"1"}} from the following table:

{code}
CREATE TABLE `avro_dec1`(
  `name` string COMMENT 'from deserializer',
  `value` decimal(4,1) COMMENT 'from deserializer')
COMMENT 'just drop the schema right into the HQL'
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
  'numFiles'='1',
  'avro.schema.literal'='{"namespace":"com.howdy","name":"some_schema","type":"record","fields":[{"name":"name","type":"string"},{"name":"value","type":{"type":"bytes","logicalType":"decimal","precision":"4","scale":"1"}}]}'
);
{code}

However, the decimal spec defined in AVRO-1402 requires these to be integers, and hence allows only the following fragment instead: {{"precision":4,"scale":1}} (i.e. no double quotes around the numbers). As Hive can propagate this incorrect schema to new files, creating files with an invalid schema, I think that we should alter the behavior and insist on the correct schema.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
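The string-vs-integer distinction can be demonstrated with a tiny standalone check. This is hypothetical illustration code, not the AvroSerde's actual validation; it merely spots quoted numeric `precision`/`scale` attributes, which AVRO-1402 disallows:

```java
import java.util.regex.Pattern;

// Illustrative check (not Hive/Avro code): flag decimal schemas whose
// precision or scale is a JSON string ("4") rather than a bare integer (4).
public class AvroDecimalCheck {
    private static final Pattern QUOTED =
            Pattern.compile("\"(precision|scale)\"\\s*:\\s*\"\\d+\"");

    static boolean hasQuotedPrecisionOrScale(String schemaJson) {
        return QUOTED.matcher(schemaJson).find();
    }

    public static void main(String[] args) {
        String bad  = "{\"type\":\"bytes\",\"logicalType\":\"decimal\",\"precision\":\"4\",\"scale\":\"1\"}";
        String good = "{\"type\":\"bytes\",\"logicalType\":\"decimal\",\"precision\":4,\"scale\":1}";
        System.out.println(hasQuotedPrecisionOrScale(bad));  // quoted numbers -> invalid per AVRO-1402
        System.out.println(hasQuotedPrecisionOrScale(good)); // bare integers -> valid
    }
}
```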
[jira] [Commented] (HIVE-7174) Do not accept string as scale and precision when reading Avro schema
[ https://issues.apache.org/jira/browse/HIVE-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019542#comment-14019542 ] Jarek Jarcec Cecho commented on HIVE-7174:
---

I've noticed that file {{dev.avro}} has been created with incorrect schema, so I've attached fixed version. Attached file should replace the one in {{data/files/dec.avro}}.

Do not accept string as scale and precision when reading Avro schema
--------------------------------------------------------------------
Key: HIVE-7174
URL: https://issues.apache.org/jira/browse/HIVE-7174
Project: Hive
Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Assignee: Jarek Jarcec Cecho
Fix For: 0.14.0
Attachments: HIVE-7174.patch, dec.avro

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
[ https://issues.apache.org/jira/browse/HIVE-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019549#comment-14019549 ] Ashutosh Chauhan commented on HIVE-7050:
---

Ok.. cool

Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
---------------------------------------------------------------------
Key: HIVE-7050
URL: https://issues.apache.org/jira/browse/HIVE-7050
Project: Hive
Issue Type: Bug
Components: Statistics
Reporter: Prasanth J
Assignee: Prasanth J
Fix For: 0.14.0
Attachments: HIVE-7050.1.patch, HIVE-7050.2.patch, HIVE-7050.3.patch, HIVE-7050.4.patch, HIVE-7050.5.patch, HIVE-7050.6.patch

There is currently no way to display column level stats from the hive CLI. It would be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
TestAvroSerdeUtils#determineSchemaCanReadSchemaFromHDFS
Hi,

I am trying to build hive on my local desktop. I am facing an issue with this test case: TestAvroSerdeUtils#determineSchemaCanReadSchemaFromHDFS. The issue is only with hadoop-2 and not with hadoop-1. Has anyone been able to run this test case?

Trace:

org.apache.hadoop.ipc.RemoteException: File /path/to/schema/schema.avsc could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1406)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2596)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:563)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:407)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:592)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
    at org.apache.hadoop.ipc.Client.call(Client.java:1406)
    at org.apache.hadoop.ipc.Client.call(Client.java:1359)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:211)
    at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:348)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1275)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1123)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:527)

Thanks,
Pankit
[jira] [Commented] (HIVE-7174) Do not accept string as scale and precision when reading Avro schema
[ https://issues.apache.org/jira/browse/HIVE-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019552#comment-14019552 ] Xuefu Zhang commented on HIVE-7174:
---

+1

Do not accept string as scale and precision when reading Avro schema
--------------------------------------------------------------------
Key: HIVE-7174
URL: https://issues.apache.org/jira/browse/HIVE-7174
Project: Hive
Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Assignee: Jarek Jarcec Cecho
Fix For: 0.14.0
Attachments: HIVE-7174.patch, dec.avro

--
This message was sent by Atlassian JIRA
(v6.2#6252)
Re: Review Request 22174: HIVE-6394 Implement Timestmap in ParquetSerde
On June 5, 2014, 8:43 a.m., justin coffey wrote:
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java, line 165
> https://reviews.apache.org/r/22174/diff/3/?file=603954#file603954line165
>
> A stupid question perhaps, but is INT96 reserved for timestamps in parquet? I dug this up, but not sure if it's definitive: https://github.com/Parquet/parquet-mr/issues/101

Yeah, I don't think it's reserved, but parquet is missing an OriginalType annotation called 'Timestamp' for the application to recognize, which will require yet another parquet version bump. Do you think we can go ahead with it now and then add it later in a follow-up JIRA? Or wait for that to be added first?

- Szehon

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22174/#review44805
---

On June 5, 2014, 7:33 a.m., Szehon Ho wrote:
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22174/
---

(Updated June 5, 2014, 7:33 a.m.)

Review request for hive, Brock Noland, justin coffey, and Xuefu Zhang.

Bugs: HIVE-6394
https://issues.apache.org/jira/browse/HIVE-6394

Repository: hive-git

Description
---
This uses the Jodd library to convert the java.sql.Timestamp type used by Hive into the {julian-day:nanos} format expected by parquet, and vice versa.
Diffs
---
data/files/parquet_types.txt 0be390b
pom.xml 4bb8880
ql/pom.xml 13c477a
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 4da0d30
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 29f7e11
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java 57161d8
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java fb2f5a8
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/utils/NanoTimeUtils.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 3490061
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetTimestampUtils.java PRE-CREATION
ql/src/test/queries/clientpositive/parquet_types.q 5d6333c
ql/src/test/results/clientpositive/parquet_types.q.out c23f7f1

Diff: https://reviews.apache.org/r/22174/diff/

Testing
---
Unit tests the new libraries, and also added timestamp data in the parquet_types q-test.

Thanks,
Szehon Ho
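The {julian-day:nanos} conversion the review describes can be sketched without Jodd using the JDK's `java.time` (an illustrative equivalent of what a NanoTimeUtils-style helper needs to do, assuming the UTC-based Julian day numbering used by `JulianFields.JULIAN_DAY`; this is not the patch's actual code):

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.JulianFields;

// Illustrative sketch: split a timestamp into the (julian day, nanos-of-day)
// pair that parquet's INT96 timestamp encoding expects.
public class JulianNanos {
    static long[] toJulian(Instant instant) {
        LocalDateTime utc = LocalDateTime.ofInstant(instant, ZoneOffset.UTC);
        long julianDay = utc.toLocalDate().getLong(JulianFields.JULIAN_DAY);
        long nanosOfDay = utc.toLocalTime().toNanoOfDay();
        return new long[] { julianDay, nanosOfDay };
    }

    public static void main(String[] args) {
        // The Julian day number for 1970-01-01 is 2440588, with 0 nanos at midnight.
        long[] jn = toJulian(Instant.EPOCH);
        System.out.println(jn[0] + " " + jn[1]);  // prints "2440588 0"
    }
}
```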