[jira] [Updated] (HIVE-11083) Make test cbo_windowing robust
[ https://issues.apache.org/jira/browse/HIVE-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11083: Attachment: HIVE-11083.patch [~hsubramaniyan] can you take a look? Make test cbo_windowing robust -- Key: HIVE-11083 URL: https://issues.apache.org/jira/browse/HIVE-11083 Project: Hive Issue Type: Test Components: Tests Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11083.patch Make result set deterministic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joinned table (tez map join only)
[ https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597984#comment-14597984 ]

Gunther Hagleitner commented on HIVE-10729:
---
[~mmccline] the test on this bug no longer fails, but there is https://issues.apache.org/jira/browse/HIVE-11051. The test attached to that bug used to be fixed by the patch here. If that is still the case, it might make sense to resolve this one and move the code over to HIVE-11051.

Query failed when select complex columns from joinned table (tez map join only)
---
Key: HIVE-10729
URL: https://issues.apache.org/jira/browse/HIVE-10729
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Matt McCline
Attachments: HIVE-10729.03.patch, HIVE-10729.1.patch, HIVE-10729.2.patch

When a map join happens and the projection columns include complex data types, the query fails.
Steps to reproduce:
{code:sql}
hive> set hive.auto.convert.join;
hive.auto.convert.join=true
hive> desc foo;
a array<int>
hive> select * from foo;
[1,2]
hive> desc src_int;
key int
value string
hive> select * from src_int where key=2;
2 val_2
hive> select * from foo join src_int src on src.key = foo.a[1];
{code}
The query fails with this stack trace:
{noformat}
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to [Ljava.lang.Object;
at org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:676)
at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
... 23 more
{noformat}
A similar error occurs when the projection columns include a map:
{code:sql}
hive> CREATE TABLE test (a INT, b MAP<INT, STRING>) STORED AS ORC;
hive> INSERT OVERWRITE TABLE test SELECT 1, MAP(1, 'val_1', 2, 'val_2') FROM src LIMIT 1;
hive> select * from src join test where src.key=test.a;
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
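The ClassCastException above says that a lazily decoded array reached code that expected a plain Object[]. The following is a minimal sketch of that failure mode only, not Hive's implementation and not the attached patch. LazyArrayStandIn, getListUnchecked, and getListChecked are hypothetical names, and the shape of the failing call is an assumption based solely on the exception message.

```java
import java.util.Arrays;
import java.util.List;

final class ListCastSketch {
    /** Hypothetical stand-in for a lazily decoded array: neither a List nor an Object[]. */
    static final class LazyArrayStandIn {
        private final Object[] elements;

        LazyArrayStandIn(Object... elements) {
            this.elements = elements;
        }

        List<Object> toList() {
            return Arrays.asList(elements);
        }
    }

    /** Assumed shape of the failing call: anything that is not a List is treated as Object[]. */
    static List<?> getListUnchecked(Object data) {
        if (data instanceof List) {
            return (List<?>) data;
        }
        // Throws ClassCastException when data is a LazyArrayStandIn.
        return Arrays.asList((Object[]) data);
    }

    /** Defensive variant: recognise the lazy representation before falling back to the cast. */
    static List<?> getListChecked(Object data) {
        if (data instanceof LazyArrayStandIn) {
            return ((LazyArrayStandIn) data).toList();
        }
        return getListUnchecked(data);
    }

    public static void main(String[] args) {
        Object lazy = new LazyArrayStandIn(1, 2);
        System.out.println(getListChecked(lazy));
        try {
            getListUnchecked(lazy);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Whether the right remedy is a check like this or supplying an inspector that matches the data is not stated in the message; the sketch only shows why the cast cannot succeed.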
[jira] [Updated] (HIVE-11062) Remove Exception stacktrace from Log.info when ACL is not supported.
[ https://issues.apache.org/jira/browse/HIVE-11062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-11062:
---
Fix Version/s: 2.0.0

Remove Exception stacktrace from Log.info when ACL is not supported.

Key: HIVE-11062
URL: https://issues.apache.org/jira/browse/HIVE-11062
Project: Hive
Issue Type: Bug
Components: Logging
Affects Versions: 1.1.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor
Fix For: 2.0.0
Attachments: HIVE-11062.1.patch

When logging is set to INFO, extended ACLs are enabled, and the file system does not support ACLs, the log file fills with exception stack traces. Although they are benign, they can easily frustrate users. We should show the exception only at DEBUG level.
Currently, the exception in the log looks like:
{noformat}
2015-06-19 05:09:59,376 INFO org.apache.hadoop.hive.shims.HadoopShimsSecure: Skipping ACL inheritance: File system for path s3a://yibing/hive does not support ACLs but dfs.namenode.acls.enabled is set to true: java.lang.UnsupportedOperationException: S3AFileSystem doesn't support getAclStatus
java.lang.UnsupportedOperationException: S3AFileSystem doesn't support getAclStatus
at org.apache.hadoop.fs.FileSystem.getAclStatus(FileSystem.java:2429)
at org.apache.hadoop.hive.shims.Hadoop23Shims.getFullFileStatus(Hadoop23Shims.java:729)
at org.apache.hadoop.hive.ql.metadata.Hive.inheritFromTable(Hive.java:2786)
at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2694)
at org.apache.hadoop.hive.ql.metadata.Table.replaceFiles(Table.java:640)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1587)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:297)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1042)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:145)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
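The change requested in HIVE-11062 amounts to keeping the one-line summary at INFO and moving the stack trace to a lower level. Below is a rough sketch of that pattern, not the attached patch. It uses java.util.logging so that it is self-contained (an assumption, since the logging API Hive uses here is not shown in the message), with Level.FINE standing in for DEBUG. AclLogSketch, summary, and logAclUnsupported are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class AclLogSketch {
    private static final Logger LOG = Logger.getLogger(AclLogSketch.class.getName());

    /** Builds the one-line message; e.toString() supplies the exception class and text. */
    static String summary(String path, Exception e) {
        return "Skipping ACL inheritance: File system for path " + path
                + " does not support ACLs but dfs.namenode.acls.enabled is set to true: " + e;
    }

    /** Logs the summary at INFO and attaches the stack trace only to a FINE (debug-like) record. */
    static void logAclUnsupported(Logger log, String path, Exception e) {
        log.info(summary(path, e));
        log.log(Level.FINE, "The details are:", e);
    }

    public static void main(String[] args) {
        // With the default configuration only the INFO line is printed.
        logAclUnsupported(LOG, "s3a://yibing/hive",
                new UnsupportedOperationException("S3AFileSystem doesn't support getAclStatus"));
    }
}
```

With this split, an operator running at INFO sees one line per occurrence, and the full trace is still available after lowering the logger level.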
[jira] [Updated] (HIVE-11084) Issue in Parquet Hive Table
[ https://issues.apache.org/jira/browse/HIVE-11084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chanchal Kumar Ghosh updated HIVE-11084:
---
Summary: Issue in Parquet Hive Table (was: Issue in Parquet Hove Table)

Issue in Parquet Hive Table
---
Key: HIVE-11084
URL: https://issues.apache.org/jira/browse/HIVE-11084
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 0.9.0
Environment: GNU/Linux
Reporter: Chanchal Kumar Ghosh

{quote}
hive> CREATE TABLE intable_p (
  sr_no int,
  name string,
  emp_id int
) PARTITIONED BY (
  a string,
  b string,
  c string
) ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS PARQUET;
hive> insert overwrite table intable_p partition (a='a', b='b', c='c') select * from intable;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.59 sec HDFS Read: 247 HDFS Write: 410 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 590 msec
OK
Time taken: 30.382 seconds
hive> show create table intable_p;
OK
CREATE TABLE `intable_p`(
  `sr_no` int,
  `name` string,
  `emp_id` int)
PARTITIONED BY (
  `a` string,
  `b` string,
  `c` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://nameservice1/hive/db/intable_p'
TBLPROPERTIES (
  'transient_lastDdlTime'='1435080569')
Time taken: 0.212 seconds, Fetched: 19 row(s)
hive> CREATE TABLE `intable_p2`(
  `sr_no` int,
  `name` string,
  `emp_id` int)
PARTITIONED BY (
  `a` string,
  `b` string,
  `c` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
OK
Time taken: 0.179 seconds
hive> insert overwrite table intable_p2 partition (a='a', b='b', c='c') select * from intable;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
...
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-06-23 17:34:40,471 Stage-1 map = 0%, reduce = 0%
2015-06-23 17:35:10,753 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1433246369760_7947 with errors
Error during job, obtaining debugging information...
Examining task ID: task_ (and more) from job job_
Task with the most failures(4):
-
Task ID: task_
URL:
-
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {sr_no:1,name:ABC,emp_id:1001}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {sr_no:1,name:ABC,emp_id:1001}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:549)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180)
... 8 more
Caused by: {color:red}java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.ArrayWritable{color}
at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:105)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:628)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
at
[jira] [Updated] (HIVE-11030) Enhance storage layer to create one delta file per write
[ https://issues.apache.org/jira/browse/HIVE-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-11030: -- Attachment: HIVE-11030.3.patch Enhance storage layer to create one delta file per write Key: HIVE-11030 URL: https://issues.apache.org/jira/browse/HIVE-11030 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11030.2.patch, HIVE-11030.3.patch Currently each txn using ACID insert/update/delete will generate a delta directory like delta_100_101. In order to support multi-statement transactions we must generate one delta per operation within the transaction so the deltas would be named like delta_100_101_0001, etc. Support for MERGE (HIVE-10924) would need the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
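The naming scheme described in HIVE-11030 can be illustrated with a short sketch. It is based only on the two example names in the message (delta_100_101 and delta_100_101_0001). The four-digit zero padding of the statement suffix is inferred from that single example, and DeltaNameSketch, perTransaction, and perStatement are hypothetical names rather than Hive APIs.

```java
final class DeltaNameSketch {
    /** Current scheme from the message: one delta directory per transaction range. */
    static String perTransaction(long minTxn, long maxTxn) {
        return "delta_" + minTxn + "_" + maxTxn;
    }

    /** Proposed scheme: one delta per statement, with the statement id as a suffix (padding assumed). */
    static String perStatement(long minTxn, long maxTxn, int statementId) {
        return perTransaction(minTxn, maxTxn) + "_" + String.format("%04d", statementId);
    }

    public static void main(String[] args) {
        System.out.println(perTransaction(100, 101));
        System.out.println(perStatement(100, 101, 1));
    }
}
```

Because every per-statement name starts with the per-transaction name, a reader could still group all deltas of one transaction by prefix, which is one plausible reason for a suffix-based layout.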
[jira] [Commented] (HIVE-11083) Make test cbo_windowing robust
[ https://issues.apache.org/jira/browse/HIVE-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597965#comment-14597965 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-11083: -- +1 pending run Make test cbo_windowing robust -- Key: HIVE-11083 URL: https://issues.apache.org/jira/browse/HIVE-11083 Project: Hive Issue Type: Test Components: Tests Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11083.patch Make result set deterministic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598006#comment-14598006 ]

Jesus Camacho Rodriguez commented on HIVE-10996:
---
[~jpullokkaran], the test failure is unrelated. The patch is ready to go in. Thanks

Aggregation / Projection over Multi-Join Inner Query producing incorrect results
---
Key: HIVE-10996
URL: https://issues.apache.org/jira/browse/HIVE-10996
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0
Reporter: Gautam Kowshik
Assignee: Jesus Camacho Rodriguez
Priority: Critical
Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt

We see the following problem on 1.1.0 and 1.2.0 but not on 0.13, which suggests a regression.
The following query (Q1) produces no results:
{code}
select s
from (
  select last.*, action.st2, action.n
  from (
    select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp
    from (select * from purchase_history) purchase
    join (select * from cart_history) mevt
    on purchase.s = mevt.s
    where purchase.timestamp > mevt.timestamp
    group by purchase.s, purchase.timestamp
  ) last
  join (select * from events) action
  on last.s = action.s and last.last_stage_timestamp = action.timestamp
) list;
{code}
While this one (Q2) does produce results:
{code}
select *
from (
  select last.*, action.st2, action.n
  from (
    select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp
    from (select * from purchase_history) purchase
    join (select * from cart_history) mevt
    on purchase.s = mevt.s
    where purchase.timestamp > mevt.timestamp
    group by purchase.s, purchase.timestamp
  ) last
  join (select * from events) action
  on last.s = action.s and last.last_stage_timestamp = action.timestamp
) list;
1 21 20 Bob 1234
1 31 30 Bob 1234
3 51 50 Jeff 1234
{code}
The setup to test this is:
{code}
create table purchase_history (s string, product string, price double, timestamp int);
insert into purchase_history values ('1', 'Belt', 20.00, 21);
insert into purchase_history values ('1', 'Socks', 3.50, 31);
insert into purchase_history values ('3', 'Belt', 20.00, 51);
insert into purchase_history values ('4', 'Shirt', 15.50, 59);
create table cart_history (s string, cart_id int, timestamp int);
insert into cart_history values ('1', 1, 10);
insert into cart_history values ('1', 2, 20);
insert into cart_history values ('1', 3, 30);
insert into cart_history values ('1', 4, 40);
insert into cart_history values ('3', 5, 50);
insert into cart_history values ('4', 6, 60);
create table events (s string, st2 string, n int, timestamp int);
insert into events values ('1', 'Bob', 1234, 20);
insert into events values ('1', 'Bob', 1234, 30);
insert into events values ('1', 'Bob', 1234, 25);
insert into events values ('2', 'Sam', 1234, 30);
insert into events values ('3', 'Jeff', 1234, 50);
insert into events values ('4', 'Ted', 1234, 60);
{code}
I realize select * and select s are not all that interesting in this context, but what led us to this issue was that select count(distinct s) was not returning results. The above queries are the simplified queries that reproduce the issue. I will note that if I convert the inner join to a table and select from that, the issue does not appear.
Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
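As a cross-check of the rows reported for Q2, the query can be evaluated by hand over the setup data. The sketch below does that in plain Java, assuming the comparison in the where clause is purchase.timestamp > mevt.timestamp (the operator is not legible in the archived text, but this reading reproduces the three reported rows). Q2Sketch and runQ2 are hypothetical names, and the loops are a direct nested-loop evaluation, not how Hive plans the query.

```java
import java.util.ArrayList;
import java.util.List;

final class Q2Sketch {
    // purchase_history: (s, timestamp); product and price do not affect the result.
    static final String[] P_S = {"1", "1", "3", "4"};
    static final int[] P_TS = {21, 31, 51, 59};
    // cart_history: (s, timestamp); cart_id does not affect the result.
    static final String[] C_S = {"1", "1", "1", "1", "3", "4"};
    static final int[] C_TS = {10, 20, 30, 40, 50, 60};
    // events: (s, st2, n, timestamp)
    static final String[] E_S = {"1", "1", "1", "2", "3", "4"};
    static final String[] E_ST2 = {"Bob", "Bob", "Bob", "Sam", "Jeff", "Ted"};
    static final int[] E_N = {1234, 1234, 1234, 1234, 1234, 1234};
    static final int[] E_TS = {20, 30, 25, 30, 50, 60};

    /** Returns the Q2 rows as "s timestamp last_stage_timestamp st2 n". */
    static List<String> runQ2() {
        List<String> rows = new ArrayList<>();
        // Each (s, timestamp) pair in the setup data is unique, so every purchase is its own group.
        for (int p = 0; p < P_S.length; p++) {
            Integer lastStage = null; // max(mevt.timestamp) over carts that precede the purchase
            for (int c = 0; c < C_S.length; c++) {
                if (C_S[c].equals(P_S[p]) && P_TS[p] > C_TS[c]) {
                    lastStage = (lastStage == null) ? C_TS[c] : Math.max(lastStage, C_TS[c]);
                }
            }
            if (lastStage == null) {
                continue; // no qualifying cart row, so the group does not exist
            }
            for (int e = 0; e < E_S.length; e++) {
                if (E_S[e].equals(P_S[p]) && E_TS[e] == lastStage) {
                    rows.add(P_S[p] + " " + P_TS[p] + " " + lastStage + " " + E_ST2[e] + " " + E_N[e]);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : runQ2()) {
            System.out.println(row);
        }
    }
}
```

Since Q1 only projects column s from the same inner result, a correct evaluation of Q1 over this data would be expected to return one s value per row above rather than an empty result.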
[jira] [Updated] (HIVE-10795) Remove use of PerfLogger from Orc
[ https://issues.apache.org/jira/browse/HIVE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-10795: - Attachment: HIVE-10795.patch Thanks for the catch, Damien. I removed both of the now unused CLASS_NAME variables. I realized that OrcInputFormat already had a static for LOG.isDebugEnabled so I switched all of the calls to use that. Remove use of PerfLogger from Orc - Key: HIVE-10795 URL: https://issues.apache.org/jira/browse/HIVE-10795 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-10795.patch, HIVE-10795.patch, HIVE-10795.patch PerfLogger is yet another class with a huge dependency set that Orc doesn't need. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11076) Explicitly set hive.cbo.enable=true for some tests
[ https://issues.apache.org/jira/browse/HIVE-11076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-11076: -- Fix Version/s: 1.3.0 Explicitly set hive.cbo.enable=true for some tests -- Key: HIVE-11076 URL: https://issues.apache.org/jira/browse/HIVE-11076 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11076.01.patch, HIVE-11076.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10795) Remove use of PerfLogger from Orc
[ https://issues.apache.org/jira/browse/HIVE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597871#comment-14597871 ] Damien Carol commented on HIVE-10795: - [~owen.omalley] you should remove _CLASS_NAME_ (line:41) Remove use of PerfLogger from Orc - Key: HIVE-10795 URL: https://issues.apache.org/jira/browse/HIVE-10795 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-10795.patch, HIVE-10795.patch PerfLogger is yet another class with a huge dependency set that Orc doesn't need. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11076) Explicitly set hive.cbo.enable=true for some tests
[ https://issues.apache.org/jira/browse/HIVE-11076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-11076: -- Fix Version/s: (was: 1.3.0) Explicitly set hive.cbo.enable=true for some tests -- Key: HIVE-11076 URL: https://issues.apache.org/jira/browse/HIVE-11076 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Fix For: 2.0.0 Attachments: HIVE-11076.01.patch, HIVE-11076.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11037) HiveOnTez: make explain user level = true as default
[ https://issues.apache.org/jira/browse/HIVE-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11037: --- Attachment: HIVE-11037.08.patch Rebased to master (no difference except line offsets) per [~jpullokkaran]'s request. HiveOnTez: make explain user level = true as default Key: HIVE-11037 URL: https://issues.apache.org/jira/browse/HIVE-11037 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11037.01.patch, HIVE-11037.02.patch, HIVE-11037.03.patch, HIVE-11037.04.patch, HIVE-11037.05.patch, HIVE-11037.06.patch, HIVE-11037.07.patch, HIVE-11037.08.patch In HIVE-9780, we introduced a new level of explain for Hive on Tez. We would like to enable it by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10533) CBO (Calcite Return Path): Join to MultiJoin support for outer joins
[ https://issues.apache.org/jira/browse/HIVE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598110#comment-14598110 ] Jesus Camacho Rodriguez commented on HIVE-10533: Thanks [~ashutoshc], I need to check why the changes in the last version of the patch introduced these regressions. I'll take a look and submit a new patch. CBO (Calcite Return Path): Join to MultiJoin support for outer joins Key: HIVE-10533 URL: https://issues.apache.org/jira/browse/HIVE-10533 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-10533.01.patch, HIVE-10533.02.patch, HIVE-10533.02.patch, HIVE-10533.03.patch, HIVE-10533.04.patch, HIVE-10533.patch CBO return path: auto_join7.q can be used to reproduce the problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598131#comment-14598131 ] Laljo John Pullokkaran commented on HIVE-10996: --- [~jcamachorodriguez] Patch needs to be modified for branch-1 possibly for branch-1.0 Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598105#comment-14598105 ]

Wei Zheng commented on HIVE-10233:
---
[~hagleitn] [~vikram.dixit]
1. Same question that [~mmokhtar] raised: why do we allocate less than 1% of memory to the mapjoin?
2. What is the purpose of the pctx.getConf(); call at the beginning of MemoryDecider.resolve()?
3. In the following three method calls, the param work is not used, so it can be removed. Also consider removing the param work from evaluateWork(TezWork work, BaseWork w).
evaluateMapWork(work, (MapWork) w);
evaluateReduceWork(work, (ReduceWork) w);
evaluateMergeWork(work, (MergeJoinWork) w);
4. In evaluateOperators(BaseWork w, PhysicalContext pctx), pctx is not used.
5. The indentation of evaluateOperators needs to be adjusted.

Hive on tez: memory manager for grace hash join
---
Key: HIVE-10233
URL: https://issues.apache.org/jira/browse/HIVE-10233
Project: Hive
Issue Type: Bug
Components: Tez
Affects Versions: llap, 2.0.0
Reporter: Vikram Dixit K
Assignee: Gunther Hagleitner
Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch

We need a memory manager in llap/tez to manage the usage of memory across threads.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598128#comment-14598128 ] Laljo John Pullokkaran commented on HIVE-10996: --- Committed to master, 1.2 branch. Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10553) Remove hardcoded Parquet references from SearchArgumentImpl
[ https://issues.apache.org/jira/browse/HIVE-10553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-10553: - Attachment: HIVE-10553.patch My patch had gone a little stale, so I updated it. I also manually re-ran the test case that failed in jenkins and it passed. Remove hardcoded Parquet references from SearchArgumentImpl --- Key: HIVE-10553 URL: https://issues.apache.org/jira/browse/HIVE-10553 Project: Hive Issue Type: Sub-task Reporter: Gopal V Assignee: Owen O'Malley Attachments: HIVE-10553.patch, HIVE-10553.patch, HIVE-10553.patch SARGs currently depend on Parquet code, which causes a tight coupling between parquet releases and storage-api versions. Move Parquet code out to its own RecordReader, similar to ORC's SargApplier implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11083) Make test cbo_windowing robust
[ https://issues.apache.org/jira/browse/HIVE-11083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598148#comment-14598148 ] Hive QA commented on HIVE-11083: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12741305/HIVE-11083.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9013 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28 org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4352/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4352/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4352/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12741305 - PreCommit-HIVE-TRUNK-Build Make test cbo_windowing robust -- Key: HIVE-11083 URL: https://issues.apache.org/jira/browse/HIVE-11083 Project: Hive Issue Type: Test Components: Tests Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11083.patch Make result set deterministic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598142#comment-14598142 ] Hive QA commented on HIVE-10999: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12741320/HIVE-10999.3-spark.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7997 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/903/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/903/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-903/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12741320 - PreCommit-HIVE-SPARK-Build Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch, HIVE-10999.2-spark.patch, HIVE-10999.3-spark.patch, HIVE-10999.3-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11084) Issue in Parquet Hive Table
[ https://issues.apache.org/jira/browse/HIVE-11084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chanchal Kumar Ghosh updated HIVE-11084: Description: {code} hive CREATE TABLE intable_p ( sr_no int, name string, emp_id int ) PARTITIONED BY ( a string, b string, c string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS PARQUET; hive insert overwrite table intable_p partition (a='a', b='b', c='c') select * from intable; Total jobs = 3 Launching Job 1 out of 3 Number of reduce tasks is set to 0 since there's no reduce operator MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 2.59 sec HDFS Read: 247 HDFS Write: 410 SUCCESS Total MapReduce CPU Time Spent: 2 seconds 590 msec OK Time taken: 30.382 seconds hive show create table intable_p; OK CREATE TABLE `intable_p`( `sr_no` int, `name` string, `emp_id` int) PARTITIONED BY ( `a` string, `b` string, `c` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'hdfs://nameservice1/hive/db/intable_p' TBLPROPERTIES ( 'transient_lastDdlTime'='1435080569') Time taken: 0.212 seconds, Fetched: 19 row(s) hive CREATE TABLE `intable_p2`( `sr_no` int, `name` string, `emp_id` int) PARTITIONED BY ( `a` string, `b` string, `c` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'; OK Time taken: 0.179 seconds hive insert overwrite table intable_p2 partition (a='a', b='b', c='c') select * from intable; Total jobs = 3 Launching Job 1 out of 3 Number of reduce tasks is set to 0 since there's no reduce operator ... 
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2015-06-23 17:34:40,471 Stage-1 map = 0%, reduce = 0% 2015-06-23 17:35:10,753 Stage-1 map = 100%, reduce = 0% Ended Job = job_1433246369760_7947 with errors Error during job, obtaining debugging information... Examining task ID: task_ (and more) from job job_ Task with the most failures(4): - Task ID: task_ URL: - Diagnostic Messages for this Task: Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {sr_no:1,name:ABC,emp_id:1001} at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:198) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {sr_no:1,name:ABC,emp_id:1001} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:549) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:180) ... 
8 more Caused by: {color:red}java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.ArrayWritable{color} at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:105) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:628) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539) ... 9 more FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce Jobs Launched: Stage-Stage-1: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 0 msec hive {code} What is the issue with my second table? was: {quote} hive CREATE TABLE intable_p ( sr_no int,
[jira] [Commented] (HIVE-11037) HiveOnTez: make explain user level = true as default
[ https://issues.apache.org/jira/browse/HIVE-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598303#comment-14598303 ] Laljo John Pullokkaran commented on HIVE-11037: --- Committed to master. HiveOnTez: make explain user level = true as default Key: HIVE-11037 URL: https://issues.apache.org/jira/browse/HIVE-11037 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11037.01.patch, HIVE-11037.02.patch, HIVE-11037.03.patch, HIVE-11037.04.patch, HIVE-11037.05.patch, HIVE-11037.06.patch, HIVE-11037.07.patch, HIVE-11037.08.patch In HIVE-9780, we introduced a new level of explain for Hive on Tez. We would like to enable it by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7288) Enable support for -libjars and -archives in WebHcat for Streaming MapReduce jobs
[ https://issues.apache.org/jira/browse/HIVE-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Howell updated HIVE-7288: --- Tags: hadoop streaming, WebHcat, libjars, archives, CSS (was: hadoop streaming, WebHcat, libjars, archives) Enable support for -libjars and -archives in WebHcat for Streaming MapReduce jobs - Key: HIVE-7288 URL: https://issues.apache.org/jira/browse/HIVE-7288 Project: Hive Issue Type: New Feature Components: WebHCat Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.13.1 Environment: HDInsight deploying HDP 2.1; Also HDP 2.1 on Windows Reporter: Azim Uddin Assignee: shanyu zhao Attachments: HIVE-7288.1.patch, hive-7288.patch Issue: == Due to lack of parameters (or support for) equivalent of '-libjars' and '-archives' in WebHcat REST API, we cannot use an external Java Jars or Archive files with a Streaming MapReduce job, when the job is submitted via WebHcat/templeton. I am citing a few use cases here, but there can be plenty of scenarios like this- #1 (for -archives):In order to use R with a hadoop distribution like HDInsight or HDP on Windows, we could package the R directory up in a zip file and rename it to r.jar and put it into HDFS or WASB. We can then do something like this from hadoop command line (ignore the wasb syntax, same command can be run with hdfs) - hadoop jar %HADOOP_HOME%\lib\hadoop-streaming.jar -archives wasb:///example/jars/r.jar -files wasb:///example/apps/mapper.r,wasb:///example/apps/reducer.r -mapper ./r.jar/bin/Rscript.exe mapper.r -reducer ./r.jar/bin/Rscript.exe reducer.r -input /example/data/gutenberg -output /probe/r/wordcount This works from hadoop command line, but due to lack of support for '-archives' parameter in WebHcat, we can't submit the same Streaming MR job via WebHcat. #2 (for -libjars): Consider a scenario where a user would like to use a custom inputFormat with a Streaming MapReduce job and wrote his own custom InputFormat JAR. 
From a hadoop command line we can do something like this - hadoop jar /path/to/hadoop-streaming.jar \ -libjars /path/to/custom-formats.jar \ -D map.output.key.field.separator=, \ -D mapred.text.key.partitioner.options=-k1,1 \ -input my_data/ \ -output my_output/ \ -outputformat test.example.outputformat.DateFieldMultipleOutputFormat \ -mapper my_mapper.py \ -reducer my_reducer.py \ But due to lack of support for '-libjars' parameter for streaming MapReduce job in WebHcat, we can't submit the above streaming MR job (that uses a custom Java JAR) via WebHcat. Impact: We think, being able to submit jobs remotely is a vital feature for hadoop to be enterprise-ready and WebHcat plays an important role there. Streaming MapReduce job is also very important for interoperability. So, it would be very useful to keep WebHcat on par with hadoop command line in terms of streaming MR job submission capability. Ask: Enable parameter support for 'libjars' and 'archives' in WebHcat for Hadoop streaming jobs in WebHcat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7347) Pig Query with defined schema fails when submitted via WebHcat 'execute' parameter
[ https://issues.apache.org/jira/browse/HIVE-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Howell updated HIVE-7347: --- Tags: webhcat, Pig, execute, schema, CSS (was: webhcat, Pig, execute, schema) Pig Query with defined schema fails when submitted via WebHcat 'execute' parameter -- Key: HIVE-7347 URL: https://issues.apache.org/jira/browse/HIVE-7347 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.12.0, 0.13.0 Environment: HDP 2.1 on Windows; HDInsight deploying HDP 2.1 Reporter: Azim Uddin 1. Consider you are using HDP 2.1 on Windows, and you have a tsv file (named rawInput.tsv) like this (just an example, you can use any) - http://a.com http://b.com1 http://b.com http://c.com2 http://d.com http://e.com3 2. With the tsv file uploaded to HDFS, run the following Pig job via WebHcat using 'execute' parameter, something like this- curl.exe -d execute=rawInput = load '/test/data' using PigStorage as (SourceUrl:chararray, DestinationUrl:chararray, InstanceCount:int); readyInput = limit rawInput 10; store readyInput into '/test/output' using PigStorage; -d statusdir=/test/status http://localhost:50111/templeton/v1/pig?user.name=hadoop; --user hadoop:any The job fails with exit code 255 - [main] org.apache.hive.hcatalog.templeton.tool.LaunchMapper: templeton: job failed with exit code 255 From stderr, we see the following -readyInput was unexpected at this time. 3. The same job works via Pig Grunt Shell and if we use the WebHcat 'file' parameter, instead of 'execute' parameter - a. Create a pig script called pig-script.txt with the query below and put it HDFS /test/script rawInput = load '/test/data' using PigStorage as (SourceUrl:chararray, DestinationUrl:chararray, InstanceCount:int); readyInput = limit rawInput 10; store readyInput into '/test/Output' using PigStorage; b. 
Run the job via webHcat: curl.exe -d file=/test/script/pig_script.txt -d statusdir=/test/status http://localhost:50111/templeton/v1/pig?user.name=hadoop; --user hadoop:any 4. Also, WebHcat 'execute' option works if we don't define the schema in the Pig query, something like this- curl.exe -d execute=rawInput = load '/test/data' using PigStorage; readyInput = limit rawInput 10; store readyInput into '/test/output' using PigStorage; -d statusdir=/test/status http://localhost:50111/templeton/v1/pig?user.name=hadoop; --user hadoop:any Ask is- WebHcat 'execute' option should work for Pig query with schema defined - it appears to be a parsing issue with WebHcat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
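The two submission variants described in HIVE-7347 ('execute' with an inline query, 'file' with a script path) can be sketched as form bodies. This is only an illustration of how a client might URL-encode the fields before posting them to the /templeton/v1/pig endpoint quoted in the report; the helper name build_pig_form is hypothetical, and client-side encoding is not claimed to fix the server-side parsing problem the report describes.

```python
from urllib.parse import urlencode

# Pig query taken from the report (the variant with a declared schema).
PIG_QUERY = (
    "rawInput = load '/test/data' using PigStorage as "
    "(SourceUrl:chararray, DestinationUrl:chararray, InstanceCount:int); "
    "readyInput = limit rawInput 10; "
    "store readyInput into '/test/output' using PigStorage;"
)


def build_pig_form(statusdir, execute=None, file=None):
    """Return a URL-encoded form body with exactly one of 'execute' or 'file'.

    Hypothetical helper: only the field names (execute, file, statusdir)
    come from the report.
    """
    if (execute is None) == (file is None):
        raise ValueError("pass exactly one of execute or file")
    fields = {"statusdir": statusdir}
    if execute is not None:
        fields["execute"] = execute
    else:
        fields["file"] = file
    # urlencode percent-encodes parentheses, quotes and semicolons,
    # so the query travels as a single form value.
    return urlencode(fields)


inline_body = build_pig_form("/test/status", execute=PIG_QUERY)
script_body = build_pig_form("/test/status", file="/test/script/pig_script.txt")
```

Either body could then be sent as application/x-www-form-urlencoded data; the 'file' variant corresponds to the workaround in step 3 of the report.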
[jira] [Commented] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences
[ https://issues.apache.org/jira/browse/HIVE-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598460#comment-14598460 ] Gunther Hagleitner commented on HIVE-11079: --- +1 Fix qfile tests that fail on Windows due to CR/character escape differences --- Key: HIVE-11079 URL: https://issues.apache.org/jira/browse/HIVE-11079 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11079.1.patch, HIVE-11079.2.patch A few qfile tests are failing on Windows due to a couple of windows-specific issues: - The table comment for the test includes a CR character, which is different on Windows compared to Unix. - The partition path in the test includes a space character. Unlike Unix, on Windows space characters in Hive paths are escaped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
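One common way to make output comparisons insensitive to the CR difference mentioned in HIVE-11079 is to normalize line endings on both sides before diffing. The sketch below is an assumption about the general technique, not a description of what the attached patches do; the function names are hypothetical.

```python
def normalize_newlines(text):
    """Map Windows (CRLF) and bare CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def outputs_match(expected, actual):
    """Compare two test outputs while ignoring line-ending differences."""
    return normalize_newlines(expected) == normalize_newlines(actual)
```

This covers only the line-ending difference; the escaped space in partition paths would need separate handling.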
[jira] [Updated] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10233: -- Attachment: HIVE-10233.12.patch Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598338#comment-14598338 ] Xuefu Zhang commented on HIVE-10996: {quote} fail is unrelated. It is ready to go in. {quote} [~jcamachorodriguez], Could you elaborate why you think that the test failure isn't related? I can clearly see there is a result diff generated by your patch for test org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28. {code} 173c173 POSTHOOK: Lineage: dest_j1.key EXPRESSION [(src1)x.FieldSchema(name:key, type:string, comment:default), ] --- POSTHOOK: Lineage: dest_j1.key SIMPLE [(src1)x.FieldSchema(name:key, type:string, comment:default), ] {code} Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression. 
The following query (Q1) produces no results: {code} select s from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp > mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; {code} While this one (Q2) does produce results : {code} select * from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp > mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; 1 21 20 Bob 1234 1 31 30 Bob 1234 3 51 50 Jeff1234 {code} The setup to test this is: {code} create table purchase_history (s string, product string, price double, timestamp int); insert into purchase_history values ('1', 'Belt', 20.00, 21); insert into purchase_history values ('1', 'Socks', 3.50, 31); insert into purchase_history values ('3', 'Belt', 20.00, 51); insert into purchase_history values ('4', 'Shirt', 15.50, 59); create table cart_history (s string, cart_id int, timestamp int); insert into cart_history values ('1', 1, 10); insert into cart_history values ('1', 2, 20); insert into cart_history values ('1', 3, 30); insert into cart_history values ('1', 4, 40); insert into cart_history values ('3', 5, 50); insert into cart_history values ('4', 6, 60); create table events (s string, st2 string, n int, timestamp int); insert into events values ('1', 'Bob', 1234, 20); insert into events values ('1', 'Bob', 1234, 30); insert into events values ('1', 'Bob', 1234, 25); 
insert into events values ('2', 'Sam', 1234, 30); insert into events values ('3', 'Jeff', 1234, 50); insert into events values ('4', 'Ted', 1234, 60); {code} I realize select * and select s are not all that interesting in this context, but what led us to this issue was that select count(distinct s) was not returning results. The above queries are the simplified queries that produce the issue. I will note that if I convert the inner join to a table and select from that, the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
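As a reference point for what Q1 from HIVE-10996 should return, the setup data can be loaded into an in-memory SQLite database and the same query shape run there. This is a sketch under stated assumptions, not Hive behavior: the comparison in the where clause is taken to be `>` (consistent with the Q2 output listed in the report), the timestamp column is renamed to ts, and the subquery aliases are shortened to avoid keyword clashes in SQLite.

```python
import sqlite3

# Setup data copied from the report; column "timestamp" renamed to "ts".
SETUP = """
create table purchase_history (s text, product text, price real, ts int);
insert into purchase_history values
  ('1', 'Belt', 20.00, 21), ('1', 'Socks', 3.50, 31),
  ('3', 'Belt', 20.00, 51), ('4', 'Shirt', 15.50, 59);
create table cart_history (s text, cart_id int, ts int);
insert into cart_history values
  ('1', 1, 10), ('1', 2, 20), ('1', 3, 30),
  ('1', 4, 40), ('3', 5, 50), ('4', 6, 60);
create table events (s text, st2 text, n int, ts int);
insert into events values
  ('1', 'Bob', 1234, 20), ('1', 'Bob', 1234, 30), ('1', 'Bob', 1234, 25),
  ('2', 'Sam', 1234, 30), ('3', 'Jeff', 1234, 50), ('4', 'Ted', 1234, 60);
"""

# Same shape as Q1: projection of s over the multi-join inner query.
Q1 = """
select s from (
  select ls.*, ev.st2, ev.n
  from (
    select p.s as s, p.ts as ts, max(m.ts) as last_stage_ts
    from purchase_history p
    join cart_history m on p.s = m.s
    where p.ts > m.ts  -- assumed operator
    group by p.s, p.ts
  ) ls
  join events ev on ls.s = ev.s and ls.last_stage_ts = ev.ts
) result
order by s
"""


def q1_reference_rows():
    """Run the Q1-shaped query on the sample data and return the s values."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return [row[0] for row in conn.execute(Q1)]
    finally:
        conn.close()
```

On this data the function returns three rows, one per row of the Q2 output, which is what Q1 would be expected to produce in Hive as well.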
[jira] [Updated] (HIVE-7288) Enable support for -libjars and -archives in WebHcat for Streaming MapReduce jobs
[ https://issues.apache.org/jira/browse/HIVE-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Howell updated HIVE-7288: --- Tags: hadoop streaming, WebHcat, libjars, archives, MicrosoftCSS (was: hadoop streaming, WebHcat, libjars, archives, CSS) Enable support for -libjars and -archives in WebHcat for Streaming MapReduce jobs - Key: HIVE-7288 URL: https://issues.apache.org/jira/browse/HIVE-7288 Project: Hive Issue Type: New Feature Components: WebHCat Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.13.1 Environment: HDInsight deploying HDP 2.1; Also HDP 2.1 on Windows Reporter: Azim Uddin Assignee: shanyu zhao Attachments: HIVE-7288.1.patch, hive-7288.patch Issue: == Due to lack of parameters (or support for) equivalent of '-libjars' and '-archives' in WebHcat REST API, we cannot use an external Java Jars or Archive files with a Streaming MapReduce job, when the job is submitted via WebHcat/templeton. I am citing a few use cases here, but there can be plenty of scenarios like this- #1 (for -archives):In order to use R with a hadoop distribution like HDInsight or HDP on Windows, we could package the R directory up in a zip file and rename it to r.jar and put it into HDFS or WASB. We can then do something like this from hadoop command line (ignore the wasb syntax, same command can be run with hdfs) - hadoop jar %HADOOP_HOME%\lib\hadoop-streaming.jar -archives wasb:///example/jars/r.jar -files wasb:///example/apps/mapper.r,wasb:///example/apps/reducer.r -mapper ./r.jar/bin/Rscript.exe mapper.r -reducer ./r.jar/bin/Rscript.exe reducer.r -input /example/data/gutenberg -output /probe/r/wordcount This works from hadoop command line, but due to lack of support for '-archives' parameter in WebHcat, we can't submit the same Streaming MR job via WebHcat. #2 (for -libjars): Consider a scenario where a user would like to use a custom inputFormat with a Streaming MapReduce job and wrote his own custom InputFormat JAR. 
From a hadoop command line we can do something like this - hadoop jar /path/to/hadoop-streaming.jar \ -libjars /path/to/custom-formats.jar \ -D map.output.key.field.separator=, \ -D mapred.text.key.partitioner.options=-k1,1 \ -input my_data/ \ -output my_output/ \ -outputformat test.example.outputformat.DateFieldMultipleOutputFormat \ -mapper my_mapper.py \ -reducer my_reducer.py \ But due to lack of support for '-libjars' parameter for streaming MapReduce job in WebHcat, we can't submit the above streaming MR job (that uses a custom Java JAR) via WebHcat. Impact: We think, being able to submit jobs remotely is a vital feature for hadoop to be enterprise-ready and WebHcat plays an important role there. Streaming MapReduce job is also very important for interoperability. So, it would be very useful to keep WebHcat on par with hadoop command line in terms of streaming MR job submission capability. Ask: Enable parameter support for 'libjars' and 'archives' in WebHcat for Hadoop streaming jobs in WebHcat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7347) Pig Query with defined schema fails when submitted via WebHcat 'execute' parameter
[ https://issues.apache.org/jira/browse/HIVE-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Howell updated HIVE-7347: --- Tags: webhcat, Pig, execute, schema, MicrosoftCSS (was: webhcat, Pig, execute, schema, CSS) Pig Query with defined schema fails when submitted via WebHcat 'execute' parameter -- Key: HIVE-7347 URL: https://issues.apache.org/jira/browse/HIVE-7347 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.12.0, 0.13.0 Environment: HDP 2.1 on Windows; HDInsight deploying HDP 2.1 Reporter: Azim Uddin 1. Consider you are using HDP 2.1 on Windows, and you have a tsv file (named rawInput.tsv) like this (just an example, you can use any) - http://a.com http://b.com1 http://b.com http://c.com2 http://d.com http://e.com3 2. With the tsv file uploaded to HDFS, run the following Pig job via WebHcat using 'execute' parameter, something like this- curl.exe -d execute=rawInput = load '/test/data' using PigStorage as (SourceUrl:chararray, DestinationUrl:chararray, InstanceCount:int); readyInput = limit rawInput 10; store readyInput into '/test/output' using PigStorage; -d statusdir=/test/status http://localhost:50111/templeton/v1/pig?user.name=hadoop; --user hadoop:any The job fails with exit code 255 - [main] org.apache.hive.hcatalog.templeton.tool.LaunchMapper: templeton: job failed with exit code 255 From stderr, we see the following -readyInput was unexpected at this time. 3. The same job works via Pig Grunt Shell and if we use the WebHcat 'file' parameter, instead of 'execute' parameter - a. Create a pig script called pig-script.txt with the query below and put it HDFS /test/script rawInput = load '/test/data' using PigStorage as (SourceUrl:chararray, DestinationUrl:chararray, InstanceCount:int); readyInput = limit rawInput 10; store readyInput into '/test/Output' using PigStorage; b. 
Run the job via webHcat: curl.exe -d file=/test/script/pig_script.txt -d statusdir=/test/status http://localhost:50111/templeton/v1/pig?user.name=hadoop; --user hadoop:any 4. Also, WebHcat 'execute' option works if we don't define the schema in the Pig query, something like this- curl.exe -d execute=rawInput = load '/test/data' using PigStorage; readyInput = limit rawInput 10; store readyInput into '/test/output' using PigStorage; -d statusdir=/test/status http://localhost:50111/templeton/v1/pig?user.name=hadoop; --user hadoop:any Ask is- WebHcat 'execute' option should work for Pig query with schema defined - it appears to be a parsing issue with WebHcat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10895: Attachment: HIVE-10895.2.patch ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Aihua Xu Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10795) Remove use of PerfLogger from Orc
[ https://issues.apache.org/jira/browse/HIVE-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598436#comment-14598436 ] Hive QA commented on HIVE-10795: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12741327/HIVE-10795.patch {color:green}SUCCESS:{color} +1 9014 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4354/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4354/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4354/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12741327 - PreCommit-HIVE-TRUNK-Build Remove use of PerfLogger from Orc - Key: HIVE-10795 URL: https://issues.apache.org/jira/browse/HIVE-10795 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-10795.patch, HIVE-10795.patch, HIVE-10795.patch PerfLogger is yet another class with a huge dependency set that Orc doesn't need. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10790) orc write on viewFS throws exception
[ https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-10790: Summary: orc write on viewFS throws exception (was: orc file sql excute fail ) orc write on viewFS throws exception Key: HIVE-10790 URL: https://issues.apache.org/jira/browse/HIVE-10790 Project: Hive Issue Type: Bug Components: API Affects Versions: 0.13.0, 0.14.0 Environment: Hadoop 2.5.0-cdh5.3.2 hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Fix For: 0.14.1 Attachments: HIVE-10790.0.patch.txt Inserting from a text table into an ORC table, for example {code:sql} insert overwrite table custom.rank_less_orc_none partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text where logdate='2015051500'; {code} throws an error: {noformat} Error: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040) at 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105) at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) ... 8 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10438) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia updated HIVE-10438: -- Description: This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors with a query submitter (https://github.com/xiaom/hs2driver) Also attaching a design document explaining the changes, experimental results document, and a pdf explaining how to setup the docker container to observe end-to-end functionality of ResultSet compression. https://reviews.apache.org/r/35792/ Review board link. was: This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors with a query submitter (https://github.com/xiaom/hs2driver) Also attaching a design document explaining the changes, experimental results document, and a pdf explaining how to setup the docker container to observe end-to-end functionality of ResultSet compression. 
Architecture for ResultSet Compression via external plugin --- Key: HIVE-10438 URL: https://issues.apache.org/jira/browse/HIVE-10438 Project: Hive Issue Type: New Feature Components: Hive, Thrift API Affects Versions: 1.2.0 Reporter: Rohit Dholakia Assignee: Rohit Dholakia Labels: patch Attachments: HIVE-10438.patch, Proposal-rscompressor.pdf, Results_Snappy_protobuf_TBinary_TCompact.pdf, hs2driver-master.zip, hs2resultSetcompressor.zip, readme.txt This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors with a query submitter (https://github.com/xiaom/hs2driver) Also attaching a design document explaining the changes, experimental results document, and a pdf explaining how to setup the docker container to observe end-to-end functionality of ResultSet compression. https://reviews.apache.org/r/35792/ Review board link. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598349#comment-14598349 ] Jesus Camacho Rodriguez commented on HIVE-10996: [~xuefuz], that test has been failing intermittently for last QA runs, not only those related to this patch: {noformat} ... http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4309/testReport/ http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4313/testReport/ http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4317/testReport/ http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4321/testReport/ http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4324/testReport/ http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4332/testReport/ http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4336/testReport/ http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4337/testReport/ {noformat} Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression. 
The following query (Q1) produces no results: {code} select s from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; {code} While this one (Q2) does produce results : {code} select * from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; 1 21 20 Bob 1234 1 31 30 Bob 1234 3 51 50 Jeff1234 {code} The setup to test this is: {code} create table purchase_history (s string, product string, price double, timestamp int); insert into purchase_history values ('1', 'Belt', 20.00, 21); insert into purchase_history values ('1', 'Socks', 3.50, 31); insert into purchase_history values ('3', 'Belt', 20.00, 51); insert into purchase_history values ('4', 'Shirt', 15.50, 59); create table cart_history (s string, cart_id int, timestamp int); insert into cart_history values ('1', 1, 10); insert into cart_history values ('1', 2, 20); insert into cart_history values ('1', 3, 30); insert into cart_history values ('1', 4, 40); insert into cart_history values ('3', 5, 50); insert into cart_history values ('4', 6, 60); create table events (s string, st2 string, n int, timestamp int); insert into events values ('1', 'Bob', 1234, 20); insert into events values ('1', 'Bob', 1234, 30); insert into events values ('1', 'Bob', 1234, 25); 
insert into events values ('2', 'Sam', 1234, 30); insert into events values ('3', 'Jeff', 1234, 50); insert into events values ('4', 'Ted', 1234, 60); {code} I realize select * and select s are not all that interesting in this context but what lead us to this issue was select count(distinct s) was not returning results. The above queries are the simplified queries that produce the issue. I will note that if I convert the inner join to a table and select from that the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. This optimization was introduced in
[jira] [Updated] (HIVE-7288) Enable support for -libjars and -archives in WebHcat for Streaming MapReduce jobs
[ https://issues.apache.org/jira/browse/HIVE-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Howell updated HIVE-7288: --- Tags: hadoop streaming, WebHcat, libjars, archives, MicrosoftSupport (was: hadoop streaming, WebHcat, libjars, archives, MicrosoftCSS) Enable support for -libjars and -archives in WebHcat for Streaming MapReduce jobs - Key: HIVE-7288 URL: https://issues.apache.org/jira/browse/HIVE-7288 Project: Hive Issue Type: New Feature Components: WebHCat Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.13.1 Environment: HDInsight deploying HDP 2.1; Also HDP 2.1 on Windows Reporter: Azim Uddin Assignee: shanyu zhao Attachments: HIVE-7288.1.patch, hive-7288.patch Issue: == Due to lack of parameters (or support for) equivalent of '-libjars' and '-archives' in WebHcat REST API, we cannot use an external Java Jars or Archive files with a Streaming MapReduce job, when the job is submitted via WebHcat/templeton. I am citing a few use cases here, but there can be plenty of scenarios like this- #1 (for -archives):In order to use R with a hadoop distribution like HDInsight or HDP on Windows, we could package the R directory up in a zip file and rename it to r.jar and put it into HDFS or WASB. We can then do something like this from hadoop command line (ignore the wasb syntax, same command can be run with hdfs) - hadoop jar %HADOOP_HOME%\lib\hadoop-streaming.jar -archives wasb:///example/jars/r.jar -files wasb:///example/apps/mapper.r,wasb:///example/apps/reducer.r -mapper ./r.jar/bin/Rscript.exe mapper.r -reducer ./r.jar/bin/Rscript.exe reducer.r -input /example/data/gutenberg -output /probe/r/wordcount This works from hadoop command line, but due to lack of support for '-archives' parameter in WebHcat, we can't submit the same Streaming MR job via WebHcat. #2 (for -libjars): Consider a scenario where a user would like to use a custom inputFormat with a Streaming MapReduce job and wrote his own custom InputFormat JAR. 
From a hadoop command line we can do something like this - hadoop jar /path/to/hadoop-streaming.jar \ -libjars /path/to/custom-formats.jar \ -D map.output.key.field.separator=, \ -D mapred.text.key.partitioner.options=-k1,1 \ -input my_data/ \ -output my_output/ \ -outputformat test.example.outputformat.DateFieldMultipleOutputFormat \ -mapper my_mapper.py \ -reducer my_reducer.py \ But due to lack of support for '-libjars' parameter for streaming MapReduce job in WebHcat, we can't submit the above streaming MR job (that uses a custom Java JAR) via WebHcat. Impact: We think, being able to submit jobs remotely is a vital feature for hadoop to be enterprise-ready and WebHcat plays an important role there. Streaming MapReduce job is also very important for interoperability. So, it would be very useful to keep WebHcat on par with hadoop command line in terms of streaming MR job submission capability. Ask: Enable parameter support for 'libjars' and 'archives' in WebHcat for Hadoop streaming jobs in WebHcat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11058) Make alter_merge* tests (ORC only) stable across different OSes
[ https://issues.apache.org/jira/browse/HIVE-11058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran resolved HIVE-11058. -- Resolution: Won't Fix The stats difference can occur when tests are run in different timezones. ORC stores the timezone ID in stripe metadata, which causes differences in file sizes. Make alter_merge* tests (ORC only) stable across different OSes --- Key: HIVE-11058 URL: https://issues.apache.org/jira/browse/HIVE-11058 Project: Hive Issue Type: Bug Affects Versions: 1.2.0, 2.0.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran alter_merge* (ORC only) tests are showing stats diff in different OSes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11030) Enhance storage layer to create one delta file per write
[ https://issues.apache.org/jira/browse/HIVE-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598284#comment-14598284 ] Hive QA commented on HIVE-11030: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12741324/HIVE-11030.3.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9050 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4353/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4353/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4353/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12741324 - PreCommit-HIVE-TRUNK-Build Enhance storage layer to create one delta file per write Key: HIVE-11030 URL: https://issues.apache.org/jira/browse/HIVE-11030 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11030.2.patch, HIVE-11030.3.patch Currently each txn using ACID insert/update/delete will generate a delta directory like delta_100_101. In order to support multi-statement transactions we must generate one delta per operation within the transaction so the deltas would be named like delta_100_101_0001, etc. Support for MERGE (HIVE-10924) would need the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
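The naming scheme described in HIVE-11030 can be sketched roughly as follows. This is only an illustration built from the two example names in the issue text (delta_100_101 and delta_100_101_0001); the function name `delta_dir_name`, its parameters, and the four-digit zero padding of the statement suffix are assumptions, and the real on-disk format in Hive may pad or order the fields differently.

```python
from typing import Optional


def delta_dir_name(min_txn: int, max_txn: int, stmt_id: Optional[int] = None) -> str:
    """Build a delta directory name in the style quoted in the issue.

    Hypothetical helper: without a statement id it yields the current
    one-delta-per-transaction form; with one it appends a suffix so each
    operation inside a transaction gets its own delta.
    """
    base = f"delta_{min_txn}_{max_txn}"
    if stmt_id is None:
        return base
    # Four-digit padding is assumed from the "_0001" example in the issue.
    return f"{base}_{stmt_id:04d}"


print(delta_dir_name(100, 101))
print(delta_dir_name(100, 101, 1))
```

Keeping the statement id as an optional trailing field means names without it still parse as the older form, which is one plausible reason for putting it last.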
[jira] [Commented] (HIVE-11007) CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depends on the last SEL
[ https://issues.apache.org/jira/browse/HIVE-11007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598355#comment-14598355 ] Ashutosh Chauhan commented on HIVE-11007: - +1 CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depends on the last SEL - Key: HIVE-11007 URL: https://issues.apache.org/jira/browse/HIVE-11007 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11007.01.patch, HIVE-11007.02.patch In the dynamic partitioning case, for example, we are going to have TS0-SEL1-SEL2-FS3. The dpCtx's mapInputToDP is populated by SEL1 rather than SEL2, which causes an error in the return path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7347) Pig Query with defined schema fails when submitted via WebHcat 'execute' parameter
[ https://issues.apache.org/jira/browse/HIVE-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Howell updated HIVE-7347: --- Tags: webhcat, Pig, execute, schema, MicrosoftSupport (was: webhcat, Pig, execute, schema, MicrosoftCSS) Pig Query with defined schema fails when submitted via WebHcat 'execute' parameter -- Key: HIVE-7347 URL: https://issues.apache.org/jira/browse/HIVE-7347 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.12.0, 0.13.0 Environment: HDP 2.1 on Windows; HDInsight deploying HDP 2.1 Reporter: Azim Uddin 1. Consider you are using HDP 2.1 on Windows, and you have a tsv file (named rawInput.tsv) like this (just an example, you can use any) - http://a.com http://b.com1 http://b.com http://c.com2 http://d.com http://e.com3 2. With the tsv file uploaded to HDFS, run the following Pig job via WebHcat using 'execute' parameter, something like this- curl.exe -d execute=rawInput = load '/test/data' using PigStorage as (SourceUrl:chararray, DestinationUrl:chararray, InstanceCount:int); readyInput = limit rawInput 10; store readyInput into '/test/output' using PigStorage; -d statusdir=/test/status http://localhost:50111/templeton/v1/pig?user.name=hadoop; --user hadoop:any The job fails with exit code 255 - [main] org.apache.hive.hcatalog.templeton.tool.LaunchMapper: templeton: job failed with exit code 255 From stderr, we see the following -readyInput was unexpected at this time. 3. The same job works via Pig Grunt Shell and if we use the WebHcat 'file' parameter, instead of 'execute' parameter - a. Create a pig script called pig-script.txt with the query below and put it HDFS /test/script rawInput = load '/test/data' using PigStorage as (SourceUrl:chararray, DestinationUrl:chararray, InstanceCount:int); readyInput = limit rawInput 10; store readyInput into '/test/Output' using PigStorage; b. 
Run the job via webHcat: curl.exe -d file=/test/script/pig_script.txt -d statusdir=/test/status http://localhost:50111/templeton/v1/pig?user.name=hadoop; --user hadoop:any 4. Also, WebHcat 'execute' option works if we don't define the schema in the Pig query, something like this- curl.exe -d execute=rawInput = load '/test/data' using PigStorage; readyInput = limit rawInput 10; store readyInput into '/test/output' using PigStorage; -d statusdir=/test/status http://localhost:50111/templeton/v1/pig?user.name=hadoop; --user hadoop:any Ask is- WebHcat 'execute' option should work for Pig query with schema defined - it appears to be a parsing issue with WebHcat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11030) Enhance storage layer to create one delta file per write
[ https://issues.apache.org/jira/browse/HIVE-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598314#comment-14598314 ] Eugene Koifman commented on HIVE-11030: --- The test failure is not related. The same failure appears in other runs w/o this patch. [~alangates] could you review please? Enhance storage layer to create one delta file per write Key: HIVE-11030 URL: https://issues.apache.org/jira/browse/HIVE-11030 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11030.2.patch, HIVE-11030.3.patch Currently each txn using ACID insert/update/delete will generate a delta directory like delta_100_101. In order to support multi-statement transactions we must generate one delta per operation within the transaction so the deltas would be named like delta_100_101_0001, etc. Support for MERGE (HIVE-10924) would need the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598351#comment-14598351 ] Xuefu Zhang commented on HIVE-10996: Okay. Thanks for the explanation. Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression. The following query (Q1) produces no results: {code} select s from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; {code} While this one (Q2) does produce results : {code} select * from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; 1 21 20 Bob 1234 1 31 30 Bob 1234 3 51 50 Jeff1234 {code} The setup to test 
this is: {code} create table purchase_history (s string, product string, price double, timestamp int); insert into purchase_history values ('1', 'Belt', 20.00, 21); insert into purchase_history values ('1', 'Socks', 3.50, 31); insert into purchase_history values ('3', 'Belt', 20.00, 51); insert into purchase_history values ('4', 'Shirt', 15.50, 59); create table cart_history (s string, cart_id int, timestamp int); insert into cart_history values ('1', 1, 10); insert into cart_history values ('1', 2, 20); insert into cart_history values ('1', 3, 30); insert into cart_history values ('1', 4, 40); insert into cart_history values ('3', 5, 50); insert into cart_history values ('4', 6, 60); create table events (s string, st2 string, n int, timestamp int); insert into events values ('1', 'Bob', 1234, 20); insert into events values ('1', 'Bob', 1234, 30); insert into events values ('1', 'Bob', 1234, 25); insert into events values ('2', 'Sam', 1234, 30); insert into events values ('3', 'Jeff', 1234, 50); insert into events values ('4', 'Ted', 1234, 60); {code} I realize select * and select s are not all that interesting in this context but what lead us to this issue was select count(distinct s) was not returning results. The above queries are the simplified queries that produce the issue. I will note that if I convert the inner join to a table and select from that the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
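As a cross-check of what Q1 in HIVE-10996 should return, the setup from the issue can be replayed on a different engine. The sketch below uses SQLite from Python's standard library as a stand-in, not Hive, so it says nothing about the Hive optimizer itself. Two things are assumptions: the comparison operator in the where clause is missing from the issue text as archived, and `>` is used here because it reproduces the three Q2 rows listed in the issue; the inner aliases `last` and `action` are renamed to `lst` and `act` only to stay clear of SQLite keywords.

```python
import sqlite3

# Setup statements copied from the issue description (types mapped to SQLite).
SETUP = """
create table purchase_history (s text, product text, price real, timestamp int);
insert into purchase_history values ('1', 'Belt', 20.00, 21);
insert into purchase_history values ('1', 'Socks', 3.50, 31);
insert into purchase_history values ('3', 'Belt', 20.00, 51);
insert into purchase_history values ('4', 'Shirt', 15.50, 59);
create table cart_history (s text, cart_id int, timestamp int);
insert into cart_history values ('1', 1, 10);
insert into cart_history values ('1', 2, 20);
insert into cart_history values ('1', 3, 30);
insert into cart_history values ('1', 4, 40);
insert into cart_history values ('3', 5, 50);
insert into cart_history values ('4', 6, 60);
create table events (s text, st2 text, n int, timestamp int);
insert into events values ('1', 'Bob', 1234, 20);
insert into events values ('1', 'Bob', 1234, 30);
insert into events values ('1', 'Bob', 1234, 25);
insert into events values ('2', 'Sam', 1234, 30);
insert into events values ('3', 'Jeff', 1234, 50);
insert into events values ('4', 'Ted', 1234, 60);
"""

# Q1 and Q2 differ only in the outer projection.
# ASSUMPTION: '>' stands in for the operator lost from the archived text.
QUERY = """
select {projection} from (
  select lst.s as s, lst.timestamp as timestamp,
         lst.last_stage_timestamp as last_stage_timestamp,
         act.st2 as st2, act.n as n
  from (
    select purchase.s as s, purchase.timestamp as timestamp,
           max(mevt.timestamp) as last_stage_timestamp
    from (select * from purchase_history) purchase
    join (select * from cart_history) mevt on purchase.s = mevt.s
    where purchase.timestamp > mevt.timestamp
    group by purchase.s, purchase.timestamp
  ) lst
  join (select * from events) act
    on lst.s = act.s and lst.last_stage_timestamp = act.timestamp
) list
"""


def run(projection):
    """Load the setup into a fresh in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = sorted(conn.execute(QUERY.format(projection=projection)).fetchall())
    conn.close()
    return rows


q1_rows = run("s")  # projection used by Q1
q2_rows = run("*")  # projection used by Q2
print(q1_rows)
print(q2_rows)
```

Under these assumptions both projections yield three rows on the stand-in engine, which is consistent with the report that an empty Q1 result is incorrect.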
[jira] [Commented] (HIVE-10790) orc file sql excute fail
[ https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598398#comment-14598398 ] Ashutosh Chauhan commented on HIVE-10790: - +1 orc file sql excute fail - Key: HIVE-10790 URL: https://issues.apache.org/jira/browse/HIVE-10790 Project: Hive Issue Type: Bug Components: API Affects Versions: 0.13.0, 0.14.0 Environment: Hadoop 2.5.0-cdh5.3.2 hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Labels: patch Fix For: 0.14.1 Attachments: HIVE-10790.0.patch.txt Inserting from a text table into an ORC table, for example {code:sql} insert overwrite table custom.rank_less_orc_none partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text where logdate='2015051500'; {code} throws an error: {noformat} Error: java.lang.RuntimeException: Hive Runtime Error while closing operators at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767) at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040) at
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105) at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227) ... 8 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-10983) LazySimpleSerDe bug ,when Text is reused
[ https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaowei wang reopened HIVE-10983: - LazySimpleSerDe bug ,when Text is reused -- Key: HIVE-10983 URL: https://issues.apache.org/jira/browse/HIVE-10983 Project: Hive Issue Type: Bug Components: API, CLI Affects Versions: 0.14.0 Environment: Hadoop 2.3.0-cdh5.0.0 Hive 0.14 Reporter: xiaowei wang Assignee: xiaowei wang Priority: Critical Labels: patch Fix For: 0.14.1, 1.2.0 Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt When I query data from an LZO table, I find that in the results the current row is always longer than the previous row, and sometimes the current row contains the contents of the previous row. For example, I execute the query select * from web_searchhub where logdate=2015061003; the result is shown below. Notice that the second row contains content from the first row. INFO [03:00:05.589] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003 INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 session=901,thread=223ession=3151,thread=254 2015061003 The content of the original LZO file is shown below; it has just 2 rows. INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb session=3148,thread=285 INFO [03:00:05.635] HttpFrontServer::FrontSH msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285 I think this error is caused by the Text reuse, and I have found a solution. Additionally, the table creation SQL is: CREATE EXTERNAL TABLE `web_searchhub`( `line` string) PARTITIONED BY ( `logdate` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\U' WITH SERDEPROPERTIES ( 'serialization.encoding'='GBK') STORED AS INPUTFORMAT com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat; LOCATION 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
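The symptom reported in HIVE-10983 (a row that ends with the tail of the previous, longer row) is what one would expect when a reused buffer is decoded past its valid length. The following is a toy model of that failure mode and not Hive or Hadoop code: `ReusableBuffer`, `decode_buggy` and `decode_correct` are hypothetical names, and whether the actual patch fixes the problem this way is not confirmed by the issue text.

```python
class ReusableBuffer:
    """Toy stand-in for a reused text object: a backing array plus a valid length."""

    def __init__(self):
        self.data = bytearray()
        self.length = 0

    def set(self, payload: bytes) -> None:
        # Grow only when needed; otherwise overwrite in place, which leaves
        # bytes from the previous, longer payload behind the valid region.
        if len(payload) > len(self.data):
            self.data = bytearray(payload)
        else:
            self.data[: len(payload)] = payload
        self.length = len(payload)


def decode_buggy(buf: ReusableBuffer) -> str:
    # Decodes the whole backing array and ignores the valid length.
    return bytes(buf.data).decode("utf-8")


def decode_correct(buf: ReusableBuffer) -> str:
    # Decodes only the first `length` bytes.
    return bytes(buf.data[: buf.length]).decode("utf-8")


first = b"first row: session=3151,thread=254"
second = b"second row"

buf = ReusableBuffer()
buf.set(first)
buf.set(second)  # shorter record written into the same buffer

buggy = decode_buggy(buf)
correct = decode_correct(buf)
print(repr(buggy))
print(repr(correct))
```

In this model the buggy decode returns the second record followed by leftover bytes of the first, while honoring the stored length returns only the second record.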
[jira] [Commented] (HIVE-11043) ORC split strategies should adapt based on number of files
[ https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598799#comment-14598799 ] Pengcheng Xiong commented on HIVE-11043: [~prasanth_j] and [~gopalv], [~jpullokkaran] asked me to track the test cases that have recently been failing consistently on master, which led me here. It seems that this patch causes the problem. At first sight, authorization_delete.q sounds unrelated. However, it includes creating a table stored as ORC. If I revert this patch, the test cases pass. Could you guys take a look? Thanks. ORC split strategies should adapt based on number of files -- Key: HIVE-11043 URL: https://issues.apache.org/jira/browse/HIVE-11043 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Gopal V Fix For: 2.0.0 Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch ORC split strategies added in HIVE-10114 chose strategies based on average file size. It would be beneficial to choose a different strategy based on number of files as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joinned table (tez map join only)
[ https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598804#comment-14598804 ] Greg Senia commented on HIVE-10729: --- Gunther Hagleitner and Matt Mcline Using this Patch against my JIRA HIVE-11051 and the test case on Hadoop 2.4.1 with Hive 1.2.0 and Tez 0.5.4 it still fails: Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002246948195,svcrqst_id:003629537980,svcrqst_crt_dts:2015-04-24 12:48:37.859683,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-24 12:48:37.859683,crsr_lupdt:null,cntevsds_lupdt:2015-04-24 12:48:40.499238,ignore_me:1,notes:null} at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148) ... 
13 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {cnctevn_id:002246948195,svcrqst_id:003629537980,svcrqst_crt_dts:2015-04-24 12:48:37.859683,subject_seq_no:1,plan_component:HMOM1 ,cust_segment:RM ,cnctyp_cd:001,cnctmd_cd:D02,cnctevs_cd:007,svcrtyp_cd:335,svrstyp_cd:088,cmpltyp_cd: ,catsrsn_cd:,apealvl_cd: ,cnstnty_cd:001,svcrqst_asrqst_ind:Y,svcrqst_rtnorig_in:N,svcrqst_vwasof_dt:null,sum_reason_cd:98,sum_reason:Exclude,crsr_master_claim_index:null,svcrqct_cds:[ ],svcrqst_lupdt:2015-04-24 12:48:37.859683,crsr_lupdt:null,cntevsds_lupdt:2015-04-24 12:48:40.499238,ignore_me:1,notes:null} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83) ... 16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: Index: 0, Size: 0 at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:426) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:122) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508) ... 
17 more Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.set(ArrayList.java:426) at org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer.fixupComplexObjects(MapJoinBytesTableContainer.java:424) at org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$ReusableRowContainer.uppack(HybridHashTableContainer.java:875) at org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$ReusableRowContainer.first(HybridHashTableContainer.java:845) at org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$ReusableRowContainer.first(HybridHashTableContainer.java:722) at org.apache.hadoop.hive.ql.exec.persistence.UnwrapRowContainer.first(UnwrapRowContainer.java:62) at org.apache.hadoop.hive.ql.exec.persistence.UnwrapRowContainer.first(UnwrapRowContainer.java:33) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:650) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:756) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:414) ... 23 more ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1434641270368_13820_2_01 [Map 2] killed/failed due to:null]DAG failed due to vertex failure. failedVertices:1 killedVertices:0 Query failed when select complex columns from joinned table (tez map join only) --- Key: HIVE-10729 URL: https://issues.apache.org/jira/browse/HIVE-10729 Project: Hive Issue Type: Bug Components:
[jira] [Commented] (HIVE-11089) Hive Streaming: connection fails when using a proxy user UGI
[ https://issues.apache.org/jira/browse/HIVE-11089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598822#comment-14598822 ] Adam Kunicki commented on HIVE-11089: - Additionally, it seems that the Hive docs https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest reflect the HiveEndPoint API prior to HIVE-8427 which does allow for specifying a proxy user. Hive Streaming: connection fails when using a proxy user UGI Key: HIVE-11089 URL: https://issues.apache.org/jira/browse/HIVE-11089 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0, 1.0.0, 1.2.0 Reporter: Adam Kunicki Labels: ACID, Streaming HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the connection is supposed to be a secure connection. This however breaks support for Proxy Users as a proxy user UGI will always return false to hasKerberosCredentials(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences
[ https://issues.apache.org/jira/browse/HIVE-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598894#comment-14598894 ] Hive QA commented on HIVE-11079: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12741433/HIVE-11079.5.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9019 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4360/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4360/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4360/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12741433 - PreCommit-HIVE-TRUNK-Build Fix qfile tests that fail on Windows due to CR/character escape differences --- Key: HIVE-11079 URL: https://issues.apache.org/jira/browse/HIVE-11079 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11079.1.patch, HIVE-11079.2.patch, HIVE-11079.3.patch, HIVE-11079.4.patch, HIVE-11079.5.patch A few qfile tests are failing on Windows due to a couple of windows-specific issues: - The table comment for the test includes a CR character, which is different on Windows compared to Unix. - The partition path in the test includes a space character. Unlike Unix, on Windows space characters in Hive paths are escaped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11043) ORC split strategies should adapt based on number of files
[ https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598893#comment-14598893 ] Gopal V commented on HIVE-11043: [~prasanth_j]: sure, looks like errors when reading footers for the 1 file/1 split case. The error is actually {code} Caused by: java.lang.IndexOutOfBoundsException: Index: 0 at java.util.Collections$EmptyList.get(Collections.java:3212) at org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getColumnIndicesFromNames(ReaderImpl.java:651) at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getRawDataSizeOfColumns(ReaderImpl.java:634) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:938) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:847) at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:713) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {code} ORC split strategies should adapt based on number of files -- Key: HIVE-11043 URL: https://issues.apache.org/jira/browse/HIVE-11043 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Gopal V Fix For: 2.0.0 Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch ORC split strategies added in HIVE-10114 chose strategies based on average file size. It would be beneficial to choose a different strategy based on number of files as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10844) Combine equivalent Works for HoS[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-10844: - Attachment: HIVE-10844.3-spark.patch Combine equivalent Works for HoS[Spark Branch] -- Key: HIVE-10844 URL: https://issues.apache.org/jira/browse/HIVE-10844 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-10844.1-spark.patch, HIVE-10844.2-spark.patch, HIVE-10844.3-spark.patch Some Hive queries (like [TPCDS Q39|https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query39.sql]) may share the same subquery, which is translated into separate but equivalent Works in SparkWork. Combining these equivalent Works into a single one would help to benefit from the following dynamic RDD caching optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11043) ORC split strategies should adapt based on number of files
[ https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597289#comment-14597289 ] Hive QA commented on HIVE-11043: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12741166/HIVE-11043.2.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 9014 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete_own_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update_own_table org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join4 org.apache.hive.hcatalog.pig.TestHCatStorer.testEmptyStore[3] {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4345/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4345/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4345/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12741166 - PreCommit-HIVE-TRUNK-Build ORC split strategies should adapt based on number of files -- Key: HIVE-11043 URL: https://issues.apache.org/jira/browse/HIVE-11043 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Gopal V Fix For: 2.0.0 Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch ORC split strategies added in HIVE-10114 chose strategies based on average file size. It would be beneficial to choose a different strategy based on number of files as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11043) ORC split strategies should adapt based on number of files
[ https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597295#comment-14597295 ] Prasanth Jayachandran commented on HIVE-11043: -- LGTM, +1. I don't think the test failures are related. ORC split strategies should adapt based on number of files -- Key: HIVE-11043 URL: https://issues.apache.org/jira/browse/HIVE-11043 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Gopal V Fix For: 2.0.0 Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch ORC split strategies added in HIVE-10114 chose strategies based on average file size. It would be beneficial to choose a different strategy based on number of files as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10438) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597389#comment-14597389 ] Hive QA commented on HIVE-10438: {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12741206/HIVE-10438.patch {color:green}SUCCESS:{color} +1 9013 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4346/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4346/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4346/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12741206 - PreCommit-HIVE-TRUNK-Build Architecture for ResultSet Compression via external plugin --- Key: HIVE-10438 URL: https://issues.apache.org/jira/browse/HIVE-10438 Project: Hive Issue Type: New Feature Components: Hive, Thrift API Affects Versions: 1.2.0 Reporter: Rohit Dholakia Assignee: Rohit Dholakia Labels: patch Attachments: HIVE-10438.patch, Proposal-rscompressor.pdf, Results_Snappy_protobuf_TBinary_TCompact.pdf, hs2driver-master.zip, hs2resultSetcompressor.zip, readme.txt This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. 
A container to allow everyone to write and test ResultSet compressors with a query submitter (https://github.com/xiaom/hs2driver) Also attaching a design document explaining the changes, experimental results document, and a pdf explaining how to setup the docker container to observe end-to-end functionality of ResultSet compression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences
[ https://issues.apache.org/jira/browse/HIVE-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597469#comment-14597469 ] Hive QA commented on HIVE-11079: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12741205/HIVE-11079.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9018 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partitioned_date_time {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4347/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4347/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4347/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12741205 - PreCommit-HIVE-TRUNK-Build Fix qfile tests that fail on Windows due to CR/character escape differences --- Key: HIVE-11079 URL: https://issues.apache.org/jira/browse/HIVE-11079 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11079.1.patch A few qfile tests are failing on Windows due to a couple of windows-specific issues: - The table comment for the test includes a CR character, which is different on Windows compared to Unix. - The partition path in the test includes a space character. Unlike Unix, on Windows space characters in Hive paths are escaped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HIVE-11043) ORC split strategies should adapt based on number of files
[ https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reopened HIVE-11043: -- [~pxiong] My bad. From the looks of it, the test failures seemed unrelated. I reverted the patch on branch-1 and master. [~gopalv] Can you look at the test failures? ORC split strategies should adapt based on number of files -- Key: HIVE-11043 URL: https://issues.apache.org/jira/browse/HIVE-11043 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Gopal V Fix For: 2.0.0 Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch ORC split strategies added in HIVE-10114 chose strategies based on average file size. It would be beneficial to choose a different strategy based on number of files as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-11043) ORC split strategies should adapt based on number of files
[ https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598829#comment-14598829 ] Prasanth Jayachandran edited comment on HIVE-11043 at 6/24/15 3:54 AM: --- [~pxiong] My bad. From the looks of it, the test failures seemed unrelated. I reverted the patch on branch-1 and master. [~gopalv] Can you look at the test failures? was (Author: prasanth_j): [~pxiong] My bad. From the looks of the the test failures seemed unrelated. I reverted the patch on branch-1 and master. [~gopalv] Can you look at the test failures? ORC split strategies should adapt based on number of files -- Key: HIVE-11043 URL: https://issues.apache.org/jira/browse/HIVE-11043 Project: Hive Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanth Jayachandran Assignee: Gopal V Fix For: 2.0.0 Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch ORC split strategies added in HIVE-10114 chose strategies based on average file size. It would be beneficial to choose a different strategy based on number of files as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11089) Hive Streaming: connection fails when using a proxy user UGI
[ https://issues.apache.org/jira/browse/HIVE-11089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kunicki updated HIVE-11089: Description: HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the connection is supposed to be a secure connection. This however breaks support for Proxy Users as a proxy user UGI will always return false to hasKerberosCredentials(). If the goal is to determine whether this is a secure cluster, we could instead call: {code} this.secureMode = ugi == null ? false : ugi.getRealAuthenticationMethod() != SIMPLE; {code} This change would work for both proxy users and real users. See lines 273, 274 of HiveEndPoint.java {code} this.secureMode = ugi==null ? false : ugi.hasKerberosCredentials(); this.msClient = getMetaStoreClient(endPoint, conf, secureMode); {code} for reference: https://github.com/apache/hive/commit/8e423a12db47759196c24535fbc32236b79f464a was: HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the connection is supposed to be a secure connection. This however breaks support for Proxy Users as a proxy user UGI will always return false to hasKerberosCredentials(). Hive Streaming: connection fails when using a proxy user UGI Key: HIVE-11089 URL: https://issues.apache.org/jira/browse/HIVE-11089 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0, 1.0.0, 1.2.0 Reporter: Adam Kunicki Labels: ACID, Streaming HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the connection is supposed to be a secure connection. This however breaks support for Proxy Users as a proxy user UGI will always return false to hasKerberosCredentials(). If the goal is to determine whether this is a secure cluster, we could instead call: {code} this.secureMode = ugi == null ? false : ugi.getRealAuthenticationMethod() != SIMPLE; {code} This change would work for both proxy users and real users. See lines 273, 274 of HiveEndPoint.java {code} this.secureMode = ugi==null ? 
false : ugi.hasKerberosCredentials(); this.msClient = getMetaStoreClient(endPoint, conf, secureMode); {code} for reference: https://github.com/apache/hive/commit/8e423a12db47759196c24535fbc32236b79f464a -- This message was sent by Atlassian JIRA (v6.3.4#6332)
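The difference between the two checks discussed in this description can be sketched in isolation. This is only an illustration under stated assumptions: `UgiLike`, `AuthMethod`, and `SecureModeSketch` are hypothetical stand-ins written for this example, not Hadoop's UserGroupInformation, and the proxy object simply encodes the behaviour the description reports (no Kerberos credentials of its own, a real user who authenticated with Kerberos).

```java
// Hypothetical stand-ins for the two UGI calls named in the description.
// These are NOT the Hadoop classes; they only model the reported behaviour.
enum AuthMethod { SIMPLE, KERBEROS }

interface UgiLike {
    boolean hasKerberosCredentials();
    AuthMethod getRealAuthenticationMethod();
}

final class SecureModeSketch {
    // Mirrors the existing check: false for a null UGI, otherwise whatever
    // hasKerberosCredentials() reports.
    static boolean currentCheck(UgiLike ugi) {
        return ugi != null && ugi.hasKerberosCredentials();
    }

    // Mirrors the proposed check: false for a null UGI, otherwise true
    // whenever the real user's authentication method is not SIMPLE.
    static boolean proposedCheck(UgiLike ugi) {
        return ugi != null && ugi.getRealAuthenticationMethod() != AuthMethod.SIMPLE;
    }

    // Models a proxy user as described in the issue: it holds no Kerberos
    // credentials itself, but its real user logged in via Kerberos.
    static UgiLike proxyOverKerberos() {
        return new UgiLike() {
            @Override public boolean hasKerberosCredentials() { return false; }
            @Override public AuthMethod getRealAuthenticationMethod() { return AuthMethod.KERBEROS; }
        };
    }
}
```

With this model, `currentCheck` treats the proxy user as insecure while `proposedCheck` treats it as secure, and both return false for a null UGI. Whether the real Hadoop calls behave this way in every deployment is not something this sketch can confirm.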
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598838#comment-14598838 ] Hive QA commented on HIVE-10895: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12741385/HIVE-10895.2.patch {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 9020 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete_own_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update_own_table org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join4 org.apache.hive.hcatalog.pig.TestHCatStorer.testEmptyStore[3] {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4359/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4359/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4359/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12741385 - PreCommit-HIVE-TRUNK-Build ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Aihua Xu Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
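The kind of leak described in this issue, a query handle that is never closed and so keeps a database cursor open, can be illustrated with a small sketch. It makes no claim about what the attached patches change: `FakeQuery` and `QueryCloseSketch` are hypothetical names, and the cursor counter is a stand-in for a resource held by the metastore database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a query handle that holds one database cursor
// from construction until close() is called.
final class FakeQuery implements AutoCloseable {
    static int openCursors = 0;

    FakeQuery() { openCursors++; }

    List<String> execute() {
        List<String> rows = new ArrayList<>();
        rows.add("row");
        return rows;
    }

    @Override public void close() { openCursors--; }
}

final class QueryCloseSketch {
    // Leaky pattern: the handle is dropped without close(), so the cursor
    // it holds is never released.
    static List<String> leaky() {
        return new FakeQuery().execute();
    }

    // Safe pattern: copy the results out, then let try-with-resources
    // close the handle even if execute() throws.
    static List<String> closed() {
        try (FakeQuery q = new FakeQuery()) {
            return new ArrayList<>(q.execute());
        }
    }
}
```

Copying the rows before the handle closes matters when a result is backed by the open query, which is an assumption of this sketch rather than something stated in the issue.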
[jira] [Updated] (HIVE-11089) Hive Streaming: connection fails when using a proxy user UGI
[ https://issues.apache.org/jira/browse/HIVE-11089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kunicki updated HIVE-11089: Description: HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the connection is supposed to be a secure connection. This however breaks support for Proxy Users as a proxy user UGI will always return false to hasKerberosCredentials(). See lines 273, 274 of HiveEndPoint.java {code} this.secureMode = ugi==null ? false : ugi.hasKerberosCredentials(); this.msClient = getMetaStoreClient(endPoint, conf, secureMode); {code} It also seems that between 13.1 and 0.14 the newConnection() method that includes a proxy user has been removed. for reference: https://github.com/apache/hive/commit/8e423a12db47759196c24535fbc32236b79f464a was: HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the connection is supposed to be a secure connection. This however breaks support for Proxy Users as a proxy user UGI will always return false to hasKerberosCredentials(). If the goal is to determine whether this is a secure cluster, we could instead call: {code} this.secureMode = ugi == null ? false : ugi.getRealAuthenticationMethod() != SIMPLE; {code} This change would work for both proxy users and real users. See lines 273, 274 of HiveEndPoint.java {code} this.secureMode = ugi==null ? false : ugi.hasKerberosCredentials(); this.msClient = getMetaStoreClient(endPoint, conf, secureMode); {code} for reference: https://github.com/apache/hive/commit/8e423a12db47759196c24535fbc32236b79f464a Hive Streaming: connection fails when using a proxy user UGI Key: HIVE-11089 URL: https://issues.apache.org/jira/browse/HIVE-11089 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0, 1.0.0, 1.2.0 Reporter: Adam Kunicki Labels: ACID, Streaming HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the connection is supposed to be a secure connection. 
This however breaks support for Proxy Users as a proxy user UGI will always return false to hasKerberosCredentials(). See lines 273, 274 of HiveEndPoint.java {code} this.secureMode = ugi==null ? false : ugi.hasKerberosCredentials(); this.msClient = getMetaStoreClient(endPoint, conf, secureMode); {code} It also seems that between 13.1 and 0.14 the newConnection() method that includes a proxy user has been removed. for reference: https://github.com/apache/hive/commit/8e423a12db47759196c24535fbc32236b79f464a -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11089) Hive Streaming: connection fails when using a proxy user UGI
[ https://issues.apache.org/jira/browse/HIVE-11089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kunicki updated HIVE-11089: Description: HIVE-7508 Add Kerberos Support seems to also remove the ability to specify a proxy user. HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the connection is supposed to be a secure connection. This however breaks support for Proxy Users as a proxy user UGI will always return false to hasKerberosCredentials(). See lines 273, 274 of HiveEndPoint.java {code} this.secureMode = ugi==null ? false : ugi.hasKerberosCredentials(); this.msClient = getMetaStoreClient(endPoint, conf, secureMode); {code} It also seems that between 13.1 and 0.14 the newConnection() method that includes a proxy user has been removed. for reference: https://github.com/apache/hive/commit/8e423a12db47759196c24535fbc32236b79f464a was: HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the connection is supposed to be a secure connection. This however breaks support for Proxy Users as a proxy user UGI will always return false to hasKerberosCredentials(). See lines 273, 274 of HiveEndPoint.java {code} this.secureMode = ugi==null ? false : ugi.hasKerberosCredentials(); this.msClient = getMetaStoreClient(endPoint, conf, secureMode); {code} It also seems that between 13.1 and 0.14 the newConnection() method that includes a proxy user has been removed. for reference: https://github.com/apache/hive/commit/8e423a12db47759196c24535fbc32236b79f464a Hive Streaming: connection fails when using a proxy user UGI Key: HIVE-11089 URL: https://issues.apache.org/jira/browse/HIVE-11089 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0, 1.0.0, 1.2.0 Reporter: Adam Kunicki Labels: ACID, Streaming HIVE-7508 Add Kerberos Support seems to also remove the ability to specify a proxy user. 
HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the connection is supposed to be a secure connection. This however breaks support for Proxy Users as a proxy user UGI will always return false to hasKerberosCredentials(). See lines 273, 274 of HiveEndPoint.java {code} this.secureMode = ugi==null ? false : ugi.hasKerberosCredentials(); this.msClient = getMetaStoreClient(endPoint, conf, secureMode); {code} It also seems that between 13.1 and 0.14 the newConnection() method that includes a proxy user has been removed. for reference: https://github.com/apache/hive/commit/8e423a12db47759196c24535fbc32236b79f464a -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences
[ https://issues.apache.org/jira/browse/HIVE-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-11079: -- Attachment: HIVE-11079.4.patch Actually it looks like a couple of the test fixes are not necessary - I had 2 different build environments, and one of them had git core.autocrlf=true, which caused the files to have Windows-style CRLF line endings, which affects some tests. In patch v4 I've removed the changes for decimal_udf2.q and describe_comment_indent.q. Fix qfile tests that fail on Windows due to CR/character escape differences --- Key: HIVE-11079 URL: https://issues.apache.org/jira/browse/HIVE-11079 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11079.1.patch, HIVE-11079.2.patch, HIVE-11079.3.patch, HIVE-11079.4.patch A few qfile tests are failing on Windows due to a couple of windows-specific issues: - The table comment for the test includes a CR character, which is different on Windows compared to Unix. - The partition path in the test includes a space character. Unlike Unix, on Windows space characters in Hive paths are escaped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
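To show why a checkout with CRLF line endings can make an output comparison fail even when the content is otherwise identical, here is a minimal sketch. It is not taken from any of the HIVE-11079 patches: `LineEndingSketch` and its methods are hypothetical names, and collapsing only CRLF pairs (while keeping a lone CR untouched) is a choice made for this example.

```java
// Hypothetical helper illustrating a line-ending-insensitive comparison.
final class LineEndingSketch {
    // Collapse Windows-style CRLF pairs to LF. A lone CR is kept as-is,
    // since a test may include one on purpose (for example in a comment).
    static String normalize(String text) {
        return text.replace("\r\n", "\n");
    }

    // Compare expected and actual output after normalizing both sides.
    static boolean sameIgnoringCrlf(String expected, String actual) {
        return normalize(expected).equals(normalize(actual));
    }
}
```

A plain `equals` on "key\r\nvalue" and "key\nvalue" is false, while `sameIgnoringCrlf` reports them as equal. Output that differs only by a lone CR still compares as different.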
[jira] [Updated] (HIVE-11053) Add more tests for HIVE-10844[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-11053: - Assignee: GAOLUN Add more tests for HIVE-10844[Spark Branch] --- Key: HIVE-11053 URL: https://issues.apache.org/jira/browse/HIVE-11053 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: GAOLUN Priority: Minor Add some test cases for self union, self-join, CWE, and repeated sub-queries to verify the job of combining equivalent works in HIVE-10844. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598755#comment-14598755 ] Chengxiang Li commented on HIVE-10999: -- It seems the latest uploaded patch passes all the tests, except org.apache.hadoop.hive.cli.TestCliDriver.initializationError. :) Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch, HIVE-10999.2-spark.patch, HIVE-10999.3-spark.patch, HIVE-10999.3-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11053) Add more tests for HIVE-10844[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-11053: - Assignee: (was: GAOLUN) Add more tests for HIVE-10844[Spark Branch] --- Key: HIVE-11053 URL: https://issues.apache.org/jira/browse/HIVE-11053 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Priority: Minor Add some test cases for self union, self-join, CWE, and repeated sub-queries to verify the job of combining equivalent works in HIVE-10844. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences
[ https://issues.apache.org/jira/browse/HIVE-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-11079: -- Attachment: HIVE-11079.5.patch Patch needed rebase after HIVE-11037 Fix qfile tests that fail on Windows due to CR/character escape differences --- Key: HIVE-11079 URL: https://issues.apache.org/jira/browse/HIVE-11079 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11079.1.patch, HIVE-11079.2.patch, HIVE-11079.3.patch, HIVE-11079.4.patch, HIVE-11079.5.patch A few qfile tests are failing on Windows due to a couple of windows-specific issues: - The table comment for the test includes a CR character, which is different on Windows compared to Unix. - The partition path in the test includes a space character. Unlike Unix, on Windows space characters in Hive paths are escaped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598760#comment-14598760 ] Hive QA commented on HIVE-10233: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12741403/HIVE-10233.13.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 9016 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete_own_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update_own_table org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join4 org.apache.hive.hcatalog.pig.TestHCatStorer.testEmptyStore[3] {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4358/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4358/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4358/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12741403 - PreCommit-HIVE-TRUNK-Build Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598766#comment-14598766 ] Xuefu Zhang commented on HIVE-10999: [~chengxiang li], [~spena] is aware of the problem and investigating it. In the meantime, please feel free to move this JIRA forward, ignoring that failure. Thanks. Upgrade Spark dependency to 1.4 [Spark Branch] -- Key: HIVE-10999 URL: https://issues.apache.org/jira/browse/HIVE-10999 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-10999.1-spark.patch, HIVE-10999.2-spark.patch, HIVE-10999.3-spark.patch, HIVE-10999.3-spark.patch Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 1.4.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10553) Remove hardcoded Parquet references from SearchArgumentImpl
[ https://issues.apache.org/jira/browse/HIVE-10553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598665#comment-14598665 ] Hive QA commented on HIVE-10553: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12741348/HIVE-10553.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 9016 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete_own_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update_own_table org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join4 org.apache.hive.hcatalog.pig.TestHCatStorer.testEmptyStore[3] {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4356/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4356/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4356/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12741348 - PreCommit-HIVE-TRUNK-Build Remove hardcoded Parquet references from SearchArgumentImpl --- Key: HIVE-10553 URL: https://issues.apache.org/jira/browse/HIVE-10553 Project: Hive Issue Type: Sub-task Reporter: Gopal V Assignee: Owen O'Malley Attachments: HIVE-10553.patch, HIVE-10553.patch, HIVE-10553.patch SARGs currently depend on Parquet code, which causes a tight coupling between parquet releases and storage-api versions. Move Parquet code out to its own RecordReader, similar to ORC's SargApplier implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11037) HiveOnTez: make explain user level = true as default
[ https://issues.apache.org/jira/browse/HIVE-11037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598668#comment-14598668 ] Hive QA commented on HIVE-11037: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12741353/HIVE-11037.08.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4357/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4357/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4357/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-4357/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + 
[[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! -d apache-github-source-source ]] + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 55c6d41 HIVE-10790 : orc write on viewFS throws exception (Xioawei Wang via Ashutosh Chauhan) + git clean -f -d Removing serde/src/java/org/apache/hadoop/hive/ql/io/sarg/ExpressionTree.java + git checkout master Already on 'master' + git reset --hard origin/master HEAD is now at 55c6d41 HIVE-10790 : orc write on viewFS throws exception (Xioawei Wang via Ashutosh Chauhan) + git merge --ff-only origin/master Already up-to-date. + git gc + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12741353 - PreCommit-HIVE-TRUNK-Build HiveOnTez: make explain user level = true as default Key: HIVE-11037 URL: https://issues.apache.org/jira/browse/HIVE-11037 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11037.01.patch, HIVE-11037.02.patch, HIVE-11037.03.patch, HIVE-11037.04.patch, HIVE-11037.05.patch, HIVE-11037.06.patch, HIVE-11037.07.patch, HIVE-11037.08.patch In Hive-9780, we introduced a new level of explain for hive on tez. We would like to make it running by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-10233: -- Attachment: HIVE-10233.14.patch fix indent in .14 Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch, HIVE-10233.13.patch, HIVE-10233.14.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences
[ https://issues.apache.org/jira/browse/HIVE-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598686#comment-14598686 ] Gunther Hagleitner commented on HIVE-11079: --- test failures are unrelated (failed on next run as well). Fix qfile tests that fail on Windows due to CR/character escape differences --- Key: HIVE-11079 URL: https://issues.apache.org/jira/browse/HIVE-11079 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11079.1.patch, HIVE-11079.2.patch, HIVE-11079.3.patch A few qfile tests are failing on Windows due to a couple of windows-specific issues: - The table comment for the test includes a CR character, which is different on Windows compared to Unix. - The partition path in the test includes a space character. Unlike Unix, on Windows space characters in Hive paths are escaped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences
[ https://issues.apache.org/jira/browse/HIVE-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598558#comment-14598558 ] Hive QA commented on HIVE-11079: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12741338/HIVE-11079.2.patch {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 9021 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_delete_own_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_update_own_table org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join4 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28 org.apache.hive.hcatalog.pig.TestHCatStorer.testEmptyStore[3] {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4355/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4355/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4355/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12741338 - PreCommit-HIVE-TRUNK-Build Fix qfile tests that fail on Windows due to CR/character escape differences --- Key: HIVE-11079 URL: https://issues.apache.org/jira/browse/HIVE-11079 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11079.1.patch, HIVE-11079.2.patch, HIVE-11079.3.patch A few qfile tests are failing on Windows due to a couple of windows-specific issues: - The table comment for the test includes a CR character, which is different on Windows compared to Unix. - The partition path in the test includes a space character. Unlike Unix, on Windows space characters in Hive paths are escaped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10233) Hive on tez: memory manager for grace hash join
[ https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-10233: -- Attachment: HIVE-10233.13.patch Fix for unions. Hive on tez: memory manager for grace hash join --- Key: HIVE-10233 URL: https://issues.apache.org/jira/browse/HIVE-10233 Project: Hive Issue Type: Bug Components: Tez Affects Versions: llap, 2.0.0 Reporter: Vikram Dixit K Assignee: Gunther Hagleitner Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch, HIVE-10233.08.patch, HIVE-10233.09.patch, HIVE-10233.10.patch, HIVE-10233.11.patch, HIVE-10233.12.patch, HIVE-10233.13.patch We need a memory manager in llap/tez to manage the usage of memory across threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11089) Hive Streaming: connection fails when using a proxy user UGI
[ https://issues.apache.org/jira/browse/HIVE-11089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598620#comment-14598620 ] Adam Kunicki commented on HIVE-11089: - HIVE-8427 introduces a change that causes incorrect behavior when using a proxy user with HiveEndPoint.newConnection(). Hive Streaming: connection fails when using a proxy user UGI Key: HIVE-11089 URL: https://issues.apache.org/jira/browse/HIVE-11089 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0, 1.0.0, 1.2.0 Reporter: Adam Kunicki Labels: ACID, Streaming HIVE-8427 adds a call to ugi.hasKerberosCredentials() to check whether the connection is supposed to be a secure connection. This, however, breaks support for proxy users, because a proxy user UGI always returns false from hasKerberosCredentials(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
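To illustrate the HIVE-11089 report, here is a minimal, self-contained sketch of why a check that only inspects the proxy UGI itself comes back false. The `Ugi` class below is a hypothetical stand-in written for this example and is not Hadoop's UserGroupInformation; the "look through to the real user" variant is only one possible approach and is not taken from any attached patch.

```java
final class ProxyUserCheckDemo {

    /** Hypothetical stand-in for a user identity; NOT the Hadoop class. */
    static final class Ugi {
        final String name;
        final boolean hasKerberosCredentials;
        final Ugi realUser; // null unless this identity is a proxy user

        Ugi(String name, boolean hasKerberosCredentials, Ugi realUser) {
            this.name = name;
            this.hasKerberosCredentials = hasKerberosCredentials;
            this.realUser = realUser;
        }
    }

    /** The kind of check described in the issue: inspects only the UGI itself. */
    static boolean naiveIsSecure(Ugi ugi) {
        return ugi.hasKerberosCredentials;
    }

    /** Assumed alternative: also consult the real user behind a proxy. */
    static boolean isSecureLookingThroughProxy(Ugi ugi) {
        return ugi.hasKerberosCredentials
                || (ugi.realUser != null && ugi.realUser.hasKerberosCredentials);
    }

    public static void main(String[] args) {
        Ugi service = new Ugi("svc", true, null);      // logged in via Kerberos
        Ugi proxy = new Ugi("alice", false, service);  // proxy user on top of it

        System.out.println("naive check on proxy:  " + naiveIsSecure(proxy));
        System.out.println("look-through on proxy: " + isSecureLookingThroughProxy(proxy));
    }
}
```

In this model the proxy identity carries no credentials of its own, so the naive check treats a secure connection as insecure, which matches the behavior the reporter describes.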
[jira] [Updated] (HIVE-11079) Fix qfile tests that fail on Windows due to CR/character escape differences
[ https://issues.apache.org/jira/browse/HIVE-11079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-11079: -- Attachment: HIVE-11079.3.patch Patch v3 - updating decimal_udf2.q as well, which has stats/file size differences on Windows due to CR differences on text files. Fixing the test by changing the table type to ORC rather than text, which should have more consistent data size between platforms. The actual stats values look different from before, but stats are not really important for this test. Fix qfile tests that fail on Windows due to CR/character escape differences --- Key: HIVE-11079 URL: https://issues.apache.org/jira/browse/HIVE-11079 Project: Hive Issue Type: Bug Components: Tests Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-11079.1.patch, HIVE-11079.2.patch, HIVE-11079.3.patch A few qfile tests are failing on Windows due to a couple of windows-specific issues: - The table comment for the test includes a CR character, which is different on Windows compared to Unix. - The partition path in the test includes a space character. Unlike Unix, on Windows space characters in Hive paths are escaped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
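The file-size difference mentioned for decimal_udf2.q can be sketched as follows. This is an illustrative example only: the row values and the `encodedSize` helper are made up for the demonstration and do not come from the test or the patch. It shows that the same rows written as text occupy one extra byte per row when lines end in CR LF instead of LF, which is enough to change size-based stats in expected output.

```java
import java.nio.charset.StandardCharsets;

final class LineEndingSizeDemo {

    /** Returns the UTF-8 byte size of the rows when each ends with lineSeparator. */
    static int encodedSize(String[] rows, String lineSeparator) {
        StringBuilder sb = new StringBuilder();
        for (String row : rows) {
            sb.append(row).append(lineSeparator);
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] rows = {"1.5", "2.25", "3"}; // hypothetical table contents

        int unixSize = encodedSize(rows, "\n");
        int windowsSize = encodedSize(rows, "\r\n");

        System.out.println("LF size:    " + unixSize);
        System.out.println("CR LF size: " + windowsSize);
        System.out.println("difference: " + (windowsSize - unixSize));
    }
}
```

A binary format such as ORC does not store line terminators for rows, which is consistent with the patch's choice to switch the table type.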
[jira] [Commented] (HIVE-10173) ThreadLocal synchronized initialvalue() is irrelevant in JDK7
[ https://issues.apache.org/jira/browse/HIVE-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598566#comment-14598566 ] Ashutosh Chauhan commented on HIVE-10173: - +1 LGTM ThreadLocal synchronized initialvalue() is irrelevant in JDK7 - Key: HIVE-10173 URL: https://issues.apache.org/jira/browse/HIVE-10173 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Ferdinand Xu Priority: Minor Attachments: HIVE-10173.patch The ThreadLocals need not synchronize the calls to initialValue(), since it is effectively called once per thread in JDK7. The anti-pattern lives on due to a very old JDK bug - https://bugs.openjdk.java.net/browse/JDK-6550283 {code} $ git grep --name-only -c protected.*synchronized.*initialValue common/src/java/org/apache/hadoop/hive/conf/LoopingByteArrayInputStream.java contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesInput.java contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesOutput.java contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesRecordInput.java contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesRecordOutput.java contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesWritableInput.java contrib/src/java/org/apache/hadoop/hive/contrib/util/typedbytes/TypedBytesWritableOutput.java metastore/src/java/org/apache/hadoop/hive/metastore/Deadline.java metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ql/src/java/org/apache/hadoop/hive/ql/session/OperationLog.java serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java 
serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java service/src/java/org/apache/hive/service/auth/TSetIpAddressProcessor.java service/src/java/org/apache/hive/service/cli/session/SessionManager.java shims/common/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge.java {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
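As a rough illustration of the HIVE-10173 point, the sketch below overrides initialValue() without the synchronized modifier and counts how often it runs. The class name, the `countInitCalls` helper, and the thread and call counts are assumptions made for this example and are not taken from the Hive code base or the patch. Each thread triggers initialValue() on its first get() only, so no two threads ever share one invocation's result.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ThreadLocalInitDemo {

    /** Runs `threads` workers that each call get() `getsPerThread` times; returns the initialValue() call count. */
    static int countInitCalls(int threads, int getsPerThread) {
        final AtomicInteger initCalls = new AtomicInteger();

        // Note: no 'synchronized' on initialValue(); each thread gets its own value.
        final ThreadLocal<StringBuilder> buffer = new ThreadLocal<StringBuilder>() {
            @Override
            protected StringBuilder initialValue() {
                initCalls.incrementAndGet(); // counted only to make the behavior observable
                return new StringBuilder();
            }
        };

        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < getsPerThread; j++) {
                    buffer.get().append('x'); // only the first get() per thread initializes
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return initCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("initialValue() calls: " + countInitCalls(4, 3));
    }
}
```

With four threads and three get() calls each, the counter ends at four, one initialization per thread. The counter itself is an AtomicInteger only because it is shared across threads for the demonstration; the per-thread StringBuilder needs no locking.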