[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614833#comment-14614833 ] Vaibhav Gumashta commented on HIVE-10895: - +1 on the patch. Tests will take a few hrs to report results. ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Aihua Xu Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614773#comment-14614773 ] Vaibhav Gumashta commented on HIVE-10895: - [~aihuaxu] I'm back to work. Will review and try your patch today. ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Aihua Xu Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
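For context, a minimal sketch of the close-in-finally pattern this fix revolves around, written against the javax.jdo API that ObjectStore uses; the JDOQL string, model class, and method names are illustrative, not the attached patch:
{code}
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import javax.jdo.PersistenceManager;
import javax.jdo.Query;

public class QueryCleanupSketch {
  /** Run a JDOQL query and always release the underlying db cursor. */
  public static List<String> listTableNames(PersistenceManager pm, String dbName) {
    Query query = null;
    try {
      query = pm.newQuery(
          "select tableName from org.apache.hadoop.hive.metastore.model.MTable"
              + " where database.name == dbName");
      query.declareParameters("java.lang.String dbName");
      @SuppressWarnings("unchecked")
      Collection<String> names = (Collection<String>) query.execute(dbName);
      // Copy the results out of the JDO-managed collection before closing,
      // so callers never touch a lazily loaded result after the cursor is gone.
      return new ArrayList<String>(names);
    } finally {
      if (query != null) {
        query.closeAll(); // releases the JDBC statement/cursor held by the query
      }
    }
  }
}
{code}
Leaving out the finally block is what lets each metastore call pin an open cursor on the backing database, which is consistent with the Oracle symptom described above.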
[jira] [Commented] (HIVE-11053) Add more tests for HIVE-10844[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614686#comment-14614686 ] Hive QA commented on HIVE-11053: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12743675/HIVE-11053.2-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 7993 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_rdd_cache org.apache.hive.hcatalog.streaming.TestStreaming.testAddPartition org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/921/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/921/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-921/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12743675 - PreCommit-HIVE-SPARK-Build Add more tests for HIVE-10844[Spark Branch] --- Key: HIVE-11053 URL: https://issues.apache.org/jira/browse/HIVE-11053 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chengxiang Li Assignee: GaoLun Priority: Minor Attachments: HIVE-11053.1-spark.patch, HIVE-11053.2-spark.patch Add some test cases for self union, self-join, CWE, and repeated sub-queries to verify the job of combining equivalent works in HIVE-10844. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10281) Update people page for the new committers
[ https://issues.apache.org/jira/browse/HIVE-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614960#comment-14614960 ] Jesus Camacho Rodriguez commented on HIVE-10281: +1, thanks [~Ferd]! Update people page for the new committers - Key: HIVE-10281 URL: https://issues.apache.org/jira/browse/HIVE-10281 Project: Hive Issue Type: Task Components: Website Reporter: Chao Sun Assignee: Ferdinand Xu Attachments: HIVE-10281.patch NO PRECOMMIT TESTS Add Jesus and Chinna as committers on the people page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10996: --- Affects Version/s: (was: 1.0.0) Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Fix For: 2.0.0, 1.2.2 Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression. The following query (Q1) produces no results: {code} select s from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; {code} While this one (Q2) does produce results : {code} select * from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; 1 21 20 Bob 1234 1 31 30 Bob 1234 3 51 50 Jeff1234 {code} The setup to test this is: {code} create table purchase_history (s string, product string, price double, timestamp int); insert into purchase_history values ('1', 'Belt', 20.00, 21); insert into purchase_history values ('1', 'Socks', 3.50, 31); insert into purchase_history values ('3', 'Belt', 20.00, 51); insert into purchase_history values ('4', 'Shirt', 15.50, 59); create table cart_history (s string, cart_id int, timestamp int); insert into cart_history values ('1', 1, 10); insert into cart_history values ('1', 2, 20); insert into cart_history values ('1', 3, 30); insert into cart_history values ('1', 4, 40); insert into cart_history values ('3', 5, 50); insert into cart_history values ('4', 6, 60); create table events (s string, st2 string, n int, timestamp int); insert into events values ('1', 'Bob', 1234, 20); insert into events values ('1', 'Bob', 1234, 30); insert into events values ('1', 'Bob', 1234, 25); insert into events values ('2', 'Sam', 1234, 30); insert into events values ('3', 'Jeff', 1234, 50); insert into events values ('4', 'Ted', 1234, 60); {code} I realize select * and select s are not all that interesting in this context but what lead us to this issue was select count(distinct s) was not returning results. The above queries are the simplified queries that produce the issue. I will note that if I convert the inner join to a table and select from that the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. 
This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
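Until a release with the fix is available, a small sketch of the workaround mentioned in the description (disabling hive.optimize.remove.identity.project for the session) over JDBC; the connection URL is a placeholder and the query reuses the tables from the repro above:
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class IdentityProjectWorkaround {
  public static void main(String[] args) throws Exception {
    // Requires the hive-jdbc driver on the classpath; adjust host/port/database as needed.
    try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement()) {
      // Turn off the identity-project removal optimization for this session only.
      stmt.execute("set hive.optimize.remove.identity.project=false");
      try (ResultSet rs = stmt.executeQuery("select count(distinct s) from purchase_history")) {
        while (rs.next()) {
          System.out.println(rs.getLong(1));
        }
      }
    }
  }
}
{code}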
[jira] [Commented] (HIVE-11129) Issue a warning when copied from UTF-8 to ISO 8859-1
[ https://issues.apache.org/jira/browse/HIVE-11129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615066#comment-14615066 ] Aihua Xu commented on HIVE-11129: - That should be the case. But it seems like the warning may be too restrictive, since converting between UTF-8, UTF-16 and UTF-32 should cause no loss. Let me handle that case. Issue a warning when copied from UTF-8 to ISO 8859-1 Key: HIVE-11129 URL: https://issues.apache.org/jira/browse/HIVE-11129 Project: Hive Issue Type: Bug Components: File Formats Reporter: Aihua Xu Assignee: Aihua Xu Fix For: 2.0.0 Attachments: HIVE-11129.patch Copying data from a table using UTF-8 encoding to one using ISO 8859-1 encoding causes data corruption without warning. {noformat} CREATE TABLE person_utf8 (name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='UTF8'); {noformat} Put the following data in the table: Müller,Thomas Jørgensen,Jørgen Vega,Andrés 中村,浩人 אביה,נועם {noformat} CREATE TABLE person_2 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1') AS select * from person_utf8; {noformat} expected to get mangled data but we should give a warning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
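To illustrate the distinction Aihua draws, a short sketch using the JDK's CharsetEncoder: it can tell whether a value survives the target encoding, which is why UTF-8 to UTF-16/UTF-32 copies are lossless while UTF-8 to ISO 8859-1 can mangle the sample rows above (plain JDK code, not the attached patch):
{code}
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodingLossCheck {
  public static void main(String[] args) {
    CharsetEncoder latin1 = Charset.forName("ISO-8859-1").newEncoder();
    CharsetEncoder utf16 = Charset.forName("UTF-16").newEncoder();

    // Latin-1 can represent the accented European names...
    System.out.println(latin1.canEncode("Müller"));   // true
    // ...but not CJK or Hebrew, so those rows would be mangled silently.
    System.out.println(latin1.canEncode("中村"));      // false
    System.out.println(latin1.canEncode("אביה"));     // false

    // Any Unicode transformation format represents every value losslessly,
    // so a UTF-8 -> UTF-16/UTF-32 copy does not need the warning.
    System.out.println(utf16.canEncode("中村"));       // true
  }
}
{code}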
[jira] [Updated] (HIVE-11164) WebHCat should log contents of HiveConf on startup
[ https://issues.apache.org/jira/browse/HIVE-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-11164: -- Attachment: HIVE-11164.patch WebHCat should log contents of HiveConf on startup -- Key: HIVE-11164 URL: https://issues.apache.org/jira/browse/HIVE-11164 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11164.patch There are a few places in WebHCat that do new HiveConf() but HiveConf is not added to AppConfig. Need to log HiveConf contents on startup to help diagnosing issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
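A rough sketch of the kind of startup dump the issue asks for; HiveConf extends Hadoop's Configuration, which is iterable, so logging the whole thing takes a few lines (the logger name and password masking rule are illustrative choices, not the attached patch):
{code}
import java.util.Map;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hive.conf.HiveConf;

public class HiveConfStartupDump {
  private static final Log LOG = LogFactory.getLog(HiveConfStartupDump.class);

  /** Log every key/value pair in the HiveConf, masking anything that looks secret. */
  public static void logHiveConf(HiveConf conf) {
    LOG.info("Dumping HiveConf at WebHCat startup:");
    for (Map.Entry<String, String> entry : conf) {
      String key = entry.getKey();
      // Don't leak credentials into the log.
      String value = key.toLowerCase().contains("password") ? "***" : entry.getValue();
      LOG.info("  " + key + "=" + value);
    }
  }
}
{code}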
[jira] [Updated] (HIVE-11013) LLAP: MiniTez tez_join_hash test on the branch fails with NPE (initializeOp not called?)
[ https://issues.apache.org/jira/browse/HIVE-11013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11013: Attachment: HIVE-11013.01.patch master patch. Hopefully HiveQA will also run LLAP: MiniTez tez_join_hash test on the branch fails with NPE (initializeOp not called?) Key: HIVE-11013 URL: https://issues.apache.org/jira/browse/HIVE-11013 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11013.01.patch, HIVE-11013.patch Line numbers are shifted due to logging; the NPE is at {noformat} hashMapRowGetters = new ReusableGetAdaptor[mapJoinTables.length]; {noformat} So looks like mapJoinTables is null. I added logging to see if they could be set to null from cache, but that doesn't seem to be the case. Looks like initializeOp is not called. {noformat} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception from MapJoinOperator : null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:428) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:872) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:872) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:643) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:656) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:659) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:755) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinObject(CommonMergeJoinOperator.java:315) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:278) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:271) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361) ... 17 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:339) ... 29 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10940: Assignee: (was: Sergey Shelukhin) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call - Key: HIVE-10940 URL: https://issues.apache.org/jira/browse/HIVE-10940 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.2.0 Reporter: Gopal V Fix For: 2.0.0 Attachments: HIVE-10940.01.patch, HIVE-10940.02.patch, HIVE-10940.03.patch, HIVE-10940.patch {code} String filterText = filterExpr.getExprString(); String filterExprSerialized = Utilities.serializeExpression(filterExpr); {code} the serializeExpression initializes Kryo and produces a new packed object for every split. HiveInputFormat::getRecordReader - pushProjectionAndFilters - pushFilters. And Kryo is very slow to do this for a large filter clause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
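Not the attached patch, just a sketch of the obvious mitigation: memoize the serialized form keyed by the filter's expression string, so the expensive Kryo call runs once per distinct filter rather than once per getRecordReader call (the cache type and size are arbitrary choices here):
{code}
import java.util.LinkedHashMap;
import java.util.Map;

public class FilterSerializationCache {
  private static final int MAX_ENTRIES = 100;

  // Small LRU keyed by the filter's expression string; the Kryo-backed
  // serialization then runs once per distinct filter, not once per split.
  private final Map<String, String> cache =
      new LinkedHashMap<String, String>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
          return size() > MAX_ENTRIES;
        }
      };

  public synchronized String serializeOnce(String filterText, Serializer serializer) {
    String serialized = cache.get(filterText);
    if (serialized == null) {
      serialized = serializer.serialize(filterText); // the expensive Kryo call
      cache.put(filterText, serialized);
    }
    return serialized;
  }

  /** Stand-in for the Utilities.serializeExpression call in the snippet above. */
  public interface Serializer {
    String serialize(String filterText);
  }
}
{code}
Keying on the expression string assumes identical filters always render to the identical string returned by getExprString(), which is the same text pushFilters already computes.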
[jira] [Commented] (HIVE-11130) Refactoring the code so that HiveTxnManager interface will support lock/unlock table/database object
[ https://issues.apache.org/jira/browse/HIVE-11130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615574#comment-14615574 ] Alan Gates commented on HIVE-11130: --- The only comment I have is that in the HiveTxnManagerImpl implementations of lockTable, etc. I think it would be good to call HiveTxnManager.supportsExplicitLock and throw if that returns true. This avoids an erroneous code path ending up there from DbTxnManager, which should never call these methods. Other than that, +1. Refactoring the code so that HiveTxnManager interface will support lock/unlock table/database object Key: HIVE-11130 URL: https://issues.apache.org/jira/browse/HIVE-11130 Project: Hive Issue Type: Sub-task Components: Locking Affects Versions: 2.0.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-11130.patch This is just a refactoring step which keeps the current logic, but it exposes the explicit lock/unlock table and database in HiveTxnManager which should be implemented differently by the subclasses (currently it's not; e.g., for the ZooKeeper implementation, we should lock table and database when we try to lock the table). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615615#comment-14615615 ] Gopal V commented on HIVE-10940: [~hagleitn]: this fixes the leak, but reintroduces the performance issue. Added log lines and it showed for query27 {code} 2015-07-06 13:08:31,521 INFO [InputInitializer [Map 5] #0] io.HiveInputFormat: hasObj = false, hasExpr=true 2015-07-06 13:08:31,522 INFO [InputInitializer [Map 5] #0] io.HiveInputFormat: hive.io.file.readcolumn.ids=0,6 2015-07-06 13:08:31,522 INFO [InputInitializer [Map 5] #0] io.HiveInputFormat: hive.io.file.readcolumn.names=d_date_sk,d_year {code} so it hits the serialize codepath still {code} if (!hasObj) { serializedFilterObj = Utilities.serializeObject(filterObject); } {code} HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call - Key: HIVE-10940 URL: https://issues.apache.org/jira/browse/HIVE-10940 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Gunther Hagleitner Fix For: 2.0.0 Attachments: HIVE-10940.01.patch, HIVE-10940.02.patch, HIVE-10940.03.patch, HIVE-10940.patch {code} String filterText = filterExpr.getExprString(); String filterExprSerialized = Utilities.serializeExpression(filterExpr); {code} the serializeExpression initializes Kryo and produces a new packed object for every split. HiveInputFormat::getRecordReader - pushProjectionAndFilters - pushFilters. And Kryo is very slow to do this for a large filter clause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: (was: udf_cosine_similarity-v01.patch) create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: HIVE-9557.1.patch, HIVE-9557.2.patch, HIVE-9557.3.patch algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
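For reference, a minimal sketch of token-frequency cosine similarity that reproduces the 0.5 from the example above ('Test String1' and 'Test String2' share one of two tokens); this is illustrative only and not the attached UDF:
{code}
import java.util.HashMap;
import java.util.Map;

public class CosineSimilaritySketch {
  /** Cosine similarity over whitespace-delimited token frequency vectors. */
  public static double cosine(String a, String b) {
    Map<String, Integer> va = termFreq(a);
    Map<String, Integer> vb = termFreq(b);
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (Map.Entry<String, Integer> e : va.entrySet()) {
      Integer other = vb.get(e.getKey());
      if (other != null) {
        dot += e.getValue() * other;
      }
      normA += e.getValue() * e.getValue();
    }
    for (int v : vb.values()) {
      normB += v * v;
    }
    return (normA == 0 || normB == 0) ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  private static Map<String, Integer> termFreq(String s) {
    Map<String, Integer> freq = new HashMap<String, Integer>();
    for (String token : s.split("\\s+")) {
      Integer c = freq.get(token);
      freq.put(token, c == null ? 1 : c + 1);
    }
    return freq;
  }

  public static void main(String[] args) {
    // {Test, String1} vs {Test, String2}: dot = 1, both norms = sqrt(2) -> 0.5
    System.out.println(cosine("Test String1", "Test String2"));
  }
}
{code}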
[jira] [Commented] (HIVE-11011) LLAP: test auto_sortmerge_join_5 on MiniTez fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615650#comment-14615650 ] Vikram Dixit K commented on HIVE-11011: --- +1 LGTM LLAP: test auto_sortmerge_join_5 on MiniTez fails with NPE -- Key: HIVE-11011 URL: https://issues.apache.org/jira/browse/HIVE-11011 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11011.patch Original issue here was fixed by TEZ-2568. The new issue is: {noformat} 2015-07-01 15:53:44,374 ERROR [main]: SessionState (SessionState.java:printError(987)) - Vertex failed, vertexName=Map 2, vertexId=vertex_1435791127343_0002_2_00, diagnostics=[Task failed, taskId=task_1435791127343_0002_2_00_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task: attempt_1435791127343_0002_2_00_00_0:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:181) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:146) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:349) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:255) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:157) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.initializeLocalWork(CommonMergeJoinOperator.java:631) at org.apache.hadoop.hive.ql.exec.Operator.initializeLocalWork(Operator.java:439) at org.apache.hadoop.hive.ql.exec.Operator.initializeLocalWork(Operator.java:439) at org.apache.hadoop.hive.ql.exec.Operator.initializeLocalWork(Operator.java:439) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:221) ... 15 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9272) Tests for utf-8 support
[ https://issues.apache.org/jira/browse/HIVE-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615684#comment-14615684 ] Aswathy Chellammal Sreekumar commented on HIVE-9272: [~ekoifman] Could you please review the attached patch and see if it solves the issue Tests for utf-8 support --- Key: HIVE-9272 URL: https://issues.apache.org/jira/browse/HIVE-9272 Project: Hive Issue Type: Test Components: Tests, WebHCat Affects Versions: 0.14.0 Reporter: Aswathy Chellammal Sreekumar Assignee: Aswathy Chellammal Sreekumar Priority: Minor Attachments: HIVE-9272.1.patch, HIVE-9272.2.patch, HIVE-9272.3.patch, HIVE-9272.4.patch, HIVE-9272.5.patch, HIVE-9272.6.patch, HIVE-9272.7.patch, HIVE-9272.8.patch, HIVE-9272.9.patch, HIVE-9272.patch Including some test cases for utf8 support in webhcat. The first four tests invoke hive, pig, mapred and streaming apis for testing the utf8 support for data processed, file names and job name. The last test case tests the filtering of job name with utf8 character -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615273#comment-14615273 ] Jesus Camacho Rodriguez commented on HIVE-10996: Pushed to 1.1 branch. Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Fix For: 1.1.1, 2.0.0, 1.2.2 Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression. The following query (Q1) produces no results: {code} select s from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; {code} While this one (Q2) does produce results : {code} select * from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; 1 21 20 Bob 1234 1 31 30 Bob 1234 3 51 50 Jeff1234 {code} The setup to test this is: {code} create table purchase_history (s string, product string, price double, timestamp int); insert into purchase_history values ('1', 'Belt', 20.00, 21); insert into purchase_history values ('1', 'Socks', 3.50, 31); insert into purchase_history values ('3', 'Belt', 20.00, 51); insert into purchase_history values ('4', 'Shirt', 15.50, 59); create table cart_history (s string, cart_id int, timestamp int); insert into cart_history values ('1', 1, 10); insert into cart_history values ('1', 2, 20); insert into cart_history values ('1', 3, 30); insert into cart_history values ('1', 4, 40); insert into cart_history values ('3', 5, 50); insert into cart_history values ('4', 6, 60); create table events (s string, st2 string, n int, timestamp int); insert into events values ('1', 'Bob', 1234, 20); insert into events values ('1', 'Bob', 1234, 30); insert into events values ('1', 'Bob', 1234, 25); insert into events values ('2', 'Sam', 1234, 30); insert into events values ('3', 'Jeff', 1234, 50); insert into events values ('4', 'Ted', 1234, 60); {code} I realize select * and select s are not all that interesting in this context but what lead us to this issue was select count(distinct s) was not returning results. The above queries are the simplified queries that produce the issue. I will note that if I convert the inner join to a table and select from that the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. 
This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615282#comment-14615282 ] Alain Blankenburg-Schröder commented on HIVE-5317: -- Thanks for your email. Unfortunately, you will no longer be able to reach me under this mailaccount. Please note that your email will not be forwarded. For urgent inquiries, please contact my colleague Philipp Kölmel via email p.koel...@bigpoint.netmailto:p.koel...@bigpoint.net. Best regards, Alain Blankenburg-Schröder Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records need to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from a RDBS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11184) Lineage - ExprProcFactory#getExprString may throw NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-11184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-11184: --- Fix Version/s: 2.0.0 Lineage - ExprProcFactory#getExprString may throw NullPointerException -- Key: HIVE-11184 URL: https://issues.apache.org/jira/browse/HIVE-11184 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 2.0.0 Attachments: HIVE-11184.1.patch ColumnInfo may have null alias. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
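The failure mode is a direct dereference of a null alias while building the expression string; a tiny sketch of the usual guard, assuming Hive's ColumnInfo accessors and falling back to the internal name (the fallback choice is an assumption, not necessarily what the attached patch does):
{code}
import org.apache.hadoop.hive.ql.exec.ColumnInfo;

public class NullAliasGuard {
  /** Prefer the alias, but fall back to the internal name when the alias is null. */
  public static String displayName(ColumnInfo colInfo) {
    String alias = colInfo.getAlias();
    return (alias != null) ? alias : colInfo.getInternalName();
  }
}
{code}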
[jira] [Assigned] (HIVE-11011) LLAP: test auto_sortmerge_join_5 on MiniTez fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-11011: --- Assignee: Sergey Shelukhin LLAP: test auto_sortmerge_join_5 on MiniTez fails with NPE -- Key: HIVE-11011 URL: https://issues.apache.org/jira/browse/HIVE-11011 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Original issue here was fixed by TEZ-2568. The new issue is: {noformat} 2015-07-01 15:53:44,374 ERROR [main]: SessionState (SessionState.java:printError(987)) - Vertex failed, vertexName=Map 2, vertexId=vertex_1435791127343_0002_2_00, diagnostics=[Task failed, taskId=task_1435791127343_0002_2_00_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task: attempt_1435791127343_0002_2_00_00_0:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:181) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:146) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:349) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:255) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:157) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.initializeLocalWork(CommonMergeJoinOperator.java:631) at org.apache.hadoop.hive.ql.exec.Operator.initializeLocalWork(Operator.java:439) at org.apache.hadoop.hive.ql.exec.Operator.initializeLocalWork(Operator.java:439) at org.apache.hadoop.hive.ql.exec.Operator.initializeLocalWork(Operator.java:439) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:221) ... 15 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11170) port parts of HIVE-11015 to master for ease of future merging
[ https://issues.apache.org/jira/browse/HIVE-11170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11170: Attachment: HIVE-11170.01.patch Same patch for HiveQA port parts of HIVE-11015 to master for ease of future merging - Key: HIVE-11170 URL: https://issues.apache.org/jira/browse/HIVE-11170 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 2.0.0 Attachments: HIVE-11170.01.patch, HIVE-11170.patch That patch changes how IOContext is created (file structure) and adds tests; I will merge non-LLAP parts of it now, so it's easier to merge later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-4239) Remove lock on compilation stage
[ https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615692#comment-14615692 ] Thejas M Nair commented on HIVE-4239: - +1 Sorry about the delay! Remove lock on compilation stage Key: HIVE-4239 URL: https://issues.apache.org/jira/browse/HIVE-4239 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor Reporter: Carl Steinbach Assignee: Sergey Shelukhin Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-4239) Remove lock on compilation stage
[ https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615552#comment-14615552 ] Sergey Shelukhin commented on HIVE-4239: [~thejas] I just realized this actually still needs review :) Remove lock on compilation stage Key: HIVE-4239 URL: https://issues.apache.org/jira/browse/HIVE-4239 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor Reporter: Carl Steinbach Assignee: Sergey Shelukhin Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10940: --- Assignee: Gunther Hagleitner HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call - Key: HIVE-10940 URL: https://issues.apache.org/jira/browse/HIVE-10940 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Gunther Hagleitner Fix For: 2.0.0 Attachments: HIVE-10940.01.patch, HIVE-10940.02.patch, HIVE-10940.03.patch, HIVE-10940.patch {code} String filterText = filterExpr.getExprString(); String filterExprSerialized = Utilities.serializeExpression(filterExpr); {code} the serializeExpression initializes Kryo and produces a new packed object for every split. HiveInputFormat::getRecordReader - pushProjectionAndFilters - pushFilters. And Kryo is very slow to do this for a large filter clause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11164) WebHCat should log contents of HiveConf on startup
[ https://issues.apache.org/jira/browse/HIVE-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615569#comment-14615569 ] Thejas M Nair commented on HIVE-11164: -- +1 WebHCat should log contents of HiveConf on startup -- Key: HIVE-11164 URL: https://issues.apache.org/jira/browse/HIVE-11164 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11164.patch There are a few places in WebHCat that do new HiveConf() but HiveConf is not added to AppConfig. Need to log HiveConf contents on startup to help diagnosing issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11011) LLAP: test auto_sortmerge_join_5 on MiniTez fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11011: Attachment: HIVE-11011.patch This appears to be branch-specific issue, the line that sets dummy ops for map-side record processor is missing. git blame does not give me conclusive results for when it was removed... re-adding it [~vikram.dixit] can you take a look? LLAP: test auto_sortmerge_join_5 on MiniTez fails with NPE -- Key: HIVE-11011 URL: https://issues.apache.org/jira/browse/HIVE-11011 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11011.patch Original issue here was fixed by TEZ-2568. The new issue is: {noformat} 2015-07-01 15:53:44,374 ERROR [main]: SessionState (SessionState.java:printError(987)) - Vertex failed, vertexName=Map 2, vertexId=vertex_1435791127343_0002_2_00, diagnostics=[Task failed, taskId=task_1435791127343_0002_2_00_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task: attempt_1435791127343_0002_2_00_00_0:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:181) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:146) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:349) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:255) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:157) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.initializeLocalWork(CommonMergeJoinOperator.java:631) at org.apache.hadoop.hive.ql.exec.Operator.initializeLocalWork(Operator.java:439) at org.apache.hadoop.hive.ql.exec.Operator.initializeLocalWork(Operator.java:439) at org.apache.hadoop.hive.ql.exec.Operator.initializeLocalWork(Operator.java:439) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:221) ... 15 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9272) Tests for utf-8 support
[ https://issues.apache.org/jira/browse/HIVE-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aswathy Chellammal Sreekumar updated HIVE-9272: --- Attachment: HIVE-9272.9.patch Tests for utf-8 support --- Key: HIVE-9272 URL: https://issues.apache.org/jira/browse/HIVE-9272 Project: Hive Issue Type: Test Components: Tests, WebHCat Affects Versions: 0.14.0 Reporter: Aswathy Chellammal Sreekumar Assignee: Aswathy Chellammal Sreekumar Priority: Minor Attachments: HIVE-9272.1.patch, HIVE-9272.2.patch, HIVE-9272.3.patch, HIVE-9272.4.patch, HIVE-9272.5.patch, HIVE-9272.6.patch, HIVE-9272.7.patch, HIVE-9272.8.patch, HIVE-9272.9.patch, HIVE-9272.patch Including some test cases for utf8 support in webhcat. The first four tests invoke hive, pig, mapred and streaming apis for testing the utf8 support for data processed, file names and job name. The last test case tests the filtering of job name with utf8 character -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11129) Issue a warning when copied from UTF-8 to ISO 8859-1
[ https://issues.apache.org/jira/browse/HIVE-11129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-11129: Attachment: (was: HIVE-11129.patch) Issue a warning when copied from UTF-8 to ISO 8859-1 Key: HIVE-11129 URL: https://issues.apache.org/jira/browse/HIVE-11129 Project: Hive Issue Type: Bug Components: File Formats Reporter: Aihua Xu Assignee: Aihua Xu Fix For: 2.0.0 Attachments: HIVE-11129.patch Copying data from a table using UTF-8 encoding to one using ISO 8859-1 encoding causes data corruption without warning. {noformat} CREATE TABLE person_utf8 (name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='UTF8'); {noformat} Put the following data in the table: Müller,Thomas Jørgensen,Jørgen Vega,Andrés 中村,浩人 אביה,נועם {noformat} CREATE TABLE person_2 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1') AS select * from person_utf8; {noformat} expected to get mangled data but we should give a warning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11129) Issue a warning when copied from UTF-8 to ISO 8859-1
[ https://issues.apache.org/jira/browse/HIVE-11129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-11129: Attachment: HIVE-11129.patch Issue a warning when copied from UTF-8 to ISO 8859-1 Key: HIVE-11129 URL: https://issues.apache.org/jira/browse/HIVE-11129 Project: Hive Issue Type: Bug Components: File Formats Reporter: Aihua Xu Assignee: Aihua Xu Fix For: 2.0.0 Attachments: HIVE-11129.patch Copying data from a table using UTF-8 encoding to one using ISO 8859-1 encoding causes data corruption without warning. {noformat} CREATE TABLE person_utf8 (name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='UTF8'); {noformat} Put the following data in the table: Müller,Thomas Jørgensen,Jørgen Vega,Andrés 中村,浩人 אביה,נועם {noformat} CREATE TABLE person_2 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1') AS select * from person_utf8; {noformat} expected to get mangled data but we should give a warning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11184) Lineage - ExprProcFactory#getExprString may throw NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-11184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-11184: --- Attachment: HIVE-11184.1.patch Lineage - ExprProcFactory#getExprString may throw NullPointerException -- Key: HIVE-11184 URL: https://issues.apache.org/jira/browse/HIVE-11184 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: HIVE-11184.1.patch ColumnInfo may have null alias. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615279#comment-14615279 ] Paul Fosse commented on HIVE-5317: -- Merge command seems to be needed to do the first use case of the ACID feature. Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. Typically we will load the updates to a hive table and just want to merge that table to the existing dimension. We are either using the old way of doing this (ingest, reconcile, compact purge) or we are writing a Python script to process the updates. But we can't do 500K update statements an hour, so it doesn't seem the ACID does us any good for this use case until we have merge Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records need to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from a RDBS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10996: --- Fix Version/s: 1.1.1 Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Fix For: 1.1.1, 2.0.0, 1.2.2 Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression. The following query (Q1) produces no results: {code} select s from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; {code} While this one (Q2) does produce results : {code} select * from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; 1 21 20 Bob 1234 1 31 30 Bob 1234 3 51 50 Jeff1234 {code} The setup to test this is: {code} create table purchase_history (s string, product string, price double, timestamp int); insert into purchase_history values ('1', 'Belt', 20.00, 21); insert into purchase_history values ('1', 'Socks', 3.50, 31); insert into purchase_history values ('3', 'Belt', 20.00, 51); insert into purchase_history values ('4', 'Shirt', 15.50, 59); create table cart_history (s string, cart_id int, timestamp int); insert into cart_history values ('1', 1, 10); insert into cart_history values ('1', 2, 20); insert into cart_history values ('1', 3, 30); insert into cart_history values ('1', 4, 40); insert into cart_history values ('3', 5, 50); insert into cart_history values ('4', 6, 60); create table events (s string, st2 string, n int, timestamp int); insert into events values ('1', 'Bob', 1234, 20); insert into events values ('1', 'Bob', 1234, 30); insert into events values ('1', 'Bob', 1234, 25); insert into events values ('2', 'Sam', 1234, 30); insert into events values ('3', 'Jeff', 1234, 50); insert into events values ('4', 'Ted', 1234, 60); {code} I realize select * and select s are not all that interesting in this context but what lead us to this issue was select count(distinct s) was not returning results. The above queries are the simplified queries that produce the issue. I will note that if I convert the inner join to a table and select from that the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. 
This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11171) Join reordering algorithm might introduce projects between joins
[ https://issues.apache.org/jira/browse/HIVE-11171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-11171: -- Attachment: HIVE-11171.02.patch Join reordering algorithm might introduce projects between joins Key: HIVE-11171 URL: https://issues.apache.org/jira/browse/HIVE-11171 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11171.01.patch, HIVE-11171.02.patch, HIVE-11171.patch, HIVE-11171.patch Join reordering algorithm might introduce projects between joins which causes multijoin optimization in SemanticAnalyzer to not kick in. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-10673: -- Attachment: HIVE-10673.9.patch Precommit tests never ran - re-uploading patch Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10673.1.patch, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch, HIVE-10673.6.patch, HIVE-10673.7.patch, HIVE-10673.8.patch, HIVE-10673.9.patch Some analysis of shuffle join queries by [~mmokhtar]/[~gopalv] found about 2/3 of the CPU was spent during sorting/merging. While this does not work for MR, for other execution engines (such as Tez), it is possible to create a reduce-side join that uses unsorted inputs in order to eliminate the sorting, which may be faster than a shuffle join. To join on unsorted inputs, we can use the hash join algorithm to perform the join in the reducer. This will require the small tables in the join to fit in the reducer/hash table for this to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
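Very roughly, the reducer-side idea is: build an in-memory hash table from the small input and stream the big input through it, so neither side needs to be sorted or merged; it only works while the small side fits in memory. A simplified sketch with placeholder row shapes (string arrays keyed on column 0), not the actual operator code:
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReduceSideHashJoinSketch {
  /** Inner join on column 0: build a hash table from the small input, probe with the big input. */
  public static List<String[]> hashJoin(Iterable<String[]> smallSide, Iterable<String[]> bigSide) {
    // Build phase: the whole small side must fit in memory.
    Map<String, List<String[]>> built = new HashMap<String, List<String[]>>();
    for (String[] row : smallSide) {
      List<String[]> bucket = built.get(row[0]);
      if (bucket == null) {
        bucket = new ArrayList<String[]>();
        built.put(row[0], bucket);
      }
      bucket.add(row);
    }
    // Probe phase: stream the big side; no sorting or merging is needed.
    List<String[]> out = new ArrayList<String[]>();
    for (String[] row : bigSide) {
      List<String[]> matches = built.get(row[0]);
      if (matches == null) {
        continue; // inner join: drop unmatched rows
      }
      for (String[] match : matches) {
        out.add(new String[] { row[0], row[1], match[1] });
      }
    }
    return out;
  }
}
{code}
The sketch ignores memory limits and spilling; the point is only that the probe side arrives unsorted, which is what removes the sort/merge cost measured above.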
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615278#comment-14615278 ] Paul Fosse commented on HIVE-5317: -- It was moved into issue 10924. I don't know why. Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records need to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from a RDBS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615283#comment-14615283 ] Paul Fosse commented on HIVE-5317: -- By it, I mean Merge. Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records need to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from a RDBS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5456) Queries fail on avro backed table with empty partition
[ https://issues.apache.org/jira/browse/HIVE-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-5456: - Labels: Avro AvroSerde (was: ) Queries fail on avro backed table with empty partition --- Key: HIVE-5456 URL: https://issues.apache.org/jira/browse/HIVE-5456 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.13.1 Reporter: Prasad Mujumdar Assignee: Chaoyu Tang Labels: Avro, AvroSerde Fix For: 0.14.0 Attachments: HIVE-5456.patch, HIVE-5456.patch The following query fails {noformat} DROP TABLE IF EXISTS episodes_partitioned; CREATE TABLE episodes_partitioned PARTITIONED BY (doctor_pt INT) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ('avro.schema.literal'='{ namespace: testing.hive.avro.serde, name: episodes, type: record, fields: [ { name:title, type:string, doc:episode title }, { name:air_date, type:string, doc:initial date }, { name:doctor, type:int, doc:main actor playing the Doctor in episode } ] }'); ALTER TABLE episodes_partitioned ADD PARTITION (doctor_pt=4); ALTER TABLE episodes_partitioned ADD PARTITION (doctor_pt=5); SELECT COUNT(*) FROM episodes_partitioned; {noformat} with following exception {noformat} java.io.IOException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema at org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat.getHiveRecordWriter(AvroContainerOutputFormat.java:61) at org.apache.hadoop.hive.ql.exec.Utilities.createEmptyFile(Utilities.java:2869) at org.apache.hadoop.hive.ql.exec.Utilities.createDummyFileForEmptyPartition(Utilities.java:2901) at org.apache.hadoop.hive.ql.exec.Utilities.getInputPaths(Utilities.java:2825) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:381) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1409) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1187) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1015) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:883) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:446) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:456) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:737) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
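The stack trace suggests the dummy file for the empty partition is written from properties that carry no Avro schema; a heavily simplified sketch of the kind of fallback one would want, using plain java.util.Properties rather than the real Hive/Avro serde APIs (not the attached patch):
{code}
import java.util.Properties;

public class AvroSchemaFallbackSketch {
  /**
   * Resolve the Avro schema literal for a partition: use the partition's own
   * avro.schema.literal if present, otherwise fall back to the table-level one.
   */
  public static String resolveSchemaLiteral(Properties partitionProps, Properties tableProps) {
    String literal = partitionProps.getProperty("avro.schema.literal");
    if (literal == null || literal.isEmpty()) {
      literal = tableProps.getProperty("avro.schema.literal");
    }
    if (literal == null || literal.isEmpty()) {
      throw new IllegalStateException(
          "Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema");
    }
    return literal;
  }
}
{code}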
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615433#comment-14615433 ] Alan Gates commented on HIVE-5317: -- Yes, agreed that the merge command is needed, and hence is being worked on in HIVE-10924. Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records need to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from a RDBS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10924) add support for MERGE statement
[ https://issues.apache.org/jira/browse/HIVE-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-10924: -- Issue Type: New Feature (was: Bug) add support for MERGE statement --- Key: HIVE-10924 URL: https://issues.apache.org/jira/browse/HIVE-10924 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman add support for MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10986) Check of fs.trash.interval in HiveMetaStore should be consistent with Trash.moveToAppropriateTrash()
[ https://issues.apache.org/jira/browse/HIVE-10986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615438#comment-14615438 ] Eugene Koifman commented on HIVE-10986: --- try getting the FileSystem based on Path not Configuration. Check of fs.trash.interval in HiveMetaStore should be consistent with Trash.moveToAppropriateTrash() Key: HIVE-10986 URL: https://issues.apache.org/jira/browse/HIVE-10986 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 1.2.1 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-10986.2.patch, HIVE-10986.patch This is a followup to HIVE-10629. Trash.moveToAppropriateTrash() takes core-site.xml but HiveMetaStore checks hiveConf which is a problem when they disagree. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
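A rough sketch of the suggestion above, resolving the FileSystem from the Path rather than from the default Configuration; this is only an illustration against the public Hadoop API, not the HIVE-10986 patch, and the class and method names are invented.
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative only: resolve the FileSystem that actually owns the path, so the
// fs.trash.interval check sees that filesystem's settings, instead of always using
// the default filesystem.
public class TrashFsSketch {
  static FileSystem fsForPath(Path p, Configuration conf) throws IOException {
    return p.getFileSystem(conf);   // honours the scheme/authority of 'p'
  }

  static FileSystem defaultFs(Configuration conf) throws IOException {
    return FileSystem.get(conf);    // default filesystem; may disagree with the path's
  }
}
{code}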
[jira] [Issue Comment Deleted] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-5317: - Comment: was deleted (was: Thanks for your email. Unfortunately, you will no longer be able to reach me under this mailaccount. Please note that your email will not be forwarded. For urgent inquiries, please contact my colleague Philipp Kölmel via email p.koel...@bigpoint.netmailto:p.koel...@bigpoint.net. Best regards, Alain Blankenburg-Schröder ) Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records need to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from a RDBS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11016) MiniTez mergejoin test fails with Tez input error (issue in merge join under certain conditions)
[ https://issues.apache.org/jira/browse/HIVE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615446#comment-14615446 ] Sergey Shelukhin commented on HIVE-11016: - la la la MiniTez mergejoin test fails with Tez input error (issue in merge join under certain conditions) Key: HIVE-11016 URL: https://issues.apache.org/jira/browse/HIVE-11016 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11016.01.patch, HIVE-11016.patch Didn't spend a lot of time investigating, but from the code it looks like we shouldn't be calling it after false at least on this path (after false from next, pushRecord returns false, which causes fetchDone to be set for the tag; and fetchOneRow is not called if that is set; should be ok unless tags are messed up?) {noformat} 2015-06-15 17:28:17,272 ERROR [main]: SessionState (SessionState.java:printError(984)) - Vertex failed, vertexName=Reducer 2, vertexId=vertex_1434414363282_0002_17_03, diagnostics=[Task failed, taskId=task_1434414363282_0002_17_03_02, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task: attempt_1434414363282_0002_17_03_02_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.RuntimeException: java.io.IOException: Please check if you are invoking moveToNext() even after it returned false. at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:181) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:146) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:349) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.RuntimeException: java.io.IOException: Please check if you are invoking moveToNext() even after it returned false. at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:338) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:172) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: java.io.IOException: Please check if you are invoking moveToNext() even after it returned false. 
at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:412) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:380) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinFinalLeftData(CommonMergeJoinOperator.java:449) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.closeOp(CommonMergeJoinOperator.java:389) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:651) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:314) ... 15 more Caused by: java.lang.RuntimeException: java.io.IOException: Please check if you are invoking moveToNext() even after it returned false. at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:302) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:404) ... 20 more Caused by: java.io.IOException: Please check if you are invoking moveToNext() even after it returned false. at org.apache.tez.runtime.library.common.ValuesIterator.hasCompletedProcessing(ValuesIterator.java:223) at
[jira] [Updated] (HIVE-11160) Collect column stats when set hive.stats.autogather=true
[ https://issues.apache.org/jira/browse/HIVE-11160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11160: --- Attachment: Design doc for auto column stats gathering.docx Collect column stats when set hive.stats.autogather=true Key: HIVE-11160 URL: https://issues.apache.org/jira/browse/HIVE-11160 Project: Hive Issue Type: New Feature Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: Design doc for auto column stats gathering.docx, HIVE-11160.01.patch Hive collects table stats during the INSERT OVERWRITE command when hive.stats.autogather=true is set, but users then need to collect column stats themselves with the ANALYZE command. With this patch, the column stats will also be collected automatically. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11186) Remove unused LlapUtils class from ql.io.orc
[ https://issues.apache.org/jira/browse/HIVE-11186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11186: - Attachment: HIVE-11186.patch Remove unused LlapUtils class from ql.io.orc Key: HIVE-11186 URL: https://issues.apache.org/jira/browse/HIVE-11186 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11186.patch LlapUtils class is unused. Remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11030) Enhance storage layer to create one delta file per write
[ https://issues.apache.org/jira/browse/HIVE-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-11030: -- Attachment: HIVE-11030.6.patch Enhance storage layer to create one delta file per write Key: HIVE-11030 URL: https://issues.apache.org/jira/browse/HIVE-11030 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11030.2.patch, HIVE-11030.3.patch, HIVE-11030.4.patch, HIVE-11030.5.patch, HIVE-11030.6.patch Currently each txn using ACID insert/update/delete will generate a delta directory like delta_100_101. In order to support multi-statement transactions we must generate one delta per operation within the transaction so the deltas would be named like delta_100_101_0001, etc. Support for MERGE (HIVE-10924) would need the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
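As a rough illustration of the naming scheme described above, a per-statement delta directory name can be derived from the transaction range plus a statement id. The sketch below is illustrative only; the method names and the zero padding are assumptions, not the exact layout produced by Hive.
{code}
// Hypothetical sketch of per-statement delta naming (delta_<minTxn>_<maxTxn>_<stmtId>).
public class DeltaNameSketch {
  static String deltaSubdir(long minTxn, long maxTxn) {
    return String.format("delta_%d_%d", minTxn, maxTxn);              // one delta per txn
  }

  static String deltaSubdir(long minTxn, long maxTxn, int stmtId) {
    return String.format("delta_%d_%d_%04d", minTxn, maxTxn, stmtId); // one delta per statement
  }

  public static void main(String[] args) {
    System.out.println(deltaSubdir(100, 101));       // delta_100_101
    System.out.println(deltaSubdir(100, 101, 1));    // delta_100_101_0001
  }
}
{code}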
[jira] [Updated] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-11137: -- Assignee: Owen O'Malley (was: Nishant Kelkar) In DateWritable remove the use of LazyBinaryUtils - Key: HIVE-11137 URL: https://issues.apache.org/jira/browse/HIVE-11137 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-11137.1.patch Currently the DateWritable class uses LazyBinaryUtils, which has a lot of dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-4734) Use custom ObjectInspectors for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-4734: - Labels: Avro AvroSerde Performance (was: ) Use custom ObjectInspectors for AvroSerde - Key: HIVE-4734 URL: https://issues.apache.org/jira/browse/HIVE-4734 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Mark Wagner Assignee: Mark Wagner Labels: Avro, AvroSerde, Performance Attachments: HIVE-4734.1.patch, HIVE-4734.2.patch, HIVE-4734.3.patch, HIVE-4734.4.patch, HIVE-4734.5.patch Currently, the AvroSerde recursively copies all fields of a record from the GenericRecord to a List row object and provides the standard ObjectInspectors. Performance can be improved by providing ObjectInspectors to the Avro record itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11151) Calcite transitive predicate inference rule should not transitively add not null filter on non-nullable input
[ https://issues.apache.org/jira/browse/HIVE-11151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11151: Fix Version/s: 1.2.2 Calcite transitive predicate inference rule should not transitively add not null filter on non-nullable input - Key: HIVE-11151 URL: https://issues.apache.org/jira/browse/HIVE-11151 Project: Hive Issue Type: Bug Components: CBO, Logical Optimizer Affects Versions: 1.2.0, 1.2.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 2.0.0, 1.2.2 Attachments: HIVE-11151.2.patch, HIVE-11151.3.patch, HIVE-11151.4.patch, HIVE-11151.patch Calcite rule will add predicates even if types don't match -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11054) Read error : Partition Varchar column cannot be cast to string
[ https://issues.apache.org/jira/browse/HIVE-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11054: --- Description: Hi, I have one table with VARCHAR and CHAR datatypes.My target table has structure like this :-- {code} CREATE EXTERNAL TABLE test_table( dob string COMMENT '', version_nbr int COMMENT '', record_status string COMMENT '', creation_timestamp timestamp COMMENT '') PARTITIONED BY ( src_sys_cd varchar(10) COMMENT '',batch_id string COMMENT '') ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC LOCATION '/test/test_table'; My source table has structure like below :-- CREATE EXTERNAL TABLE test_staging_table( dob string COMMENT '', version_nbr int COMMENT '', record_status string COMMENT '', creation_timestamp timestamp COMMENT '' src_sys_cd varchar(10) COMMENT '', batch_id string COMMENT '') ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC LOCATION '/test/test_staging_table'; {code} We were loading data using pig script. Its a direct load, no transformation needed. But when i was checking test_table's data in hive. It is giving belowmentioned error: {code} Diagnostic Messages for this Task: Error: java.io.IOException: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:273) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:183) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:271) ... 
11 more Caused by: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:95) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:49) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347) ... 15 more Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addPartitionColsToBatch(VectorizedRowBatchCtx.java:566) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:90) ... 17 more FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 0 msec hive {code} Please do the needful. was: Hi, I have one table with VARCHAR and CHAR datatypes.My target table has structure like this :-- CREATE EXTERNAL TABLE test_table( dob string COMMENT '', version_nbr int COMMENT '', record_status string COMMENT '', creation_timestamp timestamp COMMENT
[jira] [Assigned] (HIVE-10535) LLAP: Cleanup map join cache when a query completes
[ https://issues.apache.org/jira/browse/HIVE-10535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-10535: --- Assignee: Sergey Shelukhin LLAP: Cleanup map join cache when a query completes --- Key: HIVE-10535 URL: https://issues.apache.org/jira/browse/HIVE-10535 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Sergey Shelukhin Fix For: llap -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11054) Read error : Partition Varchar column cannot be cast to string
[ https://issues.apache.org/jira/browse/HIVE-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11054: --- Component/s: (was: Database/Schema) Vectorization Read error : Partition Varchar column cannot be cast to string -- Key: HIVE-11054 URL: https://issues.apache.org/jira/browse/HIVE-11054 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.0 Reporter: Devansh Srivastava Assignee: Gopal V Labels: Vectorization Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11054.1.patch Hi, I have one table with VARCHAR and CHAR datatypes.My target table has structure like this :-- {code} CREATE EXTERNAL TABLE test_table( dob string COMMENT '', version_nbr int COMMENT '', record_status string COMMENT '', creation_timestamp timestamp COMMENT '') PARTITIONED BY ( src_sys_cd varchar(10) COMMENT '',batch_id string COMMENT '') ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC LOCATION '/test/test_table'; My source table has structure like below :-- CREATE EXTERNAL TABLE test_staging_table( dob string COMMENT '', version_nbr int COMMENT '', record_status string COMMENT '', creation_timestamp timestamp COMMENT '' src_sys_cd varchar(10) COMMENT '', batch_id string COMMENT '') ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC LOCATION '/test/test_staging_table'; {code} We were loading data using pig script. Its a direct load, no transformation needed. But when i was checking test_table's data in hive. It is giving belowmentioned error: {code} Diagnostic Messages for this Task: Error: java.io.IOException: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:273) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:183) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101) at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:271) ... 11 more Caused by: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:95) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:49) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347) ... 15 more Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at
[jira] [Updated] (HIVE-11054) Read error : Partition Varchar column cannot be cast to string
[ https://issues.apache.org/jira/browse/HIVE-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11054: --- Labels: Vectorization (was: ) Read error : Partition Varchar column cannot be cast to string -- Key: HIVE-11054 URL: https://issues.apache.org/jira/browse/HIVE-11054 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.0 Reporter: Devansh Srivastava Assignee: Gopal V Labels: Vectorization Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11054.1.patch Hi, I have one table with VARCHAR and CHAR datatypes.My target table has structure like this :-- {code} CREATE EXTERNAL TABLE test_table( dob string COMMENT '', version_nbr int COMMENT '', record_status string COMMENT '', creation_timestamp timestamp COMMENT '') PARTITIONED BY ( src_sys_cd varchar(10) COMMENT '',batch_id string COMMENT '') ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC LOCATION '/test/test_table'; My source table has structure like below :-- CREATE EXTERNAL TABLE test_staging_table( dob string COMMENT '', version_nbr int COMMENT '', record_status string COMMENT '', creation_timestamp timestamp COMMENT '' src_sys_cd varchar(10) COMMENT '', batch_id string COMMENT '') ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC LOCATION '/test/test_staging_table'; {code} We were loading data using pig script. Its a direct load, no transformation needed. But when i was checking test_table's data in hive. It is giving belowmentioned error: {code} Diagnostic Messages for this Task: Error: java.io.IOException: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:273) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:183) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101) at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:271) ... 11 more Caused by: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:95) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:49) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347) ... 15 more Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addPartitionColsToBatch(VectorizedRowBatchCtx.java:566) at
[jira] [Updated] (HIVE-11054) Read error : Partition Varchar column cannot be cast to string
[ https://issues.apache.org/jira/browse/HIVE-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11054: --- Affects Version/s: 1.2.0 Read error : Partition Varchar column cannot be cast to string -- Key: HIVE-11054 URL: https://issues.apache.org/jira/browse/HIVE-11054 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.0, 1.2.0 Reporter: Devansh Srivastava Assignee: Gopal V Labels: Vectorization Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11054.1.patch Hi, I have one table with VARCHAR and CHAR datatypes.My target table has structure like this :-- {code} CREATE EXTERNAL TABLE test_table( dob string COMMENT '', version_nbr int COMMENT '', record_status string COMMENT '', creation_timestamp timestamp COMMENT '') PARTITIONED BY ( src_sys_cd varchar(10) COMMENT '',batch_id string COMMENT '') ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC LOCATION '/test/test_table'; My source table has structure like below :-- CREATE EXTERNAL TABLE test_staging_table( dob string COMMENT '', version_nbr int COMMENT '', record_status string COMMENT '', creation_timestamp timestamp COMMENT '' src_sys_cd varchar(10) COMMENT '', batch_id string COMMENT '') ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC LOCATION '/test/test_staging_table'; {code} We were loading data using pig script. Its a direct load, no transformation needed. But when i was checking test_table's data in hive. It is giving belowmentioned error: {code} Diagnostic Messages for this Task: Error: java.io.IOException: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:273) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:183) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101) at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:271) ... 11 more Caused by: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:95) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:49) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347) ... 15 more Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addPartitionColsToBatch(VectorizedRowBatchCtx.java:566) at
[jira] [Updated] (HIVE-10937) LLAP: make ObjectCache for plans work properly in the daemon
[ https://issues.apache.org/jira/browse/HIVE-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10937: Attachment: HIVE-10937.02.patch rebased patch LLAP: make ObjectCache for plans work properly in the daemon Key: HIVE-10937 URL: https://issues.apache.org/jira/browse/HIVE-10937 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: llap Attachments: HIVE-10937.01.patch, HIVE-10937.02.patch, HIVE-10937.patch There's perf hit otherwise, esp. when stupid planner creates 1009 reducers of 4Mb each. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11186) Remove unused LlapUtils class from ql.io.orc
[ https://issues.apache.org/jira/browse/HIVE-11186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran resolved HIVE-11186. -- Resolution: Fixed Fix Version/s: llap Committed patch to llap branch. Remove unused LlapUtils class from ql.io.orc Key: HIVE-11186 URL: https://issues.apache.org/jira/browse/HIVE-11186 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: llap Attachments: HIVE-11186.patch LlapUtils class is unused. Remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-11137: -- Attachment: (was: HIVE-11137.1.patch) In DateWritable remove the use of LazyBinaryUtils - Key: HIVE-11137 URL: https://issues.apache.org/jira/browse/HIVE-11137 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Currently the DateWritable class uses LazyBinaryUtils, which has a lot of dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11110) Enable HiveJoinAddNotNullRule in CBO
[ https://issues.apache.org/jira/browse/HIVE-0?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615740#comment-14615740 ] Mostafa Mokhtar commented on HIVE-0: [~jpullokkaran] this is the full query {code} select i_item_id ,i_item_desc ,s_state ,count(ss_quantity) as store_sales_quantitycount ,avg(ss_quantity) as store_sales_quantityave ,stddev_samp(ss_quantity) as store_sales_quantitystdev ,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov ,count(sr_return_quantity) as_store_returns_quantitycount ,avg(sr_return_quantity) as_store_returns_quantityave ,stddev_samp(sr_return_quantity) as_store_returns_quantitystdev ,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as store_returns_quantitycov ,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) as catalog_sales_quantityave ,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitystdev ,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov from store_sales ,store_returns ,catalog_sales ,date_dim d1 ,date_dim d2 ,date_dim d3 ,store ,item where d1.d_quarter_name = '2000Q1' and d1.d_date_sk = store_sales.ss_sold_date_sk and item.i_item_sk = store_sales.ss_item_sk and store.s_store_sk = store_sales.ss_store_sk and store_sales.ss_customer_sk = store_returns.sr_customer_sk and store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number and store_returns.sr_returned_date_sk = d2.d_date_sk and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3') and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk and store_returns.sr_item_sk = catalog_sales.cs_item_sk and catalog_sales.cs_sold_date_sk = d3.d_date_sk and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3') group by i_item_id ,i_item_desc ,s_state order by i_item_id ,i_item_desc ,s_state limit 100; {code} Expected plan {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 10 - Map 11 (BROADCAST_EDGE) Map 3 - Map 7 (BROADCAST_EDGE) Map 8 - Map 10 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE) Reducer 4 - Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE), Map 3 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE) Reducer 5 - Reducer 4 (SIMPLE_EDGE) Reducer 6 - Reducer 5 (SIMPLE_EDGE) DagName: jenkins_20150706174402_eceec100-6023-4058-85de-5cc96c9a150e:2 Vertices: Map 1 Map Operator Tree: TableScan alias: item filterExpr: i_item_sk is not null (type: boolean) Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: i_item_sk is not null (type: boolean) Statistics: Num rows: 48000 Data size: 13824000 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: i_item_sk (type: int), i_item_id (type: string), i_item_desc (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 48000 Data size: 13824000 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 48000 Data size: 13824000 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: string), _col2 (type: string) Execution mode: vectorized Map 10 Map Operator Tree: TableScan alias: store_returns filterExpr: ((sr_customer_sk is not null and sr_item_sk is not null) and sr_ticket_number is not null) (type: boolean) Statistics: Num rows: 55578005 Data size: 4155315616 Basic stats: COMPLETE Column 
stats: COMPLETE Filter Operator predicate: ((sr_customer_sk is not null and sr_item_sk is not null) and sr_ticket_number is not null) (type: boolean) Statistics: Num rows: 54568434 Data size: 1083441396 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: sr_item_sk (type: int), sr_customer_sk (type: int), sr_ticket_number (type: int), sr_return_quantity (type: int), sr_returned_date_sk (type: int) outputColumnNames: _col0, _col1, _col2, _col3, _col4
[jira] [Updated] (HIVE-11188) Make ORCFile's String Dictionary more efficient
[ https://issues.apache.org/jira/browse/HIVE-11188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-11188: -- Description: Currently, ORCFile's String Dictionary uses StringRedBlackTree for adding/finding/sorting duplicate strings. When there are a large number of unique strings (let's say over 16K) and a large number of rows (let's say 1M), the binary search will take O(1M * log(16K)) time which can be very long. Alternatively, ORCFile's String Dictionary can use HashMap for adding/finding duplicate strings, and use quicksort at the end to produce a sorted order. In the same case above, the total time spent will be O(1M + 16K * log(16K)) which is much faster. When the number of unique string is close to the number of rows (let's say, both around 1M), ORC will automatically disable the dictionary encoding. In the old approach will take O(1M * log(1M)), and our new approach will take O(1M) since we can skip the final quicksort if the dictionary encoding is disabled. So in either case, the new approach should be a win. Here is an PMP output based on ~600 traces (so 126 means 126/600 ~= 21% of total time). It's a query like INSERT OVERWRITE TABLE SELECT * FROM src using hive-1.1.0-cdh-5.4.1. 126 org.apache.hadoop.hive.ql.io.orc.StringRedBlackTree.compareValue(StringRedBlackTree.java:67) 35 java.util.zip.Deflater.deflateBytes(Native Method) 26 org.apache.hadoop.hive.ql.io.orc.SerializationUtils.findClosestNumBits(SerializationUtils.java:218) 24 org.apache.hadoop.hive.serde2.lazy.LazyNonPrimitive.isNull(LazyNonPrimitive.java:63) 22 org.apache.hadoop.hive.serde2.lazy.LazyMap.parse(LazyMap.java:204) 22 org.apache.hadoop.hive.serde2.lazy.LazyLong.parseLong(LazyLong.java:116) 21 org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase$FieldInfo.uncheckedGetField(ColumnarStructBase.java:111) 19 org.apache.hadoop.hive.serde2.lazy.LazyPrimitive.hashCode(LazyPrimitive.java:57) 18 org.apache.hadoop.hive.ql.io.orc.RedBlackTree.getRight(RedBlackTree.java:99) 16 org.apache.hadoop.hive.ql.io.RCFile$Reader.getCurrentRow(RCFile.java:1932) 15 org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method) 15 org.apache.hadoop.hive.ql.io.orc.WriterImpl$IntegerTreeWriter.write(WriterImpl.java:929) 12 org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.write(WriterImpl.java:1607) 12 org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) 11 org.apache.hadoop.hive.ql.io.orc.RedBlackTree.getLeft(RedBlackTree.java:92) 11 org.apache.hadoop.hive.ql.io.orc.DynamicIntArray.add(DynamicIntArray.java:105) 10 org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) ... Make ORCFile's String Dictionary more efficient --- Key: HIVE-11188 URL: https://issues.apache.org/jira/browse/HIVE-11188 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 1.2.0, 1.1.0 Reporter: Zheng Shao Priority: Minor Currently, ORCFile's String Dictionary uses StringRedBlackTree for adding/finding/sorting duplicate strings. When there are a large number of unique strings (let's say over 16K) and a large number of rows (let's say 1M), the binary search will take O(1M * log(16K)) time which can be very long. Alternatively, ORCFile's String Dictionary can use HashMap for adding/finding duplicate strings, and use quicksort at the end to produce a sorted order. In the same case above, the total time spent will be O(1M + 16K * log(16K)) which is much faster. 
When the number of unique string is close to the number of rows (let's say, both around 1M), ORC will automatically disable the dictionary encoding. In the old approach will take O(1M * log(1M)), and our new approach will take O(1M) since we can skip the final quicksort if the dictionary encoding is disabled. So in either case, the new approach should be a win. Here is an PMP output based on ~600 traces (so 126 means 126/600 ~= 21% of total time). It's a query like INSERT OVERWRITE TABLE SELECT * FROM src using hive-1.1.0-cdh-5.4.1. 126 org.apache.hadoop.hive.ql.io.orc.StringRedBlackTree.compareValue(StringRedBlackTree.java:67) 35 java.util.zip.Deflater.deflateBytes(Native Method) 26 org.apache.hadoop.hive.ql.io.orc.SerializationUtils.findClosestNumBits(SerializationUtils.java:218) 24 org.apache.hadoop.hive.serde2.lazy.LazyNonPrimitive.isNull(LazyNonPrimitive.java:63) 22 org.apache.hadoop.hive.serde2.lazy.LazyMap.parse(LazyMap.java:204) 22 org.apache.hadoop.hive.serde2.lazy.LazyLong.parseLong(LazyLong.java:116) 21
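The alternative described in the report, constant-time hash lookups while rows are written with a single sort deferred to flush time, can be sketched as below. This is an illustration of the idea only; it is not the ORC WriterImpl or StringRedBlackTree code, and the class and method names are invented.
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration: HashMap-based string dictionary with a sort deferred to write-out time.
public class HashDictionarySketch {
  private final Map<String, Integer> ids = new HashMap<>();
  private final List<String> keys = new ArrayList<>();

  /** Returns the id for the value, adding it if it has not been seen before. */
  public int add(String value) {
    Integer id = ids.get(value);
    if (id != null) {
      return id;                 // O(1) on repeated values, instead of a red-black tree probe
    }
    int newId = keys.size();
    ids.put(value, newId);
    keys.add(value);
    return newId;
  }

  /** Sorted dictionary, built once when the stripe is flushed: O(U log U) for U unique keys. */
  public List<String> sortedKeys() {
    List<String> sorted = new ArrayList<>(keys);
    Collections.sort(sorted);
    return sorted;
  }
}
{code}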
[jira] [Updated] (HIVE-11152) Swapping join inputs in ASTConverter
[ https://issues.apache.org/jira/browse/HIVE-11152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11152: Fix Version/s: 1.2.2 Swapping join inputs in ASTConverter Key: HIVE-11152 URL: https://issues.apache.org/jira/browse/HIVE-11152 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 2.0.0, 1.2.2 Attachments: HIVE-11152.02.patch, HIVE-11152.patch We want that multijoin optimization in SemanticAnalyzer always kicks in when CBO is enabled (if possible). For that, we may need to swap the join inputs when we return from CBO through the Hive AST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: (was: HIVE-9557.1.patch) create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Labels: CosineSimilarity, SimilarityMetric, UDF algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
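For reference, a self-contained sketch of token-based cosine similarity that reproduces the 0.5 result from the example in the description ('Test String1' and 'Test String2' share one of two tokens). It is only an illustration of the metric, not the code in the attached patches.
{code}
import java.util.HashMap;
import java.util.Map;

// Illustration of token-based cosine similarity over whitespace-separated tokens.
public class CosineSimilaritySketch {
  static Map<String, Integer> termFreq(String s) {
    Map<String, Integer> tf = new HashMap<>();
    for (String tok : s.split("\\s+")) {
      tf.merge(tok, 1, Integer::sum);
    }
    return tf;
  }

  public static double cosine(String a, String b) {
    Map<String, Integer> ta = termFreq(a), tb = termFreq(b);
    long dot = 0, na = 0, nb = 0;
    for (Map.Entry<String, Integer> e : ta.entrySet()) {
      Integer other = tb.get(e.getKey());
      if (other != null) dot += (long) e.getValue() * other;   // shared tokens
      na += (long) e.getValue() * e.getValue();
    }
    for (int v : tb.values()) nb += (long) v * v;
    return (na == 0 || nb == 0) ? 0.0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
  }

  public static void main(String[] args) {
    // One shared token out of two per string -> 0.5
    System.out.println(cosine("Test String1", "Test String2"));
  }
}
{code}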
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: (was: HIVE-9557.3.patch) create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Labels: CosineSimilarity, SimilarityMetric, UDF algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11188) Make ORCFile's String Dictionary more efficient
[ https://issues.apache.org/jira/browse/HIVE-11188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-11188: -- Priority: Major (was: Minor) Make ORCFile's String Dictionary more efficient --- Key: HIVE-11188 URL: https://issues.apache.org/jira/browse/HIVE-11188 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 1.2.0, 1.1.0 Reporter: Zheng Shao Currently, ORCFile's String Dictionary uses StringRedBlackTree for adding/finding/sorting duplicate strings. When there are a large number of unique strings (let's say over 16K) and a large number of rows (let's say 1M), the binary search will take O(1M * log(16K)) time which can be very long. Alternatively, ORCFile's String Dictionary can use HashMap for adding/finding duplicate strings, and use quicksort at the end to produce a sorted order. In the same case above, the total time spent will be O(1M + 16K * log(16K)) which is much faster. When the number of unique string is close to the number of rows (let's say, both around 1M), ORC will automatically disable the dictionary encoding. In the old approach will take O(1M * log(1M)), and our new approach will take O(1M) since we can skip the final quicksort if the dictionary encoding is disabled. So in either case, the new approach should be a win. Here is an PMP output based on ~600 traces (so 126 means 126/600 ~= 21% of total time). It's a query like INSERT OVERWRITE TABLE SELECT * FROM src using hive-1.1.0-cdh-5.4.1. 126 org.apache.hadoop.hive.ql.io.orc.StringRedBlackTree.compareValue(StringRedBlackTree.java:67) 35 java.util.zip.Deflater.deflateBytes(Native Method) 26 org.apache.hadoop.hive.ql.io.orc.SerializationUtils.findClosestNumBits(SerializationUtils.java:218) 24 org.apache.hadoop.hive.serde2.lazy.LazyNonPrimitive.isNull(LazyNonPrimitive.java:63) 22 org.apache.hadoop.hive.serde2.lazy.LazyMap.parse(LazyMap.java:204) 22 org.apache.hadoop.hive.serde2.lazy.LazyLong.parseLong(LazyLong.java:116) 21 org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase$FieldInfo.uncheckedGetField(ColumnarStructBase.java:111) 19 org.apache.hadoop.hive.serde2.lazy.LazyPrimitive.hashCode(LazyPrimitive.java:57) 18 org.apache.hadoop.hive.ql.io.orc.RedBlackTree.getRight(RedBlackTree.java:99) 16 org.apache.hadoop.hive.ql.io.RCFile$Reader.getCurrentRow(RCFile.java:1932) 15 org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method) 15 org.apache.hadoop.hive.ql.io.orc.WriterImpl$IntegerTreeWriter.write(WriterImpl.java:929) 12 org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.write(WriterImpl.java:1607) 12 org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) 11 org.apache.hadoop.hive.ql.io.orc.RedBlackTree.getLeft(RedBlackTree.java:92) 11 org.apache.hadoop.hive.ql.io.orc.DynamicIntArray.add(DynamicIntArray.java:105) 10 org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: (was: HIVE-9557.2.patch) create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Labels: CosineSimilarity, SimilarityMetric, UDF algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10927) Add number of HMS/HS2 connection metrics
[ https://issues.apache.org/jira/browse/HIVE-10927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-10927: - Summary: Add number of HMS/HS2 connection metrics (was: Add number of HMS connection metrics) Add number of HMS/HS2 connection metrics Key: HIVE-10927 URL: https://issues.apache.org/jira/browse/HIVE-10927 Project: Hive Issue Type: Sub-task Components: Diagnosability Reporter: Szehon Ho Fix For: 1.3.0 Attachments: HIVE-10927.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10927) Add number of HMS connection metrics
[ https://issues.apache.org/jira/browse/HIVE-10927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-10927: - Attachment: HIVE-10927.patch Add number of HMS connection metrics Key: HIVE-10927 URL: https://issues.apache.org/jira/browse/HIVE-10927 Project: Hive Issue Type: Sub-task Components: Diagnosability Reporter: Szehon Ho Fix For: 1.3.0 Attachments: HIVE-10927.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11189) Add 'IGNORE NULLS' to FIRST_VALUE/LAST_VALUE
[ https://issues.apache.org/jira/browse/HIVE-11189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616107#comment-14616107 ] Prateek Rungta commented on HIVE-11189: --- Looks like the functions already support it: [1]. So I am able to do what I need by passing an extra parameter to the functions, i.e. the 'true' in the query below specifies whether to skip nulls or not. {code} SELECT id, LAST_VALUE(col, true) over (PARTITION BY id ORDER BY date) {code} Which means the easy fix is to update the specification for the functions: [2], along with the docs. I still think adding syntactic support for IGNORE NULLS is a good idea; it'll help people already familiar with other systems avoid this issue. [1]: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLastValue.java#L74 [2]: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLastValue.java#L40-L41 Add 'IGNORE NULLS' to FIRST_VALUE/LAST_VALUE Key: HIVE-11189 URL: https://issues.apache.org/jira/browse/HIVE-11189 Project: Hive Issue Type: Improvement Components: PTF-Windowing Reporter: Prateek Rungta Other RDBMS support the specification of 'IGNORE NULLS' over a partition to skip NULL values for Analytic Functions. Example - Oracle's docs: http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions057.htm Please consider adding this to Hive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11179) HIVE should allow custom converting from HivePrivilegeObjectDesc to privilegeObject for different authorizers
[ https://issues.apache.org/jira/browse/HIVE-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616113#comment-14616113 ] ASF GitHub Bot commented on HIVE-11179: --- GitHub user sundapeng opened a pull request: https://github.com/apache/hive/pull/44 HIVE-11179: HIVE should allow custom converting from HivePrivilegeObj… …ectDesc to privilegeObject for different authorizers You can merge this pull request into a Git repository by running: $ git pull https://github.com/sundapeng/hive master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/44.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #44 commit f82dc66be7cc876323567670b7000756394baf91 Author: Sun Dapeng s...@apache.org Date: 2015-07-07T02:02:48Z HIVE-11179: HIVE should allow custom converting from HivePrivilegeObjectDesc to privilegeObject for different authorizers HIVE should allow custom converting from HivePrivilegeObjectDesc to privilegeObject for different authorizers - Key: HIVE-11179 URL: https://issues.apache.org/jira/browse/HIVE-11179 Project: Hive Issue Type: Improvement Reporter: Dapeng Sun Assignee: Dapeng Sun Labels: Authorization HIVE should allow custom converting from HivePrivilegeObjectDesc to privilegeObject for different authorizers: There is a case in Apache Sentry: Sentry support uri and server level privilege, but in hive side, it uses {{AuthorizationUtils.getHivePrivilegeObject(privSubjectDesc)}} to do the converting, and the code in {{getHivePrivilegeObject()}} only handle the scenes for table and database {noformat} privSubjectDesc.getTable() ? HivePrivilegeObjectType.TABLE_OR_VIEW : HivePrivilegeObjectType.DATABASE; {noformat} A solution is move this method to {{HiveAuthorizer}}, so that a custom Authorizer could enhance it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11179) HIVE should allow custom converting from HivePrivilegeObjectDesc to privilegeObject for different authorizers
[ https://issues.apache.org/jira/browse/HIVE-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616116#comment-14616116 ] Ferdinand Xu commented on HIVE-11179: - LGTM. +1 pending the tests. HIVE should allow custom converting from HivePrivilegeObjectDesc to privilegeObject for different authorizers - Key: HIVE-11179 URL: https://issues.apache.org/jira/browse/HIVE-11179 Project: Hive Issue Type: Improvement Reporter: Dapeng Sun Assignee: Dapeng Sun Labels: Authorization Attachments: HIVE-11179.001.patch HIVE should allow custom converting from HivePrivilegeObjectDesc to privilegeObject for different authorizers: There is a case in Apache Sentry: Sentry supports URI- and server-level privileges, but on the Hive side {{AuthorizationUtils.getHivePrivilegeObject(privSubjectDesc)}} does the conversion, and the code in {{getHivePrivilegeObject()}} only handles the cases for table and database {noformat} privSubjectDesc.getTable() ? HivePrivilegeObjectType.TABLE_OR_VIEW : HivePrivilegeObjectType.DATABASE; {noformat} A solution is to move this method to {{HiveAuthorizer}}, so that a custom Authorizer could enhance it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
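For illustration only, here is a self-contained sketch of the shape of the proposed change: the table-or-database ternary quoted in the description becomes an overridable method on the authorizer, so an authorizer such as Sentry can map additional scopes (URI, server). All type and member names below are simplified stand-ins, not the actual Hive API; the real proposal is to move {{AuthorizationUtils.getHivePrivilegeObject()}} onto {{HiveAuthorizer}}.
{noformat}
// Simplified stand-ins for HivePrivilegeObjectType / HivePrivilegeObjectDesc; not the real Hive classes.
enum PrivilegeObjectType { TABLE_OR_VIEW, DATABASE, URI, SERVER }

class PrivilegeObjectDesc {
  private final boolean table;
  private final String object;
  PrivilegeObjectDesc(boolean table, String object) { this.table = table; this.object = object; }
  boolean isTable() { return table; }
  String getObject() { return object; }
}

// The conversion lives on the authorizer interface so implementations can override it.
interface Authorizer {
  default PrivilegeObjectType convert(PrivilegeObjectDesc desc) {
    // Default behaviour mirrors the ternary above: only table/view and database are recognised.
    return desc.isTable() ? PrivilegeObjectType.TABLE_OR_VIEW : PrivilegeObjectType.DATABASE;
  }
}

// A custom authorizer could widen the mapping, e.g. to URI-level privileges.
class UriAwareAuthorizer implements Authorizer {
  @Override
  public PrivilegeObjectType convert(PrivilegeObjectDesc desc) {
    String obj = desc.getObject();
    if (obj != null && (obj.startsWith("hdfs://") || obj.startsWith("file://"))) {
      return PrivilegeObjectType.URI;
    }
    return Authorizer.super.convert(desc); // fall back to the default mapping
  }
}

class ConvertDemo {
  public static void main(String[] args) {
    Authorizer authorizer = new UriAwareAuthorizer();
    System.out.println(authorizer.convert(new PrivilegeObjectDesc(true, "db.tbl")));          // TABLE_OR_VIEW
    System.out.println(authorizer.convert(new PrivilegeObjectDesc(false, "hdfs:///data/x"))); // URI
  }
}
{noformat}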
[jira] [Updated] (HIVE-11182) Enable optimized hash tables for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-11182: -- Attachment: HIVE-11182.1-spark.patch The optimized table is not a {{MapJoinPersistableTableContainer}}, so in patch v1 we still dump the table as a HashMapWrapper, but we can optionally load it back as an optimized table. Enable optimized hash tables for spark [Spark Branch] - Key: HIVE-11182 URL: https://issues.apache.org/jira/browse/HIVE-11182 Project: Hive Issue Type: Improvement Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-11182.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
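As a rough, self-contained illustration of that dump/load split (not the actual Spark-branch code): the container classes below are stand-ins for HashMapWrapper and the optimized table, and the boolean flag stands in for whatever drives the choice at load time, presumably hive.mapjoin.optimized.hashtable.
{noformat}
import java.util.HashMap;
import java.util.Map;

// Stand-in for the persistable (HashMapWrapper-style) container the small table is dumped as.
class PersistableContainer {
  final Map<String, String> rows = new HashMap<>();
}

// Stand-in for the optimized container; the real one keeps rows in flat byte arrays,
// a plain map is used here only to keep the sketch runnable.
class OptimizedContainer {
  final Map<String, String> rows = new HashMap<>();
}

class HashTableLoader {
  // Dumping always produces the persistable form; loading may rebuild it as optimized.
  static Object load(PersistableContainer dumped, boolean useOptimized) {
    if (!useOptimized) {
      return dumped;                      // keep the persistable form as-is
    }
    OptimizedContainer optimized = new OptimizedContainer();
    optimized.rows.putAll(dumped.rows);   // re-insert rows into the optimized layout
    return optimized;
  }

  public static void main(String[] args) {
    PersistableContainer dumped = new PersistableContainer();
    dumped.rows.put("k1", "v1");
    System.out.println(load(dumped, true).getClass().getSimpleName());  // OptimizedContainer
    System.out.println(load(dumped, false).getClass().getSimpleName()); // PersistableContainer
  }
}
{noformat}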
[jira] [Updated] (HIVE-11190) ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard code when the value is not default
[ https://issues.apache.org/jira/browse/HIVE-11190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dapeng Sun updated HIVE-11190: -- Attachment: HIVE-11190.001.patch ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard code when the value is not default Key: HIVE-11190 URL: https://issues.apache.org/jira/browse/HIVE-11190 Project: Hive Issue Type: Bug Reporter: Dapeng Sun Assignee: Dapeng Sun Attachments: HIVE-11190.001.patch ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard-coded when the value is not the default; otherwise users cannot customize the METASTORE_FILTER_HOOK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11190) ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard code when the value is not default
[ https://issues.apache.org/jira/browse/HIVE-11190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616228#comment-14616228 ] ASF GitHub Bot commented on HIVE-11190: --- GitHub user sundapeng opened a pull request: https://github.com/apache/hive/pull/45 HIVE-11190: ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard code when the value is not default ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard code when the value is not default You can merge this pull request into a Git repository by running: $ git pull https://github.com/sundapeng/hive HIVE-11190 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/45.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #45 commit db87e59f6e1b213bfea9b6e84c056716c20210d5 Author: Sun Dapeng s...@apache.org Date: 2015-07-07T05:49:26Z HIVE-11190: ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard code when the value is not default ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard code when the value is not default Key: HIVE-11190 URL: https://issues.apache.org/jira/browse/HIVE-11190 Project: Hive Issue Type: Bug Reporter: Dapeng Sun Assignee: Dapeng Sun Attachments: HIVE-11190.001.patch ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard-coded when the value is not the default; otherwise users cannot customize the METASTORE_FILTER_HOOK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11190) ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard code when the value is not default
[ https://issues.apache.org/jira/browse/HIVE-11190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616229#comment-14616229 ] Ferdinand Xu commented on HIVE-11190: - [~dapengsun], thanks for your patch. LGTM. [~thejas], do you have any further comments on this patch? ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard code when the value is not default Key: HIVE-11190 URL: https://issues.apache.org/jira/browse/HIVE-11190 Project: Hive Issue Type: Bug Reporter: Dapeng Sun Assignee: Dapeng Sun Attachments: HIVE-11190.001.patch ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard-coded when the value is not the default; otherwise users cannot customize the METASTORE_FILTER_HOOK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
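A minimal sketch of the guard such a fix implies, assuming the goal is to install the authorization V2 filter hook only when the user has left the setting at its default. The property handling below is a simplified stand-in for HiveConf/ConfVars, and the concrete hook class names are illustrative examples, not quoted from the patch.
{noformat}
import java.util.Properties;

// Only force the authorization filter hook when the configured value is still the default,
// so a user-supplied METASTORE_FILTER_HOOK survives.
class FilterHookConfig {
  static final String KEY = "hive.metastore.filter.hook";
  // Example values; the exact class names are illustrative.
  static final String DEFAULT_HOOK =
      "org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl";
  static final String AUTHZ_V2_HOOK =
      "org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook";

  static void applyAuthorizationV2(Properties conf) {
    String current = conf.getProperty(KEY, DEFAULT_HOOK);
    if (DEFAULT_HOOK.equals(current)) {
      conf.setProperty(KEY, AUTHZ_V2_HOOK);  // still the default: safe to install the V2 hook
    }
    // Otherwise leave the user's custom hook untouched instead of hard-coding over it.
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(KEY, "com.example.MyFilterHook");  // hypothetical custom hook
    applyAuthorizationV2(conf);
    System.out.println(conf.getProperty(KEY));          // prints the custom hook, not AUTHZ_V2_HOOK
  }
}
{noformat}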
[jira] [Updated] (HIVE-11053) Add more tests for HIVE-10844[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GaoLun updated HIVE-11053: -- Attachment: HIVE-11053.2-spark.patch Format corrected. Add more tests for HIVE-10844[Spark Branch] --- Key: HIVE-11053 URL: https://issues.apache.org/jira/browse/HIVE-11053 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chengxiang Li Assignee: GaoLun Priority: Minor Attachments: HIVE-11053.1-spark.patch, HIVE-11053.2-spark.patch Add some test cases for self union, self-join, CTE, and repeated sub-queries to verify the job of combining equivalent works in HIVE-10844. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-10281) Update people page for the new committers
[ https://issues.apache.org/jira/browse/HIVE-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu reassigned HIVE-10281: --- Assignee: Ferdinand Xu Update people page for the new committers - Key: HIVE-10281 URL: https://issues.apache.org/jira/browse/HIVE-10281 Project: Hive Issue Type: Task Components: Website Reporter: Chao Sun Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10281) Update people page for the new committers
[ https://issues.apache.org/jira/browse/HIVE-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-10281: Description: NO PRECOMMIT TESTS Add Jesus and Chinna as committers on the people page Update people page for the new committers - Key: HIVE-10281 URL: https://issues.apache.org/jira/browse/HIVE-10281 Project: Hive Issue Type: Task Components: Website Reporter: Chao Sun Assignee: Ferdinand Xu NO PRECOMMIT TESTS Add Jesus and Chinna as committers on the people page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10281) Update people page for the new committers
[ https://issues.apache.org/jira/browse/HIVE-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-10281: Attachment: HIVE-10281.patch [~chinnalalam] [~jcamachorodriguez] Please help review it. Thank you! Update people page for the new committers - Key: HIVE-10281 URL: https://issues.apache.org/jira/browse/HIVE-10281 Project: Hive Issue Type: Task Components: Website Reporter: Chao Sun Assignee: Ferdinand Xu Attachments: HIVE-10281.patch NO PRECOMMIT TESTS Add Jesus and Chinna as committers on the people page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10281) Update people page for the new committers
[ https://issues.apache.org/jira/browse/HIVE-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614628#comment-14614628 ] Chinna Rao Lalam commented on HIVE-10281: - Thanks [~Ferd] for the patch. My details are correct. Update people page for the new committers - Key: HIVE-10281 URL: https://issues.apache.org/jira/browse/HIVE-10281 Project: Hive Issue Type: Task Components: Website Reporter: Chao Sun Assignee: Ferdinand Xu Attachments: HIVE-10281.patch NO PRECOMMIT TESTS Add Jesus and Chinna as committers on the people page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11183) Enable optimized hash tables for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li resolved HIVE-11183. --- Resolution: Duplicate Enable optimized hash tables for spark [Spark Branch] - Key: HIVE-11183 URL: https://issues.apache.org/jira/browse/HIVE-11183 Project: Hive Issue Type: Improvement Reporter: Rui Li Assignee: Rui Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11182) Enable optimized hash tables for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-11182: -- Component/s: Spark Enable optimized hash tables for spark [Spark Branch] - Key: HIVE-11182 URL: https://issues.apache.org/jira/browse/HIVE-11182 Project: Hive Issue Type: Improvement Components: Spark Reporter: Rui Li Assignee: Rui Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)