[jira] [Commented] (HIVE-3531) Simple lock manager for dedicated hive server
[ https://issues.apache.org/jira/browse/HIVE-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503654#comment-13503654 ]

Phabricator commented on HIVE-3531:
-----------------------------------

cwsteinbach has requested changes to the revision "HIVE-3531 [jira] Simple lock manager for dedicated hive server".

INLINE COMMENTS
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestEmbeddedLockManager.java:1 Please add an ASF license header.

REVISION DETAIL
  https://reviews.facebook.net/D5871

BRANCH
  DPAL-1906

To: JIRA, cwsteinbach, navis

Simple lock manager for dedicated hive server
---------------------------------------------
                Key: HIVE-3531
                URL: https://issues.apache.org/jira/browse/HIVE-3531
            Project: Hive
         Issue Type: Improvement
         Components: Locking, Server Infrastructure
           Reporter: Navis
           Assignee: Navis
           Priority: Trivial
        Attachments: HIVE-3531.D5871.1.patch, HIVE-3531.D5871.2.patch

In many cases, we use the Hive server as a sole proxy for executing all queries. For that, the current default lock manager based on ZooKeeper seems a little heavy; a simple in-memory lock manager could be enough.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
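The proposal above is to replace the ZooKeeper-backed lock manager with one that keeps locks in the HiveServer process's own memory, which is sufficient when a single server mediates all queries. The patch itself is not reproduced here; as a rough illustration of the idea only, a minimal in-memory lock manager might look like the following. The class and method names are hypothetical, not those of the actual EmbeddedLockManager, and modern Java collections are used for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: one read-write lock per resource path, held entirely
// in process memory, so it only coordinates queries on this one server.
public class InMemoryLockManager {
    private final Map<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String resource) {
        return locks.computeIfAbsent(resource, r -> new ReentrantReadWriteLock());
    }

    // SHARED (read) lock for readers, EXCLUSIVE (write) lock for writers.
    // Non-blocking: returns false immediately if the lock cannot be taken.
    public boolean lock(String resource, boolean exclusive) {
        ReentrantReadWriteLock rw = lockFor(resource);
        return exclusive ? rw.writeLock().tryLock() : rw.readLock().tryLock();
    }

    public void unlock(String resource, boolean exclusive) {
        ReentrantReadWriteLock rw = lockFor(resource);
        if (exclusive) {
            rw.writeLock().unlock();
        } else {
            rw.readLock().unlock();
        }
    }
}
```

As with Hive's real lock managers, multiple shared locks on the same resource coexist, while an exclusive lock is refused as long as any shared lock is held.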
[jira] [Commented] (HIVE-3531) Simple lock manager for dedicated hive server
[ https://issues.apache.org/jira/browse/HIVE-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503655#comment-13503655 ]

Carl Steinbach commented on HIVE-3531:
--------------------------------------

@Navis: Can you please add an ASF license header to TestEmbeddedLockManager? Looks good otherwise.
[jira] [Created] (HIVE-3743) DML Operations, load operation error.
Gaoshijie created HIVE-3743:
---------------------------

            Summary: DML Operations, load operation error.
                Key: HIVE-3743
                URL: https://issues.apache.org/jira/browse/HIVE-3743
            Project: Hive
         Issue Type: Bug
           Reporter: Gaoshijie

Running load commands repeatedly over a short period of time produces this error:

java.io.IOException: xceiverCount 256 exceeds the limit of concurrent xcievers 255
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:92)
        at java.lang.Thread.run(Thread.java:619)
[jira] [Updated] (HIVE-3743) DML Operations, load operation error.
[ https://issues.apache.org/jira/browse/HIVE-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaoshijie updated HIVE-3743:
----------------------------

Description updated to show the load commands being run:

LOAD DATA INPATH '/tmp/1' INTO TABLE tb1;
LOAD DATA INPATH '/tmp/2' INTO TABLE tb1;
LOAD DATA INPATH '/tmp/255' INTO TABLE tb1;
LOAD DATA INPATH '/tmp/256' INTO TABLE tb1;
LOAD DATA INPATH '/tmp/1000' INTO TABLE tb1;
[jira] [Updated] (HIVE-3743) DML Operations, load operation error.
[ https://issues.apache.org/jira/browse/HIVE-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaoshijie updated HIVE-3743:
----------------------------

Description updated to note where the error appears: the exception is taken from Hadoop's DataNode log, 'hadoop-user-datanode-host.log'.
[jira] [Updated] (HIVE-3743) DML Operations, load operation error.
[ https://issues.apache.org/jira/browse/HIVE-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaoshijie updated HIVE-3743:
----------------------------

Description updated: the DataNode log file name is generalized to 'hadoop-*-datanode-*.log'.
[jira] [Updated] (HIVE-3743) DML Operations, load operation error.
[ https://issues.apache.org/jira/browse/HIVE-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaoshijie updated HIVE-3743:
----------------------------

Description updated again (minor edits to the DataNode log file name).
[jira] [Updated] (HIVE-3743) DML Operations, load operation error.
[ https://issues.apache.org/jira/browse/HIVE-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaoshijie updated HIVE-3743:
----------------------------

Description updated to add a second error. About 8 minutes after each operation, 'hadoop-user-datanode-host.log' shows:

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(host:50010, storageID=DS-161359523-host-50010-1353919878809, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[
[jira] [Updated] (HIVE-3743) DML Operations, load operation error.
[ https://issues.apache.org/jira/browse/HIVE-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaoshijie updated HIVE-3743:
----------------------------

          Environment: suse 11 64bit, hadoop-core-0.20.203.0, hive-0.9.0
    Affects Version/s: 0.9.0
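The xceiverCount error reported in this issue is a DataNode-side limit rather than a Hive failure as such: each concurrent block read or write occupies one DataXceiver thread, and Hadoop 0.20's default cap is 256, which a burst of LOAD DATA operations can exhaust. A common operational workaround (a suggestion based on general Hadoop practice, not something stated in this issue) is to raise the limit in hdfs-site.xml on each DataNode and restart it; note that in these Hadoop versions the property name really is spelled "xcievers", and the value below is only illustrative:

```xml
<!-- hdfs-site.xml on each DataNode; 4096 is an illustrative value -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```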
[jira] [Commented] (HIVE-3531) Simple lock manager for dedicated hive server
[ https://issues.apache.org/jira/browse/HIVE-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503739#comment-13503739 ]

Phabricator commented on HIVE-3531:
-----------------------------------

navis has commented on the revision "HIVE-3531 [jira] Simple lock manager for dedicated hive server".

INLINE COMMENTS
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestEmbeddedLockManager.java:1 I remember adding it, but it was missed again. Sorry.

REVISION DETAIL
  https://reviews.facebook.net/D5871

BRANCH
  DPAL-1906

To: JIRA, cwsteinbach, navis
[jira] [Commented] (HIVE-3381) Result of outer join is not valid
[ https://issues.apache.org/jira/browse/HIVE-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503772#comment-13503772 ]

Phabricator commented on HIVE-3381:
-----------------------------------

navis has commented on the revision "HIVE-3381 [jira] Result of outer join is not valid".

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java:352 'skipVectors' has the same meaning as 'inputNulls' in the original source: it makes the output values for that index (alias) be filled with nulls. But I was always confused by the original name, which suggests the value for the index (alias) is null, and that can be either true or false. I'll change it back to 'inputNulls' if you prefer. When the flag for some index is true, the operator gets the metadata for that index, gets its length, and fills nulls for that length; 'offsets' is just the pre-calculated set of those offsets.
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java:363 The problem is that I'm also still not sure this patch is right. I'll add more comments.
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java:396 ok.

REVISION DETAIL
  https://reviews.facebook.net/D5565

To: JIRA, navis
Cc: njain

Result of outer join is not valid
---------------------------------
                Key: HIVE-3381
                URL: https://issues.apache.org/jira/browse/HIVE-3381
            Project: Hive
         Issue Type: Bug
         Components: Query Processor
   Affects Versions: 0.10.0
           Reporter: Navis
           Assignee: Navis
           Priority: Critical
        Attachments: HIVE-3381.D5565.3.patch

Outer joins, especially full outer joins or outer joins with a filter in the ON clause, do not produce correct results. For example, the query in test join_1to1.q

{code}
SELECT * FROM join_1to1_1 a full outer join join_1to1_2 b on a.key1 = b.key1 and a.value = 66 and b.value = 66 ORDER BY a.key1 ASC, a.key2 ASC, a.value ASC, b.key1 ASC, b.key2 ASC, b.value ASC;
{code}

returns

{code}
NULL	NULL	NULL	NULL	NULL	66
NULL	NULL	NULL	NULL	10050	66
NULL	NULL	NULL	10	10010	66
NULL	NULL	NULL	30	10030	88
NULL	NULL	NULL	35	10035	88
NULL	NULL	NULL	40	10040	88
NULL	NULL	NULL	40	10040	88
NULL	NULL	NULL	50	10050	88
NULL	NULL	NULL	50	10050	88
NULL	NULL	NULL	50	10050	88
NULL	NULL	NULL	70	10040	88
NULL	NULL	NULL	70	10040	88
NULL	NULL	NULL	70	10040	88
NULL	NULL	NULL	70	10040	88
NULL	NULL	66	NULL	NULL	NULL
NULL	10050	66	NULL	NULL	NULL
5	10005	66	5	10005	66
15	10015	66	NULL	NULL	NULL
20	10020	66	20	10020	66
25	10025	88	NULL	NULL	NULL
30	10030	66	NULL	NULL	NULL
35	10035	88	NULL	NULL	NULL
40	10040	66	NULL	NULL	NULL
40	10040	66	40	10040	66
40	10040	88	NULL	NULL	NULL
40	10040	88	NULL	NULL	NULL
50	10050	66	NULL	NULL	NULL
50	10050	66	50	10050	66
50	10050	66	50	10050	66
50	10050	88	NULL	NULL	NULL
50	10050	88	NULL	NULL	NULL
50	10050	88	NULL	NULL	NULL
50	10050	88	NULL	NULL	NULL
50	10050	88	NULL	NULL	NULL
50	10050	88	NULL	NULL	NULL
60	10040	66	60	10040	66
60	10040	66	60	10040	66
60	10040	66	60	10040	66
60	10040	66	60	10040	66
70	10040	66	NULL	NULL	NULL
70	10040	66	NULL	NULL	NULL
70	10040	66	NULL	NULL	NULL
70	10040	66	NULL	NULL	NULL
80	10040	88	NULL	NULL	NULL
80	10040	88	NULL	NULL	NULL
80	10040	88	NULL	NULL	NULL
80	10040	88	NULL	NULL	NULL
{code}

but that does not seem right. It should be

{code}
NULL	NULL	NULL	NULL	NULL	66
NULL	NULL	NULL	NULL	10050	66
NULL	NULL	NULL	10	10010	66
NULL	NULL	NULL	25	10025	66
NULL	NULL	NULL	30	10030	88
NULL	NULL	NULL	35	10035	88
NULL	NULL	NULL	40	10040	88
NULL	NULL	NULL	50	10050	88
NULL	NULL	NULL	70	10040	88
NULL	NULL	NULL	70	10040	88
NULL	NULL	NULL	80	10040	66
NULL	NULL	NULL	80	10040	66
NULL	NULL	66	NULL	NULL	NULL
NULL	10050	66	NULL	NULL	NULL
5	10005	66	5	10005	66
15	10015	66	NULL	NULL	NULL
20	10020	66	20	10020	66
25	10025	88	NULL	NULL	NULL
30	10030	66	NULL	NULL	NULL
35	10035	88
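The disagreement above is about ON-clause filter semantics for outer joins: in a FULL OUTER JOIN, a pair of rows that fails the ON condition must not cause either row to disappear; each unmatched row still appears once, padded with NULLs on the other side. A small standalone sketch of that rule follows. It is a hypothetical illustration of the semantics only, not Hive's CommonJoinOperator, and it ignores duplicate-key grouping for clarity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch of FULL OUTER JOIN semantics with an ON-clause predicate:
// pairs satisfying the predicate are joined; left/right rows that matched
// nothing are still emitted once, null-padded on the other side.
public class FullOuterJoin {
    public static <L, R> List<Object[]> join(List<L> left, List<R> right,
                                             BiPredicate<L, R> on) {
        List<Object[]> out = new ArrayList<>();
        boolean[] rightMatched = new boolean[right.size()];
        for (L l : left) {
            boolean matched = false;
            for (int i = 0; i < right.size(); i++) {
                if (on.test(l, right.get(i))) {
                    out.add(new Object[]{l, right.get(i)});
                    matched = true;
                    rightMatched[i] = true;
                }
            }
            if (!matched) out.add(new Object[]{l, null});      // left row, null-padded
        }
        for (int i = 0; i < right.size(); i++) {
            if (!rightMatched[i]) out.add(new Object[]{null, right.get(i)});
        }
        return out;
    }
}
```

With a predicate like the bug report's `a.value = 66 AND b.value = 66` filter, rows that fail the filter are not dropped; they surface as null-padded rows on both sides, which is exactly the difference between the two result sets quoted above.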
[jira] [Updated] (HIVE-3531) Simple lock manager for dedicated hive server
[ https://issues.apache.org/jira/browse/HIVE-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-3531:
------------------------------

    Attachment: HIVE-3531.D5871.3.patch

navis updated the revision "HIVE-3531 [jira] Simple lock manager for dedicated hive server".

Reviewers: JIRA, cwsteinbach

Added the ASF header. Sorry again.

REVISION DETAIL
  https://reviews.facebook.net/D5871

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestEmbeddedLockManager.java

To: JIRA, cwsteinbach, navis
[jira] [Updated] (HIVE-3733) Improve Hive's logic for conditional merge
[ https://issues.apache.org/jira/browse/HIVE-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated HIVE-3733:
---------------------------------

    Attachment: HIVE-3733.1.patch.txt

Attaching patch to fix the issue (used git diff --no-prefix ...). I tried using arc diff --jira HIVE-3733 and got:

PHP Fatal error: Call to undefined method ArcanistGitAPI::amendGitHeadCommit() in /Users/pradeepk/opensource-hive/.arc_jira_lib/arcanist/ArcJIRAConfiguration.php on line 173

I saw some other references to this error in different JIRAs but no solution was suggested; is there a fix for this issue? So I manually uploaded a diff (used git diff ..) to create the review: https://reviews.facebook.net/D6969

Improve Hive's logic for conditional merge
------------------------------------------
                Key: HIVE-3733
                URL: https://issues.apache.org/jira/browse/HIVE-3733
            Project: Hive
         Issue Type: Improvement
           Reporter: Pradeep Kamath
           Assignee: Pradeep Kamath
        Attachments: HIVE-3733.1.patch.txt

If hive.merge.mapfiles is set to true and hive.merge.mapredfiles is set to false, then when Hive encounters a FileSinkOperator while generating map-reduce tasks, it looks at the entire job to see whether it has a reducer; if it does, it will not merge. Instead, it should check whether the FileSinkOperator is a child of the reducer. That way, outputs generated in the mapper are merged and outputs generated in the reducer are not, which is the intended effect of setting those configs. Simple repro:

set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=false;
EXPLAIN FROM input_table
INSERT OVERWRITE TABLE output_table1 SELECT key, COUNT(*) GROUP BY key
INSERT OVERWRITE TABLE output_table2 SELECT *;

The output should contain a Conditional Operator, Mapred stages, and Move tasks.
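The fix described above changes the question from "does the job have a reducer?" to "is this FileSinkOperator a descendant of the reduce side?". As a toy illustration of that decision over an operator tree (hypothetical standalone code, not Hive's actual plan-generation logic), walking the ancestors of a file sink and looking for a ReduceSink marker is enough:

```java
// Toy operator tree: walk up from a FileSink to see whether any ancestor is
// a ReduceSink. An output with a ReduceSink ancestor is produced by the
// reducer; one without is produced on the map side. The merge decision then
// follows the matching config flag instead of a whole-job check.
public class MergeDecision {
    static class Op {
        final String name;
        final Op parent;
        Op(String name, Op parent) { this.name = name; this.parent = parent; }
    }

    static boolean isReduceSideOutput(Op fileSink) {
        for (Op cur = fileSink.parent; cur != null; cur = cur.parent) {
            if (cur.name.equals("ReduceSink")) return true;
        }
        return false;
    }

    // mergeMapFiles ~ hive.merge.mapfiles, mergeMapRedFiles ~ hive.merge.mapredfiles
    static boolean shouldMerge(Op fileSink, boolean mergeMapFiles, boolean mergeMapRedFiles) {
        return isReduceSideOutput(fileSink) ? mergeMapRedFiles : mergeMapFiles;
    }
}
```

Under the repro's settings (mapfiles=true, mapredfiles=false), a map-side sink merges and a reduce-side sink in the same job does not, whereas the old whole-job check would have suppressed both.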
[jira] [Commented] (HIVE-3733) Improve Hive's logic for conditional merge
[ https://issues.apache.org/jira/browse/HIVE-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503952#comment-13503952 ]

Pradeep Kamath commented on HIVE-3733:
--------------------------------------

Also, on my Mac a few tests fail:

testCliDriver_escape1
testCliDriver_escape2
testCliDriver_join29
testCliDriver_join35
testCliDriver_lineage1
testCliDriver_load_dyn_part14
testCliDriver_union10
testCliDriver_union12
testCliDriver_union18
testCliDriver_union30
testCliDriver_union4
testCliDriver_union6

I spot-checked a few of them (union4, union6) and they are due to differences in plan output: the new output seems to have more operators, including the conditional operator. I will look more into it; any guidance to help me would be greatly appreciated.
[jira] [Commented] (HIVE-3733) Improve Hive's logic for conditional merge
[ https://issues.apache.org/jira/browse/HIVE-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503960#comment-13503960 ]

Pradeep Kamath commented on HIVE-3733:
--------------------------------------

I noticed the following in HiveConf.java:

{code}
HIVEMERGEMAPFILES("hive.merge.mapfiles", true)
{code}

I suspect that with my new changes and the above setting, we are now merging files for map-only tasks where we previously were not. I am very new to the Hive code base; I would request a committer to take a look and confirm whether the new behavior is in fact the expected one, given that the default for merging map files is true.
[jira] [Created] (HIVE-3744) Thrift create_table should check types of table columns
Bhushan Mandhani created HIVE-3744:
----------------------------------

            Summary: Thrift create_table should check types of table columns
                Key: HIVE-3744
                URL: https://issues.apache.org/jira/browse/HIVE-3744
            Project: Hive
         Issue Type: Bug
         Components: Thrift API
           Reporter: Bhushan Mandhani
           Assignee: Bhushan Mandhani
           Priority: Minor

The Thrift create_table() call does not look at the datatype strings of Table objects coming in through Thrift. When a caller fails to set one of them, we can end up with an empty string for the datatype and corrupt metadata.
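A guard of the kind proposed above would validate the type string of every column before the table is written to the metastore. The following is only a sketch of that check: the FieldSchema stand-in mirrors the shape of the metastore's Thrift struct, but the class and method names here are hypothetical, not the actual patch.

```java
// Hypothetical validation sketch: reject null/blank column type strings
// before a table definition reaches the metastore.
public class ColumnTypeCheck {
    static class FieldSchema {          // minimal stand-in for the Thrift struct
        final String name;
        final String type;
        FieldSchema(String name, String type) { this.name = name; this.type = type; }
    }

    static void validate(Iterable<FieldSchema> cols) {
        for (FieldSchema c : cols) {
            if (c.type == null || c.type.trim().isEmpty()) {
                throw new IllegalArgumentException(
                    "Column '" + c.name + "' has no datatype set");
            }
        }
    }
}
```

Failing fast with an exception at the Thrift boundary is what prevents the corrupt metadata described above from ever being persisted.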
[jira] [Commented] (HIVE-3531) Simple lock manager for dedicated hive server
[ https://issues.apache.org/jira/browse/HIVE-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504044#comment-13504044 ]

Phabricator commented on HIVE-3531:
-----------------------------------

cwsteinbach has accepted the revision "HIVE-3531 [jira] Simple lock manager for dedicated hive server".

REVISION DETAIL
  https://reviews.facebook.net/D5871

BRANCH
  DPAL-1906

To: JIRA, cwsteinbach, navis
[jira] [Commented] (HIVE-3742) The derby metastore schema script for 0.10.0 doesn't run
[ https://issues.apache.org/jira/browse/HIVE-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504051#comment-13504051 ]

Prasad Mujumdar commented on HIVE-3742:
---------------------------------------

@Ashutosh, thanks for catching that. It looks like the INTEGER_IDX column should be included in the primary key of SKEWED_STRING_LIST_VALUES. The rest of the 0.10 schema scripts (mysql, oracle, and postgres) look fine.

The derby metastore schema script for 0.10.0 doesn't run
--------------------------------------------------------
                Key: HIVE-3742
                URL: https://issues.apache.org/jira/browse/HIVE-3742
            Project: Hive
         Issue Type: Bug
   Affects Versions: 0.10.0
           Reporter: Prasad Mujumdar
           Assignee: Prasad Mujumdar
        Attachments: HIVE-3742-2.patch, HIVE-3742.patch

The hive-schema-0.10.0.derby.sql script contains an incorrect ALTER statement for SKEWED_STRING_LIST, which causes script execution to fail.
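The shape of the fix suggested in the comment above is a composite primary key that includes the position column, so a skewed-value list can hold the same value at different positions. As an illustration only: the constraint name and the companion column name below are assumptions, since the actual column list is in hive-schema-0.10.0.derby.sql and may differ.

```sql
-- Illustrative Derby DDL sketch: INTEGER_IDX is part of the key so the same
-- value can appear at multiple positions in one string list.
ALTER TABLE "SKEWED_STRING_LIST_VALUES"
  ADD CONSTRAINT "SKEWED_STRING_LIST_VALUES_PK"
  PRIMARY KEY ("STRING_LIST_ID", "INTEGER_IDX");
```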
[jira] [Updated] (HIVE-3742) The derby metastore schema script for 0.10.0 doesn't run
[ https://issues.apache.org/jira/browse/HIVE-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-3742: -- Attachment: HIVE-3742-2.patch Additional patch per review comment The derby metastore schema script for 0.10.0 doesn't run Key: HIVE-3742 URL: https://issues.apache.org/jira/browse/HIVE-3742 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-3742-2.patch, HIVE-3742.patch The hive-schema-0.10.0.derby.sql contains incorrect alter statement for SKEWED_STRING_LIST which causes the script execution to fail
[jira] [Commented] (HIVE-3705) Adding authorization capability to the metastore
[ https://issues.apache.org/jira/browse/HIVE-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504069#comment-13504069 ] Phabricator commented on HIVE-3705: --- ashutoshc has requested changes to the revision HIVE-3705 [jira] Adding authorization capability to the metastore. Please see inline comments. INLINE COMMENTS conf/hive-default.xml.template:1254 I think it can be better worded as ... The hive metastore authorization.. conf/hive-default.xml.template:1269 Same as previous comment. conf/hive-default.xml.template:1269 This will read better as The authorization manager class name... conf/hive-default.xml.template:1284 Same as above. ql/src/java/org/apache/hadoop/hive/ql/security/HadoopDefaultMetastoreAuthenticator.java:27 It seems that you have created the new interface HiveMetaStoreAuthenticationProvider and added the method setHandler() just to call setConf(). If this is the only reason, then there is no need for this new interface, since HiveAuthenticationProvider already extends Configurable, and at instantiation time of this interface in HiveUtils you have access to the conf, which you can set. So this interface and this new implementation of it seem like overkill. ql/src/java/org/apache/hadoop/hive/ql/security/HiveMetastoreAuthenticationProvider.java:25 Need to update this javadoc. ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationPreEventListener.java:62 Is there any reason for creating a new instance of HiveConf? You can just save the passed-in config. ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationPreEventListener.java:64 Ideally, this bool should not be checked in here. If the pre-event listener is already set for this listener class, we can safely assume that the user intends this listener to fire and the checks to happen. In fact, I don't see the point of this boolean in HiveConf at all. 
Since these checks are meant to be invoked via the listener interface, the user already has to consciously put these class names in the config. Why then should he also have to set that boolean to true? More configs usually result in more confusion. ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationPreEventListener.java:125 This will break the chaining of the exception stack. I think it will be better done as: InvalidOperationException ex = new InvalidOperationException(e.getMessage()); ex.initCause(e.getCause()); throw ex; The same comment applies to all other try-catch blocks. ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationPreEventListener.java:112 For all other operations (including grant, revoke, etc.) no checks are performed and they are allowed straight through. I think you should add a note in the javadoc of this listener class that it only performs checks for create/add/alter/drop of db/tbl/partitions. I am not sure if the following deployment is supported: this listener configured with DefaultHiveAuthProvider, which does auth based on privs stored in the metastore? If it is, then it has the same problem that provider has. A user can grant himself all privs, no checks are done for that, and then drop tables/dbs. I understand you are not improving semantics in this patch, but merely shifting checks from the client to the metastore; I just wanted to make sure my understanding is correct. ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationPreEventListener.java:232 This method should be private. ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationPreEventListener.java:250 This method should be private. ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationPreEventListener.java:256 Why is this needed? In particular, this will set the location of the partition to the table's location in the null case. Is that desirable? 
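The exception-translation pattern suggested in the review above can be sketched in plain Java. This is a minimal, self-contained illustration, not the actual listener code: InvalidOperationException here is a local stand-in for the Thrift-generated metastore exception, which only takes a message. The point is that re-throwing with just e.getMessage() drops the original cause chain, while initCause() preserves it.

```java
// Sketch: wrap a caught exception while keeping its cause chain via initCause().
public class ChainingDemo {
    // Hypothetical stand-in for the message-only Thrift exception.
    public static class InvalidOperationException extends Exception {
        public InvalidOperationException(String message) { super(message); }
    }

    static void doCheck() {
        throw new RuntimeException("authorization failed");
    }

    public static void fire() throws InvalidOperationException {
        try {
            doCheck();
        } catch (RuntimeException e) {
            // new InvalidOperationException(e.getMessage()) alone would lose the
            // original stack; initCause() links it back in. (The review snippet
            // chains e.getCause(); chaining e itself keeps one more frame.)
            InvalidOperationException ex = new InvalidOperationException(e.getMessage());
            ex.initCause(e);
            throw ex;
        }
    }

    public static void main(String[] args) {
        try {
            fire();
        } catch (InvalidOperationException ex) {
            System.out.println(ex.getMessage());                      // authorization failed
            System.out.println(ex.getCause().getClass().getSimpleName()); // RuntimeException
        }
    }
}
```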
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/DefaultHiveAuthorizationProvider.java:30 You can just cast (HiveConf)conf, instead of doing new HiveConf()? ql/src/java/org/apache/hadoop/hive/ql/security/authorization/StorageBasedAuthorizationProvider.java:49 This is also defined in HiveMetaStore. It should be statically defined in HiveConf and referenced from there, instead of keeping a private copy in each class. ql/src/java/org/apache/hadoop/hive/ql/security/authorization/StorageBasedAuthorizationProvider.java:165 I am not sure about this, but is this about creating an index, in which case Write makes sense, or about reading an index, in which case Read should suffice? ql/src/java/org/apache/hadoop/hive/ql/security/authorization/StorageBasedAuthorizationProvider.java:168 Same as above. A lock could be a shared lock or an exclusive lock, resulting in the equivalent of read and write privs? ql/src/java/org/apache/hadoop/hive/ql/security/authorization/StorageBasedAuthorizationProvider.java:301 This method already exists in
[jira] [Created] (HIVE-3745) Hive does improper = based string comparisons with trailing whitespaces
Harsh J created HIVE-3745: - Summary: Hive does improper = based string comparisons with trailing whitespaces Key: HIVE-3745 URL: https://issues.apache.org/jira/browse/HIVE-3745 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.9.0 Reporter: Harsh J Unlike other systems such as DB2 and MySQL, which disregard trailing whitespace when comparing two strings with the {{=}} relational operator, Hive does not do this. For example, note the following line from the MySQL manual: http://dev.mysql.com/doc/refman/5.1/en/char.html {quote} All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces. {quote} Hive is whitespace-sensitive and treats trailing spaces of a string as significant when comparing. Ideally {{LIKE}} should remain sensitive to trailing spaces, but {{=}} should not. Is there a specific reason behind this difference of implementation in Hive's SQL?
[jira] [Updated] (HIVE-3745) Hive does improper = based string comparisons for strings with trailing whitespaces
[ https://issues.apache.org/jira/browse/HIVE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HIVE-3745: -- Summary: Hive does improper = based string comparisons for strings with trailing whitespaces (was: Hive does improper = based string comparisons with trailing whitespaces) Hive does improper = based string comparisons for strings with trailing whitespaces - Key: HIVE-3745 URL: https://issues.apache.org/jira/browse/HIVE-3745 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.9.0 Reporter: Harsh J Unlike other systems such as DB2 and MySQL, which disregard trailing whitespace when comparing two strings with the {{=}} relational operator, Hive does not do this. For example, note the following line from the MySQL manual: http://dev.mysql.com/doc/refman/5.1/en/char.html {quote} All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces. {quote} Hive is whitespace-sensitive and treats trailing spaces of a string as significant when comparing. Ideally {{LIKE}} should remain sensitive to trailing spaces, but {{=}} should not. Is there a specific reason behind this difference of implementation in Hive's SQL?
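The behavioral difference HIVE-3745 describes can be sketched in plain Java (not Hive code; padSpaceEquals is a hypothetical helper): plain equals(), like Hive's {{=}}, treats trailing spaces as significant, while a PADSPACE-style comparison in the spirit of MySQL's CHAR collations strips trailing spaces before comparing.

```java
// Sketch: Hive-style exact comparison vs. a minimal PADSPACE-style comparison.
public class PadSpaceDemo {
    // Strip trailing spaces only (leading spaces still matter), then compare.
    public static boolean padSpaceEquals(String a, String b) {
        return stripTrailing(a).equals(stripTrailing(b));
    }

    static String stripTrailing(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println("val_484".equals("val_484  "));          // false (Hive-style)
        System.out.println(padSpaceEquals("val_484", "val_484  ")); // true (MySQL-style)
    }
}
```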
[jira] [Created] (HIVE-3746) TRowSet resultset structure should be column-oriented
Carl Steinbach created HIVE-3746: Summary: TRowSet resultset structure should be column-oriented Key: HIVE-3746 URL: https://issues.apache.org/jira/browse/HIVE-3746 Project: Hive Issue Type: Sub-task Components: Server Infrastructure Reporter: Carl Steinbach Assignee: Carl Steinbach
[jira] [Commented] (HIVE-3746) TRowSet resultset structure should be column-oriented
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504103#comment-13504103 ] Carl Steinbach commented on HIVE-3746: -- Currently HS2 uses the following Thrift structures to represent a resultset: {noformat} // Represents a rowset struct TRowSet { // The starting row offset of this rowset. 1: required i64 startRowOffset 2: required list<TRow> rows } // Represents a row in a rowset. struct TRow { 1: required list<TColumnValue> colVals } union TColumnValue { 1: TBoolValue boolVal // BOOLEAN 2: TByteValue byteVal // TINYINT 3: TI16Value i16Val // SMALLINT 4: TI32Value i32Val // INT 5: TI64Value i64Val // BIGINT, TIMESTAMP 6: TDoubleValue doubleVal // FLOAT, DOUBLE 7: TStringValue stringVal // STRING, LIST, MAP, STRUCT, UNIONTYPE, BINARY } // A Boolean column value. struct TBoolValue { // NULL if value is unset. 1: optional bool value } ... struct TStringValue { 1: optional string value } {noformat} The problem with this approach is that Thrift unions are not very efficient, and we pay this cost on a per-field basis. Instead, we should make the result set structure column-oriented as follows: {noformat} // Represents a rowset struct TRowSet { // The starting row offset of this rowset. 1: required i64 startRowOffset 2: required list<TColumn> columns } union TColumn { 1: list<TBoolValue> boolColumn 2: list<TByteValue> byteColumn 3: list<TI16Value> i16Column 4: list<TI32Value> i32Column 5: list<TI64Value> i64Column 6: list<TDoubleValue> doubleColumn 7: list<TStringValue> stringColumn } {noformat} TRowSet resultset structure should be column-oriented - Key: HIVE-3746 URL: https://issues.apache.org/jira/browse/HIVE-3746 Project: Hive Issue Type: Sub-task Components: Server Infrastructure Reporter: Carl Steinbach Assignee: Carl Steinbach
[jira] [Commented] (HIVE-3746) TRowSet resultset structure should be column-oriented
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504126#comment-13504126 ] Carl Steinbach commented on HIVE-3746: -- This would probably be more efficient: {noformat} // Represents a rowset struct TRowSet { // The starting row offset of this rowset. 1: required i64 startRowOffset 2: required list<TColumn> columns } struct TColumn { 1: required list<i32> nullOffsets 2: required TColumnData columnData } union TColumnData { 1: list<bool> boolColumn 2: list<byte> byteColumn 3: list<i16> i16Column 4: list<i32> i32Column 5: list<i64> i64Column 6: list<double> doubleColumn 7: list<string> stringColumn } {noformat} We may be able to make this even more compact by using a run-length encoding scheme for the nullOffset vector (and possibly the ColumnData list too). TRowSet resultset structure should be column-oriented - Key: HIVE-3746 URL: https://issues.apache.org/jira/browse/HIVE-3746 Project: Hive Issue Type: Sub-task Components: Server Infrastructure Reporter: Carl Steinbach Assignee: Carl Steinbach
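The null-offset encoding proposed above can be illustrated with a small Java sketch: a column stores its non-null values densely, plus a list of row offsets that are NULL, instead of wrapping every cell in a per-value struct. The class and field names here (I32Column, nullOffsets, values) are illustrative, not the actual HS2 Thrift-generated types.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: column-oriented encoding with dense values plus explicit null offsets.
public class ColumnEncodingDemo {
    public static class I32Column {
        public final List<Integer> nullOffsets = new ArrayList<>(); // rows that are NULL
        public final List<Integer> values = new ArrayList<>();      // dense non-null data

        void add(Integer v, int rowOffset) {
            if (v == null) {
                nullOffsets.add(rowOffset);
            } else {
                values.add(v);
            }
        }
    }

    public static I32Column encode(Integer[] cells) {
        I32Column col = new I32Column();
        for (int i = 0; i < cells.length; i++) {
            col.add(cells[i], i);
        }
        return col;
    }

    public static void main(String[] args) {
        I32Column col = encode(new Integer[]{7, null, 42, null, 5});
        System.out.println(col.values);      // [7, 42, 5]
        System.out.println(col.nullOffsets); // [1, 3]
    }
}
```

A sparse nullOffsets list stays tiny when NULLs are rare, which is also why the comment suggests run-length encoding it for columns with long NULL runs.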
[jira] [Commented] (HIVE-3726) History file closed in finalize method
[ https://issues.apache.org/jira/browse/HIVE-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504178#comment-13504178 ] Ashutosh Chauhan commented on HIVE-3726: A couple of comments: * HiveServer uses TmpOutputFile, which it deletes (now via session.close()), but never closes the input stream it opens on that file in the readResults() and setupSessionIO() methods. This will leak file descriptors. HiveServer2 has copy-pasted this code, so the leak occurs there too. * I am not sure about the changes in SessionState per task. With your patch, each task will close the {{SessionState}} when it finishes. A query may have multiple tasks, so this implies every task will create a SessionState when it begins executing, which seems counter-intuitive: there should be one SessionState object across all tasks of the query, which was the case earlier too, wasn't it? History file closed in finalize method -- Key: HIVE-3726 URL: https://issues.apache.org/jira/browse/HIVE-3726 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-3726.2-r1411423.patch, HIVE-3736.1-r1411423.patch TestCliNegative fails intermittently because it's up to the garbage collector to close History files. This is only a problem if you deal with a lot of SessionState objects.
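The file-descriptor concern above comes down to closing streams deterministically instead of leaving them to finalize()/GC. A minimal sketch (readResults here is a hypothetical stand-in, not the actual HiveServer method): try-with-resources guarantees the reader is closed even on error paths, so deleting the temp file later leaves no dangling descriptor.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: deterministic close of a stream opened on a temp results file.
public class TmpFileDemo {
    public static String readResults(Path tmpOutputFile) throws IOException {
        // Reader is closed when the block exits, success or failure.
        try (BufferedReader reader = Files.newBufferedReader(tmpOutputFile)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("hive-results", ".txt");
        try (Writer w = Files.newBufferedWriter(tmp)) {
            w.write("484\tval_484\n");
        }
        System.out.println(readResults(tmp));
        Files.delete(tmp); // safe: no stream is still open on the file
    }
}
```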
[jira] [Updated] (HIVE-3726) History file closed in finalize method
[ https://issues.apache.org/jira/browse/HIVE-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3726: --- Affects Version/s: 0.10.0 0.9.0 Status: Open (was: Patch Available) History file closed in finalize method -- Key: HIVE-3726 URL: https://issues.apache.org/jira/browse/HIVE-3726 Project: Hive Issue Type: Bug Affects Versions: 0.9.0, 0.10.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-3726.2-r1411423.patch, HIVE-3736.1-r1411423.patch TestCliNegative fails intermittently because it's up to the garbage collector to close History files. This is only a problem if you deal with a lot of SessionState objects.
Jenkins build is back to normal : Hive-0.9.1-SNAPSHOT-h0.21 #211
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/211/
[jira] [Updated] (HIVE-3734) Static partition DML create duplicate files and records
[ https://issues.apache.org/jira/browse/HIVE-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3734: --- Description: Static DML create duplicate files and record. Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; -- check DML result desc formatted testtable partition (ds='2008-04-08', hr='11'); select count(1) from srcpart where ds='2008-04-08'; select count(1) from testtable where ds='2008-04-08'; select key, value from srcpart where ds='2008-04-08' and hr='11' and key = 484; set hive.optimize.listbucketing=true; explain extended select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; === was: Static DML create duplicate files and record. 
Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; -- list bucketing DML explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; -- check DML result desc formatted testtable partition (ds='2008-04-08', hr='11'); select count(1) from srcpart where ds='2008-04-08'; select count(1) from testtable where ds='2008-04-08'; select key, value from srcpart where ds='2008-04-08' and hr='11' and key = 484; set hive.optimize.listbucketing=true; explain extended select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; === Static partition DML create duplicate files and records --- Key: HIVE-3734 URL: https://issues.apache.org/jira/browse/HIVE-3734 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Gang Tim Liu Static DML create duplicate files and record. 
Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; -- check DML result desc formatted testtable partition (ds='2008-04-08', hr='11'); select
[jira] [Updated] (HIVE-3734) Static partition DML create duplicate files and records
[ https://issues.apache.org/jira/browse/HIVE-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3734: --- Description: Static DML create duplicate files and record. Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; -- check DML result desc formatted testtable partition (ds='2008-04-08', hr='11'); select count(1) from srcpart where ds='2008-04-08'; select count(1) from testtable where ds='2008-04-08'; select key, value from srcpart where ds='2008-04-08' and hr='11' and key = 484; explain extended select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; === was: Static DML create duplicate files and record. 
Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; -- check DML result desc formatted testtable partition (ds='2008-04-08', hr='11'); select count(1) from srcpart where ds='2008-04-08'; select count(1) from testtable where ds='2008-04-08'; select key, value from srcpart where ds='2008-04-08' and hr='11' and key = 484; set hive.optimize.listbucketing=true; explain extended select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; === Static partition DML create duplicate files and records --- Key: HIVE-3734 URL: https://issues.apache.org/jira/browse/HIVE-3734 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Gang Tim Liu Static DML create duplicate files and record. 
Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; -- check DML result desc formatted testtable partition (ds='2008-04-08', hr='11'); select count(1) from srcpart where ds='2008-04-08'; select count(1) from
[jira] [Updated] (HIVE-3734) Static partition DML create duplicate files and records
[ https://issues.apache.org/jira/browse/HIVE-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3734: --- Description: Static DML create duplicate files and record. Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; desc formatted testtable partition (ds='2008-04-08', hr='11'); select count(1) from srcpart where ds='2008-04-08'; select count(1) from testtable where ds='2008-04-08'; select key, value from srcpart where ds='2008-04-08' and hr='11' and key = 484; explain extended select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; === was: Static DML create duplicate files and record. 
Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; -- check DML result desc formatted testtable partition (ds='2008-04-08', hr='11'); select count(1) from srcpart where ds='2008-04-08'; select count(1) from testtable where ds='2008-04-08'; select key, value from srcpart where ds='2008-04-08' and hr='11' and key = 484; explain extended select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; === Static partition DML create duplicate files and records --- Key: HIVE-3734 URL: https://issues.apache.org/jira/browse/HIVE-3734 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Gang Tim Liu Static DML create duplicate files and record. 
Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; desc formatted testtable partition (ds='2008-04-08', hr='11'); select count(1) from srcpart where ds='2008-04-08'; select count(1) from testtable where ds='2008-04-08'; select key, value from srcpart where
[jira] [Created] (HIVE-3747) Provide hive operation name for hookContext
Sudhanshu Arora created HIVE-3747: - Summary: Provide hive operation name for hookContext Key: HIVE-3747 URL: https://issues.apache.org/jira/browse/HIVE-3747 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Sudhanshu Arora The hookContext exposed through ExecuteWithHookContext does not provide the name of the Hive operation. The following public API should be added in HookContext: public String getOperationName() { return SessionState.get().getHiveOperation().name(); }
[jira] [Updated] (HIVE-3234) getting the reporter in the recordwriter
[ https://issues.apache.org/jira/browse/HIVE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-3234: -- Attachment: HIVE-3234.D6987.1.patch omalley requested code review of HIVE-3234 [jira] getting the reporter in the recordwriter. Reviewers: JIRA HIVE-3736 : hive unit test case build failure. (Ashish Singh via Ashutosh Chauhan) We would like to generate some custom statistics and report them back to map/reduce when we implement the FileSinkOperator.RecordWriter interface. However, the current interface design doesn't allow us to get the map/reduce reporter object. Please extend the current FileSinkOperator.RecordWriter interface so that its close() method passes in a map/reduce reporter object. For the same reason, please also extend the RecordReader interface to include a reporter object so that users can pass in custom map/reduce counters. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D6987 AFFECTED FILES ivy/ivysettings.xml ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13InputFormat.java ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13OutputFormat.java ql/src/test/queries/clientpositive/custom_input_output_format.q ql/src/test/results/clientpositive/custom_input_output_format.q.out MANAGE HERALD DIFFERENTIAL RULES 
https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/16461/ To: JIRA, omalley getting the reporter in the recordwriter Key: HIVE-3234 URL: https://issues.apache.org/jira/browse/HIVE-3234 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.9.1 Environment: any Reporter: Jimmy Hu Assignee: Owen O'Malley Labels: newbie Fix For: 0.9.1 Attachments: HIVE-3234.D6699.1.patch, HIVE-3234.D6699.2.patch, HIVE-3234.D6987.1.patch Original Estimate: 48h Remaining Estimate: 48h We would like to generate some custom statistics and report back to map/reduce later when we implement the FileSinkOperator.RecordWriter interface. However, the current interface design doesn't allow us to get the map reduce reporter object. Please extend the current FileSinkOperator.RecordWriter interface so that its close() method passes in a map reduce reporter object. For the same reason, please also extend the RecordReader interface to include a reporter object so that users can pass in custom map reduce counters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
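The interface change requested in HIVE-3234 can be sketched in miniature. All types below are simplified, hypothetical stand-ins for the Hadoop/Hive classes (the real `Reporter` is `org.apache.hadoop.mapred.Reporter`); the point is only to show `close()` receiving the reporter so a writer can push custom counters back to the framework:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for the Hadoop MapReduce reporter.
interface Reporter {
    void incrCounter(String group, String counter, long amount);
}

// Extended RecordWriter as requested: close() now receives the reporter,
// so custom statistics can be reported when the writer finishes.
interface RecordWriter {
    void write(String row);
    void close(boolean abort, Reporter reporter);
}

public class ReporterSketch {
    // A toy writer that reports how many rows it wrote when closed.
    static class CountingWriter implements RecordWriter {
        private long rows = 0;
        public void write(String row) { rows++; }
        public void close(boolean abort, Reporter reporter) {
            reporter.incrCounter("custom", "rows_written", rows);
        }
    }

    static final Map<String, Long> counters = new HashMap<>();

    public static void main(String[] args) {
        // Collect counter updates into a map for demonstration.
        Reporter reporter = (g, c, n) -> counters.merge(g + "." + c, n, Long::sum);
        RecordWriter w = new CountingWriter();
        w.write("a");
        w.write("b");
        w.close(false, reporter);
        System.out.println(counters.get("custom.rows_written")); // prints 2
    }
}
```

The same pattern extends to the `RecordReader` side: pass the reporter in so a custom input format can increment counters while reading.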
[jira] [Commented] (HIVE-2983) Hive ant targets for publishing maven artifacts can be simplified
[ https://issues.apache.org/jira/browse/HIVE-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504274#comment-13504274 ] Travis Crawford commented on HIVE-2983: --- Re: publishing ant-tasks jar Agreed, I don't think we need to publish. I'll update. Re: special-casing exec Exec is different because it's not actually a subproject; it's generated in the {{ql}} subproject directory. The fatjar issue has been discussed back and forth a lot now. If there's interest, I'd very much like to freshen up the patch discussed in https://issues.apache.org/jira/browse/HIVE-2424?focusedCommentId=13262898page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13262898 Let me know what you'd like to do regarding the exec jar. Given the number of issues about it, there's lots of community interest in thin jars (in addition to hive-exec). Hive ant targets for publishing maven artifacts can be simplified - Key: HIVE-2983 URL: https://issues.apache.org/jira/browse/HIVE-2983 Project: Hive Issue Type: Improvement Reporter: Travis Crawford Assignee: Travis Crawford Priority: Minor Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2983.D2961.1.patch Hive has a few ant tasks related to publishing maven artifacts. As not all sub projects publish artifacts, the {{iterate}} macro that simplifies other tasks cannot be used in this context. Hive already uses the {{for}} task from ant-contrib, which works great here. {{build.xml}} can be simplified by using the for task when preparing maven artifacts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3723) Hive Driver leaks ZooKeeper connections
[ https://issues.apache.org/jira/browse/HIVE-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504297#comment-13504297 ] Gunther Hagleitner commented on HIVE-3723: -- The releaseLocks call happens at various places in the run method. I think that's fine too, since calling destroy would kill the HiveLockManager, which is shared between different context objects (i.e., that should only happen when someone specifically destroys the driver). As a side note, compile doesn't seem to acquire any locks, so the call there is just a protection against future versions that could. Hive Driver leaks ZooKeeper connections --- Key: HIVE-3723 URL: https://issues.apache.org/jira/browse/HIVE-3723 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-3723.1-r1411423.patch In certain error cases (e.g. a statement fails to compile, or there are semantic errors) the hive driver leaks zookeeper connections. This can be seen in the TestNegativeCliDriver test, which accumulates a large number of open file handles and fails if the max allowed number of file handles isn't at least 2048. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
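The lifecycle split discussed in this comment can be sketched as follows. This is a hypothetical toy model, not the real `org.apache.hadoop.hive.ql.Driver` code: per-statement locks are released in a finally block even on the failure path, while the shared lock manager (which holds the ZooKeeper connection) is only closed on an explicit destroy:

```java
import java.util.ArrayList;
import java.util.List;

public class LockLifecycle {
    // Stand-in for the ZooKeeper-backed HiveLockManager.
    static class LockManager {
        boolean closed = false;
        final List<String> held = new ArrayList<>();
        void lock(String name) { held.add(name); }
        void releaseAll() { held.clear(); }
        void close() { closed = true; }  // would close the ZK connection
    }

    final LockManager lockMgr = new LockManager();  // shared across queries

    int run(String query) {
        try {
            lockMgr.lock(query);
            if (query.isEmpty()) return 11;  // simulated compile failure
            return 0;
        } finally {
            lockMgr.releaseAll();  // always release this statement's locks...
        }                          // ...but never close the manager here
    }

    void destroy() { lockMgr.close(); }  // only on explicit driver destroy

    public static void main(String[] args) {
        LockLifecycle d = new LockLifecycle();
        d.run("");                                  // failing statement
        System.out.println(d.lockMgr.held.size());  // 0: no leaked locks
        System.out.println(d.lockMgr.closed);       // false: manager survives
        d.destroy();
        System.out.println(d.lockMgr.closed);       // true
    }
}
```

Releasing in finally fixes the leak the issue describes (locks surviving a failed compile), without killing the manager other contexts still share.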
[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504337#comment-13504337 ] Shreepadma Venugopalan commented on HIVE-3678: -- @Ashutosh: I've uploaded a new patch which adds 2 varchar columns for storing BigDecimal low and high values. Thanks. Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3678: - Attachment: HIVE-3678.3.patch.txt Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, HIVE-3678.3.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504338#comment-13504338 ] Shreepadma Venugopalan commented on HIVE-3678: -- Updated patch is available on both JIRA and RB. Thanks. Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, HIVE-3678.3.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3633) sort-merge join does not work with sub-queries
[ https://issues.apache.org/jira/browse/HIVE-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3633: - Status: Open (was: Patch Available) There were some comments on phabricator by Kevin sort-merge join does not work with sub-queries -- Key: HIVE-3633 URL: https://issues.apache.org/jira/browse/HIVE-3633 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3633.1.patch, hive.3633.2.patch, hive.3633.3.patch, hive.3633.4.patch, hive.3633.5.patch, hive.3633.6.patch Consider the following query: create table smb_bucket_1(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 6 BUCKETS STORED AS TEXTFILE; create table smb_bucket_2(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 6 BUCKETS STORED AS TEXTFILE; -- load the above tables set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; explain select count(*) from ( select /*+mapjoin(a)*/ a.key as key1, b.key as key2, a.value as value1, b.value as value2 from smb_bucket_1 a join smb_bucket_2 b on a.key = b.key) subq; The above query does not use sort-merge join. This would be very useful as we automatically convert the queries to use sorting and bucketing properties for join. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3381) Result of outer join is not valid
[ https://issues.apache.org/jira/browse/HIVE-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504355#comment-13504355 ] Phabricator commented on HIVE-3381: --- njain has commented on the revision HIVE-3381 [jira] Result of outer join is not valid. Navis, I am not saying that the old code is good. On the contrary, it is really difficult to follow. There is a serious lack of comments in it, and we have to improve that. I really appreciate that you are fixing this very serious bug, but it would be really useful if you could add lots of comments so that it becomes much easier to maintain and enhance in the future. REVISION DETAIL https://reviews.facebook.net/D5565 To: JIRA, navis Cc: njain Result of outer join is not valid - Key: HIVE-3381 URL: https://issues.apache.org/jira/browse/HIVE-3381 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Navis Assignee: Navis Priority: Critical Attachments: HIVE-3381.D5565.3.patch Outer joins, especially full outer joins or outer joins with a filter in the ON clause, do not produce proper results. 
For example, query in test join_1to1.q {code} SELECT * FROM join_1to1_1 a full outer join join_1to1_2 b on a.key1 = b.key1 and a.value = 66 and b.value = 66 ORDER BY a.key1 ASC, a.key2 ASC, a.value ASC, b.key1 ASC, b.key2 ASC, b.value ASC; {code} results {code} NULL NULLNULLNULLNULL66 NULL NULLNULLNULL10050 66 NULL NULLNULL10 10010 66 NULL NULLNULL30 10030 88 NULL NULLNULL35 10035 88 NULL NULLNULL40 10040 88 NULL NULLNULL40 10040 88 NULL NULLNULL50 10050 88 NULL NULLNULL50 10050 88 NULL NULLNULL50 10050 88 NULL NULLNULL70 10040 88 NULL NULLNULL70 10040 88 NULL NULLNULL70 10040 88 NULL NULLNULL70 10040 88 NULL NULL66 NULLNULLNULL NULL 10050 66 NULLNULLNULL 5 10005 66 5 10005 66 1510015 66 NULLNULLNULL 2010020 66 20 10020 66 2510025 88 NULLNULLNULL 3010030 66 NULLNULLNULL 3510035 88 NULLNULLNULL 4010040 66 NULLNULLNULL 4010040 66 40 10040 66 4010040 88 NULLNULLNULL 4010040 88 NULLNULLNULL 5010050 66 NULLNULLNULL 5010050 66 50 10050 66 5010050 66 50 10050 66 5010050 88 NULLNULLNULL 5010050 88 NULLNULLNULL 5010050 88 NULLNULLNULL 5010050 88 NULLNULLNULL 5010050 88 NULLNULLNULL 5010050 88 NULLNULLNULL 6010040 66 60 10040 66 6010040 66 60 10040 66 6010040 66 60 10040 66 6010040 66 60 10040 66 7010040 66 NULLNULLNULL 7010040 66 NULLNULLNULL 7010040 66 NULLNULLNULL 7010040 66 NULLNULLNULL 8010040 88 NULLNULLNULL 8010040 88 NULLNULLNULL 8010040 88 NULLNULLNULL 8010040 88 NULLNULLNULL {code} but it seemed not right. 
This should be {code} NULL NULLNULLNULLNULL66 NULL NULLNULLNULL10050 66 NULL NULLNULL10 10010 66 NULL NULLNULL25 10025 66 NULL NULLNULL30 10030 88 NULL NULLNULL35 10035 88 NULL NULLNULL40 10040 88 NULL NULLNULL50 10050 88 NULL NULLNULL70 10040 88 NULL NULLNULL70 10040 88 NULL NULLNULL80 10040 66 NULL NULLNULL80 10040 66 NULL NULL66 NULLNULLNULL NULL 10050 66 NULLNULLNULL 5 10005 66 5 10005 66 1510015 66 NULLNULLNULL 2010020 66 20 10020 66 2510025 88 NULLNULLNULL 3010030 66 NULLNULLNULL 3510035 88 NULLNULLNULL 4010040 66 40 10040 66 4010040 88 NULLNULLNULL 5010050 66 50 10050 66 5010050 66 50 10050 66 5010050 88 NULLNULLNULL 5010050 88 NULLNULLNULL 6010040 66 60 10040 66 6010040 66 60 10040 66 6010040 66 60 10040 66 6010040 66 60 10040 66 7010040
[jira] [Commented] (HIVE-3733) Improve Hive's logic for conditional merge
[ https://issues.apache.org/jira/browse/HIVE-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504362#comment-13504362 ] Namit Jain commented on HIVE-3733: -- Comments on phabricator. testCliDriver_escape1 and testCliDriver_escape2 always fail on Mac. For the plan output changes, can you check the test and verify whether the new plan is correct? If you think the new plan is correct, modify the output file. Improve Hive's logic for conditional merge -- Key: HIVE-3733 URL: https://issues.apache.org/jira/browse/HIVE-3733 Project: Hive Issue Type: Improvement Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: HIVE-3733.1.patch.txt If the config hive.merge.mapfiles is set to true and hive.merge.mapredfiles is set to false, then when hive encounters a FileSinkOperator while generating map reduce tasks, it will look at the entire job to see if it has a reducer; if it does, it will not merge. Instead it should check whether the FileSinkOperator is a child of the reducer. This means that outputs generated in the mapper will be merged and outputs generated in the reducer will not be, which is the intended effect of setting those configs. Simple repro: set hive.merge.mapfiles=true; set hive.merge.mapredfiles=false; EXPLAIN FROM input_table INSERT OVERWRITE TABLE output_table1 SELECT key, COUNT(*) group by key INSERT OVERWRITE TABLE output_table2 SELECT *; The output should contain a Conditional Operator, Mapred Stages, and Move tasks -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
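The merge decision the issue proposes can be modeled as a small operator-tree check. This is a hypothetical sketch with made-up names, not Hive's actual planner code: the old logic disabled merging for every sink as soon as the job had a reducer, while the proposed logic skips merging only for sinks that are descendants of the reducer, so map-side outputs still get merged:

```java
import java.util.ArrayList;
import java.util.List;

public class MergeDecision {
    // Minimal operator node with children, standing in for Hive's Operator.
    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();
        Op(String name) { this.name = name; }
        Op add(Op child) { children.add(child); return child; }
    }

    // true if 'sink' is reachable from 'op', i.e. the sink runs under it
    static boolean isDescendant(Op op, Op sink) {
        if (op == sink) return true;
        for (Op c : op.children)
            if (isDescendant(c, sink)) return true;
        return false;
    }

    // Proposed logic: the decision depends on WHERE the sink sits,
    // not on whether the job happens to have a reducer at all.
    static boolean shouldMerge(Op reducer, Op sink,
                               boolean mergeMapFiles, boolean mergeMapredFiles) {
        boolean underReducer = reducer != null && isDescendant(reducer, sink);
        return underReducer ? mergeMapredFiles : mergeMapFiles;
    }

    public static void main(String[] args) {
        Op mapper = new Op("TS");                  // table scan in the mapper
        Op mapSink = mapper.add(new Op("FS_map")); // map-side file sink
        Op reducer = new Op("GBY");                // group-by in the reducer
        Op redSink = reducer.add(new Op("FS_red")); // reduce-side file sink

        // hive.merge.mapfiles=true, hive.merge.mapredfiles=false
        System.out.println(shouldMerge(reducer, mapSink, true, false)); // true
        System.out.println(shouldMerge(reducer, redSink, true, false)); // false
    }
}
```

With the repro settings above, the map-side sink merges and the reduce-side sink does not, matching the intended behavior described in the issue.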
[jira] [Commented] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time
[ https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504365#comment-13504365 ] Namit Jain commented on HIVE-3718: -- 30005 is already being used. Can you use a new error code ? Add check to determine whether partition can be dropped at Semantic Analysis time - Key: HIVE-3718 URL: https://issues.apache.org/jira/browse/HIVE-3718 Project: Hive Issue Type: Task Components: CLI Reporter: Pamela Vagata Assignee: Pamela Vagata Priority: Minor Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3720) Expand and standardize authorization in Hive
[ https://issues.apache.org/jira/browse/HIVE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504375#comment-13504375 ] Namit Jain commented on HIVE-3720: -- [~shreepadma], how is it different from the current hive authorization model ? Is the proposed functionality a superset of the existing one ? If yes, can you mark in the wiki, what has been already implemented ? Expand and standardize authorization in Hive Key: HIVE-3720 URL: https://issues.apache.org/jira/browse/HIVE-3720 Project: Hive Issue Type: Improvement Components: Authorization Affects Versions: 0.9.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: Hive_Authorization_Functionality.pdf The existing implementation of authorization in Hive is not complete. Additionally the existing implementation has security holes. This JIRA is an umbrella JIRA for a) extending authorization to all SQL operations and direct metadata operations, and b) standardizing the authorization model and its semantics to mirror that of MySQL as closely as possible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time
[ https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pamela Vagata updated HIVE-3718: Attachment: HIVE-3718.3.patch.txt Got it, I've updated to use 30011 Add check to determine whether partition can be dropped at Semantic Analysis time - Key: HIVE-3718 URL: https://issues.apache.org/jira/browse/HIVE-3718 Project: Hive Issue Type: Task Components: CLI Reporter: Pamela Vagata Assignee: Pamela Vagata Priority: Minor Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, HIVE-3718.3.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3234) getting the reporter in the recordwriter
[ https://issues.apache.org/jira/browse/HIVE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504406#comment-13504406 ] Ashutosh Chauhan commented on HIVE-3234: Looks good. It seems that the patch contains another patch. Can you get rid of ivysettings.xml? +1. I am running tests now, will commit if tests pass. getting the reporter in the recordwriter Key: HIVE-3234 URL: https://issues.apache.org/jira/browse/HIVE-3234 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.9.1 Environment: any Reporter: Jimmy Hu Assignee: Owen O'Malley Labels: newbie Fix For: 0.9.1 Attachments: HIVE-3234.D6699.1.patch, HIVE-3234.D6699.2.patch, HIVE-3234.D6987.1.patch Original Estimate: 48h Remaining Estimate: 48h We would like to generate some custom statistics and report back to map/reduce later when we implement the FileSinkOperator.RecordWriter interface. However, the current interface design doesn't allow us to get the map reduce reporter object. Please extend the current FileSinkOperator.RecordWriter interface so that its close() method passes in a map reduce reporter object. For the same reason, please also extend the RecordReader interface to include a reporter object so that users can pass in custom map reduce counters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3723) Hive Driver leaks ZooKeeper connections
[ https://issues.apache.org/jira/browse/HIVE-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504408#comment-13504408 ] Ashutosh Chauhan commented on HIVE-3723: Hmm.. make sense. +1 will commit if tests pass. Hive Driver leaks ZooKeeper connections --- Key: HIVE-3723 URL: https://issues.apache.org/jira/browse/HIVE-3723 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-3723.1-r1411423.patch In certain error cases (i.e.: statement fails to compile, semantic errors) the hive driver leaks zookeeper connections. This can be seen in the TestNegativeCliDriver test which accumulates a large number of open file handles and fails if the max allowed number of file handles isn't at least 2048. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3742) The derby metastore schema script for 0.10.0 doesn't run
[ https://issues.apache.org/jira/browse/HIVE-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504411#comment-13504411 ] Ashutosh Chauhan commented on HIVE-3742: +1 The derby metastore schema script for 0.10.0 doesn't run Key: HIVE-3742 URL: https://issues.apache.org/jira/browse/HIVE-3742 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-3742-2.patch, HIVE-3742.patch The hive-schema-0.10.0.derby.sql contains incorrect alter statement for SKEWED_STRING_LIST which causes the script execution to fail -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3633) sort-merge join does not work with sub-queries
[ https://issues.apache.org/jira/browse/HIVE-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3633: - Attachment: hive.3633.7.patch sort-merge join does not work with sub-queries -- Key: HIVE-3633 URL: https://issues.apache.org/jira/browse/HIVE-3633 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3633.1.patch, hive.3633.2.patch, hive.3633.3.patch, hive.3633.4.patch, hive.3633.5.patch, hive.3633.6.patch, hive.3633.7.patch Consider the following query: create table smb_bucket_1(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 6 BUCKETS STORED AS TEXTFILE; create table smb_bucket_2(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 6 BUCKETS STORED AS TEXTFILE; -- load the above tables set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; explain select count(*) from ( select /*+mapjoin(a)*/ a.key as key1, b.key as key2, a.value as value1, b.value as value2 from smb_bucket_1 a join smb_bucket_2 b on a.key = b.key) subq; The above query does not use sort-merge join. This would be very useful as we automatically convert the queries to use sorting and bucketing properties for join. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504416#comment-13504416 ] Ashutosh Chauhan commented on HIVE-3678: +1 patch looks good, though fails to apply cleanly. Can you update the patch on latest trunk? Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, HIVE-3678.3.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3645) RCFileWriter does not implement the right function to support Federation
[ https://issues.apache.org/jira/browse/HIVE-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504419#comment-13504419 ] Ashutosh Chauhan commented on HIVE-3645: +1 will commit if tests pass. RCFileWriter does not implement the right function to support Federation Key: HIVE-3645 URL: https://issues.apache.org/jira/browse/HIVE-3645 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.0, 0.10.0 Environment: Hadoop 0.23.3 federation, Hive 0.9 and Pig 0.10 Reporter: Viraj Bhat Attachments: HIVE_3645_branch_0.patch, HIVE_3645_trunk_0.patch Create a table using Hive DDL {code} CREATE TABLE tmp_hcat_federated_numbers_part_1 ( id int, intnum int, floatnum float ) partitioned by ( part1 string, part2 string ) STORED AS rcfile LOCATION 'viewfs:///database/tmp_hcat_federated_numbers_part_1'; {code} Populate it using Pig: {code} A = load 'default.numbers_pig' using org.apache.hcatalog.pig.HCatLoader(); B = filter A by id = 500; C = foreach B generate (int)id, (int)intnum, (float)floatnum; store C into 'default.tmp_hcat_federated_numbers_part_1' using org.apache.hcatalog.pig.HCatStorer ('part1=pig, part2=hcat_pig_insert', 'id: int,intnum: int,floatnum: float'); {code} Generates the following error when running on a Federated Cluster: {quote} 2012-10-29 20:40:25,011 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: AttemptID:attempt_1348522594824_0846_m_00_3 Info:Error: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:479) at org.apache.hadoop.hive.ql.io.RCFile$Writer.<init>(RCFile.java:723) at org.apache.hadoop.hive.ql.io.RCFile$Writer.<init>(RCFile.java:705) at org.apache.hadoop.hive.ql.io.RCFileOutputFormat.getRecordWriter(RCFileOutputFormat.java:86) at 
org.apache.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:100) at org.apache.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:228) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:84) at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:587) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:706) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152) {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
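The failure mode in this stack trace can be illustrated with a toy model (the classes below are simplified stand-ins for the Hadoop ones, not the real API). Under viewfs, `getDefaultReplication()` with no path has no mount point to resolve and throws; passing the path of the file being written lets viewfs resolve the mount table entry, which is the shape of the fix attached here:

```java
public class ViewFsReplication {
    static class NotInMountpointException extends RuntimeException {
        NotInMountpointException(String m) { super(m); }
    }

    // Mimics the viewfs behavior seen in the stack trace above.
    static class ViewFileSystem {
        short getDefaultReplication() {  // path-less call: invalid on viewfs
            throw new NotInMountpointException(
                "getDefaultReplication on empty path is invalid");
        }
        short getDefaultReplication(String path) {  // resolves via mount table
            return 3;
        }
    }

    // Old writer behavior: no path, so it blows up on federated clusters.
    static short replicationOld(ViewFileSystem fs) {
        return fs.getDefaultReplication();
    }

    // Fixed behavior: pass the file being written so the mount resolves.
    static short replicationNew(ViewFileSystem fs, String file) {
        return fs.getDefaultReplication(file);
    }

    public static void main(String[] args) {
        ViewFileSystem fs = new ViewFileSystem();
        System.out.println(replicationNew(fs, "/database/part-00000")); // 3
        try {
            replicationOld(fs);
        } catch (NotInMountpointException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The same reasoning applies to any other path-less FileSystem default query (block size, etc.) invoked from a writer running against a federated namespace.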
[jira] [Commented] (HIVE-3648) HiveMetaStoreFsImpl is not compatible with hadoop viewfs
[ https://issues.apache.org/jira/browse/HIVE-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504423#comment-13504423 ] Ashutosh Chauhan commented on HIVE-3648: +1 will commit if tests pass. HiveMetaStoreFsImpl is not compatible with hadoop viewfs Key: HIVE-3648 URL: https://issues.apache.org/jira/browse/HIVE-3648 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.9.0, 0.10.0 Reporter: Kihwal Lee Attachments: HIVE_3648_branch_0.patch, HIVE-3648-trunk-0.patch, HIVE_3648_trunk_1.patch HiveMetaStoreFsImpl#deleteDir() method calls Trash#moveToTrash(). This may not work when viewfs is used. It needs to call Trash#moveToAppropriateTrash() instead. Please note that this method is not available in hadoop versions earlier than 0.23. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
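The difference between the two Trash calls described in this issue can be modeled schematically. This is a hypothetical toy, not the real Hadoop Trash API: `moveToTrash` resolves the trash directory against the wrapping viewfs and fails, while `moveToAppropriateTrash` (available from Hadoop 0.23) first resolves the path to the underlying mounted filesystem:

```java
public class TrashDispatch {
    // Minimal stand-in for org.apache.hadoop.fs.Path with a scheme.
    static class Path {
        final String scheme, path;
        Path(String scheme, String path) { this.scheme = scheme; this.path = path; }
    }

    // Models Trash#moveToTrash: viewfs itself has no .Trash to move into.
    static boolean moveToTrash(Path p) {
        return !p.scheme.equals("viewfs");
    }

    // Models Trash#moveToAppropriateTrash: resolve the viewfs mount to the
    // backing filesystem before trashing (here assumed to be hdfs).
    static boolean moveToAppropriateTrash(Path p) {
        Path resolved = p.scheme.equals("viewfs")
            ? new Path("hdfs", p.path) : p;
        return moveToTrash(resolved);
    }

    public static void main(String[] args) {
        Path dir = new Path("viewfs", "/database/tmp_table");
        System.out.println(moveToTrash(dir));            // false: old behavior
        System.out.println(moveToAppropriateTrash(dir)); // true: fixed
    }
}
```

As the comment notes, a real HiveMetaStoreFsImpl#deleteDir fix would also need a fallback for Hadoop versions earlier than 0.23, where only the older call exists.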