[jira] [Commented] (HIVE-3531) Simple lock manager for dedicated hive server
[ https://issues.apache.org/jira/browse/HIVE-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503654#comment-13503654 ]

Phabricator commented on HIVE-3531:
-----------------------------------

cwsteinbach has requested changes to the revision "HIVE-3531 [jira] Simple lock manager for dedicated hive server".

INLINE COMMENTS
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestEmbeddedLockManager.java:1 Please add an ASF license header.

REVISION DETAIL
  https://reviews.facebook.net/D5871

BRANCH
  DPAL-1906

To: JIRA, cwsteinbach, navis

Simple lock manager for dedicated hive server
---------------------------------------------
                Key: HIVE-3531
                URL: https://issues.apache.org/jira/browse/HIVE-3531
            Project: Hive
         Issue Type: Improvement
         Components: Locking, Server Infrastructure
           Reporter: Navis
           Assignee: Navis
           Priority: Trivial
        Attachments: HIVE-3531.D5871.1.patch, HIVE-3531.D5871.2.patch

In many cases, we use the Hive server as a sole proxy for executing all queries. For that, the current default lock manager based on ZooKeeper seems a little heavy; a simple in-memory lock manager could be enough.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
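The proposal above is to replace the ZooKeeper-backed lock manager with one that keeps locks in the HiveServer process's own memory, which is sufficient when a single server mediates all queries. The patch itself is not reproduced here; as a rough illustration of the idea only, a minimal in-memory lock manager might look like the following. The class and method names are hypothetical, not those of the actual EmbeddedLockManager, and modern Java collections are used for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: one read-write lock per resource path, held entirely
// in process memory, so it only coordinates queries on this one server.
public class InMemoryLockManager {
    private final Map<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String resource) {
        return locks.computeIfAbsent(resource, r -> new ReentrantReadWriteLock());
    }

    // SHARED (read) lock for readers, EXCLUSIVE (write) lock for writers.
    // Non-blocking: returns false immediately if the lock cannot be taken.
    public boolean lock(String resource, boolean exclusive) {
        ReentrantReadWriteLock rw = lockFor(resource);
        return exclusive ? rw.writeLock().tryLock() : rw.readLock().tryLock();
    }

    public void unlock(String resource, boolean exclusive) {
        ReentrantReadWriteLock rw = lockFor(resource);
        if (exclusive) {
            rw.writeLock().unlock();
        } else {
            rw.readLock().unlock();
        }
    }
}
```

As with Hive's real lock managers, multiple shared locks on the same resource coexist, while an exclusive lock is refused as long as any shared lock is held.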
[jira] [Commented] (HIVE-3531) Simple lock manager for dedicated hive server
[ https://issues.apache.org/jira/browse/HIVE-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503655#comment-13503655 ]

Carl Steinbach commented on HIVE-3531:
--------------------------------------

@Navis: Can you please add an ASF license header to TestEmbeddedLockManager? Looks good otherwise.
[jira] [Created] (HIVE-3743) DML Operations, load operation error.
Gaoshijie created HIVE-3743:
---------------------------

            Summary: DML Operations, load operation error.
                Key: HIVE-3743
                URL: https://issues.apache.org/jira/browse/HIVE-3743
            Project: Hive
         Issue Type: Bug
           Reporter: Gaoshijie

Running load commands repeatedly over a short period of time produces this error:

java.io.IOException: xceiverCount 256 exceeds the limit of concurrent xcievers 255
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:92)
        at java.lang.Thread.run(Thread.java:619)
[jira] [Updated] (HIVE-3743) DML Operations, load operation error.
[ https://issues.apache.org/jira/browse/HIVE-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaoshijie updated HIVE-3743:
----------------------------

Description updated to show the load commands being run:

LOAD DATA INPATH '/tmp/1' INTO TABLE tb1;
LOAD DATA INPATH '/tmp/2' INTO TABLE tb1;
LOAD DATA INPATH '/tmp/255' INTO TABLE tb1;
LOAD DATA INPATH '/tmp/256' INTO TABLE tb1;
LOAD DATA INPATH '/tmp/1000' INTO TABLE tb1;
[jira] [Updated] (HIVE-3743) DML Operations, load operation error.
[ https://issues.apache.org/jira/browse/HIVE-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaoshijie updated HIVE-3743:
----------------------------

Description updated to note where the error appears: the exception is taken from Hadoop's DataNode log, 'hadoop-user-datanode-host.log'.
[jira] [Updated] (HIVE-3743) DML Operations, load operation error.
[ https://issues.apache.org/jira/browse/HIVE-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaoshijie updated HIVE-3743:
----------------------------

Description updated: the DataNode log file name is generalized to 'hadoop-*-datanode-*.log'.
[jira] [Updated] (HIVE-3743) DML Operations, load operation error.
[ https://issues.apache.org/jira/browse/HIVE-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaoshijie updated HIVE-3743:
----------------------------

Description updated again (minor edits to the DataNode log file name).
[jira] [Updated] (HIVE-3743) DML Operations, load operation error.
[ https://issues.apache.org/jira/browse/HIVE-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaoshijie updated HIVE-3743:
----------------------------

Description updated to add a second error. About 8 minutes after each operation, 'hadoop-user-datanode-host.log' shows:

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(host:50010, storageID=DS-161359523-host-50010-1353919878809, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[
[jira] [Updated] (HIVE-3743) DML Operations, load operation error.
[ https://issues.apache.org/jira/browse/HIVE-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaoshijie updated HIVE-3743:
----------------------------

          Environment: suse 11 64bit, hadoop-core-0.20.203.0, hive-0.9.0
    Affects Version/s: 0.9.0
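The xceiverCount error reported in this issue is a DataNode-side limit rather than a Hive failure as such: each concurrent block read or write occupies one DataXceiver thread, and Hadoop 0.20's default cap is 256, which a burst of LOAD DATA operations can exhaust. A common operational workaround (a suggestion based on general Hadoop practice, not something stated in this issue) is to raise the limit in hdfs-site.xml on each DataNode and restart it; note that in these Hadoop versions the property name really is spelled "xcievers", and the value below is only illustrative:

```xml
<!-- hdfs-site.xml on each DataNode; 4096 is an illustrative value -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```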
[jira] [Commented] (HIVE-3531) Simple lock manager for dedicated hive server
[ https://issues.apache.org/jira/browse/HIVE-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503739#comment-13503739 ]

Phabricator commented on HIVE-3531:
-----------------------------------

navis has commented on the revision "HIVE-3531 [jira] Simple lock manager for dedicated hive server".

INLINE COMMENTS
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestEmbeddedLockManager.java:1 I remember adding it, but it was missed again. Sorry.

REVISION DETAIL
  https://reviews.facebook.net/D5871

BRANCH
  DPAL-1906

To: JIRA, cwsteinbach, navis
[jira] [Commented] (HIVE-3381) Result of outer join is not valid
[ https://issues.apache.org/jira/browse/HIVE-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503772#comment-13503772 ]

Phabricator commented on HIVE-3381:
-----------------------------------

navis has commented on the revision "HIVE-3381 [jira] Result of outer join is not valid".

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java:352 'skipVectors' has the same meaning as 'inputNulls' in the original source: it makes the output values for that index (alias) be filled with nulls. But I was always confused by the original name, which suggests the value for the index (alias) is null, and that can be either true or false. I'll change it back to 'inputNulls' if you prefer. When the flag for some index is true, the operator gets the metadata for that index, gets its length, and fills nulls for that length; 'offsets' is just the pre-calculated set of those offsets.
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java:363 The problem is that I'm also still not sure this patch is right. I'll add more comments.
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java:396 ok.

REVISION DETAIL
  https://reviews.facebook.net/D5565

To: JIRA, navis
Cc: njain

Result of outer join is not valid
---------------------------------
                Key: HIVE-3381
                URL: https://issues.apache.org/jira/browse/HIVE-3381
            Project: Hive
         Issue Type: Bug
         Components: Query Processor
   Affects Versions: 0.10.0
           Reporter: Navis
           Assignee: Navis
           Priority: Critical
        Attachments: HIVE-3381.D5565.3.patch

Outer joins, especially full outer joins or outer joins with a filter in the ON clause, do not produce correct results. For example, the query in test join_1to1.q

{code}
SELECT * FROM join_1to1_1 a full outer join join_1to1_2 b on a.key1 = b.key1 and a.value = 66 and b.value = 66 ORDER BY a.key1 ASC, a.key2 ASC, a.value ASC, b.key1 ASC, b.key2 ASC, b.value ASC;
{code}

returns

{code}
NULL	NULL	NULL	NULL	NULL	66
NULL	NULL	NULL	NULL	10050	66
NULL	NULL	NULL	10	10010	66
NULL	NULL	NULL	30	10030	88
NULL	NULL	NULL	35	10035	88
NULL	NULL	NULL	40	10040	88
NULL	NULL	NULL	40	10040	88
NULL	NULL	NULL	50	10050	88
NULL	NULL	NULL	50	10050	88
NULL	NULL	NULL	50	10050	88
NULL	NULL	NULL	70	10040	88
NULL	NULL	NULL	70	10040	88
NULL	NULL	NULL	70	10040	88
NULL	NULL	NULL	70	10040	88
NULL	NULL	66	NULL	NULL	NULL
NULL	10050	66	NULL	NULL	NULL
5	10005	66	5	10005	66
15	10015	66	NULL	NULL	NULL
20	10020	66	20	10020	66
25	10025	88	NULL	NULL	NULL
30	10030	66	NULL	NULL	NULL
35	10035	88	NULL	NULL	NULL
40	10040	66	NULL	NULL	NULL
40	10040	66	40	10040	66
40	10040	88	NULL	NULL	NULL
40	10040	88	NULL	NULL	NULL
50	10050	66	NULL	NULL	NULL
50	10050	66	50	10050	66
50	10050	66	50	10050	66
50	10050	88	NULL	NULL	NULL
50	10050	88	NULL	NULL	NULL
50	10050	88	NULL	NULL	NULL
50	10050	88	NULL	NULL	NULL
50	10050	88	NULL	NULL	NULL
50	10050	88	NULL	NULL	NULL
60	10040	66	60	10040	66
60	10040	66	60	10040	66
60	10040	66	60	10040	66
60	10040	66	60	10040	66
70	10040	66	NULL	NULL	NULL
70	10040	66	NULL	NULL	NULL
70	10040	66	NULL	NULL	NULL
70	10040	66	NULL	NULL	NULL
80	10040	88	NULL	NULL	NULL
80	10040	88	NULL	NULL	NULL
80	10040	88	NULL	NULL	NULL
80	10040	88	NULL	NULL	NULL
{code}

but that does not seem right. It should be

{code}
NULL	NULL	NULL	NULL	NULL	66
NULL	NULL	NULL	NULL	10050	66
NULL	NULL	NULL	10	10010	66
NULL	NULL	NULL	25	10025	66
NULL	NULL	NULL	30	10030	88
NULL	NULL	NULL	35	10035	88
NULL	NULL	NULL	40	10040	88
NULL	NULL	NULL	50	10050	88
NULL	NULL	NULL	70	10040	88
NULL	NULL	NULL	70	10040	88
NULL	NULL	NULL	80	10040	66
NULL	NULL	NULL	80	10040	66
NULL	NULL	66	NULL	NULL	NULL
NULL	10050	66	NULL	NULL	NULL
5	10005	66	5	10005	66
15	10015	66	NULL	NULL	NULL
20	10020	66	20	10020	66
25	10025	88	NULL	NULL	NULL
30	10030	66	NULL	NULL	NULL
35	10035	88
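The disagreement above is about ON-clause filter semantics for outer joins: in a FULL OUTER JOIN, a pair of rows that fails the ON condition must not cause either row to disappear; each unmatched row still appears once, padded with NULLs on the other side. A small standalone sketch of that rule follows. It is a hypothetical illustration of the semantics only, not Hive's CommonJoinOperator, and it ignores duplicate-key grouping for clarity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch of FULL OUTER JOIN semantics with an ON-clause predicate:
// pairs satisfying the predicate are joined; left/right rows that matched
// nothing are still emitted once, null-padded on the other side.
public class FullOuterJoin {
    public static <L, R> List<Object[]> join(List<L> left, List<R> right,
                                             BiPredicate<L, R> on) {
        List<Object[]> out = new ArrayList<>();
        boolean[] rightMatched = new boolean[right.size()];
        for (L l : left) {
            boolean matched = false;
            for (int i = 0; i < right.size(); i++) {
                if (on.test(l, right.get(i))) {
                    out.add(new Object[]{l, right.get(i)});
                    matched = true;
                    rightMatched[i] = true;
                }
            }
            if (!matched) out.add(new Object[]{l, null});      // left row, null-padded
        }
        for (int i = 0; i < right.size(); i++) {
            if (!rightMatched[i]) out.add(new Object[]{null, right.get(i)});
        }
        return out;
    }
}
```

With a predicate like the bug report's `a.value = 66 AND b.value = 66` filter, rows that fail the filter are not dropped; they surface as null-padded rows on both sides, which is exactly the difference between the two result sets quoted above.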
[jira] [Updated] (HIVE-3531) Simple lock manager for dedicated hive server
[ https://issues.apache.org/jira/browse/HIVE-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-3531:
------------------------------

    Attachment: HIVE-3531.D5871.3.patch

navis updated the revision "HIVE-3531 [jira] Simple lock manager for dedicated hive server".

Reviewers: JIRA, cwsteinbach

Added the ASF header. Sorry again.

REVISION DETAIL
  https://reviews.facebook.net/D5871

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestEmbeddedLockManager.java

To: JIRA, cwsteinbach, navis
[jira] [Updated] (HIVE-3733) Improve Hive's logic for conditional merge
[ https://issues.apache.org/jira/browse/HIVE-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated HIVE-3733:
---------------------------------

    Attachment: HIVE-3733.1.patch.txt

Attaching patch to fix the issue (used git diff --no-prefix ...). I tried using arc diff --jira HIVE-3733 and got:

PHP Fatal error: Call to undefined method ArcanistGitAPI::amendGitHeadCommit() in /Users/pradeepk/opensource-hive/.arc_jira_lib/arcanist/ArcJIRAConfiguration.php on line 173

I saw some other references to this error in different JIRAs but no solution was suggested; is there a fix for this issue? So I manually uploaded a diff (used git diff ..) to create the review: https://reviews.facebook.net/D6969

Improve Hive's logic for conditional merge
------------------------------------------
                Key: HIVE-3733
                URL: https://issues.apache.org/jira/browse/HIVE-3733
            Project: Hive
         Issue Type: Improvement
           Reporter: Pradeep Kamath
           Assignee: Pradeep Kamath
        Attachments: HIVE-3733.1.patch.txt

If hive.merge.mapfiles is set to true and hive.merge.mapredfiles is set to false, then when Hive encounters a FileSinkOperator while generating map-reduce tasks, it looks at the entire job to see whether it has a reducer; if it does, it will not merge. Instead, it should check whether the FileSinkOperator is a child of the reducer. That way, outputs generated in the mapper are merged and outputs generated in the reducer are not, which is the intended effect of setting those configs. Simple repro:

set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=false;
EXPLAIN FROM input_table
INSERT OVERWRITE TABLE output_table1 SELECT key, COUNT(*) GROUP BY key
INSERT OVERWRITE TABLE output_table2 SELECT *;

The output should contain a Conditional Operator, Mapred stages, and Move tasks.
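The fix described above changes the question from "does the job have a reducer?" to "is this FileSinkOperator a descendant of the reduce side?". As a toy illustration of that decision over an operator tree (hypothetical standalone code, not Hive's actual plan-generation logic), walking the ancestors of a file sink and looking for a ReduceSink marker is enough:

```java
// Toy operator tree: walk up from a FileSink to see whether any ancestor is
// a ReduceSink. An output with a ReduceSink ancestor is produced by the
// reducer; one without is produced on the map side. The merge decision then
// follows the matching config flag instead of a whole-job check.
public class MergeDecision {
    static class Op {
        final String name;
        final Op parent;
        Op(String name, Op parent) { this.name = name; this.parent = parent; }
    }

    static boolean isReduceSideOutput(Op fileSink) {
        for (Op cur = fileSink.parent; cur != null; cur = cur.parent) {
            if (cur.name.equals("ReduceSink")) return true;
        }
        return false;
    }

    // mergeMapFiles ~ hive.merge.mapfiles, mergeMapRedFiles ~ hive.merge.mapredfiles
    static boolean shouldMerge(Op fileSink, boolean mergeMapFiles, boolean mergeMapRedFiles) {
        return isReduceSideOutput(fileSink) ? mergeMapRedFiles : mergeMapFiles;
    }
}
```

Under the repro's settings (mapfiles=true, mapredfiles=false), a map-side sink merges and a reduce-side sink in the same job does not, whereas the old whole-job check would have suppressed both.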
[jira] [Commented] (HIVE-3733) Improve Hive's logic for conditional merge
[ https://issues.apache.org/jira/browse/HIVE-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503952#comment-13503952 ]

Pradeep Kamath commented on HIVE-3733:
--------------------------------------

Also, on my Mac a few tests fail:

testCliDriver_escape1
testCliDriver_escape2
testCliDriver_join29
testCliDriver_join35
testCliDriver_lineage1
testCliDriver_load_dyn_part14
testCliDriver_union10
testCliDriver_union12
testCliDriver_union18
testCliDriver_union30
testCliDriver_union4
testCliDriver_union6

I spot-checked a few of them (union4, union6) and they are due to differences in plan output: the new output seems to have more operators, including the conditional operator. I will look more into it; any guidance to help me would be greatly appreciated.
[jira] [Commented] (HIVE-3733) Improve Hive's logic for conditional merge
[ https://issues.apache.org/jira/browse/HIVE-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503960#comment-13503960 ]

Pradeep Kamath commented on HIVE-3733:
--------------------------------------

I noticed the following in HiveConf.java:

{code}
HIVEMERGEMAPFILES("hive.merge.mapfiles", true)
{code}

I suspect that with my new changes and the above setting, we are now merging files for map-only tasks where we previously were not. I am very new to the Hive code base; I would request a committer to take a look and confirm whether the new behavior is in fact the expected one, given that the default for merging map files is true.
[jira] [Created] (HIVE-3744) Thrift create_table should check types of table columns
Bhushan Mandhani created HIVE-3744:
----------------------------------

            Summary: Thrift create_table should check types of table columns
                Key: HIVE-3744
                URL: https://issues.apache.org/jira/browse/HIVE-3744
            Project: Hive
         Issue Type: Bug
         Components: Thrift API
           Reporter: Bhushan Mandhani
           Assignee: Bhushan Mandhani
           Priority: Minor

The Thrift create_table() call does not look at the datatype strings of Table objects coming in through Thrift. When a caller fails to set one of them, we can end up with an empty string for the datatype and corrupt metadata.
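A guard of the kind proposed above would validate the type string of every column before the table is written to the metastore. The following is only a sketch of that check: the FieldSchema stand-in mirrors the shape of the metastore's Thrift struct, but the class and method names here are hypothetical, not the actual patch.

```java
// Hypothetical validation sketch: reject null/blank column type strings
// before a table definition reaches the metastore.
public class ColumnTypeCheck {
    static class FieldSchema {          // minimal stand-in for the Thrift struct
        final String name;
        final String type;
        FieldSchema(String name, String type) { this.name = name; this.type = type; }
    }

    static void validate(Iterable<FieldSchema> cols) {
        for (FieldSchema c : cols) {
            if (c.type == null || c.type.trim().isEmpty()) {
                throw new IllegalArgumentException(
                    "Column '" + c.name + "' has no datatype set");
            }
        }
    }
}
```

Failing fast with an exception at the Thrift boundary is what prevents the corrupt metadata described above from ever being persisted.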
[jira] [Commented] (HIVE-3531) Simple lock manager for dedicated hive server
[ https://issues.apache.org/jira/browse/HIVE-3531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504044#comment-13504044 ]

Phabricator commented on HIVE-3531:
-----------------------------------

cwsteinbach has accepted the revision "HIVE-3531 [jira] Simple lock manager for dedicated hive server".

REVISION DETAIL
  https://reviews.facebook.net/D5871

BRANCH
  DPAL-1906

To: JIRA, cwsteinbach, navis
[jira] [Commented] (HIVE-3742) The derby metastore schema script for 0.10.0 doesn't run
[ https://issues.apache.org/jira/browse/HIVE-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504051#comment-13504051 ]

Prasad Mujumdar commented on HIVE-3742:
---------------------------------------

@Ashutosh, thanks for catching that. It looks like the INTEGER_IDX column should be included in the primary key of SKEWED_STRING_LIST_VALUES. The rest of the 0.10 schema scripts (mysql, oracle, and postgres) look fine.

The derby metastore schema script for 0.10.0 doesn't run
--------------------------------------------------------
                Key: HIVE-3742
                URL: https://issues.apache.org/jira/browse/HIVE-3742
            Project: Hive
         Issue Type: Bug
   Affects Versions: 0.10.0
           Reporter: Prasad Mujumdar
           Assignee: Prasad Mujumdar
        Attachments: HIVE-3742-2.patch, HIVE-3742.patch

The hive-schema-0.10.0.derby.sql script contains an incorrect ALTER statement for SKEWED_STRING_LIST, which causes script execution to fail.
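The shape of the fix suggested in the comment above is a composite primary key that includes the position column, so a skewed-value list can hold the same value at different positions. As an illustration only: the constraint name and the companion column name below are assumptions, since the actual column list is in hive-schema-0.10.0.derby.sql and may differ.

```sql
-- Illustrative Derby DDL sketch: INTEGER_IDX is part of the key so the same
-- value can appear at multiple positions in one string list.
ALTER TABLE "SKEWED_STRING_LIST_VALUES"
  ADD CONSTRAINT "SKEWED_STRING_LIST_VALUES_PK"
  PRIMARY KEY ("STRING_LIST_ID", "INTEGER_IDX");
```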
[jira] [Updated] (HIVE-3742) The derby metastore schema script for 0.10.0 doesn't run
[ https://issues.apache.org/jira/browse/HIVE-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-3742: -- Attachment: HIVE-3742-2.patch Additional patch per review comment The derby metastore schema script for 0.10.0 doesn't run Key: HIVE-3742 URL: https://issues.apache.org/jira/browse/HIVE-3742 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-3742-2.patch, HIVE-3742.patch The hive-schema-0.10.0.derby.sql contains incorrect alter statement for SKEWED_STRING_LIST which causes the script execution to fail
[jira] [Commented] (HIVE-3705) Adding authorization capability to the metastore
[ https://issues.apache.org/jira/browse/HIVE-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504069#comment-13504069 ] Phabricator commented on HIVE-3705: --- ashutoshc has requested changes to the revision HIVE-3705 [jira] Adding authorization capability to the metastore. Please see inline comments. INLINE COMMENTS conf/hive-default.xml.template:1254 I think it can be better worded as ... The hive metastore authorization.. conf/hive-default.xml.template:1269 Same as previous comment. conf/hive-default.xml.template:1269 This will read better as The authorization manager class name... conf/hive-default.xml.template:1284 Same as above. ql/src/java/org/apache/hadoop/hive/ql/security/HadoopDefaultMetastoreAuthenticator.java:27 It seems that you have created the new interface HiveMetaStoreAuthenticationProvider and added the method setHandler() just to call setConf(). If this is the only reason, then there is no need for this new interface, since HiveAuthenticationProvider already extends Configurable, and at instantiation time of this interface in HiveUtils you have access to the conf, which you can set. So this interface and this new implementation of it seem like overkill. ql/src/java/org/apache/hadoop/hive/ql/security/HiveMetastoreAuthenticationProvider.java:25 Need to update this javadoc. ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationPreEventListener.java:62 Is there any reason for creating a new instance of HiveConf? You can just save the passed-in config. ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationPreEventListener.java:64 Ideally, this bool should not be checked in here. If the pre-event listener is already set for this listener class, we can safely assume that the user intends this listener to fire and the checks to happen. In fact, I don't see the point of this boolean in HiveConf at all. 
Since these checks are meant to be invoked via the listener interface, the user already has to consciously put these class names in the config. Why then should he also have to set that boolean to true? More configs usually result in more confusion. ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationPreEventListener.java:125 This will break the chaining of the exception stack. I think it will be better done as: InvalidOperationException ex = new InvalidOperationException(e.getMessage()); ex.initCause(e.getCause()); throw ex; The same comment applies to all other try-catch blocks. ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationPreEventListener.java:112 For all other operations (including grant, revoke, etc.) no checks are performed and they are allowed straight through. I think you should add a note in the javadoc of this listener class that it only performs checks for create/add/alter/drop of db/tbl/partitions. I am not sure if the following deployment is supported: this listener configured with DefaultHiveAuthProvider, which does auth based on privs stored in the metastore? If it is, then it has the same problem that provider has. A user can grant himself all privs, no checks are done for that, and then drop tables/dbs. I understand you are not improving semantics in this patch, but merely shifting checks from the client to the metastore; I just wanted to make sure my understanding is correct. ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationPreEventListener.java:232 This method should be private. ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationPreEventListener.java:250 This method should be private. ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationPreEventListener.java:256 Why is this needed? In particular, this will set the location of the partition to the table's location in the null case. Is that desirable? 
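The exception-translation pattern suggested in the review above can be sketched in plain Java. This is a minimal, self-contained illustration, not the actual listener code: InvalidOperationException here is a local stand-in for the Thrift-generated metastore exception, which only takes a message. The point is that re-throwing with just e.getMessage() drops the original cause chain, while initCause() preserves it.

```java
// Sketch: wrap a caught exception while keeping its cause chain via initCause().
public class ChainingDemo {
    // Hypothetical stand-in for the message-only Thrift exception.
    public static class InvalidOperationException extends Exception {
        public InvalidOperationException(String message) { super(message); }
    }

    static void doCheck() {
        throw new RuntimeException("authorization failed");
    }

    public static void fire() throws InvalidOperationException {
        try {
            doCheck();
        } catch (RuntimeException e) {
            // new InvalidOperationException(e.getMessage()) alone would lose the
            // original stack; initCause() links it back in. (The review snippet
            // chains e.getCause(); chaining e itself keeps one more frame.)
            InvalidOperationException ex = new InvalidOperationException(e.getMessage());
            ex.initCause(e);
            throw ex;
        }
    }

    public static void main(String[] args) {
        try {
            fire();
        } catch (InvalidOperationException ex) {
            System.out.println(ex.getMessage());                      // authorization failed
            System.out.println(ex.getCause().getClass().getSimpleName()); // RuntimeException
        }
    }
}
```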
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/DefaultHiveAuthorizationProvider.java:30 You can just cast (HiveConf)conf, instead of doing new HiveConf()? ql/src/java/org/apache/hadoop/hive/ql/security/authorization/StorageBasedAuthorizationProvider.java:49 This is also defined in HiveMetaStore. It should be statically defined in HiveConf and referenced from there, instead of keeping a private copy in each class. ql/src/java/org/apache/hadoop/hive/ql/security/authorization/StorageBasedAuthorizationProvider.java:165 I am not sure about this, but is this about creating an index, in which case Write makes sense, or about reading an index, in which case Read should suffice? ql/src/java/org/apache/hadoop/hive/ql/security/authorization/StorageBasedAuthorizationProvider.java:168 Same as above. A lock could be a shared lock or an exclusive lock, resulting in the equivalent of read and write privs? ql/src/java/org/apache/hadoop/hive/ql/security/authorization/StorageBasedAuthorizationProvider.java:301 This method already exists in
[jira] [Created] (HIVE-3745) Hive does improper = based string comparisons with trailing whitespaces
Harsh J created HIVE-3745: - Summary: Hive does improper = based string comparisons with trailing whitespaces Key: HIVE-3745 URL: https://issues.apache.org/jira/browse/HIVE-3745 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.9.0 Reporter: Harsh J Unlike other systems such as DB2 and MySQL, which disregard trailing whitespace when comparing two strings with the {{=}} relational operator, Hive does not do this. For example, note the following line from the MySQL manual: http://dev.mysql.com/doc/refman/5.1/en/char.html {quote} All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces. {quote} Hive is whitespace-sensitive and treats trailing spaces of a string as significant when comparing. Ideally {{LIKE}} should remain sensitive to trailing spaces, but {{=}} should not. Is there a specific reason behind this difference of implementation in Hive's SQL?
[jira] [Updated] (HIVE-3745) Hive does improper = based string comparisons for strings with trailing whitespaces
[ https://issues.apache.org/jira/browse/HIVE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HIVE-3745: -- Summary: Hive does improper = based string comparisons for strings with trailing whitespaces (was: Hive does improper = based string comparisons with trailing whitespaces) Hive does improper = based string comparisons for strings with trailing whitespaces - Key: HIVE-3745 URL: https://issues.apache.org/jira/browse/HIVE-3745 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.9.0 Reporter: Harsh J Unlike other systems such as DB2 and MySQL, which disregard trailing whitespace when comparing two strings with the {{=}} relational operator, Hive does not do this. For example, note the following line from the MySQL manual: http://dev.mysql.com/doc/refman/5.1/en/char.html {quote} All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces. {quote} Hive is whitespace-sensitive and treats trailing spaces of a string as significant when comparing. Ideally {{LIKE}} should remain sensitive to trailing spaces, but {{=}} should not. Is there a specific reason behind this difference of implementation in Hive's SQL?
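The behavioral difference HIVE-3745 describes can be sketched in plain Java (not Hive code; padSpaceEquals is a hypothetical helper): plain equals(), like Hive's {{=}}, treats trailing spaces as significant, while a PADSPACE-style comparison in the spirit of MySQL's CHAR collations strips trailing spaces before comparing.

```java
// Sketch: Hive-style exact comparison vs. a minimal PADSPACE-style comparison.
public class PadSpaceDemo {
    // Strip trailing spaces only (leading spaces still matter), then compare.
    public static boolean padSpaceEquals(String a, String b) {
        return stripTrailing(a).equals(stripTrailing(b));
    }

    static String stripTrailing(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println("val_484".equals("val_484  "));          // false (Hive-style)
        System.out.println(padSpaceEquals("val_484", "val_484  ")); // true (MySQL-style)
    }
}
```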
[jira] [Created] (HIVE-3746) TRowSet resultset structure should be column-oriented
Carl Steinbach created HIVE-3746: Summary: TRowSet resultset structure should be column-oriented Key: HIVE-3746 URL: https://issues.apache.org/jira/browse/HIVE-3746 Project: Hive Issue Type: Sub-task Components: Server Infrastructure Reporter: Carl Steinbach Assignee: Carl Steinbach
[jira] [Commented] (HIVE-3746) TRowSet resultset structure should be column-oriented
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504103#comment-13504103 ] Carl Steinbach commented on HIVE-3746: -- Currently HS2 uses the following Thrift structures to represent a resultset: {noformat} // Represents a rowset struct TRowSet { // The starting row offset of this rowset. 1: required i64 startRowOffset 2: required list<TRow> rows } // Represents a row in a rowset. struct TRow { 1: required list<TColumnValue> colVals } union TColumnValue { 1: TBoolValue boolVal // BOOLEAN 2: TByteValue byteVal // TINYINT 3: TI16Value i16Val // SMALLINT 4: TI32Value i32Val // INT 5: TI64Value i64Val // BIGINT, TIMESTAMP 6: TDoubleValue doubleVal // FLOAT, DOUBLE 7: TStringValue stringVal // STRING, LIST, MAP, STRUCT, UNIONTYPE, BINARY } // A Boolean column value. struct TBoolValue { // NULL if value is unset. 1: optional bool value } ... struct TStringValue { 1: optional string value } {noformat} The problem with this approach is that Thrift unions are not very efficient, and we pay this cost on a per-field basis. Instead, we should make the result set structure column-oriented as follows: {noformat} // Represents a rowset struct TRowSet { // The starting row offset of this rowset. 1: required i64 startRowOffset 2: required list<TColumn> columns } union TColumn { 1: list<TBoolValue> boolColumn 2: list<TByteValue> byteColumn 3: list<TI16Value> i16Column 4: list<TI32Value> i32Column 5: list<TI64Value> i64Column 6: list<TDoubleValue> doubleColumn 7: list<TStringValue> stringColumn } {noformat} TRowSet resultset structure should be column-oriented - Key: HIVE-3746 URL: https://issues.apache.org/jira/browse/HIVE-3746 Project: Hive Issue Type: Sub-task Components: Server Infrastructure Reporter: Carl Steinbach Assignee: Carl Steinbach
[jira] [Commented] (HIVE-3746) TRowSet resultset structure should be column-oriented
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504126#comment-13504126 ] Carl Steinbach commented on HIVE-3746: -- This would probably be more efficient: {noformat} // Represents a rowset struct TRowSet { // The starting row offset of this rowset. 1: required i64 startRowOffset 2: required list<TColumn> columns } struct TColumn { 1: required list<i32> nullOffsets 2: required TColumnData columnData } union TColumnData { 1: list<bool> boolColumn 2: list<byte> byteColumn 3: list<i16> i16Column 4: list<i32> i32Column 5: list<i64> i64Column 6: list<double> doubleColumn 7: list<string> stringColumn } {noformat} We may be able to make this even more compact by using a run-length encoding scheme for the nullOffset vector (and possibly the ColumnData list too). TRowSet resultset structure should be column-oriented - Key: HIVE-3746 URL: https://issues.apache.org/jira/browse/HIVE-3746 Project: Hive Issue Type: Sub-task Components: Server Infrastructure Reporter: Carl Steinbach Assignee: Carl Steinbach
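The null-offset encoding proposed above can be illustrated with a small Java sketch: a column stores its non-null values densely, plus a list of row offsets that are NULL, instead of wrapping every cell in a per-value struct. The class and field names here (I32Column, nullOffsets, values) are illustrative, not the actual HS2 Thrift-generated types.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: column-oriented encoding with dense values plus explicit null offsets.
public class ColumnEncodingDemo {
    public static class I32Column {
        public final List<Integer> nullOffsets = new ArrayList<>(); // rows that are NULL
        public final List<Integer> values = new ArrayList<>();      // dense non-null data

        void add(Integer v, int rowOffset) {
            if (v == null) {
                nullOffsets.add(rowOffset);
            } else {
                values.add(v);
            }
        }
    }

    public static I32Column encode(Integer[] cells) {
        I32Column col = new I32Column();
        for (int i = 0; i < cells.length; i++) {
            col.add(cells[i], i);
        }
        return col;
    }

    public static void main(String[] args) {
        I32Column col = encode(new Integer[]{7, null, 42, null, 5});
        System.out.println(col.values);      // [7, 42, 5]
        System.out.println(col.nullOffsets); // [1, 3]
    }
}
```

A sparse nullOffsets list stays tiny when NULLs are rare, which is also why the comment suggests run-length encoding it for columns with long NULL runs.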
[jira] [Commented] (HIVE-3726) History file closed in finalize method
[ https://issues.apache.org/jira/browse/HIVE-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504178#comment-13504178 ] Ashutosh Chauhan commented on HIVE-3726: A couple of comments: * HiveServer uses TmpOutputFile, which it deletes (now via session.close()), but never closes the input stream it opens on that file in the readResults() and setupSessionIO() methods. This will leak file descriptors. HiveServer2 has copy-pasted this code, so the leak occurs there too. * I am not sure about the changes in SessionState per task. With your patch, each task will close the {{SessionState}} when it finishes. A query may have multiple tasks, so this implies every task will create a SessionState when it begins executing, which seems counter-intuitive: there should be one SessionState object across all tasks of the query, which was the case earlier too, wasn't it? History file closed in finalize method -- Key: HIVE-3726 URL: https://issues.apache.org/jira/browse/HIVE-3726 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-3726.2-r1411423.patch, HIVE-3736.1-r1411423.patch TestCliNegative fails intermittently because it's up to the garbage collector to close History files. This is only a problem if you deal with a lot of SessionState objects.
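The file-descriptor concern above comes down to closing streams deterministically instead of leaving them to finalize()/GC. A minimal sketch (readResults here is a hypothetical stand-in, not the actual HiveServer method): try-with-resources guarantees the reader is closed even on error paths, so deleting the temp file later leaves no dangling descriptor.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: deterministic close of a stream opened on a temp results file.
public class TmpFileDemo {
    public static String readResults(Path tmpOutputFile) throws IOException {
        // Reader is closed when the block exits, success or failure.
        try (BufferedReader reader = Files.newBufferedReader(tmpOutputFile)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("hive-results", ".txt");
        try (Writer w = Files.newBufferedWriter(tmp)) {
            w.write("484\tval_484\n");
        }
        System.out.println(readResults(tmp));
        Files.delete(tmp); // safe: no stream is still open on the file
    }
}
```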
[jira] [Updated] (HIVE-3726) History file closed in finalize method
[ https://issues.apache.org/jira/browse/HIVE-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3726: --- Affects Version/s: 0.10.0 0.9.0 Status: Open (was: Patch Available) History file closed in finalize method -- Key: HIVE-3726 URL: https://issues.apache.org/jira/browse/HIVE-3726 Project: Hive Issue Type: Bug Affects Versions: 0.9.0, 0.10.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-3726.2-r1411423.patch, HIVE-3736.1-r1411423.patch TestCliNegative fails intermittently because it's up to the garbage collector to close History files. This is only a problem if you deal with a lot of SessionState objects.
Jenkins build is back to normal : Hive-0.9.1-SNAPSHOT-h0.21 #211
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/211/
[jira] [Updated] (HIVE-3734) Static partition DML create duplicate files and records
[ https://issues.apache.org/jira/browse/HIVE-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3734: --- Description: Static DML create duplicate files and record. Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; -- check DML result desc formatted testtable partition (ds='2008-04-08', hr='11'); select count(1) from srcpart where ds='2008-04-08'; select count(1) from testtable where ds='2008-04-08'; select key, value from srcpart where ds='2008-04-08' and hr='11' and key = 484; set hive.optimize.listbucketing=true; explain extended select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; === was: Static DML create duplicate files and record. 
Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; -- list bucketing DML explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; -- check DML result desc formatted testtable partition (ds='2008-04-08', hr='11'); select count(1) from srcpart where ds='2008-04-08'; select count(1) from testtable where ds='2008-04-08'; select key, value from srcpart where ds='2008-04-08' and hr='11' and key = 484; set hive.optimize.listbucketing=true; explain extended select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; === Static partition DML create duplicate files and records --- Key: HIVE-3734 URL: https://issues.apache.org/jira/browse/HIVE-3734 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Gang Tim Liu Static DML create duplicate files and record. 
Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; -- check DML result desc formatted testtable partition (ds='2008-04-08', hr='11'); select
[jira] [Updated] (HIVE-3734) Static partition DML create duplicate files and records
[ https://issues.apache.org/jira/browse/HIVE-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3734: --- Description: Static DML create duplicate files and record. Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; -- check DML result desc formatted testtable partition (ds='2008-04-08', hr='11'); select count(1) from srcpart where ds='2008-04-08'; select count(1) from testtable where ds='2008-04-08'; select key, value from srcpart where ds='2008-04-08' and hr='11' and key = 484; explain extended select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; === was: Static DML create duplicate files and record. 
Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; -- check DML result desc formatted testtable partition (ds='2008-04-08', hr='11'); select count(1) from srcpart where ds='2008-04-08'; select count(1) from testtable where ds='2008-04-08'; select key, value from srcpart where ds='2008-04-08' and hr='11' and key = 484; set hive.optimize.listbucketing=true; explain extended select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; === Static partition DML create duplicate files and records --- Key: HIVE-3734 URL: https://issues.apache.org/jira/browse/HIVE-3734 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Gang Tim Liu Static DML create duplicate files and record. 
Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; -- check DML result desc formatted testtable partition (ds='2008-04-08', hr='11'); select count(1) from srcpart where ds='2008-04-08'; select count(1) from
[jira] [Updated] (HIVE-3734) Static partition DML create duplicate files and records
[ https://issues.apache.org/jira/browse/HIVE-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Tim Liu updated HIVE-3734: --- Description: Static DML create duplicate files and record. Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; desc formatted testtable partition (ds='2008-04-08', hr='11'); select count(1) from srcpart where ds='2008-04-08'; select count(1) from testtable where ds='2008-04-08'; select key, value from srcpart where ds='2008-04-08' and hr='11' and key = 484; explain extended select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; === was: Static DML create duplicate files and record. 
Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; -- check DML result desc formatted testtable partition (ds='2008-04-08', hr='11'); select count(1) from srcpart where ds='2008-04-08'; select count(1) from testtable where ds='2008-04-08'; select key, value from srcpart where ds='2008-04-08' and hr='11' and key = 484; explain extended select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; select key, value from testtable where ds='2008-04-08' and hr='11' and key = 484; === Static partition DML create duplicate files and records --- Key: HIVE-3734 URL: https://issues.apache.org/jira/browse/HIVE-3734 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Gang Tim Liu Static DML create duplicate files and record. 
Given the following test case, hive will return 2 records: 484 val_484 484 val_484 but srcpart returns one record: 484 val_484 If you look at file system, DML generates duplicate file with the same content: -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0 Test Case === set hive.mapred.supports.subdirectories=true; set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; set hive.merge.mapfiles=false; set hive.merge.mapredfiles=false; set mapred.input.dir.recursive=true; create table testtable (key String, value String) partitioned by (ds String, hr String) ; explain extended insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08'; desc formatted testtable partition (ds='2008-04-08', hr='11'); select count(1) from srcpart where ds='2008-04-08'; select count(1) from testtable where ds='2008-04-08'; select key, value from srcpart where
[jira] [Created] (HIVE-3747) Provide hive operation name for hookContext
Sudhanshu Arora created HIVE-3747: - Summary: Provide hive operation name for hookContext Key: HIVE-3747 URL: https://issues.apache.org/jira/browse/HIVE-3747 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Sudhanshu Arora The hookContext exposed through ExecuteWithHookContext does not provide the name of the Hive operation. The following public API should be added in HookContext: public String getOperationName() { return SessionState.get().getHiveOperation().name(); }
[jira] [Updated] (HIVE-3234) getting the reporter in the recordwriter
[ https://issues.apache.org/jira/browse/HIVE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-3234: -- Attachment: HIVE-3234.D6987.1.patch omalley requested code review of HIVE-3234 [jira] getting the reporter in the recordwriter. Reviewers: JIRA HIVE-3736 : hive unit test case build failure. (Ashish Singh via Ashutosh Chauhan) We would like to generate some custom statistics and report them back to map/reduce when we implement the FileSinkOperator.RecordWriter interface. However, the current interface design doesn't allow us to get the map/reduce reporter object. Please extend the current FileSinkOperator.RecordWriter interface so that its close() method passes in a map/reduce reporter object. For the same reason, please also extend the RecordReader interface to include a reporter object so that users can pass in custom map/reduce counters. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D6987 AFFECTED FILES ivy/ivysettings.xml ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13InputFormat.java ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13OutputFormat.java ql/src/test/queries/clientpositive/custom_input_output_format.q ql/src/test/results/clientpositive/custom_input_output_format.q.out MANAGE HERALD DIFFERENTIAL RULES 
https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/16461/ To: JIRA, omalley getting the reporter in the recordwriter Key: HIVE-3234 URL: https://issues.apache.org/jira/browse/HIVE-3234 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.9.1 Environment: any Reporter: Jimmy Hu Assignee: Owen O'Malley Labels: newbie Fix For: 0.9.1 Attachments: HIVE-3234.D6699.1.patch, HIVE-3234.D6699.2.patch, HIVE-3234.D6987.1.patch Original Estimate: 48h Remaining Estimate: 48h We would like to generate some custom statistics and report back to map/reduce later when we implement the FileSinkOperator.RecordWriter interface. However, the current interface design doesn't allow us to get the map reduce reporter object. Please extend the current FileSinkOperator.RecordWriter interface so that its close() method passes in a map reduce reporter object. For the same reason, please also extend the RecordReader interface to include a reporter object so that users can pass in custom map reduce counters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
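The interface change requested in HIVE-3234 can be sketched in miniature. All types below are simplified, hypothetical stand-ins for the Hadoop/Hive classes (the real `Reporter` is `org.apache.hadoop.mapred.Reporter`); the point is only to show `close()` receiving the reporter so a writer can push custom counters back to the framework:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for the Hadoop MapReduce reporter.
interface Reporter {
    void incrCounter(String group, String counter, long amount);
}

// Extended RecordWriter as requested: close() now receives the reporter,
// so custom statistics can be reported when the writer finishes.
interface RecordWriter {
    void write(String row);
    void close(boolean abort, Reporter reporter);
}

public class ReporterSketch {
    // A toy writer that reports how many rows it wrote when closed.
    static class CountingWriter implements RecordWriter {
        private long rows = 0;
        public void write(String row) { rows++; }
        public void close(boolean abort, Reporter reporter) {
            reporter.incrCounter("custom", "rows_written", rows);
        }
    }

    static final Map<String, Long> counters = new HashMap<>();

    public static void main(String[] args) {
        // Collect counter updates into a map for demonstration.
        Reporter reporter = (g, c, n) -> counters.merge(g + "." + c, n, Long::sum);
        RecordWriter w = new CountingWriter();
        w.write("a");
        w.write("b");
        w.close(false, reporter);
        System.out.println(counters.get("custom.rows_written")); // prints 2
    }
}
```

The same pattern extends to the `RecordReader` side: pass the reporter in so a custom input format can increment counters while reading.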
[jira] [Commented] (HIVE-2983) Hive ant targets for publishing maven artifacts can be simplified
[ https://issues.apache.org/jira/browse/HIVE-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504274#comment-13504274 ] Travis Crawford commented on HIVE-2983: --- Re: publishing ant-tasks jar Agreed, I don't think we need to publish. I'll update. Re: special-casing exec Exec is different because it's not actually a subproject; it's generated in the {{ql}} subproject directory. The fatjar issue has been discussed back and forth a lot now. If there's interest, I'd very much like to freshen up the patch discussed in https://issues.apache.org/jira/browse/HIVE-2424?focusedCommentId=13262898page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13262898 Let me know what you'd like to do regarding the exec jar. Given the number of issues about it, there's lots of community interest in thin jars (in addition to hive-exec). Hive ant targets for publishing maven artifacts can be simplified - Key: HIVE-2983 URL: https://issues.apache.org/jira/browse/HIVE-2983 Project: Hive Issue Type: Improvement Reporter: Travis Crawford Assignee: Travis Crawford Priority: Minor Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2983.D2961.1.patch Hive has a few ant tasks related to publishing maven artifacts. As not all sub projects publish artifacts, the {{iterate}} macro that simplifies other tasks cannot be used in this context. Hive already uses the {{for}} task from ant-contrib, which works great here. {{build.xml}} can be simplified by using the for task when preparing maven artifacts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3723) Hive Driver leaks ZooKeeper connections
[ https://issues.apache.org/jira/browse/HIVE-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504297#comment-13504297 ] Gunther Hagleitner commented on HIVE-3723: -- The releaseLocks call happens at various places in the run method. I think that's fine too, since calling destroy would kill the HiveLockManager, which is shared between different context objects (i.e., that should only happen when someone specifically destroys the driver). As a side note, compile doesn't seem to acquire any locks, so the call there is just a protection against future versions that could. Hive Driver leaks ZooKeeper connections --- Key: HIVE-3723 URL: https://issues.apache.org/jira/browse/HIVE-3723 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-3723.1-r1411423.patch In certain error cases (e.g. a statement fails to compile, or there are semantic errors) the hive driver leaks zookeeper connections. This can be seen in the TestNegativeCliDriver test, which accumulates a large number of open file handles and fails if the max allowed number of file handles isn't at least 2048. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
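The lifecycle split discussed in this comment can be sketched as follows. This is a hypothetical toy model, not the real `org.apache.hadoop.hive.ql.Driver` code: per-statement locks are released in a finally block even on the failure path, while the shared lock manager (which holds the ZooKeeper connection) is only closed on an explicit destroy:

```java
import java.util.ArrayList;
import java.util.List;

public class LockLifecycle {
    // Stand-in for the ZooKeeper-backed HiveLockManager.
    static class LockManager {
        boolean closed = false;
        final List<String> held = new ArrayList<>();
        void lock(String name) { held.add(name); }
        void releaseAll() { held.clear(); }
        void close() { closed = true; }  // would close the ZK connection
    }

    final LockManager lockMgr = new LockManager();  // shared across queries

    int run(String query) {
        try {
            lockMgr.lock(query);
            if (query.isEmpty()) return 11;  // simulated compile failure
            return 0;
        } finally {
            lockMgr.releaseAll();  // always release this statement's locks...
        }                          // ...but never close the manager here
    }

    void destroy() { lockMgr.close(); }  // only on explicit driver destroy

    public static void main(String[] args) {
        LockLifecycle d = new LockLifecycle();
        d.run("");                                  // failing statement
        System.out.println(d.lockMgr.held.size());  // 0: no leaked locks
        System.out.println(d.lockMgr.closed);       // false: manager survives
        d.destroy();
        System.out.println(d.lockMgr.closed);       // true
    }
}
```

Releasing in finally fixes the leak the issue describes (locks surviving a failed compile), without killing the manager other contexts still share.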
[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504337#comment-13504337 ] Shreepadma Venugopalan commented on HIVE-3678: -- @Ashutosh: I've uploaded a new patch which adds 2 varchar columns for storing BigDecimal low and high values. Thanks. Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shreepadma Venugopalan updated HIVE-3678: - Attachment: HIVE-3678.3.patch.txt Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, HIVE-3678.3.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504338#comment-13504338 ] Shreepadma Venugopalan commented on HIVE-3678: -- Updated patch is available on both JIRA and RB. Thanks. Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, HIVE-3678.3.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3633) sort-merge join does not work with sub-queries
[ https://issues.apache.org/jira/browse/HIVE-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3633: - Status: Open (was: Patch Available) There were some comments on phabricator by Kevin sort-merge join does not work with sub-queries -- Key: HIVE-3633 URL: https://issues.apache.org/jira/browse/HIVE-3633 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3633.1.patch, hive.3633.2.patch, hive.3633.3.patch, hive.3633.4.patch, hive.3633.5.patch, hive.3633.6.patch Consider the following query: create table smb_bucket_1(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 6 BUCKETS STORED AS TEXTFILE; create table smb_bucket_2(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 6 BUCKETS STORED AS TEXTFILE; -- load the above tables set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; explain select count(*) from ( select /*+mapjoin(a)*/ a.key as key1, b.key as key2, a.value as value1, b.value as value2 from smb_bucket_1 a join smb_bucket_2 b on a.key = b.key) subq; The above query does not use sort-merge join. This would be very useful as we automatically convert the queries to use sorting and bucketing properties for join. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3381) Result of outer join is not valid
[ https://issues.apache.org/jira/browse/HIVE-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504355#comment-13504355 ] Phabricator commented on HIVE-3381: --- njain has commented on the revision HIVE-3381 [jira] Result of outer join is not valid. Navis, I am not saying that the old code is good. On the contrary, it is really difficult to follow. There is a serious lack of comments in it, and we have to improve that. I really appreciate that you are fixing this very serious bug, but it would be really useful if you could add lots of comments so that it becomes much easier to maintain and enhance in the future. REVISION DETAIL https://reviews.facebook.net/D5565 To: JIRA, navis Cc: njain Result of outer join is not valid - Key: HIVE-3381 URL: https://issues.apache.org/jira/browse/HIVE-3381 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: Navis Assignee: Navis Priority: Critical Attachments: HIVE-3381.D5565.3.patch Outer joins, especially full outer joins or outer joins with a filter in the ON clause, do not produce proper results. 
For example, query in test join_1to1.q {code} SELECT * FROM join_1to1_1 a full outer join join_1to1_2 b on a.key1 = b.key1 and a.value = 66 and b.value = 66 ORDER BY a.key1 ASC, a.key2 ASC, a.value ASC, b.key1 ASC, b.key2 ASC, b.value ASC; {code} results {code} NULL NULLNULLNULLNULL66 NULL NULLNULLNULL10050 66 NULL NULLNULL10 10010 66 NULL NULLNULL30 10030 88 NULL NULLNULL35 10035 88 NULL NULLNULL40 10040 88 NULL NULLNULL40 10040 88 NULL NULLNULL50 10050 88 NULL NULLNULL50 10050 88 NULL NULLNULL50 10050 88 NULL NULLNULL70 10040 88 NULL NULLNULL70 10040 88 NULL NULLNULL70 10040 88 NULL NULLNULL70 10040 88 NULL NULL66 NULLNULLNULL NULL 10050 66 NULLNULLNULL 5 10005 66 5 10005 66 1510015 66 NULLNULLNULL 2010020 66 20 10020 66 2510025 88 NULLNULLNULL 3010030 66 NULLNULLNULL 3510035 88 NULLNULLNULL 4010040 66 NULLNULLNULL 4010040 66 40 10040 66 4010040 88 NULLNULLNULL 4010040 88 NULLNULLNULL 5010050 66 NULLNULLNULL 5010050 66 50 10050 66 5010050 66 50 10050 66 5010050 88 NULLNULLNULL 5010050 88 NULLNULLNULL 5010050 88 NULLNULLNULL 5010050 88 NULLNULLNULL 5010050 88 NULLNULLNULL 5010050 88 NULLNULLNULL 6010040 66 60 10040 66 6010040 66 60 10040 66 6010040 66 60 10040 66 6010040 66 60 10040 66 7010040 66 NULLNULLNULL 7010040 66 NULLNULLNULL 7010040 66 NULLNULLNULL 7010040 66 NULLNULLNULL 8010040 88 NULLNULLNULL 8010040 88 NULLNULLNULL 8010040 88 NULLNULLNULL 8010040 88 NULLNULLNULL {code} but it seemed not right. 
This should be {code} NULL NULLNULLNULLNULL66 NULL NULLNULLNULL10050 66 NULL NULLNULL10 10010 66 NULL NULLNULL25 10025 66 NULL NULLNULL30 10030 88 NULL NULLNULL35 10035 88 NULL NULLNULL40 10040 88 NULL NULLNULL50 10050 88 NULL NULLNULL70 10040 88 NULL NULLNULL70 10040 88 NULL NULLNULL80 10040 66 NULL NULLNULL80 10040 66 NULL NULL66 NULLNULLNULL NULL 10050 66 NULLNULLNULL 5 10005 66 5 10005 66 1510015 66 NULLNULLNULL 2010020 66 20 10020 66 2510025 88 NULLNULLNULL 3010030 66 NULLNULLNULL 3510035 88 NULLNULLNULL 4010040 66 40 10040 66 4010040 88 NULLNULLNULL 5010050 66 50 10050 66 5010050 66 50 10050 66 5010050 88 NULLNULLNULL 5010050 88 NULLNULLNULL 6010040 66 60 10040 66 6010040 66 60 10040 66 6010040 66 60 10040 66 6010040 66 60 10040 66 7010040
[jira] [Commented] (HIVE-3733) Improve Hive's logic for conditional merge
[ https://issues.apache.org/jira/browse/HIVE-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504362#comment-13504362 ] Namit Jain commented on HIVE-3733: -- Comments on phabricator. testCliDriver_escape1 and testCliDriver_escape2 always fail on Mac. For the plan output changes, can you check the test and verify whether the new plan is correct? If you think the new plan is correct, modify the output file. Improve Hive's logic for conditional merge -- Key: HIVE-3733 URL: https://issues.apache.org/jira/browse/HIVE-3733 Project: Hive Issue Type: Improvement Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: HIVE-3733.1.patch.txt If the config hive.merge.mapfiles is set to true and hive.merge.mapredfiles is set to false, then when hive encounters a FileSinkOperator while generating map reduce tasks, it will look at the entire job to see if it has a reducer; if it does, it will not merge. Instead it should check whether the FileSinkOperator is a child of the reducer. This means that outputs generated in the mapper will be merged and outputs generated in the reducer will not be, which is the intended effect of setting those configs. Simple repro: set hive.merge.mapfiles=true; set hive.merge.mapredfiles=false; EXPLAIN FROM input_table INSERT OVERWRITE TABLE output_table1 SELECT key, COUNT(*) group by key INSERT OVERWRITE TABLE output_table2 SELECT *; The output should contain a Conditional Operator, Mapred Stages, and Move tasks -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
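The merge decision the issue proposes can be modeled as a small operator-tree check. This is a hypothetical sketch with made-up names, not Hive's actual planner code: the old logic disabled merging for every sink as soon as the job had a reducer, while the proposed logic skips merging only for sinks that are descendants of the reducer, so map-side outputs still get merged:

```java
import java.util.ArrayList;
import java.util.List;

public class MergeDecision {
    // Minimal operator node with children, standing in for Hive's Operator.
    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();
        Op(String name) { this.name = name; }
        Op add(Op child) { children.add(child); return child; }
    }

    // true if 'sink' is reachable from 'op', i.e. the sink runs under it
    static boolean isDescendant(Op op, Op sink) {
        if (op == sink) return true;
        for (Op c : op.children)
            if (isDescendant(c, sink)) return true;
        return false;
    }

    // Proposed logic: the decision depends on WHERE the sink sits,
    // not on whether the job happens to have a reducer at all.
    static boolean shouldMerge(Op reducer, Op sink,
                               boolean mergeMapFiles, boolean mergeMapredFiles) {
        boolean underReducer = reducer != null && isDescendant(reducer, sink);
        return underReducer ? mergeMapredFiles : mergeMapFiles;
    }

    public static void main(String[] args) {
        Op mapper = new Op("TS");                  // table scan in the mapper
        Op mapSink = mapper.add(new Op("FS_map")); // map-side file sink
        Op reducer = new Op("GBY");                // group-by in the reducer
        Op redSink = reducer.add(new Op("FS_red")); // reduce-side file sink

        // hive.merge.mapfiles=true, hive.merge.mapredfiles=false
        System.out.println(shouldMerge(reducer, mapSink, true, false)); // true
        System.out.println(shouldMerge(reducer, redSink, true, false)); // false
    }
}
```

With the repro settings above, the map-side sink merges and the reduce-side sink does not, matching the intended behavior described in the issue.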
[jira] [Commented] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time
[ https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504365#comment-13504365 ] Namit Jain commented on HIVE-3718: -- 30005 is already being used. Can you use a new error code ? Add check to determine whether partition can be dropped at Semantic Analysis time - Key: HIVE-3718 URL: https://issues.apache.org/jira/browse/HIVE-3718 Project: Hive Issue Type: Task Components: CLI Reporter: Pamela Vagata Assignee: Pamela Vagata Priority: Minor Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3720) Expand and standardize authorization in Hive
[ https://issues.apache.org/jira/browse/HIVE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504375#comment-13504375 ] Namit Jain commented on HIVE-3720: -- [~shreepadma], how is it different from the current hive authorization model ? Is the proposed functionality a superset of the existing one ? If yes, can you mark in the wiki, what has been already implemented ? Expand and standardize authorization in Hive Key: HIVE-3720 URL: https://issues.apache.org/jira/browse/HIVE-3720 Project: Hive Issue Type: Improvement Components: Authorization Affects Versions: 0.9.0 Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Attachments: Hive_Authorization_Functionality.pdf The existing implementation of authorization in Hive is not complete. Additionally the existing implementation has security holes. This JIRA is an umbrella JIRA for a) extending authorization to all SQL operations and direct metadata operations, and b) standardizing the authorization model and its semantics to mirror that of MySQL as closely as possible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time
[ https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pamela Vagata updated HIVE-3718: Attachment: HIVE-3718.3.patch.txt Got it, I've updated to use 30011 Add check to determine whether partition can be dropped at Semantic Analysis time - Key: HIVE-3718 URL: https://issues.apache.org/jira/browse/HIVE-3718 Project: Hive Issue Type: Task Components: CLI Reporter: Pamela Vagata Assignee: Pamela Vagata Priority: Minor Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, HIVE-3718.3.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3234) getting the reporter in the recordwriter
[ https://issues.apache.org/jira/browse/HIVE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504406#comment-13504406 ] Ashutosh Chauhan commented on HIVE-3234: Looks good. It seems that the patch contains another patch. Can you get rid of ivysettings.xml? +1. I am running tests now, will commit if tests pass. getting the reporter in the recordwriter Key: HIVE-3234 URL: https://issues.apache.org/jira/browse/HIVE-3234 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.9.1 Environment: any Reporter: Jimmy Hu Assignee: Owen O'Malley Labels: newbie Fix For: 0.9.1 Attachments: HIVE-3234.D6699.1.patch, HIVE-3234.D6699.2.patch, HIVE-3234.D6987.1.patch Original Estimate: 48h Remaining Estimate: 48h We would like to generate some custom statistics and report back to map/reduce later when we implement the FileSinkOperator.RecordWriter interface. However, the current interface design doesn't allow us to get the map reduce reporter object. Please extend the current FileSinkOperator.RecordWriter interface so that its close() method passes in a map reduce reporter object. For the same reason, please also extend the RecordReader interface to include a reporter object so that users can pass in custom map reduce counters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3723) Hive Driver leaks ZooKeeper connections
[ https://issues.apache.org/jira/browse/HIVE-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504408#comment-13504408 ] Ashutosh Chauhan commented on HIVE-3723: Hmm.. make sense. +1 will commit if tests pass. Hive Driver leaks ZooKeeper connections --- Key: HIVE-3723 URL: https://issues.apache.org/jira/browse/HIVE-3723 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-3723.1-r1411423.patch In certain error cases (i.e.: statement fails to compile, semantic errors) the hive driver leaks zookeeper connections. This can be seen in the TestNegativeCliDriver test which accumulates a large number of open file handles and fails if the max allowed number of file handles isn't at least 2048. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3742) The derby metastore schema script for 0.10.0 doesn't run
[ https://issues.apache.org/jira/browse/HIVE-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504411#comment-13504411 ] Ashutosh Chauhan commented on HIVE-3742: +1 The derby metastore schema script for 0.10.0 doesn't run Key: HIVE-3742 URL: https://issues.apache.org/jira/browse/HIVE-3742 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-3742-2.patch, HIVE-3742.patch The hive-schema-0.10.0.derby.sql contains incorrect alter statement for SKEWED_STRING_LIST which causes the script execution to fail -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3633) sort-merge join does not work with sub-queries
[ https://issues.apache.org/jira/browse/HIVE-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3633: - Attachment: hive.3633.7.patch sort-merge join does not work with sub-queries -- Key: HIVE-3633 URL: https://issues.apache.org/jira/browse/HIVE-3633 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3633.1.patch, hive.3633.2.patch, hive.3633.3.patch, hive.3633.4.patch, hive.3633.5.patch, hive.3633.6.patch, hive.3633.7.patch Consider the following query: create table smb_bucket_1(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 6 BUCKETS STORED AS TEXTFILE; create table smb_bucket_2(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 6 BUCKETS STORED AS TEXTFILE; -- load the above tables set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true; set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; explain select count(*) from ( select /*+mapjoin(a)*/ a.key as key1, b.key as key2, a.value as value1, b.value as value2 from smb_bucket_1 a join smb_bucket_2 b on a.key = b.key) subq; The above query does not use sort-merge join. This would be very useful as we automatically convert the queries to use sorting and bucketing properties for join. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes
[ https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504416#comment-13504416 ] Ashutosh Chauhan commented on HIVE-3678: +1 patch looks good, though fails to apply cleanly. Can you update the patch on latest trunk? Add metastore upgrade scripts for column stats schema changes - Key: HIVE-3678 URL: https://issues.apache.org/jira/browse/HIVE-3678 Project: Hive Issue Type: Bug Components: Metastore Reporter: Shreepadma Venugopalan Assignee: Shreepadma Venugopalan Fix For: 0.10.0 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, HIVE-3678.3.patch.txt Add upgrade script for column statistics schema changes for Postgres/MySQL/Oracle/Derby -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3645) RCFileWriter does not implement the right function to support Federation
[ https://issues.apache.org/jira/browse/HIVE-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504419#comment-13504419 ] Ashutosh Chauhan commented on HIVE-3645: +1 will commit if tests pass. RCFileWriter does not implement the right function to support Federation Key: HIVE-3645 URL: https://issues.apache.org/jira/browse/HIVE-3645 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.0, 0.10.0 Environment: Hadoop 0.23.3 federation, Hive 0.9 and Pig 0.10 Reporter: Viraj Bhat Attachments: HIVE_3645_branch_0.patch, HIVE_3645_trunk_0.patch Create a table using Hive DDL {code} CREATE TABLE tmp_hcat_federated_numbers_part_1 ( id int, intnum int, floatnum float ) partitioned by ( part1 string, part2 string ) STORED AS rcfile LOCATION 'viewfs:///database/tmp_hcat_federated_numbers_part_1'; {code} Populate it using Pig: {code} A = load 'default.numbers_pig' using org.apache.hcatalog.pig.HCatLoader(); B = filter A by id = 500; C = foreach B generate (int)id, (int)intnum, (float)floatnum; store C into 'default.tmp_hcat_federated_numbers_part_1' using org.apache.hcatalog.pig.HCatStorer ('part1=pig, part2=hcat_pig_insert', 'id: int,intnum: int,floatnum: float'); {code} Generates the following error when running on a Federated Cluster: {quote} 2012-10-29 20:40:25,011 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: AttemptID:attempt_1348522594824_0846_m_00_3 Info:Error: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:479) at org.apache.hadoop.hive.ql.io.RCFile$Writer.<init>(RCFile.java:723) at org.apache.hadoop.hive.ql.io.RCFile$Writer.<init>(RCFile.java:705) at org.apache.hadoop.hive.ql.io.RCFileOutputFormat.getRecordWriter(RCFileOutputFormat.java:86) at 
org.apache.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:100) at org.apache.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:228) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:84) at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:587) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:706) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152) {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
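The failure mode in this stack trace can be illustrated with a toy model (the classes below are simplified stand-ins for the Hadoop ones, not the real API). Under viewfs, `getDefaultReplication()` with no path has no mount point to resolve and throws; passing the path of the file being written lets viewfs resolve the mount table entry, which is the shape of the fix attached here:

```java
public class ViewFsReplication {
    static class NotInMountpointException extends RuntimeException {
        NotInMountpointException(String m) { super(m); }
    }

    // Mimics the viewfs behavior seen in the stack trace above.
    static class ViewFileSystem {
        short getDefaultReplication() {  // path-less call: invalid on viewfs
            throw new NotInMountpointException(
                "getDefaultReplication on empty path is invalid");
        }
        short getDefaultReplication(String path) {  // resolves via mount table
            return 3;
        }
    }

    // Old writer behavior: no path, so it blows up on federated clusters.
    static short replicationOld(ViewFileSystem fs) {
        return fs.getDefaultReplication();
    }

    // Fixed behavior: pass the file being written so the mount resolves.
    static short replicationNew(ViewFileSystem fs, String file) {
        return fs.getDefaultReplication(file);
    }

    public static void main(String[] args) {
        ViewFileSystem fs = new ViewFileSystem();
        System.out.println(replicationNew(fs, "/database/part-00000")); // 3
        try {
            replicationOld(fs);
        } catch (NotInMountpointException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The same reasoning applies to any other path-less FileSystem default query (block size, etc.) invoked from a writer running against a federated namespace.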
[jira] [Commented] (HIVE-3648) HiveMetaStoreFsImpl is not compatible with hadoop viewfs
[ https://issues.apache.org/jira/browse/HIVE-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504423#comment-13504423 ] Ashutosh Chauhan commented on HIVE-3648: +1 will commit if tests pass. HiveMetaStoreFsImpl is not compatible with hadoop viewfs Key: HIVE-3648 URL: https://issues.apache.org/jira/browse/HIVE-3648 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.9.0, 0.10.0 Reporter: Kihwal Lee Attachments: HIVE_3648_branch_0.patch, HIVE-3648-trunk-0.patch, HIVE_3648_trunk_1.patch HiveMetaStoreFsImpl#deleteDir() method calls Trash#moveToTrash(). This may not work when viewfs is used. It needs to call Trash#moveToAppropriateTrash() instead. Please note that this method is not available in hadoop versions earlier than 0.23. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
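The difference between the two Trash calls described in this issue can be modeled schematically. This is a hypothetical toy, not the real Hadoop Trash API: `moveToTrash` resolves the trash directory against the wrapping viewfs and fails, while `moveToAppropriateTrash` (available from Hadoop 0.23) first resolves the path to the underlying mounted filesystem:

```java
public class TrashDispatch {
    // Minimal stand-in for org.apache.hadoop.fs.Path with a scheme.
    static class Path {
        final String scheme, path;
        Path(String scheme, String path) { this.scheme = scheme; this.path = path; }
    }

    // Models Trash#moveToTrash: viewfs itself has no .Trash to move into.
    static boolean moveToTrash(Path p) {
        return !p.scheme.equals("viewfs");
    }

    // Models Trash#moveToAppropriateTrash: resolve the viewfs mount to the
    // backing filesystem before trashing (here assumed to be hdfs).
    static boolean moveToAppropriateTrash(Path p) {
        Path resolved = p.scheme.equals("viewfs")
            ? new Path("hdfs", p.path) : p;
        return moveToTrash(resolved);
    }

    public static void main(String[] args) {
        Path dir = new Path("viewfs", "/database/tmp_table");
        System.out.println(moveToTrash(dir));            // false: old behavior
        System.out.println(moveToAppropriateTrash(dir)); // true: fixed
    }
}
```

As the comment notes, a real HiveMetaStoreFsImpl#deleteDir fix would also need a fallback for Hadoop versions earlier than 0.23, where only the older call exists.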