[jira] [Commented] (HIVE-3072) Hive List Bucketing - DDL support

2014-07-02 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049667#comment-14049667
 ] 

Lefty Leverenz commented on HIVE-3072:
--

This is documented in the wiki here:

* [Language Manual -- DDL -- Skewed Tables | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-SkewedTables]

 Hive List Bucketing - DDL support
 -

 Key: HIVE-3072
 URL: https://issues.apache.org/jira/browse/HIVE-3072
 Project: Hive
  Issue Type: New Feature
  Components: SQL
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Fix For: 0.10.0

 Attachments: HIVE-3072.patch, HIVE-3072.patch.1, HIVE-3072.patch.2, 
 HIVE-3072.patch.3, HIVE-3072.patch.4, HIVE-3072.patch.5, HIVE-3072.patch.6, 
 HIVE-3072.patch.7


 If a Hive table column has skewed keys, query performance on non-skewed keys 
 is always impacted. The Hive List Bucketing feature will address this:
 https://cwiki.apache.org/Hive/listbucketing.html
 This JIRA issue will track the DDL changes for the feature, covering both a 
 single skewed column and multiple skewed columns.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-3072) Hive List Bucketing - DDL support

2014-07-02 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-3072:
-

Labels:   (was: TODOC10)

 Hive List Bucketing - DDL support
 -

 Key: HIVE-3072
 URL: https://issues.apache.org/jira/browse/HIVE-3072
 Project: Hive
  Issue Type: New Feature
  Components: SQL
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Fix For: 0.10.0

 Attachments: HIVE-3072.patch, HIVE-3072.patch.1, HIVE-3072.patch.2, 
 HIVE-3072.patch.3, HIVE-3072.patch.4, HIVE-3072.patch.5, HIVE-3072.patch.6, 
 HIVE-3072.patch.7


 If a Hive table column has skewed keys, query performance on non-skewed keys 
 is always impacted. The Hive List Bucketing feature will address this:
 https://cwiki.apache.org/Hive/listbucketing.html
 This JIRA issue will track the DDL changes for the feature, covering both a 
 single skewed column and multiple skewed columns.





[jira] [Commented] (HIVE-6382) PATCHED_BLOB encoding in ORC will corrupt data in some cases

2014-07-02 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049710#comment-14049710
 ] 

Lefty Leverenz commented on HIVE-6382:
--

*hive.exec.orc.skip.corrupt.data* is documented in the wiki:

* [Configuration Properties -- hive.exec.orc.skip.corrupt.data | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.orc.skip.corrupt.data]

 PATCHED_BLOB encoding in ORC will corrupt data in some cases
 

 Key: HIVE-6382
 URL: https://issues.apache.org/jira/browse/HIVE-6382
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Fix For: 0.13.0

 Attachments: HIVE-6382.1.patch, HIVE-6382.2.patch, HIVE-6382.3.patch, 
 HIVE-6382.4.patch, HIVE-6382.5.patch, HIVE-6382.6.patch


 In PATCHED_BLOB encoding (added in HIVE-4123), gapVsPatchList is an array of 
 longs that stores the gap (g) between the values that are patched and the 
 patch value (p). The maximum gap distance can be 511, which requires 8 bits to 
 encode, and patch values can take more than 56 bits. When patch values take 
 more than 56 bits, p + g becomes more than 64 bits, which cannot be packed 
 into a long. This results in data corruption in the cases where patch values 
 are more than 56 bits.
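The overflow arithmetic above can be sketched directly. This is an illustrative Python check of the bit-width argument, not the actual ORC writer code; the names are chosen for clarity only.

```python
# Illustrative check of the bit-width argument (not ORC writer code).
# A gap value is stored alongside the patch value in one 64-bit long, so
# once the patch needs more than 56 bits the combined value no longer fits.

def packed_width(gap_bits: int, patch_bits: int) -> int:
    """Total bits needed to store the gap and patch side by side."""
    return gap_bits + patch_bits

GAP_BITS = 8    # bits reserved for the gap, per the description above
LONG_BITS = 64  # capacity of a Java long

assert packed_width(GAP_BITS, 56) <= LONG_BITS  # a 56-bit patch still fits
assert packed_width(GAP_BITS, 57) > LONG_BITS   # a wider patch overflows
```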





[jira] [Commented] (HIVE-860) Persistent distributed cache

2014-07-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049711#comment-14049711
 ] 

Hive QA commented on HIVE-860:
--



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12653524/HIVE-860.patch

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 5630 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_excludeHadoop20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_nullable_fields
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_precision
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_inputddl7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_covar_samp
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_view_cast
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_windowing_streaming
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/657/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/657/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-657/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12653524

 Persistent distributed cache
 

 Key: HIVE-860
 URL: https://issues.apache.org/jira/browse/HIVE-860
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.12.0
Reporter: Zheng Shao
Assignee: Brock Noland
 Fix For: 0.14.0

 Attachments: HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, 
 HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, 
 HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch


 DistributedCache is shared across multiple jobs if the HDFS file name is the 
 same.
 We need to make sure Hive puts the same file into the same location every time 
 and does not overwrite it if the file content is the same.
 We can achieve 2 different results:
 A1. Files added with the same name, timestamp, and md5 in the same session 
 will have a single copy in distributed cache.
 A2. Files added with the same name, timestamp, and md5 will have a single 
 copy in distributed cache.
 A2 has a bigger benefit in sharing but may raise a question of when Hive 
 should clean it up in HDFS.
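The sharing scheme in A1/A2 can be sketched by deriving the cache location from the file's content digest, so identical files always map to the same path and never need overwriting. This is an illustrative Python sketch; the `cache_path` helper and the cache layout are hypothetical, not Hive's actual implementation.

```python
import hashlib
import os

def cache_path(local_file: str, cache_root: str = "/tmp/hive-cache") -> str:
    """Map a file to a stable cache location keyed by its content MD5.

    Identical content always yields the same path, so re-adding a file is a
    no-op and concurrent jobs can share a single copy. (Hypothetical layout.)
    """
    md5 = hashlib.md5()
    with open(local_file, "rb") as f:
        # Hash in chunks so large files don't need to fit in memory.
        for chunk in iter(lambda: f.read(8192), b""):
            md5.update(chunk)
    return os.path.join(cache_root, md5.hexdigest(), os.path.basename(local_file))
```

Because the path is a pure function of name and content, the A2-style cleanup question reduces to garbage-collecting digest directories no running job references.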





[jira] [Commented] (HIVE-6586) Add new parameters to HiveConf.java after commit HIVE-6037 (also fix typos)

2014-07-02 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049723#comment-14049723
 ] 

Lefty Leverenz commented on HIVE-6586:
--

HIVE-6749 changed the default value of hive.auto.convert.join.use.nonstaged to 
false in 0.13.0.  The change isn't in patch HIVE-6037-0.13.0.

 Add new parameters to HiveConf.java after commit HIVE-6037 (also fix typos)
 ---

 Key: HIVE-6586
 URL: https://issues.apache.org/jira/browse/HIVE-6586
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Lefty Leverenz
  Labels: TODOC14

 HIVE-6037 puts the definitions of configuration parameters into the 
 HiveConf.java file, but several recent JIRAs for release 0.13.0 introduce new 
 parameters that aren't in HiveConf.java yet, and some parameter definitions 
 need to be altered for 0.13.0.  This JIRA will patch HiveConf.java after 
 HIVE-6037 gets committed.
 Also, four typos patched in HIVE-6582 need to be fixed in the new 
 HiveConf.java.





[jira] [Commented] (HIVE-6749) Turn hive.auto.convert.join.use.nonstaged off by default

2014-07-02 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049729#comment-14049729
 ] 

Lefty Leverenz commented on HIVE-6749:
--

The changed default value for *hive.auto.convert.join.use.nonstaged* is 
documented in the wiki here:

* [Configuration Properties -- hive.auto.convert.join.use.nonstaged | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.auto.convert.join.use.nonstaged]

I added a comment to HIVE-6586 so the correct default value of 
*hive.auto.convert.join.use.nonstaged* won't get lost in the shuffle when 
HIVE-6037 changes HiveConf.java.

 Turn hive.auto.convert.join.use.nonstaged off by default
 

 Key: HIVE-6749
 URL: https://issues.apache.org/jira/browse/HIVE-6749
 Project: Hive
  Issue Type: Bug
  Components: Configuration
Affects Versions: 0.13.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.13.0

 Attachments: HIVE-6749.1.patch, HIVE-6749.patch








[jira] [Updated] (HIVE-5733) Publish hive-exec artifact without all the dependencies

2014-07-02 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-5733:
--

Status: Open  (was: Patch Available)

There are problems with the uploaded patch; canceling.

 Publish hive-exec artifact without all the dependencies
 ---

 Key: HIVE-5733
 URL: https://issues.apache.org/jira/browse/HIVE-5733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Jarek Jarcec Cecho
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-5733.1.patch


 Currently the artifact {{hive-exec}} that is available in 
 [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar]
  is shading all the dependencies (i.e., the jar contains all of Hive's 
 dependencies). As other projects that depend on Hive might use 
 slightly different versions of those dependencies, it can easily happen that 
 Hive's shaded version is used instead, which leads to very time-consuming 
 debugging of what is happening (for example SQOOP-1198).
 Would it be feasible to publish a {{hive-exec}} jar that is built without 
 shading any dependencies? For example, 
 [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar]
  has a classifier nodeps that represents the artifact without any 
 dependencies.





Branch for HIVE-7292

2014-07-02 Thread Xuefu Zhang
Hi all,

I have started working on HIVE-7292, for which I think a branch would make
sense so that the trunk is kept stable at all times, given the fair amount
of integration work between Hive and Spark. Thus, I'd like to propose
creating a branch so that we can do this incrementally and collaboratively.

Secondly, there will be a limited amount of refactoring work to support
HIVE-7292. For this, we will work directly on trunk.

Please let me know if you have any questions or concerns. At the same time,
the design doc has been posted on JIRA and the wiki for quite some time.
Thanks to those who have provided feedback; feedback is welcome any time.

Regards,
Xuefu


[jira] [Updated] (HIVE-7029) Vectorize ReduceWork

2014-07-02 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7029:
---

Status: In Progress  (was: Patch Available)

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch


 This will enable the vectorization team to work independently on reduce-side 
 vectorization even before vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)





[jira] [Updated] (HIVE-7029) Vectorize ReduceWork

2014-07-02 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7029:
---

Status: Patch Available  (was: In Progress)

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch


 This will enable the vectorization team to work independently on reduce-side 
 vectorization even before vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)





[jira] [Updated] (HIVE-7029) Vectorize ReduceWork

2014-07-02 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7029:
---

Attachment: HIVE-7029.3.patch

Rebase with checkin of HIVE-7105 and recent changes.

Address reviewer comments.

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch


 This will enable the vectorization team to work independently on reduce-side 
 vectorization even before vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)





[jira] [Updated] (HIVE-3087) Implement stored procedure

2014-07-02 Thread Biju Devassy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biju Devassy updated HIVE-3087:
---

Attachment: (was: Document.rtf)

 Implement stored procedure 
 ---

 Key: HIVE-3087
 URL: https://issues.apache.org/jira/browse/HIVE-3087
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Biju Devassy

 Stored procedure support is missing in Hive. It would be very useful 
 for executing a set of instructions.
 It should include features such as function creation, variable declarations 
 with different data types, control statements, and other common stored 
 procedure techniques.





[jira] [Updated] (HIVE-3087) Implement stored procedure

2014-07-02 Thread Biju Devassy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biju Devassy updated HIVE-3087:
---

Attachment: ApproachNote.txt

Updated

 Implement stored procedure 
 ---

 Key: HIVE-3087
 URL: https://issues.apache.org/jira/browse/HIVE-3087
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Biju Devassy
 Attachments: ApproachNote.txt


 Stored procedure support is missing in Hive. It would be very useful 
 for executing a set of instructions.
 It should include features such as function creation, variable declarations 
 with different data types, control statements, and other common stored 
 procedure techniques.





[jira] [Commented] (HIVE-7296) big data approximate processing at a very low cost based on hive sql

2014-07-02 Thread Carter Shanklin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050265#comment-14050265
 ] 

Carter Shanklin commented on HIVE-7296:
---

[~sjtufighter]

I have spoken to some Hive users who implemented their own UDF to compute 
approximate counts and ranks using lossy counting: 
http://www.vldb.org/conf/2002/S10P03.pdf

They had tried some other approaches but settled on this one because it allows 
tunable error and deals with skew fairly well.

This could be implemented in Hive using partitioned table functions, and I 
think there are some users who would like this functionality. It sounds similar 
to your number (3). I've spoken to a few people on the Hive team and they think 
it sounds like a good idea. Any interest in building this?
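For reference, the lossy counting algorithm from that paper fits in a few lines. This is an illustrative Python version, not the users' UDF and not Hive code; the default epsilon is an arbitrary choice.

```python
def lossy_count(stream, epsilon=0.01):
    """Lossy counting (Manku & Motwani, VLDB 2002): approximate item
    frequencies over a stream with undercount at most epsilon * N,
    using memory bounded by the number of surviving counters.
    """
    width = int(1 / epsilon)            # items per bucket
    counts, deltas = {}, {}             # observed count, max possible undercount
    bucket = 1
    for n, item in enumerate(stream, start=1):
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
            deltas[item] = bucket - 1   # item may have been pruned before
        if n % width == 0:              # bucket boundary: prune rare items
            for k in [k for k in counts if counts[k] + deltas[k] <= bucket]:
                del counts[k], deltas[k]
            bucket += 1
    return counts                       # lower-bound counts of frequent items
```

Heavy hitters survive the per-bucket pruning while rare items are dropped, which is what makes the memory use tunable via epsilon and the skew handling reasonably good.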

 big data approximate processing  at a very  low cost  based on hive sql 
 

 Key: HIVE-7296
 URL: https://issues.apache.org/jira/browse/HIVE-7296
 Project: Hive
  Issue Type: New Feature
Reporter: wangmeng

 For big data analysis, we often need the following queries and statistics:
 1. Cardinality estimation: count the number of distinct elements in a 
 collection (such as Unique Visitors, UV).
 Hive query: select distinct(id) from TestTable;
 2. Frequency estimation: estimate how many times an element is repeated, such 
 as the site visits of a user.
 Hive query: select count(1) from TestTable where name="wangmeng"
 3. Heavy hitters, top-k elements: such as the top 100 shops.
 Hive query: select count(1), name from TestTable group by name; needs a UDF...
 4. Range query: for example, find the number of users between 20 and 30.
 Hive query: select count(1) from TestTable where age > 20 and age < 30
 5. Membership query: for example, is a user name already registered?
 Given Hive's implementation mechanism, these queries cost too much memory and 
 take too long.
 However, in many cases we do not need very accurate results, and a small 
 error can be tolerated. In such cases, we can use approximate processing 
 to greatly improve time and space efficiency.
 Based on some theoretical analysis materials, I would very much like to work 
 on these new features if possible.
 So, is there anything I can do? Many thanks.





Re: Review Request 22996: HIVE-7090 Support session-level temporary tables in Hive

2014-07-02 Thread Jason Dere


 On July 2, 2014, 2:07 a.m., Brock Noland wrote:
  Hey Jason, looks good! Nice work! I have a question or two below and a few 
  nits.

Thanks for reviewing!


 On July 2, 2014, 2:07 a.m., Brock Noland wrote:
  itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniMr.java,
   line 240
  https://reviews.apache.org/r/22996/diff/2/?file=619988#file619988line240
 
  When the error message does not contain the text we are looking for, 
  putting the actual text in the error message is useful.
  
  I.e. when this assertion fails we won't have any idea what the actual 
  message was. Thus the person debugging will have to actually make a code 
  change and re-run the test to see what happened.

Good point, will change.


 On July 2, 2014, 2:07 a.m., Brock Noland wrote:
  ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java,
   line 34
  https://reviews.apache.org/r/22996/diff/2/?file=619995#file619995line34
 
  I am sure this is a stupid question but why are we subclassing HMSC?

HiveMetaStoreClient is part of hive-metastore and can only see the classes that 
are part of hive-metastore or its dependencies.  So making the changes directly 
in HiveMetaStoreClient wasn't possible, because it does not have visibility 
into the SessionState, which is part of hive-exec.
To put all of the logic into the HiveMetaStoreClient, I suppose it may have 
been possible to define an interface for the particular methods we wanted from 
the SessionState, and to initialize the HMSC with the SessionState in some 
way.  But given the way the HMSC is used in Hive.createMetaStoreClient, it 
didn't seem like there was a good place to actually call into HMSC to pass in 
the SessionState.


 On July 2, 2014, 2:07 a.m., Brock Noland wrote:
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 
  10205
  https://reviews.apache.org/r/22996/diff/2/?file=62#file62line10205
 
  nit: 
  
   Is "Partition columns are not supported on temporary tables and source 
   table in CREATE TABLE LIKE is partitioned." more clear?
 

Sounds better than my message :) will change.


 On July 2, 2014, 2:07 a.m., Brock Noland wrote:
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java, line 80
  https://reviews.apache.org/r/22996/diff/2/?file=620003#file620003line80
 
  It looks to me like these can be private since they are not accessed 
  outside this class?

Will change in next patch


 On July 2, 2014, 2:07 a.m., Brock Noland wrote:
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java, line 183
  https://reviews.apache.org/r/22996/diff/2/?file=620003#file620003line183
 
  These // should be javadoc style.

will change in next patch


 On July 2, 2014, 2:07 a.m., Brock Noland wrote:
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java, line 383
  https://reviews.apache.org/r/22996/diff/2/?file=620003#file620003line383
 
  I understand it's coded today such that these three conf.get() will not 
  return null. However I believe we should use Preconditions.checkNotNull 
  here to ensure once that assumption is not true we don't give the dev/user 
  a terrible error message.

will change in next patch


 On July 2, 2014, 2:07 a.m., Brock Noland wrote:
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java, line 445
  https://reviews.apache.org/r/22996/diff/2/?file=620003#file620003line445
 
  nit: 
  
   Is "Cannot create directory" more clear?

will change in next patch


 On July 2, 2014, 2:07 a.m., Brock Noland wrote:
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java, line 1093
  https://reviews.apache.org/r/22996/diff/2/?file=620003#file620003line1093
 
  Setter is not being used.

Ok, will remove


- Jason


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22996/#review47170
---


On June 28, 2014, 12:35 a.m., Jason Dere wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/22996/
 ---
 
 (Updated June 28, 2014, 12:35 a.m.)
 
 
 Review request for hive, Gunther Hagleitner, Navis Ryu, and Harish Butani.
 
 
 Bugs: HIVE-7090
 https://issues.apache.org/jira/browse/HIVE-7090
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Temp tables managed in memory by SessionState.
 SessionHiveMetaStoreClient overrides table-related methods in HiveMetaStore 
 to access the temp tables saved in the SessionState when appropriate.
 
 
 Diffs
 -
 
   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniMr.java 
 9fb7550 
   itests/qtest/testconfiguration.properties 1462ecd 
   metastore/if/hive_metastore.thrift cc802c6 
   

[jira] [Updated] (HIVE-7090) Support session-level temporary tables in Hive

2014-07-02 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-7090:
-

Attachment: HIVE-7090.7.patch

Patch v7, with more feedback from Brock.  Also rebased with trunk.

 Support session-level temporary tables in Hive
 --

 Key: HIVE-7090
 URL: https://issues.apache.org/jira/browse/HIVE-7090
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: Gunther Hagleitner
Assignee: Jason Dere
 Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch, HIVE-7090.3.patch, 
 HIVE-7090.4.patch, HIVE-7090.5.patch, HIVE-7090.6.patch, HIVE-7090.7.patch


 It's common to see SQL scripts that create some temporary table as an 
 intermediate result, run some additional queries against it, and then clean up 
 at the end.
 We should support temporary tables properly, meaning automatically manage 
 their life cycle and make sure their visibility is restricted to the creating 
 connection/session. Without this, it's common to see leftover tables in the 
 metastore or weird errors from clashing tmp table names.
 Proposed syntax:
 CREATE TEMPORARY TABLE 
 CTAS, CTL, and INSERT INTO should all be supported as usual.
 Knowing that a user wants a temp table enables us to further optimize access 
 to it. E.g.: temp tables should be kept in memory where possible; compactions 
 and merging of table files aren't required; ...





Re: Review Request 22996: HIVE-7090 Support session-level temporary tables in Hive

2014-07-02 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22996/
---

(Updated July 2, 2014, 6:23 p.m.)


Review request for hive, Gunther Hagleitner, Navis Ryu, and Harish Butani.


Changes
---

Changes based on Brock's feedback.


Bugs: HIVE-7090
https://issues.apache.org/jira/browse/HIVE-7090


Repository: hive-git


Description
---

Temp tables managed in memory by SessionState.
SessionHiveMetaStoreClient overrides table-related methods in HiveMetaStore to 
access the temp tables saved in the SessionState when appropriate.


Diffs (updated)
-

  itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniMr.java 
9fb7550 
  itests/qtest/testconfiguration.properties 1462ecd 
  metastore/if/hive_metastore.thrift cc802c6 
  metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 9e8d912 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java abc4290 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java d8d900b 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 4d35176 
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 3df2690 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 
6c9876d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g f934ac4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java 
71471f4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 399f92a 
  ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 2537b75 
  ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableLikeDesc.java cb5d64c 
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 2143d0c 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 43125f7 
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager.java 98c3cc3 
  ql/src/test/org/apache/hadoop/hive/ql/parse/TestMacroSemanticAnalyzer.java 
91de8da 
  
ql/src/test/org/apache/hadoop/hive/ql/parse/authorization/TestHiveAuthorizationTaskFactory.java
 20d08b3 
  ql/src/test/queries/clientnegative/temp_table_authorize_create_tbl.q 
PRE-CREATION 
  ql/src/test/queries/clientnegative/temp_table_column_stats.q PRE-CREATION 
  ql/src/test/queries/clientnegative/temp_table_create_like_partitions.q 
PRE-CREATION 
  ql/src/test/queries/clientnegative/temp_table_index.q PRE-CREATION 
  ql/src/test/queries/clientnegative/temp_table_partitions.q PRE-CREATION 
  ql/src/test/queries/clientnegative/temp_table_rename.q PRE-CREATION 
  ql/src/test/queries/clientpositive/show_create_table_temp_table.q 
PRE-CREATION 
  ql/src/test/queries/clientpositive/stats19.q 51514bd 
  ql/src/test/queries/clientpositive/temp_table.q PRE-CREATION 
  ql/src/test/queries/clientpositive/temp_table_external.q PRE-CREATION 
  ql/src/test/queries/clientpositive/temp_table_gb1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/temp_table_join1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/temp_table_names.q PRE-CREATION 
  ql/src/test/queries/clientpositive/temp_table_options1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/temp_table_precedence.q PRE-CREATION 
  ql/src/test/queries/clientpositive/temp_table_subquery1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/temp_table_windowing_expressions.q 
PRE-CREATION 
  ql/src/test/results/clientnegative/temp_table_authorize_create_tbl.q.out 
PRE-CREATION 
  ql/src/test/results/clientnegative/temp_table_column_stats.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/temp_table_create_like_partitions.q.out 
PRE-CREATION 
  ql/src/test/results/clientnegative/temp_table_index.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/temp_table_partitions.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/temp_table_rename.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/nullformat.q.out d311825 
  ql/src/test/results/clientpositive/nullformatCTAS.q.out cab23d5 
  ql/src/test/results/clientpositive/show_create_table_alter.q.out 206f4f8 
  ql/src/test/results/clientpositive/show_create_table_db_table.q.out 528dd36 
  ql/src/test/results/clientpositive/show_create_table_delimited.q.out d4ffd53 
  ql/src/test/results/clientpositive/show_create_table_serde.q.out a9e92b4 
  ql/src/test/results/clientpositive/show_create_table_temp_table.q.out 
PRE-CREATION 
  ql/src/test/results/clientpositive/temp_table.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/temp_table_external.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/temp_table_gb1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/temp_table_join1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/temp_table_names.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/temp_table_options1.q.out PRE-CREATION 
  

[jira] [Created] (HIVE-7327) Refactoring: Make Hive map side data processing reusable

2014-07-02 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-7327:
-

 Summary: Refactoring: Make Hive map side data processing reusable
 Key: HIVE-7327
 URL: https://issues.apache.org/jira/browse/HIVE-7327
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.13.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


ExecMapper is Hive's mapper implementation for MapReduce. Table rows are read 
by the MR framework and processed by the ExecMapper.map() method, which invokes 
Hive's map-side operator tree starting from MapOperator. This task is to 
extract the map-side data processing offered by the operator tree so that it 
can be used by other execution engines such as Spark. This is purely a 
refactoring of the existing code.
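The intended separation can be pictured as an engine-neutral row processor that any execution engine feeds rows into. This is an illustrative Python sketch; the `RowProcessor` and `run` names are hypothetical, not Hive's actual Java classes (the real work happens in MapOperator/ExecMapper).

```python
class RowProcessor:
    """Engine-neutral stand-in for a map-side operator tree (hypothetical;
    in Hive the real processing is a Java operator tree rooted at
    MapOperator)."""

    def __init__(self, operators):
        self.operators = operators      # a simple pipeline of callables

    def process(self, row):
        for op in self.operators:
            row = op(row)
            if row is None:             # an operator filtered the row out
                return None
        return row

def run(engine_rows, processor):
    """Any engine (MapReduce, Spark, ...) drives the same processor."""
    out = []
    for row in engine_rows:
        result = processor.process(row)
        if result is not None:
            out.append(result)
    return out
```

The point of the refactoring is that `run` belongs to the engine while `RowProcessor` does not, so the same operator pipeline serves multiple engines unchanged.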





[jira] [Updated] (HIVE-7327) Refactoring: make Hive map side data processing reusable

2014-07-02 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7327:
--

Summary: Refactoring: make Hive map side data processing reusable  (was: 
Refactoring: Make Hive map side data processing reusable)

 Refactoring: make Hive map side data processing reusable
 

 Key: HIVE-7327
 URL: https://issues.apache.org/jira/browse/HIVE-7327
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.13.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang

 ExecMapper is Hive's mapper implementation for MapReduce. Table rows are read 
 by the MR framework and processed by the ExecMapper.map() method, which 
 invokes Hive's map-side operator tree starting from MapOperator. This task is 
 to extract the map-side data processing offered by the operator tree so that 
 it can be used by other execution engines such as Spark. This is purely a 
 refactoring of the existing code.





[jira] [Created] (HIVE-7328) Refactoring: make Hive reduce side data processing reusable

2014-07-02 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-7328:
-

 Summary: Refactoring: make Hive reduce side data processing 
reusable
 Key: HIVE-7328
 URL: https://issues.apache.org/jira/browse/HIVE-7328
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.13.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


ExecReducer is Hive's reducer implementation for MapReduce. Table rows are 
shuffled by the MR framework to ExecReducer and further processed by the 
ExecReducer.reduce() method, which invokes Hive's reduce-side operator tree. 
This task is to extract the reduce-side data processing performed by the 
operator tree so that it can be reused by other execution engines such as 
Spark. This is purely a refactoring of the existing code.
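A minimal sketch of the same idea on the reduce side, where the processor receives one key together with its shuffled values. All names here are hypothetical, not Hive's actual classes:

```java
// Hypothetical sketch of engine-neutral reduce-side processing: an
// ExecReducer-style caller (or a Spark wrapper) feeds each key group in.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideSketch {
    /** Engine-agnostic reduce processor: one key with its shuffled values. */
    interface ReduceProcessor {
        String reduce(String key, Iterator<Integer> values);
    }

    /** Example: sum the values for a key (stand-in for the reduce operator tree). */
    static final ReduceProcessor SUM = (key, values) -> {
        int total = 0;
        while (values.hasNext()) {
            total += values.next();
        }
        return key + "=" + total;
    };

    /** Drive the processor once per key group, as any engine's reducer would. */
    static List<String> runReduceSide(ReduceProcessor p, Map<String, List<Integer>> groups) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
            out.add(p.reduce(e.getKey(), e.getValue().iterator()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        groups.put("a", List.of(1, 2));
        groups.put("b", List.of(3));
        System.out.println(runReduceSide(SUM, groups)); // prints [a=3, b=3]
    }
}
```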





[jira] [Created] (HIVE-7329) Create SparkWork

2014-07-02 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-7329:
-

 Summary: Create SparkWork
 Key: HIVE-7329
 URL: https://issues.apache.org/jira/browse/HIVE-7329
 Project: Hive
  Issue Type: Sub-task
Reporter: Xuefu Zhang


This class encapsulates all the work objects that can be executed in a single 
Spark job.





[jira] [Created] (HIVE-7330) Create SparkTask

2014-07-02 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-7330:
-

 Summary: Create SparkTask
 Key: HIVE-7330
 URL: https://issues.apache.org/jira/browse/HIVE-7330
 Project: Hive
  Issue Type: Sub-task
Reporter: Xuefu Zhang


SparkTask handles the execution of SparkWork. It will execute a graph of map 
and reduce work using a SparkClient instance.





[jira] [Commented] (HIVE-7029) Vectorize ReduceWork

2014-07-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050551#comment-14050551
 ] 

Hive QA commented on HIVE-7029:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12653602/HIVE-7029.3.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5656 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/658/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/658/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-658/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12653602

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch


 This will enable vectorization team to independently work on vectorization on 
 reduce side even before vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)





[jira] [Created] (HIVE-7331) Create SparkCompiler

2014-07-02 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-7331:
-

 Summary: Create SparkCompiler
 Key: HIVE-7331
 URL: https://issues.apache.org/jira/browse/HIVE-7331
 Project: Hive
  Issue Type: Sub-task
Reporter: Xuefu Zhang


SparkCompiler translates the operator plan into SparkWorks. It behaves in a 
similar way to MapReduceCompiler for MR and TezCompiler for Tez.





[jira] [Created] (HIVE-7332) Create SparkClient, interface to Spark cluster

2014-07-02 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-7332:
-

 Summary: Create SparkClient, interface to Spark cluster
 Key: HIVE-7332
 URL: https://issues.apache.org/jira/browse/HIVE-7332
 Project: Hive
  Issue Type: Sub-task
Reporter: Xuefu Zhang


SparkClient is responsible for Spark job submission, monitoring, progress and 
error reporting, etc.





[jira] [Created] (HIVE-7333) Create RDD translator, translating Hive Tables into Spark RDDs

2014-07-02 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-7333:
-

 Summary: Create RDD translator, translating Hive Tables into Spark 
RDDs
 Key: HIVE-7333
 URL: https://issues.apache.org/jira/browse/HIVE-7333
 Project: Hive
  Issue Type: Sub-task
Reporter: Xuefu Zhang


Please refer to the design specification.





[jira] [Updated] (HIVE-7144) GC pressure during ORC StringDictionary writes

2014-07-02 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-7144:
--

Attachment: HIVE-7144.3.patch

 GC pressure during ORC StringDictionary writes 
 ---

 Key: HIVE-7144
 URL: https://issues.apache.org/jira/browse/HIVE-7144
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.14.0
 Environment: ORC Table ~ 12 string columns
Reporter: Gopal V
Assignee: Gopal V
  Labels: ORC, Performance
 Attachments: HIVE-7144.1.patch, HIVE-7144.2.patch, HIVE-7144.3.patch, 
 orc-string-write.png


 When ORC string dictionary writes data out, it suffers from bad GC 
 performance due to a few allocations in-loop.
 !orc-string-write.png!
 The conversions are as follows:
 StringTreeWriter::getStringValue() causes 2 conversions:
 LazyString -> Text (LazyString::getWritableObject)
 Text -> String (LazyStringObjectInspector::getPrimitiveJavaObject)
 Then StringRedBlackTree::add() does one conversion:
 String -> Text
 This causes some GC pressure with unnecessary String and byte[] array 
 allocations.
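The allocation pattern behind this report can be sketched in plain Java: the problem path round-trips each value through a temporary String and byte[], while a fixed scratch buffer lets the per-row loop allocate nothing once warm. This is an illustrative analogue under assumed names, not Hive's actual ORC code:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Sketch of the in-loop allocation problem: converting bytes -> String -> bytes
 * per row versus copying into a reusable, growable scratch buffer. Illustrative
 * only; class and method names are not from Hive.
 */
public class DictWriteSketch {
    private byte[] scratch = new byte[16];
    private int scratchLen = 0;

    /** Allocating path, analogous to the Text -> String -> Text conversions. */
    static byte[] viaString(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8); // temporary String
        return s.getBytes(StandardCharsets.UTF_8);           // temporary byte[]
    }

    /** Reuse path: after warm-up, processing a row allocates nothing. */
    void setFromBytes(byte[] utf8) {
        if (scratch.length < utf8.length) {
            scratch = Arrays.copyOf(scratch, Math.max(utf8.length, scratch.length * 2));
        }
        System.arraycopy(utf8, 0, scratch, 0, utf8.length);
        scratchLen = utf8.length;
    }

    /** Copy out the current value (only needed here to verify equivalence). */
    byte[] snapshot() {
        return Arrays.copyOf(scratch, scratchLen);
    }

    public static void main(String[] args) {
        byte[] in = "dictionary-key".getBytes(StandardCharsets.UTF_8);
        DictWriteSketch w = new DictWriteSketch();
        w.setFromBytes(in);
        System.out.println(Arrays.equals(viaString(in), w.snapshot())); // prints true
    }
}
```

Both paths produce the same bytes; the difference is purely in the garbage generated per row, which is what the attached profile (orc-string-write.png) highlights.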





[jira] [Created] (HIVE-7334) Create SparkShuffler, shuffling data between map-side data processing and reduce-side processing

2014-07-02 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-7334:
-

 Summary: Create SparkShuffler, shuffling data between map-side 
data processing and reduce-side processing
 Key: HIVE-7334
 URL: https://issues.apache.org/jira/browse/HIVE-7334
 Project: Hive
  Issue Type: Sub-task
Reporter: Xuefu Zhang


Please refer to the design spec.





[jira] [Commented] (HIVE-7090) Support session-level temporary tables in Hive

2014-07-02 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050571#comment-14050571
 ] 

Selina Zhang commented on HIVE-7090:


Just wondering how transactions would work with temp tables if we plan to add 
transaction management on top of them, since the metadata is stored on the 
client side. 

 Support session-level temporary tables in Hive
 --

 Key: HIVE-7090
 URL: https://issues.apache.org/jira/browse/HIVE-7090
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: Gunther Hagleitner
Assignee: Jason Dere
 Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch, HIVE-7090.3.patch, 
 HIVE-7090.4.patch, HIVE-7090.5.patch, HIVE-7090.6.patch, HIVE-7090.7.patch


 It's common to see SQL scripts that create a temporary table as an 
 intermediate result, run some additional queries against it, and then clean up 
 at the end.
 We should support temporary tables properly, meaning automatically manage their 
 life cycle and make sure their visibility is restricted to the creating 
 connection/session. Without this, it's common to see leftover tables in the 
 metastore or weird errors from clashing tmp table names.
 Proposed syntax:
 CREATE TEMPORARY TABLE 
 CTAS, CTL, and INSERT INTO should all be supported as usual.
 Knowing that a user wants a temp table can enable us to further optimize 
 access to it. E.g., temp tables should be kept in memory where possible, and 
 compactions and merging of table files aren't required, ...
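One way to picture the session-scoped life cycle is a per-session registry that shadows the shared metastore and is dropped when the session closes. This is a hypothetical sketch, not Hive's implementation:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of session-scoped temp table visibility. */
public class TempTableRegistry {
    // Temp tables known only to this session: name -> data location.
    private final Map<String, String> sessionTables = new HashMap<>();

    void createTempTable(String name, String location) {
        sessionTables.put(name, location); // visible only to this session
    }

    /** Temp tables shadow permanent tables of the same name. */
    String resolve(String name, Map<String, String> metastore) {
        String tmp = sessionTables.get(name);
        return tmp != null ? tmp : metastore.get(name);
    }

    /** Called on session close: the whole temp-table life cycle ends here. */
    void closeSession() {
        sessionTables.clear();
    }

    public static void main(String[] args) {
        Map<String, String> metastore = new HashMap<>();
        metastore.put("t", "/warehouse/t");
        TempTableRegistry session1 = new TempTableRegistry();
        session1.createTempTable("t", "/tmp/session1/t");
        System.out.println(session1.resolve("t", metastore)); // prints /tmp/session1/t
        session1.closeSession();
        System.out.println(session1.resolve("t", metastore)); // prints /warehouse/t
    }
}
```

Because each session holds its own registry, two sessions using the same temp table name never observe each other's data, which is the isolation property discussed in the comments on this issue.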





[jira] [Created] (HIVE-7336) Create MapFunction

2014-07-02 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-7336:
-

 Summary: Create MapFunction
 Key: HIVE-7336
 URL: https://issues.apache.org/jira/browse/HIVE-7336
 Project: Hive
  Issue Type: Sub-task
Reporter: Xuefu Zhang


Wrap Hive's map-side data processing for Spark.





[jira] [Created] (HIVE-7338) Create SparkPlanGenerator

2014-07-02 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-7338:
-

 Summary: Create SparkPlanGenerator
 Key: HIVE-7338
 URL: https://issues.apache.org/jira/browse/HIVE-7338
 Project: Hive
  Issue Type: Sub-task
Reporter: Xuefu Zhang


Translate SparkWork into SparkPlan. The translation may be invoked by 
SparkClient when executing SparkTask.





[jira] [Created] (HIVE-7337) Create ReduceFunction

2014-07-02 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-7337:
-

 Summary: Create ReduceFunction
 Key: HIVE-7337
 URL: https://issues.apache.org/jira/browse/HIVE-7337
 Project: Hive
  Issue Type: Sub-task
Reporter: Xuefu Zhang


Wrap Hive's reduce-side data processing for Spark.





[jira] [Created] (HIVE-7335) Create SparkPlan, DAG representation of a Spark job

2014-07-02 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-7335:
-

 Summary: Create SparkPlan, DAG representation of a Spark job
 Key: HIVE-7335
 URL: https://issues.apache.org/jira/browse/HIVE-7335
 Project: Hive
  Issue Type: Sub-task
Reporter: Xuefu Zhang


Encapsulate RDD, MapFunction, ReduceFunction, and SparkShuffler in a graph 
representation.
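The graph representation mentioned here can be sketched as a tiny DAG of stages whose topological order is the order the engine would execute them in. All names and structure are hypothetical, not the actual SparkPlan design:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a SparkPlan-style DAG: nodes are stages, edges shuffles. */
public class PlanDagSketch {
    final Map<String, List<String>> edges = new LinkedHashMap<>();

    void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        edges.computeIfAbsent(to, k -> new ArrayList<>());
    }

    /** Topological order: the order stages would be handed to the engine. */
    List<String> topoOrder() {
        Map<String, Integer> indeg = new HashMap<>();
        edges.keySet().forEach(n -> indeg.put(n, 0));
        edges.values().forEach(ts -> ts.forEach(t -> indeg.merge(t, 1, Integer::sum)));
        Deque<String> ready = new ArrayDeque<>();
        indeg.forEach((n, d) -> { if (d == 0) ready.add(n); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String n = ready.poll();
            order.add(n);
            for (String t : edges.get(n)) {
                if (indeg.merge(t, -1, Integer::sum) == 0) ready.add(t);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        PlanDagSketch plan = new PlanDagSketch();
        plan.addEdge("MapFunction", "SparkShuffler");
        plan.addEdge("SparkShuffler", "ReduceFunction");
        System.out.println(plan.topoOrder()); // prints [MapFunction, SparkShuffler, ReduceFunction]
    }
}
```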





[jira] [Commented] (HIVE-6914) parquet-hive cannot write nested map (map value is map)

2014-07-02 Thread Adrian Lange (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050596#comment-14050596
 ] 

Adrian Lange commented on HIVE-6914:


I get the same error, no matter what file format I use for the source table. 
I've tried Textfile, Sequencefile, RCFile, and ORC.

 parquet-hive cannot write nested map (map value is map)
 ---

 Key: HIVE-6914
 URL: https://issues.apache.org/jira/browse/HIVE-6914
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.13.0
Reporter: Tongjie Chen

 // table schema (identical for both plain text version and parquet version)
 hive> desc text_mmap;
 m map<string,map<string,string>>
 // sample nested map entry
 {"level1":{"level2_key1":"value1","level2_key2":"value2"}}
 The following query will fail:
 insert overwrite table parquet_mmap select * from text_mmap;
 Caused by: parquet.io.ParquetEncodingException: This should be an 
 ArrayWritable or MapWritable: 
 org.apache.hadoop.hive.ql.io.parquet.writable.BinaryWritable@f2f8106
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:85)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeArray(DataWritableWriter.java:118)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:80)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:82)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:55)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
 at 
 parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:115)
 at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81)
 at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:77)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:90)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:622)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
 at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
 at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
 ... 9 more





[jira] [Commented] (HIVE-6914) parquet-hive cannot write nested map (map value is map)

2014-07-02 Thread Adrian Lange (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050611#comment-14050611
 ] 

Adrian Lange commented on HIVE-6914:


Could it be that [HIVE-7073] is the cause of this bug?

 parquet-hive cannot write nested map (map value is map)
 ---

 Key: HIVE-6914
 URL: https://issues.apache.org/jira/browse/HIVE-6914
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.13.0
Reporter: Tongjie Chen

 // table schema (identical for both plain text version and parquet version)
 hive> desc text_mmap;
 m map<string,map<string,string>>
 // sample nested map entry
 {"level1":{"level2_key1":"value1","level2_key2":"value2"}}
 The following query will fail:
 insert overwrite table parquet_mmap select * from text_mmap;
 Caused by: parquet.io.ParquetEncodingException: This should be an 
 ArrayWritable or MapWritable: 
 org.apache.hadoop.hive.ql.io.parquet.writable.BinaryWritable@f2f8106
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:85)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeArray(DataWritableWriter.java:118)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:80)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:82)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:55)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
 at 
 parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:115)
 at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81)
 at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:77)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:90)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:622)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
 at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
 at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
 ... 9 more





[jira] [Updated] (HIVE-7098) RecordUpdater should extend RecordWriter

2014-07-02 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-7098:
-

Status: Open  (was: Patch Available)

After looking more at this I'm not sure this is the best way forward.  I want 
to review the design further before committing this patch, so moving it from 
patch available to open.

 RecordUpdater should extend RecordWriter
 

 Key: HIVE-7098
 URL: https://issues.apache.org/jira/browse/HIVE-7098
 Project: Hive
  Issue Type: Sub-task
  Components: File Formats, Transactions
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7098.patch


 A new interface ql.io.RecordUpdater was added as part of the ACID work in 
 0.13.  This interface should extend RecordWriter because:
 # If it does not, significant portions of FileSinkOperator will have to be 
 reworked to handle both RecordWriter and RecordUpdater.
 # Once a file format accepts transactions, it should not generally be 
 possible to write using RecordWriter.write, as that would write old-style 
 records without transaction information.





[jira] [Commented] (HIVE-7144) GC pressure during ORC StringDictionary writes

2014-07-02 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050635#comment-14050635
 ] 

Gopal V commented on HIVE-7144:
---

Re-run tests with trunk.

 GC pressure during ORC StringDictionary writes 
 ---

 Key: HIVE-7144
 URL: https://issues.apache.org/jira/browse/HIVE-7144
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.14.0
 Environment: ORC Table ~ 12 string columns
Reporter: Gopal V
Assignee: Gopal V
  Labels: ORC, Performance
 Attachments: HIVE-7144.1.patch, HIVE-7144.2.patch, HIVE-7144.3.patch, 
 orc-string-write.png


 When ORC string dictionary writes data out, it suffers from bad GC 
 performance due to a few allocations in-loop.
 !orc-string-write.png!
 The conversions are as follows:
 StringTreeWriter::getStringValue() causes 2 conversions:
 LazyString -> Text (LazyString::getWritableObject)
 Text -> String (LazyStringObjectInspector::getPrimitiveJavaObject)
 Then StringRedBlackTree::add() does one conversion:
 String -> Text
 This causes some GC pressure with unnecessary String and byte[] array 
 allocations.





[jira] [Commented] (HIVE-7090) Support session-level temporary tables in Hive

2014-07-02 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050646#comment-14050646
 ] 

Jason Dere commented on HIVE-7090:
--

Are you referring to the ACID work that was done in HIVE-5317? My impression 
was that it would not be applicable to temp tables, because a temp table is 
only visible to the current session. Two different sessions writing to a temp 
table with the same name will each be updating their own separate version of 
the table, and will be unaffected by any updates from the other session.

 Support session-level temporary tables in Hive
 --

 Key: HIVE-7090
 URL: https://issues.apache.org/jira/browse/HIVE-7090
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: Gunther Hagleitner
Assignee: Jason Dere
 Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch, HIVE-7090.3.patch, 
 HIVE-7090.4.patch, HIVE-7090.5.patch, HIVE-7090.6.patch, HIVE-7090.7.patch


 It's common to see SQL scripts that create a temporary table as an 
 intermediate result, run some additional queries against it, and then clean up 
 at the end.
 We should support temporary tables properly, meaning automatically manage their 
 life cycle and make sure their visibility is restricted to the creating 
 connection/session. Without this, it's common to see leftover tables in the 
 metastore or weird errors from clashing tmp table names.
 Proposed syntax:
 CREATE TEMPORARY TABLE 
 CTAS, CTL, and INSERT INTO should all be supported as usual.
 Knowing that a user wants a temp table can enable us to further optimize 
 access to it. E.g., temp tables should be kept in memory where possible, and 
 compactions and merging of table files aren't required, ...





[jira] [Updated] (HIVE-7287) hive --rcfilecat command is broken on Windows

2014-07-02 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-7287:
-

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Deepesh!

 hive --rcfilecat command is broken on Windows
 -

 Key: HIVE-7287
 URL: https://issues.apache.org/jira/browse/HIVE-7287
 Project: Hive
  Issue Type: Bug
  Components: CLI, Windows
Affects Versions: 0.13.0
 Environment: Windows
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal
 Fix For: 0.14.0

 Attachments: HIVE-7287.1.patch


 {noformat}
 c:\ hive --rcfilecat --file-sizes --column-sizes-pretty /tmp/all100krc
 Not a valid JAR: C:\org.apache.hadoop.hive.cli.RCFileCat
 {noformat}
 NO PRECOMMIT TESTS





[jira] [Commented] (HIVE-5351) Secure-Socket-Layer (SSL) support for HiveServer2

2014-07-02 Thread Suhas Satish (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050663#comment-14050663
 ] 

Suhas Satish commented on HIVE-5351:


I have used the 3 properties above and started my HiveServer2, which is now 
using SSL. 

But how do I connect to it from the Beeline client? There doesn't seem to be 
any information about it. 

I am trying to use something like this - 

!connect 
jdbc:hive2://127.0.0.1:1/default;ssl=true;sslTrustStore=/opt/mapr/conf/ssl_truststore;

but when it prompts for username and password, it fails to connect even after I 
enter the correct ssl_truststore password.

Enter username for 
jdbc:hive2://10.10.30.181:1/default;ssl=true;sslTrustStore=/opt/mapr/conf/ssl_truststore;sslTrustStorePassword=mapr123:
 mapr
Enter password for 
jdbc:hive2://10.10.30.181:1/default;ssl=true;sslTrustStore=/opt/mapr/conf/ssl_truststore;sslTrustStorePassword=mapr123:
 
Error: Invalid URL: 
jdbc:hive2://10.10.30.181:1/default;ssl=true;sslTrustStore=/opt/mapr/conf/ssl_truststore;sslTrustStorePassword=mapr123
 (state=08S01,code=0)

Is my JDBC connect string the right way to connect?



 Secure-Socket-Layer (SSL) support for HiveServer2
 -

 Key: HIVE-5351
 URL: https://issues.apache.org/jira/browse/HIVE-5351
 Project: Hive
  Issue Type: Improvement
  Components: Authorization, HiveServer2, JDBC
Affects Versions: 0.11.0, 0.12.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.13.0

 Attachments: HIVE-5301.test-binary-files.tar, HIVE-5351.3.patch, 
 HIVE-5351.5.patch


 HiveServer2 and JDBC driver should support encrypted communication using SSL





[jira] [Commented] (HIVE-7287) hive --rcfilecat command is broken on Windows

2014-07-02 Thread Deepesh Khandelwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050684#comment-14050684
 ] 

Deepesh Khandelwal commented on HIVE-7287:
--

Thanks Jason for the review and commit!

 hive --rcfilecat command is broken on Windows
 -

 Key: HIVE-7287
 URL: https://issues.apache.org/jira/browse/HIVE-7287
 Project: Hive
  Issue Type: Bug
  Components: CLI, Windows
Affects Versions: 0.13.0
 Environment: Windows
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal
 Fix For: 0.14.0

 Attachments: HIVE-7287.1.patch


 {noformat}
 c:\ hive --rcfilecat --file-sizes --column-sizes-pretty /tmp/all100krc
 Not a valid JAR: C:\org.apache.hadoop.hive.cli.RCFileCat
 {noformat}
 NO PRECOMMIT TESTS





[jira] [Updated] (HIVE-5789) WebHCat E2E tests do not launch on Windows

2014-07-02 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-5789:


   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Patch committed to trunk. 
Thanks for the patch [~deepesh]. Thanks for the review [~ekoifman]

 WebHCat E2E tests do not launch on Windows
 --

 Key: HIVE-5789
 URL: https://issues.apache.org/jira/browse/HIVE-5789
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Affects Versions: 0.12.0
 Environment: Windows
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal
 Fix For: 0.14.0

 Attachments: HIVE-5789.patch


 There are some assumptions in the build.xml invoking the perl script for 
 running tests that makes them unsuitable for non-UNIX environments.
 NO PRECOMMIT TESTS





[jira] [Commented] (HIVE-5789) WebHCat E2E tests do not launch on Windows

2014-07-02 Thread Deepesh Khandelwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050689#comment-14050689
 ] 

Deepesh Khandelwal commented on HIVE-5789:
--

Thanks Eugene and Thejas!

 WebHCat E2E tests do not launch on Windows
 --

 Key: HIVE-5789
 URL: https://issues.apache.org/jira/browse/HIVE-5789
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Affects Versions: 0.12.0
 Environment: Windows
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal
 Fix For: 0.14.0

 Attachments: HIVE-5789.patch


 There are some assumptions in the build.xml invoking the perl script for 
 running tests that makes them unsuitable for non-UNIX environments.
 NO PRECOMMIT TESTS





[jira] [Commented] (HIVE-6468) HS2 out of memory error when curl sends a get request

2014-07-02 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050710#comment-14050710
 ] 

Ravi Prakash commented on HIVE-6468:


Thanks Navis!
2. IMHO we should not catch RuntimeExceptions. There's a danger that we might 
end up masking the real exception (the OOM). Otherwise, what's the point of 
having an exception hierarchy?
3. Good point. Thanks
4. Same as 2. getAuthTransFactory() could throw all the necessary exceptions
{code}public TTransportFactory getAuthTransFactory() throws 
TTransportException, AuthenticationException, LoginException {code}

I'm afraid I don't know enough about writing Thrift servers to review that 
code. 
Thanks for the pointer. I'm happy to add the timeout there.

 HS2 out of memory error when curl sends a get request
 -

 Key: HIVE-6468
 URL: https://issues.apache.org/jira/browse/HIVE-6468
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
 Environment: Centos 6.3, hive 12, hadoop-2.2
Reporter: Abin Shahab
Assignee: Navis
 Attachments: HIVE-6468.1.patch.txt, HIVE-6468.2.patch.txt


 We see an out of memory error when we run simple beeline calls.
 (The hive.server2.transport.mode is binary)
 curl localhost:1
 Exception in thread "pool-2-thread-8" java.lang.OutOfMemoryError: Java heap 
 space
   at 
 org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:181)
   at 
 org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
   at 
 org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
   at 
 org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
   at 
 org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
   at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
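A plausible mechanism for this stack trace (an illustration, not code taken from Hive or Thrift): a SASL-framed transport reads one status byte followed by a 4-byte big-endian payload length, so when curl sends plain HTTP, the bytes of "GET /" are misread as a status plus a roughly 1.1 GB length, and allocating a buffer of that size blows the heap:

```java
/**
 * Sketch of how a plain-HTTP request can trigger a huge allocation in a
 * length-prefixed (SASL-framed) transport. The first byte is taken as a status
 * code and the next four bytes as a big-endian payload length.
 */
public class SaslOomSketch {
    /** Big-endian int from 4 bytes, as a length-prefixed transport would read it. */
    static int readLength(byte[] frame, int offset) {
        return ((frame[offset] & 0xFF) << 24)
             | ((frame[offset + 1] & 0xFF) << 16)
             | ((frame[offset + 2] & 0xFF) << 8)
             |  (frame[offset + 3] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] httpRequest =
            "GET / HTTP/1.1\r\n".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        int status = httpRequest[0];             // 'G' misread as the SASL status byte
        int length = readLength(httpRequest, 1); // "ET /" misread as the payload length
        System.out.println(status + " " + length); // prints 71 1163141167
    }
}
```

Sanity-checking the decoded length against a configured maximum before allocating would turn this into a clean connection error instead of an OutOfMemoryError.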





[jira] [Updated] (HIVE-7029) Vectorize ReduceWork

2014-07-02 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7029:
---

Status: In Progress  (was: Patch Available)

Fix minor q file diffs.

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, 
 HIVE-7029.4.patch


 This will enable vectorization team to independently work on vectorization on 
 reduce side even before vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)





[jira] [Updated] (HIVE-7029) Vectorize ReduceWork

2014-07-02 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7029:
---

Status: Patch Available  (was: In Progress)

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, 
 HIVE-7029.4.patch


 This will enable vectorization team to independently work on vectorization on 
 reduce side even before vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)





[jira] [Updated] (HIVE-7029) Vectorize ReduceWork

2014-07-02 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-7029:
---

Attachment: HIVE-7029.4.patch

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, 
 HIVE-7029.4.patch


 This will enable vectorization team to independently work on vectorization on 
 reduce side even before vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)





[jira] [Commented] (HIVE-7294) sql std auth - authorize show grant statements

2014-07-02 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050720#comment-14050720
 ] 

Ashutosh Chauhan commented on HIVE-7294:


+1

 sql std auth - authorize show grant statements
 --

 Key: HIVE-7294
 URL: https://issues.apache.org/jira/browse/HIVE-7294
 Project: Hive
  Issue Type: Bug
  Components: Authorization, SQLStandardAuthorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-7294.1.patch, HIVE-7294.2.patch


 A non-admin user should be allowed to run show grant commands only for 
 themselves or a role they belong to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7339) hive --orcfiledump command is not supported on Windows

2014-07-02 Thread Deepesh Khandelwal (JIRA)
Deepesh Khandelwal created HIVE-7339:


 Summary: hive --orcfiledump command is not supported on Windows
 Key: HIVE-7339
 URL: https://issues.apache.org/jira/browse/HIVE-7339
 Project: Hive
  Issue Type: Bug
  Components: CLI, Windows
Affects Versions: 0.13.0
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal


On Linux orcfiledump utility can be run using
{noformat}
hive --orcfiledump path_to_orc_file
hive --service orcfiledump path_to_orc_file
{noformat}
The Hive CLI utility on Windows doesn't support this option.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7090) Support session-level temporary tables in Hive

2014-07-02 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050726#comment-14050726
 ] 

Selina Zhang commented on HIVE-7090:


I mean, if we want to support ROLLBACK/COMMIT in the next release, how do we 
roll back the changes for a temp table? It seems the server does not have a 
clue where the data location is. 

 Support session-level temporary tables in Hive
 --

 Key: HIVE-7090
 URL: https://issues.apache.org/jira/browse/HIVE-7090
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: Gunther Hagleitner
Assignee: Jason Dere
 Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch, HIVE-7090.3.patch, 
 HIVE-7090.4.patch, HIVE-7090.5.patch, HIVE-7090.6.patch, HIVE-7090.7.patch


 It's common to see sql scripts that create some temporary table as an 
 intermediate result, run some additional queries against it and then clean up 
 at the end.
 We should support temporary tables properly, meaning automatically manage the 
 life cycle and make sure the visibility is restricted to the creating 
 connection/session. Without these it's common to see left over tables in 
 meta-store or weird errors with clashing tmp table names.
 Proposed syntax:
 CREATE TEMPORARY TABLE 
 CTAS, CTL, INSERT INTO, should all be supported as usual.
 Knowing that a user wants a temp table can enable us to further optimize 
 access to it. E.g.: temp tables should be kept in memory where possible, 
 compactions and merging table files aren't required, ...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7339) hive --orcfiledump command is not supported on Windows

2014-07-02 Thread Deepesh Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepesh Khandelwal updated HIVE-7339:
-

Attachment: HIVE-7339.1.patch

Attaching a patch that provides support on Windows.

 hive --orcfiledump command is not supported on Windows
 --

 Key: HIVE-7339
 URL: https://issues.apache.org/jira/browse/HIVE-7339
 Project: Hive
  Issue Type: Bug
  Components: CLI, Windows
Affects Versions: 0.13.0
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal
 Attachments: HIVE-7339.1.patch


 On Linux orcfiledump utility can be run using
 {noformat}
 hive --orcfiledump path_to_orc_file
 hive --service orcfiledump path_to_orc_file
 {noformat}
 The Hive CLI utility on Windows doesn't support this option.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7339) hive --orcfiledump command is not supported on Windows

2014-07-02 Thread Deepesh Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepesh Khandelwal updated HIVE-7339:
-

Description: 
On Linux orcfiledump utility can be run using
{noformat}
hive --orcfiledump path_to_orc_file
hive --service orcfiledump path_to_orc_file
{noformat}
The Hive CLI utility on Windows doesn't support this option.

NO PRECOMMIT TESTS

  was:
On Linux orcfiledump utility can be run using
{noformat}
hive --orcfiledump path_to_orc_file
hive --service orcfiledump path_to_orc_file
{noformat}
The Hive CLI utility on Windows doesn't support this option.


 hive --orcfiledump command is not supported on Windows
 --

 Key: HIVE-7339
 URL: https://issues.apache.org/jira/browse/HIVE-7339
 Project: Hive
  Issue Type: Bug
  Components: CLI, Windows
Affects Versions: 0.13.0
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal
 Attachments: HIVE-7339.1.patch


 On Linux orcfiledump utility can be run using
 {noformat}
 hive --orcfiledump path_to_orc_file
 hive --service orcfiledump path_to_orc_file
 {noformat}
 The Hive CLI utility on Windows doesn't support this option.
 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7339) hive --orcfiledump command is not supported on Windows

2014-07-02 Thread Deepesh Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepesh Khandelwal updated HIVE-7339:
-

Status: Patch Available  (was: Open)

 hive --orcfiledump command is not supported on Windows
 --

 Key: HIVE-7339
 URL: https://issues.apache.org/jira/browse/HIVE-7339
 Project: Hive
  Issue Type: Bug
  Components: CLI, Windows
Affects Versions: 0.13.0
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal
 Attachments: HIVE-7339.1.patch


 On Linux orcfiledump utility can be run using
 {noformat}
 hive --orcfiledump path_to_orc_file
 hive --service orcfiledump path_to_orc_file
 {noformat}
 The Hive CLI utility on Windows doesn't support this option.
 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7090) Support session-level temporary tables in Hive

2014-07-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050737#comment-14050737
 ] 

Hive QA commented on HIVE-7090:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12653646/HIVE-7090.7.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5673 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/659/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/659/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-659/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12653646

 Support session-level temporary tables in Hive
 --

 Key: HIVE-7090
 URL: https://issues.apache.org/jira/browse/HIVE-7090
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: Gunther Hagleitner
Assignee: Jason Dere
 Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch, HIVE-7090.3.patch, 
 HIVE-7090.4.patch, HIVE-7090.5.patch, HIVE-7090.6.patch, HIVE-7090.7.patch


 It's common to see sql scripts that create some temporary table as an 
 intermediate result, run some additional queries against it and then clean up 
 at the end.
 We should support temporary tables properly, meaning automatically manage the 
 life cycle and make sure the visibility is restricted to the creating 
 connection/session. Without these it's common to see left over tables in 
 meta-store or weird errors with clashing tmp table names.
 Proposed syntax:
 CREATE TEMPORARY TABLE 
 CTAS, CTL, INSERT INTO, should all be supported as usual.
 Knowing that a user wants a temp table can enable us to further optimize 
 access to it. E.g.: temp tables should be kept in memory where possible, 
 compactions and merging table files aren't required, ...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7090) Support session-level temporary tables in Hive

2014-07-02 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050744#comment-14050744
 ] 

Alan Gates commented on HIVE-7090:
--

Rolling back doesn't actually change any data.  What it does is mark a 
transaction id as aborted.  Then readers know to ignore records from that 
transaction id.  So consider the following scenario:

begin session
begin transaction 1 
write to temp table
commit
begin transaction 2
write more to temp table
rollback
read temp table

The read will know to disregard all records marked with transaction id 2 (this 
holds whether the table is temporary or not) and thus will only return records 
from the first write.
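The visibility rule in this scenario can be sketched with a toy model. The Row type and the aborted-id set below are illustrative stand-ins for this explanation, not Hive's actual ACID classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RollbackVisibility {

    // A record carries the id of the transaction that wrote it.
    static class Row {
        final long txnId;
        final String value;
        Row(long txnId, String value) { this.txnId = txnId; this.value = value; }
    }

    // A reader skips rows written by any transaction marked aborted.
    static List<String> read(List<Row> table, Set<Long> abortedTxnIds) {
        List<String> visible = new ArrayList<>();
        for (Row row : table) {
            if (!abortedTxnIds.contains(row.txnId)) {
                visible.add(row.value);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Row> tempTable = new ArrayList<>();
        Set<Long> aborted = new HashSet<>();

        // transaction 1: write to temp table, commit
        tempTable.add(new Row(1, "first"));
        // transaction 2: write more, then roll back -- nothing is deleted;
        // the transaction id is simply marked aborted
        tempTable.add(new Row(2, "second"));
        aborted.add(2L);

        // the read disregards all records tagged with transaction id 2
        System.out.println(read(tempTable, aborted)); // [first]
    }
}
```

Note that rollback here is a metadata operation only: the "second" row still sits in the table's data, but no reader will ever return it.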


 Support session-level temporary tables in Hive
 --

 Key: HIVE-7090
 URL: https://issues.apache.org/jira/browse/HIVE-7090
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: Gunther Hagleitner
Assignee: Jason Dere
 Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch, HIVE-7090.3.patch, 
 HIVE-7090.4.patch, HIVE-7090.5.patch, HIVE-7090.6.patch, HIVE-7090.7.patch


 It's common to see sql scripts that create some temporary table as an 
 intermediate result, run some additional queries against it and then clean up 
 at the end.
 We should support temporary tables properly, meaning automatically manage the 
 life cycle and make sure the visibility is restricted to the creating 
 connection/session. Without these it's common to see left over tables in 
 meta-store or weird errors with clashing tmp table names.
 Proposed syntax:
 CREATE TEMPORARY TABLE 
 CTAS, CTL, INSERT INTO, should all be supported as usual.
 Knowing that a user wants a temp table can enable us to further optimize 
 access to it. E.g.: temp tables should be kept in memory where possible, 
 compactions and merging table files aren't required, ...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7231) Improve ORC padding

2014-07-02 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-7231:
--

Attachment: HIVE-7231.7.patch

Rebased to trunk and updated the size-correction code to reset all 
corrections to stripe sizes made when crossing a block boundary, even for the 
unpadded insert case.

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch, HIVE-7231.3.patch, 
 HIVE-7231.4.patch, HIVE-7231.5.patch, HIVE-7231.6.patch, HIVE-7231.7.patch


 Current ORC padding is not optimal because of the fixed stripe sizes within 
 a block. The padding overhead can be significant in some cases. Also, the 
 padding percentage relative to stripe size is not configurable.
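The overhead of fixed-size stripes can be quantified with a small sketch. The 256MB/200MB sizes below are illustrative choices for this calculation, not ORC defaults:

```java
public class OrcPaddingOverhead {

    // Fraction of each HDFS block lost to padding when fixed-size stripes
    // are padded out so they never straddle a block boundary.
    static double paddingFraction(long blockSize, long stripeSize) {
        long stripesPerBlock = blockSize / stripeSize;
        long paddedBytes = blockSize - stripesPerBlock * stripeSize;
        return (double) paddedBytes / blockSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // With 256MB blocks and 200MB stripes, only one stripe fits per
        // block, so 56MB of every block (21.875%) is padding.
        System.out.println(paddingFraction(256 * mb, 200 * mb));
    }
}
```

When the stripe size evenly divides the block size the fraction is zero, which is why making padding behavior configurable relative to stripe size matters.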



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7340) Beeline fails to read a query with comments correctly.

2014-07-02 Thread Ashish Kumar Singh (JIRA)
Ashish Kumar Singh created HIVE-7340:


 Summary: Beeline fails to read a query with comments correctly. 
 Key: HIVE-7340
 URL: https://issues.apache.org/jira/browse/HIVE-7340
 Project: Hive
  Issue Type: Bug
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh


A comment at the beginning of a line works:
0: jdbc:hive2://localhost:1 select 
. . . . . . . . . . . . . . . . -- comment
. . . . . . . . . . . . . . . . * from store
. . . . . . . . . . . . . . . . limit 1;

but a comment that is not at the beginning of a line causes the rest of the 
query to be ignored. So, limit 1 is ignored here.
0: jdbc:hive2://localhost:1 select 
. . . . . . . . . . . . . . . . * from store -- comment
. . . . . . . . . . . . . . . . limit 1;

However, this is fine with Hive CLI.
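The behavior is consistent with comment handling that treats the buffered multi-line query as one string, so a trailing -- swallows everything that follows. A per-physical-line strip avoids that; the sketch below is illustrative, not Beeline's actual code, and it ignores the complication of -- appearing inside string literals:

```java
public class CommentStripper {

    // Strip a trailing "--" line comment from each physical line before
    // the lines are joined into one query string, so text on later lines
    // (like "limit 1") survives.
    static String stripComments(String[] lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            int i = line.indexOf("--");
            sb.append(i >= 0 ? line.substring(0, i) : line).append(' ');
        }
        // collapse whitespace left behind by removed comments
        return sb.toString().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String[] query = {"select", "* from store -- comment", "limit 1;"};
        System.out.println(stripComments(query)); // select * from store limit 1;
    }
}
```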



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6694) Beeline should provide a way to execute shell command as Hive CLI does

2014-07-02 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-6694:
--

Attachment: HIVE-6694.5.patch

Patch #5 incorporated comments from both Brock and Navis above.

 Beeline should provide a way to execute shell command as Hive CLI does
 --

 Key: HIVE-6694
 URL: https://issues.apache.org/jira/browse/HIVE-6694
 Project: Hive
  Issue Type: Improvement
  Components: CLI, Clients
Affects Versions: 0.11.0, 0.12.0, 0.13.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.14.0

 Attachments: HIVE-6694.1.patch, HIVE-6694.1.patch, HIVE-6694.2.patch, 
 HIVE-6694.3.patch, HIVE-6694.4.patch, HIVE-6694.5.patch, HIVE-6694.patch


 Hive CLI allows a user to execute a shell command using ! notation. For 
 instance, !cat myfile.txt. Being able to execute shell command may be 
 important for some users. As a replacement, however, Beeline provides no such 
 capability, possibly because ! notation is reserved for SQLLine commands. 
 It's possible to provide this using a slight syntactic variation such as 
 !sh cat myfile.txt.
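Dispatching such a prefix is straightforward; the sketch below is illustrative only (Beeline's real command dispatch goes through its SQLLine command table, and the method name here is hypothetical):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ShellCommandSketch {

    // If the input starts with "!sh ", run the remainder through the
    // system shell and return its stdout; otherwise return null so the
    // caller can treat the input as SQL.
    static String runIfShell(String input) throws Exception {
        if (!input.startsWith("!sh ")) return null;
        String cmd = input.substring(4);
        Process p = new ProcessBuilder("sh", "-c", cmd).start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) out.append(line).append('\n');
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(runIfShell("!sh echo hello")); // hello
    }
}
```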



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7289) revert HIVE-6469

2014-07-02 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050816#comment-14050816
 ] 

Xuefu Zhang commented on HIVE-7289:
---

Patch committed to trunk. Thanks Jayesh.

 revert HIVE-6469
 

 Key: HIVE-7289
 URL: https://issues.apache.org/jira/browse/HIVE-7289
 Project: Hive
  Issue Type: Task
  Components: CLI
Affects Versions: 0.14.0
Reporter: Jayesh
Assignee: Jayesh
 Attachments: HIVE-7289.patch


 this task is to revert HIVE-6469



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7144) GC pressure during ORC StringDictionary writes

2014-07-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050814#comment-14050814
 ] 

Hive QA commented on HIVE-7144:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12653660/HIVE-7144.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5672 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/660/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/660/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-660/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12653660

 GC pressure during ORC StringDictionary writes 
 ---

 Key: HIVE-7144
 URL: https://issues.apache.org/jira/browse/HIVE-7144
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.14.0
 Environment: ORC Table ~ 12 string columns
Reporter: Gopal V
Assignee: Gopal V
  Labels: ORC, Performance
 Attachments: HIVE-7144.1.patch, HIVE-7144.2.patch, HIVE-7144.3.patch, 
 orc-string-write.png


 When ORC string dictionary writes data out, it suffers from bad GC 
 performance due to a few allocations in-loop.
 !orc-string-write.png!
 The conversions are as follows
 StringTreeWriter::getStringValue() causes 2 conversions
 LazyString - Text (LazyString::getWritableObject)
 Text - String (LazyStringObjectInspector::getPrimitiveJavaObject)
 Then StringRedBlackTree::add() does one conversion
 String - Text
 This causes some GC pressure with un-necessary String and byte[] array 
 allocations.
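The per-row cost is easiest to see outside Hadoop: the first method below mirrors the bytes-to-String-to-bytes round trip (two allocations per value), while the second copies into a reusable buffer (none). The method names are illustrative, not ORC's:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DictionaryWriteSketch {

    // Wasteful path: bytes -> String -> bytes allocates a String (plus its
    // backing array) and a fresh byte[] for every value written.
    static byte[] viaString(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8); // allocation 1
        return s.getBytes(StandardCharsets.UTF_8);           // allocation 2
    }

    // Reuse path: copy into a caller-owned buffer; nothing is allocated
    // per value, which is the shape of a fix for in-loop GC pressure.
    static int intoBuffer(byte[] utf8, byte[] reusable) {
        System.arraycopy(utf8, 0, reusable, 0, utf8.length);
        return utf8.length; // number of valid bytes in the buffer
    }

    public static void main(String[] args) {
        byte[] value = "skewed-key".getBytes(StandardCharsets.UTF_8);
        byte[] buffer = new byte[64]; // sized once, reused across the loop
        int n = intoBuffer(value, buffer);
        System.out.println(Arrays.equals(viaString(value),
                Arrays.copyOf(buffer, n))); // true
    }
}
```

Both paths produce the same bytes; only the reuse path avoids the per-row String and byte[] garbage the profile points at.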



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7289) revert HIVE-6469

2014-07-02 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7289:
--

 Tags: TODOC14
   Resolution: Fixed
Fix Version/s: 0.14.0
 Release Note: Remove documentation, if any, for HIVE-6469, as its change 
has been reverted.
   Status: Resolved  (was: Patch Available)

 revert HIVE-6469
 

 Key: HIVE-7289
 URL: https://issues.apache.org/jira/browse/HIVE-7289
 Project: Hive
  Issue Type: Task
  Components: CLI
Affects Versions: 0.14.0
Reporter: Jayesh
Assignee: Jayesh
 Fix For: 0.14.0

 Attachments: HIVE-7289.patch


 this task is to revert HIVE-6469



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7341) Support for Table replication across HCatalog instances

2014-07-02 Thread Mithun Radhakrishnan (JIRA)
Mithun Radhakrishnan created HIVE-7341:
--

 Summary: Support for Table replication across HCatalog instances
 Key: HIVE-7341
 URL: https://issues.apache.org/jira/browse/HIVE-7341
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Fix For: 0.14.0


The HCatClient currently doesn't provide very much support for replicating 
HCatTable definitions between 2 HCatalog Server (i.e. Hive metastore) 
instances. 

Systems similar to Apache Falcon might find the need to replicate partition 
data between 2 clusters, and keep the HCatalog metadata in sync between the 
two. This poses a couple of problems:

# The definition of the source table might change (in column schema, I/O 
formats, record-formats, serde-parameters, etc.) The system will need a way to 
diff 2 tables and update the target-metastore with the changes. E.g. 
{code}
targetTable.resolve( sourceTable, targetTable.diff(sourceTable) );
hcatClient.updateTableSchema(dbName, tableName, targetTable);
{code}
# The current {{HCatClient.addPartitions()}} API requires that the partition's 
schema be derived from the table's schema, thereby requiring that the 
table-schema be resolved *before* partitions with the new schema are added to 
the table. This is problematic, because it introduces race conditions when 2 
partitions with differing column-schemas (e.g. right after a schema change) are 
copied in parallel. This can be avoided if each HCatAddPartitionDesc kept track 
of the partition's schema, in flight.
# The source and target metastores might be running different/incompatible 
versions of Hive. 

The impending patch attempts to address these concerns (with some caveats).

# {{HCatTable}} now has 
## a {{diff()}} method, to compare against another HCatTable instance
## a {{resolve(diff)}} method to copy over specified table-attributes from 
another HCatTable
## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and 
{{HCatClient.deserializeTable()}}), so that HCatTable instances constructed in 
other class-loaders may be used for comparison
# {{HCatPartition}} now provides finer-grained control over a Partition's 
column-schema, StorageDescriptor settings, etc. This allows partitions to be 
copied completely from source, with the ability to override specific properties 
if required (e.g. location).
# {{HCatClient.updateTableSchema()}} can now update the entire 
table-definition, not just the column schema.
# I've cleaned up and removed most of the redundancy between the HCatTable, 
HCatCreateTableDesc and HCatCreateTableDesc.Builder. The prior API failed to 
separate the table-attributes from the add-table-operation's attributes. By 
providing fluent-interfaces in HCatTable, and composing an HCatTable instance 
in HCatCreateTableDesc, the interfaces are cleaner(ish). The old setters are 
deprecated, in favour of those in HCatTable. Likewise, HCatPartition and 
HCatAddPartitionDesc.

I'll post a patch for trunk shortly.
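The diff/resolve pattern in point 1 can be sketched with a toy table type. HCatTable's actual API differs from this, so treat the class and field names below as illustrative only:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TableDiffSketch {

    // Toy stand-in for a table definition: just named attributes.
    static class TableDef {
        final Map<String, String> attrs = new HashMap<>();

        // Which attributes differ between this (target) and the source?
        Set<String> diff(TableDef source) {
            Set<String> changed = new HashSet<>();
            for (Map.Entry<String, String> e : source.attrs.entrySet()) {
                if (!e.getValue().equals(attrs.get(e.getKey()))) {
                    changed.add(e.getKey());
                }
            }
            return changed;
        }

        // Copy only the differing attributes over from the source.
        TableDef resolve(TableDef source, Set<String> changed) {
            for (String k : changed) attrs.put(k, source.attrs.get(k));
            return this;
        }
    }

    public static void main(String[] args) {
        TableDef source = new TableDef();
        source.attrs.put("serde", "orc");
        source.attrs.put("location", "/source/path");

        TableDef target = new TableDef();
        target.attrs.put("serde", "text");
        target.attrs.put("location", "/target/path");

        // Sync the serde, but keep the target's own location -- a real
        // replicator would similarly exclude attributes that must stay local.
        Set<String> changed = target.diff(source);
        changed.remove("location");
        target.resolve(source, changed);
        System.out.println(target.attrs.get("serde")); // orc
    }
}
```

The point of the split between diff() and resolve() is exactly the one the description makes: the replicator decides which differences to apply, rather than blindly overwriting the target definition.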



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7341) Support for Table replication across HCatalog instances

2014-07-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7341:
---

Attachment: HIVE-7341.1.patch

The tentative first version of the fix.

 Support for Table replication across HCatalog instances
 ---

 Key: HIVE-7341
 URL: https://issues.apache.org/jira/browse/HIVE-7341
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Fix For: 0.14.0

 Attachments: HIVE-7341.1.patch


 The HCatClient currently doesn't provide very much support for replicating 
 HCatTable definitions between 2 HCatalog Server (i.e. Hive metastore) 
 instances. 
 Systems similar to Apache Falcon might find the need to replicate partition 
 data between 2 clusters, and keep the HCatalog metadata in sync between the 
 two. This poses a couple of problems:
 # The definition of the source table might change (in column schema, I/O 
 formats, record-formats, serde-parameters, etc.) The system will need a way 
 to diff 2 tables and update the target-metastore with the changes. E.g. 
 {code}
 targetTable.resolve( sourceTable, targetTable.diff(sourceTable) );
 hcatClient.updateTableSchema(dbName, tableName, targetTable);
 {code}
 # The current {HCatClient.addPartitions()} API requires that the partition's 
 schema be derived from the table's schema, thereby requiring that the 
 table-schema be resolved *before* partitions with the new schema are added to 
 the table. This is problematic, because it introduces race conditions when 2 
 partitions with differing column-schemas (e.g. right after a schema change) 
 are copied in parallel. This can be avoided if each HCatAddPartitionDesc kept 
 track of the partition's schema, in flight.
 # The source and target metastores might be running different/incompatible 
 versions of Hive. 
 The impending patch attempts to address these concerns (with some caveats).
 # {{HCatTable}} now has 
 ## a {{diff()}} method, to compare against another HCatTable instance
 ## a {{resolve(diff)}} method to copy over specified table-attributes from 
 another HCatTable
 ## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and 
 {{HCatClient.deserializeTable()}}), so that HCatTable instances constructed 
 in other class-loaders may be used for comparison
 # {{HCatPartition}} now provides finer-grained control over a Partition's 
 column-schema, StorageDescriptor settings, etc. This allows partitions to be 
 copied completely from source, with the ability to override specific 
 properties if required (e.g. location).
 # {{HCatClient.updateTableSchema()}} can now update the entire 
 table-definition, not just the column schema.
 # I've cleaned up and removed most of the redundancy between the HCatTable, 
 HCatCreateTableDesc and HCatCreateTableDesc.Builder. The prior API failed to 
 separate the table-attributes from the add-table-operation's attributes. By 
 providing fluent-interfaces in HCatTable, and composing an HCatTable instance 
 in HCatCreateTableDesc, the interfaces are cleaner(ish). The old setters are 
 deprecated, in favour of those in HCatTable. Likewise, HCatPartition and 
 HCatAddPartitionDesc.
 I'll post a patch for trunk shortly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7341) Support for Table replication across HCatalog instances

2014-07-02 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7341:
---

Description: 
The HCatClient currently doesn't provide very much support for replicating 
HCatTable definitions between 2 HCatalog Server (i.e. Hive metastore) 
instances. 

Systems similar to Apache Falcon might find the need to replicate partition 
data between 2 clusters, and keep the HCatalog metadata in sync between the 
two. This poses a couple of problems:

# The definition of the source table might change (in column schema, I/O 
formats, record-formats, serde-parameters, etc.) The system will need a way to 
diff 2 tables and update the target-metastore with the changes. E.g. 
{code}
targetTable.resolve( sourceTable, targetTable.diff(sourceTable) );
hcatClient.updateTableSchema(dbName, tableName, targetTable);
{code}
# The current {{HCatClient.addPartitions()}} API requires that the partition's 
schema be derived from the table's schema, thereby requiring that the 
table-schema be resolved *before* partitions with the new schema are added to 
the table. This is problematic, because it introduces race conditions when 2 
partitions with differing column-schemas (e.g. right after a schema change) are 
copied in parallel. This can be avoided if each HCatAddPartitionDesc kept track 
of the partition's schema, in flight.
# The source and target metastores might be running different/incompatible 
versions of Hive. 

The impending patch attempts to address these concerns (with some caveats).

# {{HCatTable}} now has 
## a {{diff()}} method, to compare against another HCatTable instance
## a {{resolve(diff)}} method to copy over specified table-attributes from 
another HCatTable
## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and 
{{HCatClient.deserializeTable()}}), so that HCatTable instances constructed in 
other class-loaders may be used for comparison
# {{HCatPartition}} now provides finer-grained control over a Partition's 
column-schema, StorageDescriptor settings, etc. This allows partitions to be 
copied completely from source, with the ability to override specific properties 
if required (e.g. location).
# {{HCatClient.updateTableSchema()}} can now update the entire 
table-definition, not just the column schema.
# I've cleaned up and removed most of the redundancy between the HCatTable, 
HCatCreateTableDesc and HCatCreateTableDesc.Builder. The prior API failed to 
separate the table-attributes from the add-table-operation's attributes. By 
providing fluent-interfaces in HCatTable, and composing an HCatTable instance 
in HCatCreateTableDesc, the interfaces are cleaner(ish). The old setters are 
deprecated, in favour of those in HCatTable. Likewise, HCatPartition and 
HCatAddPartitionDesc.

I'll post a patch for trunk shortly.

  was:
The HCatClient currently doesn't provide very much support for replicating 
HCatTable definitions between 2 HCatalog Server (i.e. Hive metastore) 
instances. 

Systems similar to Apache Falcon might find the need to replicate partition 
data between 2 clusters, and keep the HCatalog metadata in sync between the 
two. This poses a couple of problems:

# The definition of the source table might change (in column schema, I/O 
formats, record-formats, serde-parameters, etc.) The system will need a way to 
diff 2 tables and update the target-metastore with the changes. E.g. 
{code}
targetTable.resolve( sourceTable, targetTable.diff(sourceTable) );
hcatClient.updateTableSchema(dbName, tableName, targetTable);
{code}
# The current {HCatClient.addPartitions()} API requires that the partition's 
schema be derived from the table's schema, thereby requiring that the 
table-schema be resolved *before* partitions with the new schema are added to 
the table. This is problematic, because it introduces race conditions when 2 
partitions with differing column-schemas (e.g. right after a schema change) are 
copied in parallel. This can be avoided if each HCatAddPartitionDesc kept track 
of the partition's schema, in flight.
# The source and target metastores might be running different/incompatible 
versions of Hive. 

The impending patch attempts to address these concerns (with some caveats).

# {{HCatTable}} now has 
## a {{diff()}} method, to compare against another HCatTable instance
## a {{resolve(diff)}} method to copy over specified table-attributes from 
another HCatTable
## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and 
{{HCatClient.deserializeTable()}}), so that HCatTable instances constructed in 
other class-loaders may be used for comparison
# {{HCatPartition}} now provides finer-grained control over a Partition's 
column-schema, StorageDescriptor settings, etc. This allows partitions to be 
copied completely from source, with the ability to override specific properties 
if required (e.g. location).
# 

[jira] [Commented] (HIVE-5020) HCat reading null-key map entries causes NPE

2014-07-02 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050866#comment-14050866
 ] 

Daniel Dai commented on HIVE-5020:
--

I find the following comments in LazyMap.java: 
// LazyMap stores a map of Primitive LazyObjects to LazyObjects. Note that the
// keys of the map cannot contain null.

This could be the reason why, when I try to load a null map key from an RC 
file, I end up with an infinite loop. 

To be safe, it seems we should disallow null map keys. Even if we fix LazyMap, 
there could be other places where we made the same assumption.
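The NPE at the heart of this issue comes from TreeMap comparing keys to order them. A quick standalone check of which JDK map types tolerate a null key:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class NullMapKeys {

    // Does this Map implementation accept a null key?
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // TreeMap orders keys by comparing them, so a null key throws NPE.
        System.out.println(acceptsNullKey(new TreeMap<>()));       // false
        // HashMap allows a null key but does not preserve insertion order;
        // LinkedHashMap allows a null key and preserves insertion order.
        System.out.println(acceptsNullKey(new HashMap<>()));       // true
        System.out.println(acceptsNullKey(new LinkedHashMap<>())); // true
    }
}
```

This is why switching the map type (or filtering null keys before insertion) both make the crash go away; which is preferable is the policy question the discussion below wrestles with.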

 HCat reading null-key map entries causes NPE
 

 Key: HIVE-5020
 URL: https://issues.apache.org/jira/browse/HIVE-5020
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan

 Currently, if someone has a null key in a map, HCatInputFormat will terminate 
 with an NPE while trying to read it.
 {noformat}
 java.lang.NullPointerException
 at java.lang.String.compareTo(String.java:1167)
 at java.lang.String.compareTo(String.java:92)
 at java.util.TreeMap.put(TreeMap.java:545)
 at 
 org.apache.hcatalog.data.HCatRecordSerDe.serializeMap(HCatRecordSerDe.java:222)
 at 
 org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:198)
 at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
 at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
 at 
 org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
 {noformat}
 This is because we use a TreeMap to preserve order of elements in the map 
 when reading from the underlying storage/serde.
 This problem is easily fixed in a number of ways:
 a) Switch to HashMap, which allows null keys. That does not preserve order of 
 keys, which should not be important for map fields, but if we desire that, we 
 have a solution for that too - LinkedHashMap, which would both retain order 
 and allow us to insert null keys into the map.
 b) Ignore null keyed entries - check if the field we read is null, and if it 
 is, then ignore that item in the record altogether. This way, HCat is robust 
 in what it does - it does not terminate with an NPE, and it does not allow 
 null keys in maps that might be problematic to layers above us that are not 
 used to seeing nulls as keys in maps.
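Option (b) amounts to a small filtering pass while copying the map read from the serde. This is a hypothetical sketch, not the actual HCatRecordSerDe code, and dropNullKeys is an invented helper name:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NullKeyStripper {
    // Returns a copy of 'raw' with any null-keyed entries silently dropped,
    // preserving the order of the remaining entries.
    public static <K, V> Map<K, V> dropNullKeys(Map<K, V> raw) {
        Map<K, V> out = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : raw.entrySet()) {
            if (e.getKey() != null) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```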
 Why do I bring up the second fix? First, I bring it up because of the way we 
 discovered this bug. When reading from an RCFile, we do not notice this bug. 
 If the same query that produced the RCFile instead produces an Orcfile, and 
 we try reading from it, we see this problem.
 RCFile seems to be quietly stripping any null-key entries, whereas Orc 
 retains them. This is why we didn't notice this problem for a long while, and 
 suddenly now we do. Now, if we fix our code to allow nulls in map keys 
 through to layers above, we expose layers above to this change, which may 
 then cause them to break. (Technically, this is stretching the case, because 
 we already break now if they care.) More importantly, though, we have a case 
 now where the same data will be exposed differently depending on whether it 
 was stored as Orc or as RCFile. And as a layer that is supposed to make 
 storage invisible to the end user, HCat should attempt to provide some 
 consistency in how data behaves to the end user.
 Secondly, whether or not nulls should be supported as keys in maps seems to 
 be almost a religious view. Some people see it from the perspective of a 
 mapping, which lends itself to a "Sure, if we encounter a null, we map to 
 this other value" kind of view, whereas other people view it from a lookup-index 
 kind of view, which lends itself to a "A null key makes no sense - 
 what kind of lookup do you expect to perform?" kind of view. Both views have 
 their points, and it makes sense to see if we need to support it.
 That said...
 There is another important concern at hand here: nulls in map keys might be 
 due to bad data (corruption or a loading error), and by stripping them, we might 
 be silently hiding that from the user. So silent stripping is bad. This is 
 an important point that does steer me towards the former approach: passing 
 it on to layers above, and standardizing on an understanding that null keys in 
 maps are acceptable data that layers above us have to handle. After that, it 
 could be taken on as a further consistency fix to change RCFile so that it 
 allows nulls in map keys.
 Having gone through this discussion of standardization, another important 
 question is whether or not there is actually a use case for null keys in maps 
 in data. If there isn't, maybe we shouldn't allow writing them in the first 
 place, and both Orc and RCFile must simply error out to the end user if they

[jira] [Commented] (HIVE-7282) HCatLoader fail to load Orc map with null key

2014-07-02 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050870#comment-14050870
 ] 

Daniel Dai commented on HIVE-7282:
--

When digging more, I feel that disallowing null map keys is more proper. Reasons are:
1. This solves the semantic difference between Orc and RCFile.
2. Allowing null map keys seems risky; it would break the assumptions of some 
other code, e.g. LazyMap.

 HCatLoader fail to load Orc map with null key
 -

 Key: HIVE-7282
 URL: https://issues.apache.org/jira/browse/HIVE-7282
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.14.0

 Attachments: HIVE-7282-1.patch, HIVE-7282-2.patch


 Here is the stack:
 Get exception:
 AttemptID:attempt_1403634189382_0011_m_00_0 Info:Error: 
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
 at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
 at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
 at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
 at 
 org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
 at 
 org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToPigMap(PigHCatUtil.java:469)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:404)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:456)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:374)
 at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)
 ... 13 more



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7282) HCatLoader fail to load Orc map with null key

2014-07-02 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050878#comment-14050878
 ] 

Eugene Koifman commented on HIVE-7282:
--

I agree that a null key in a map is a bad idea.  Since we still have to deal with 
data which has already been written with null keys, could we add some table 
property that lets the user say "if the data contains a map with a null key, 
replace null with 'my_value' on read"?  (Perhaps the same property could be used 
to change a null key to 'my_value' on write to support existing writers, but this 
of course won't work for all cases.)  This way null keys can be disallowed.
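The replace-on-read idea above can be sketched in a few lines. This is a hypothetical illustration of the proposed table property, not a patch; replaceNullKey and the substitute value are invented for the example:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NullKeyReplacer {
    // On read, substitute a user-configured value for a null map key so the
    // entry survives instead of triggering an NPE downstream. If the
    // substitute collides with a real key, the later entry wins here.
    public static Map<String, String> replaceNullKey(Map<String, String> raw,
                                                     String replacement) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            String key = (e.getKey() == null) ? replacement : e.getKey();
            out.put(key, e.getValue());
        }
        return out;
    }
}
```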

 HCatLoader fail to load Orc map with null key
 -

 Key: HIVE-7282
 URL: https://issues.apache.org/jira/browse/HIVE-7282
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.14.0

 Attachments: HIVE-7282-1.patch, HIVE-7282-2.patch


 Here is the stack:
 Get exception:
 AttemptID:attempt_1403634189382_0011_m_00_0 Info:Error: 
 org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
 converting read value to tuple
 at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
 at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:58)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
 at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
 at 
 org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
 at 
 org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToPigMap(PigHCatUtil.java:469)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.extractPigObject(PigHCatUtil.java:404)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:456)
 at 
 org.apache.hive.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:374)
 at org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:64)
 ... 13 more



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7090) Support session-level temporary tables in Hive

2014-07-02 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050903#comment-14050903
 ] 

Selina Zhang commented on HIVE-7090:


Thanks for the explanation! It will work. Somehow I got the wrong impression that 
ACID needs to store additional information along with the data location. 

 Support session-level temporary tables in Hive
 --

 Key: HIVE-7090
 URL: https://issues.apache.org/jira/browse/HIVE-7090
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: Gunther Hagleitner
Assignee: Jason Dere
 Attachments: HIVE-7090.1.patch, HIVE-7090.2.patch, HIVE-7090.3.patch, 
 HIVE-7090.4.patch, HIVE-7090.5.patch, HIVE-7090.6.patch, HIVE-7090.7.patch


 It's common to see SQL scripts that create some temporary table as an 
 intermediate result, run some additional queries against it, and then clean up 
 at the end.
 We should support temporary tables properly, meaning automatically manage the 
 life cycle and make sure the visibility is restricted to the creating 
 connection/session. Without these it's common to see left-over tables in the 
 metastore or weird errors with clashing tmp table names.
 Proposed syntax:
 CREATE TEMPORARY TABLE 
 CTAS, CTL, INSERT INTO should all be supported as usual.
 Knowing that a user wants a temp table can enable us to further optimize 
 access to it. E.g.: temp tables should be kept in memory where possible, 
 compactions and merging of table files aren't required, ...



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7342) support hiveserver2,metastore specific config files

2014-07-02 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-7342:
---

 Summary: support hiveserver2,metastore specific config files
 Key: HIVE-7342
 URL: https://issues.apache.org/jira/browse/HIVE-7342
 Project: Hive
  Issue Type: Bug
  Components: Configuration, HiveServer2, Metastore
Reporter: Thejas M Nair
Assignee: Thejas M Nair


There is currently a single configuration file for all components in Hive; i.e., 
components such as the Hive CLI, HiveServer2, and the metastore all read from the 
same hive-site.xml. 
It would be useful to have a server-specific hive-site.xml, so that you can have 
different configuration values set for a server. For example, you might 
want to enable authorization checks for HiveServer2 while disabling the 
checks for the Hive CLI. The workaround today is to add any component-specific 
configuration as a command-line (-hiveconf) argument.

Using server-specific config files (e.g. hiveserver2-site.xml, 
metastore-site.xml) that override the entries in hive-site.xml will make the 
configuration much easier to manage.
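The proposed lookup order is a simple overlay: server-specific entries win over the shared file. A minimal sketch of that semantics, shown with java.util.Properties rather than Hadoop's Configuration; the property names are only illustrative:

```java
import java.util.Properties;

public class LayeredConfig {
    // Entries from the component-specific file (e.g. hiveserver2-site.xml)
    // replace entries from the shared hive-site.xml; keys set only in the
    // shared file pass through unchanged.
    public static Properties overlay(Properties base, Properties override) {
        Properties merged = new Properties();
        merged.putAll(base);      // shared hive-site.xml values first
        merged.putAll(override);  // server-specific values take precedence
        return merged;
    }
}
```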




--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 23253: HIVE-7340: Beeline fails to read a query with comments correctly

2014-07-02 Thread Ashish Singh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23253/
---

Review request for hive.


Repository: hive-git


Description
---

HIVE-7340: Beeline fails to read a query with comments correctly


Diffs
-

  beeline/src/java/org/apache/hive/beeline/Commands.java 
88a94d76a3750dcde31ff47913bf28b827b3b212 
  
itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 
140c1bccedb9ef3c81e89026db44ce4b59150ef4 

Diff: https://reviews.apache.org/r/23253/diff/


Testing
---

Added unit tests.


Thanks,

Ashish Singh



[jira] [Assigned] (HIVE-7312) CBO throws ArrayIndexOutOfBounds

2014-07-02 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran reassigned HIVE-7312:


Assignee: Laljo John Pullokkaran

 CBO throws ArrayIndexOutOfBounds
 

 Key: HIVE-7312
 URL: https://issues.apache.org/jira/browse/HIVE-7312
 Project: Hive
  Issue Type: Sub-task
Reporter: Gunther Hagleitner
Assignee: Laljo John Pullokkaran

 Running tpcds query 17. Still confirming if col stats are available.
 When I turn CBO on (this is just the relevant snippet; the actual exception 
 is pages long):
 Caused by: java.lang.IndexOutOfBoundsException: Index: 24, Size: 0
   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
   at java.util.ArrayList.get(ArrayList.java:411)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.RelOptHiveTable.getColStat(RelOptHiveTable.java:97)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.reloperators.HiveTableScanRel.getColStat(HiveTableScanRel.java:73)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.stats.HiveRelMdDistinctRowCount.getDistinctRowCount(HiveRelMdDistinctRowCount.java:47)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.stats.HiveRelMdDistinctRowCount.getDistinctRowCount(HiveRelMdDistinctRowCount.java:36)
   ... 272 more



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7340) Beeline fails to read a query with comments correctly.

2014-07-02 Thread Ashish Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Kumar Singh updated HIVE-7340:
-

Attachment: HIVE-7340.patch

RB: https://reviews.apache.org/r/23253/

 Beeline fails to read a query with comments correctly. 
 ---

 Key: HIVE-7340
 URL: https://issues.apache.org/jira/browse/HIVE-7340
 Project: Hive
  Issue Type: Bug
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Attachments: HIVE-7340.patch


 A comment at the beginning of a line works:
 0: jdbc:hive2://localhost:1 select 
 . . . . . . . . . . . . . . . . -- comment
 . . . . . . . . . . . . . . . . * from store
 . . . . . . . . . . . . . . . . limit 1;
 but a comment that is not at the beginning of a line causes the rest of the 
 query to be ignored, so limit 1 is ignored here:
 0: jdbc:hive2://localhost:1 select 
 . . . . . . . . . . . . . . . . * from store -- comment
 . . . . . . . . . . . . . . . . limit 1;
 However, this works fine in the Hive CLI.
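A behavior like this is consistent with comment handling that discards everything after "--" to the end of the buffer rather than the end of the line. One plausible per-line fix is to strip the comment before lines are joined; this is an illustrative sketch, not the actual Beeline patch, and it deliberately ignores the complication of "--" appearing inside string literals:

```java
public class CommentStripper {
    // Remove a "--" line comment and everything after it on this line.
    // Applied per line (before lines are concatenated), text on subsequent
    // lines - such as "limit 1;" - still reaches the query.
    public static String stripLineComment(String line) {
        int i = line.indexOf("--");
        return (i >= 0) ? line.substring(0, i) : line;
    }
}
```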



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7029) Vectorize ReduceWork

2014-07-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050950#comment-14050950
 ] 

Hive QA commented on HIVE-7029:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12653693/HIVE-7029.4.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5656 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/662/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/662/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-662/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12653693

 Vectorize ReduceWork
 

 Key: HIVE-7029
 URL: https://issues.apache.org/jira/browse/HIVE-7029
 Project: Hive
  Issue Type: Sub-task
Reporter: Matt McCline
Assignee: Matt McCline
 Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, 
 HIVE-7029.4.patch


 This will enable vectorization team to independently work on vectorization on 
 reduce side even before vectorized shuffle is ready.
 NOTE: Tez only (i.e. TezTask only)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7312) CBO throws ArrayIndexOutOfBounds

2014-07-02 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7312:
-

Status: Patch Available  (was: Open)

 CBO throws ArrayIndexOutOfBounds
 

 Key: HIVE-7312
 URL: https://issues.apache.org/jira/browse/HIVE-7312
 Project: Hive
  Issue Type: Sub-task
Reporter: Gunther Hagleitner
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7312.patch


 Running tpcds query 17. Still confirming if col stats are available.
 When I turn CBO on (this is just the relevant snippet; the actual exception 
 is pages long):
 Caused by: java.lang.IndexOutOfBoundsException: Index: 24, Size: 0
   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
   at java.util.ArrayList.get(ArrayList.java:411)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.RelOptHiveTable.getColStat(RelOptHiveTable.java:97)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.reloperators.HiveTableScanRel.getColStat(HiveTableScanRel.java:73)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.stats.HiveRelMdDistinctRowCount.getDistinctRowCount(HiveRelMdDistinctRowCount.java:47)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.stats.HiveRelMdDistinctRowCount.getDistinctRowCount(HiveRelMdDistinctRowCount.java:36)
   ... 272 more



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7312) CBO throws ArrayIndexOutOfBounds

2014-07-02 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7312:
-

Attachment: HIVE-7312.patch

 CBO throws ArrayIndexOutOfBounds
 

 Key: HIVE-7312
 URL: https://issues.apache.org/jira/browse/HIVE-7312
 Project: Hive
  Issue Type: Sub-task
Reporter: Gunther Hagleitner
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7312.patch


 Running tpcds query 17. Still confirming if col stats are available.
 When I turn CBO on (this is just the relevant snippet; the actual exception 
 is pages long):
 Caused by: java.lang.IndexOutOfBoundsException: Index: 24, Size: 0
   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
   at java.util.ArrayList.get(ArrayList.java:411)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.RelOptHiveTable.getColStat(RelOptHiveTable.java:97)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.reloperators.HiveTableScanRel.getColStat(HiveTableScanRel.java:73)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.stats.HiveRelMdDistinctRowCount.getDistinctRowCount(HiveRelMdDistinctRowCount.java:47)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.stats.HiveRelMdDistinctRowCount.getDistinctRowCount(HiveRelMdDistinctRowCount.java:36)
   ... 272 more



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7280) CBO V1

2014-07-02 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7280:
-

Summary: CBO V1  (was: Commit CBO code from github repo to CBO branch)

 CBO V1
 --

 Key: HIVE-7280
 URL: https://issues.apache.org/jira/browse/HIVE-7280
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7280.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7343) Update committer list

2014-07-02 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-7343:


Description: NO PRECOMMIT TESTS

 Update committer list
 -

 Key: HIVE-7343
 URL: https://issues.apache.org/jira/browse/HIVE-7343
 Project: Hive
  Issue Type: Test
  Components: Documentation
Reporter: Szehon Ho
Assignee: Szehon Ho

 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7343) Update committer list

2014-07-02 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-7343:


Attachment: HIVE-7343.patch

 Update committer list
 -

 Key: HIVE-7343
 URL: https://issues.apache.org/jira/browse/HIVE-7343
 Project: Hive
  Issue Type: Test
  Components: Documentation
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-7343.patch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7343) Update committer list

2014-07-02 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050980#comment-14050980
 ] 

Szehon Ho commented on HIVE-7343:
-

Adding myself and [~gopalv], hope it is right.

 Update committer list
 -

 Key: HIVE-7343
 URL: https://issues.apache.org/jira/browse/HIVE-7343
 Project: Hive
  Issue Type: Test
  Components: Documentation
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-7343.patch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7343) Update committer list

2014-07-02 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-7343:


Status: Patch Available  (was: Open)

 Update committer list
 -

 Key: HIVE-7343
 URL: https://issues.apache.org/jira/browse/HIVE-7343
 Project: Hive
  Issue Type: Test
  Components: Documentation
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-7343.patch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7344) Add streaming support in Windowing mode for FirstVal, LastVal

2014-07-02 Thread Harish Butani (JIRA)
Harish Butani created HIVE-7344:
---

 Summary: Add streaming support in Windowing mode for FirstVal, 
LastVal
 Key: HIVE-7344
 URL: https://issues.apache.org/jira/browse/HIVE-7344
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Harish Butani
Assignee: Harish Butani


Continuation of HIVE-7062, HIVE-7143



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7344) Add streaming support in Windowing mode for FirstVal, LastVal

2014-07-02 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050987#comment-14050987
 ] 

Harish Butani commented on HIVE-7344:
-

[~ashutoshc] can you please review? This is done in a manner very similar to 
Min/Max.

 Add streaming support in Windowing mode for FirstVal, LastVal
 -

 Key: HIVE-7344
 URL: https://issues.apache.org/jira/browse/HIVE-7344
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7344.1.patch


 Continuation of HIVE-7062, HIVE-7143



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7344) Add streaming support in Windowing mode for FirstVal, LastVal

2014-07-02 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-7344:


Status: Patch Available  (was: Open)

 Add streaming support in Windowing mode for FirstVal, LastVal
 -

 Key: HIVE-7344
 URL: https://issues.apache.org/jira/browse/HIVE-7344
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7344.1.patch


 Continuation of HIVE-7062, HIVE-7143



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7344) Add streaming support in Windowing mode for FirstVal, LastVal

2014-07-02 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-7344:


Attachment: HIVE-7344.1.patch

 Add streaming support in Windowing mode for FirstVal, LastVal
 -

 Key: HIVE-7344
 URL: https://issues.apache.org/jira/browse/HIVE-7344
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7344.1.patch


 Continuation of HIVE-7062, HIVE-7143



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7275) optimize these functions for windowing function.

2014-07-02 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050988#comment-14050988
 ] 

Harish Butani commented on HIVE-7275:
-

Opened HIVE-7344 to handle FirstVal, LastVal streaming.

 optimize these functions for windowing function.
 

 Key: HIVE-7275
 URL: https://issues.apache.org/jira/browse/HIVE-7275
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.12.0, 0.13.0
 Environment: Hadoop 2.4.0, Hive 13.0
Reporter: Kiet Ly

 Please apply the window streaming optimization from HIVE-7143/HIVE-7062 to 
 these functions, if applicable:
 row_number 
 count 
 rank 
 dense_rank  
 nvl 
 rank 
 dense_rank  
 nvl  
 cast  
 decode  
 median  
 stddev  
 coalesce  
 floor  
 sign  
 abs  
 ltrim  
 substring  
 to_char 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7247) Fix itests using hadoop-1 profile

2014-07-02 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-7247:


   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks Brock for the review!

 Fix itests using hadoop-1 profile 
 --

 Key: HIVE-7247
 URL: https://issues.apache.org/jira/browse/HIVE-7247
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho
 Fix For: 0.14.0

 Attachments: HIVE-7247.patch


 Currently building itests using -Phadoop-1 profile results in following 
 failure:
 {noformat}
 $cd itests
 $mvn install -DskipTests -Phadoop-1
 ...
 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
 (default-testCompile) on project hive-it-unit: Compilation failure: 
 Compilation failure:
 [ERROR] 
 /Users/ghagleitner/Projects/hive-test-trunk/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/FolderPermissionBase.java:[31,39]
  cannot find symbol
 [ERROR] symbol : class AclStatus
 [ERROR] location: package org.apache.hadoop.fs.permission
 [ERROR] 
 /Users/ghagleitner/Projects/hive-test-trunk/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/TestExtendedAcls.java:[20,46]
  cannot find symbol
 [ERROR] symbol : class AclEntryScope
 [ERROR] location: package org.apache.hadoop.fs.permission
 [ERROR] 
 /Users/ghagleitner/Projects/hive-test-trunk/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/TestExtendedAcls.java:[20,1]
  static import only from classes and interfaces
 [ERROR] 
 /Users/ghagleitner/Projects/hive-test-trunk/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/TestExtendedAcls.java:[21,46]
  cannot find symbol
 [ERROR] symbol : class AclEntryType
 [ERROR] location: package org.apache.hadoop.fs.permission
 [ERROR] 
 /Users/ghagleitner/Projects/hive-test-trunk/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/TestExtendedAcls.java:[21,1]
  static import only from classes and interfaces
 [ERROR] 
 /Users/ghagleitner/Projects/hive-test-trunk/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/TestExtendedAcls.java:[22,46]
  cannot find symbol
 [ERROR] symbol : class AclEntryType
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-07-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050998#comment-14050998
 ] 

Hive QA commented on HIVE-7231:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12653697/HIVE-7231.7.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5672 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/664/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/664/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-664/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12653697

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch, HIVE-7231.2.patch, HIVE-7231.3.patch, 
 HIVE-7231.4.patch, HIVE-7231.5.patch, HIVE-7231.6.patch, HIVE-7231.7.patch


 Current ORC padding is not optimal because of fixed stripe sizes within 
 block. The padding overhead will be significant in some cases. Also padding 
 percentage relative to stripe size is not configurable.
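The padding overhead the description mentions can be illustrated with back-of-envelope arithmetic. This sketch is not ORC's actual writer logic, and the sizes in the test are invented; it only shows the waste when a fixed-size stripe cannot fit in the space left in the current HDFS block:

```java
public class PaddingMath {
    // Bytes wasted if the next stripe of 'stripeSize' bytes does not fit in
    // the remainder of the current block and is therefore pushed (with
    // padding) to the next block boundary.
    public static long paddingBytes(long blockSize, long usedInBlock,
                                    long stripeSize) {
        long remaining = blockSize - (usedInBlock % blockSize);
        return (stripeSize > remaining) ? remaining : 0;
    }
}
```

With fixed stripe sizes, this waste recurs at nearly every block boundary, which is why making the padding tolerance configurable relative to stripe size helps.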



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7267) can not trigger unit tests by command `mvn clean test -Phadoop-2'

2014-07-02 Thread John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John updated HIVE-7267:
---

Description: 
1. download hive 0.13.1
2. decompress and unpack the tarball
3. change the directory to hive 0.13.1
4. run `mvn clean install -DskipTests -Phadoop-2'
5. run `mvn test -Phadoop-2'

Could not find any unit tests

  was:
1. download hive 0.13.1
2. decompress and unpack the tarball
3. change the directory to hive 0.13.1
4. run `mvn clean test -Phadoop-2'

Could not find any unit tests


 can not trigger unit tests by command `mvn clean test -Phadoop-2'
 -

 Key: HIVE-7267
 URL: https://issues.apache.org/jira/browse/HIVE-7267
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: John

 1. download hive 0.13.1
 2. decompress and unpack the tarball
 3. change the directory to hive 0.13.1
 4. run `mvn clean install -DskipTests -Phadoop-2'
 5. run `mvn test -Phadoop-2'
 Could not find any unit tests



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7267) can not trigger unit tests by command `mvn test -Phadoop-2'

2014-07-02 Thread John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John updated HIVE-7267:
---

Summary: can not trigger unit tests by command `mvn test -Phadoop-2'  (was: 
can not trigger unit tests by command `mvn clean test -Phadoop-2')

 can not trigger unit tests by command `mvn test -Phadoop-2'
 ---

 Key: HIVE-7267
 URL: https://issues.apache.org/jira/browse/HIVE-7267
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: John

 1. download hive 0.13.1
 2. decompress and unpack the tarball
 3. change the directory to hive 0.13.1
 4. run `mvn clean install -DskipTests -Phadoop-2'
 5. run `mvn test -Phadoop-2'
 Could not find any unit tests





[jira] [Updated] (HIVE-7267) can not trigger unit tests by command `mvn test -Phadoop-2'

2014-07-02 Thread John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John updated HIVE-7267:
---

Description: 
1. download hive 0.13.1
2. decompress and unpack the tarball
3. change the directory to hive 0.13.1
4. run `mvn clean install -DskipTests -Phadoop-2'
5. run `mvn test -Phadoop-2'

Could not find any unit tests

The link of official doc: 
https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-HowdoIrunalloftheunittests?

  was:
1. download hive 0.13.1
2. decompress and unpack the tarball
3. change the directory to hive 0.13.1
4. run `mvn clean install -DskipTests -Phadoop-2'
5. run `mvn test -Phadoop-2'

Could not find any unit tests


 can not trigger unit tests by command `mvn test -Phadoop-2'
 ---

 Key: HIVE-7267
 URL: https://issues.apache.org/jira/browse/HIVE-7267
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: John

 1. download hive 0.13.1
 2. decompress and unpack the tarball
 3. change the directory to hive 0.13.1
 4. run `mvn clean install -DskipTests -Phadoop-2'
 5. run `mvn test -Phadoop-2'
 Could not find any unit tests
 The link of official doc: 
 https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-HowdoIrunalloftheunittests?





[jira] [Commented] (HIVE-6694) Beeline should provide a way to execute shell command as Hive CLI does

2014-07-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051017#comment-14051017
 ] 

Hive QA commented on HIVE-6694:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12653706/HIVE-6694.5.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5673 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/666/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/666/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-666/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12653706

 Beeline should provide a way to execute shell command as Hive CLI does
 --

 Key: HIVE-6694
 URL: https://issues.apache.org/jira/browse/HIVE-6694
 Project: Hive
  Issue Type: Improvement
  Components: CLI, Clients
Affects Versions: 0.11.0, 0.12.0, 0.13.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.14.0

 Attachments: HIVE-6694.1.patch, HIVE-6694.1.patch, HIVE-6694.2.patch, 
 HIVE-6694.3.patch, HIVE-6694.4.patch, HIVE-6694.5.patch, HIVE-6694.patch


 Hive CLI allows a user to execute a shell command using the ! notation. For 
 instance, !cat myfile.txt. Being able to execute shell commands may be 
 important for some users. Beeline, however, provides no such 
 capability as a replacement, possibly because the ! notation is reserved for 
 SQLLine commands. It's possible to provide this using a slight syntactic 
 variation such as !sh cat myfile.txt.
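Under the hood, an !sh escape only needs to hand the rest of the line to the OS shell and echo the output. A hypothetical sketch of that dispatch (illustrative only; the class and method names are assumptions, not BeeLine source):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ShellEscape {
    // Run the text after "!sh " through /bin/sh and capture its output,
    // roughly what a shell-escape command would do.
    static String runShell(String command) throws Exception {
        Process p = new ProcessBuilder("/bin/sh", "-c", command)
                .redirectErrorStream(true)   // merge stderr into stdout
                .start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // e.g. user types: !sh echo hello
        System.out.print(runShell("echo hello"));
    }
}
```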





[jira] [Commented] (HIVE-7312) CBO throws ArrayIndexOutOfBounds

2014-07-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051018#comment-14051018
 ] 

Hive QA commented on HIVE-7312:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12653736/HIVE-7312.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/667/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/667/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-667/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-667/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 
'itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java'
Reverted 'itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java'
Reverted 'itests/util/pom.xml'
Reverted 'beeline/src/java/org/apache/hive/beeline/BeeLine.java'
Reverted 'beeline/src/java/org/apache/hive/beeline/Commands.java'
Reverted 'beeline/src/main/resources/BeeLine.properties'
Reverted 'cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target packaging/target 
hbase-handler/target testutils/target jdbc/target metastore/target 
itests/target itests/hcatalog-unit/target itests/test-serde/target 
itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-unit/target itests/custom-serde/target itests/util/target 
hcatalog/target hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target 
hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hwi/target 
common/target common/src/gen 
common/src/java/org/apache/hadoop/hive/common/cli/ShellCmdExecutor.java 
common/src/java/org/apache/hive/common/util/StreamPrinter.java contrib/target 
service/target serde/target beeline/target odbc/target cli/target 
ql/dependency-reduced-pom.xml ql/target
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1607521.

At revision 1607521.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12653736

 CBO throws ArrayIndexOutOfBounds
 

 Key: HIVE-7312
 URL: https://issues.apache.org/jira/browse/HIVE-7312
 Project: Hive
  Issue Type: Sub-task
Reporter: Gunther Hagleitner
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7312.patch


 Running tpcds query 17. Still confirming if col stats are available.
 When I turn CBO on (this is just the relevant snippet, the actual exception 
 is pages long)

[jira] [Created] (HIVE-7345) Beeline changes its prompt to reflect successful database connection even after failing to connect

2014-07-02 Thread Ashish Kumar Singh (JIRA)
Ashish Kumar Singh created HIVE-7345:


 Summary: Beeline changes its prompt to reflect successful database 
connection even after failing to connect
 Key: HIVE-7345
 URL: https://issues.apache.org/jira/browse/HIVE-7345
 Project: Hive
  Issue Type: Bug
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh


Beeline changes its prompt to reflect successful database connection even after 
failing to connect, which is misleading.





[jira] [Updated] (HIVE-7345) Beeline changes its prompt to reflect successful database connection even after failing to connect

2014-07-02 Thread Ashish Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Kumar Singh updated HIVE-7345:
-

Description: 
Beeline changes its prompt to reflect successful database connection even after 
failing to connect, which is misleading.

{code}
[asingh@e1118 tpcds]$ beeline -u jdbc:hive2://abclocalhost:1 hive
scan complete in 5ms
Connecting to jdbc:hive2://abclocalhost:1
Error: Invalid URL: jdbc:hive2://abclocalhost:1 (state=08S01,code=0)
Beeline version 0.12.0-cdh5.1.0-SNAPSHOT by Apache Hive
0: jdbc:hive2://abclocalhost:1> show tables;
No current connection
{code}

  was:Beeline changes its prompt to reflect successful database connection even 
after failing to connect, which is misleading.


 Beeline changes its prompt to reflect successful database connection even 
 after failing to connect
 --

 Key: HIVE-7345
 URL: https://issues.apache.org/jira/browse/HIVE-7345
 Project: Hive
  Issue Type: Bug
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh

 Beeline changes its prompt to reflect successful database connection even 
 after failing to connect, which is misleading.
 {code}
 [asingh@e1118 tpcds]$ beeline -u jdbc:hive2://abclocalhost:1 hive
 scan complete in 5ms
 Connecting to jdbc:hive2://abclocalhost:1
 Error: Invalid URL: jdbc:hive2://abclocalhost:1 (state=08S01,code=0)
 Beeline version 0.12.0-cdh5.1.0-SNAPSHOT by Apache Hive
 0: jdbc:hive2://abclocalhost:1> show tables;
 No current connection
 {code}
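The bug in the transcript above can be reduced to a minimal sketch: the prompt should be derived from a live connection, not from the URL the user asked for. This is a hypothetical illustration of the failure pattern, not BeeLine's actual code:

```java
// Hypothetical sketch of the prompt bug (not BeeLine source).
public class PromptSketch {
    // A correct prompt depends on whether a connection actually exists;
    // deriving it from the URL alone shows "0: jdbc:..." even when
    // connect() failed, which is the misleading behavior reported.
    static String prompt(String url, boolean connected) {
        return connected ? "0: " + url + "> " : "beeline> ";
    }

    public static void main(String[] args) {
        // Connection to the bogus host failed, so the default prompt
        // should be kept rather than echoing the URL.
        System.out.println(prompt("jdbc:hive2://abclocalhost:1", false));
    }
}
```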





Review Request 23256: HIVE-7345: Beeline changes its prompt to reflect successful database connection even after failing to connect

2014-07-02 Thread Ashish Singh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23256/
---

Review request for hive.


Bugs: HIVE-7345
https://issues.apache.org/jira/browse/HIVE-7345


Repository: hive-git


Description
---

HIVE-7345: Beeline changes its prompt to reflect successful database connection 
even after failing to connect


Diffs
-

  beeline/src/java/org/apache/hive/beeline/BeeLine.java 
2f3350e79f6168b11c13c6b4f84128c9255e0383 
  beeline/src/java/org/apache/hive/beeline/DatabaseConnection.java 
00b49afb72531a4c15d0239ba08b04faa229d262 

Diff: https://reviews.apache.org/r/23256/diff/


Testing
---

NA


Thanks,

Ashish Singh



[jira] [Updated] (HIVE-7345) Beeline changes its prompt to reflect successful database connection even after failing to connect

2014-07-02 Thread Ashish Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Kumar Singh updated HIVE-7345:
-

Attachment: HIVE-7345.patch

RB: https://reviews.apache.org/r/23256/

 Beeline changes its prompt to reflect successful database connection even 
 after failing to connect
 --

 Key: HIVE-7345
 URL: https://issues.apache.org/jira/browse/HIVE-7345
 Project: Hive
  Issue Type: Bug
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Attachments: HIVE-7345.patch


 Beeline changes its prompt to reflect successful database connection even 
 after failing to connect, which is misleading.
 {code}
 [asingh@e1118 tpcds]$ beeline -u jdbc:hive2://abclocalhost:1 hive
 scan complete in 5ms
 Connecting to jdbc:hive2://abclocalhost:1
 Error: Invalid URL: jdbc:hive2://abclocalhost:1 (state=08S01,code=0)
 Beeline version 0.12.0-cdh5.1.0-SNAPSHOT by Apache Hive
 0: jdbc:hive2://abclocalhost:1> show tables;
 No current connection
 {code}





[jira] [Commented] (HIVE-7312) CBO throws ArrayIndexOutOfBounds

2014-07-02 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051059#comment-14051059
 ] 

Gunther Hagleitner commented on HIVE-7312:
--

Committed to branch. Thanks [~jpullokkaran]!

 CBO throws ArrayIndexOutOfBounds
 

 Key: HIVE-7312
 URL: https://issues.apache.org/jira/browse/HIVE-7312
 Project: Hive
  Issue Type: Sub-task
Reporter: Gunther Hagleitner
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7312.patch


 Running tpcds query 17. Still confirming if col stats are available.
 When I turn CBO on (this is just the relevant snippet, the actual exception 
 is pages long):
 Caused by: java.lang.IndexOutOfBoundsException: Index: 24, Size: 0
   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
   at java.util.ArrayList.get(ArrayList.java:411)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.RelOptHiveTable.getColStat(RelOptHiveTable.java:97)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.reloperators.HiveTableScanRel.getColStat(HiveTableScanRel.java:73)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.stats.HiveRelMdDistinctRowCount.getDistinctRowCount(HiveRelMdDistinctRowCount.java:47)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.stats.HiveRelMdDistinctRowCount.getDistinctRowCount(HiveRelMdDistinctRowCount.java:36)
   ... 272 more
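The trace boils down to an ArrayList.get on an empty column-statistics list: the plan asks for column index 24 but the stats list has size 0. A minimal reproduction of that failure mode (illustrative only, not the RelOptHiveTable code path):

```java
import java.util.ArrayList;
import java.util.List;

public class ColStatRepro {
    public static void main(String[] args) {
        List<Object> colStats = new ArrayList<>(); // stats unavailable: size 0
        int projectedCol = 24;                     // column index from the plan

        // ArrayList.get range-checks the index against size(), producing the
        // same IndexOutOfBoundsException seen in the stack trace above.
        try {
            colStats.get(projectedCol);
            System.out.println("no exception");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IndexOutOfBoundsException");
        }
    }
}
```

A bounds check (or a fallback when column stats are missing) before the lookup would avoid the crash.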





[jira] [Updated] (HIVE-7312) CBO throws ArrayIndexOutOfBounds

2014-07-02 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7312:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 CBO throws ArrayIndexOutOfBounds
 

 Key: HIVE-7312
 URL: https://issues.apache.org/jira/browse/HIVE-7312
 Project: Hive
  Issue Type: Sub-task
Reporter: Gunther Hagleitner
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-7312.patch


 Running tpcds query 17. Still confirming if col stats are available.
 When I turn CBO on (this is just the relevant snippet, the actual exception 
 is pages long):
 Caused by: java.lang.IndexOutOfBoundsException: Index: 24, Size: 0
   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
   at java.util.ArrayList.get(ArrayList.java:411)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.RelOptHiveTable.getColStat(RelOptHiveTable.java:97)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.reloperators.HiveTableScanRel.getColStat(HiveTableScanRel.java:73)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.stats.HiveRelMdDistinctRowCount.getDistinctRowCount(HiveRelMdDistinctRowCount.java:47)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.stats.HiveRelMdDistinctRowCount.getDistinctRowCount(HiveRelMdDistinctRowCount.java:36)
   ... 272 more





[jira] [Commented] (HIVE-7344) Add streaming support in Windowing mode for FirstVal, LastVal

2014-07-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051098#comment-14051098
 ] 

Hive QA commented on HIVE-7344:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12653740/HIVE-7344.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5657 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/669/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/669/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-669/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12653740

 Add streaming support in Windowing mode for FirstVal, LastVal
 -

 Key: HIVE-7344
 URL: https://issues.apache.org/jira/browse/HIVE-7344
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7344.1.patch


 Continuation of HIVE-7062, HIVE-7143


