[jira] [Commented] (HIVE-9100) HiveServer2 fail to connect to MetaStore after MetaStore restarting
[ https://issues.apache.org/jira/browse/HIVE-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291782#comment-14291782 ]

Nemon Lou commented on HIVE-9100:
---------------------------------

Mariusz Strzelecki is right. After changing the metastore's TokenStore from memory to DB, the error disappears. Thanks, Mariusz Strzelecki.

HiveServer2 fail to connect to MetaStore after MetaStore restarting
-------------------------------------------------------------------
                Key: HIVE-9100
                URL: https://issues.apache.org/jira/browse/HIVE-9100
            Project: Hive
         Issue Type: Bug
         Components: Authentication, HiveServer2, Security
   Affects Versions: 0.14.0, 0.13.1
           Reporter: Nemon Lou
        Attachments: hiveserver2.log, metastore.log

Secure cluster with Kerberos, remote metastore.

How to reproduce:
1. Use beeline to connect to HiveServer2.
2. Restart the MetaStore process.
3. Type a command like 'show tables' in beeline.

The client side reports this error:
{quote}
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: Peer indicated failure: DIGEST-MD5: IO error acquiring password
        at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:190)
{quote}

HiveServer2's log and metastore's log are uploaded as attachments.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
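The fix the commenter describes — switching the metastore's delegation token store from memory to the database — can be sketched as a hive-site.xml fragment on the metastore side. The property and class names below are the standard Hive settings of that era; they are offered as an assumption about the exact configuration used, not as a value taken from this thread:

```
<!-- Metastore-side hive-site.xml (sketch): persist delegation tokens in the
     metastore DB so they survive a MetaStore restart, instead of the default
     in-memory store that is lost on restart. -->
<property>
  <name>hive.cluster.delegation.token.store.class</name>
  <value>org.apache.hadoop.hive.thrift.DBTokenStore</value>
</property>
```

With the in-memory store, the delegation token HiveServer2 keeps using can no longer be validated after a MetaStore restart, which would explain the "DIGEST-MD5: IO error acquiring password" failure above.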
[jira] [Commented] (HIVE-9388) HiveServer2 fails to reconnect to MetaStore after MetaStore restart
[ https://issues.apache.org/jira/browse/HIVE-9388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291838#comment-14291838 ]

Hive QA commented on HIVE-9388:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12694539/HIVE-9388.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7365 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2521/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2521/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2521/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12694539 - PreCommit-HIVE-TRUNK-Build

HiveServer2 fails to reconnect to MetaStore after MetaStore restart
-------------------------------------------------------------------
                Key: HIVE-9388
                URL: https://issues.apache.org/jira/browse/HIVE-9388
            Project: Hive
         Issue Type: Bug
         Components: HiveServer2
   Affects Versions: 0.12.0, 0.14.0, 0.13.1
           Reporter: Piotr Ackermann
        Attachments: HIVE-9388.patch

How to reproduce:
# Use Hue to connect to HiveServer2
# Restart Metastore
# Try to execute any query in Hue

HiveServer2 reports this error:
{quote}
ERROR hive.log: Got exception: org.apache.thrift.transport.TTransportException null
org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:355)
        at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:432)
        at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:414)
        at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:837)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
        at com.sun.proxy.$Proxy10.getDatabases(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:1681)
        at com.sun.proxy.$Proxy10.getDatabases(Unknown Source)
        at org.apache.hive.service.cli.operation.GetSchemasOperation.run(GetSchemasOperation.java:62)
        at org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:715)
        at org.apache.hive.service.cli.session.HiveSessionImpl.getSchemas(HiveSessionImpl.java:438)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
        at
{quote}
Re: 0.15 release
Brock,

Given there isn't consensus on numbering yet, could you hold off on making the 0.15 branch? We should come to a conclusion on whether we're doing 0.14.1/0.15 or 1.0/1.1 before assigning any more numbers.

Alan.

Brock Noland <br...@cloudera.com>
January 20, 2015 at 21:25

Just a reminder that I plan on branching on 1/26/2015 and starting to roll release candidates on 2/9/2015. After branching I plan on merging only blockers.

Brock

Brock Noland <br...@cloudera.com>
January 12, 2015 at 14:37

Hi,

Projects are instructed in the incubator that releases gain new users and other attention. Additionally, as discussed in this forum, I'd like to increase the tempo of our release process[1]. As such, I plan on following this process:

1) Provide two weeks notice of branching
2) Provide two weeks to find issues on the branch, merging only blockers
3) Roll release candidates until a release vote passes

As such I plan on branching on 1/26/2015 and starting to roll release candidates on 2/9/2015.

Cheers,
Brock

1. Note I am not complaining, as I did not help with releases until this point.
[jira] [Resolved] (HIVE-9100) HiveServer2 fail to connect to MetaStore after MetaStore restarting
[ https://issues.apache.org/jira/browse/HIVE-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan resolved HIVE-9100.
------------------------------------
    Resolution: Invalid

HiveServer2 fail to connect to MetaStore after MetaStore restarting
-------------------------------------------------------------------
                Key: HIVE-9100
                URL: https://issues.apache.org/jira/browse/HIVE-9100
            Project: Hive
         Issue Type: Bug
         Components: Authentication, HiveServer2, Security
           Reporter: Nemon Lou
        Attachments: hiveserver2.log, metastore.log
[jira] [Updated] (HIVE-9100) HiveServer2 fail to connect to MetaStore after MetaStore restarting
[ https://issues.apache.org/jira/browse/HIVE-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-9100:
-----------------------------------
    Affects Version/s:     (was: 0.13.1)
                           (was: 0.14.0)

HiveServer2 fail to connect to MetaStore after MetaStore restarting
-------------------------------------------------------------------
                Key: HIVE-9100
                URL: https://issues.apache.org/jira/browse/HIVE-9100
            Project: Hive
         Issue Type: Bug
         Components: Authentication, HiveServer2, Security
           Reporter: Nemon Lou
        Attachments: hiveserver2.log, metastore.log
[jira] [Commented] (HIVE-9454) Test failures due to new Calcite version
[ https://issues.apache.org/jira/browse/HIVE-9454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292022#comment-14292022 ]

Laljo John Pullokkaran commented on HIVE-9454:
----------------------------------------------

[~brocknoland] we have started analyzing the failures. There is already one Calcite bug filed.

Test failures due to new Calcite version
----------------------------------------
                Key: HIVE-9454
                URL: https://issues.apache.org/jira/browse/HIVE-9454
            Project: Hive
         Issue Type: Bug
           Reporter: Brock Noland
        Attachments: HIVE-9454.1.patch

A bunch of failures have started appearing in patches which seem unrelated. I am thinking we've picked up a new version of Calcite. E.g.:

http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2488/testReport/org.apache.hadoop.hive.cli/TestCliDriver/testCliDriver_auto_join12/

{noformat}
Running: diff -a /home/hiveptest/54.147.202.89-hiveptest-1/apache-svn-trunk-source/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/auto_join12.q.out /home/hiveptest/54.147.202.89-hiveptest-1/apache-svn-trunk-source/itests/qtest/../../ql/src/test/results/clientpositive/auto_join12.q.out
32c32
< $hdt$_0:$hdt$_0:$hdt$_0:$hdt$_0:src
---
> $hdt$_0:$hdt$_0:$hdt$_0:$hdt$_0:$hdt$_0:src
35c35
< $hdt$_0:$hdt$_0:$hdt$_1:$hdt$_1:$hdt$_1:src
---
> $hdt$_0:$hdt$_0:$hdt$_1:$hdt$_1:$hdt$_1:$hdt$_1:src
39c39
< $hdt$_0:$hdt$_0:$hdt$_0:$hdt$_0:src
---
> $hdt$_0:$hdt$_0:$hdt$_0:$hdt$_0:$hdt$_0:src
54c54
< $hdt$_0:$hdt$_0:$hdt$_1:$hdt$_1:$hdt$_1:src
---
> $hdt$_0:$hdt$_0:$hdt$_1:$hdt$_1:$hdt$_1:$hdt$_1:src
{noformat}
Hive-0.14 - Build # 846 - Failure
Changes for Build #846
[ekoifman] HIVE-9361 - Intermittent NPE in SessionHiveMetaStoreClient.alterTempTable

No tests ran.

The Apache Jenkins build system has built Hive-0.14 (build #846)

Status: Failure

Check console output at https://builds.apache.org/job/Hive-0.14/846/ to view the results.
[jira] [Commented] (HIVE-9271) Add ability for client to request metastore to fire an event
[ https://issues.apache.org/jira/browse/HIVE-9271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292243#comment-14292243 ]

Sushanth Sowmyan commented on HIVE-9271:
----------------------------------------

Mostly looks good to me, and has my tentative +1.

One minor thing: I wondered if fire_notification_event shouldn't return a long instead of a void, thus returning the notification id of the fired event - but that's a strict no-no from the perspective of what a MetaStoreEventListener is supposed to be, since a NotificationListener is only one type of listener. That got me thinking - should this method be called fire_event rather than fire_notification_event?

Add ability for client to request metastore to fire an event
------------------------------------------------------------
                Key: HIVE-9271
                URL: https://issues.apache.org/jira/browse/HIVE-9271
            Project: Hive
         Issue Type: New Feature
         Components: Metastore
           Reporter: Alan Gates
           Assignee: Alan Gates
            Fix For: 0.15.0
        Attachments: HIVE-9271.patch

Currently all events in Hive are fired by the metastore. However, there are events that only the client fully understands, such as DML operations. There should be a way for the client to request the metastore to fire a particular event.
[jira] [Updated] (HIVE-9449) Push YARN configuration to Spark while deploying Spark on YARN [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-9449:
------------------------------
     Resolution: Fixed
  Fix Version/s: spark-branch
         Status: Resolved  (was: Patch Available)

Committed to spark branch. Thanks, Chengxiang.

Push YARN configuration to Spark while deploying Spark on YARN [Spark Branch]
-----------------------------------------------------------------------------
                Key: HIVE-9449
                URL: https://issues.apache.org/jira/browse/HIVE-9449
            Project: Hive
         Issue Type: Sub-task
         Components: Spark
           Reporter: Chengxiang Li
           Assignee: Chengxiang Li
            Fix For: spark-branch
        Attachments: HIVE-9449.1-spark.patch, HIVE-9449.1-spark.patch, HIVE-9449.2-spark.patch

We currently push only Spark configuration and RSC configuration to Spark when launching the Spark cluster; for Spark on YARN mode, Spark needs extra YARN configuration to launch the cluster. Besides this, to support dynamic configuration setting for RSC/YARN configuration, we need to recreate the SparkSession when the RSC or YARN configuration is updated, as these may influence the Spark cluster deployment as well.
[jira] [Updated] (HIVE-3280) Make HiveMetaStoreClient a public API
[ https://issues.apache.org/jira/browse/HIVE-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated HIVE-3280:
--------------------------------
     Resolution: Fixed
  Fix Version/s: 0.14.1
         Status: Resolved  (was: Patch Available)

Patch committed to trunk, 0.14 and branch-1.

Make HiveMetaStoreClient a public API
-------------------------------------
                Key: HIVE-3280
                URL: https://issues.apache.org/jira/browse/HIVE-3280
            Project: Hive
         Issue Type: Improvement
         Components: Metastore
           Reporter: Carl Steinbach
           Assignee: Thejas M Nair
             Labels: api-addition
            Fix For: 0.14.1
        Attachments: HIVE-3280.1.patch
Hive-0.14 - Build # 847 - Fixed
Changes for Build #846
[ekoifman] HIVE-9361 - Intermittent NPE in SessionHiveMetaStoreClient.alterTempTable

Changes for Build #847
[thejas] HIVE-3280 : Make HiveMetaStoreClient a public API (Thejas Nair, reviewed by Alan Gates)

No tests ran.

The Apache Jenkins build system has built Hive-0.14 (build #847)

Status: Fixed

Check console output at https://builds.apache.org/job/Hive-0.14/847/ to view the results.
[jira] [Updated] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-9333:
------------------------------
    Attachment: HIVE-9333.1.patch

Move parquet serialize implementation to DataWritableWriter to improve write speeds
-----------------------------------------------------------------------------------
                Key: HIVE-9333
                URL: https://issues.apache.org/jira/browse/HIVE-9333
            Project: Hive
         Issue Type: Sub-task
           Reporter: Sergio Peña
           Assignee: Sergio Peña
        Attachments: HIVE-9333.1.patch

The serialize process on ParquetHiveSerDe parses a Hive object to a Writable object by looping through all the Hive object children, creating new Writable objects per child. These final Writable objects are passed to the Parquet writing function and parsed again in the DataWritableWriter class by looping through the ArrayWritable object. These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write()) may be reduced to a single loop inside the DataWritableWriter.write() method in order to speed up the Parquet writing process in Hive.

To achieve this, we can wrap the Hive object and object inspector in the ParquetHiveSerDe.serialize() method in an object that implements the Writable interface, thus avoiding the loop that serialize() does, and leaving the loop parsing to the DataWritableWriter.write() method. We can see how ORC does this with the OrcSerde.OrcSerdeRow class.

Writable objects are organized differently across storage formats, so I don't think it is necessary to create and keep the Writable objects in the serialize() method, as they won't be used until the writing process starts (DataWritableWriter.write()). We might save a large share of the extra serialization time by making this change. This performance issue was found using microbenchmark tests from HIVE-8121.
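The "wrap instead of copy" pattern described above can be sketched in plain Java. The names below (RowWrapper, FieldInspector) are illustrative stand-ins for Hive's Writable and ObjectInspector machinery, not the classes in the actual HIVE-9333 patch:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: serialize() wraps the row lazily, and write()
// performs the one and only per-field loop.
public class RowWrapperSketch {

    // Stand-in for an ObjectInspector: knows how to enumerate a row's fields.
    interface FieldInspector {
        List<Object> fields(Object row);
    }

    // Stand-in for serialize()'s output: no per-field conversion happens here,
    // the row and its inspector are merely carried along.
    static final class RowWrapper {
        final Object row;
        final FieldInspector inspector;

        RowWrapper(Object row, FieldInspector inspector) {
            this.row = row;
            this.inspector = inspector;
        }
    }

    // Stand-in for write(): the single loop, converting each field exactly once.
    static List<String> write(RowWrapper wrapped, Function<Object, String> convert) {
        return wrapped.inspector.fields(wrapped.row).stream()
                .map(convert)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        FieldInspector listInspector = row -> (List<Object>) row;
        RowWrapper wrapped = new RowWrapper(Arrays.asList(1, "a", 2.5), listInspector);
        // One pass over the fields, deferred until write time:
        System.out.println(write(wrapped, String::valueOf)); // [1, a, 2.5]
    }
}
```

The eager alternative would convert every field inside serialize() and then loop over the converted copies again in write(); deferring the conversion removes one full pass and one intermediate row copy, which is the saving the benchmark numbers in the review request measure.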
[jira] [Updated] (HIVE-9396) date_add()/date_sub() should allow tinyint/smallint/bigint arguments in addition to int
[ https://issues.apache.org/jira/browse/HIVE-9396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere updated HIVE-9396:
-----------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I've committed this to trunk, thanks for doing this [~spena].

date_add()/date_sub() should allow tinyint/smallint/bigint arguments in addition to int
---------------------------------------------------------------------------------------
                Key: HIVE-9396
                URL: https://issues.apache.org/jira/browse/HIVE-9396
            Project: Hive
         Issue Type: Bug
         Components: UDF
           Reporter: Jason Dere
           Assignee: Sergio Peña
        Attachments: HIVE-9396.3.patch, HIVE-9396.4.patch

{noformat}
hive> select c1, date_add('1985-01-01', c1) from short1;
FAILED: SemanticException [Error 10014]: Line 1:11 Wrong arguments 'c1': DATE_ADD() only takes INT types as second argument, got SHORT
{noformat}

We should allow date_add()/date_sub() to take any integral type for the 2nd argument, rather than just int.
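On versions without this fix, a workaround consistent with the error message above is to cast the second argument to int explicitly (a sketch; `short1` and `c1` are the example table and column from the report):

```
-- Hypothetical workaround for affected versions: satisfy DATE_ADD()'s
-- INT-only second argument by casting the smallint column explicitly.
SELECT c1, date_add('1985-01-01', CAST(c1 AS INT)) FROM short1;
```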
Review Request 30281: Move parquet serialize implementation to DataWritableWriter to improve write speeds
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30281/
---

Review request for hive, Ryan Blue, cheng xu, and Dong Chen.

Bugs: HIVE-9333
    https://issues.apache.org/jira/browse/HIVE-9333

Repository: hive-git

Description
-----------
This patch moves the ParquetHiveSerDe.serialize() implementation to the DataWritableWriter class in order to save time in materializing data on serialize().

Diffs
-----
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java ea4109d358f7c48d1e2042e5da299475de4a0a29
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 9caa4ed169ba92dbd863e4a2dc6d06ab226a4465
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java 060b1b722d32f3b2f88304a1a73eb249e150294b
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 41b5f1c3b0ab43f734f8a211e3e03d5060c75434
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java e52c4bc0b869b3e60cb4bfa9e11a09a0d605ac28
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java a693aff18516d133abf0aae4847d3fe00b9f1c96
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetOutputFormat.java 667d3671547190d363107019cd9a2d105d26d336
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 007a665529857bcec612f638a157aa5043562a15
  serde/src/java/org/apache/hadoop/hive/serde2/io/ParquetWritable.java PRE-CREATION

Diff: https://reviews.apache.org/r/30281/diff/

Testing
-------
The tests run were the following:

1. JMH (Java microbenchmark)

This benchmark called parquet serialize/write methods using text writable objects.

Class.method                 Before change (ops/s)   After change (ops/s)
ParquetHiveSerDe.serialize   19,113                  249,528  (~13x speed increase)
DataWritableWriter.write     5,033                   5,201    (3.34% speed increase)

2. Write 20 million rows (~1 GB file) from Text to Parquet

I wrote a ~1 GB file in Textfile format, then converted it to Parquet format using the following statement:

CREATE TABLE parquet STORED AS parquet AS SELECT * FROM text;

Time (s) to write the whole file BEFORE changes: 93.758 s
Time (s) to write the whole file AFTER changes: 83.903 s

That is about a 10% speed increase.

Thanks,
Sergio Pena
[jira] [Updated] (HIVE-9317) move Microsoft copyright to NOTICE file
[ https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HIVE-9317:
--------------------------------
    Attachment: hive-9327.txt

This patch changes no code; it just puts the required Apache header on the source files and moves Microsoft's copyright notice to the NOTICE file.

move Microsoft copyright to NOTICE file
---------------------------------------
                Key: HIVE-9317
                URL: https://issues.apache.org/jira/browse/HIVE-9317
            Project: Hive
         Issue Type: Bug
           Reporter: Owen O'Malley
            Fix For: 0.15.0
        Attachments: hive-9327.txt

There are a set of files that still have the Microsoft copyright notices. Those notices need to be moved into NOTICES and replaced with the standard Apache headers.

{code}
./common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java
./common/src/java/org/apache/hadoop/hive/common/type/SignedInt128.java
./common/src/java/org/apache/hadoop/hive/common/type/SqlMathUtil.java
./common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java
./common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java
./common/src/test/org/apache/hadoop/hive/common/type/TestSignedInt128.java
./common/src/test/org/apache/hadoop/hive/common/type/TestSqlMathUtil.java
./common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java
{code}
[jira] [Created] (HIVE-9468) Test groupby3_map_skew.q fails due to decimal precision difference
Xuefu Zhang created HIVE-9468:
---------------------------------

             Summary: Test groupby3_map_skew.q fails due to decimal precision difference
                 Key: HIVE-9468
                 URL: https://issues.apache.org/jira/browse/HIVE-9468
             Project: Hive
          Issue Type: Bug
          Components: Tests
            Reporter: Xuefu Zhang

From test run http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/682/testReport:

{code}
Running: diff -a /home/hiveptest/54.177.132.58-hiveptest-1/apache-svn-spark-source/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/groupby3_map_skew.q.out /home/hiveptest/54.177.132.58-hiveptest-1/apache-svn-spark-source/itests/qtest/../../ql/src/test/results/clientpositive/groupby3_map_skew.q.out
162c162
< 130091.0  260.182  256.10355987055016  98.0  0.0  142.92680950752379  143.06995106518903  20428.07288  20469.0109
---
> 130091.0  260.182  256.10355987055016  98.0  0.0  142.9268095075238  143.06995106518906  20428.07288  20469.0109
{code}
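The differing low-order digits above (142.92680950752379 vs. 142.9268095075238) are characteristic of double arithmetic performed in a different order. A minimal illustration in plain Java, unrelated to the test itself:

```java
public class FloatOrder {
    public static void main(String[] args) {
        // IEEE-754 double addition is not associative, so a different
        // reduction order (e.g. map-side partial aggregates vs. a skewed
        // group-by plan) can legitimately change the last digits of
        // avg/variance results.
        double left = (0.1 + 0.2) + 0.3;
        double right = 0.1 + (0.2 + 0.3);
        System.out.println(left);          // 0.6000000000000001
        System.out.println(right);         // 0.6
        System.out.println(left == right); // false
    }
}
```

This is why such golden-file mismatches are usually fixed by masking or rounding the affected columns in the expected output rather than by changing the computation.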
[jira] [Commented] (HIVE-9317) move Microsoft copyright to NOTICE file
[ https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292218#comment-14292218 ]

Alan Gates commented on HIVE-9317:
----------------------------------

+1.

move Microsoft copyright to NOTICE file
---------------------------------------
                Key: HIVE-9317
                URL: https://issues.apache.org/jira/browse/HIVE-9317
            Project: Hive
         Issue Type: Bug
           Reporter: Owen O'Malley
           Assignee: Owen O'Malley
           Priority: Blocker
            Fix For: 0.15.0
        Attachments: hive-9327.txt
[jira] [Created] (HIVE-9469) Hive Thrift Server throws Socket Timeout Exception: Read time out
Manish Malhotra created HIVE-9469:
-------------------------------------

             Summary: Hive Thrift Server throws Socket Timeout Exception: Read time out
                 Key: HIVE-9469
                 URL: https://issues.apache.org/jira/browse/HIVE-9469
             Project: Hive
          Issue Type: Bug
          Components: Metastore
    Affects Versions: 0.10.0
         Environment: 4 core CPU, 15 GB memory, 2 Thrift servers behind a load balancer
            Reporter: Manish Malhotra

Hi All,

Please review the following problem. I also posted the same in the hive-user group, but didn't get any response yet. This is happening quite frequently in our environment, so it would be great if somebody could take a look and advise.

I'm using the Hive Thrift Server in production, which at peak handles around 500 req/min. After a certain point the Hive Thrift Server goes into a no-response mode and throws the following exception:

org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out

As the metastore we are using MySQL, which is accessed by the Thrift server. The design/architecture is like this:

Oozie -> Hive Action -> ELB (AWS) -> Hive Thrift (2 servers) -> MySQL (Master) -> MySQL (Slave)

Software versions:
Hive: 0.10.0
Hadoop: 1.2.1

It looks like when the load goes beyond some threshold, certain operations have problems responding. As Hive jobs sometimes fail because of this issue, we also have an auto-restart check that stops/kills and restarts the service if the Thrift server is not responding.

Other tuning done:
Thrift Server: given an 11 GB heap, and configured the CMS GC algorithm.
MySQL: tuned the innodb_buffer, tmp_table and max_heap parameters.

So, can somebody please help me understand what the root cause could be, or say whether they have faced a similar issue?

I found one related JIRA: https://issues.apache.org/jira/browse/HCATALOG-541
But that JIRA shows the Hive Thrift Server hitting an OOM error, and I didn't see any OOM error in my case.
Regards,
Manish

Full exception stack:

        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:412)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:399)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:736)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
        at $Proxy7.getDatabase(Unknown Source)
        at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1110)
        at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1099)
        at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2206)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:150)
        at
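Since the reads that time out here are metastore Thrift calls, one knob worth checking under load (a sketch of a mitigation, not a confirmed fix for this report) is the client-side metastore socket timeout in hive-site.xml; in Hive of this era the value is a number of seconds:

```
<!-- Client-side hive-site.xml (sketch): raise the metastore Thrift socket
     read timeout so slow metastore responses under peak load are not cut
     off as SocketTimeoutException: Read timed out. -->
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>600</value>
</property>
```

A larger timeout only masks the symptom if the metastore or MySQL is the actual bottleneck, so it is best paired with metastore-side thread and DB connection pool tuning.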
[jira] [Updated] (HIVE-9317) move Microsoft copyright to NOTICE file
[ https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HIVE-9317:
--------------------------------
    Priority: Blocker  (was: Major)

move Microsoft copyright to NOTICE file
---------------------------------------
                Key: HIVE-9317
                URL: https://issues.apache.org/jira/browse/HIVE-9317
[jira] [Assigned] (HIVE-9317) move Microsoft copyright to NOTICE file
[ https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley reassigned HIVE-9317:
-----------------------------------
    Assignee: Owen O'Malley

move Microsoft copyright to NOTICE file
---------------------------------------
                Key: HIVE-9317
                URL: https://issues.apache.org/jira/browse/HIVE-9317
[jira] [Updated] (HIVE-9317) move Microsoft copyright to NOTICE file
[ https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-9317: Status: Patch Available (was: Open) move Microsoft copyright to NOTICE file --- Key: HIVE-9317 URL: https://issues.apache.org/jira/browse/HIVE-9317 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley Priority: Blocker Fix For: 0.15.0 Attachments: hive-9327.txt There are a set of files that still have the Microsoft copyright notices. Those notices need to be moved into NOTICES and replaced with the standard Apache headers. {code} ./common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java ./common/src/java/org/apache/hadoop/hive/common/type/SignedInt128.java ./common/src/java/org/apache/hadoop/hive/common/type/SqlMathUtil.java ./common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java ./common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java ./common/src/test/org/apache/hadoop/hive/common/type/TestSignedInt128.java ./common/src/test/org/apache/hadoop/hive/common/type/TestSqlMathUtil.java ./common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9428) LocalSparkJobStatus may return failed job as successful [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1429#comment-1429 ] Xuefu Zhang commented on HIVE-9428: --- +1. groupby3_map_skew.q failure doesn't seem related. Filed HIVE-9468 for that. LocalSparkJobStatus may return failed job as successful [Spark Branch] -- Key: HIVE-9428 URL: https://issues.apache.org/jira/browse/HIVE-9428 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-9428.1-spark.patch, HIVE-9428.2-spark.patch, HIVE-9428.3-spark.patch A Future being done doesn't necessarily mean the job succeeded. We should rely on SparkJobInfo to get the job status whenever it's available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9428) LocalSparkJobStatus may return failed job as successful [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9428: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to Spark branch. Thanks, Rui. LocalSparkJobStatus may return failed job as successful [Spark Branch] -- Key: HIVE-9428 URL: https://issues.apache.org/jira/browse/HIVE-9428 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Minor Fix For: spark-branch Attachments: HIVE-9428.1-spark.patch, HIVE-9428.2-spark.patch, HIVE-9428.3-spark.patch A Future being done doesn't necessarily mean the job succeeded. We should rely on SparkJobInfo to get the job status whenever it's available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
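The pitfall described above is easy to reproduce with plain java.util.concurrent; the sketch below is illustrative only and is not LocalSparkJobStatus code. A Future reports isDone() == true even when the task threw, so "done" alone says nothing about success:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureDoneDemo {
    // Returns true iff the future completed (isDone) AND the task failed,
    // showing that a completed Future is not the same as a successful job.
    public static boolean doneButFailed() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<Integer> failingJob = () -> {
            throw new IllegalStateException("job failed");
        };
        Future<Integer> f = pool.submit(failingJob);
        boolean failed = false;
        try {
            f.get(); // only get() (or a job-status API) surfaces the failure
        } catch (ExecutionException e) {
            failed = true;
        }
        pool.shutdown();
        return f.isDone() && failed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(doneButFailed()); // prints: true
    }
}
```

This is why checking SparkJobInfo for the actual job state, rather than Future completion, is the safer signal.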
[jira] [Commented] (HIVE-9468) Test groupby3_map_skew.q fails due to decimal precision difference
[ https://issues.apache.org/jira/browse/HIVE-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292230#comment-14292230 ] Xuefu Zhang commented on HIVE-9468: --- udaf_percentile_approx_23.q is another instance of the problem. Test groupby3_map_skew.q fails due to decimal precision difference -- Key: HIVE-9468 URL: https://issues.apache.org/jira/browse/HIVE-9468 Project: Hive Issue Type: Bug Components: Tests Reporter: Xuefu Zhang From the test run, http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/682/testReport:
{code}
Running: diff -a /home/hiveptest/54.177.132.58-hiveptest-1/apache-svn-spark-source/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/groupby3_map_skew.q.out /home/hiveptest/54.177.132.58-hiveptest-1/apache-svn-spark-source/itests/qtest/../../ql/src/test/results/clientpositive/groupby3_map_skew.q.out
162c162
< 130091.0  260.182  256.10355987055016  98.0  0.0  142.92680950752379  143.06995106518903  20428.07288  20469.0109
---
> 130091.0  260.182  256.10355987055016  98.0  0.0  142.9268095075238  143.06995106518906  20428.07288  20469.0109
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
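Differences like the ones above are characteristic of floating-point aggregation: a map-side skew plan sums partial aggregates in a different order than a single reducer, and double addition is not associative, so the last digits can drift. A standalone Java illustration (not Hive code):

```java
public class FpSumOrder {
    // Sum left-to-right. Floating-point addition is not associative, so a
    // different grouping or order of the same values can round differently.
    public static double sum(double[] xs) {
        double acc = 0.0;
        for (double x : xs) acc += x;
        return acc;
    }

    public static double[] reversed(double[] xs) {
        double[] r = new double[xs.length];
        for (int i = 0; i < xs.length; i++) r[i] = xs[xs.length - 1 - i];
        return r;
    }

    public static void main(String[] args) {
        double[] xs = {1e16, 1.0, 1.0, 1.0, 1.0};
        double fwd = sum(xs);           // each +1.0 is absorbed by 1e16
        double rev = sum(reversed(xs)); // the small values accumulate first
        System.out.println(fwd == rev); // prints: false
        System.out.println(fwd + " vs " + rev);
    }
}
```

The same effect in the test's AVG/STDDEV columns means golden-file comparisons of full decimal expansions are fragile across execution plans.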
[jira] [Updated] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9333: -- Status: Patch Available (was: In Progress) Attaching the patch to run the tests. I also attached a link to Review Board for code review. Move parquet serialize implementation to DataWritableWriter to improve write speeds --- Key: HIVE-9333 URL: https://issues.apache.org/jira/browse/HIVE-9333 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9333.1.patch The serialize process on ParquetHiveSerDe parses a Hive object into a Writable object by looping through all the Hive object's children and creating new Writable objects per child. These final Writable objects are passed to the Parquet writing function and parsed again in the DataWritableWriter class by looping through the ArrayWritable object. These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write()) may be reduced to a single loop inside the DataWritableWriter.write() method in order to increase the write speed of Hive Parquet. To achieve this, we can wrap the Hive object and object inspector in the ParquetHiveSerDe.serialize() method in an object that implements the Writable interface, thus avoiding the loop that serialize() does and leaving the field parsing to the DataWritableWriter.write() method. We can see how ORC does this with the OrcSerde.OrcSerdeRow class. Writable objects are organized differently across storage formats, so I don't think it is necessary to create and keep the Writable objects in the serialize() method, as they won't be used until the writing process starts (DataWritableWriter.write()). We might save 200% of extra time by making such a change. This performance issue was found using the microbenchmark tests from HIVE-8121. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
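The proposed wrapper pattern (cf. OrcSerde.OrcSerdeRow) can be sketched in plain Java. Everything below is illustrative: the class, field, and method names are invented, and a real implementation would implement Hadoop's Writable and walk an ObjectInspector rather than a list of field names.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the single-loop idea: serialize() no longer copies every child
// into new Writable objects; it only wraps the row and its inspector, and
// the writer performs the one and only loop over the fields.
public class LazyRowExample {

    // Stand-in for "row + ObjectInspector" wrapped in a Writable.
    public static final class RowHolder {
        final List<Object> row;
        final List<String> fields; // stand-in for the inspector
        RowHolder(List<Object> row, List<String> fields) {
            this.row = row;
            this.fields = fields;
        }
    }

    // Before: serialize() looped over all children here. After: O(1) wrap.
    public static RowHolder serialize(List<Object> row, List<String> fields) {
        return new RowHolder(row, fields);
    }

    // The per-field work happens exactly once, at write time.
    public static int write(RowHolder holder, StringBuilder out) {
        int written = 0;
        for (int i = 0; i < holder.fields.size(); i++) {
            out.append(holder.fields.get(i)).append('=')
               .append(holder.row.get(i)).append(';');
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        RowHolder h = serialize(Arrays.<Object>asList(1, "a"),
                                Arrays.asList("id", "name"));
        StringBuilder out = new StringBuilder();
        System.out.println(write(h, out) + " fields: " + out);
    }
}
```

The point of the design is that serialize() becomes constant-time and the two passes over each row collapse into one, inside the writer.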
Re: 0.15 release
From a different perspective, the 0.14.1/0.15 proposal allows us to release independently and concurrently. Once they are released, we can have a consented 1.0 release. On the other hand, 1.0/1.1 would force us to wait to release 1.1 after 1.0 is released. This dependency seems artificial and can be avoided. Thanks, Xuefu On Mon, Jan 26, 2015 at 9:17 AM, Brock Noland br...@cloudera.com wrote: Hi Alan, In all of my experience at Apache, I have been encouraged to release. Contributors rightly want to see their hard work get into the hands of the users. That's why they contribute, after all. Many contributors who have features in trunk would like to get those features out into the community. This is completely reasonable of them. After all, they've invested significant time in this work. Thus I don't feel we should delay getting their contributions released while we debate 1.0. The two have nothing to do with each other. I've mentioned on the list and in person to Thejas that I wanted this release to specifically avoid the 1.x discussion so it did not get bogged down in the 1.x discussion. Again, this is completely reasonable. In short, everything I have experienced at Apache indicates that the folks who want to release 0.15 should be free to do the work to make that happen. Brock On Mon, Jan 26, 2015 at 7:02 AM, Alan Gates ga...@hortonworks.com wrote: Brock, Given there isn't consensus on numbering yet, could you hold off making the 0.15 branch? We should come to a conclusion on whether we're doing 0.14.1/0.15 or 1.0/1.1 before assigning any more numbers. Alan. Brock Noland br...@cloudera.com January 20, 2015 at 21:25 Just a reminder that I plan on branching on 1/26/2015 and start rolling release candidates on 2/9/2015. After branching I plan on merging only blockers. Brock Brock Noland br...@cloudera.com January 12, 2015 at 14:37 Hi, Projects are instructed in the incubator that releases gain new users and other attention. 
Additionally, as discussed in this forum I'd like to increase the tempo of our release process[1]. As such, I plan on following this process: 1) Provide two weeks notice of branching 2) Provide two weeks to find issues on the branch and merging only blockers 3) Roll release candidates until a release vote passes As such I plan on branching on 1/26/2015 and start rolling release candidates on 2/9/2015. Cheers, Brock 1. Note I am not complaining as I did not help with releases until this point. CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
[jira] [Updated] (HIVE-9397) SELECT max(bar) FROM foo is broken after ANALYZE ... FOR COLUMNS
[ https://issues.apache.org/jira/browse/HIVE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-9397: -- Assignee: (was: Gopal V) SELECT max(bar) FROM foo is broken after ANALYZE ... FOR COLUMNS Key: HIVE-9397 URL: https://issues.apache.org/jira/browse/HIVE-9397 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.15.0 Reporter: Damien Carol These queries produce an error:
{code:sql}
DROP TABLE IF EXISTS foo;
CREATE TABLE foo (id int) STORED AS ORC;
INSERT INTO TABLE foo VALUES (1);
INSERT INTO TABLE foo VALUES (2);
INSERT INTO TABLE foo VALUES (3);
INSERT INTO TABLE foo VALUES (4);
INSERT INTO TABLE foo VALUES (5);
SELECT max(id) FROM foo;
ANALYZE TABLE foo COMPUTE STATISTICS FOR COLUMNS id;
SELECT max(id) FROM foo;
{code}
The last query throws {{org.apache.hive.service.cli.HiveSQLException}}
{noformat}
0: jdbc:hive2://nc-h04:1/casino> SELECT max(id) FROM foo;
+-+--+
| _c0 |
+-+--+
org.apache.hive.service.cli.HiveSQLException: java.lang.ClassCastException
0: jdbc:hive2://nc-h04:1/casino>
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9467) ORC - sort dictionary streams to the end of the stripe
Owen O'Malley created HIVE-9467: --- Summary: ORC - sort dictionary streams to the end of the stripe Key: HIVE-9467 URL: https://issues.apache.org/jira/browse/HIVE-9467 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley When reading ORC files, it would be convenient to group the dictionary streams at the end of the stripe. This would allow the reader to use fewer read operations if they want to load the dictionaries before they load the data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: 0.15 release
Hi Alan, In all of my experience at Apache, I have been encouraged to release. Contributors rightly want to see their hard work get into the hands of the users. That's why they contribute, after all. Many contributors who have features in trunk would like to get those features out into the community. This is completely reasonable of them. After all, they've invested significant time in this work. Thus I don't feel we should delay getting their contributions released while we debate 1.0. The two have nothing to do with each other. I've mentioned on the list and in person to Thejas that I wanted this release to specifically avoid the 1.x discussion so it did not get bogged down in the 1.x discussion. Again, this is completely reasonable. In short, everything I have experienced at Apache indicates that the folks who want to release 0.15 should be free to do the work to make that happen. Brock On Mon, Jan 26, 2015 at 7:02 AM, Alan Gates ga...@hortonworks.com wrote: Brock, Given there isn't consensus on numbering yet, could you hold off making the 0.15 branch? We should come to a conclusion on whether we're doing 0.14.1/0.15 or 1.0/1.1 before assigning any more numbers. Alan. Brock Noland br...@cloudera.com January 20, 2015 at 21:25 Just a reminder that I plan on branching on 1/26/2015 and start rolling release candidates on 2/9/2015. After branching I plan on merging only blockers. Brock Brock Noland br...@cloudera.com January 12, 2015 at 14:37 Hi, Projects are instructed in the incubator that releases gain new users and other attention. Additionally, as discussed in this forum I'd like to increase the tempo of our release process[1]. As such, I plan on following this process: 1) Provide two weeks notice of branching 2) Provide two weeks to find issues on the branch and merging only blockers 3) Roll release candidates until a release vote passes As such I plan on branching on 1/26/2015 and start rolling release candidates on 2/9/2015. Cheers, Brock 1. 
Note I am not complaining as I did not help with releases until this point.
[jira] [Updated] (HIVE-9361) Intermittent NPE in SessionHiveMetaStoreClient.alterTempTable
[ https://issues.apache.org/jira/browse/HIVE-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-9361: - Resolution: Fixed Fix Version/s: 1.0.0 0.14.1 0.15.0 Status: Resolved (was: Patch Available) Intermittent NPE in SessionHiveMetaStoreClient.alterTempTable - Key: HIVE-9361 URL: https://issues.apache.org/jira/browse/HIVE-9361 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.15.0, 0.14.1, 1.0.0 Attachments: HIVE-9361.patch It's happening at {noformat} MetaStoreUtils.updateUnpartitionedTableStatsFast(newtCopy, wh.getFileStatusesForSD(newtCopy.getSd()), false, true); {noformat} Other methods in this class call getWh() to get the Warehouse, which likely explains why this one is intermittent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: 0.15 release
I'm not asking you to slow the work for the release. If you want to branch, you can branch. What I'm asking is to hold off on the numbering scheme. So you could name the branch 'next' or something and then rename once we come to agreement. Consensus in the community is important, and we should avoid things that make that consensus harder. Alan. Brock Noland mailto:br...@cloudera.com January 26, 2015 at 9:17 Hi Alan, In all of my experience at Apache, I have been encouraged to release. Contributors rightly want to see their hard work get into the hands of the users. That's why they contribute, after all. Many contributors who have features in trunk would like to get those features out into the community. This is completely reasonable of them. After all, they've invested significant time in this work. Thus I don't feel we should delay getting their contributions released while we debate 1.0. The two have nothing to do with each other. I've mentioned on the list and in person to Thejas that I wanted this release to specifically avoid the 1.x discussion so it did not get bogged down in the 1.x discussion. Again, this is completely reasonable. In short, everything I have experienced at Apache indicates that the folks who want to release 0.15 should be free to do the work to make that happen. Brock Alan Gates mailto:ga...@hortonworks.com January 26, 2015 at 7:02 Brock, Given there isn't consensus on numbering yet, could you hold off making the 0.15 branch? We should come to a conclusion on whether we're doing 0.14.1/0.15 or 1.0/1.1 before assigning any more numbers. Alan. Brock Noland mailto:br...@cloudera.com January 20, 2015 at 21:25 Just a reminder that I plan on branching on 1/26/2015 and start rolling release candidates on 2/9/2015. After branching I plan on merging only blockers. Brock Brock Noland mailto:br...@cloudera.com January 12, 2015 at 14:37 Hi, Projects are instructed in the incubator that releases gain new users and other attention. 
Additionally, as discussed in this forum I'd like to increase the tempo of our release process[1]. As such, I plan on following this process: 1) Provide two weeks notice of branching 2) Provide two weeks to find issues on the branch and merging only blockers 3) Roll release candidates until a release vote passes As such I plan on branching on 1/26/2015 and start rolling release candidates on 2/9/2015. Cheers, Brock 1. Note I am not complaining as I did not help with releases until this point.
[jira] [Commented] (HIVE-9448) Merge spark to trunk 1/23/15
[ https://issues.apache.org/jira/browse/HIVE-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292199#comment-14292199 ] Szehon Ho commented on HIVE-9448: - Test failure doesn't look related. Committed to trunk, thanks Xuefu for the review. Merge spark to trunk 1/23/15 Key: HIVE-9448 URL: https://issues.apache.org/jira/browse/HIVE-9448 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 0.15.0 Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-9448.2.patch, HIVE-9448.3.patch, HIVE-9448.patch Merging latest spark changes to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9447) Metastore: inefficient Oracle query for removing unused column descriptors when add/drop table/partition
[ https://issues.apache.org/jira/browse/HIVE-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292449#comment-14292449 ] Selina Zhang commented on HIVE-9447: The unit test failures seem unrelated to this patch. Metastore: inefficient Oracle query for removing unused column descriptors when add/drop table/partition Key: HIVE-9447 URL: https://issues.apache.org/jira/browse/HIVE-9447 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Selina Zhang Assignee: Selina Zhang Attachments: HIVE-9447.1.patch Original Estimate: 3h Remaining Estimate: 3h The Metastore needs to remove unused column descriptors when dropping/adding partitions or tables. To query for an unused column descriptor, the current implementation uses DataNucleus' range function, which is basically LIMIT syntax. However, Oracle does not support LIMIT, so the query is converted to {quote} SQL SELECT * FROM (SELECT subq.*,ROWNUM rn FROM (SELECT 'org.apache.hadoop.hive.metastore.model.MStorageDescriptor' AS NUCLEUS_TYPE,A0.INPUT_FORMAT,A0.IS_COMPRESSED,A0.IS_STOREDASSUBDIRECTORIES,A0.LOCATION, A0.NUM_BUCKETS,A0.OUTPUT_FORMAT,A0.SD_ID FROM drhcat.SDS A0 WHERE A0.CD_ID = ? ) subq ) WHERE rn = 1; {quote} Given that CD_ID is not very selective, this query may have to access a large number of rows (depending on how many partitions the table has; millions of rows in our case), and the Metastore may become unresponsive because of this. Since the Metastore only needs to know whether the specific CD_ID is referenced in the SDS table, it does not need to access the whole row. We can use {quote} select count(1) from SDS where SDS.CD_ID=? {quote} CD_ID is an indexed column, so the above query will do an index range scan, which is faster. For other DBs that support LIMIT syntax, such as MySQL, this problem does not exist; however, the new query does not hurt there either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9470) Use a generic writable object to run ColumnaStorageBench write/read tests
[ https://issues.apache.org/jira/browse/HIVE-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9470: -- Status: Patch Available (was: Open) Use a generic writable object to run ColumnaStorageBench write/read tests -- Key: HIVE-9470 URL: https://issues.apache.org/jira/browse/HIVE-9470 Project: Hive Issue Type: Improvement Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9470.1.patch The ColumnarStorageBench benchmark class is using a Parquet writable object to run all write/read/serialize/deserialize tests. It would be better to use a more generic writable object (like text writables) to get fairer benchmark comparisons between storage formats. Using Parquet writables may give Parquet an advantage when writing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9448) Merge spark to trunk 1/23/15
[ https://issues.apache.org/jira/browse/HIVE-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292487#comment-14292487 ] Lefty Leverenz commented on HIVE-9448: -- Doc note: This adds seven configuration parameters to HiveConf.java and changes the description of another one (see HIVE-9337 for branch commit) so they all need to be documented in the wiki. A new section for Spark should be created in Configuration Properties for these parameters. * hive.spark.client.future.timeout (new description) * hive.spark.job.monitor.timeout * hive.spark.client.connect.timeout * hive.spark.client.server.connect.timeout * hive.spark.client.secret.bits * hive.spark.client.rpc.threads * hive.spark.client.rpc.max.size * hive.spark.client.channel.log.level * [Configuration Properties | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties] Additional documentation is probably needed in the Spark wikidoc for other changes in this patch and in HIVE-9257: * [Hive on Spark: Getting Started | https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started] Merge spark to trunk 1/23/15 Key: HIVE-9448 URL: https://issues.apache.org/jira/browse/HIVE-9448 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 0.15.0 Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC15 Fix For: 0.15.0 Attachments: HIVE-9448.2.patch, HIVE-9448.3.patch, HIVE-9448.patch Merging latest spark changes to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9337) Move more hive.spark.* configurations to HiveConf [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292495#comment-14292495 ] Lefty Leverenz commented on HIVE-9337: -- HIVE-9448 merged these configuration parameters from the Spark branch to trunk. Move more hive.spark.* configurations to HiveConf [Spark Branch] Key: HIVE-9337 URL: https://issues.apache.org/jira/browse/HIVE-9337 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Szehon Ho Assignee: Szehon Ho Priority: Blocker Labels: TODOC-SPARK Fix For: spark-branch Attachments: HIVE-9337-spark.patch, HIVE-9337.2-spark.patch Some hive.spark configurations have been added to HiveConf, but there are some like hive.spark.log.dir that are not there. Also some configurations in RpcConfiguration.java might be eligible to be moved. Without this, these configurations cannot be set dynamically via Hive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9396) date_add()/date_sub() should allow tinyint/smallint/bigint arguments in addition to int
[ https://issues.apache.org/jira/browse/HIVE-9396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-9396: - Fix Version/s: 0.15.0 date_add()/date_sub() should allow tinyint/smallint/bigint arguments in addition to int --- Key: HIVE-9396 URL: https://issues.apache.org/jira/browse/HIVE-9396 Project: Hive Issue Type: Bug Components: UDF Reporter: Jason Dere Assignee: Sergio Peña Fix For: 0.15.0 Attachments: HIVE-9396.3.patch, HIVE-9396.4.patch
{noformat}
hive> select c1, date_add('1985-01-01', c1) from short1;
FAILED: SemanticException [Error 10014]: Line 1:11 Wrong arguments 'c1': DATE_ADD() only takes INT types as second argument, got SHORT
{noformat}
We should allow date_add()/date_sub() to take any integral type for the 2nd argument, rather than just int. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
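The requested behavior, accepting any integral width for the second argument, can be illustrated outside Hive with plain Java. This is not GenericUDFDateAdd's actual code; the helper name is invented, and the point is only that tinyint/smallint/int/bigint (byte/short/int/long) all widen cleanly to a day count:

```java
import java.time.LocalDate;

// Illustrative sketch: any integral Java type is accepted via Number and
// widened to long before the date arithmetic, mirroring the idea that
// DATE_ADD() should take any integral type, not just int.
public class DateAddWidening {
    public static LocalDate dateAdd(String date, Number days) {
        return LocalDate.parse(date).plusDays(days.longValue());
    }

    public static void main(String[] args) {
        System.out.println(dateAdd("1985-01-01", (byte) 1));   // 1985-01-02
        System.out.println(dateAdd("1985-01-01", (short) 31)); // 1985-02-01
        System.out.println(dateAdd("1985-01-01", 365L));       // 1986-01-01
    }
}
```

Since every integral width converts losslessly to a signed 64-bit day count, rejecting SHORT while accepting INT is purely an argument-checking restriction, not a semantic one.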
[jira] [Updated] (HIVE-9448) Merge spark to trunk 1/23/15
[ https://issues.apache.org/jira/browse/HIVE-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-9448: - Labels: TODOC15 (was: ) Merge spark to trunk 1/23/15 Key: HIVE-9448 URL: https://issues.apache.org/jira/browse/HIVE-9448 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 0.15.0 Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC15 Attachments: HIVE-9448.2.patch, HIVE-9448.3.patch, HIVE-9448.patch Merging latest spark changes to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292502#comment-14292502 ] Hive QA commented on HIVE-9333: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12694605/HIVE-9333.1.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7373 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_types org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_parquet org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_parquet org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2522/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2522/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2522/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12694605 - PreCommit-HIVE-TRUNK-Build Move parquet serialize implementation to DataWritableWriter to improve write speeds --- Key: HIVE-9333 URL: https://issues.apache.org/jira/browse/HIVE-9333 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9333.1.patch The serialize process on ParquetHiveSerDe parses a Hive object into a Writable object by looping through all the Hive object's children and creating new Writable objects per child. These final Writable objects are passed to the Parquet writing function and parsed again in the DataWritableWriter class by looping through the ArrayWritable object. These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write()) may be reduced to a single loop inside the DataWritableWriter.write() method in order to increase the write speed of Hive Parquet. To achieve this, we can wrap the Hive object and object inspector in the ParquetHiveSerDe.serialize() method in an object that implements the Writable interface, thus avoiding the loop that serialize() does and leaving the field parsing to the DataWritableWriter.write() method. We can see how ORC does this with the OrcSerde.OrcSerdeRow class. Writable objects are organized differently across storage formats, so I don't think it is necessary to create and keep the Writable objects in the serialize() method, as they won't be used until the writing process starts (DataWritableWriter.write()). We might save 200% of extra time by making such a change. This performance issue was found using the microbenchmark tests from HIVE-8121. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7660) Hive to support qualify analytic filtering
[ https://issues.apache.org/jira/browse/HIVE-7660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292392#comment-14292392 ] Alexander Pivovarov commented on HIVE-7660: --- [~viji_r] Can you put a query example in the description pls? Hive to support qualify analytic filtering -- Key: HIVE-7660 URL: https://issues.apache.org/jira/browse/HIVE-7660 Project: Hive Issue Type: New Feature Reporter: Viji Priority: Trivial Currently, Hive does not support qualify analytic filtering. It would be useful if this feature were added in the future. As a workaround, since it is just a filter, we can replace it with a subquery and filter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9448) Merge spark to trunk 1/23/15
[ https://issues.apache.org/jira/browse/HIVE-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-9448: - Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Merge spark to trunk 1/23/15 Key: HIVE-9448 URL: https://issues.apache.org/jira/browse/HIVE-9448 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 0.15.0 Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC15 Fix For: 0.15.0 Attachments: HIVE-9448.2.patch, HIVE-9448.3.patch, HIVE-9448.patch Merging latest spark changes to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6977) Delete Hiveserver1
[ https://issues.apache.org/jira/browse/HIVE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-6977: Fix Version/s: 1.0.0 Delete Hiveserver1 -- Key: HIVE-6977 URL: https://issues.apache.org/jira/browse/HIVE-6977 Project: Hive Issue Type: Task Components: JDBC, Server Infrastructure Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Labels: TODOC15 Fix For: 0.15.0, 1.0.0 Attachments: HIVE-6977.1.patch, HIVE-6977.patch See mailing list discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9470) Use a generic writable object to run ColumnaStorageBench write/read tests
[ https://issues.apache.org/jira/browse/HIVE-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292507#comment-14292507 ] Hive QA commented on HIVE-9470: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12694608/HIVE-9470.1.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2523/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2523/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2523/ Messages: {noformat} This message was trimmed, see log for full details main: [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-shims-scheduler --- [INFO] Compiling 1 source file to /data/hive-ptest/working/apache-svn-trunk-source/shims/scheduler/target/classes [INFO] [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ hive-shims-scheduler --- [INFO] Using 'UTF-8' encoding to copy filtered resources. 
[INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/shims/scheduler/src/test/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-shims-scheduler --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/scheduler/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/scheduler/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/scheduler/target/tmp/conf [copy] Copying 10 files to /data/hive-ptest/working/apache-svn-trunk-source/shims/scheduler/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-shims-scheduler --- [INFO] No sources to compile [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-shims-scheduler --- [INFO] Tests are skipped. [INFO] [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-shims-scheduler --- [INFO] Building jar: /data/hive-ptest/working/apache-svn-trunk-source/shims/scheduler/target/hive-shims-scheduler-0.15.0-SNAPSHOT.jar [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-shims-scheduler --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-shims-scheduler --- [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/shims/scheduler/target/hive-shims-scheduler-0.15.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/shims/hive-shims-scheduler/0.15.0-SNAPSHOT/hive-shims-scheduler-0.15.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/shims/scheduler/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/shims/hive-shims-scheduler/0.15.0-SNAPSHOT/hive-shims-scheduler-0.15.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive Shims 0.15.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ 
hive-shims --- [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-shims --- [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-shims --- [INFO] [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-shims --- [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator/src/main/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-shims --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-shims --- [INFO] No sources to compile [INFO] [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ hive-shims --- [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator/src/test/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-shims --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator/target/tmp [mkdir] Created dir:
[jira] [Updated] (HIVE-9474) truncate table changes permissions on the target
[ https://issues.apache.org/jira/browse/HIVE-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-9474: --- Description: Create a table test(a string); Change the /user/hive/warehouse/test.db/test permission to something other than the default, like 777. Then truncate table test; The permission goes back to the default. was: I created a test table in beeline: create table test(a string); Permissions: # file: /user/hive/warehouse/test.db/test # owner: aryan # group: hive user::rwx group::rwx other::--x Now, in beeline: truncate table test; Permissions are now: # file: /user/hive/warehouse/test.db/test # owner: aryan # group: hive user::rwx group::r-x other::r-x Group write permissions have disappeared! truncate table changes permissions on the target Key: HIVE-9474 URL: https://issues.apache.org/jira/browse/HIVE-9474 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Aihua Xu Assignee: Aihua Xu Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9474.1.patch Original Estimate: 4h Remaining Estimate: 4h Create a table test(a string); Change the /user/hive/warehouse/test.db/test permission to something other than the default, like 777. Then truncate table test; The permission goes back to the default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
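A fix for a bug like HIVE-9474 generally has to capture the directory's permissions before the truncate and restore them afterwards. Below is a minimal sketch of that capture/restore pattern using java.nio on a local filesystem; the class name `PermissionPreservingTruncate` and the flat-directory assumption are illustrative only, and this is not the actual HIVE-9474 patch (which operates on HDFS paths through Hadoop's FileSystem API):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of a capture/restore pattern that keeps a directory's permissions
// across a delete-and-recreate "truncate". Local-filesystem illustration of
// what the HDFS-side fix needs to do; this is NOT Hive's code.
public class PermissionPreservingTruncate {

    static void truncateDir(Path dir) throws IOException {
        // 1. Capture the current permissions before touching anything.
        Set<PosixFilePermission> saved = Files.getPosixFilePermissions(dir);

        // 2. The naive truncate: delete contents and directory, then recreate.
        //    The recreated directory gets default permissions -- the bug.
        List<Path> children = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            for (Path p : ds) children.add(p);
        }
        for (Path p : children) Files.delete(p); // flat directory for brevity
        Files.delete(dir);
        Files.createDirectory(dir);

        // 3. Restore the captured permissions on the recreated directory.
        Files.setPosixFilePermissions(dir, saved);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("truncate-demo");
        Files.createFile(dir.resolve("data.txt"));
        // Use a non-default mode, like the 777 from the bug report.
        Set<PosixFilePermission> mode777 = PosixFilePermissions.fromString("rwxrwxrwx");
        Files.setPosixFilePermissions(dir, mode777);

        truncateDir(dir);

        // The directory is empty again but kept its rwxrwxrwx mode.
        System.out.println("permissions preserved: "
                + PosixFilePermissions.toString(Files.getPosixFilePermissions(dir)));
    }
}
```

Without step 3 the recreated directory would come back with the process default mode, which is exactly the symptom reported in the issue.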
[jira] [Updated] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9333: -- Description: The serialize process on ParquetHiveSerDe parses a Hive object into a Writable object by looping through all the Hive object's children and creating new Writable objects per child. These final Writable objects are passed in to the Parquet writing function, and parsed again in the DataWritableWriter class by looping through the ArrayWritable object. These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write()) may be reduced to just one loop in the DataWritableWriter.write() method in order to increase the write speed for Hive Parquet. In order to achieve this, we can wrap the Hive object and object inspector in the ParquetHiveSerDe.serialize() method in an object that implements the Writable interface, and thus avoid the loop that serialize() does, leaving the loop parsing to the DataWritableWriter.write() method. We can see how ORC does this with the OrcSerde.OrcSerdeRow class. Writable objects are organized differently across storage formats, so I don't think it is necessary to create and keep the Writable objects in the serialize() method, as they won't be used until the writing process starts (DataWritableWriter.write()). This performance issue was found using microbenchmark tests from HIVE-8121. was: The serialize process on ParquetHiveSerDe parses a Hive object into a Writable object by looping through all the Hive object's children and creating new Writable objects per child. These final Writable objects are passed in to the Parquet writing function, and parsed again in the DataWritableWriter class by looping through the ArrayWritable object. These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write()) may be reduced to just one loop in the DataWritableWriter.write() method in order to increase the write speed for Hive Parquet.
In order to achieve this, we can wrap the Hive object and object inspector in the ParquetHiveSerDe.serialize() method in an object that implements the Writable interface, and thus avoid the loop that serialize() does, leaving the loop parsing to the DataWritableWriter.write() method. We can see how ORC does this with the OrcSerde.OrcSerdeRow class. Writable objects are organized differently across storage formats, so I don't think it is necessary to create and keep the Writable objects in the serialize() method, as they won't be used until the writing process starts (DataWritableWriter.write()). We might save 200% of extra time by making this change. This performance issue was found using microbenchmark tests from HIVE-8121. Move parquet serialize implementation to DataWritableWriter to improve write speeds --- Key: HIVE-9333 URL: https://issues.apache.org/jira/browse/HIVE-9333 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9333.1.patch The serialize process on ParquetHiveSerDe parses a Hive object into a Writable object by looping through all the Hive object's children and creating new Writable objects per child. These final Writable objects are passed in to the Parquet writing function, and parsed again in the DataWritableWriter class by looping through the ArrayWritable object. These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write()) may be reduced to just one loop in the DataWritableWriter.write() method in order to increase the write speed for Hive Parquet. In order to achieve this, we can wrap the Hive object and object inspector in the ParquetHiveSerDe.serialize() method in an object that implements the Writable interface, and thus avoid the loop that serialize() does, leaving the loop parsing to the DataWritableWriter.write() method. We can see how ORC does this with the OrcSerde.OrcSerdeRow class.
Writable objects are organized differently across storage formats, so I don't think it is necessary to create and keep the Writable objects in the serialize() method, as they won't be used until the writing process starts (DataWritableWriter.write()). This performance issue was found using microbenchmark tests from HIVE-8121. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
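The wrapper idea described above can be sketched as follows. This is an illustrative, self-contained approximation, not the HIVE-9333 patch itself: `Writable` and `ObjectInspector` here are minimal stand-ins for the Hadoop/Hive interfaces so the sketch compiles on its own, and `ParquetWritableSketch` is a hypothetical name (the Review Board diff later in this thread adds a real `serde2/io/ParquetWritable.java`):

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Minimal stand-ins for Hadoop's Writable and Hive's ObjectInspector so the
// sketch is self-contained; in Hive these come from hadoop-common and hive-serde.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}
interface ObjectInspector { }

// Sketch of the idea: instead of eagerly converting the row into a tree of
// Writable objects in ParquetHiveSerDe.serialize(), wrap the raw row and its
// inspector. The Parquet-side writer (DataWritableWriter.write()) later walks
// the object exactly once, mirroring what OrcSerde.OrcSerdeRow does for ORC.
public class ParquetWritableSketch implements Writable {
    private final Object row;
    private final ObjectInspector inspector;

    public ParquetWritableSketch(Object row, ObjectInspector inspector) {
        this.row = row;
        this.inspector = inspector;
    }

    public Object getRow() { return row; }
    public ObjectInspector getInspector() { return inspector; }

    // Like OrcSerdeRow, the wrapper is never serialized through these methods;
    // the file-format writer consumes the wrapped object directly.
    @Override public void write(DataOutput out) {
        throw new UnsupportedOperationException("consumed by the format writer");
    }
    @Override public void readFields(DataInput in) {
        throw new UnsupportedOperationException("write-side only");
    }

    public static void main(String[] args) {
        Object row = java.util.Arrays.asList("a", 1);
        ParquetWritableSketch w = new ParquetWritableSketch(row, new ObjectInspector() {});
        // serialize() became cheap: no per-child Writable allocation happened,
        // and the writer still sees the original row object.
        System.out.println("same row object: " + (w.getRow() == row));
    }
}
```

The design point is that serialize() does no per-child work at all; the single traversal moves into the writer, which already has to walk the row to emit Parquet pages.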
[jira] [Updated] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9333: -- Status: Open (was: Patch Available) Move parquet serialize implementation to DataWritableWriter to improve write speeds --- Key: HIVE-9333 URL: https://issues.apache.org/jira/browse/HIVE-9333 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9333.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9333: -- Status: Patch Available (was: Open) Move parquet serialize implementation to DataWritableWriter to improve write speeds --- Key: HIVE-9333 URL: https://issues.apache.org/jira/browse/HIVE-9333 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9333.1.patch, HIVE-9333.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9333: -- Attachment: (was: HIVE-9333.1.patch) Move parquet serialize implementation to DataWritableWriter to improve write speeds --- Key: HIVE-9333 URL: https://issues.apache.org/jira/browse/HIVE-9333 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9333.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9333: -- Attachment: HIVE-9333.2.patch Move parquet serialize implementation to DataWritableWriter to improve write speeds --- Key: HIVE-9333 URL: https://issues.apache.org/jira/browse/HIVE-9333 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9333.1.patch, HIVE-9333.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30281: Move parquet serialize implementation to DataWritableWriter to improve write speeds
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30281/ --- (Updated Jan. 27, 2015, 1:39 a.m.) Review request for hive, Ryan Blue, cheng xu, and Dong Chen. Changes --- I forgot to add the BYTE/DECIMAL implementations. This patch contains them. Bugs: HIVE-9333 https://issues.apache.org/jira/browse/HIVE-9333 Repository: hive-git Description --- This patch moves the ParquetHiveSerDe.serialize() implementation to the DataWritableWriter class in order to save time in materializing data on serialize(). Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java ea4109d358f7c48d1e2042e5da299475de4a0a29 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 9caa4ed169ba92dbd863e4a2dc6d06ab226a4465 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java 060b1b722d32f3b2f88304a1a73eb249e150294b ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 41b5f1c3b0ab43f734f8a211e3e03d5060c75434 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java e52c4bc0b869b3e60cb4bfa9e11a09a0d605ac28 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java a693aff18516d133abf0aae4847d3fe00b9f1c96 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetOutputFormat.java 667d3671547190d363107019cd9a2d105d26d336 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 007a665529857bcec612f638a157aa5043562a15 serde/src/java/org/apache/hadoop/hive/serde2/io/ParquetWritable.java PRE-CREATION Diff: https://reviews.apache.org/r/30281/diff/ Testing --- The tests run were the following: 1. JMH (Java microbenchmark) This benchmark called parquet serialize/write methods using text writable objects.
Class.method                 Before Change (ops/s)   After Change (ops/s)
ParquetHiveSerDe.serialize   19,113                  249,528   (~13x speed increase)
DataWritableWriter.write     5,033                   5,201     (3.34% speed increase)

2. Write 20 million rows (~1GB file) from Text to Parquet
I wrote a ~1GB file in TextFile format, then converted it to Parquet format using the following statement:
CREATE TABLE parquet STORED AS parquet AS SELECT * FROM text;
Time (s) it took to write the whole file BEFORE changes: 93.758 s
Time (s) it took to write the whole file AFTER changes: 83.903 s
That is about a 10% speed increase.

Thanks,
Sergio Pena
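As a sanity check on the arithmetic, the speedup figures can be recomputed from the raw numbers reported in the benchmark. The values below are copied verbatim from the results above; note that 249,528 / 19,113 works out to roughly 13x:

```java
// Recompute the speedups quoted above from the raw benchmark numbers.
public class SpeedupCheck {
    static double ratio(double before, double after) {
        return after / before;
    }
    static double percentGain(double before, double after) {
        return (after - before) / before * 100.0;
    }
    public static void main(String[] args) {
        // ops/s from the JMH table
        System.out.printf("serialize: %.1fx faster%n", ratio(19113, 249528));   // ~13.1x
        System.out.printf("write: %.2f%% faster%n", percentGain(5033, 5201));   // ~3.34%
        // wall-clock seconds for the 20M-row CTAS
        System.out.printf("CTAS: %.1f%% less time%n",
                (93.758 - 83.903) / 93.758 * 100.0);                            // ~10.5%
    }
}
```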
Re: Review Request 30264: HIVE-9211 enable unit test for mini Spark on YARN cluster[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30264/#review69728 --- Looks good, glad we can mostly re-use miniYarnCluster. Some minor comments below. I also agree with Xuefu that we don't need more golden files by having some output directory. I'm also ok with not running the entire set of Spark tests with every run, and just running the list we got from minimr.query.files, it is up to you guys. The only note is, once we check this in, the committer will also make some edits to the build machine files to run these in proper batches, otherwise they will run the entire TestMiniSparkOnYarnCliDriver suite in one batch. shims/0.23/src/main/java/org/apache/hadoop/hive/shims/MiniSparkOnYARNCluster.java https://reviews.apache.org/r/30264/#comment114485 Needs the Apache license. shims/0.23/src/main/java/org/apache/hadoop/hive/shims/MiniSparkOnYARNCluster.java https://reviews.apache.org/r/30264/#comment114486 Needs a comment. - Szehon Ho On Jan. 26, 2015, 6:37 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30264/ --- (Updated Jan. 26, 2015, 6:37 a.m.) Review request for hive, Szehon Ho and Xuefu Zhang. Bugs: HIVE-9211 https://issues.apache.org/jira/browse/HIVE-9211 Repository: hive-git Description --- MiniSparkOnYarnCluster is enabled for unit tests, Spark is deployed on miniYarnCluster in yarn-client mode, and all qfiles in minimr.query.files are enabled in this unit test except 3 qfiles: bucket_num_reducers.q, bucket_num_reducers2.q, udf_using.q, which are not supported in HoS.
Diffs - data/conf/spark/hive-site.xml 016f568 data/conf/spark/standalone/hive-site.xml PRE-CREATION data/conf/spark/yarn-client/hive-site.xml PRE-CREATION itests/pom.xml e1e88f6 itests/qtest-spark/pom.xml d12fad5 itests/src/test/resources/testconfiguration.properties f583aaf itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 095b9bd ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 41a2ab7 ql/src/test/results/clientpositive/miniSparkOnYarn/auto_sortmerge_join_16.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucket4.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucket5.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucket6.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucketizedhiveinputformat.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucketmapjoin6.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucketmapjoin7.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/constprog_partitioner.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/disable_merge_for_bucketing.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/empty_dir_in_table.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/external_table_with_space_in_location_path.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/file_with_header_footer.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/groupby1.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/groupby2.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/import_exported_table.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/index_bitmap3.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/index_bitmap_auto.q.out PRE-CREATION 
ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_bucketed_table.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_dyn_part.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_map_operators.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_merge.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_num_buckets.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_reducers_power_two.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/input16_cc.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/join1.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/leftsemijoin_mr.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/list_bucket_dml_10.q.java1.7.out
[jira] [Updated] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9211: Attachment: HIVE-9211.2-spark.patch Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in production environments; we'd better enable unit tests for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30264: HIVE-9211 enable unit test for mini Spark on YARN cluster[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30264/#review69734 --- Ship it! Looks good to me, leave it to Xuefu to review the other comments. - Szehon Ho On Jan. 27, 2015, 2:03 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30264/ --- (Updated Jan. 27, 2015, 2:03 a.m.) Review request for hive, Szehon Ho and Xuefu Zhang. Bugs: HIVE-9211 https://issues.apache.org/jira/browse/HIVE-9211 Repository: hive-git Description --- MiniSparkOnYarnCluster is enabled for unit test, Spark is deployed on miniYarnCluster on yarn-client mode, all qfiles in minimr.query.files are enabled in this unit test except 3 qfile: bucket_num_reducers.q, bucket_num_reducers2.q, udf_using.q, which is not supported in HoS. Diffs - data/conf/spark/hive-site.xml 016f568 data/conf/spark/standalone/hive-site.xml PRE-CREATION data/conf/spark/yarn-client/hive-site.xml PRE-CREATION itests/pom.xml e1e88f6 itests/qtest-spark/pom.xml d12fad5 itests/src/test/resources/testconfiguration.properties f583aaf itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 095b9bd ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 41a2ab7 ql/src/test/results/clientpositive/spark/bucket5.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket6.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucketizedhiveinputformat.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/constprog_partitioner.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/empty_dir_in_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/external_table_with_space_in_location_path.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/file_with_header_footer.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/import_exported_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/index_bitmap3.q.out PRE-CREATION 
ql/src/test/results/clientpositive/spark/index_bitmap_auto.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_bucketed_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_dyn_part.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_map_operators.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_merge.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_num_buckets.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_reducers_power_two.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/input16_cc.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/list_bucket_dml_10.q.java1.7.out PRE-CREATION ql/src/test/results/clientpositive/spark/load_fs2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/load_hdfs_file_with_space_in_the_name.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/parallel_orderby.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ql_rewrite_gbtoidx.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ql_rewrite_gbtoidx_cbo_1.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ql_rewrite_gbtoidx_cbo_2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/quotedid_smb.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/reduce_deduplicate.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/remote_script.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/root_dir_external_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/schemeAuthority.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/schemeAuthority2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/temp_table_external.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/truncate_column_buckets.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/uber_reduce.q.out PRE-CREATION 
shims/0.20S/src/main/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java b17f465 shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java a61c3ac shims/0.23/src/main/java/org/apache/hadoop/hive/shims/MiniSparkOnYARNCluster.java PRE-CREATION shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 064304c spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java aea90db Diff: https://reviews.apache.org/r/30264/diff/ Testing
[jira] [Commented] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292875#comment-14292875 ] Szehon Ho commented on HIVE-9211: - +1, looks good to me. Xuefu was looking at this too, so I will leave it to him to review the rest. Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in production environments; we'd better enable unit tests for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Give query results
Hi *. I need some help finding out where Hive stores the results that are available through the fetch-results Thrift API. Do you know if there is a file written in the cluster for any given query (metadata, a select query, a query that triggers MR)? Are they in memory or on HDFS? Thanks, Joel
[jira] [Updated] (HIVE-9475) HiveMetastoreClient.tableExists does not work
[ https://issues.apache.org/jira/browse/HIVE-9475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9475: --- Fix Version/s: 0.15.0 Assignee: Brock Noland Status: Patch Available (was: Open) HiveMetastoreClient.tableExists does not work - Key: HIVE-9475 URL: https://issues.apache.org/jira/browse/HIVE-9475 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Priority: Blocker Fix For: 0.15.0 Attachments: HIVE-9475.1.patch We check the return value against null, returning true if the return value is null. This is reversed: we should return true if the value is not null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
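The inverted null check described in the issue is easy to see in a minimal sketch. The `getTable`/map lookup below is a stand-in for the real metastore call, not the actual HiveMetaStoreClient code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the HIVE-9475 bug: tableExists() returned true exactly when the
// lookup came back null. The metastore lookup is faked with a map here.
public class TableExistsSketch {
    private final Map<String, Object> tables = new HashMap<>();

    TableExistsSketch() { tables.put("default.src", new Object()); }

    Object getTable(String name) { return tables.get(name); } // null if absent

    // Buggy version before the fix: inverted condition.
    boolean tableExistsBuggy(String name) { return getTable(name) == null; }

    // Fixed version: a table exists iff the lookup returned something.
    boolean tableExistsFixed(String name) { return getTable(name) != null; }

    public static void main(String[] args) {
        TableExistsSketch c = new TableExistsSketch();
        System.out.println("buggy(default.src) = " + c.tableExistsBuggy("default.src"));  // false (wrong!)
        System.out.println("fixed(default.src) = " + c.tableExistsFixed("default.src")); // true
        System.out.println("fixed(no.such) = " + c.tableExistsFixed("no.such"));         // false
    }
}
```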
[jira] [Commented] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293036#comment-14293036 ] Xuefu Zhang commented on HIVE-9211: --- [~brocknoland], do you have any idea how Chengxiang may access the container logs? Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in production environments; we'd better enable unit tests for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9476) Beeline fails to start on trunk
[ https://issues.apache.org/jira/browse/HIVE-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-9476: --- Fix Version/s: 0.15.0 Beeline fails to start on trunk --- Key: HIVE-9476 URL: https://issues.apache.org/jira/browse/HIVE-9476 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.15.0 Reporter: Vaibhav Gumashta Priority: Blocker Fix For: 0.15.0 {code} vgumashta:hive vgumashta$ beeline --verbose=true [ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at jline.TerminalFactory.create(TerminalFactory.java:101) at jline.TerminalFactory.get(TerminalFactory.java:158) at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:73) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Exception in thread main java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:101) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} Working fine on 14.1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9476) Beeline fails to start on trunk
Vaibhav Gumashta created HIVE-9476: -- Summary: Beeline fails to start on trunk Key: HIVE-9476 URL: https://issues.apache.org/jira/browse/HIVE-9476 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.15.0 Reporter: Vaibhav Gumashta Priority: Blocker {code} vgumashta:hive vgumashta$ beeline --verbose=true [ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at jline.TerminalFactory.create(TerminalFactory.java:101) at jline.TerminalFactory.get(TerminalFactory.java:158) at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:73) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Exception in thread main java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:101) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} Working fine on 14.1. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9211: Attachment: HIVE-9211.2-spark.patch Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in product environment, we'd better enable unit test for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9211: Attachment: (was: HIVE-9211.2-spark.patch) Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in product environment, we'd better enable unit test for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9138) Add some explain to PTF operator
[ https://issues.apache.org/jira/browse/HIVE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-9138: Attachment: HIVE-9138.2.patch.txt Add some explain to PTF operator Key: HIVE-9138 URL: https://issues.apache.org/jira/browse/HIVE-9138 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-9138.1.patch.txt, HIVE-9138.2.patch.txt PTFOperator does not explain anything in explain statement, making it hard to understand the internal works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9474) truncate table changes permissions on the target
[ https://issues.apache.org/jira/browse/HIVE-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292876#comment-14292876 ] Hive QA commented on HIVE-9474: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12694667/HIVE-9474.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7400 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2526/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2526/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2526/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12694667 - PreCommit-HIVE-TRUNK-Build truncate table changes permissions on the target Key: HIVE-9474 URL: https://issues.apache.org/jira/browse/HIVE-9474 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Aihua Xu Assignee: Aihua Xu Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9474.1.patch Original Estimate: 4h Remaining Estimate: 4h Create a table test(a string); Change the /user/hive/warehouse/test.db/test permission to something else other than the default, like 777. Then truncate table test; The permission goes back to the default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9462) HIVE-8577 - breaks type evolution
[ https://issues.apache.org/jira/browse/HIVE-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292922#comment-14292922 ] Brock Noland commented on HIVE-9462: That test failed because the binary file is not in the patch. HIVE-8577 - breaks type evolution - Key: HIVE-9462 URL: https://issues.apache.org/jira/browse/HIVE-9462 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.15.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9462.1.patch, HIVE-9462.2.patch, type_evolution.avro If you write an Avro field out as {{int}} and then change its type to {{long}} you will get an {{UnresolvedUnionException}} due to code in HIVE-8577. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293057#comment-14293057 ] Chengxiang Li commented on HIVE-9211: - I work on Linux. Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in product environment, we'd better enable unit test for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-9436: --- Attachment: HIVE-9436.3.patch RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-9436.2.patch, HIVE-9436.3.patch, HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) && e.getCause().getMessage().matches("JDO[a-zA-Z]*Exception")) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that Java's String.matches matches the entire string against the regex, and thus the match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple: we should match (?s).\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
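[Editorial note] The String.matches pitfall described in HIVE-9436 is easy to demonstrate in isolation. The snippet below is a standalone sketch with a made-up exception message, not the Hive code itself:

```java
// Demonstrates that String.matches anchors the pattern to the ENTIRE string,
// which is the root cause of HIVE-9436.
public class MatchesDemo {
    // Buggy check: fails whenever the message has any prefix or suffix
    // around the JDO...Exception token.
    static boolean buggyMatch(String msg) {
        return msg.matches("JDO[a-zA-Z]*Exception");
    }

    // Fixed check: (?s).* on both sides lets the pattern match anywhere in
    // the message; (?s) (DOTALL) makes '.' also match line terminators.
    static boolean fixedMatch(String msg) {
        return msg.matches("(?s).*JDO[a-zA-Z]*Exception.*");
    }

    public static void main(String[] args) {
        String msg = "MetaException: JDODataStoreException: lock wait timeout";
        System.out.println(buggyMatch(msg)); // false -- surrounding text defeats matches()
        System.out.println(fixedMatch(msg)); // true
    }
}
```

An alternative to padding the pattern with `.*` would be `Pattern.compile(...).matcher(msg).find()`, which searches for the pattern anywhere in the input by design.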
[jira] [Updated] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-9436: --- Status: Patch Available (was: Open) RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-9436.2.patch, HIVE-9436.3.patch, HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) && e.getCause().getMessage().matches("JDO[a-zA-Z]*Exception")) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that Java's String.matches matches the entire string against the regex, and thus the match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple: we should match (?s).\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9425) External Function Jar files are not available for Driver when running with yarn-cluster mode [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li reassigned HIVE-9425: --- Assignee: Chengxiang Li External Function Jar files are not available for Driver when running with yarn-cluster mode [Spark Branch] --- Key: HIVE-9425 URL: https://issues.apache.org/jira/browse/HIVE-9425 Project: Hive Issue Type: Sub-task Components: spark-branch Reporter: Xiaomin Zhang Assignee: Chengxiang Li 15/01/20 00:27:31 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: hive-exec-0.15.0-SNAPSHOT.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: opennlp-maxent-3.0.3.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: bigbenchqueriesmr.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: opennlp-tools-1.5.3.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: jcl-over-slf4j-1.7.5.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 INFO client.RemoteDriver: Received job request fef081b0-5408-4804-9531-d131fdd628e6 15/01/20 00:27:31 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 15/01/20 00:27:31 INFO Configuration.deprecation: mapred.min.split.size is deprecated. 
Instead, use mapreduce.input.fileinputformat.split.minsize 15/01/20 00:27:31 INFO client.RemoteDriver: Failed to run job fef081b0-5408-4804-9531-d131fdd628e6 org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: de.bankmark.bigbench.queries.q10.SentimentUDF Serialization trace: genericUDTF (org.apache.hadoop.hive.ql.plan.UDTFDesc) conf (org.apache.hadoop.hive.ql.exec.UDTFOperator) childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork) at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138) at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115) It seems the additional Jar files are not uploaded to DistributedCache, so that the Driver cannot access it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9470) Use a generic writable object to run ColumnarStorageBench write/read tests
[ https://issues.apache.org/jira/browse/HIVE-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292872#comment-14292872 ] Dong Chen commented on HIVE-9470: - LGTM. +1 pending test. Use a generic writable object to run ColumnarStorageBench write/read tests -- Key: HIVE-9470 URL: https://issues.apache.org/jira/browse/HIVE-9470 Project: Hive Issue Type: Improvement Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9470.1.patch The ColumnarStorageBench benchmark class is using a Parquet writable object to run all write/read/serialize/deserialize tests. It would be better to use a more generic writable object (like text writables) to get fairer benchmark comparisons between storage formats. Using Parquet writables may give an advantage when writing Parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: 0.15 release
Hi Alan, Thank you very much for the clarification. Naming the branch 'next' for now is fine with me. I've done so. Cheers, Brock On Mon, Jan 26, 2015 at 10:27 AM, Alan Gates ga...@hortonworks.com wrote: I'm not asking you to slow the work for the release. If you want to branch, you can branch. What I'm asking is to hold on the numbering scheme. So you could name the branch 'next' or something and then rename once we come to agreement. Consensus in the community is important, and we should avoid things that make that consensus harder. Alan. Brock Noland br...@cloudera.com January 26, 2015 at 9:17 Hi Alan, In all of my experience at Apache, I have been encouraged to release. Contributors rightly want to see their hard work get into the hands of the users. That's why they contribute, after all. Many contributors who have features in trunk would like to get those features out into the community. This is completely reasonable of them. After all, they've invested significant time in this work. Thus I don't feel we should delay getting their contributions released while we debate 1.0. The two have nothing to do with each other. I've mentioned on the list and in person to Thejas that I wanted this release specifically to avoid the 1.x discussion so it did not get bogged down in that debate. Again, this is completely reasonable. In short, everything I have experienced at Apache indicates that the folks who want to release 0.15 should be free to do the work to make that happen. Brock Alan Gates ga...@hortonworks.com January 26, 2015 at 7:02 Brock, Given there isn't consensus on numbering yet, could you hold off on making the 0.15 branch? We should come to a conclusion on whether we're doing 0.14.1/0.15 or 1.0/1.1 before assigning any more numbers. Alan. Brock Noland br...@cloudera.com January 20, 2015 at 21:25 Just a reminder that I plan on branching on 1/26/2015 and starting to roll release candidates on 2/9/2015. After branching I plan on merging only blockers.
Brock Brock Noland br...@cloudera.com January 12, 2015 at 14:37 Hi, Projects are instructed in the incubator that releases gain new users and other attention. Additionally, as discussed in this forum, I'd like to increase the tempo of our release process[1]. As such, I plan on following this process: 1) Provide two weeks' notice of branching 2) Provide two weeks to find issues on the branch and merge only blockers 3) Roll release candidates until a release vote passes As such I plan on branching on 1/26/2015 and starting to roll release candidates on 2/9/2015. Cheers, Brock 1. Note I am not complaining as I did not help with releases until this point. CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
[jira] [Commented] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292991#comment-14292991 ] Hive QA commented on HIVE-9436: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12694681/HIVE-9436.3.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7397 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2528/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2528/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2528/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12694681 - PreCommit-HIVE-TRUNK-Build RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-9436.2.patch, HIVE-9436.3.patch, HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) && e.getCause().getMessage().matches("JDO[a-zA-Z]*Exception")) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that Java's String.matches matches the entire string against the regex, and thus the match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple: we should match (?s).\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9211: --- Attachment: HIVE-9211.2-spark.patch I modified the test framework locally so we should have a source directory for the failed test. Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in product environment, we'd better enable unit test for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9462) HIVE-8577 - breaks type evolution
[ https://issues.apache.org/jira/browse/HIVE-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292945#comment-14292945 ] Xuefu Zhang commented on HIVE-9462: --- +1 HIVE-8577 - breaks type evolution - Key: HIVE-9462 URL: https://issues.apache.org/jira/browse/HIVE-9462 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.15.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9462.1.patch, HIVE-9462.2.patch, HIVE-9462.3.patch, type_evolution.avro If you write an Avro field out as {{int}} and then change its type to {{long}} you will get an {{UnresolvedUnionException}} due to code in HIVE-8577. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6308) COLUMNS_V2 Metastore table not populated for tables created without an explicit column list.
[ https://issues.apache.org/jira/browse/HIVE-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293062#comment-14293062 ] Szehon Ho commented on HIVE-6308: - +1, thanks for adding unit test COLUMNS_V2 Metastore table not populated for tables created without an explicit column list. Key: HIVE-6308 URL: https://issues.apache.org/jira/browse/HIVE-6308 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.10.0 Reporter: Alexander Behm Assignee: Yongzhi Chen Attachments: HIVE-6308.1.patch Consider this example table: CREATE TABLE avro_test ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc'); When I try to run an ANALYZE TABLE for computing column stats on any of the columns, then I get: org.apache.hadoop.hive.ql.metadata.HiveException: NoSuchObjectException(message:Column o_orderpriority for which stats gathering is requested doesn't exist.) 
at org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2280) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:331) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:343) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:208) The root cause appears to be that the COLUMNS_V2 table in the Metastore isn't populated properly during the table creation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-9436: --- Status: Open (was: Patch Available) Cancelling patch to resubmit an identical .3.patch so that the precommit tests don't skip it after the test issues we've had in the past couple of days. RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-9436.2.patch, HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) && e.getCause().getMessage().matches("JDO[a-zA-Z]*Exception")) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that Java's String.matches matches the entire string against the regex, and thus the match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple: we should match (?s).\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30264: HIVE-9221 enable unit test for mini Spark on YARN cluster[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30264/ --- (Updated Jan. 27, 2015, 2:03 a.m.) Review request for hive, Szehon Ho and Xuefu Zhang. Changes --- fixed commented issues. Bugs: HIVE-9211 https://issues.apache.org/jira/browse/HIVE-9211 Repository: hive-git Description --- MiniSparkOnYarnCluster is enabled for unit test, Spark is deployed on miniYarnCluster on yarn-client mode, all qfiles in minimr.query.files are enabled in this unit test except 3 qfile: bucket_num_reducers.q, bucket_num_reducers2.q, udf_using.q, which is not supported in HoS. Diffs (updated) - data/conf/spark/hive-site.xml 016f568 data/conf/spark/standalone/hive-site.xml PRE-CREATION data/conf/spark/yarn-client/hive-site.xml PRE-CREATION itests/pom.xml e1e88f6 itests/qtest-spark/pom.xml d12fad5 itests/src/test/resources/testconfiguration.properties f583aaf itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 095b9bd ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 41a2ab7 ql/src/test/results/clientpositive/spark/bucket5.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket6.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucketizedhiveinputformat.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/constprog_partitioner.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/empty_dir_in_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/external_table_with_space_in_location_path.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/file_with_header_footer.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/import_exported_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/index_bitmap3.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/index_bitmap_auto.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_bucketed_table.q.out PRE-CREATION 
ql/src/test/results/clientpositive/spark/infer_bucket_sort_dyn_part.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_map_operators.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_merge.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_num_buckets.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_reducers_power_two.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/input16_cc.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/list_bucket_dml_10.q.java1.7.out PRE-CREATION ql/src/test/results/clientpositive/spark/load_fs2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/load_hdfs_file_with_space_in_the_name.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/parallel_orderby.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ql_rewrite_gbtoidx.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ql_rewrite_gbtoidx_cbo_1.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ql_rewrite_gbtoidx_cbo_2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/quotedid_smb.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/reduce_deduplicate.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/remote_script.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/root_dir_external_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/schemeAuthority.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/schemeAuthority2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/temp_table_external.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/truncate_column_buckets.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/uber_reduce.q.out PRE-CREATION shims/0.20S/src/main/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java b17f465 shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java a61c3ac 
shims/0.23/src/main/java/org/apache/hadoop/hive/shims/MiniSparkOnYARNCluster.java PRE-CREATION shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 064304c spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java aea90db Diff: https://reviews.apache.org/r/30264/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 29196: Add some explain to PTF operator
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29196/ --- (Updated Jan. 27, 2015, 2:14 a.m.) Review request for hive. Bugs: HIVE-9138 https://issues.apache.org/jira/browse/HIVE-9138 Repository: hive-git Description --- PTFOperator does not explain anything in the explain statement, making it hard to understand its internal workings. Diffs (updated) - itests/src/test/resources/testconfiguration.properties 12fcd6a itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 8e00ee3 ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java daf6cb8 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 57ce849 ql/src/java/org/apache/hadoop/hive/ql/plan/Explain.java a3408a0 ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java 3ac3245 ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java b62ffed ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/BoundaryDef.java 07590c0 ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/OrderExpressionDef.java e367d13 ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/PTFExpressionDef.java 5d200fb ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/PTFInputDef.java 19ed2f2 ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/PTFQueryInputDef.java 11ef932 ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/PartitionedTableFunctionDef.java 327304c ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/WindowExpressionDef.java b96e9d6 ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/WindowFrameDef.java 949ed10 ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/WindowFunctionDef.java e4ea358 ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/WindowTableFunctionDef.java 083aaf2 ql/src/test/queries/clientpositive/ptf_matchpath.q 80dbe29 ql/src/test/results/clientpositive/correlationoptimizer12.q.out c32e41e ql/src/test/results/clientpositive/ctas_colname.q.out 95c7acb ql/src/test/results/clientpositive/groupby_resolution.q.out c611f7d ql/src/test/results/clientpositive/ptf.q.out f678035 
ql/src/test/results/clientpositive/ptf_matchpath.q.out e0cea0d ql/src/test/results/clientpositive/ptf_streaming.q.out 9cf645d ql/src/test/results/clientpositive/quotedid_basic.q.out b8cd4e9 ql/src/test/results/clientpositive/spark/ptf.q.out 8ca5496 ql/src/test/results/clientpositive/spark/ptf_matchpath.q.out e0cea0d ql/src/test/results/clientpositive/spark/ptf_streaming.q.out f5ee72d ql/src/test/results/clientpositive/spark/subquery_in.q.out 51b92a3 ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 020fdff ql/src/test/results/clientpositive/subquery_in.q.out a2235af ql/src/test/results/clientpositive/subquery_in_having.q.out 03cc2af ql/src/test/results/clientpositive/subquery_notin.q.out 599a61e ql/src/test/results/clientpositive/subquery_unqualcolumnrefs.q.out 06d5708 ql/src/test/results/clientpositive/tez/ptf.q.out 6f9dd91 ql/src/test/results/clientpositive/tez/ptf_matchpath.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/ptf_streaming.q.out a935ef6 ql/src/test/results/clientpositive/tez/subquery_in.q.out 8bc7892 ql/src/test/results/clientpositive/tez/vectorized_ptf.q.out a814849 ql/src/test/results/clientpositive/vectorized_ptf.q.out 1e3c43c ql/src/test/results/clientpositive/windowing_streaming.q.out ac9e180 Diff: https://reviews.apache.org/r/29196/diff/ Testing --- Thanks, Navis Ryu
Re: Review Request 30151: Remove Extract Operator and its friends from codebase.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30151/#review69732 --- This is huge cleanup. Good work! ql/src/test/results/clientpositive/bucketsortoptimize_insert_4.q.out https://reviews.apache.org/r/30151/#comment114490 Any idea why the plan is changed so much? - Navis Ryu On Jan. 24, 2015, 6:08 p.m., Ashutosh Chauhan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30151/ --- (Updated Jan. 24, 2015, 6:08 p.m.) Review request for hive and Navis Ryu. Bugs: HIVE-9416 https://issues.apache.org/jira/browse/HIVE-9416 Repository: hive-git Description --- Remove Extract Operator its friends from codebase. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java c299d3a ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java f3c382a ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java 2e6a880 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 9ed2c61 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorExtractOperator.java 7f4bb64 ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketingSortingReduceSinkOptimizer.java 24ca89f ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java e16ba6c ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java dc906e8 ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java 3fead79 ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/OpProcFactory.java d6a6ed6 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingInferenceOptimizer.java 7954767 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingOpProcFactory.java cf02bec ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 94b4621 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 4364f28 ql/src/java/org/apache/hadoop/hive/ql/plan/ExtractDesc.java 6762155 
ql/src/java/org/apache/hadoop/hive/ql/plan/SelectDesc.java fa6b548 ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 41862e6 ql/src/test/results/clientpositive/bucket1.q.out 13ec735 ql/src/test/results/clientpositive/bucket2.q.out 32a77c3 ql/src/test/results/clientpositive/bucket3.q.out ff7173e ql/src/test/results/clientpositive/bucket4.q.out b99d12f ql/src/test/results/clientpositive/bucket5.q.out 5992d6d ql/src/test/results/clientpositive/bucket6.q.out 5b23d7d ql/src/test/results/clientpositive/bucketsortoptimize_insert_1.q.out 75de953 ql/src/test/results/clientpositive/bucketsortoptimize_insert_2.q.out 599b8b9 ql/src/test/results/clientpositive/bucketsortoptimize_insert_3.q.out 7456ab0 ql/src/test/results/clientpositive/bucketsortoptimize_insert_4.q.out fd99597 ql/src/test/results/clientpositive/bucketsortoptimize_insert_5.q.out 8130ab9 ql/src/test/results/clientpositive/bucketsortoptimize_insert_6.q.out 627aba0 ql/src/test/results/clientpositive/disable_merge_for_bucketing.q.out 9b058c8 ql/src/test/results/clientpositive/dynpart_sort_opt_vectorization.q.out 32e0745 ql/src/test/results/clientpositive/dynpart_sort_optimization.q.out 494bfa3 ql/src/test/results/clientpositive/encrypted/encryption_insert_partition_dynamic.q.out b6e7b88 ql/src/test/results/clientpositive/encrypted/encryption_insert_partition_static.q.out fc6d2ae ql/src/test/results/clientpositive/load_dyn_part2.q.out 26f318a ql/src/test/results/clientpositive/ptf.q.out f678035 ql/src/test/results/clientpositive/ptf_streaming.q.out 9cf645d ql/src/test/results/clientpositive/smb_mapjoin_20.q.out 999dabd ql/src/test/results/clientpositive/smb_mapjoin_21.q.out 539b70e ql/src/test/results/clientpositive/spark/bucket2.q.out 5eb28fa ql/src/test/results/clientpositive/spark/bucket3.q.out 1b1010a ql/src/test/results/clientpositive/spark/bucket4.q.out 7dd49ac ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_2.q.out 365306e 
ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_4.q.out 3846de7 ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_6.q.out 5b559c4 ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_7.q.out cefc6aa ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_8.q.out ca44d7c ql/src/test/results/clientpositive/spark/disable_merge_for_bucketing.q.out 3864c44 ql/src/test/results/clientpositive/spark/load_dyn_part2.q.out a8cef34
Re: Review Request 30264: HIVE-9211 enable unit test for mini Spark on YARN cluster[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30264/#review69744 --- Ship it! Ship It! - Xuefu Zhang On Jan. 27, 2015, 2:03 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30264/ --- (Updated Jan. 27, 2015, 2:03 a.m.) Review request for hive, Szehon Ho and Xuefu Zhang. Bugs: HIVE-9211 https://issues.apache.org/jira/browse/HIVE-9211 Repository: hive-git Description --- MiniSparkOnYarnCluster is enabled for unit tests: Spark is deployed on miniYarnCluster in yarn-client mode, and all qfiles in minimr.query.files are enabled in this unit test except 3 qfiles (bucket_num_reducers.q, bucket_num_reducers2.q, udf_using.q), which are not supported in HoS. Diffs - data/conf/spark/hive-site.xml 016f568 data/conf/spark/standalone/hive-site.xml PRE-CREATION data/conf/spark/yarn-client/hive-site.xml PRE-CREATION itests/pom.xml e1e88f6 itests/qtest-spark/pom.xml d12fad5 itests/src/test/resources/testconfiguration.properties f583aaf itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 095b9bd ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 41a2ab7 ql/src/test/results/clientpositive/spark/bucket5.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket6.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucketizedhiveinputformat.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/constprog_partitioner.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/empty_dir_in_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/external_table_with_space_in_location_path.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/file_with_header_footer.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/import_exported_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/index_bitmap3.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/index_bitmap_auto.q.out 
PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_bucketed_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_dyn_part.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_map_operators.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_merge.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_num_buckets.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_reducers_power_two.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/input16_cc.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/list_bucket_dml_10.q.java1.7.out PRE-CREATION ql/src/test/results/clientpositive/spark/load_fs2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/load_hdfs_file_with_space_in_the_name.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/parallel_orderby.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ql_rewrite_gbtoidx.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ql_rewrite_gbtoidx_cbo_1.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ql_rewrite_gbtoidx_cbo_2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/quotedid_smb.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/reduce_deduplicate.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/remote_script.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/root_dir_external_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/schemeAuthority.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/schemeAuthority2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/temp_table_external.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/truncate_column_buckets.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/uber_reduce.q.out PRE-CREATION shims/0.20S/src/main/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java b17f465 
shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java a61c3ac shims/0.23/src/main/java/org/apache/hadoop/hive/shims/MiniSparkOnYARNCluster.java PRE-CREATION shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 064304c spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java aea90db Diff: https://reviews.apache.org/r/30264/diff/ Testing --- Thanks, chengxiang li
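The diff above adds three hive-site.xml variants (base, standalone, yarn-client), which suggests the per-mode test configs differ mainly in the engine and Spark master settings. A minimal sketch of what the yarn-client variant plausibly contains — property names are the standard Hive on Spark settings, but the exact file contents are an assumption, not taken from the patch:

```xml
<!-- Hypothetical sketch of data/conf/spark/yarn-client/hive-site.xml -->
<configuration>
  <!-- Run queries through the Spark execution engine -->
  <property>
    <name>hive.execution.engine</name>
    <value>spark</value>
  </property>
  <!-- Point Spark at the (mini) YARN cluster in yarn-client mode -->
  <property>
    <name>spark.master</name>
    <value>yarn-client</value>
  </property>
</configuration>
```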
[jira] [Updated] (HIVE-9397) SELECT max(bar) FROM foo is broken after ANALYZE ... FOR COLUMNS
[ https://issues.apache.org/jira/browse/HIVE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-9397: Attachment: HIVE-9397.1.patch.txt SELECT max(bar) FROM foo is broken after ANALYZE ... FOR COLUMNS Key: HIVE-9397 URL: https://issues.apache.org/jira/browse/HIVE-9397 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.15.0 Reporter: Damien Carol Attachments: HIVE-9397.1.patch.txt These queries produce an error: {code:sql} DROP TABLE IF EXISTS foo; CREATE TABLE foo (id int) STORED AS ORC; INSERT INTO TABLE foo VALUES (1); INSERT INTO TABLE foo VALUES (2); INSERT INTO TABLE foo VALUES (3); INSERT INTO TABLE foo VALUES (4); INSERT INTO TABLE foo VALUES (5); SELECT max(id) FROM foo; ANALYZE TABLE foo COMPUTE STATISTICS FOR COLUMNS id; SELECT max(id) FROM foo; {code} The last query throws {{org.apache.hive.service.cli.HiveSQLException}} {noformat} 0: jdbc:hive2://nc-h04:1/casino SELECT max(id) FROM foo; +-+--+ | _c0 | +-+--+ org.apache.hive.service.cli.HiveSQLException: java.lang.ClassCastException 0: jdbc:hive2://nc-h04:1/casino {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9397) SELECT max(bar) FROM foo is broken after ANALYZE ... FOR COLUMNS
[ https://issues.apache.org/jira/browse/HIVE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-9397: Assignee: Navis Status: Patch Available (was: Open) SELECT max(bar) FROM foo is broken after ANALYZE ... FOR COLUMNS Key: HIVE-9397 URL: https://issues.apache.org/jira/browse/HIVE-9397 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.15.0 Reporter: Damien Carol Assignee: Navis Attachments: HIVE-9397.1.patch.txt These queries produce an error: {code:sql} DROP TABLE IF EXISTS foo; CREATE TABLE foo (id int) STORED AS ORC; INSERT INTO TABLE foo VALUES (1); INSERT INTO TABLE foo VALUES (2); INSERT INTO TABLE foo VALUES (3); INSERT INTO TABLE foo VALUES (4); INSERT INTO TABLE foo VALUES (5); SELECT max(id) FROM foo; ANALYZE TABLE foo COMPUTE STATISTICS FOR COLUMNS id; SELECT max(id) FROM foo; {code} The last query throws {{org.apache.hive.service.cli.HiveSQLException}} {noformat} 0: jdbc:hive2://nc-h04:1/casino SELECT max(id) FROM foo; +-+--+ | _c0 | +-+--+ org.apache.hive.service.cli.HiveSQLException: java.lang.ClassCastException 0: jdbc:hive2://nc-h04:1/casino {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9438) The standalone-jdbc jar missing some jars
[ https://issues.apache.org/jira/browse/HIVE-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293065#comment-14293065 ] Szehon Ho commented on HIVE-9438: - +1 The standalone-jdbc jar missing some jars - Key: HIVE-9438 URL: https://issues.apache.org/jira/browse/HIVE-9438 Project: Hive Issue Type: Bug Reporter: Ashish Kumar Singh Priority: Blocker Fix For: 0.15.0 Attachments: HIVE-9438.1.patch The standalone-jdbc jar does not contain all the jars required for secure connections. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30281: Move parquet serialize implementation to DataWritableWriter to improve write speeds
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30281/#review69723 --- Thank you for your patch. I have several general questions as follows. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java https://reviews.apache.org/r/30281/#comment114475 If compressionType is unneeded, this annotation may be removed as well. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java https://reviews.apache.org/r/30281/#comment114474 Why remove the compressionType code here? ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java https://reviews.apache.org/r/30281/#comment114487 Why not define writeGroupFields with a parameter of ParquetWritable instead of passing in the object and objectInspector separately? ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java https://reviews.apache.org/r/30281/#comment114488 Assume that when i%2 equals 0, the element is the key, and the value is written only when it is not null. What happens if both the key and the value are null? Can we follow the original approach of passing in the writable object and handling the null-value case in the writeValue method? The code would become simpler and easier to understand. - cheng xu On Jan. 27, 2015, 1:39 a.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30281/ --- (Updated Jan. 27, 2015, 1:39 a.m.) Review request for hive, Ryan Blue, cheng xu, and Dong Chen. Bugs: HIVE-9333 https://issues.apache.org/jira/browse/HIVE-9333 Repository: hive-git Description --- This patch moves the ParquetHiveSerDe.serialize() implementation to the DataWritableWriter class in order to save time in materializing data on serialize(). 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java ea4109d358f7c48d1e2042e5da299475de4a0a29 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 9caa4ed169ba92dbd863e4a2dc6d06ab226a4465 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java 060b1b722d32f3b2f88304a1a73eb249e150294b ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 41b5f1c3b0ab43f734f8a211e3e03d5060c75434 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java e52c4bc0b869b3e60cb4bfa9e11a09a0d605ac28 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java a693aff18516d133abf0aae4847d3fe00b9f1c96 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetOutputFormat.java 667d3671547190d363107019cd9a2d105d26d336 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 007a665529857bcec612f638a157aa5043562a15 serde/src/java/org/apache/hadoop/hive/serde2/io/ParquetWritable.java PRE-CREATION Diff: https://reviews.apache.org/r/30281/diff/ Testing --- The tests run were the following: 1. JMH (Java microbenchmark) This benchmark called parquet serialize/write methods using text writable objects. Class.method Before Change (ops/s) After Change (ops/s) --- ParquetHiveSerDe.serialize: 19,113 249,528 - 19x speed increase DataWritableWriter.write: 5,033 5,201 - 3.34% speed increase 2. Write 20 million rows (~1GB file) from Text to Parquet I wrote a ~1GB file in Textfile format, then converted it to Parquet format using the following statement: CREATE TABLE parquet STORED AS parquet AS SELECT * FROM text; Time (s) it took to write the whole file BEFORE changes: 93.758 s Time (s) it took to write the whole file AFTER changes: 83.903 s This is a 10% speed increase. Thanks, Sergio Pena
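The map-writing question in comment 114488 above (even index = key, value written only when non-null) can be made concrete with a standalone sketch. This is an illustration of the layout rule under discussion, not the patch's actual DataWritableWriter code; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone illustration: map entries arrive flattened, even indices hold
// keys (i % 2 == 0), odd indices hold the matching values.
public class MapLayoutSketch {
    static List<String> writeMapEntries(Object[] flat) {
        List<String> written = new ArrayList<>();
        for (int i = 0; i + 1 < flat.length; i += 2) {
            Object key = flat[i];       // the key position
            Object value = flat[i + 1]; // the matching value position
            if (key == null) {
                continue; // a null key means the whole entry is skipped
            }
            written.add("key:" + key);
            if (value != null) {
                written.add("value:" + value); // null values are simply not written
            }
        }
        return written;
    }

    public static void main(String[] args) {
        // prints [key:a, value:1, key:b]
        System.out.println(writeMapEntries(new Object[]{"a", 1, "b", null, null, 2}));
    }
}
```

This also shows the edge case the reviewer asks about: when both key and value are null, the entry is dropped entirely.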
[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292936#comment-14292936 ] Hive QA commented on HIVE-9333: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12694678/HIVE-9333.2.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7394 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2527/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2527/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2527/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12694678 - PreCommit-HIVE-TRUNK-Build Move parquet serialize implementation to DataWritableWriter to improve write speeds --- Key: HIVE-9333 URL: https://issues.apache.org/jira/browse/HIVE-9333 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9333.2.patch The serialize process on ParquetHiveSerDe parses a Hive object to a Writable object by looping through all the Hive object children, and creating new Writables objects per child. These final writables objects are passed in to the Parquet writing function, and parsed again on the DataWritableWriter class by looping through the ArrayWritable object. 
These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write()) may be reduced to a single loop in the DataWritableWriter.write() method in order to increase the write speed of Hive Parquet. To achieve this, we can wrap the Hive object and object inspector on the ParquetHiveSerDe.serialize() method into an object that implements the Writable interface, thus avoiding the loop that serialize() does, and leave the loop parsing to the DataWritableWriter.write() method. We can see how ORC does this with the OrcSerde.OrcSerdeRow class. Writable objects are organized differently across storage formats, so I don't think it is necessary to create and keep the writable objects in the serialize() method, as they won't be used until the writing process starts (DataWritableWriter.write()). This performance issue was found using microbenchmark tests from HIVE-8121. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
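The wrapper idea described above (modeled on OrcSerde.OrcSerdeRow) can be sketched in isolation. All names here are illustrative stand-ins, not the actual HIVE-9333 code: serialize() becomes a cheap wrap of the row and its inspector, and the single traversal happens later, inside the writer:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: instead of copying the row into a tree of Writable
// objects during serialize(), wrap the row and its inspector, and let the
// writer do the one and only traversal.
public class WrapperSketch {
    // Stand-in for the (row, ObjectInspector) pair a real SerDe would carry.
    static final class RowHolder {
        final List<Object> row;
        final String inspectorName; // a real wrapper would hold an ObjectInspector
        RowHolder(List<Object> row, String inspectorName) {
            this.row = row;
            this.inspectorName = inspectorName;
        }
    }

    // "serialize" is now O(1): no per-field copying into Writables.
    static RowHolder serialize(List<Object> row) {
        return new RowHolder(row, "structOI");
    }

    // The single loop over fields moves into the writer.
    static int write(RowHolder holder) {
        int fieldsWritten = 0;
        for (Object field : holder.row) {
            if (field != null) {
                fieldsWritten++; // a real writer would emit the field here
            }
        }
        return fieldsWritten;
    }

    public static void main(String[] args) {
        System.out.println(write(serialize(Arrays.asList(1, "x", null)))); // 2
    }
}
```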
[jira] [Commented] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292934#comment-14292934 ] Xuefu Zhang commented on HIVE-9211: --- +1 pending on tests. Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in production environments; we'd better enable unit tests for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9397) SELECT max(bar) FROM foo is broken after ANALYZE ... FOR COLUMNS
[ https://issues.apache.org/jira/browse/HIVE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292959#comment-14292959 ] Navis commented on HIVE-9397: - CLI uses the ObjectInspector (OI) in the fetch task to read rows, which is provided by StatsOptimizer. But Beeline uses the schema from SemanticAnalyzer, which is acquired from the RowResolver (RR) of the final FileSink (FS). I think StatsOptimizer should not override the return schema. SELECT max(bar) FROM foo is broken after ANALYZE ... FOR COLUMNS Key: HIVE-9397 URL: https://issues.apache.org/jira/browse/HIVE-9397 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.15.0 Reporter: Damien Carol Assignee: Navis Attachments: HIVE-9397.1.patch.txt These queries produce an error: {code:sql} DROP TABLE IF EXISTS foo; CREATE TABLE foo (id int) STORED AS ORC; INSERT INTO TABLE foo VALUES (1); INSERT INTO TABLE foo VALUES (2); INSERT INTO TABLE foo VALUES (3); INSERT INTO TABLE foo VALUES (4); INSERT INTO TABLE foo VALUES (5); SELECT max(id) FROM foo; ANALYZE TABLE foo COMPUTE STATISTICS FOR COLUMNS id; SELECT max(id) FROM foo; {code} The last query throws {{org.apache.hive.service.cli.HiveSQLException}} {noformat} 0: jdbc:hive2://nc-h04:1/casino SELECT max(id) FROM foo; +-+--+ | _c0 | +-+--+ org.apache.hive.service.cli.HiveSQLException: java.lang.ClassCastException 0: jdbc:hive2://nc-h04:1/casino {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
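The mismatch Navis describes boils down to the announced schema promising one type while the stats-backed fetch hands back another, so the cast on the client path blows up. A toy reproduction of that failure mode in plain Java — not Hive code, and the specific types (Long vs. Integer) are an assumption for illustration:

```java
// Toy reproduction of the type mismatch behind the ClassCastException:
// the declared schema says "int", but the value materialized from column
// statistics is carried as a Long.
public class SchemaMismatchSketch {
    public static void main(String[] args) {
        Object statsValue = Long.valueOf(5); // what the stats-backed row carries
        try {
            Integer declared = (Integer) statsValue; // what the schema promised
            System.out.println(declared);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as seen by the Beeline client");
        }
    }
}
```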
[jira] [Commented] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293039#comment-14293039 ] Brock Noland commented on HIVE-9211: Hmm, yeah, that is not easy to do. I used to have ptest copy failed tests back to the master, but a bad patch easily filled up the disk. [~chengxiang li] - are you testing locally on Linux or Mac? Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in production environments; we'd better enable unit tests for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9462) HIVE-8577 - breaks type evolution
[ https://issues.apache.org/jira/browse/HIVE-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292781#comment-14292781 ] Hive QA commented on HIVE-9462: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12694647/HIVE-9462.2.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7400 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_type_evolution {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2524/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2524/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2524/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12694647 - PreCommit-HIVE-TRUNK-Build HIVE-8577 - breaks type evolution - Key: HIVE-9462 URL: https://issues.apache.org/jira/browse/HIVE-9462 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.15.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9462.1.patch, HIVE-9462.2.patch, type_evolution.avro If you write an avro field out as {{int}} and then change its type to {{long}} you will get an {{UnresolvedUnionException}} due to code in HIVE-8577. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
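The failure mode described (a field written as int, schema evolved to long) is what happens when a union branch is resolved by the datum's runtime class. The sketch below is a conceptual stand-in for that pattern, not Avro's actual GenericData code — an Integer written under the old schema matches no branch once the schema only lists long:

```java
import java.util.List;

// Conceptual sketch: resolving a union branch by the datum's runtime class
// breaks under type evolution, because an Integer datum written with the old
// schema matches no branch of the evolved ["long"] union.
public class UnionResolveSketch {
    static int resolveBranch(Object datum, List<Class<?>> branches) {
        for (int i = 0; i < branches.size(); i++) {
            if (branches.get(i).isInstance(datum)) {
                return i;
            }
        }
        // Mirrors Avro's UnresolvedUnionException in spirit.
        throw new IllegalStateException(
            "unresolved union for " + (datum == null ? null : datum.getClass()));
    }

    public static void main(String[] args) {
        List<Class<?>> evolved = java.util.Arrays.asList(Long.class); // schema now says long
        System.out.println(resolveBranch(5L, evolved)); // 0: a Long datum matches
        try {
            resolveBranch(5, evolved); // an old Integer datum no longer matches
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Avro's schema-resolution rules actually allow promoting int to long on read; the bug is that class-based branch resolution bypasses that promotion.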
[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292844#comment-14292844 ] Ferdinand Xu commented on HIVE-9333: Thanks, Sergio, for your patch. I have left some general questions on the review board. Move parquet serialize implementation to DataWritableWriter to improve write speeds --- Key: HIVE-9333 URL: https://issues.apache.org/jira/browse/HIVE-9333 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9333.2.patch The serialize process on ParquetHiveSerDe parses a Hive object to a Writable object by looping through all the Hive object children, creating new Writable objects per child. These final writable objects are passed in to the Parquet writing function, and parsed again on the DataWritableWriter class by looping through the ArrayWritable object. These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write()) may be reduced to a single loop in the DataWritableWriter.write() method in order to increase the write speed of Hive Parquet. To achieve this, we can wrap the Hive object and object inspector on the ParquetHiveSerDe.serialize() method into an object that implements the Writable interface, thus avoiding the loop that serialize() does, and leave the loop parsing to the DataWritableWriter.write() method. We can see how ORC does this with the OrcSerde.OrcSerdeRow class. Writable objects are organized differently across storage formats, so I don't think it is necessary to create and keep the writable objects in the serialize() method, as they won't be used until the writing process starts (DataWritableWriter.write()). This performance issue was found using microbenchmark tests from HIVE-8121. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292940#comment-14292940 ] Hive QA commented on HIVE-9211: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12694685/HIVE-9211.2-spark.patch {color:red}ERROR:{color} -1 due to 40 failed/errored test(s), 7404 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_auto_sortmerge_join_16 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketizedhiveinputformat org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin6 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_empty_dir_in_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_external_table_with_space_in_location_path org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_groupby2 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap_auto org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_bucketed_table 
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_dyn_part org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_merge org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_join1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_leftsemijoin_mr org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_parallel_orderby org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_quotedid_smb org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority2 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_smb_mapjoin_8 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter_partitioned 
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_truncate_column_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_uber_reduce {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/685/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/685/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-685/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 40 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12694685 - PreCommit-HIVE-SPARK-Build Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[jira] [Commented] (HIVE-9462) HIVE-8577 - breaks type evolution
[ https://issues.apache.org/jira/browse/HIVE-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292939#comment-14292939 ] Xuefu Zhang commented on HIVE-9462: --- Patch looks good. One minor observation: if the datum is null, then the datumClazz will be null, and then the logged message will contain null. I'm not sure if this is intended. HIVE-8577 - breaks type evolution - Key: HIVE-9462 URL: https://issues.apache.org/jira/browse/HIVE-9462 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.15.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9462.1.patch, HIVE-9462.2.patch, type_evolution.avro If you write an Avro field out as {{int}} and then change its type to {{long}}, you will get an {{UnresolvedUnionException}} due to code in HIVE-8577. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
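Avro's schema-resolution rules permit reading a value written as {{int}} with a reader schema of {{long}} (numeric promotion); the exception described above means that promotion path is not being taken. The toy sketch below illustrates the promotion rule and the null case Xuefu flagged. It is NOT Hive's or Avro's actual resolution code; the class and method names are invented for illustration.

```java
// Toy illustration of Avro's int -> long promotion during schema resolution.
// Invented names; not the real Hive/Avro code path.
class AvroPromotionSketch {
    // Resolve a written datum against the reader's expected type,
    // applying the int -> long promotion that schema evolution permits.
    static Object resolve(Object datum, Class<?> readerType) {
        if (datum == null) {
            return null;  // the null-datum case noted in the review comment
        }
        if (readerType == Long.class && datum instanceof Integer) {
            return Long.valueOf((Integer) datum);  // int promoted to long
        }
        if (readerType.isInstance(datum)) {
            return datum;  // types already match
        }
        throw new IllegalArgumentException("Cannot resolve "
            + datum.getClass().getName()
            + " against reader type " + readerType.getName());
    }
}
```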
Re: Review Request 30281: Move parquet serialize implementation to DataWritableWriter to improve write speeds
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30281/#review69748 --- ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java https://reviews.apache.org/r/30281/#comment114550 Are these tests being replaced by something? - Brock Noland On Jan. 27, 2015, 1:39 a.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30281/ --- (Updated Jan. 27, 2015, 1:39 a.m.) Review request for hive, Ryan Blue, cheng xu, and Dong Chen. Bugs: HIVE-9333 https://issues.apache.org/jira/browse/HIVE-9333 Repository: hive-git Description --- This patch moves the ParquetHiveSerDe.serialize() implementation to DataWritableWriter class in order to save time in materializing data on serialize(). Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java ea4109d358f7c48d1e2042e5da299475de4a0a29 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 9caa4ed169ba92dbd863e4a2dc6d06ab226a4465 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java 060b1b722d32f3b2f88304a1a73eb249e150294b ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 41b5f1c3b0ab43f734f8a211e3e03d5060c75434 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java e52c4bc0b869b3e60cb4bfa9e11a09a0d605ac28 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java a693aff18516d133abf0aae4847d3fe00b9f1c96 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetOutputFormat.java 667d3671547190d363107019cd9a2d105d26d336 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 007a665529857bcec612f638a157aa5043562a15 serde/src/java/org/apache/hadoop/hive/serde2/io/ParquetWritable.java PRE-CREATION Diff: https://reviews.apache.org/r/30281/diff/ Testing --- The tests run were the following: 1. 
JMH (Java microbenchmark): this benchmark called the Parquet serialize/write methods using Text writable objects.
Class.method                 Before change (ops/s)   After change (ops/s)
ParquetHiveSerDe.serialize   19,113                  249,528  (~13x faster)
DataWritableWriter.write     5,033                   5,201    (~3.34% faster)
2. Write 20 million rows (~1 GB file) from Text to Parquet. I wrote a ~1 GB file in TextFile format, then converted it to Parquet using the following statement: CREATE TABLE parquet STORED AS parquet AS SELECT * FROM text; Writing the whole file took 93.758 s BEFORE the changes and 83.903 s AFTER them, about a 10% speed increase. Thanks, Sergio Pena
[jira] [Created] (HIVE-9475) HiveMetastoreClient.tableExists does not work
Brock Noland created HIVE-9475: -- Summary: HiveMetastoreClient.tableExists does not work Key: HIVE-9475 URL: https://issues.apache.org/jira/browse/HIVE-9475 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Brock Noland Priority: Blocker We check the return value against null, returning true if the return value is null. This is reversed: we should return true if the value is not null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
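The inverted check described above can be sketched as follows. The class and method bodies below are invented for illustration; the real HiveMetaStoreClient implementation differs, but the shape of the bug is the same: the comparison against null is the wrong way around.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a metastore client; NOT the real HiveMetaStoreClient.
class MetaStoreClientSketch {
    private final Map<String, Object> tables = new HashMap<>();

    void createTable(String db, String name) {
        tables.put(db + "." + name, new Object());
    }

    Object getTable(String db, String name) {  // returns null when absent
        return tables.get(db + "." + name);
    }

    // Buggy version: returns true exactly when the table is MISSING.
    boolean tableExistsBuggy(String db, String name) {
        return getTable(db, name) == null;
    }

    // Fixed version: a table exists when getTable returns a non-null object.
    boolean tableExistsFixed(String db, String name) {
        return getTable(db, name) != null;
    }
}
```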
[jira] [Updated] (HIVE-8890) HiveServer2 dynamic service discovery: use persistent ephemeral nodes curator recipe
[ https://issues.apache.org/jira/browse/HIVE-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8890: --- Fix Version/s: 0.15.0 HiveServer2 dynamic service discovery: use persistent ephemeral nodes curator recipe Key: HIVE-8890 URL: https://issues.apache.org/jira/browse/HIVE-8890 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.1 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.15.0, 0.14.1 Attachments: HIVE-8890.1.patch, HIVE-8890.2.patch Using this recipe gives better reliability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
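The reliability gain from Curator's persistent-ephemeral-node recipe is that the recipe re-creates the znode after a session expiry, whereas a plain ephemeral node stays gone. A toy model of that behavior, with invented names (this is NOT Curator's API; the actual recipe class in Curator is org.apache.curator.framework.recipes.nodes.PersistentEphemeralNode):

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the persistent-ephemeral idea: a plain ephemeral znode
// disappears on session expiry and never returns, while the recipe
// re-registers the nodes it owns on reconnect. Invented names throughout.
class PersistentEphemeralSketch {
    // Stands in for the znodes currently visible in ZooKeeper.
    final Set<String> znodes = new HashSet<>();
    private final Set<String> managed = new HashSet<>();

    void create(String path, boolean managedByRecipe) {
        znodes.add(path);
        if (managedByRecipe) {
            managed.add(path);  // the recipe remembers what it owns
        }
    }

    // Session expiry wipes all ephemeral nodes.
    void sessionExpired() {
        znodes.clear();
    }

    // On reconnect the recipe re-registers its nodes; plain ephemerals stay gone.
    void reconnected() {
        znodes.addAll(managed);
    }
}
```

For HiveServer2 dynamic service discovery, this means a server's registration znode survives transient ZooKeeper session loss instead of silently dropping out of the discovery namespace.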
[jira] [Commented] (HIVE-9138) Add some explain to PTF operator
[ https://issues.apache.org/jira/browse/HIVE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293055#comment-14293055 ] Hive QA commented on HIVE-9138: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12694686/HIVE-9138.2.patch.txt {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 7400 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_window org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_6_subq org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ptf_matchpath {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2529/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2529/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2529/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12694686 - PreCommit-HIVE-TRUNK-Build Add some explain to PTF operator Key: HIVE-9138 URL: https://issues.apache.org/jira/browse/HIVE-9138 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-9138.1.patch.txt, HIVE-9138.2.patch.txt PTFOperator does not explain anything in the explain statement, making it hard to understand its internal workings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30264: HIVE-9221 enable unit test for mini Spark on YARN cluster[Spark Branch]
On Jan. 26, 2015, 10:30 p.m., Xuefu Zhang wrote: data/conf/spark/yarn-client/hive-site.xml, line 225 https://reviews.apache.org/r/30264/diff/1/?file=834064#file834064line225 Only one executor? Maybe 2 will make it more general. Yes, that makes sense. On Jan. 26, 2015, 10:30 p.m., chengxiang li wrote: I'm wondering why we have a new set of .out files? Every Test*CliDriver has its own output directory; I didn't think much about this previously. Now that you mention it, I think we could share the golden files with TestSparkCliDriver, as its golden files should be the same as TestMiniSparkOnYarnCliDriver's for each qtest. One more thing to note: spark.query.files contains more than 500 qtests, and a full Hive unit-test run already takes long enough, so I didn't enable all of the spark.query.files qtests for TestMiniSparkOnYarnCliDriver. Instead I enabled the qtests from minimr.query.files, which contains about 50 qtests and takes about 10 minutes on my own desktop. - chengxiang --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30264/#review69685 --- On Jan. 26, 2015, 6:37 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30264/ --- (Updated Jan. 26, 2015, 6:37 a.m.) Review request for hive, Szehon Ho and Xuefu Zhang. Bugs: HIVE-9211 https://issues.apache.org/jira/browse/HIVE-9211 Repository: hive-git Description --- MiniSparkOnYarnCluster is enabled for unit tests; Spark is deployed on miniYarnCluster in yarn-client mode. All qfiles in minimr.query.files are enabled in this unit test except three, bucket_num_reducers.q, bucket_num_reducers2.q, and udf_using.q, which are not supported in HoS. 
Diffs - data/conf/spark/hive-site.xml 016f568 data/conf/spark/standalone/hive-site.xml PRE-CREATION data/conf/spark/yarn-client/hive-site.xml PRE-CREATION itests/pom.xml e1e88f6 itests/qtest-spark/pom.xml d12fad5 itests/src/test/resources/testconfiguration.properties f583aaf itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 095b9bd ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 41a2ab7 ql/src/test/results/clientpositive/miniSparkOnYarn/auto_sortmerge_join_16.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucket4.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucket5.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucket6.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucketizedhiveinputformat.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucketmapjoin6.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucketmapjoin7.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/constprog_partitioner.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/disable_merge_for_bucketing.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/empty_dir_in_table.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/external_table_with_space_in_location_path.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/file_with_header_footer.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/groupby1.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/groupby2.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/import_exported_table.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/index_bitmap3.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/index_bitmap_auto.q.out PRE-CREATION 
ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_bucketed_table.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_dyn_part.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_map_operators.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_merge.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_num_buckets.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_reducers_power_two.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/input16_cc.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/join1.q.out PRE-CREATION
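Picking up Xuefu's executor-count comment in the review above, bumping the mini cluster from one executor to two would presumably be a change like the following in data/conf/spark/yarn-client/hive-site.xml. The property name is the standard Spark one; the exact file contents are an assumption, since the diff body itself is not shown here.

```xml
<!-- Hypothetical fragment of data/conf/spark/yarn-client/hive-site.xml:
     two executors, per the review suggestion, rather than one. -->
<property>
  <name>spark.executor.instances</name>
  <value>2</value>
</property>
```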
[jira] [Updated] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9211: Status: Patch Available (was: Open) Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in production environments; we should enable unit tests for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9471) Bad seek in uncompressed ORC, at row-group boundary.
[ https://issues.apache.org/jira/browse/HIVE-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-9471: --- Status: Patch Available (was: Open) Bad seek in uncompressed ORC, at row-group boundary. Key: HIVE-9471 URL: https://issues.apache.org/jira/browse/HIVE-9471 Project: Hive Issue Type: Bug Components: File Formats, Serializers/Deserializers Affects Versions: 0.14.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-9471.1.patch, data.txt, orc_bad_seek_failure_case.hive, orc_bad_seek_setup.hive Under at least one specific condition, using index-filters in ORC causes a bad seek into the ORC row-group. {code:title=stacktrace} java.io.IOException: java.lang.IllegalArgumentException: Seek in Stream for column 2 kind DATA to 0 is outside of the data at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1655) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:227) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305) ... 
Caused by: java.lang.IllegalArgumentException: Seek in Stream for column 2 kind DATA to 0 is outside of the data at org.apache.hadoop.hive.ql.io.orc.InStream$UncompressedStream.seek(InStream.java:112) at org.apache.hadoop.hive.ql.io.orc.InStream$UncompressedStream.seek(InStream.java:96) at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.seek(RunLengthIntegerReaderV2.java:310) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDictionaryTreeReader.seek(RecordReaderImpl.java:1596) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.seek(RecordReaderImpl.java:1337) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.seek(RecordReaderImpl.java:1852) {code} I'll attach the script to reproduce the problem herewith. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9471) Bad seek in uncompressed ORC, at row-group boundary.
[ https://issues.apache.org/jira/browse/HIVE-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-9471: --- Attachment: HIVE-9471.1.patch Here's a tentative patch. (Thanks for the advice, [~prasanthj].) The reader-fix prevents a seek into an empty stream. The writer-change prevents the string-output-stream from being written into the stripe. (Does it make sense to similarly suppress writing out the length-stream?) The test I posted passes. Bad seek in uncompressed ORC, at row-group boundary. Key: HIVE-9471 URL: https://issues.apache.org/jira/browse/HIVE-9471 Project: Hive Issue Type: Bug Components: File Formats, Serializers/Deserializers Affects Versions: 0.14.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-9471.1.patch, data.txt, orc_bad_seek_failure_case.hive, orc_bad_seek_setup.hive Under at least one specific condition, using index-filters in ORC causes a bad seek into the ORC row-group. {code:title=stacktrace} java.io.IOException: java.lang.IllegalArgumentException: Seek in Stream for column 2 kind DATA to 0 is outside of the data at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1655) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:227) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305) ... 
Caused by: java.lang.IllegalArgumentException: Seek in Stream for column 2 kind DATA to 0 is outside of the data at org.apache.hadoop.hive.ql.io.orc.InStream$UncompressedStream.seek(InStream.java:112) at org.apache.hadoop.hive.ql.io.orc.InStream$UncompressedStream.seek(InStream.java:96) at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.seek(RunLengthIntegerReaderV2.java:310) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDictionaryTreeReader.seek(RecordReaderImpl.java:1596) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.seek(RecordReaderImpl.java:1337) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.seek(RecordReaderImpl.java:1852) {code} I'll attach the script to reproduce the problem herewith. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
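The reader-fix described above (preventing a seek into an empty stream) can be sketched against a simplified in-memory stream. This is NOT the actual org.apache.hadoop.hive.ql.io.orc.InStream code; it only illustrates the guard: a zero-length stream has no data, so a seek to offset 0 is treated as a no-op instead of an out-of-range error.

```java
// Simplified model of an uncompressed in-memory stream, illustrating the
// guard described in the comment above. Not the real ORC InStream.
class UncompressedStreamSketch {
    private final byte[] data;
    private int position;

    UncompressedStreamSketch(byte[] data) {
        this.data = data;
    }

    // Unguarded behaviour: seeking to offset 0 in an empty stream throws,
    // mirroring "Seek in Stream ... to 0 is outside of the data".
    void seekUnguarded(int offset) {
        if (offset >= data.length) {
            throw new IllegalArgumentException(
                "Seek to " + offset + " is outside of the data");
        }
        position = offset;
    }

    // Guarded version: an empty stream has nothing to seek into, so a
    // seek to 0 simply leaves the (empty) stream positioned at the start.
    void seekGuarded(int offset) {
        if (data.length == 0 && offset == 0) {
            return;  // empty stream: nothing to position
        }
        seekUnguarded(offset);
    }
}
```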
[jira] [Created] (HIVE-9474) truncate table changes permissions on the target
Aihua Xu created HIVE-9474: -- Summary: truncate table changes permissions on the target Key: HIVE-9474 URL: https://issues.apache.org/jira/browse/HIVE-9474 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.15.0 Reporter: Aihua Xu Assignee: Aihua Xu Priority: Minor I created a test table in beeline: create table test(a string); Permissions:
{noformat}
# file: /user/hive/warehouse/test.db/test
# owner: aryan
# group: hive
user::rwx
group::rwx
other::--x
{noformat}
Now, in beeline: truncate table test; Permissions are now:
{noformat}
# file: /user/hive/warehouse/test.db/test
# owner: aryan
# group: hive
user::rwx
group::r-x
other::r-x
{noformat}
Group write permissions have disappeared! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
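The fix implied by the report is to remember the directory's permissions before the destructive recreate and restore them afterwards. A minimal sketch, using the local filesystem as a stand-in for the warehouse directory (the HDFS equivalent would use FileStatus.getPermission() and FileSystem.setPermission(); this is not Hive's actual truncate code path):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Sketch: recreate a directory while preserving its original permissions,
// instead of letting the recreate pick up default/umask permissions.
class PreservePermissionsSketch {
    static Path recreatePreservingPermissions(Path dir) throws IOException {
        // Capture the mode before the directory is dropped.
        Set<PosixFilePermission> original = Files.getPosixFilePermissions(dir);
        Files.delete(dir);           // simulate truncate dropping the directory
        Files.createDirectory(dir);  // the directory comes back...
        Files.setPosixFilePermissions(dir, original);  // ...with its old mode
        return dir;
    }
}
```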