[jira] [Commented] (HIVE-9100) HiveServer2 fail to connect to MetaStore after MetaStore restarting
[ https://issues.apache.org/jira/browse/HIVE-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291782#comment-14291782 ]

Nemon Lou commented on HIVE-9100:
---------------------------------

Mariusz Strzelecki is right. After changing the metastore's TokenStore from memory to DB, the error disappears. Thanks, Mariusz Strzelecki.

HiveServer2 fail to connect to MetaStore after MetaStore restarting
-------------------------------------------------------------------
                Key: HIVE-9100
                URL: https://issues.apache.org/jira/browse/HIVE-9100
            Project: Hive
         Issue Type: Bug
         Components: Authentication, HiveServer2, Security
   Affects Versions: 0.14.0, 0.13.1
           Reporter: Nemon Lou
        Attachments: hiveserver2.log, metastore.log

Secure cluster with Kerberos, remote metastore.

How to reproduce:
1. Use beeline to connect to HiveServer2.
2. Restart the MetaStore process.
3. Type a command like 'show tables' in beeline.

The client side reports this error:
{quote}
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: Peer indicated failure: DIGEST-MD5: IO error acquiring password
        at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:190)
{quote}

HiveServer2's log and metastore's log are uploaded as attachments.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
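The fix the commenter describes — switching the metastore's delegation token store from memory to the database — can be sketched as a hive-site.xml fragment on the metastore side. The property and class names below are the standard Hive settings of that era; they are offered as an assumption about the exact configuration used, not as a value taken from this thread:

```
<!-- Metastore-side hive-site.xml (sketch): persist delegation tokens in the
     metastore DB so they survive a MetaStore restart, instead of the default
     in-memory store that is lost on restart. -->
<property>
  <name>hive.cluster.delegation.token.store.class</name>
  <value>org.apache.hadoop.hive.thrift.DBTokenStore</value>
</property>
```

With the in-memory store, the delegation token HiveServer2 keeps using can no longer be validated after a MetaStore restart, which would explain the "DIGEST-MD5: IO error acquiring password" failure above.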
[jira] [Commented] (HIVE-9388) HiveServer2 fails to reconnect to MetaStore after MetaStore restart
[ https://issues.apache.org/jira/browse/HIVE-9388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291838#comment-14291838 ]

Hive QA commented on HIVE-9388:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12694539/HIVE-9388.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7365 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2521/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2521/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2521/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12694539 - PreCommit-HIVE-TRUNK-Build

HiveServer2 fails to reconnect to MetaStore after MetaStore restart
-------------------------------------------------------------------
                Key: HIVE-9388
                URL: https://issues.apache.org/jira/browse/HIVE-9388
            Project: Hive
         Issue Type: Bug
         Components: HiveServer2
   Affects Versions: 0.12.0, 0.14.0, 0.13.1
           Reporter: Piotr Ackermann
        Attachments: HIVE-9388.patch

How to reproduce:
# Use Hue to connect to HiveServer2
# Restart Metastore
# Try to execute any query in Hue

HiveServer2 reports this error:
{quote}
ERROR hive.log: Got exception: org.apache.thrift.transport.TTransportException null
org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:355)
        at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:432)
        at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:414)
        at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:837)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:90)
        at com.sun.proxy.$Proxy10.getDatabases(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:1681)
        at com.sun.proxy.$Proxy10.getDatabases(Unknown Source)
        at org.apache.hive.service.cli.operation.GetSchemasOperation.run(GetSchemasOperation.java:62)
        at org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:715)
        at org.apache.hive.service.cli.session.HiveSessionImpl.getSchemas(HiveSessionImpl.java:438)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
        at
{quote}
Re: 0.15 release
Brock,

Given there isn't consensus on numbering yet, could you hold off on making the 0.15 branch? We should come to a conclusion on whether we're doing 0.14.1/0.15 or 1.0/1.1 before assigning any more numbers.

Alan.

Brock Noland <br...@cloudera.com>
January 20, 2015 at 21:25

Just a reminder that I plan on branching on 1/26/2015 and starting to roll release candidates on 2/9/2015. After branching I plan on merging only blockers.

Brock

Brock Noland <br...@cloudera.com>
January 12, 2015 at 14:37

Hi,

Projects are instructed in the incubator that releases gain new users and other attention. Additionally, as discussed in this forum, I'd like to increase the tempo of our release process[1]. As such, I plan on following this process:

1) Provide two weeks notice of branching
2) Provide two weeks to find issues on the branch, merging only blockers
3) Roll release candidates until a release vote passes

As such I plan on branching on 1/26/2015 and starting to roll release candidates on 2/9/2015.

Cheers,
Brock

1. Note I am not complaining, as I did not help with releases until this point.
[jira] [Resolved] (HIVE-9100) HiveServer2 fail to connect to MetaStore after MetaStore restarting
[ https://issues.apache.org/jira/browse/HIVE-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan resolved HIVE-9100.
------------------------------------
    Resolution: Invalid

HiveServer2 fail to connect to MetaStore after MetaStore restarting
-------------------------------------------------------------------
                Key: HIVE-9100
                URL: https://issues.apache.org/jira/browse/HIVE-9100
            Project: Hive
         Issue Type: Bug
         Components: Authentication, HiveServer2, Security
           Reporter: Nemon Lou
        Attachments: hiveserver2.log, metastore.log
[jira] [Updated] (HIVE-9100) HiveServer2 fail to connect to MetaStore after MetaStore restarting
[ https://issues.apache.org/jira/browse/HIVE-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-9100:
-----------------------------------
    Affects Version/s:     (was: 0.13.1)
                           (was: 0.14.0)

HiveServer2 fail to connect to MetaStore after MetaStore restarting
-------------------------------------------------------------------
                Key: HIVE-9100
                URL: https://issues.apache.org/jira/browse/HIVE-9100
            Project: Hive
         Issue Type: Bug
         Components: Authentication, HiveServer2, Security
           Reporter: Nemon Lou
        Attachments: hiveserver2.log, metastore.log
[jira] [Commented] (HIVE-9454) Test failures due to new Calcite version
[ https://issues.apache.org/jira/browse/HIVE-9454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292022#comment-14292022 ]

Laljo John Pullokkaran commented on HIVE-9454:
----------------------------------------------

[~brocknoland] we have started analyzing the failures. There is already one Calcite bug filed.

Test failures due to new Calcite version
----------------------------------------
                Key: HIVE-9454
                URL: https://issues.apache.org/jira/browse/HIVE-9454
            Project: Hive
         Issue Type: Bug
           Reporter: Brock Noland
        Attachments: HIVE-9454.1.patch

A bunch of failures have started appearing in patches which seem unrelated. I am thinking we've picked up a new version of Calcite. E.g.:

http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2488/testReport/org.apache.hadoop.hive.cli/TestCliDriver/testCliDriver_auto_join12/

{noformat}
Running: diff -a /home/hiveptest/54.147.202.89-hiveptest-1/apache-svn-trunk-source/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/auto_join12.q.out /home/hiveptest/54.147.202.89-hiveptest-1/apache-svn-trunk-source/itests/qtest/../../ql/src/test/results/clientpositive/auto_join12.q.out
32c32
< $hdt$_0:$hdt$_0:$hdt$_0:$hdt$_0:src
---
> $hdt$_0:$hdt$_0:$hdt$_0:$hdt$_0:$hdt$_0:src
35c35
< $hdt$_0:$hdt$_0:$hdt$_1:$hdt$_1:$hdt$_1:src
---
> $hdt$_0:$hdt$_0:$hdt$_1:$hdt$_1:$hdt$_1:$hdt$_1:src
39c39
< $hdt$_0:$hdt$_0:$hdt$_0:$hdt$_0:src
---
> $hdt$_0:$hdt$_0:$hdt$_0:$hdt$_0:$hdt$_0:src
54c54
< $hdt$_0:$hdt$_0:$hdt$_1:$hdt$_1:$hdt$_1:src
---
> $hdt$_0:$hdt$_0:$hdt$_1:$hdt$_1:$hdt$_1:$hdt$_1:src
{noformat}
Hive-0.14 - Build # 846 - Failure
Changes for Build #846
[ekoifman] HIVE-9361 - Intermittent NPE in SessionHiveMetaStoreClient.alterTempTable

No tests ran.

The Apache Jenkins build system has built Hive-0.14 (build #846)

Status: Failure

Check console output at https://builds.apache.org/job/Hive-0.14/846/ to view the results.
[jira] [Commented] (HIVE-9271) Add ability for client to request metastore to fire an event
[ https://issues.apache.org/jira/browse/HIVE-9271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292243#comment-14292243 ]

Sushanth Sowmyan commented on HIVE-9271:
----------------------------------------

Mostly looks good to me, and has my tentative +1.

One minor thing: I wondered if fire_notification_event shouldn't return a long instead of a void, thus returning the notification id of the fired event - but that's a strict no-no from the perspective of what a MetaStoreEventListener is supposed to be, since a NotificationListener is only one type of listener. That got me thinking - should this method be called fire_event rather than fire_notification_event?

Add ability for client to request metastore to fire an event
------------------------------------------------------------
                Key: HIVE-9271
                URL: https://issues.apache.org/jira/browse/HIVE-9271
            Project: Hive
         Issue Type: New Feature
         Components: Metastore
           Reporter: Alan Gates
           Assignee: Alan Gates
            Fix For: 0.15.0
        Attachments: HIVE-9271.patch

Currently all events in Hive are fired by the metastore. However, there are events that only the client fully understands, such as DML operations. There should be a way for the client to request the metastore to fire a particular event.
[jira] [Updated] (HIVE-9449) Push YARN configuration to Spark while deploying Spark on YARN [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-9449:
------------------------------
     Resolution: Fixed
  Fix Version/s: spark-branch
         Status: Resolved  (was: Patch Available)

Committed to spark branch. Thanks, Chengxiang.

Push YARN configuration to Spark while deploying Spark on YARN [Spark Branch]
-----------------------------------------------------------------------------
                Key: HIVE-9449
                URL: https://issues.apache.org/jira/browse/HIVE-9449
            Project: Hive
         Issue Type: Sub-task
         Components: Spark
           Reporter: Chengxiang Li
           Assignee: Chengxiang Li
            Fix For: spark-branch
        Attachments: HIVE-9449.1-spark.patch, HIVE-9449.1-spark.patch, HIVE-9449.2-spark.patch

We currently push only Spark configuration and RSC configuration to Spark when launching the Spark cluster; for Spark on YARN mode, Spark needs extra YARN configuration to launch the cluster. Besides this, to support dynamic configuration setting for RSC/YARN configuration, we need to recreate the SparkSession when the RSC or YARN configuration is updated, as these may influence the Spark cluster deployment as well.
[jira] [Updated] (HIVE-3280) Make HiveMetaStoreClient a public API
[ https://issues.apache.org/jira/browse/HIVE-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated HIVE-3280:
--------------------------------
     Resolution: Fixed
  Fix Version/s: 0.14.1
         Status: Resolved  (was: Patch Available)

Patch committed to trunk, 0.14 and branch-1.

Make HiveMetaStoreClient a public API
-------------------------------------
                Key: HIVE-3280
                URL: https://issues.apache.org/jira/browse/HIVE-3280
            Project: Hive
         Issue Type: Improvement
         Components: Metastore
           Reporter: Carl Steinbach
           Assignee: Thejas M Nair
             Labels: api-addition
            Fix For: 0.14.1
        Attachments: HIVE-3280.1.patch
Hive-0.14 - Build # 847 - Fixed
Changes for Build #846
[ekoifman] HIVE-9361 - Intermittent NPE in SessionHiveMetaStoreClient.alterTempTable

Changes for Build #847
[thejas] HIVE-3280 : Make HiveMetaStoreClient a public API (Thejas Nair, reviewed by Alan Gates)

No tests ran.

The Apache Jenkins build system has built Hive-0.14 (build #847)

Status: Fixed

Check console output at https://builds.apache.org/job/Hive-0.14/847/ to view the results.
[jira] [Updated] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-9333:
------------------------------
    Attachment: HIVE-9333.1.patch

Move parquet serialize implementation to DataWritableWriter to improve write speeds
-----------------------------------------------------------------------------------
                Key: HIVE-9333
                URL: https://issues.apache.org/jira/browse/HIVE-9333
            Project: Hive
         Issue Type: Sub-task
           Reporter: Sergio Peña
           Assignee: Sergio Peña
        Attachments: HIVE-9333.1.patch

The serialize process on ParquetHiveSerDe parses a Hive object to a Writable object by looping through all the Hive object children, creating new Writable objects per child. These final Writable objects are passed to the Parquet writing function and parsed again in the DataWritableWriter class by looping through the ArrayWritable object. These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write()) may be reduced to a single loop inside the DataWritableWriter.write() method in order to speed up the Parquet writing process in Hive.

To achieve this, we can wrap the Hive object and object inspector in the ParquetHiveSerDe.serialize() method in an object that implements the Writable interface, thus avoiding the loop that serialize() does, and leaving the loop parsing to the DataWritableWriter.write() method. We can see how ORC does this with the OrcSerde.OrcSerdeRow class.

Writable objects are organized differently across storage formats, so I don't think it is necessary to create and keep the Writable objects in the serialize() method, as they won't be used until the writing process starts (DataWritableWriter.write()). We might save a large share of the extra serialization time by making this change. This performance issue was found using microbenchmark tests from HIVE-8121.
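The "wrap instead of copy" pattern described above can be sketched in plain Java. The names below (RowWrapper, FieldInspector) are illustrative stand-ins for Hive's Writable and ObjectInspector machinery, not the classes in the actual HIVE-9333 patch:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: serialize() wraps the row lazily, and write()
// performs the one and only per-field loop.
public class RowWrapperSketch {

    // Stand-in for an ObjectInspector: knows how to enumerate a row's fields.
    interface FieldInspector {
        List<Object> fields(Object row);
    }

    // Stand-in for serialize()'s output: no per-field conversion happens here,
    // the row and its inspector are merely carried along.
    static final class RowWrapper {
        final Object row;
        final FieldInspector inspector;

        RowWrapper(Object row, FieldInspector inspector) {
            this.row = row;
            this.inspector = inspector;
        }
    }

    // Stand-in for write(): the single loop, converting each field exactly once.
    static List<String> write(RowWrapper wrapped, Function<Object, String> convert) {
        return wrapped.inspector.fields(wrapped.row).stream()
                .map(convert)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        FieldInspector listInspector = row -> (List<Object>) row;
        RowWrapper wrapped = new RowWrapper(Arrays.asList(1, "a", 2.5), listInspector);
        // One pass over the fields, deferred until write time:
        System.out.println(write(wrapped, String::valueOf)); // [1, a, 2.5]
    }
}
```

The eager alternative would convert every field inside serialize() and then loop over the converted copies again in write(); deferring the conversion removes one full pass and one intermediate row copy, which is the saving the benchmark numbers in the review request measure.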
[jira] [Updated] (HIVE-9396) date_add()/date_sub() should allow tinyint/smallint/bigint arguments in addition to int
[ https://issues.apache.org/jira/browse/HIVE-9396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere updated HIVE-9396:
-----------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I've committed this to trunk, thanks for doing this [~spena].

date_add()/date_sub() should allow tinyint/smallint/bigint arguments in addition to int
---------------------------------------------------------------------------------------
                Key: HIVE-9396
                URL: https://issues.apache.org/jira/browse/HIVE-9396
            Project: Hive
         Issue Type: Bug
         Components: UDF
           Reporter: Jason Dere
           Assignee: Sergio Peña
        Attachments: HIVE-9396.3.patch, HIVE-9396.4.patch

{noformat}
hive> select c1, date_add('1985-01-01', c1) from short1;
FAILED: SemanticException [Error 10014]: Line 1:11 Wrong arguments 'c1': DATE_ADD() only takes INT types as second argument, got SHORT
{noformat}

We should allow date_add()/date_sub() to take any integral type for the 2nd argument, rather than just int.
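On versions without this fix, a workaround consistent with the error message above is to cast the second argument to int explicitly (a sketch; `short1` and `c1` are the example table and column from the report):

```
-- Hypothetical workaround for affected versions: satisfy DATE_ADD()'s
-- INT-only second argument by casting the smallint column explicitly.
SELECT c1, date_add('1985-01-01', CAST(c1 AS INT)) FROM short1;
```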
Review Request 30281: Move parquet serialize implementation to DataWritableWriter to improve write speeds
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30281/
---

Review request for hive, Ryan Blue, cheng xu, and Dong Chen.

Bugs: HIVE-9333
    https://issues.apache.org/jira/browse/HIVE-9333

Repository: hive-git

Description
-----------
This patch moves the ParquetHiveSerDe.serialize() implementation to the DataWritableWriter class in order to save time in materializing data on serialize().

Diffs
-----
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java ea4109d358f7c48d1e2042e5da299475de4a0a29
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 9caa4ed169ba92dbd863e4a2dc6d06ab226a4465
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java 060b1b722d32f3b2f88304a1a73eb249e150294b
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 41b5f1c3b0ab43f734f8a211e3e03d5060c75434
  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java e52c4bc0b869b3e60cb4bfa9e11a09a0d605ac28
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java a693aff18516d133abf0aae4847d3fe00b9f1c96
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetOutputFormat.java 667d3671547190d363107019cd9a2d105d26d336
  ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 007a665529857bcec612f638a157aa5043562a15
  serde/src/java/org/apache/hadoop/hive/serde2/io/ParquetWritable.java PRE-CREATION

Diff: https://reviews.apache.org/r/30281/diff/

Testing
-------
The tests run were the following:

1. JMH (Java microbenchmark)

This benchmark called parquet serialize/write methods using text writable objects.

Class.method                 Before change (ops/s)   After change (ops/s)
ParquetHiveSerDe.serialize   19,113                  249,528  (~13x speed increase)
DataWritableWriter.write     5,033                   5,201    (3.34% speed increase)

2. Write 20 million rows (~1 GB file) from Text to Parquet

I wrote a ~1 GB file in Textfile format, then converted it to Parquet format using the following statement:

CREATE TABLE parquet STORED AS parquet AS SELECT * FROM text;

Time (s) to write the whole file BEFORE changes: 93.758 s
Time (s) to write the whole file AFTER changes: 83.903 s

That is about a 10% speed increase.

Thanks,
Sergio Pena
[jira] [Updated] (HIVE-9317) move Microsoft copyright to NOTICE file
[ https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HIVE-9317:
--------------------------------
    Attachment: hive-9327.txt

This patch changes no code; it just puts the required Apache header on the source files and moves Microsoft's copyright notice to the NOTICE file.

move Microsoft copyright to NOTICE file
---------------------------------------
                Key: HIVE-9317
                URL: https://issues.apache.org/jira/browse/HIVE-9317
            Project: Hive
         Issue Type: Bug
           Reporter: Owen O'Malley
            Fix For: 0.15.0
        Attachments: hive-9327.txt

There are a set of files that still have the Microsoft copyright notices. Those notices need to be moved into NOTICES and replaced with the standard Apache headers.

{code}
./common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java
./common/src/java/org/apache/hadoop/hive/common/type/SignedInt128.java
./common/src/java/org/apache/hadoop/hive/common/type/SqlMathUtil.java
./common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java
./common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java
./common/src/test/org/apache/hadoop/hive/common/type/TestSignedInt128.java
./common/src/test/org/apache/hadoop/hive/common/type/TestSqlMathUtil.java
./common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java
{code}
[jira] [Created] (HIVE-9468) Test groupby3_map_skew.q fails due to decimal precision difference
Xuefu Zhang created HIVE-9468:
---------------------------------

             Summary: Test groupby3_map_skew.q fails due to decimal precision difference
                 Key: HIVE-9468
                 URL: https://issues.apache.org/jira/browse/HIVE-9468
             Project: Hive
          Issue Type: Bug
          Components: Tests
            Reporter: Xuefu Zhang

From test run http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/682/testReport:

{code}
Running: diff -a /home/hiveptest/54.177.132.58-hiveptest-1/apache-svn-spark-source/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/groupby3_map_skew.q.out /home/hiveptest/54.177.132.58-hiveptest-1/apache-svn-spark-source/itests/qtest/../../ql/src/test/results/clientpositive/groupby3_map_skew.q.out
162c162
< 130091.0  260.182  256.10355987055016  98.0  0.0  142.92680950752379  143.06995106518903  20428.07288  20469.0109
---
> 130091.0  260.182  256.10355987055016  98.0  0.0  142.9268095075238  143.06995106518906  20428.07288  20469.0109
{code}
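The differing low-order digits above (142.92680950752379 vs. 142.9268095075238) are characteristic of double arithmetic performed in a different order. A minimal illustration in plain Java, unrelated to the test itself:

```java
public class FloatOrder {
    public static void main(String[] args) {
        // IEEE-754 double addition is not associative, so a different
        // reduction order (e.g. map-side partial aggregates vs. a skewed
        // group-by plan) can legitimately change the last digits of
        // avg/variance results.
        double left = (0.1 + 0.2) + 0.3;
        double right = 0.1 + (0.2 + 0.3);
        System.out.println(left);          // 0.6000000000000001
        System.out.println(right);         // 0.6
        System.out.println(left == right); // false
    }
}
```

This is why such golden-file mismatches are usually fixed by masking or rounding the affected columns in the expected output rather than by changing the computation.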
[jira] [Commented] (HIVE-9317) move Microsoft copyright to NOTICE file
[ https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292218#comment-14292218 ]

Alan Gates commented on HIVE-9317:
----------------------------------

+1.

move Microsoft copyright to NOTICE file
---------------------------------------
                Key: HIVE-9317
                URL: https://issues.apache.org/jira/browse/HIVE-9317
            Project: Hive
         Issue Type: Bug
           Reporter: Owen O'Malley
           Assignee: Owen O'Malley
           Priority: Blocker
            Fix For: 0.15.0
        Attachments: hive-9327.txt
[jira] [Created] (HIVE-9469) Hive Thrift Server throws Socket Timeout Exception: Read time out
Manish Malhotra created HIVE-9469:
-------------------------------------

             Summary: Hive Thrift Server throws Socket Timeout Exception: Read time out
                 Key: HIVE-9469
                 URL: https://issues.apache.org/jira/browse/HIVE-9469
             Project: Hive
          Issue Type: Bug
          Components: Metastore
    Affects Versions: 0.10.0
         Environment: 4 core CPU, 15 GB memory, 2 Thrift servers behind a load balancer
            Reporter: Manish Malhotra

Hi All,

Please review the following problem. I also posted the same in the hive-user group, but didn't get any response yet. This is happening quite frequently in our environment, so it would be great if somebody could take a look and advise.

I'm using the Hive Thrift Server in production, which at peak handles around 500 req/min. After a certain point the Hive Thrift Server goes into a no-response mode and throws the following exception:

org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out

As the metastore we are using MySQL, which is accessed by the Thrift server. The design/architecture is like this:

Oozie -> Hive Action -> ELB (AWS) -> Hive Thrift (2 servers) -> MySQL (Master) -> MySQL (Slave)

Software versions:
Hive: 0.10.0
Hadoop: 1.2.1

It looks like when the load goes beyond some threshold, certain operations have problems responding. As Hive jobs sometimes fail because of this issue, we also have an auto-restart check that stops/kills and restarts the service if the Thrift server is not responding.

Other tuning done:
Thrift Server: given an 11 GB heap, and configured the CMS GC algorithm.
MySQL: tuned the innodb_buffer, tmp_table and max_heap parameters.

So, can somebody please help me understand what the root cause could be, or say whether they have faced a similar issue?

I found one related JIRA: https://issues.apache.org/jira/browse/HCATALOG-541
But that JIRA shows the Hive Thrift Server hitting an OOM error, and I didn't see any OOM error in my case.
Regards,
Manish

Full exception stack:

        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:412)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:399)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:736)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
        at $Proxy7.getDatabase(Unknown Source)
        at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1110)
        at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1099)
        at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2206)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:150)
        at
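Since the reads that time out here are metastore Thrift calls, one knob worth checking under load (a sketch of a mitigation, not a confirmed fix for this report) is the client-side metastore socket timeout in hive-site.xml; in Hive of this era the value is a number of seconds:

```
<!-- Client-side hive-site.xml (sketch): raise the metastore Thrift socket
     read timeout so slow metastore responses under peak load are not cut
     off as SocketTimeoutException: Read timed out. -->
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>600</value>
</property>
```

A larger timeout only masks the symptom if the metastore or MySQL is the actual bottleneck, so it is best paired with metastore-side thread and DB connection pool tuning.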
[jira] [Updated] (HIVE-9317) move Microsoft copyright to NOTICE file
[ https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HIVE-9317:
--------------------------------
    Priority: Blocker  (was: Major)

move Microsoft copyright to NOTICE file
---------------------------------------
                Key: HIVE-9317
                URL: https://issues.apache.org/jira/browse/HIVE-9317
[jira] [Assigned] (HIVE-9317) move Microsoft copyright to NOTICE file
[ https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley reassigned HIVE-9317:
-----------------------------------
    Assignee: Owen O'Malley

move Microsoft copyright to NOTICE file
---------------------------------------
                Key: HIVE-9317
                URL: https://issues.apache.org/jira/browse/HIVE-9317
[jira] [Updated] (HIVE-9317) move Microsoft copyright to NOTICE file
[ https://issues.apache.org/jira/browse/HIVE-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-9317: Status: Patch Available (was: Open) move Microsoft copyright to NOTICE file --- Key: HIVE-9317 URL: https://issues.apache.org/jira/browse/HIVE-9317 Project: Hive Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley Priority: Blocker Fix For: 0.15.0 Attachments: hive-9327.txt There are a set of files that still have the Microsoft copyright notices. Those notices need to be moved into NOTICES and replaced with the standard Apache headers. {code} ./common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java ./common/src/java/org/apache/hadoop/hive/common/type/SignedInt128.java ./common/src/java/org/apache/hadoop/hive/common/type/SqlMathUtil.java ./common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java ./common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java ./common/src/test/org/apache/hadoop/hive/common/type/TestSignedInt128.java ./common/src/test/org/apache/hadoop/hive/common/type/TestSqlMathUtil.java ./common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9428) LocalSparkJobStatus may return failed job as successful [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1429#comment-1429 ] Xuefu Zhang commented on HIVE-9428: --- +1. groupby3_map_skew.q failure doesn't seem related. Filed HIVE-9468 for that. LocalSparkJobStatus may return failed job as successful [Spark Branch] -- Key: HIVE-9428 URL: https://issues.apache.org/jira/browse/HIVE-9428 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-9428.1-spark.patch, HIVE-9428.2-spark.patch, HIVE-9428.3-spark.patch A Future being done doesn't necessarily mean the job succeeded. We should rely on SparkJobInfo to get the job status whenever it's available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9428) LocalSparkJobStatus may return failed job as successful [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9428: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Committed to Spark branch. Thanks, Rui. LocalSparkJobStatus may return failed job as successful [Spark Branch] -- Key: HIVE-9428 URL: https://issues.apache.org/jira/browse/HIVE-9428 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Minor Fix For: spark-branch Attachments: HIVE-9428.1-spark.patch, HIVE-9428.2-spark.patch, HIVE-9428.3-spark.patch A Future being done doesn't necessarily mean the job succeeded. We should rely on SparkJobInfo to get the job status whenever it's available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
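The pitfall described above is easy to reproduce with plain java.util.concurrent; the sketch below is illustrative only and is not LocalSparkJobStatus code. A Future reports isDone() == true even when the task threw, so "done" alone says nothing about success:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureDoneDemo {
    // Returns true iff the future completed (isDone) AND the task failed,
    // showing that a completed Future is not the same as a successful job.
    public static boolean doneButFailed() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<Integer> failingJob = () -> {
            throw new IllegalStateException("job failed");
        };
        Future<Integer> f = pool.submit(failingJob);
        boolean failed = false;
        try {
            f.get(); // only get() (or a job-status API) surfaces the failure
        } catch (ExecutionException e) {
            failed = true;
        }
        pool.shutdown();
        return f.isDone() && failed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(doneButFailed()); // prints: true
    }
}
```

This is why checking SparkJobInfo for the actual job state, rather than Future completion, is the safer signal.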
[jira] [Commented] (HIVE-9468) Test groupby3_map_skew.q fails due to decimal precision difference
[ https://issues.apache.org/jira/browse/HIVE-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292230#comment-14292230 ] Xuefu Zhang commented on HIVE-9468: --- udaf_percentile_approx_23.q is another instance of the problem. Test groupby3_map_skew.q fails due to decimal precision difference -- Key: HIVE-9468 URL: https://issues.apache.org/jira/browse/HIVE-9468 Project: Hive Issue Type: Bug Components: Tests Reporter: Xuefu Zhang From the test run, http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/682/testReport:
{code}
Running: diff -a /home/hiveptest/54.177.132.58-hiveptest-1/apache-svn-spark-source/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/groupby3_map_skew.q.out /home/hiveptest/54.177.132.58-hiveptest-1/apache-svn-spark-source/itests/qtest/../../ql/src/test/results/clientpositive/groupby3_map_skew.q.out
162c162
< 130091.0  260.182  256.10355987055016  98.0  0.0  142.92680950752379  143.06995106518903  20428.07288  20469.0109
---
> 130091.0  260.182  256.10355987055016  98.0  0.0  142.9268095075238  143.06995106518906  20428.07288  20469.0109
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
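Differences like the ones above are characteristic of floating-point aggregation: a map-side skew plan sums partial aggregates in a different order than a single reducer, and double addition is not associative, so the last digits can drift. A standalone Java illustration (not Hive code):

```java
public class FpSumOrder {
    // Sum left-to-right. Floating-point addition is not associative, so a
    // different grouping or order of the same values can round differently.
    public static double sum(double[] xs) {
        double acc = 0.0;
        for (double x : xs) acc += x;
        return acc;
    }

    public static double[] reversed(double[] xs) {
        double[] r = new double[xs.length];
        for (int i = 0; i < xs.length; i++) r[i] = xs[xs.length - 1 - i];
        return r;
    }

    public static void main(String[] args) {
        double[] xs = {1e16, 1.0, 1.0, 1.0, 1.0};
        double fwd = sum(xs);           // each +1.0 is absorbed by 1e16
        double rev = sum(reversed(xs)); // the small values accumulate first
        System.out.println(fwd == rev); // prints: false
        System.out.println(fwd + " vs " + rev);
    }
}
```

The same effect in the test's AVG/STDDEV columns means golden-file comparisons of full decimal expansions are fragile across execution plans.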
[jira] [Updated] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9333: -- Status: Patch Available (was: In Progress) Attaching the patch to run the tests. I also attached a link to Review Board for code review. Move parquet serialize implementation to DataWritableWriter to improve write speeds --- Key: HIVE-9333 URL: https://issues.apache.org/jira/browse/HIVE-9333 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9333.1.patch The serialize process on ParquetHiveSerDe parses a Hive object into a Writable object by looping through all the Hive object's children and creating new Writable objects per child. These final Writable objects are passed to the Parquet writing function and parsed again in the DataWritableWriter class by looping through the ArrayWritable object. These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write()) may be reduced to a single loop inside the DataWritableWriter.write() method in order to increase the write speed of Hive Parquet. To achieve this, we can wrap the Hive object and object inspector in the ParquetHiveSerDe.serialize() method in an object that implements the Writable interface, thus avoiding the loop that serialize() does and leaving the field parsing to the DataWritableWriter.write() method. We can see how ORC does this with the OrcSerde.OrcSerdeRow class. Writable objects are organized differently across storage formats, so I don't think it is necessary to create and keep the Writable objects in the serialize() method, as they won't be used until the writing process starts (DataWritableWriter.write()). We might save 200% of extra time by making such a change. This performance issue was found using the microbenchmark tests from HIVE-8121. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
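The proposed wrapper pattern (cf. OrcSerde.OrcSerdeRow) can be sketched in plain Java. Everything below is illustrative: the class, field, and method names are invented, and a real implementation would implement Hadoop's Writable and walk an ObjectInspector rather than a list of field names.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the single-loop idea: serialize() no longer copies every child
// into new Writable objects; it only wraps the row and its inspector, and
// the writer performs the one and only loop over the fields.
public class LazyRowExample {

    // Stand-in for "row + ObjectInspector" wrapped in a Writable.
    public static final class RowHolder {
        final List<Object> row;
        final List<String> fields; // stand-in for the inspector
        RowHolder(List<Object> row, List<String> fields) {
            this.row = row;
            this.fields = fields;
        }
    }

    // Before: serialize() looped over all children here. After: O(1) wrap.
    public static RowHolder serialize(List<Object> row, List<String> fields) {
        return new RowHolder(row, fields);
    }

    // The per-field work happens exactly once, at write time.
    public static int write(RowHolder holder, StringBuilder out) {
        int written = 0;
        for (int i = 0; i < holder.fields.size(); i++) {
            out.append(holder.fields.get(i)).append('=')
               .append(holder.row.get(i)).append(';');
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        RowHolder h = serialize(Arrays.<Object>asList(1, "a"),
                                Arrays.asList("id", "name"));
        StringBuilder out = new StringBuilder();
        System.out.println(write(h, out) + " fields: " + out);
    }
}
```

The point of the design is that serialize() becomes constant-time and the two passes over each row collapse into one, inside the writer.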
Re: 0.15 release
From a different perspective, the 0.14.1/0.15 proposal allows us to release independently and concurrently. Once they are released, we can have a consented 1.0 release. On the other hand, 1.0/1.1 would force us to wait to release 1.1 after 1.0 is released. This dependency seems artificial and can be avoided. Thanks, Xuefu On Mon, Jan 26, 2015 at 9:17 AM, Brock Noland br...@cloudera.com wrote: Hi Alan, In all of my experience at Apache, I have been encouraged to release. Contributors rightly want to see their hard work get into the hands of the users. That's why they contribute, after all. Many contributors who have features in trunk would like to get those features out into the community. This is completely reasonable of them. After all, they've invested significant time in this work. Thus I don't feel we should delay getting their contributions released while we debate 1.0. The two have nothing to do with each other. I've mentioned on the list and in person to Thejas that I wanted this release to specifically avoid the 1.x discussion so it did not get bogged down in the 1.x discussion. Again, this is completely reasonable. In short, everything I have experienced at Apache indicates that the folks who want to release 0.15 should be free to do the work to make that happen. Brock On Mon, Jan 26, 2015 at 7:02 AM, Alan Gates ga...@hortonworks.com wrote: Brock, Given there isn't consensus on numbering yet, could you hold off making the 0.15 branch? We should come to a conclusion on whether we're doing 0.14.1/0.15 or 1.0/1.1 before assigning any more numbers. Alan. Brock Noland br...@cloudera.com January 20, 2015 at 21:25 Just a reminder that I plan on branching on 1/26/2015 and start rolling release candidates on 2/9/2015. After branching I plan on merging only blockers. Brock Brock Noland br...@cloudera.com January 12, 2015 at 14:37 Hi, Projects are instructed in the incubator that releases gain new users and other attention. 
Additionally, as discussed in this forum I'd like to increase the tempo of our release process[1]. As such, I plan on following this process: 1) Provide two weeks notice of branching 2) Provide two weeks to find issues on the branch and merging only blockers 3) Roll release candidates until a release vote passes As such I plan on branching on 1/26/2015 and start rolling release candidates on 2/9/2015. Cheers, Brock 1. Note I am not complaining as I did not help with releases until this point. CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
[jira] [Updated] (HIVE-9397) SELECT max(bar) FROM foo is broken after ANALYZE ... FOR COLUMNS
[ https://issues.apache.org/jira/browse/HIVE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-9397: -- Assignee: (was: Gopal V) SELECT max(bar) FROM foo is broken after ANALYZE ... FOR COLUMNS Key: HIVE-9397 URL: https://issues.apache.org/jira/browse/HIVE-9397 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.15.0 Reporter: Damien Carol These queries produce an error:
{code:sql}
DROP TABLE IF EXISTS foo;
CREATE TABLE foo (id int) STORED AS ORC;
INSERT INTO TABLE foo VALUES (1);
INSERT INTO TABLE foo VALUES (2);
INSERT INTO TABLE foo VALUES (3);
INSERT INTO TABLE foo VALUES (4);
INSERT INTO TABLE foo VALUES (5);
SELECT max(id) FROM foo;
ANALYZE TABLE foo COMPUTE STATISTICS FOR COLUMNS id;
SELECT max(id) FROM foo;
{code}
The last query throws {{org.apache.hive.service.cli.HiveSQLException}}
{noformat}
0: jdbc:hive2://nc-h04:1/casino> SELECT max(id) FROM foo;
+-+--+
| _c0 |
+-+--+
org.apache.hive.service.cli.HiveSQLException: java.lang.ClassCastException
0: jdbc:hive2://nc-h04:1/casino>
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9467) ORC - sort dictionary streams to the end of the stripe
Owen O'Malley created HIVE-9467: --- Summary: ORC - sort dictionary streams to the end of the stripe Key: HIVE-9467 URL: https://issues.apache.org/jira/browse/HIVE-9467 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley When reading ORC files, it would be convenient to group the dictionary streams at the end of the stripe. This would allow the reader to use fewer read operations if they want to load the dictionaries before they load the data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: 0.15 release
Hi Alan, In all of my experience at Apache, I have been encouraged to release. Contributors rightly want to see their hard work get into the hands of the users. That's why they contribute, after all. Many contributors who have features in trunk would like to get those features out into the community. This is completely reasonable of them. After all, they've invested significant time in this work. Thus I don't feel we should delay getting their contributions released while we debate 1.0. The two have nothing to do with each other. I've mentioned on the list and in person to Thejas that I wanted this release to specifically avoid the 1.x discussion so it did not get bogged down in the 1.x discussion. Again, this is completely reasonable. In short, everything I have experienced at Apache indicates that the folks who want to release 0.15 should be free to do the work to make that happen. Brock On Mon, Jan 26, 2015 at 7:02 AM, Alan Gates ga...@hortonworks.com wrote: Brock, Given there isn't consensus on numbering yet, could you hold off making the 0.15 branch? We should come to a conclusion on whether we're doing 0.14.1/0.15 or 1.0/1.1 before assigning any more numbers. Alan. Brock Noland br...@cloudera.com January 20, 2015 at 21:25 Just a reminder that I plan on branching on 1/26/2015 and start rolling release candidates on 2/9/2015. After branching I plan on merging only blockers. Brock Brock Noland br...@cloudera.com January 12, 2015 at 14:37 Hi, Projects are instructed in the incubator that releases gain new users and other attention. Additionally, as discussed in this forum I'd like to increase the tempo of our release process[1]. As such, I plan on following this process: 1) Provide two weeks notice of branching 2) Provide two weeks to find issues on the branch and merging only blockers 3) Roll release candidates until a release vote passes As such I plan on branching on 1/26/2015 and start rolling release candidates on 2/9/2015. Cheers, Brock 1. 
Note I am not complaining as I did not help with releases until this point.
[jira] [Updated] (HIVE-9361) Intermittent NPE in SessionHiveMetaStoreClient.alterTempTable
[ https://issues.apache.org/jira/browse/HIVE-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-9361: - Resolution: Fixed Fix Version/s: 1.0.0 0.14.1 0.15.0 Status: Resolved (was: Patch Available) Intermittent NPE in SessionHiveMetaStoreClient.alterTempTable - Key: HIVE-9361 URL: https://issues.apache.org/jira/browse/HIVE-9361 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.15.0, 0.14.1, 1.0.0 Attachments: HIVE-9361.patch It's happening at {noformat} MetaStoreUtils.updateUnpartitionedTableStatsFast(newtCopy, wh.getFileStatusesForSD(newtCopy.getSd()), false, true); {noformat} Other methods in this class call getWh() to get the Warehouse, which likely explains why this one is intermittent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: 0.15 release
I'm not asking you to slow the work for the release. If you want to branch, you can branch. What I'm asking is to hold off on the numbering scheme. So you could name the branch 'next' or something and then rename once we come to agreement. Consensus in the community is important, and we should avoid things that make that consensus harder. Alan. Brock Noland mailto:br...@cloudera.com January 26, 2015 at 9:17 Hi Alan, In all of my experience at Apache, I have been encouraged to release. Contributors rightly want to see their hard work get into the hands of the users. That's why they contribute, after all. Many contributors who have features in trunk would like to get those features out into the community. This is completely reasonable of them. After all, they've invested significant time in this work. Thus I don't feel we should delay getting their contributions released while we debate 1.0. The two have nothing to do with each other. I've mentioned on the list and in person to Thejas that I wanted this release to specifically avoid the 1.x discussion so it did not get bogged down in the 1.x discussion. Again, this is completely reasonable. In short, everything I have experienced at Apache indicates that the folks who want to release 0.15 should be free to do the work to make that happen. Brock Alan Gates mailto:ga...@hortonworks.com January 26, 2015 at 7:02 Brock, Given there isn't consensus on numbering yet, could you hold off making the 0.15 branch? We should come to a conclusion on whether we're doing 0.14.1/0.15 or 1.0/1.1 before assigning any more numbers. Alan. Brock Noland mailto:br...@cloudera.com January 20, 2015 at 21:25 Just a reminder that I plan on branching on 1/26/2015 and start rolling release candidates on 2/9/2015. After branching I plan on merging only blockers. Brock Brock Noland mailto:br...@cloudera.com January 12, 2015 at 14:37 Hi, Projects are instructed in the incubator that releases gain new users and other attention. 
Additionally, as discussed in this forum I'd like to increase the tempo of our release process[1]. As such, I plan on following this process: 1) Provide two weeks notice of branching 2) Provide two weeks to find issues on the branch and merging only blockers 3) Roll release candidates until a release vote passes As such I plan on branching on 1/26/2015 and start rolling release candidates on 2/9/2015. Cheers, Brock 1. Note I am not complaining as I did not help with releases until this point.
[jira] [Commented] (HIVE-9448) Merge spark to trunk 1/23/15
[ https://issues.apache.org/jira/browse/HIVE-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292199#comment-14292199 ] Szehon Ho commented on HIVE-9448: - Test failure doesn't look related. Committed to trunk, thanks Xuefu for the review. Merge spark to trunk 1/23/15 Key: HIVE-9448 URL: https://issues.apache.org/jira/browse/HIVE-9448 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 0.15.0 Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-9448.2.patch, HIVE-9448.3.patch, HIVE-9448.patch Merging latest spark changes to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9447) Metastore: inefficient Oracle query for removing unused column descriptors when add/drop table/partition
[ https://issues.apache.org/jira/browse/HIVE-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292449#comment-14292449 ] Selina Zhang commented on HIVE-9447: The unit test failures seem unrelated to this patch. Metastore: inefficient Oracle query for removing unused column descriptors when add/drop table/partition Key: HIVE-9447 URL: https://issues.apache.org/jira/browse/HIVE-9447 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Selina Zhang Assignee: Selina Zhang Attachments: HIVE-9447.1.patch Original Estimate: 3h Remaining Estimate: 3h The Metastore needs to remove unused column descriptors when dropping/adding partitions or tables. To query for an unused column descriptor, the current implementation uses DataNucleus' range function, which is basically LIMIT syntax. However, Oracle does not support LIMIT, so the query is converted to {quote} SQL SELECT * FROM (SELECT subq.*,ROWNUM rn FROM (SELECT 'org.apache.hadoop.hive.metastore.model.MStorageDescriptor' AS NUCLEUS_TYPE,A0.INPUT_FORMAT,A0.IS_COMPRESSED,A0.IS_STOREDASSUBDIRECTORIES,A0.LOCATION, A0.NUM_BUCKETS,A0.OUTPUT_FORMAT,A0.SD_ID FROM drhcat.SDS A0 WHERE A0.CD_ID = ? ) subq ) WHERE rn = 1; {quote} Given that CD_ID is not very selective, this query may have to access a large number of rows (depending on how many partitions the table has; millions of rows in our case), and the Metastore may become unresponsive because of this. Since the Metastore only needs to know whether the specific CD_ID is referenced in the SDS table, it does not need to access the whole row. We can use {quote} select count(1) from SDS where SDS.CD_ID=? {quote} CD_ID is an indexed column, so the above query will do an index range scan, which is faster. For other DBs that support LIMIT syntax, such as MySQL, this problem does not exist; however, the new query does not hurt there either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9470) Use a generic writable object to run ColumnaStorageBench write/read tests
[ https://issues.apache.org/jira/browse/HIVE-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9470: -- Status: Patch Available (was: Open) Use a generic writable object to run ColumnaStorageBench write/read tests -- Key: HIVE-9470 URL: https://issues.apache.org/jira/browse/HIVE-9470 Project: Hive Issue Type: Improvement Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9470.1.patch The ColumnarStorageBench benchmark class is using a Parquet writable object to run all write/read/serialize/deserialize tests. It would be better to use a more generic writable object (like text writables) to get fairer benchmark comparisons between storage formats. Using Parquet writables may give Parquet an advantage when writing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9448) Merge spark to trunk 1/23/15
[ https://issues.apache.org/jira/browse/HIVE-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292487#comment-14292487 ] Lefty Leverenz commented on HIVE-9448: -- Doc note: This adds seven configuration parameters to HiveConf.java and changes the description of another one (see HIVE-9337 for branch commit) so they all need to be documented in the wiki. A new section for Spark should be created in Configuration Properties for these parameters. * hive.spark.client.future.timeout (new description) * hive.spark.job.monitor.timeout * hive.spark.client.connect.timeout * hive.spark.client.server.connect.timeout * hive.spark.client.secret.bits * hive.spark.client.rpc.threads * hive.spark.client.rpc.max.size * hive.spark.client.channel.log.level * [Configuration Properties | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties] Additional documentation is probably needed in the Spark wikidoc for other changes in this patch and in HIVE-9257: * [Hive on Spark: Getting Started | https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started] Merge spark to trunk 1/23/15 Key: HIVE-9448 URL: https://issues.apache.org/jira/browse/HIVE-9448 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 0.15.0 Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC15 Fix For: 0.15.0 Attachments: HIVE-9448.2.patch, HIVE-9448.3.patch, HIVE-9448.patch Merging latest spark changes to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9337) Move more hive.spark.* configurations to HiveConf [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292495#comment-14292495 ] Lefty Leverenz commented on HIVE-9337: -- HIVE-9448 merged these configuration parameters from the Spark branch to trunk. Move more hive.spark.* configurations to HiveConf [Spark Branch] Key: HIVE-9337 URL: https://issues.apache.org/jira/browse/HIVE-9337 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Szehon Ho Assignee: Szehon Ho Priority: Blocker Labels: TODOC-SPARK Fix For: spark-branch Attachments: HIVE-9337-spark.patch, HIVE-9337.2-spark.patch Some hive.spark configurations have been added to HiveConf, but there are some like hive.spark.log.dir that are not there. Also some configurations in RpcConfiguration.java might be eligible to be moved. Without this, these configurations cannot be set dynamically via Hive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9396) date_add()/date_sub() should allow tinyint/smallint/bigint arguments in addition to int
[ https://issues.apache.org/jira/browse/HIVE-9396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-9396: - Fix Version/s: 0.15.0 date_add()/date_sub() should allow tinyint/smallint/bigint arguments in addition to int --- Key: HIVE-9396 URL: https://issues.apache.org/jira/browse/HIVE-9396 Project: Hive Issue Type: Bug Components: UDF Reporter: Jason Dere Assignee: Sergio Peña Fix For: 0.15.0 Attachments: HIVE-9396.3.patch, HIVE-9396.4.patch
{noformat}
hive> select c1, date_add('1985-01-01', c1) from short1;
FAILED: SemanticException [Error 10014]: Line 1:11 Wrong arguments 'c1': DATE_ADD() only takes INT types as second argument, got SHORT
{noformat}
We should allow date_add()/date_sub() to take any integral type for the 2nd argument, rather than just int. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
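The requested behavior, accepting any integral width for the second argument, can be illustrated outside Hive with plain Java. This is not GenericUDFDateAdd's actual code; the helper name is invented, and the point is only that tinyint/smallint/int/bigint (byte/short/int/long) all widen cleanly to a day count:

```java
import java.time.LocalDate;

// Illustrative sketch: any integral Java type is accepted via Number and
// widened to long before the date arithmetic, mirroring the idea that
// DATE_ADD() should take any integral type, not just int.
public class DateAddWidening {
    public static LocalDate dateAdd(String date, Number days) {
        return LocalDate.parse(date).plusDays(days.longValue());
    }

    public static void main(String[] args) {
        System.out.println(dateAdd("1985-01-01", (byte) 1));   // 1985-01-02
        System.out.println(dateAdd("1985-01-01", (short) 31)); // 1985-02-01
        System.out.println(dateAdd("1985-01-01", 365L));       // 1986-01-01
    }
}
```

Since every integral width converts losslessly to a signed 64-bit day count, rejecting SHORT while accepting INT is purely an argument-checking restriction, not a semantic one.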
[jira] [Updated] (HIVE-9448) Merge spark to trunk 1/23/15
[ https://issues.apache.org/jira/browse/HIVE-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-9448: - Labels: TODOC15 (was: ) Merge spark to trunk 1/23/15 Key: HIVE-9448 URL: https://issues.apache.org/jira/browse/HIVE-9448 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 0.15.0 Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC15 Attachments: HIVE-9448.2.patch, HIVE-9448.3.patch, HIVE-9448.patch Merging latest spark changes to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292502#comment-14292502 ] Hive QA commented on HIVE-9333: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12694605/HIVE-9333.1.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7373 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_types org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_parquet org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_parquet org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2522/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2522/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2522/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12694605 - PreCommit-HIVE-TRUNK-Build Move parquet serialize implementation to DataWritableWriter to improve write speeds --- Key: HIVE-9333 URL: https://issues.apache.org/jira/browse/HIVE-9333 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9333.1.patch The serialize process on ParquetHiveSerDe parses a Hive object into a Writable object by looping through all the Hive object's children and creating new Writable objects per child. These final Writable objects are passed to the Parquet writing function and parsed again in the DataWritableWriter class by looping through the ArrayWritable object. These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write()) may be reduced to a single loop inside the DataWritableWriter.write() method in order to increase the write speed of Hive Parquet. To achieve this, we can wrap the Hive object and object inspector in the ParquetHiveSerDe.serialize() method in an object that implements the Writable interface, thus avoiding the loop that serialize() does and leaving the field parsing to the DataWritableWriter.write() method. We can see how ORC does this with the OrcSerde.OrcSerdeRow class. Writable objects are organized differently across storage formats, so I don't think it is necessary to create and keep the Writable objects in the serialize() method, as they won't be used until the writing process starts (DataWritableWriter.write()). We might save 200% of extra time by making such a change. This performance issue was found using the microbenchmark tests from HIVE-8121. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7660) Hive to support qualify analytic filtering
[ https://issues.apache.org/jira/browse/HIVE-7660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292392#comment-14292392 ] Alexander Pivovarov commented on HIVE-7660: --- [~viji_r] Can you put a query example in the description pls? Hive to support qualify analytic filtering -- Key: HIVE-7660 URL: https://issues.apache.org/jira/browse/HIVE-7660 Project: Hive Issue Type: New Feature Reporter: Viji Priority: Trivial Currently, Hive does not support qualify analytic filtering. It would be useful if this feature were added in the future. As a workaround, since it is just a filter, we can replace it with a subquery and filter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9448) Merge spark to trunk 1/23/15
[ https://issues.apache.org/jira/browse/HIVE-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-9448: - Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Merge spark to trunk 1/23/15 Key: HIVE-9448 URL: https://issues.apache.org/jira/browse/HIVE-9448 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 0.15.0 Reporter: Szehon Ho Assignee: Szehon Ho Labels: TODOC15 Fix For: 0.15.0 Attachments: HIVE-9448.2.patch, HIVE-9448.3.patch, HIVE-9448.patch Merging latest spark changes to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6977) Delete Hiveserver1
[ https://issues.apache.org/jira/browse/HIVE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-6977: Fix Version/s: 1.0.0 Delete Hiveserver1 -- Key: HIVE-6977 URL: https://issues.apache.org/jira/browse/HIVE-6977 Project: Hive Issue Type: Task Components: JDBC, Server Infrastructure Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Labels: TODOC15 Fix For: 0.15.0, 1.0.0 Attachments: HIVE-6977.1.patch, HIVE-6977.patch See mailing list discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9470) Use a generic writable object to run ColumnaStorageBench write/read tests
[ https://issues.apache.org/jira/browse/HIVE-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292507#comment-14292507 ] Hive QA commented on HIVE-9470: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12694608/HIVE-9470.1.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2523/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2523/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2523/ Messages: {noformat} This message was trimmed, see log for full details main: [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-shims-scheduler --- [INFO] Compiling 1 source file to /data/hive-ptest/working/apache-svn-trunk-source/shims/scheduler/target/classes [INFO] [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ hive-shims-scheduler --- [INFO] Using 'UTF-8' encoding to copy filtered resources. 
[INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/shims/scheduler/src/test/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-shims-scheduler --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/scheduler/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/scheduler/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/scheduler/target/tmp/conf [copy] Copying 10 files to /data/hive-ptest/working/apache-svn-trunk-source/shims/scheduler/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-shims-scheduler --- [INFO] No sources to compile [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-shims-scheduler --- [INFO] Tests are skipped. [INFO] [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-shims-scheduler --- [INFO] Building jar: /data/hive-ptest/working/apache-svn-trunk-source/shims/scheduler/target/hive-shims-scheduler-0.15.0-SNAPSHOT.jar [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-shims-scheduler --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-shims-scheduler --- [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/shims/scheduler/target/hive-shims-scheduler-0.15.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/shims/hive-shims-scheduler/0.15.0-SNAPSHOT/hive-shims-scheduler-0.15.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/shims/scheduler/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/shims/hive-shims-scheduler/0.15.0-SNAPSHOT/hive-shims-scheduler-0.15.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive Shims 0.15.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ 
hive-shims --- [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-shims --- [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-shims --- [INFO] [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-shims --- [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator/src/main/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-shims --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-shims --- [INFO] No sources to compile [INFO] [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ hive-shims --- [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator/src/test/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-shims --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator/target/tmp [mkdir] Created dir:
[jira] [Updated] (HIVE-9474) truncate table changes permissions on the target
[ https://issues.apache.org/jira/browse/HIVE-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-9474: --- Description: Create a table test(a string); Change the /user/hive/warehouse/test.db/test permission to something other than the default, like 777. Then truncate table test; The permission goes back to the default. was: I created a test table in beeline: create table test(a string); Permissions: # file: /user/hive/warehouse/test.db/test # owner: aryan # group: hive user::rwx group::rwx other::--x Now, in beeline: truncate table test; Permissions are now: # file: /user/hive/warehouse/test.db/test # owner: aryan # group: hive user::rwx group::r-x other::r-x Group write permissions have disappeared! truncate table changes permissions on the target Key: HIVE-9474 URL: https://issues.apache.org/jira/browse/HIVE-9474 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Aihua Xu Assignee: Aihua Xu Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9474.1.patch Original Estimate: 4h Remaining Estimate: 4h Create a table test(a string); Change the /user/hive/warehouse/test.db/test permission to something other than the default, like 777. Then truncate table test; The permission goes back to the default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
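A fix for a bug like HIVE-9474 generally has to capture the directory's permissions before the truncate and restore them afterwards. Below is a minimal sketch of that capture/restore pattern using java.nio on a local filesystem; the class name `PermissionPreservingTruncate` and the flat-directory assumption are illustrative only, and this is not the actual HIVE-9474 patch (which operates on HDFS paths through Hadoop's FileSystem API):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of a capture/restore pattern that keeps a directory's permissions
// across a delete-and-recreate "truncate". Local-filesystem illustration of
// what the HDFS-side fix needs to do; this is NOT Hive's code.
public class PermissionPreservingTruncate {

    static void truncateDir(Path dir) throws IOException {
        // 1. Capture the current permissions before touching anything.
        Set<PosixFilePermission> saved = Files.getPosixFilePermissions(dir);

        // 2. The naive truncate: delete contents and directory, then recreate.
        //    The recreated directory gets default permissions -- the bug.
        List<Path> children = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            for (Path p : ds) children.add(p);
        }
        for (Path p : children) Files.delete(p); // flat directory for brevity
        Files.delete(dir);
        Files.createDirectory(dir);

        // 3. Restore the captured permissions on the recreated directory.
        Files.setPosixFilePermissions(dir, saved);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("truncate-demo");
        Files.createFile(dir.resolve("data.txt"));
        // Use a non-default mode, like the 777 from the bug report.
        Set<PosixFilePermission> mode777 = PosixFilePermissions.fromString("rwxrwxrwx");
        Files.setPosixFilePermissions(dir, mode777);

        truncateDir(dir);

        // The directory is empty again but kept its rwxrwxrwx mode.
        System.out.println("permissions preserved: "
                + PosixFilePermissions.toString(Files.getPosixFilePermissions(dir)));
    }
}
```

Without step 3 the recreated directory would come back with the process default mode, which is exactly the symptom reported in the issue.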
[jira] [Updated] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9333: -- Description: The serialize process on ParquetHiveSerDe parses a Hive object into a Writable object by looping through all the Hive object's children and creating new Writable objects per child. These final Writable objects are passed in to the Parquet writing function, and parsed again in the DataWritableWriter class by looping through the ArrayWritable object. These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write()) may be reduced to just one loop in the DataWritableWriter.write() method in order to increase the write speed for Hive Parquet. In order to achieve this, we can wrap the Hive object and object inspector in the ParquetHiveSerDe.serialize() method in an object that implements the Writable interface, and thus avoid the loop that serialize() does, leaving the loop parsing to the DataWritableWriter.write() method. We can see how ORC does this with the OrcSerde.OrcSerdeRow class. Writable objects are organized differently across storage formats, so I don't think it is necessary to create and keep the Writable objects in the serialize() method, as they won't be used until the writing process starts (DataWritableWriter.write()). This performance issue was found using microbenchmark tests from HIVE-8121. was: The serialize process on ParquetHiveSerDe parses a Hive object into a Writable object by looping through all the Hive object's children and creating new Writable objects per child. These final Writable objects are passed in to the Parquet writing function, and parsed again in the DataWritableWriter class by looping through the ArrayWritable object. These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write()) may be reduced to just one loop in the DataWritableWriter.write() method in order to increase the write speed for Hive Parquet.
In order to achieve this, we can wrap the Hive object and object inspector in the ParquetHiveSerDe.serialize() method in an object that implements the Writable interface, and thus avoid the loop that serialize() does, leaving the loop parsing to the DataWritableWriter.write() method. We can see how ORC does this with the OrcSerde.OrcSerdeRow class. Writable objects are organized differently across storage formats, so I don't think it is necessary to create and keep the Writable objects in the serialize() method, as they won't be used until the writing process starts (DataWritableWriter.write()). We might save 200% of extra time by making this change. This performance issue was found using microbenchmark tests from HIVE-8121. Move parquet serialize implementation to DataWritableWriter to improve write speeds --- Key: HIVE-9333 URL: https://issues.apache.org/jira/browse/HIVE-9333 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9333.1.patch The serialize process on ParquetHiveSerDe parses a Hive object into a Writable object by looping through all the Hive object's children and creating new Writable objects per child. These final Writable objects are passed in to the Parquet writing function, and parsed again in the DataWritableWriter class by looping through the ArrayWritable object. These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write()) may be reduced to just one loop in the DataWritableWriter.write() method in order to increase the write speed for Hive Parquet. In order to achieve this, we can wrap the Hive object and object inspector in the ParquetHiveSerDe.serialize() method in an object that implements the Writable interface, and thus avoid the loop that serialize() does, leaving the loop parsing to the DataWritableWriter.write() method. We can see how ORC does this with the OrcSerde.OrcSerdeRow class.
Writable objects are organized differently across storage formats, so I don't think it is necessary to create and keep the Writable objects in the serialize() method, as they won't be used until the writing process starts (DataWritableWriter.write()). This performance issue was found using microbenchmark tests from HIVE-8121. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
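The wrapper idea described above can be sketched as follows. This is an illustrative, self-contained approximation, not the HIVE-9333 patch itself: `Writable` and `ObjectInspector` here are minimal stand-ins for the Hadoop/Hive interfaces so the sketch compiles on its own, and `ParquetWritableSketch` is a hypothetical name (the Review Board diff later in this thread adds a real `serde2/io/ParquetWritable.java`):

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Minimal stand-ins for Hadoop's Writable and Hive's ObjectInspector so the
// sketch is self-contained; in Hive these come from hadoop-common and hive-serde.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}
interface ObjectInspector { }

// Sketch of the idea: instead of eagerly converting the row into a tree of
// Writable objects in ParquetHiveSerDe.serialize(), wrap the raw row and its
// inspector. The Parquet-side writer (DataWritableWriter.write()) later walks
// the object exactly once, mirroring what OrcSerde.OrcSerdeRow does for ORC.
public class ParquetWritableSketch implements Writable {
    private final Object row;
    private final ObjectInspector inspector;

    public ParquetWritableSketch(Object row, ObjectInspector inspector) {
        this.row = row;
        this.inspector = inspector;
    }

    public Object getRow() { return row; }
    public ObjectInspector getInspector() { return inspector; }

    // Like OrcSerdeRow, the wrapper is never serialized through these methods;
    // the file-format writer consumes the wrapped object directly.
    @Override public void write(DataOutput out) {
        throw new UnsupportedOperationException("consumed by the format writer");
    }
    @Override public void readFields(DataInput in) {
        throw new UnsupportedOperationException("write-side only");
    }

    public static void main(String[] args) {
        Object row = java.util.Arrays.asList("a", 1);
        ParquetWritableSketch w = new ParquetWritableSketch(row, new ObjectInspector() {});
        // serialize() became cheap: no per-child Writable allocation happened,
        // and the writer still sees the original row object.
        System.out.println("same row object: " + (w.getRow() == row));
    }
}
```

The design point is that serialize() does no per-child work at all; the single traversal moves into the writer, which already has to walk the row to emit Parquet pages.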
[jira] [Updated] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9333: -- Status: Open (was: Patch Available) Move parquet serialize implementation to DataWritableWriter to improve write speeds --- Key: HIVE-9333 URL: https://issues.apache.org/jira/browse/HIVE-9333 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9333.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9333: -- Status: Patch Available (was: Open) Move parquet serialize implementation to DataWritableWriter to improve write speeds --- Key: HIVE-9333 URL: https://issues.apache.org/jira/browse/HIVE-9333 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9333.1.patch, HIVE-9333.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9333: -- Attachment: (was: HIVE-9333.1.patch) Move parquet serialize implementation to DataWritableWriter to improve write speeds --- Key: HIVE-9333 URL: https://issues.apache.org/jira/browse/HIVE-9333 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9333.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-9333: -- Attachment: HIVE-9333.2.patch Move parquet serialize implementation to DataWritableWriter to improve write speeds --- Key: HIVE-9333 URL: https://issues.apache.org/jira/browse/HIVE-9333 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9333.1.patch, HIVE-9333.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30281: Move parquet serialize implementation to DataWritableWriter to improve write speeds
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30281/ --- (Updated Jan. 27, 2015, 1:39 a.m.) Review request for hive, Ryan Blue, cheng xu, and Dong Chen. Changes --- I forgot to add the BYTE/DECIMAL implementations. This patch contains them. Bugs: HIVE-9333 https://issues.apache.org/jira/browse/HIVE-9333 Repository: hive-git Description --- This patch moves the ParquetHiveSerDe.serialize() implementation to the DataWritableWriter class in order to save time in materializing data on serialize(). Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java ea4109d358f7c48d1e2042e5da299475de4a0a29 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 9caa4ed169ba92dbd863e4a2dc6d06ab226a4465 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java 060b1b722d32f3b2f88304a1a73eb249e150294b ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 41b5f1c3b0ab43f734f8a211e3e03d5060c75434 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java e52c4bc0b869b3e60cb4bfa9e11a09a0d605ac28 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java a693aff18516d133abf0aae4847d3fe00b9f1c96 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetOutputFormat.java 667d3671547190d363107019cd9a2d105d26d336 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 007a665529857bcec612f638a157aa5043562a15 serde/src/java/org/apache/hadoop/hive/serde2/io/ParquetWritable.java PRE-CREATION Diff: https://reviews.apache.org/r/30281/diff/ Testing --- The tests run were the following: 1. JMH (Java microbenchmark) This benchmark called parquet serialize/write methods using text writable objects.
Class.method                 Before Change (ops/s)   After Change (ops/s)
ParquetHiveSerDe.serialize   19,113                  249,528   (~13x speed increase)
DataWritableWriter.write     5,033                   5,201     (3.34% speed increase)

2. Write 20 million rows (~1GB file) from Text to Parquet
I wrote a ~1GB file in TextFile format, then converted it to Parquet format using the following statement:
CREATE TABLE parquet STORED AS parquet AS SELECT * FROM text;
Time (s) it took to write the whole file BEFORE changes: 93.758 s
Time (s) it took to write the whole file AFTER changes: 83.903 s
That is about a 10% speed increase.

Thanks,
Sergio Pena
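As a sanity check on the arithmetic, the speedup figures can be recomputed from the raw numbers reported in the benchmark. The values below are copied verbatim from the results above; note that 249,528 / 19,113 works out to roughly 13x:

```java
// Recompute the speedups quoted above from the raw benchmark numbers.
public class SpeedupCheck {
    static double ratio(double before, double after) {
        return after / before;
    }
    static double percentGain(double before, double after) {
        return (after - before) / before * 100.0;
    }
    public static void main(String[] args) {
        // ops/s from the JMH table
        System.out.printf("serialize: %.1fx faster%n", ratio(19113, 249528));   // ~13.1x
        System.out.printf("write: %.2f%% faster%n", percentGain(5033, 5201));   // ~3.34%
        // wall-clock seconds for the 20M-row CTAS
        System.out.printf("CTAS: %.1f%% less time%n",
                (93.758 - 83.903) / 93.758 * 100.0);                            // ~10.5%
    }
}
```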
Re: Review Request 30264: HIVE-9211 enable unit test for mini Spark on YARN cluster[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30264/#review69728 --- Looks good, glad we can mostly re-use miniYarnCluster. Some minor comments below. I also agree with Xuefu that we don't need more golden files by having some output directory. I'm also ok with not running the entire set of Spark tests with every run, and just running the list we got from minimr.query.files, it is up to you guys. The only note is, once we check this in, the committer will also make some edits to the build machine files to run these in proper batches, otherwise they will run the entire TestMiniSparkOnYarnCliDriver suite in one batch. shims/0.23/src/main/java/org/apache/hadoop/hive/shims/MiniSparkOnYARNCluster.java https://reviews.apache.org/r/30264/#comment114485 Needs the Apache license. shims/0.23/src/main/java/org/apache/hadoop/hive/shims/MiniSparkOnYARNCluster.java https://reviews.apache.org/r/30264/#comment114486 Needs a comment. - Szehon Ho On Jan. 26, 2015, 6:37 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30264/ --- (Updated Jan. 26, 2015, 6:37 a.m.) Review request for hive, Szehon Ho and Xuefu Zhang. Bugs: HIVE-9211 https://issues.apache.org/jira/browse/HIVE-9211 Repository: hive-git Description --- MiniSparkOnYarnCluster is enabled for unit tests, Spark is deployed on miniYarnCluster in yarn-client mode, and all qfiles in minimr.query.files are enabled in this unit test except 3 qfiles: bucket_num_reducers.q, bucket_num_reducers2.q, udf_using.q, which are not supported in HoS.
Diffs - data/conf/spark/hive-site.xml 016f568 data/conf/spark/standalone/hive-site.xml PRE-CREATION data/conf/spark/yarn-client/hive-site.xml PRE-CREATION itests/pom.xml e1e88f6 itests/qtest-spark/pom.xml d12fad5 itests/src/test/resources/testconfiguration.properties f583aaf itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 095b9bd ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 41a2ab7 ql/src/test/results/clientpositive/miniSparkOnYarn/auto_sortmerge_join_16.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucket4.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucket5.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucket6.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucketizedhiveinputformat.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucketmapjoin6.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucketmapjoin7.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/constprog_partitioner.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/disable_merge_for_bucketing.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/empty_dir_in_table.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/external_table_with_space_in_location_path.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/file_with_header_footer.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/groupby1.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/groupby2.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/import_exported_table.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/index_bitmap3.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/index_bitmap_auto.q.out PRE-CREATION 
ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_bucketed_table.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_dyn_part.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_map_operators.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_merge.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_num_buckets.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_reducers_power_two.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/input16_cc.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/join1.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/leftsemijoin_mr.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/list_bucket_dml_10.q.java1.7.out
[jira] [Updated] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9211: Attachment: HIVE-9211.2-spark.patch Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in production environments; we'd better enable unit tests for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30264: HIVE-9211 enable unit test for mini Spark on YARN cluster[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30264/#review69734 --- Ship it! Looks good to me, leave it to Xuefu to review the other comments. - Szehon Ho On Jan. 27, 2015, 2:03 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30264/ --- (Updated Jan. 27, 2015, 2:03 a.m.) Review request for hive, Szehon Ho and Xuefu Zhang. Bugs: HIVE-9211 https://issues.apache.org/jira/browse/HIVE-9211 Repository: hive-git Description --- MiniSparkOnYarnCluster is enabled for unit test, Spark is deployed on miniYarnCluster on yarn-client mode, all qfiles in minimr.query.files are enabled in this unit test except 3 qfile: bucket_num_reducers.q, bucket_num_reducers2.q, udf_using.q, which is not supported in HoS. Diffs - data/conf/spark/hive-site.xml 016f568 data/conf/spark/standalone/hive-site.xml PRE-CREATION data/conf/spark/yarn-client/hive-site.xml PRE-CREATION itests/pom.xml e1e88f6 itests/qtest-spark/pom.xml d12fad5 itests/src/test/resources/testconfiguration.properties f583aaf itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 095b9bd ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 41a2ab7 ql/src/test/results/clientpositive/spark/bucket5.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket6.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucketizedhiveinputformat.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/constprog_partitioner.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/empty_dir_in_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/external_table_with_space_in_location_path.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/file_with_header_footer.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/import_exported_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/index_bitmap3.q.out PRE-CREATION 
ql/src/test/results/clientpositive/spark/index_bitmap_auto.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_bucketed_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_dyn_part.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_map_operators.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_merge.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_num_buckets.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_reducers_power_two.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/input16_cc.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/list_bucket_dml_10.q.java1.7.out PRE-CREATION ql/src/test/results/clientpositive/spark/load_fs2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/load_hdfs_file_with_space_in_the_name.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/parallel_orderby.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ql_rewrite_gbtoidx.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ql_rewrite_gbtoidx_cbo_1.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ql_rewrite_gbtoidx_cbo_2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/quotedid_smb.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/reduce_deduplicate.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/remote_script.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/root_dir_external_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/schemeAuthority.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/schemeAuthority2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/temp_table_external.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/truncate_column_buckets.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/uber_reduce.q.out PRE-CREATION 
shims/0.20S/src/main/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java b17f465 shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java a61c3ac shims/0.23/src/main/java/org/apache/hadoop/hive/shims/MiniSparkOnYARNCluster.java PRE-CREATION shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 064304c spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java aea90db Diff: https://reviews.apache.org/r/30264/diff/ Testing
[jira] [Commented] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292875#comment-14292875 ] Szehon Ho commented on HIVE-9211: - +1, looks good to me. Xuefu was looking at this too, so I will leave it to him to review the rest. Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in production environments; we'd better enable unit tests for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Give query results
Hi *. I need some help finding out where Hive stores the results that are available through the fetch-results Thrift API. Do you know if there is a file written in the cluster for any given query (metadata, a select query, a query that triggers MR)? Are they in memory or on HDFS? Thanks, Joel
[jira] [Updated] (HIVE-9475) HiveMetastoreClient.tableExists does not work
[ https://issues.apache.org/jira/browse/HIVE-9475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9475: --- Fix Version/s: 0.15.0 Assignee: Brock Noland Status: Patch Available (was: Open) HiveMetastoreClient.tableExists does not work - Key: HIVE-9475 URL: https://issues.apache.org/jira/browse/HIVE-9475 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Brock Noland Assignee: Brock Noland Priority: Blocker Fix For: 0.15.0 Attachments: HIVE-9475.1.patch We check the return value against null, returning true if the return value is null. This is reversed: we should return true if the value is not null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
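The inverted null check described in the issue is easy to see in a minimal sketch. The `getTable`/map lookup below is a stand-in for the real metastore call, not the actual HiveMetaStoreClient code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the HIVE-9475 bug: tableExists() returned true exactly when the
// lookup came back null. The metastore lookup is faked with a map here.
public class TableExistsSketch {
    private final Map<String, Object> tables = new HashMap<>();

    TableExistsSketch() { tables.put("default.src", new Object()); }

    Object getTable(String name) { return tables.get(name); } // null if absent

    // Buggy version before the fix: inverted condition.
    boolean tableExistsBuggy(String name) { return getTable(name) == null; }

    // Fixed version: a table exists iff the lookup returned something.
    boolean tableExistsFixed(String name) { return getTable(name) != null; }

    public static void main(String[] args) {
        TableExistsSketch c = new TableExistsSketch();
        System.out.println("buggy(default.src) = " + c.tableExistsBuggy("default.src"));  // false (wrong!)
        System.out.println("fixed(default.src) = " + c.tableExistsFixed("default.src")); // true
        System.out.println("fixed(no.such) = " + c.tableExistsFixed("no.such"));         // false
    }
}
```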
[jira] [Commented] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293036#comment-14293036 ] Xuefu Zhang commented on HIVE-9211: --- [~brocknoland], do you have any idea how Chengxiang may access the container logs? Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in production environments; we'd better enable unit tests for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9476) Beeline fails to start on trunk
[ https://issues.apache.org/jira/browse/HIVE-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-9476: --- Fix Version/s: 0.15.0 Beeline fails to start on trunk --- Key: HIVE-9476 URL: https://issues.apache.org/jira/browse/HIVE-9476 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.15.0 Reporter: Vaibhav Gumashta Priority: Blocker Fix For: 0.15.0 {code} vgumashta:hive vgumashta$ beeline --verbose=true [ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at jline.TerminalFactory.create(TerminalFactory.java:101) at jline.TerminalFactory.get(TerminalFactory.java:158) at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:73) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Exception in thread main java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:101) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} Working fine on 14.1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9476) Beeline fails to start on trunk
Vaibhav Gumashta created HIVE-9476: -- Summary: Beeline fails to start on trunk Key: HIVE-9476 URL: https://issues.apache.org/jira/browse/HIVE-9476 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.15.0 Reporter: Vaibhav Gumashta Priority: Blocker {code} vgumashta:hive vgumashta$ beeline --verbose=true [ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at jline.TerminalFactory.create(TerminalFactory.java:101) at jline.TerminalFactory.get(TerminalFactory.java:158) at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:73) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Exception in thread main java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected at org.apache.hive.beeline.BeeLineOpts.init(BeeLineOpts.java:101) at org.apache.hive.beeline.BeeLine.init(BeeLine.java:117) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:469) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:453) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} Working fine on 14.1. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9211: Attachment: HIVE-9211.2-spark.patch Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in product environment, we'd better enable unit test for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9211: Attachment: (was: HIVE-9211.2-spark.patch) Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in product environment, we'd better enable unit test for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9138) Add some explain to PTF operator
[ https://issues.apache.org/jira/browse/HIVE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-9138: Attachment: HIVE-9138.2.patch.txt Add some explain to PTF operator Key: HIVE-9138 URL: https://issues.apache.org/jira/browse/HIVE-9138 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-9138.1.patch.txt, HIVE-9138.2.patch.txt PTFOperator does not explain anything in explain statement, making it hard to understand the internal works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9474) truncate table changes permissions on the target
[ https://issues.apache.org/jira/browse/HIVE-9474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292876#comment-14292876 ] Hive QA commented on HIVE-9474: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12694667/HIVE-9474.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7400 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2526/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2526/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2526/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12694667 - PreCommit-HIVE-TRUNK-Build truncate table changes permissions on the target Key: HIVE-9474 URL: https://issues.apache.org/jira/browse/HIVE-9474 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Aihua Xu Assignee: Aihua Xu Priority: Minor Fix For: 0.15.0 Attachments: HIVE-9474.1.patch Original Estimate: 4h Remaining Estimate: 4h Create a table test(a string); Change the /user/hive/warehouse/test.db/test permission to something else other than the default, like 777. Then truncate table test; The permission goes back to the default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9462) HIVE-8577 - breaks type evolution
[ https://issues.apache.org/jira/browse/HIVE-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292922#comment-14292922 ] Brock Noland commented on HIVE-9462: That test failed because the binary file is not in the patch. HIVE-8577 - breaks type evolution - Key: HIVE-9462 URL: https://issues.apache.org/jira/browse/HIVE-9462 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.15.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9462.1.patch, HIVE-9462.2.patch, type_evolution.avro If you write an Avro field out as {{int}} and then change its type to {{long}} you will get an {{UnresolvedUnionException}} due to code in HIVE-8577. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293057#comment-14293057 ] Chengxiang Li commented on HIVE-9211: - I work on Linux. Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in product environment, we'd better enable unit test for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-9436: --- Attachment: HIVE-9436.3.patch RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-9436.2.patch, HIVE-9436.3.patch, HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) && e.getCause().getMessage().matches("JDO[a-zA-Z]*Exception")) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that Java's String.matches matches the entire string against the regex, and thus the match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple: we should match (?s).\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
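[Editorial note] The String.matches pitfall described in HIVE-9436 is easy to demonstrate in isolation. The snippet below is a standalone sketch with a made-up exception message, not the Hive code itself:

```java
// Demonstrates that String.matches anchors the pattern to the ENTIRE string,
// which is the root cause of HIVE-9436.
public class MatchesDemo {
    // Buggy check: fails whenever the message has any prefix or suffix
    // around the JDO...Exception token.
    static boolean buggyMatch(String msg) {
        return msg.matches("JDO[a-zA-Z]*Exception");
    }

    // Fixed check: (?s).* on both sides lets the pattern match anywhere in
    // the message; (?s) (DOTALL) makes '.' also match line terminators.
    static boolean fixedMatch(String msg) {
        return msg.matches("(?s).*JDO[a-zA-Z]*Exception.*");
    }

    public static void main(String[] args) {
        String msg = "MetaException: JDODataStoreException: lock wait timeout";
        System.out.println(buggyMatch(msg)); // false -- surrounding text defeats matches()
        System.out.println(fixedMatch(msg)); // true
    }
}
```

An alternative to padding the pattern with `.*` would be `Pattern.compile(...).matcher(msg).find()`, which searches for the pattern anywhere in the input by design.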
[jira] [Updated] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-9436: --- Status: Patch Available (was: Open) RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-9436.2.patch, HIVE-9436.3.patch, HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) && e.getCause().getMessage().matches("JDO[a-zA-Z]*Exception")) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that Java's String.matches matches the entire string against the regex, and thus the match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple: we should match (?s).\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9425) External Function Jar files are not available for Driver when running with yarn-cluster mode [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li reassigned HIVE-9425: --- Assignee: Chengxiang Li External Function Jar files are not available for Driver when running with yarn-cluster mode [Spark Branch] --- Key: HIVE-9425 URL: https://issues.apache.org/jira/browse/HIVE-9425 Project: Hive Issue Type: Sub-task Components: spark-branch Reporter: Xiaomin Zhang Assignee: Chengxiang Li 15/01/20 00:27:31 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: hive-exec-0.15.0-SNAPSHOT.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: opennlp-maxent-3.0.3.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: bigbenchqueriesmr.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: opennlp-tools-1.5.3.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: jcl-over-slf4j-1.7.5.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 INFO client.RemoteDriver: Received job request fef081b0-5408-4804-9531-d131fdd628e6 15/01/20 00:27:31 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 15/01/20 00:27:31 INFO Configuration.deprecation: mapred.min.split.size is deprecated. 
Instead, use mapreduce.input.fileinputformat.split.minsize 15/01/20 00:27:31 INFO client.RemoteDriver: Failed to run job fef081b0-5408-4804-9531-d131fdd628e6 org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: de.bankmark.bigbench.queries.q10.SentimentUDF Serialization trace: genericUDTF (org.apache.hadoop.hive.ql.plan.UDTFDesc) conf (org.apache.hadoop.hive.ql.exec.UDTFOperator) childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork) at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138) at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115) It seems the additional Jar files are not uploaded to DistributedCache, so that the Driver cannot access it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9470) Use a generic writable object to run ColumnarStorageBench write/read tests
[ https://issues.apache.org/jira/browse/HIVE-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292872#comment-14292872 ] Dong Chen commented on HIVE-9470: - LGTM. +1 pending test. Use a generic writable object to run ColumnarStorageBench write/read tests -- Key: HIVE-9470 URL: https://issues.apache.org/jira/browse/HIVE-9470 Project: Hive Issue Type: Improvement Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9470.1.patch The ColumnarStorageBench benchmark class is using a Parquet writable object to run all write/read/serialize/deserialize tests. It would be better to use a more generic writable object (like text writables) to get fairer benchmark comparisons between storage formats. Using Parquet writables may give an advantage when writing Parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: 0.15 release
Hi Alan, Thank you very much for the clarification. Naming the branch 'next' for now is fine with me. I've done so. Cheers, Brock On Mon, Jan 26, 2015 at 10:27 AM, Alan Gates ga...@hortonworks.com wrote: I'm not asking you to slow the work for the release. If you want to branch, you can branch. What I'm asking is to hold on the numbering scheme. So you could name the branch 'next' or something and then rename once we come to agreement. Consensus in the community is important, and we should avoid things that make that consensus harder. Alan. Brock Noland br...@cloudera.com January 26, 2015 at 9:17 Hi Alan, In all of my experience at Apache, I have been encouraged to release. Contributors rightly want to see their hard work get into the hands of the users. That's why they contribute, after all. Many contributors who have features in trunk would like to get those features out into the community. This is completely reasonable of them. After all, they've invested significant time in this work. Thus I don't feel we should delay getting their contributions released while we debate 1.0. The two have nothing to do with each other. I've mentioned on the list and in person to Thejas that I wanted this release specifically to avoid the 1.x discussion so it did not get bogged down in that debate. Again, this is completely reasonable. In short, everything I have experienced at Apache indicates that the folks who want to release 0.15 should be free to do the work to make that happen. Brock Alan Gates ga...@hortonworks.com January 26, 2015 at 7:02 Brock, Given there isn't consensus on numbering yet, could you hold off on making the 0.15 branch? We should come to a conclusion on whether we're doing 0.14.1/0.15 or 1.0/1.1 before assigning any more numbers. Alan. Brock Noland br...@cloudera.com January 20, 2015 at 21:25 Just a reminder that I plan on branching on 1/26/2015 and starting to roll release candidates on 2/9/2015. After branching I plan on merging only blockers.
Brock Brock Noland br...@cloudera.com January 12, 2015 at 14:37 Hi, Projects are instructed in the incubator that releases gain new users and other attention. Additionally, as discussed in this forum, I'd like to increase the tempo of our release process[1]. As such, I plan on following this process: 1) Provide two weeks' notice of branching 2) Provide two weeks to find issues on the branch and merge only blockers 3) Roll release candidates until a release vote passes As such I plan on branching on 1/26/2015 and starting to roll release candidates on 2/9/2015. Cheers, Brock 1. Note I am not complaining as I did not help with releases until this point. CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
[jira] [Commented] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292991#comment-14292991 ] Hive QA commented on HIVE-9436: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12694681/HIVE-9436.3.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7397 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2528/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2528/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2528/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12694681 - PreCommit-HIVE-TRUNK-Build RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.1 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-9436.2.patch, HIVE-9436.3.patch, HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) && e.getCause().getMessage().matches("JDO[a-zA-Z]*Exception")) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that Java's String.matches matches the entire string against the regex, and thus the match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple: we should match (?s).\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9211: --- Attachment: HIVE-9211.2-spark.patch I modified the test framework locally so we should have a source directory for the failed test. Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in product environment, we'd better enable unit test for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9462) HIVE-8577 - breaks type evolution
[ https://issues.apache.org/jira/browse/HIVE-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292945#comment-14292945 ] Xuefu Zhang commented on HIVE-9462: --- +1 HIVE-8577 - breaks type evolution - Key: HIVE-9462 URL: https://issues.apache.org/jira/browse/HIVE-9462 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.15.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9462.1.patch, HIVE-9462.2.patch, HIVE-9462.3.patch, type_evolution.avro If you write an Avro field out as {{int}} and then change its type to {{long}} you will get an {{UnresolvedUnionException}} due to code in HIVE-8577. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6308) COLUMNS_V2 Metastore table not populated for tables created without an explicit column list.
[ https://issues.apache.org/jira/browse/HIVE-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293062#comment-14293062 ] Szehon Ho commented on HIVE-6308: - +1, thanks for adding unit test COLUMNS_V2 Metastore table not populated for tables created without an explicit column list. Key: HIVE-6308 URL: https://issues.apache.org/jira/browse/HIVE-6308 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.10.0 Reporter: Alexander Behm Assignee: Yongzhi Chen Attachments: HIVE-6308.1.patch Consider this example table: CREATE TABLE avro_test ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='file:///path/to/the/schema/test_serializer.avsc'); When I try to run an ANALYZE TABLE for computing column stats on any of the columns, then I get: org.apache.hadoop.hive.ql.metadata.HiveException: NoSuchObjectException(message:Column o_orderpriority for which stats gathering is requested doesn't exist.) 
at org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2280) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:331) at org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:343) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:208) The root cause appears to be that the COLUMNS_V2 table in the Metastore isn't populated properly during the table creation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9436) RetryingMetaStoreClient does not retry JDOExceptions
[ https://issues.apache.org/jira/browse/HIVE-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-9436: --- Status: Open (was: Patch Available) Cancelling patch to resubmit an identical .3.patch so that the precommit tests don't skip it after the test issues we've had in the past couple of days. RetryingMetaStoreClient does not retry JDOExceptions Key: HIVE-9436 URL: https://issues.apache.org/jira/browse/HIVE-9436 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.14.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: HIVE-9436.2.patch, HIVE-9436.patch RetryingMetaStoreClient has a bug in the following bit of code: {code} } else if ((e.getCause() instanceof MetaException) && e.getCause().getMessage().matches("JDO[a-zA-Z]*Exception")) { caughtException = (MetaException) e.getCause(); } else { throw e.getCause(); } {code} The bug here is that Java's String.matches matches the entire string against the regex, and thus the match will fail if the message contains anything before or after JDO[a-zA-Z]\*Exception. The solution, however, is very simple: we should match (?s).\*JDO[a-zA-Z]\*Exception.\* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30264: HIVE-9221 enable unit test for mini Spark on YARN cluster[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30264/ --- (Updated Jan. 27, 2015, 2:03 a.m.) Review request for hive, Szehon Ho and Xuefu Zhang. Changes --- fixed commented issues. Bugs: HIVE-9211 https://issues.apache.org/jira/browse/HIVE-9211 Repository: hive-git Description --- MiniSparkOnYarnCluster is enabled for unit test, Spark is deployed on miniYarnCluster on yarn-client mode, all qfiles in minimr.query.files are enabled in this unit test except 3 qfile: bucket_num_reducers.q, bucket_num_reducers2.q, udf_using.q, which is not supported in HoS. Diffs (updated) - data/conf/spark/hive-site.xml 016f568 data/conf/spark/standalone/hive-site.xml PRE-CREATION data/conf/spark/yarn-client/hive-site.xml PRE-CREATION itests/pom.xml e1e88f6 itests/qtest-spark/pom.xml d12fad5 itests/src/test/resources/testconfiguration.properties f583aaf itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 095b9bd ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 41a2ab7 ql/src/test/results/clientpositive/spark/bucket5.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket6.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucketizedhiveinputformat.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/constprog_partitioner.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/empty_dir_in_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/external_table_with_space_in_location_path.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/file_with_header_footer.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/import_exported_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/index_bitmap3.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/index_bitmap_auto.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_bucketed_table.q.out PRE-CREATION 
ql/src/test/results/clientpositive/spark/infer_bucket_sort_dyn_part.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_map_operators.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_merge.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_num_buckets.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_reducers_power_two.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/input16_cc.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/list_bucket_dml_10.q.java1.7.out PRE-CREATION ql/src/test/results/clientpositive/spark/load_fs2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/load_hdfs_file_with_space_in_the_name.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/parallel_orderby.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ql_rewrite_gbtoidx.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ql_rewrite_gbtoidx_cbo_1.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ql_rewrite_gbtoidx_cbo_2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/quotedid_smb.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/reduce_deduplicate.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/remote_script.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/root_dir_external_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/schemeAuthority.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/schemeAuthority2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/temp_table_external.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/truncate_column_buckets.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/uber_reduce.q.out PRE-CREATION shims/0.20S/src/main/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java b17f465 shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java a61c3ac 
shims/0.23/src/main/java/org/apache/hadoop/hive/shims/MiniSparkOnYARNCluster.java PRE-CREATION shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 064304c spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java aea90db Diff: https://reviews.apache.org/r/30264/diff/ Testing --- Thanks, chengxiang li
Re: Review Request 29196: Add some explain to PTF operator
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29196/ --- (Updated Jan. 27, 2015, 2:14 a.m.) Review request for hive. Bugs: HIVE-9138 https://issues.apache.org/jira/browse/HIVE-9138 Repository: hive-git Description --- PTFOperator does not explain anything in the explain statement, making it hard to understand its internal workings. Diffs (updated) - itests/src/test/resources/testconfiguration.properties 12fcd6a itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 8e00ee3 ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java daf6cb8 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 57ce849 ql/src/java/org/apache/hadoop/hive/ql/plan/Explain.java a3408a0 ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java 3ac3245 ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java b62ffed ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/BoundaryDef.java 07590c0 ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/OrderExpressionDef.java e367d13 ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/PTFExpressionDef.java 5d200fb ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/PTFInputDef.java 19ed2f2 ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/PTFQueryInputDef.java 11ef932 ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/PartitionedTableFunctionDef.java 327304c ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/WindowExpressionDef.java b96e9d6 ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/WindowFrameDef.java 949ed10 ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/WindowFunctionDef.java e4ea358 ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/WindowTableFunctionDef.java 083aaf2 ql/src/test/queries/clientpositive/ptf_matchpath.q 80dbe29 ql/src/test/results/clientpositive/correlationoptimizer12.q.out c32e41e ql/src/test/results/clientpositive/ctas_colname.q.out 95c7acb ql/src/test/results/clientpositive/groupby_resolution.q.out c611f7d ql/src/test/results/clientpositive/ptf.q.out f678035 
ql/src/test/results/clientpositive/ptf_matchpath.q.out e0cea0d ql/src/test/results/clientpositive/ptf_streaming.q.out 9cf645d ql/src/test/results/clientpositive/quotedid_basic.q.out b8cd4e9 ql/src/test/results/clientpositive/spark/ptf.q.out 8ca5496 ql/src/test/results/clientpositive/spark/ptf_matchpath.q.out e0cea0d ql/src/test/results/clientpositive/spark/ptf_streaming.q.out f5ee72d ql/src/test/results/clientpositive/spark/subquery_in.q.out 51b92a3 ql/src/test/results/clientpositive/spark/vectorized_ptf.q.out 020fdff ql/src/test/results/clientpositive/subquery_in.q.out a2235af ql/src/test/results/clientpositive/subquery_in_having.q.out 03cc2af ql/src/test/results/clientpositive/subquery_notin.q.out 599a61e ql/src/test/results/clientpositive/subquery_unqualcolumnrefs.q.out 06d5708 ql/src/test/results/clientpositive/tez/ptf.q.out 6f9dd91 ql/src/test/results/clientpositive/tez/ptf_matchpath.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/ptf_streaming.q.out a935ef6 ql/src/test/results/clientpositive/tez/subquery_in.q.out 8bc7892 ql/src/test/results/clientpositive/tez/vectorized_ptf.q.out a814849 ql/src/test/results/clientpositive/vectorized_ptf.q.out 1e3c43c ql/src/test/results/clientpositive/windowing_streaming.q.out ac9e180 Diff: https://reviews.apache.org/r/29196/diff/ Testing --- Thanks, Navis Ryu
Re: Review Request 30151: Remove Extract Operator and its friends from codebase.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30151/#review69732 --- This is huge cleanup. Good work! ql/src/test/results/clientpositive/bucketsortoptimize_insert_4.q.out https://reviews.apache.org/r/30151/#comment114490 Any idea why the plan is changed so much? - Navis Ryu On Jan. 24, 2015, 6:08 p.m., Ashutosh Chauhan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30151/ --- (Updated Jan. 24, 2015, 6:08 p.m.) Review request for hive and Navis Ryu. Bugs: HIVE-9416 https://issues.apache.org/jira/browse/HIVE-9416 Repository: hive-git Description --- Remove Extract Operator its friends from codebase. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java c299d3a ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java f3c382a ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java 2e6a880 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 9ed2c61 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorExtractOperator.java 7f4bb64 ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketingSortingReduceSinkOptimizer.java 24ca89f ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java e16ba6c ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java dc906e8 ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java 3fead79 ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/OpProcFactory.java d6a6ed6 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingInferenceOptimizer.java 7954767 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/BucketingSortingOpProcFactory.java cf02bec ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 94b4621 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 4364f28 ql/src/java/org/apache/hadoop/hive/ql/plan/ExtractDesc.java 6762155 
ql/src/java/org/apache/hadoop/hive/ql/plan/SelectDesc.java fa6b548 ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 41862e6 ql/src/test/results/clientpositive/bucket1.q.out 13ec735 ql/src/test/results/clientpositive/bucket2.q.out 32a77c3 ql/src/test/results/clientpositive/bucket3.q.out ff7173e ql/src/test/results/clientpositive/bucket4.q.out b99d12f ql/src/test/results/clientpositive/bucket5.q.out 5992d6d ql/src/test/results/clientpositive/bucket6.q.out 5b23d7d ql/src/test/results/clientpositive/bucketsortoptimize_insert_1.q.out 75de953 ql/src/test/results/clientpositive/bucketsortoptimize_insert_2.q.out 599b8b9 ql/src/test/results/clientpositive/bucketsortoptimize_insert_3.q.out 7456ab0 ql/src/test/results/clientpositive/bucketsortoptimize_insert_4.q.out fd99597 ql/src/test/results/clientpositive/bucketsortoptimize_insert_5.q.out 8130ab9 ql/src/test/results/clientpositive/bucketsortoptimize_insert_6.q.out 627aba0 ql/src/test/results/clientpositive/disable_merge_for_bucketing.q.out 9b058c8 ql/src/test/results/clientpositive/dynpart_sort_opt_vectorization.q.out 32e0745 ql/src/test/results/clientpositive/dynpart_sort_optimization.q.out 494bfa3 ql/src/test/results/clientpositive/encrypted/encryption_insert_partition_dynamic.q.out b6e7b88 ql/src/test/results/clientpositive/encrypted/encryption_insert_partition_static.q.out fc6d2ae ql/src/test/results/clientpositive/load_dyn_part2.q.out 26f318a ql/src/test/results/clientpositive/ptf.q.out f678035 ql/src/test/results/clientpositive/ptf_streaming.q.out 9cf645d ql/src/test/results/clientpositive/smb_mapjoin_20.q.out 999dabd ql/src/test/results/clientpositive/smb_mapjoin_21.q.out 539b70e ql/src/test/results/clientpositive/spark/bucket2.q.out 5eb28fa ql/src/test/results/clientpositive/spark/bucket3.q.out 1b1010a ql/src/test/results/clientpositive/spark/bucket4.q.out 7dd49ac ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_2.q.out 365306e 
ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_4.q.out 3846de7 ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_6.q.out 5b559c4 ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_7.q.out cefc6aa ql/src/test/results/clientpositive/spark/bucketsortoptimize_insert_8.q.out ca44d7c ql/src/test/results/clientpositive/spark/disable_merge_for_bucketing.q.out 3864c44 ql/src/test/results/clientpositive/spark/load_dyn_part2.q.out a8cef34
Re: Review Request 30264: HIVE-9211 enable unit test for mini Spark on YARN cluster[Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30264/#review69744 --- Ship it! Ship It! - Xuefu Zhang On Jan. 27, 2015, 2:03 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30264/ --- (Updated Jan. 27, 2015, 2:03 a.m.) Review request for hive, Szehon Ho and Xuefu Zhang. Bugs: HIVE-9211 https://issues.apache.org/jira/browse/HIVE-9211 Repository: hive-git Description --- MiniSparkOnYarnCluster is enabled for unit tests: Spark is deployed on miniYarnCluster in yarn-client mode, and all qfiles in minimr.query.files are enabled in this unit test except 3 qfiles (bucket_num_reducers.q, bucket_num_reducers2.q, udf_using.q), which are not supported in HoS. Diffs - data/conf/spark/hive-site.xml 016f568 data/conf/spark/standalone/hive-site.xml PRE-CREATION data/conf/spark/yarn-client/hive-site.xml PRE-CREATION itests/pom.xml e1e88f6 itests/qtest-spark/pom.xml d12fad5 itests/src/test/resources/testconfiguration.properties f583aaf itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 095b9bd ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 41a2ab7 ql/src/test/results/clientpositive/spark/bucket5.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucket6.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/bucketizedhiveinputformat.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/constprog_partitioner.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/empty_dir_in_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/external_table_with_space_in_location_path.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/file_with_header_footer.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/import_exported_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/index_bitmap3.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/index_bitmap_auto.q.out 
PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_bucketed_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_dyn_part.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_map_operators.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_merge.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_num_buckets.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/infer_bucket_sort_reducers_power_two.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/input16_cc.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/list_bucket_dml_10.q.java1.7.out PRE-CREATION ql/src/test/results/clientpositive/spark/load_fs2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/load_hdfs_file_with_space_in_the_name.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/parallel_orderby.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ql_rewrite_gbtoidx.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ql_rewrite_gbtoidx_cbo_1.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/ql_rewrite_gbtoidx_cbo_2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/quotedid_smb.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/reduce_deduplicate.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/remote_script.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/root_dir_external_table.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/schemeAuthority.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/schemeAuthority2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/temp_table_external.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/truncate_column_buckets.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/uber_reduce.q.out PRE-CREATION shims/0.20S/src/main/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java b17f465 
shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java a61c3ac shims/0.23/src/main/java/org/apache/hadoop/hive/shims/MiniSparkOnYARNCluster.java PRE-CREATION shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 064304c spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java aea90db Diff: https://reviews.apache.org/r/30264/diff/ Testing --- Thanks, chengxiang li
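The diff above adds three hive-site.xml variants (base, standalone, yarn-client), which suggests the per-mode test configs differ mainly in the engine and Spark master settings. A minimal sketch of what the yarn-client variant plausibly contains — property names are the standard Hive on Spark settings, but the exact file contents are an assumption, not taken from the patch:

```xml
<!-- Hypothetical sketch of data/conf/spark/yarn-client/hive-site.xml -->
<configuration>
  <!-- Run queries through the Spark execution engine -->
  <property>
    <name>hive.execution.engine</name>
    <value>spark</value>
  </property>
  <!-- Point Spark at the (mini) YARN cluster in yarn-client mode -->
  <property>
    <name>spark.master</name>
    <value>yarn-client</value>
  </property>
</configuration>
```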
[jira] [Updated] (HIVE-9397) SELECT max(bar) FROM foo is broken after ANALYZE ... FOR COLUMNS
[ https://issues.apache.org/jira/browse/HIVE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-9397: Attachment: HIVE-9397.1.patch.txt SELECT max(bar) FROM foo is broken after ANALYZE ... FOR COLUMNS Key: HIVE-9397 URL: https://issues.apache.org/jira/browse/HIVE-9397 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.15.0 Reporter: Damien Carol Attachments: HIVE-9397.1.patch.txt These queries produce an error: {code:sql} DROP TABLE IF EXISTS foo; CREATE TABLE foo (id int) STORED AS ORC; INSERT INTO TABLE foo VALUES (1); INSERT INTO TABLE foo VALUES (2); INSERT INTO TABLE foo VALUES (3); INSERT INTO TABLE foo VALUES (4); INSERT INTO TABLE foo VALUES (5); SELECT max(id) FROM foo; ANALYZE TABLE foo COMPUTE STATISTICS FOR COLUMNS id; SELECT max(id) FROM foo; {code} The last query throws {{org.apache.hive.service.cli.HiveSQLException}} {noformat} 0: jdbc:hive2://nc-h04:1/casino SELECT max(id) FROM foo; +-+--+ | _c0 | +-+--+ org.apache.hive.service.cli.HiveSQLException: java.lang.ClassCastException 0: jdbc:hive2://nc-h04:1/casino {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9397) SELECT max(bar) FROM foo is broken after ANALYZE ... FOR COLUMNS
[ https://issues.apache.org/jira/browse/HIVE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-9397: Assignee: Navis Status: Patch Available (was: Open) SELECT max(bar) FROM foo is broken after ANALYZE ... FOR COLUMNS Key: HIVE-9397 URL: https://issues.apache.org/jira/browse/HIVE-9397 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.15.0 Reporter: Damien Carol Assignee: Navis Attachments: HIVE-9397.1.patch.txt These queries produce an error: {code:sql} DROP TABLE IF EXISTS foo; CREATE TABLE foo (id int) STORED AS ORC; INSERT INTO TABLE foo VALUES (1); INSERT INTO TABLE foo VALUES (2); INSERT INTO TABLE foo VALUES (3); INSERT INTO TABLE foo VALUES (4); INSERT INTO TABLE foo VALUES (5); SELECT max(id) FROM foo; ANALYZE TABLE foo COMPUTE STATISTICS FOR COLUMNS id; SELECT max(id) FROM foo; {code} The last query throws {{org.apache.hive.service.cli.HiveSQLException}} {noformat} 0: jdbc:hive2://nc-h04:1/casino SELECT max(id) FROM foo; +-+--+ | _c0 | +-+--+ org.apache.hive.service.cli.HiveSQLException: java.lang.ClassCastException 0: jdbc:hive2://nc-h04:1/casino {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9438) The standalone-jdbc jar missing some jars
[ https://issues.apache.org/jira/browse/HIVE-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293065#comment-14293065 ] Szehon Ho commented on HIVE-9438: - +1 The standalone-jdbc jar missing some jars - Key: HIVE-9438 URL: https://issues.apache.org/jira/browse/HIVE-9438 Project: Hive Issue Type: Bug Reporter: Ashish Kumar Singh Priority: Blocker Fix For: 0.15.0 Attachments: HIVE-9438.1.patch The standalone-jdbc jar does not contain all the jars required for secure connections. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30281: Move parquet serialize implementation to DataWritableWriter to improve write speeds
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30281/#review69723 --- Thank you for your patch. I have several general questions as follows. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java https://reviews.apache.org/r/30281/#comment114475 If compressionType is unneeded, this annotation may be removed as well. ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java https://reviews.apache.org/r/30281/#comment114474 Why remove the compressionType code here? ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java https://reviews.apache.org/r/30281/#comment114487 Why not define writeGroupFields with a parameter of ParquetWritable instead of passing in the object and objectInspector separately? ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java https://reviews.apache.org/r/30281/#comment114488 Assume that when i%2 equals 0, the element is the key, and the value is written only when it is not null. What happens if both the key and the value are null? Can we follow the original approach of passing in the writable object and handling the null-value case in the writeValue method? The code would become simpler and easier to understand. - cheng xu On Jan. 27, 2015, 1:39 a.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30281/ --- (Updated Jan. 27, 2015, 1:39 a.m.) Review request for hive, Ryan Blue, cheng xu, and Dong Chen. Bugs: HIVE-9333 https://issues.apache.org/jira/browse/HIVE-9333 Repository: hive-git Description --- This patch moves the ParquetHiveSerDe.serialize() implementation to the DataWritableWriter class in order to save time in materializing data on serialize(). 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java ea4109d358f7c48d1e2042e5da299475de4a0a29 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 9caa4ed169ba92dbd863e4a2dc6d06ab226a4465 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java 060b1b722d32f3b2f88304a1a73eb249e150294b ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 41b5f1c3b0ab43f734f8a211e3e03d5060c75434 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java e52c4bc0b869b3e60cb4bfa9e11a09a0d605ac28 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java a693aff18516d133abf0aae4847d3fe00b9f1c96 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetOutputFormat.java 667d3671547190d363107019cd9a2d105d26d336 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 007a665529857bcec612f638a157aa5043562a15 serde/src/java/org/apache/hadoop/hive/serde2/io/ParquetWritable.java PRE-CREATION Diff: https://reviews.apache.org/r/30281/diff/ Testing --- The tests run were the following: 1. JMH (Java microbenchmark) This benchmark called parquet serialize/write methods using text writable objects. Class.method Before Change (ops/s) After Change (ops/s) --- ParquetHiveSerDe.serialize: 19,113 249,528 - 19x speed increase DataWritableWriter.write: 5,033 5,201 - 3.34% speed increase 2. Write 20 million rows (~1GB file) from Text to Parquet I wrote a ~1GB file in Textfile format, then converted it to Parquet format using the following statement: CREATE TABLE parquet STORED AS parquet AS SELECT * FROM text; Time (s) it took to write the whole file BEFORE changes: 93.758 s Time (s) it took to write the whole file AFTER changes: 83.903 s This is a 10% speed increase. Thanks, Sergio Pena
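The map-writing question in comment 114488 above (even index = key, value written only when non-null) can be made concrete with a standalone sketch. This is an illustration of the layout rule under discussion, not the patch's actual DataWritableWriter code; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone illustration: map entries arrive flattened, even indices hold
// keys (i % 2 == 0), odd indices hold the matching values.
public class MapLayoutSketch {
    static List<String> writeMapEntries(Object[] flat) {
        List<String> written = new ArrayList<>();
        for (int i = 0; i + 1 < flat.length; i += 2) {
            Object key = flat[i];       // the key position
            Object value = flat[i + 1]; // the matching value position
            if (key == null) {
                continue; // a null key means the whole entry is skipped
            }
            written.add("key:" + key);
            if (value != null) {
                written.add("value:" + value); // null values are simply not written
            }
        }
        return written;
    }

    public static void main(String[] args) {
        // prints [key:a, value:1, key:b]
        System.out.println(writeMapEntries(new Object[]{"a", 1, "b", null, null, 2}));
    }
}
```

This also shows the edge case the reviewer asks about: when both key and value are null, the entry is dropped entirely.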
[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292936#comment-14292936 ] Hive QA commented on HIVE-9333: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12694678/HIVE-9333.2.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7394 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2527/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2527/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2527/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12694678 - PreCommit-HIVE-TRUNK-Build Move parquet serialize implementation to DataWritableWriter to improve write speeds --- Key: HIVE-9333 URL: https://issues.apache.org/jira/browse/HIVE-9333 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9333.2.patch The serialize process on ParquetHiveSerDe parses a Hive object to a Writable object by looping through all the Hive object children, and creating new Writables objects per child. These final writables objects are passed in to the Parquet writing function, and parsed again on the DataWritableWriter class by looping through the ArrayWritable object. 
These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write()) may be reduced to a single loop in the DataWritableWriter.write() method in order to increase the write speed of Hive Parquet. To achieve this, we can wrap the Hive object and object inspector on the ParquetHiveSerDe.serialize() method into an object that implements the Writable interface, thus avoiding the loop that serialize() does, and leave the loop parsing to the DataWritableWriter.write() method. We can see how ORC does this with the OrcSerde.OrcSerdeRow class. Writable objects are organized differently across storage formats, so I don't think it is necessary to create and keep the writable objects in the serialize() method, as they won't be used until the writing process starts (DataWritableWriter.write()). This performance issue was found using microbenchmark tests from HIVE-8121. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
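The wrapper idea described above (modeled on OrcSerde.OrcSerdeRow) can be sketched in isolation. All names here are illustrative stand-ins, not the actual HIVE-9333 code: serialize() becomes a cheap wrap of the row and its inspector, and the single traversal happens later, inside the writer:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: instead of copying the row into a tree of Writable
// objects during serialize(), wrap the row and its inspector, and let the
// writer do the one and only traversal.
public class WrapperSketch {
    // Stand-in for the (row, ObjectInspector) pair a real SerDe would carry.
    static final class RowHolder {
        final List<Object> row;
        final String inspectorName; // a real wrapper would hold an ObjectInspector
        RowHolder(List<Object> row, String inspectorName) {
            this.row = row;
            this.inspectorName = inspectorName;
        }
    }

    // "serialize" is now O(1): no per-field copying into Writables.
    static RowHolder serialize(List<Object> row) {
        return new RowHolder(row, "structOI");
    }

    // The single loop over fields moves into the writer.
    static int write(RowHolder holder) {
        int fieldsWritten = 0;
        for (Object field : holder.row) {
            if (field != null) {
                fieldsWritten++; // a real writer would emit the field here
            }
        }
        return fieldsWritten;
    }

    public static void main(String[] args) {
        System.out.println(write(serialize(Arrays.asList(1, "x", null)))); // 2
    }
}
```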
[jira] [Commented] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292934#comment-14292934 ] Xuefu Zhang commented on HIVE-9211: --- +1 pending on tests. Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in production environments; we'd better enable unit tests for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9397) SELECT max(bar) FROM foo is broken after ANALYZE ... FOR COLUMNS
[ https://issues.apache.org/jira/browse/HIVE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292959#comment-14292959 ] Navis commented on HIVE-9397: - CLI uses the ObjectInspector (OI) in the fetch task to read rows, which is provided by StatsOptimizer. But Beeline uses the schema from SemanticAnalyzer, which is acquired from the RowResolver (RR) of the final FileSink (FS). I think StatsOptimizer should not override the return schema. SELECT max(bar) FROM foo is broken after ANALYZE ... FOR COLUMNS Key: HIVE-9397 URL: https://issues.apache.org/jira/browse/HIVE-9397 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.15.0 Reporter: Damien Carol Assignee: Navis Attachments: HIVE-9397.1.patch.txt These queries produce an error: {code:sql} DROP TABLE IF EXISTS foo; CREATE TABLE foo (id int) STORED AS ORC; INSERT INTO TABLE foo VALUES (1); INSERT INTO TABLE foo VALUES (2); INSERT INTO TABLE foo VALUES (3); INSERT INTO TABLE foo VALUES (4); INSERT INTO TABLE foo VALUES (5); SELECT max(id) FROM foo; ANALYZE TABLE foo COMPUTE STATISTICS FOR COLUMNS id; SELECT max(id) FROM foo; {code} The last query throws {{org.apache.hive.service.cli.HiveSQLException}} {noformat} 0: jdbc:hive2://nc-h04:1/casino SELECT max(id) FROM foo; +-+--+ | _c0 | +-+--+ org.apache.hive.service.cli.HiveSQLException: java.lang.ClassCastException 0: jdbc:hive2://nc-h04:1/casino {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
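The mismatch Navis describes boils down to the announced schema promising one type while the stats-backed fetch hands back another, so the cast on the client path blows up. A toy reproduction of that failure mode in plain Java — not Hive code, and the specific types (Long vs. Integer) are an assumption for illustration:

```java
// Toy reproduction of the type mismatch behind the ClassCastException:
// the declared schema says "int", but the value materialized from column
// statistics is carried as a Long.
public class SchemaMismatchSketch {
    public static void main(String[] args) {
        Object statsValue = Long.valueOf(5); // what the stats-backed row carries
        try {
            Integer declared = (Integer) statsValue; // what the schema promised
            System.out.println(declared);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as seen by the Beeline client");
        }
    }
}
```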
[jira] [Commented] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293039#comment-14293039 ] Brock Noland commented on HIVE-9211: Hmm, yeah, that is not easy to do. I used to have ptest copy failed tests back to the master, but a bad patch easily filled up the disk. [~chengxiang li] - are you testing locally on Linux or Mac? Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in production environments; we'd better enable unit tests for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9462) HIVE-8577 - breaks type evolution
[ https://issues.apache.org/jira/browse/HIVE-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292781#comment-14292781 ] Hive QA commented on HIVE-9462: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12694647/HIVE-9462.2.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7400 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_type_evolution {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2524/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2524/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2524/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12694647 - PreCommit-HIVE-TRUNK-Build HIVE-8577 - breaks type evolution - Key: HIVE-9462 URL: https://issues.apache.org/jira/browse/HIVE-9462 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.15.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9462.1.patch, HIVE-9462.2.patch, type_evolution.avro If you write an avro field out as {{int}} and then change its type to {{long}} you will get an {{UnresolvedUnionException}} due to code in HIVE-8577. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
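The failure mode described (a field written as int, schema evolved to long) is what happens when a union branch is resolved by the datum's runtime class. The sketch below is a conceptual stand-in for that pattern, not Avro's actual GenericData code — an Integer written under the old schema matches no branch once the schema only lists long:

```java
import java.util.List;

// Conceptual sketch: resolving a union branch by the datum's runtime class
// breaks under type evolution, because an Integer datum written with the old
// schema matches no branch of the evolved ["long"] union.
public class UnionResolveSketch {
    static int resolveBranch(Object datum, List<Class<?>> branches) {
        for (int i = 0; i < branches.size(); i++) {
            if (branches.get(i).isInstance(datum)) {
                return i;
            }
        }
        // Mirrors Avro's UnresolvedUnionException in spirit.
        throw new IllegalStateException(
            "unresolved union for " + (datum == null ? null : datum.getClass()));
    }

    public static void main(String[] args) {
        List<Class<?>> evolved = java.util.Arrays.asList(Long.class); // schema now says long
        System.out.println(resolveBranch(5L, evolved)); // 0: a Long datum matches
        try {
            resolveBranch(5, evolved); // an old Integer datum no longer matches
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Avro's schema-resolution rules actually allow promoting int to long on read; the bug is that class-based branch resolution bypasses that promotion.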
[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292844#comment-14292844 ] Ferdinand Xu commented on HIVE-9333: Thanks, Sergio, for your patch. I have left some general questions on the review board. Move parquet serialize implementation to DataWritableWriter to improve write speeds --- Key: HIVE-9333 URL: https://issues.apache.org/jira/browse/HIVE-9333 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9333.2.patch The serialize process on ParquetHiveSerDe parses a Hive object to a Writable object by looping through all the Hive object children, creating new Writable objects per child. These final writable objects are passed in to the Parquet writing function, and parsed again on the DataWritableWriter class by looping through the ArrayWritable object. These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write()) may be reduced to a single loop in the DataWritableWriter.write() method in order to increase the write speed of Hive Parquet. To achieve this, we can wrap the Hive object and object inspector on the ParquetHiveSerDe.serialize() method into an object that implements the Writable interface, thus avoiding the loop that serialize() does, and leave the loop parsing to the DataWritableWriter.write() method. We can see how ORC does this with the OrcSerde.OrcSerdeRow class. Writable objects are organized differently across storage formats, so I don't think it is necessary to create and keep the writable objects in the serialize() method, as they won't be used until the writing process starts (DataWritableWriter.write()). This performance issue was found using microbenchmark tests from HIVE-8121. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292940#comment-14292940 ] Hive QA commented on HIVE-9211: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12694685/HIVE-9211.2-spark.patch {color:red}ERROR:{color} -1 due to 40 failed/errored test(s), 7404 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_auto_sortmerge_join_16 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketizedhiveinputformat org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin6 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_empty_dir_in_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_external_table_with_space_in_location_path org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_groupby2 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap_auto org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_bucketed_table 
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_dyn_part org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_merge org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_join1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_leftsemijoin_mr org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_parallel_orderby org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_quotedid_smb org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority2 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_smb_mapjoin_8 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter_partitioned 
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_truncate_column_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_uber_reduce {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/685/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/685/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-685/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 40 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12694685 - PreCommit-HIVE-SPARK-Build Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[jira] [Commented] (HIVE-9462) HIVE-8577 - breaks type evolution
[ https://issues.apache.org/jira/browse/HIVE-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292939#comment-14292939 ] Xuefu Zhang commented on HIVE-9462: --- Patch looks good. One minor observation: if the datum is null, then the datumClazz will be null, and then the logged message will contain null. I'm not sure if this is intended. HIVE-8577 - breaks type evolution - Key: HIVE-9462 URL: https://issues.apache.org/jira/browse/HIVE-9462 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.15.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9462.1.patch, HIVE-9462.2.patch, type_evolution.avro If you write an Avro field out as {{int}} and then change its type to {{long}}, you will get an {{UnresolvedUnionException}} due to code in HIVE-8577. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
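Avro's schema-resolution rules permit reading a value written as {{int}} with a reader schema of {{long}} (numeric promotion); the exception described above means that promotion path is not being taken. The toy sketch below illustrates the promotion rule and the null case Xuefu flagged. It is NOT Hive's or Avro's actual resolution code; the class and method names are invented for illustration.

```java
// Toy illustration of Avro's int -> long promotion during schema resolution.
// Invented names; not the real Hive/Avro code path.
class AvroPromotionSketch {
    // Resolve a written datum against the reader's expected type,
    // applying the int -> long promotion that schema evolution permits.
    static Object resolve(Object datum, Class<?> readerType) {
        if (datum == null) {
            return null;  // the null-datum case noted in the review comment
        }
        if (readerType == Long.class && datum instanceof Integer) {
            return Long.valueOf((Integer) datum);  // int promoted to long
        }
        if (readerType.isInstance(datum)) {
            return datum;  // types already match
        }
        throw new IllegalArgumentException("Cannot resolve "
            + datum.getClass().getName()
            + " against reader type " + readerType.getName());
    }
}
```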
Re: Review Request 30281: Move parquet serialize implementation to DataWritableWriter to improve write speeds
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30281/#review69748 --- ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java https://reviews.apache.org/r/30281/#comment114550 Are these tests being replaced by something? - Brock Noland On Jan. 27, 2015, 1:39 a.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30281/ --- (Updated Jan. 27, 2015, 1:39 a.m.) Review request for hive, Ryan Blue, cheng xu, and Dong Chen. Bugs: HIVE-9333 https://issues.apache.org/jira/browse/HIVE-9333 Repository: hive-git Description --- This patch moves the ParquetHiveSerDe.serialize() implementation to DataWritableWriter class in order to save time in materializing data on serialize(). Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java ea4109d358f7c48d1e2042e5da299475de4a0a29 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 9caa4ed169ba92dbd863e4a2dc6d06ab226a4465 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java 060b1b722d32f3b2f88304a1a73eb249e150294b ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 41b5f1c3b0ab43f734f8a211e3e03d5060c75434 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java e52c4bc0b869b3e60cb4bfa9e11a09a0d605ac28 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java a693aff18516d133abf0aae4847d3fe00b9f1c96 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetOutputFormat.java 667d3671547190d363107019cd9a2d105d26d336 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 007a665529857bcec612f638a157aa5043562a15 serde/src/java/org/apache/hadoop/hive/serde2/io/ParquetWritable.java PRE-CREATION Diff: https://reviews.apache.org/r/30281/diff/ Testing --- The tests run were the following: 1. 
JMH (Java microbenchmark): this benchmark called the Parquet serialize/write methods using Text writable objects.
Class.method                 Before change (ops/s)   After change (ops/s)
ParquetHiveSerDe.serialize   19,113                  249,528  (~13x faster)
DataWritableWriter.write     5,033                   5,201    (~3.34% faster)
2. Write 20 million rows (~1 GB file) from Text to Parquet. I wrote a ~1 GB file in TextFile format, then converted it to Parquet using the following statement: CREATE TABLE parquet STORED AS parquet AS SELECT * FROM text; Writing the whole file took 93.758 s BEFORE the changes and 83.903 s AFTER them, about a 10% speed increase. Thanks, Sergio Pena
[jira] [Created] (HIVE-9475) HiveMetastoreClient.tableExists does not work
Brock Noland created HIVE-9475: -- Summary: HiveMetastoreClient.tableExists does not work Key: HIVE-9475 URL: https://issues.apache.org/jira/browse/HIVE-9475 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Brock Noland Priority: Blocker We check the return value against null, returning true if the return value is null. This is reversed: we should return true if the value is not null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
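The inverted check described above can be sketched as follows. The class and method bodies below are invented for illustration; the real HiveMetaStoreClient implementation differs, but the shape of the bug is the same: the comparison against null is the wrong way around.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a metastore client; NOT the real HiveMetaStoreClient.
class MetaStoreClientSketch {
    private final Map<String, Object> tables = new HashMap<>();

    void createTable(String db, String name) {
        tables.put(db + "." + name, new Object());
    }

    Object getTable(String db, String name) {  // returns null when absent
        return tables.get(db + "." + name);
    }

    // Buggy version: returns true exactly when the table is MISSING.
    boolean tableExistsBuggy(String db, String name) {
        return getTable(db, name) == null;
    }

    // Fixed version: a table exists when getTable returns a non-null object.
    boolean tableExistsFixed(String db, String name) {
        return getTable(db, name) != null;
    }
}
```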
[jira] [Updated] (HIVE-8890) HiveServer2 dynamic service discovery: use persistent ephemeral nodes curator recipe
[ https://issues.apache.org/jira/browse/HIVE-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8890: --- Fix Version/s: 0.15.0 HiveServer2 dynamic service discovery: use persistent ephemeral nodes curator recipe Key: HIVE-8890 URL: https://issues.apache.org/jira/browse/HIVE-8890 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.14.1 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.15.0, 0.14.1 Attachments: HIVE-8890.1.patch, HIVE-8890.2.patch Using this recipe gives better reliability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
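The reliability gain from Curator's persistent-ephemeral-node recipe is that the recipe re-creates the znode after a session expiry, whereas a plain ephemeral node stays gone. A toy model of that behavior, with invented names (this is NOT Curator's API; the actual recipe class in Curator is org.apache.curator.framework.recipes.nodes.PersistentEphemeralNode):

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the persistent-ephemeral idea: a plain ephemeral znode
// disappears on session expiry and never returns, while the recipe
// re-registers the nodes it owns on reconnect. Invented names throughout.
class PersistentEphemeralSketch {
    // Stands in for the znodes currently visible in ZooKeeper.
    final Set<String> znodes = new HashSet<>();
    private final Set<String> managed = new HashSet<>();

    void create(String path, boolean managedByRecipe) {
        znodes.add(path);
        if (managedByRecipe) {
            managed.add(path);  // the recipe remembers what it owns
        }
    }

    // Session expiry wipes all ephemeral nodes.
    void sessionExpired() {
        znodes.clear();
    }

    // On reconnect the recipe re-registers its nodes; plain ephemerals stay gone.
    void reconnected() {
        znodes.addAll(managed);
    }
}
```

For HiveServer2 dynamic service discovery, this means a server's registration znode survives transient ZooKeeper session loss instead of silently dropping out of the discovery namespace.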
[jira] [Commented] (HIVE-9138) Add some explain to PTF operator
[ https://issues.apache.org/jira/browse/HIVE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293055#comment-14293055 ] Hive QA commented on HIVE-9138: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12694686/HIVE-9138.2.patch.txt {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 7400 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_window org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_6_subq org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ptf_matchpath {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2529/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2529/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2529/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12694686 - PreCommit-HIVE-TRUNK-Build Add some explain to PTF operator Key: HIVE-9138 URL: https://issues.apache.org/jira/browse/HIVE-9138 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-9138.1.patch.txt, HIVE-9138.2.patch.txt PTFOperator does not explain anything in the explain statement, making it hard to understand its internal workings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30264: HIVE-9221 enable unit test for mini Spark on YARN cluster[Spark Branch]
On Jan. 26, 2015, 10:30 p.m., Xuefu Zhang wrote: data/conf/spark/yarn-client/hive-site.xml, line 225 https://reviews.apache.org/r/30264/diff/1/?file=834064#file834064line225 Only one executor? Maybe 2 will make it more general. Yes, that makes sense. On Jan. 26, 2015, 10:30 p.m., chengxiang li wrote: I'm wondering why we have a new set of .out files? Every Test*CliDriver has its own output directory; I didn't think much about this previously. Now that you mention it, I think we could share the golden files with TestSparkCliDriver, as its golden files should be the same as TestMiniSparkOnYarnCliDriver's for each qtest. One more thing to note: spark.query.files contains more than 500 qtests, and a full Hive unit-test run already takes long enough, so I didn't enable all of the spark.query.files qtests for TestMiniSparkOnYarnCliDriver. Instead I enabled the qtests from minimr.query.files, which contains about 50 qtests and takes about 10 minutes on my own desktop. - chengxiang --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30264/#review69685 --- On Jan. 26, 2015, 6:37 a.m., chengxiang li wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30264/ --- (Updated Jan. 26, 2015, 6:37 a.m.) Review request for hive, Szehon Ho and Xuefu Zhang. Bugs: HIVE-9211 https://issues.apache.org/jira/browse/HIVE-9211 Repository: hive-git Description --- MiniSparkOnYarnCluster is enabled for unit tests; Spark is deployed on miniYarnCluster in yarn-client mode. All qfiles in minimr.query.files are enabled in this unit test except three, bucket_num_reducers.q, bucket_num_reducers2.q, and udf_using.q, which are not supported in HoS. 
Diffs - data/conf/spark/hive-site.xml 016f568 data/conf/spark/standalone/hive-site.xml PRE-CREATION data/conf/spark/yarn-client/hive-site.xml PRE-CREATION itests/pom.xml e1e88f6 itests/qtest-spark/pom.xml d12fad5 itests/src/test/resources/testconfiguration.properties f583aaf itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 095b9bd ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 41a2ab7 ql/src/test/results/clientpositive/miniSparkOnYarn/auto_sortmerge_join_16.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucket4.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucket5.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucket6.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucketizedhiveinputformat.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucketmapjoin6.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/bucketmapjoin7.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/constprog_partitioner.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/disable_merge_for_bucketing.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/empty_dir_in_table.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/external_table_with_space_in_location_path.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/file_with_header_footer.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/groupby1.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/groupby2.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/import_exported_table.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/index_bitmap3.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/index_bitmap_auto.q.out PRE-CREATION 
ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_bucketed_table.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_dyn_part.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_map_operators.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_merge.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_num_buckets.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/infer_bucket_sort_reducers_power_two.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/input16_cc.q.out PRE-CREATION ql/src/test/results/clientpositive/miniSparkOnYarn/join1.q.out PRE-CREATION
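Picking up Xuefu's executor-count comment in the review above, bumping the mini cluster from one executor to two would presumably be a change like the following in data/conf/spark/yarn-client/hive-site.xml. The property name is the standard Spark one; the exact file contents are an assumption, since the diff body itself is not shown here.

```xml
<!-- Hypothetical fragment of data/conf/spark/yarn-client/hive-site.xml:
     two executors, per the review suggestion, rather than one. -->
<property>
  <name>spark.executor.instances</name>
  <value>2</value>
</property>
```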
[jira] [Updated] (HIVE-9211) Research on build mini HoS cluster on YARN for unit test[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9211: Status: Patch Available (was: Open) Research on build mini HoS cluster on YARN for unit test[Spark Branch] -- Key: HIVE-9211 URL: https://issues.apache.org/jira/browse/HIVE-9211 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M5 Attachments: HIVE-9211.1-spark.patch, HIVE-9211.2-spark.patch HoS on YARN is a common use case in production environments; we should enable unit tests for this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9471) Bad seek in uncompressed ORC, at row-group boundary.
[ https://issues.apache.org/jira/browse/HIVE-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-9471: --- Status: Patch Available (was: Open) Bad seek in uncompressed ORC, at row-group boundary. Key: HIVE-9471 URL: https://issues.apache.org/jira/browse/HIVE-9471 Project: Hive Issue Type: Bug Components: File Formats, Serializers/Deserializers Affects Versions: 0.14.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-9471.1.patch, data.txt, orc_bad_seek_failure_case.hive, orc_bad_seek_setup.hive Under at least one specific condition, using index-filters in ORC causes a bad seek into the ORC row-group. {code:title=stacktrace} java.io.IOException: java.lang.IllegalArgumentException: Seek in Stream for column 2 kind DATA to 0 is outside of the data at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1655) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:227) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305) ... 
Caused by: java.lang.IllegalArgumentException: Seek in Stream for column 2 kind DATA to 0 is outside of the data at org.apache.hadoop.hive.ql.io.orc.InStream$UncompressedStream.seek(InStream.java:112) at org.apache.hadoop.hive.ql.io.orc.InStream$UncompressedStream.seek(InStream.java:96) at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.seek(RunLengthIntegerReaderV2.java:310) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDictionaryTreeReader.seek(RecordReaderImpl.java:1596) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.seek(RecordReaderImpl.java:1337) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.seek(RecordReaderImpl.java:1852) {code} I'll attach the script to reproduce the problem herewith. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9471) Bad seek in uncompressed ORC, at row-group boundary.
[ https://issues.apache.org/jira/browse/HIVE-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-9471: --- Attachment: HIVE-9471.1.patch Here's a tentative patch. (Thanks for the advice, [~prasanthj].) The reader-fix prevents a seek into an empty stream. The writer-change prevents the string-output-stream from being written into the stripe. (Does it make sense to similarly suppress writing out the length-stream?) The test I posted passes. Bad seek in uncompressed ORC, at row-group boundary. Key: HIVE-9471 URL: https://issues.apache.org/jira/browse/HIVE-9471 Project: Hive Issue Type: Bug Components: File Formats, Serializers/Deserializers Affects Versions: 0.14.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-9471.1.patch, data.txt, orc_bad_seek_failure_case.hive, orc_bad_seek_setup.hive Under at least one specific condition, using index-filters in ORC causes a bad seek into the ORC row-group. {code:title=stacktrace} java.io.IOException: java.lang.IllegalArgumentException: Seek in Stream for column 2 kind DATA to 0 is outside of the data at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1655) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:227) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305) ... 
Caused by: java.lang.IllegalArgumentException: Seek in Stream for column 2 kind DATA to 0 is outside of the data at org.apache.hadoop.hive.ql.io.orc.InStream$UncompressedStream.seek(InStream.java:112) at org.apache.hadoop.hive.ql.io.orc.InStream$UncompressedStream.seek(InStream.java:96) at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.seek(RunLengthIntegerReaderV2.java:310) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDictionaryTreeReader.seek(RecordReaderImpl.java:1596) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.seek(RecordReaderImpl.java:1337) at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.seek(RecordReaderImpl.java:1852) {code} I'll attach the script to reproduce the problem herewith. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
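The reader-fix described above (preventing a seek into an empty stream) can be sketched against a simplified in-memory stream. This is NOT the actual org.apache.hadoop.hive.ql.io.orc.InStream code; it only illustrates the guard: a zero-length stream has no data, so a seek to offset 0 is treated as a no-op instead of an out-of-range error.

```java
// Simplified model of an uncompressed in-memory stream, illustrating the
// guard described in the comment above. Not the real ORC InStream.
class UncompressedStreamSketch {
    private final byte[] data;
    private int position;

    UncompressedStreamSketch(byte[] data) {
        this.data = data;
    }

    // Unguarded behaviour: seeking to offset 0 in an empty stream throws,
    // mirroring "Seek in Stream ... to 0 is outside of the data".
    void seekUnguarded(int offset) {
        if (offset >= data.length) {
            throw new IllegalArgumentException(
                "Seek to " + offset + " is outside of the data");
        }
        position = offset;
    }

    // Guarded version: an empty stream has nothing to seek into, so a
    // seek to 0 simply leaves the (empty) stream positioned at the start.
    void seekGuarded(int offset) {
        if (data.length == 0 && offset == 0) {
            return;  // empty stream: nothing to position
        }
        seekUnguarded(offset);
    }
}
```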
[jira] [Created] (HIVE-9474) truncate table changes permissions on the target
Aihua Xu created HIVE-9474: -- Summary: truncate table changes permissions on the target Key: HIVE-9474 URL: https://issues.apache.org/jira/browse/HIVE-9474 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.15.0 Reporter: Aihua Xu Assignee: Aihua Xu Priority: Minor I created a test table in beeline: create table test(a string); Permissions:
{noformat}
# file: /user/hive/warehouse/test.db/test
# owner: aryan
# group: hive
user::rwx
group::rwx
other::--x
{noformat}
Now, in beeline: truncate table test; Permissions are now:
{noformat}
# file: /user/hive/warehouse/test.db/test
# owner: aryan
# group: hive
user::rwx
group::r-x
other::r-x
{noformat}
Group write permissions have disappeared! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
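The fix implied by the report is to remember the directory's permissions before the destructive recreate and restore them afterwards. A minimal sketch, using the local filesystem as a stand-in for the warehouse directory (the HDFS equivalent would use FileStatus.getPermission() and FileSystem.setPermission(); this is not Hive's actual truncate code path):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Sketch: recreate a directory while preserving its original permissions,
// instead of letting the recreate pick up default/umask permissions.
class PreservePermissionsSketch {
    static Path recreatePreservingPermissions(Path dir) throws IOException {
        // Capture the mode before the directory is dropped.
        Set<PosixFilePermission> original = Files.getPosixFilePermissions(dir);
        Files.delete(dir);           // simulate truncate dropping the directory
        Files.createDirectory(dir);  // the directory comes back...
        Files.setPosixFilePermissions(dir, original);  // ...with its old mode
        return dir;
    }
}
```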