[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-02-11 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317583#comment-14317583 ]

Hive QA commented on HIVE-9333:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12698234/HIVE-9333.7.patch

{color:green}SUCCESS:{color} +1 7540 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2767/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2767/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2767/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12698234 - PreCommit-HIVE-TRUNK-Build

 Move parquet serialize implementation to DataWritableWriter to improve write 
 speeds
 ---

 Key: HIVE-9333
 URL: https://issues.apache.org/jira/browse/HIVE-9333
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-9333.5.patch, HIVE-9333.6.patch, HIVE-9333.7.patch


 The serialize process in ParquetHiveSerDe parses a Hive object
 into a Writable object by looping through all of the Hive object's
 children and creating a new Writable object per child. These final
 Writable objects are passed to the Parquet writing function, and parsed
 again in the DataWritableWriter class by looping through the
 ArrayWritable object. These two loops (ParquetHiveSerDe.serialize() and
 DataWritableWriter.write()) could be reduced to a single loop inside the
 DataWritableWriter.write() method in order to speed up the writing
 process for Hive Parquet.
 To achieve this, we can wrap the Hive object and its object inspector
 in the ParquetHiveSerDe.serialize() method into an object that
 implements the Writable interface, thus avoiding the loop that
 serialize() does and leaving the parsing loop to the
 DataWritableWriter.write() method. We can see how ORC does this with the
 OrcSerde.OrcSerdeRow class.
 Writable objects are organized differently in each storage format, so I
 don't think it is necessary to create and keep the Writable objects in
 the serialize() method, as they won't be used until the writing process
 starts (DataWritableWriter.write()).
 This performance issue was found using the microbenchmark tests from
 HIVE-8121.
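 A rough sketch of such a wrapper, with hypothetical class and accessor
 names (the committed patch may differ), is shown below: serialize()
 simply wraps the row, and the single field loop is deferred to
 DataWritableWriter.write(), which uses the inspector to walk the row.
{noformat}
import java.io.DataInput;
import java.io.DataOutput;

import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.io.Writable;

// Hypothetical wrapper: carries the Hive row and its inspector through
// the Writable-based writer API without creating per-field Writables.
public class ParquetHiveRecord implements Writable {
  private final Object row;
  private final ObjectInspector inspector;

  public ParquetHiveRecord(Object row, ObjectInspector inspector) {
    this.row = row;
    this.inspector = inspector;
  }

  public Object getRow() { return row; }
  public ObjectInspector getInspector() { return inspector; }

  // The record is consumed in-memory by the Parquet writer, so Hadoop
  // serialization is never exercised (the same trick
  // OrcSerde.OrcSerdeRow uses).
  @Override
  public void write(DataOutput out) {
    throw new UnsupportedOperationException("should never be called");
  }

  @Override
  public void readFields(DataInput in) {
    throw new UnsupportedOperationException("should never be called");
  }
}
{noformat}
 With such a wrapper, ParquetHiveSerDe.serialize() reduces to roughly
 "return new ParquetHiveRecord(obj, objInspector);", and the one
 field-by-field loop lives entirely in DataWritableWriter.write().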





[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-02-05 Thread Brock Noland (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307770#comment-14307770 ]

Brock Noland commented on HIVE-9333:


+1



[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-02-01 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300930#comment-14300930 ]

Hive QA commented on HIVE-9333:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12695870/HIVE-9333.6.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7407 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHiveVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2612/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2612/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2612/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12695870 - PreCommit-HIVE-TRUNK-Build



[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-01-30 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299274#comment-14299274 ]

Hive QA commented on HIVE-9333:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12695588/HIVE-9333.5.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 7407 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHiveVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2591/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2591/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2591/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12695588 - PreCommit-HIVE-TRUNK-Build



[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-01-30 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298629#comment-14298629 ]

Hive QA commented on HIVE-9333:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12695450/HIVE-9333.4.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2585/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2585/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2585/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-2585/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted '.gitignore'
Reverted 'cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java'
Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java'
Reverted 
'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java'
Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java'
Reverted 'ql/src/test/resources/orc-file-dump-dictionary-threshold.out'
Reverted 'ql/src/test/resources/orc-file-has-null.out'
Reverted 'ql/src/test/resources/orc-file-dump.out'
Reverted 'ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/StreamName.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java'
Reverted 
'ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target 
shims/0.23/target shims/aggregator/target shims/common/target 
shims/scheduler/target packaging/target hbase-handler/target testutils/target 
jdbc/target metastore/target itests/target itests/thirdparty 
itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target 
itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-jmh/target itests/hive-unit/target itests/custom-serde/target 
itests/util/target itests/qtest-spark/target hcatalog/target 
hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/webhcat/svr/target 
hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target 
accumulo-handler/target hwi/target common/target common/src/gen 
spark-client/target contrib/target service/target serde/target beeline/target 
odbc/target cli/target ql/dependency-reduced-pom.xml ql/target 
ql/src/test/org/apache/hadoop/hive/ql/io/filters 
ql/src/test/resources/orc-file-dump-bloomfilter.out 
ql/src/test/resources/orc-file-dump-bloomfilter2.out 
ql/src/java/org/apache/hadoop/hive/ql/io/filters 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcUtils.java
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1656014.

At revision 1656014.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ 

[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-01-29 Thread Ferdinand Xu (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298082#comment-14298082 ]

Ferdinand Xu commented on HIVE-9333:


Thanks Sergio for your patch.
LGTM +1



[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-01-27 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294145#comment-14294145 ]

Hive QA commented on HIVE-9333:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12694817/HIVE-9333.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7396 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2535/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2535/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2535/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12694817 - PreCommit-HIVE-TRUNK-Build



[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-01-26 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292502#comment-14292502 ]

Hive QA commented on HIVE-9333:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12694605/HIVE-9333.1.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7373 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_types
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_parquet
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_parquet
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2522/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2522/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2522/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12694605 - PreCommit-HIVE-TRUNK-Build



[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-01-26 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292936#comment-14292936 ]

Hive QA commented on HIVE-9333:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12694678/HIVE-9333.2.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7394 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2527/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2527/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2527/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12694678 - PreCommit-HIVE-TRUNK-Build



[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-01-26 Thread Ferdinand Xu (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292844#comment-14292844 ]

Ferdinand Xu commented on HIVE-9333:


Thanks Sergio for your patch. I have left some general questions on the
review board.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)