[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

Hive QA (JIRA) Fri, 30 Jan 2015 05:51:08 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298629#comment-14298629
 ]


Hive QA commented on HIVE-9333:
-------------------------------



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12695450/HIVE-9333.4.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2585/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2585/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2585/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-2585/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted '.gitignore'
Reverted 'cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java'
Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java'
Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java'
Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java'
Reverted 'ql/src/test/resources/orc-file-dump-dictionary-threshold.out'
Reverted 'ql/src/test/resources/orc-file-has-null.out'
Reverted 'ql/src/test/resources/orc-file-dump.out'
Reverted 'ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/StreamName.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java'
Reverted 'ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target 
shims/0.23/target shims/aggregator/target shims/common/target 
shims/scheduler/target packaging/target hbase-handler/target testutils/target 
jdbc/target metastore/target itests/target itests/thirdparty 
itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target 
itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-jmh/target itests/hive-unit/target itests/custom-serde/target 
itests/util/target itests/qtest-spark/target hcatalog/target 
hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/webhcat/svr/target 
hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target 
accumulo-handler/target hwi/target common/target common/src/gen 
spark-client/target contrib/target service/target serde/target beeline/target 
odbc/target cli/target ql/dependency-reduced-pom.xml ql/target 
ql/src/test/org/apache/hadoop/hive/ql/io/filters 
ql/src/test/resources/orc-file-dump-bloomfilter.out 
ql/src/test/resources/orc-file-dump-bloomfilter2.out 
ql/src/java/org/apache/hadoop/hive/ql/io/filters 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcUtils.java
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1656014.

At revision 1656014.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12695450 - PreCommit-HIVE-TRUNK-Build

> Move parquet serialize implementation to DataWritableWriter to improve write 
> speeds
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-9333
>                 URL: https://issues.apache.org/jira/browse/HIVE-9333
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-9333.2.patch, HIVE-9333.3.patch, HIVE-9333.4.patch
>
>
> The serialize process in ParquetHiveSerDe converts a Hive object
> to a Writable object by looping through all of the Hive object's children
> and creating a new Writable object per child. These Writable
> objects are then passed to the Parquet writing function and traversed again
> in the DataWritableWriter class by looping through the ArrayWritable
> object. These two loops (ParquetHiveSerDe.serialize() and
> DataWritableWriter.write()) could be reduced to a single loop in the
> DataWritableWriter.write() method to increase write speed for Hive
> Parquet.
> To achieve this, the ParquetHiveSerDe.serialize() method can wrap the Hive
> object and its object inspector in an object that implements the
> Writable interface. This avoids the loop in serialize() and leaves the
> traversal to the DataWritableWriter.write() method. ORC already does
> this with the OrcSerde.OrcSerdeRow class.
> Writable objects are organized differently in each storage format, so
> I don't think it is necessary to create and keep the Writable objects in the
> serialize() method, as they won't be used until the writing process starts
> (DataWritableWriter.write()).
> This performance issue was found using the microbenchmark tests from HIVE-8121.
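
The idea in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the attached patch: RowInspector, RowWrapper, arrayInspector and the string-based output are hypothetical stand-ins for Hive's object inspector, a Writable wrapper in the spirit of OrcSerde.OrcSerdeRow, and the Parquet record consumer. In Hive itself the wrapper would implement org.apache.hadoop.io.Writable; that is left out here so the sketch compiles without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

public class DeferredSerializeSketch {

    /** Hypothetical stand-in for a Hive struct object inspector. */
    public interface RowInspector {
        int fieldCount();
        String fieldName(int index);
        Object fieldValue(Object row, int index);
    }

    /**
     * Hypothetical wrapper that only carries the row and its inspector.
     * In Hive this role would be played by a class implementing Writable.
     */
    public static final class RowWrapper {
        final Object row;
        final RowInspector inspector;

        RowWrapper(Object row, RowInspector inspector) {
            this.row = row;
            this.inspector = inspector;
        }
    }

    /** serialize(): no per-field loop and no per-field objects, just a wrapper. */
    public static RowWrapper serialize(Object row, RowInspector inspector) {
        return new RowWrapper(row, inspector);
    }

    /** write(): the only loop over the fields; null fields are skipped. */
    public static List<String> write(RowWrapper wrapped) {
        List<String> emitted = new ArrayList<>();
        for (int i = 0; i < wrapped.inspector.fieldCount(); i++) {
            Object value = wrapped.inspector.fieldValue(wrapped.row, i);
            if (value != null) {
                emitted.add(wrapped.inspector.fieldName(i) + "=" + value);
            }
        }
        return emitted;
    }

    /** Hypothetical helper: an inspector for rows stored as Object[]. */
    public static RowInspector arrayInspector(final String... names) {
        return new RowInspector() {
            @Override public int fieldCount() { return names.length; }
            @Override public String fieldName(int index) { return names[index]; }
            @Override public Object fieldValue(Object row, int index) {
                return ((Object[]) row)[index];
            }
        };
    }

    public static void main(String[] args) {
        Object[] row = {1, null, "x"};
        System.out.println(write(serialize(row, arrayInspector("id", "note", "name"))));
    }
}
```

The point of the sketch is that serialize() does constant work per row, so the fields are visited once (in write()) instead of twice.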



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
