[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-02-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317583#comment-14317583
 ] 

Hive QA commented on HIVE-9333:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12698234/HIVE-9333.7.patch

{color:green}SUCCESS:{color} +1 7540 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2767/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2767/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2767/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12698234 - PreCommit-HIVE-TRUNK-Build

> Move parquet serialize implementation to DataWritableWriter to improve write 
> speeds
> ---
>
> Key: HIVE-9333
> URL: https://issues.apache.org/jira/browse/HIVE-9333
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergio Peña
> Assignee: Sergio Peña
> Attachments: HIVE-9333.5.patch, HIVE-9333.6.patch, HIVE-9333.7.patch
>
>
> The serialize process in ParquetHiveSerDe converts a Hive object
> into a Writable object by looping through all of the Hive object's children
> and creating a new Writable object per child. These final Writable objects
> are passed to the Parquet writing function and parsed again in the
> DataWritableWriter class by looping through the ArrayWritable object.
> These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write())
> may be reduced to a single loop inside the DataWritableWriter.write() method
> in order to speed up the write process for Hive Parquet.
> In order to achieve this, we can wrap the Hive object and its object inspector
> in the ParquetHiveSerDe.serialize() method inside an object that implements
> the Writable interface, thus avoiding the loop that serialize() performs, and
> leave the parsing loop to the DataWritableWriter.write() method. We can see
> how ORC does this with the OrcSerde.OrcSerdeRow class.
> Writable objects are organized differently across storage formats, so
> I don't think it is necessary to create and keep the Writable objects in the
> serialize() method, as they won't be used until the writing process starts
> (DataWritableWriter.write()).
> This performance issue was found using the microbenchmark tests from HIVE-8121.
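
For illustration, the wrapper approach described above can be sketched as a small Writable implementation. This is a minimal sketch only: the class and method names (ParquetWritableRow, set(), getRow(), getInspector()) are hypothetical stand-ins, not the names used by the actual HIVE-9333 patch; the pattern mirrors what OrcSerde.OrcSerdeRow does for ORC.

{code:java}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.io.Writable;

// Hypothetical wrapper: serialize() just stores the raw Hive row and its
// ObjectInspector here, so the per-field copy into Writable objects goes
// away. The single remaining traversal happens later, inside
// DataWritableWriter.write().
public class ParquetWritableRow implements Writable {
  private Object row;
  private ObjectInspector inspector;

  public void set(Object row, ObjectInspector inspector) {
    this.row = row;
    this.inspector = inspector;
  }

  public Object getRow() { return row; }
  public ObjectInspector getInspector() { return inspector; }

  // Like OrcSerde.OrcSerdeRow, this object is only handed in memory to the
  // record writer; Hadoop never serializes it over the wire.
  @Override
  public void write(DataOutput out) throws IOException {
    throw new UnsupportedOperationException("not meant to be serialized");
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    throw new UnsupportedOperationException("not meant to be deserialized");
  }
}
{code}

With such a wrapper, ParquetHiveSerDe.serialize() reduces to a single set() call on a reusable instance, and only DataWritableWriter.write() loops over the row's fields.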





[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-02-05 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307770#comment-14307770
 ] 

Brock Noland commented on HIVE-9333:


+1

> Move parquet serialize implementation to DataWritableWriter to improve write
> speeds
> ---
>
> Key: HIVE-9333
> URL: https://issues.apache.org/jira/browse/HIVE-9333
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergio Peña
> Assignee: Sergio Peña
> Attachments: HIVE-9333.5.patch, HIVE-9333.6.patch





[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300930#comment-14300930
 ] 

Hive QA commented on HIVE-9333:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12695870/HIVE-9333.6.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7407 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHiveVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2612/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2612/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2612/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12695870 - PreCommit-HIVE-TRUNK-Build

> Move parquet serialize implementation to DataWritableWriter to improve write
> speeds
> ---
>
> Key: HIVE-9333
> URL: https://issues.apache.org/jira/browse/HIVE-9333
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergio Peña
> Assignee: Sergio Peña
> Attachments: HIVE-9333.5.patch, HIVE-9333.6.patch





[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-01-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299274#comment-14299274
 ] 

Hive QA commented on HIVE-9333:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12695588/HIVE-9333.5.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 7407 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHiveVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2591/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2591/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2591/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12695588 - PreCommit-HIVE-TRUNK-Build

> Move parquet serialize implementation to DataWritableWriter to improve write
> speeds
> ---
>
> Key: HIVE-9333
> URL: https://issues.apache.org/jira/browse/HIVE-9333
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergio Peña
> Assignee: Sergio Peña
> Attachments: HIVE-9333.2.patch, HIVE-9333.3.patch, HIVE-9333.4.patch, HIVE-9333.5.patch





[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-01-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298629#comment-14298629
 ] 

Hive QA commented on HIVE-9333:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12695450/HIVE-9333.4.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2585/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2585/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2585/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-2585/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted '.gitignore'
Reverted 'cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java'
Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java'
Reverted 
'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java'
Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java'
Reverted 'ql/src/test/resources/orc-file-dump-dictionary-threshold.out'
Reverted 'ql/src/test/resources/orc-file-has-null.out'
Reverted 'ql/src/test/resources/orc-file-dump.out'
Reverted 'ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/StreamName.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java'
Reverted 
'ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target 
shims/0.23/target shims/aggregator/target shims/common/target 
shims/scheduler/target packaging/target hbase-handler/target testutils/target 
jdbc/target metastore/target itests/target itests/thirdparty 
itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target 
itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-jmh/target itests/hive-unit/target itests/custom-serde/target 
itests/util/target itests/qtest-spark/target hcatalog/target 
hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/webhcat/svr/target 
hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target 
accumulo-handler/target hwi/target common/target common/src/gen 
spark-client/target contrib/target service/target serde/target beeline/target 
odbc/target cli/target ql/dependency-reduced-pom.xml ql/target 
ql/src/test/org/apache/hadoop/hive/ql/io/filters 
ql/src/test/resources/orc-file-dump-bloomfilter.out 
ql/src/test/resources/orc-file-dump-bloomfilter2.out 
ql/src/java/org/apache/hadoop/hive/ql/io/filters 
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcUtils.java
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1656014.

At revision 1656014.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ 

[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-01-29 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298082#comment-14298082
 ] 

Ferdinand Xu commented on HIVE-9333:


Thanks Sergio for your patch.
LGTM +1

> Move parquet serialize implementation to DataWritableWriter to improve write
> speeds
> ---
>
> Key: HIVE-9333
> URL: https://issues.apache.org/jira/browse/HIVE-9333
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergio Peña
> Assignee: Sergio Peña
> Attachments: HIVE-9333.2.patch, HIVE-9333.3.patch





[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-01-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294145#comment-14294145
 ] 

Hive QA commented on HIVE-9333:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12694817/HIVE-9333.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7396 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2535/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2535/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2535/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12694817 - PreCommit-HIVE-TRUNK-Build

> Move parquet serialize implementation to DataWritableWriter to improve write
> speeds
> ---
>
> Key: HIVE-9333
> URL: https://issues.apache.org/jira/browse/HIVE-9333
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergio Peña
> Assignee: Sergio Peña
> Attachments: HIVE-9333.2.patch, HIVE-9333.3.patch





[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-01-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292936#comment-14292936
 ] 

Hive QA commented on HIVE-9333:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12694678/HIVE-9333.2.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7394 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2527/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2527/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2527/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12694678 - PreCommit-HIVE-TRUNK-Build

> Move parquet serialize implementation to DataWritableWriter to improve write
> speeds
> ---
>
> Key: HIVE-9333
> URL: https://issues.apache.org/jira/browse/HIVE-9333
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergio Peña
> Assignee: Sergio Peña
> Attachments: HIVE-9333.2.patch





[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-01-26 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292844#comment-14292844
 ] 

Ferdinand Xu commented on HIVE-9333:


Thanks Sergio for your patch. I have left some general questions on the review board.

> Move parquet serialize implementation to DataWritableWriter to improve write
> speeds
> ---
>
> Key: HIVE-9333
> URL: https://issues.apache.org/jira/browse/HIVE-9333
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergio Peña
> Assignee: Sergio Peña
> Attachments: HIVE-9333.2.patch





[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-01-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292502#comment-14292502
 ] 

Hive QA commented on HIVE-9333:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12694605/HIVE-9333.1.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7373 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_types
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_parquet
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_parquet
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2522/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2522/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2522/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12694605 - PreCommit-HIVE-TRUNK-Build

> Move parquet serialize implementation to DataWritableWriter to improve write 
> speeds
> ---
>
> Key: HIVE-9333
> URL: https://issues.apache.org/jira/browse/HIVE-9333
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergio Peña
> Assignee: Sergio Peña
> Attachments: HIVE-9333.1.patch
>
>
> The serialize process in ParquetHiveSerDe converts a Hive object
> into a Writable object by looping through all of the Hive object's children
> and creating a new Writable object per child. These final Writable objects
> are passed to the Parquet writing function and parsed again in the
> DataWritableWriter class by looping through the ArrayWritable object.
> These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write())
> may be reduced to a single loop inside the DataWritableWriter.write() method
> in order to speed up the write process for Hive Parquet.
> In order to achieve this, we can wrap the Hive object and its object inspector
> in the ParquetHiveSerDe.serialize() method inside an object that implements
> the Writable interface, thus avoiding the loop that serialize() performs, and
> leave the parsing loop to the DataWritableWriter.write() method. We can see
> how ORC does this with the OrcSerde.OrcSerdeRow class.
> Writable objects are organized differently across storage formats, so
> I don't think it is necessary to create and keep the Writable objects in the
> serialize() method, as they won't be used until the writing process starts
> (DataWritableWriter.write()).
> We might save 200% of the extra time by making this change.
> This performance issue was found using the microbenchmark tests from HIVE-8121.
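
To make the single-loop idea concrete, here is a minimal sketch of how the wrapped row might be traversed once on the writer side. The class name SingleLoopWriterSketch and the emit() helper are hypothetical illustrations, only primitive fields are handled, and the real DataWritableWriter would instead dispatch to Parquet's record consumer per field type; only the ObjectInspector calls below are real Hive SerDe APIs.

{code:java}
import java.util.List;

import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;

// Hypothetical sketch: the row is walked directly with its ObjectInspector,
// so no intermediate ArrayWritable is ever built in serialize().
public class SingleLoopWriterSketch {

  public void write(Object row, StructObjectInspector inspector) {
    List<? extends StructField> fields = inspector.getAllStructFieldRefs();
    for (StructField field : fields) {              // the one remaining loop
      Object value = inspector.getStructFieldData(row, field);
      ObjectInspector fieldInspector = field.getFieldObjectInspector();
      if (value != null && fieldInspector instanceof PrimitiveObjectInspector) {
        // Convert the Hive value to a plain Java object and emit it; the
        // real writer would call Parquet's consumer for the matching type.
        Object javaValue =
            ((PrimitiveObjectInspector) fieldInspector).getPrimitiveJavaObject(value);
        emit(field.getFieldName(), javaValue);
      }
    }
  }

  // Stand-in for the Parquet emit step, for illustration only.
  private void emit(String name, Object value) {
    System.out.println(name + " = " + value);
  }
}
{code}

The point is that the ObjectInspector walk and the value conversion happen in the same pass, where the previous code first built an ArrayWritable in serialize() and then walked it again in write().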



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)