[
https://issues.apache.org/jira/browse/HIVE-7664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102188#comment-14102188
]
Hive QA commented on HIVE-7664:
-------------------------------
{color:red}Overall{color}: -1 at least one test failed
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662732/HIVE-7664.1.patch.txt
{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5819 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}
Test results:
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/401/testReport
Console output:
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/401/console
Test logs:
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-401/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12662732
> VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized
> execution and takes 25% CPU
> -------------------------------------------------------------------------------------------------
>
> Key: HIVE-7664
> URL: https://issues.apache.org/jira/browse/HIVE-7664
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.13.1
> Reporter: Mostafa Mokhtar
> Fix For: 0.14.0
>
> Attachments: HIVE-7664.1.patch.txt
>
>
> In a group-by-heavy vectorized Reducer vertex, 25% of the CPU time is spent in
> VectorizedBatchUtil.addRowToBatchFrom().
> Looking at the code of VectorizedBatchUtil.addRowToBatchFrom, it appears it was
> not optimized for vectorized processing.
> addRowToBatchFrom is called for every row, and for each row it calls
> getPrimitiveCategory on every column in the batch to figure out that column's
> type. The column types are stored in a HashMap, but for VectorGroupByOperator
> the column types don't change between batches, so they shouldn't be looked up
> for every row.
> I recommend storing the column types in StructObjectInspector so that other
> components can leverage this optimization.
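> As a rough sketch of that caching idea (the class and field names below are
> illustrative, not an actual patch), the per-column PrimitiveCategory array can
> be computed once per ObjectInspector and reused for every row, since the
> inspector stays the same across batches:
> {code}
> import java.util.List;
> import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
> import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
> import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
> import org.apache.hadoop.hive.serde2.objectinspector.StructField;
> import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
>
> // Illustrative helper: cache the per-column PrimitiveCategory array and
> // recompute it only when the ObjectInspector changes, instead of calling
> // getPrimitiveCategory() for every cell of every row.
> public class ColumnTypeCache {
>
>   private StructObjectInspector cachedOI;
>   private PrimitiveCategory[] cachedCategories;
>
>   public PrimitiveCategory[] categoriesFor(StructObjectInspector oi) {
>     if (oi != cachedOI) {
>       List<? extends StructField> fields = oi.getAllStructFieldRefs();
>       cachedCategories = new PrimitiveCategory[fields.size()];
>       for (int i = 0; i < fields.size(); i++) {
>         ObjectInspector fieldOI = fields.get(i).getFieldObjectInspector();
>         // Assumes primitive columns, as in the vectorized row path.
>         cachedCategories[i] = ((PrimitiveObjectInspector) fieldOI).getPrimitiveCategory();
>       }
>       cachedOI = oi;
>     }
>     return cachedCategories;
>   }
> }
> {code}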
> addRowToBatchFrom also executes a case statement for every row and every
> column to do the type casting; I recommend encapsulating the per-type logic in
> templatized methods so the type dispatch happens once per column rather than
> once per cell (a sketch follows the stack trace sample below).
> {code}
> Stack Trace                                                   Sample Count  Percentage(%)
> VectorizedBatchUtil.addRowToBatchFrom                                    86          26.543
> AbstractPrimitiveObjectInspector.getPrimitiveCategory()                  34          10.494
> LazyBinaryStructObjectInspector.getStructFieldData                       25           7.716
> StandardStructObjectInspector.getStructFieldData                          4           1.235
> {code}
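> A rough sketch of the templatized-methods idea (since this is Java, per-column
> assigner objects stand in for literal templatized methods; the names below are
> illustrative, and the field values are assumed to be already-unwrapped Java
> objects): the type switch runs once per column during setup, and the per-row
> loop makes only a virtual call per cell.
> {code}
> import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
> import org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector;
> import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
> import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
> import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
>
> // Illustrative sketch: pick a per-type assigner once per column, so the
> // row loop contains no type dispatch or HashMap lookup at all.
> public class ColumnAssigners {
>
>   public interface Assigner {
>     void assign(ColumnVector col, int rowIndex, Object value);
>   }
>
>   // The case statement moves here and runs once per column, not once per cell.
>   // Only the long and double families are shown; a real version covers all types.
>   public static Assigner forCategory(PrimitiveCategory category) {
>     switch (category) {
>       case BYTE: case SHORT: case INT: case LONG:
>         return (col, row, value) ->
>             ((LongColumnVector) col).vector[row] = ((Number) value).longValue();
>       case FLOAT: case DOUBLE:
>         return (col, row, value) ->
>             ((DoubleColumnVector) col).vector[row] = ((Number) value).doubleValue();
>       default:
>         throw new UnsupportedOperationException("Not covered in this sketch: " + category);
>     }
>   }
>
>   // Per-row work: one virtual call per cell, no per-cell switch.
>   public static void addRow(VectorizedRowBatch batch, Assigner[] assigners, Object[] row) {
>     int rowIndex = batch.size++;
>     for (int c = 0; c < assigners.length; c++) {
>       assigners[c].assign(batch.cols[c], rowIndex, row[c]);
>     }
>   }
> }
> {code}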
> The query used :
> {code}
> select
>   ss_sold_date_sk
> from
>   store_sales
> where
>   ss_sold_date between '1998-01-01' and '1998-06-01'
> group by
>   ss_item_sk, ss_customer_sk, ss_sold_date_sk
> having
>   sum(ss_list_price) > 50000000000000;
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)