[ https://issues.apache.org/jira/browse/HIVE-7664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209256#comment-14209256 ]
Hive QA commented on HIVE-7664:
-------------------------------

{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662948/HIVE-7664.2.patch.txt

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1764/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1764/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1764/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-1764/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf
+ svn update
Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1639245.
At revision 1639245.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662948 - PreCommit-HIVE-TRUNK-Build

> VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized
> execution and takes 25% CPU
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7664
>                 URL: https://issues.apache.org/jira/browse/HIVE-7664
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.1
>            Reporter: Mostafa Mokhtar
>            Assignee: Gopal V
>         Attachments: HIVE-7664.1.patch.txt, HIVE-7664.2.patch.txt
>
>
> In a Group by heavy vectorized Reducer vertex 25% of CPU is spent in
> VectorizedBatchUtil.addRowToBatchFrom().
> Looked at the code of VectorizedBatchUtil.addRowToBatchFrom and it looks like
> it wasn't optimized for Vectorized processing.
> addRowToBatchFrom is called for every row and for each row and every column
> in the batch getPrimitiveCategory is called to figure the type of each
> column, column types are stored in a HashMap, for VectorGroupByOperator
> columns types won't change between batches, so column types shouldn't be
> looked up for every row.
> I recommend storing the column type in StructObjectInspector so that other
> components can leverage this optimization.
> Also addRowToBatchFrom has a case statement for every row and every column
> used for type casting I recommend encapsulating the type logic in templatized
> methods.
> {code}
> Stack Trace                                              Sample Count  Percentage(%)
> VectorizedBatchUtil.addRowToBatchFrom                    86            26.543
> AbstractPrimitiveObjectInspector.getPrimitiveCategory()  34            10.494
> LazyBinaryStructObjectInspector.getStructFieldData       25            7.716
> StandardStructObjectInspector.getStructFieldData         4             1.235
> {code}
> The query used :
> {code}
> select
>     ss_sold_date_sk
> from
>     store_sales
> where
>     ss_sold_date between '1998-01-01' and '1998-06-01'
> group by ss_item_sk , ss_customer_sk , ss_sold_date_sk
> having sum(ss_list_price) > 50000000000000;
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
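
For illustration only, the recommendation in the quoted description (resolve each column's type once, then reuse it for every row) could be sketched roughly as below. This is not the HIVE-7664 patch and it does not use Hive's classes: ColumnTypeCacheSketch, ColType, ColumnAssigner, LongAssigner, DoubleAssigner, buildAssigners and addRowToBatch are all hypothetical names, and plain arrays stand in for column vectors. Null handling is left out.

```java
/** Hypothetical sketch; none of these types are Hive classes. */
final class ColumnTypeCacheSketch {

    /** Stand-in for a column's primitive category (assumption, not Hive's enum). */
    enum ColType { LONG, DOUBLE }

    /** Writes one column value into the batch; selected once per column. */
    interface ColumnAssigner {
        void assign(Object value, int rowIndex);
    }

    /** Assigner for integer-like columns, backed by a plain long array. */
    static final class LongAssigner implements ColumnAssigner {
        final long[] vector;
        LongAssigner(int batchSize) { vector = new long[batchSize]; }
        @Override public void assign(Object value, int rowIndex) {
            vector[rowIndex] = ((Number) value).longValue();
        }
    }

    /** Assigner for floating-point columns, backed by a plain double array. */
    static final class DoubleAssigner implements ColumnAssigner {
        final double[] vector;
        DoubleAssigner(int batchSize) { vector = new double[batchSize]; }
        @Override public void assign(Object value, int rowIndex) {
            vector[rowIndex] = ((Number) value).doubleValue();
        }
    }

    /** The type switch runs here, once per column, before any row is added. */
    static ColumnAssigner[] buildAssigners(ColType[] types, int batchSize) {
        ColumnAssigner[] assigners = new ColumnAssigner[types.length];
        for (int c = 0; c < types.length; c++) {
            switch (types[c]) {
                case LONG:   assigners[c] = new LongAssigner(batchSize);   break;
                case DOUBLE: assigners[c] = new DoubleAssigner(batchSize); break;
                default: throw new IllegalArgumentException("Unsupported type: " + types[c]);
            }
        }
        return assigners;
    }

    /** Per-row path: no type lookup and no switch, one call per column. */
    static void addRowToBatch(Object[] row, int rowIndex, ColumnAssigner[] assigners) {
        for (int c = 0; c < assigners.length; c++) {
            assigners[c].assign(row[c], rowIndex);
        }
    }

    public static void main(String[] args) {
        ColumnAssigner[] assigners =
            buildAssigners(new ColType[] { ColType.LONG, ColType.DOUBLE }, 2);
        addRowToBatch(new Object[] { 7L, 1.5 }, 0, assigners);
        addRowToBatch(new Object[] { 9L, 2.5 }, 1, assigners);
        System.out.println(((LongAssigner) assigners[0]).vector[1]);
    }
}
```

The point of the sketch is only that the switch moves out of the row loop; whether the cached type information belongs in StructObjectInspector, as the description suggests, or somewhere else is a separate design question.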