[jira] [Commented] (HIVE-11521) Loop optimization for SIMD in logical operators

2015-08-11 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681883#comment-14681883
 ] 

Teddy Choi commented on HIVE-11521:
---

Sorry. The benchmark result above shows the difference between running with 
and without the AVX=0 option. The results below show the difference before 
and after applying this patch: a 260% ~ 300% performance improvement in the 
repeating cases.

Before this patch:
{noformat}
Benchmark                                                   Mode  Samples          Score   Error  Units
o.a.h.b.v.VectorizationBench.ColAndColBench.bench           avgt        2  169346433.000 ±   NaN  ns/op
o.a.h.b.v.VectorizationBench.ColAndRepeatingColBench.bench  avgt        2  503688769.000 ±   NaN  ns/op
o.a.h.b.v.VectorizationBench.ColOrColBench.bench            avgt        2  184679292.500 ±   NaN  ns/op
o.a.h.b.v.VectorizationBench.ColOrRepeatingColBench.bench   avgt        2  522471397.500 ±   NaN  ns/op
...
o.a.h.b.v.VectorizationBench.NotColBench.bench              avgt        2  154808036.000 ±   NaN  ns/op
o.a.h.b.v.VectorizationBench.RepeatingColAndColBench.bench  avgt        2  478369669.500 ±   NaN  ns/op
o.a.h.b.v.VectorizationBench.RepeatingColOrColBench.bench   avgt        2  514816574.000 ±   NaN  ns/op
{noformat}

After this patch:
{noformat}
Benchmark                                                   Mode  Samples          Score   Error  Units
o.a.h.b.v.VectorizationBench.ColAndColBench.bench           avgt        2  171249531.500 ±   NaN  ns/op
o.a.h.b.v.VectorizationBench.ColAndRepeatingColBench.bench  avgt        2  130488848.500 ±   NaN  ns/op
o.a.h.b.v.VectorizationBench.ColOrColBench.bench            avgt        2  168669206.500 ±   NaN  ns/op
o.a.h.b.v.VectorizationBench.ColOrRepeatingColBench.bench   avgt        2  128768271.500 ±   NaN  ns/op
...
o.a.h.b.v.VectorizationBench.NotColBench.bench              avgt        2  149107679.000 ±   NaN  ns/op
o.a.h.b.v.VectorizationBench.RepeatingColAndColBench.bench  avgt        2  138933203.000 ±   NaN  ns/op
o.a.h.b.v.VectorizationBench.RepeatingColOrColBench.bench   avgt        2  140216834.500 ±   NaN  ns/op
{noformat}
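
As a cross-check of the two tables, the before and after scores of the four 
repeating cases can be divided directly. The sketch below only redoes that 
arithmetic; the class and method names are illustrative and are not part of 
the patch or the benchmark suite.

```java
final class SpeedupCheck {
    // Ratio of the "before" average time to the "after" average time.
    static double speedup(double beforeNsPerOp, double afterNsPerOp) {
        return beforeNsPerOp / afterNsPerOp;
    }

    public static void main(String[] args) {
        // Scores copied from the two tables above (ns/op).
        String[] names = {
            "ColAndRepeatingCol", "ColOrRepeatingCol",
            "RepeatingColAndCol", "RepeatingColOrCol"
        };
        double[] before = {503688769.000, 522471397.500, 478369669.500, 514816574.000};
        double[] after  = {130488848.500, 128768271.500, 138933203.000, 140216834.500};

        for (int i = 0; i < names.length; i++) {
            double s = speedup(before[i], after[i]);
            // (s - 1) * 100 expresses the same ratio as a percentage improvement.
            System.out.printf("%s: %.2fx faster (%.0f%% improvement)%n",
                    names[i], s, (s - 1.0) * 100.0);
        }
    }
}
```

The ratios come out at roughly 3.4x to 4.1x for the repeating cases, while 
the non-repeating cases (ColAndCol, ColOrCol, NotCol) stay within about 10% 
of their earlier scores.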

> Loop optimization for SIMD in logical operators
> ---
>
> Key: HIVE-11521
> URL: https://issues.apache.org/jira/browse/HIVE-11521
> Project: Hive
> Issue Type: Sub-task
> Reporter: Teddy Choi
> Assignee: Teddy Choi
> Priority: Minor
> Attachments: HIVE-11521.patch
>
>
> The JVM is quite strict about the code patterns that it can execute with 
> SIMD instructions. Take a loop in ColOrCol.java, for example:
> {code}
> for (int i = 0; i != n; i++) {
>   outputVector[i] = vector1[0] | vector2[i];
> }
> {code}
> The "vector1\[0\]" reference prevents the JVM from executing this part of 
> the code with vectorized instructions. We need to assign "vector1\[0\]" to 
> a variable outside of the loop and use that variable inside the loop.
> This issue covers the AND, OR, and NOT logical operators.
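
The change described in the issue can be sketched as follows. This is a 
minimal illustration, not the code from HIVE-11521.patch: the class name, 
method names, and the use of long arrays are assumptions made for the 
example. A plausible reason the JIT cannot do this hoisting itself is that it 
cannot prove outputVector and vector1 are different arrays.

```java
final class HoistedOr {
    // Shape from the issue description: vector1[0] is re-read on every iteration.
    static void orRepeatingNaive(long[] vector1, long[] vector2,
                                 long[] outputVector, int n) {
        for (int i = 0; i != n; i++) {
            outputVector[i] = vector1[0] | vector2[i];
        }
    }

    // Hoisted shape: the repeating value is read once into a local variable,
    // so the loop body only touches vector2[i] and outputVector[i].
    static void orRepeatingHoisted(long[] vector1, long[] vector2,
                                   long[] outputVector, int n) {
        final long repeated = vector1[0];
        for (int i = 0; i != n; i++) {
            outputVector[i] = repeated | vector2[i];
        }
    }

    public static void main(String[] args) {
        long[] v1 = {4};
        long[] v2 = {0, 1, 2, 3};
        long[] a = new long[v2.length];
        long[] b = new long[v2.length];
        orRepeatingNaive(v1, v2, a, v2.length);
        orRepeatingHoisted(v1, v2, b, v2.length);
        // Both variants produce the same output when the arrays do not alias.
        System.out.println(java.util.Arrays.equals(a, b));
    }
}
```

The two variants agree only when outputVector is not the same array as 
vector1; if they alias, the naive loop sees its own write to index 0 and the 
hoisted loop does not.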



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11521) Loop optimization for SIMD in logical operators

2015-08-11 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681969#comment-14681969
 ] 

Ferdinand Xu commented on HIVE-11521:
-

The result looks awesome. Do you have any data about the performance 
improvement when using HQL to retrieve a data set?



[jira] [Commented] (HIVE-11521) Loop optimization for SIMD in logical operators

2015-08-11 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682023#comment-14682023
 ] 

Teddy Choi commented on HIVE-11521:
---

[~Ferd] I don't have HQL performance data for it yet, but I can try in the 
near future.

Here's my rough expectation. There are many other steps in executing a Hive 
query, including query planning, disk access, and network traffic, so Hive 
latency may not benefit this much from the SIMD optimization. Meanwhile, Hive 
CPU load may be reduced to about 1/4 of previous versions. That means you can 
do more analytics on the same machines.
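
The latency point can be illustrated with Amdahl's law: if only a fraction of 
the query time is spent in the optimized loops, the overall speedup is 
bounded by the rest. The sketch below is illustrative only. The fractions are 
hypothetical, not measured from any Hive query, and the 4x local speedup is 
just the rough size of the micro-benchmark gain.

```java
final class AmdahlSketch {
    // Amdahl's law: overall speedup when a fraction of the run time
    // is accelerated by localSpeedup and the rest is unchanged.
    static double overallSpeedup(double fraction, double localSpeedup) {
        return 1.0 / ((1.0 - fraction) + fraction / localSpeedup);
    }

    public static void main(String[] args) {
        // Hypothetical shares of query time spent in the optimized operators.
        double[] fractions = {0.05, 0.25, 0.50};
        double localSpeedup = 4.0; // assumed, roughly the micro-benchmark gain
        for (double f : fractions) {
            System.out.printf("fraction %.2f -> overall speedup %.2fx%n",
                    f, overallSpeedup(f, localSpeedup));
        }
    }
}
```

Even with half of the time in the optimized loops, the overall speedup in 
this model is 1.6x, well below the 4x seen in isolation.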

When I try it, I will share the results with you. Thank you. :)



[jira] [Commented] (HIVE-11521) Loop optimization for SIMD in logical operators

2015-08-11 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682057#comment-14682057
 ] 

Ferdinand Xu commented on HIVE-11521:
-

Thank you for your reply. The reason I raised the question is that we didn't 
find much improvement in the "real" case even though we got a better score in 
the micro benchmark.  :)



[jira] [Commented] (HIVE-11521) Loop optimization for SIMD in logical operators

2015-08-11 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682074#comment-14682074
 ] 

Teddy Choi commented on HIVE-11521:
---

[~Ferd] I see. I will run some benchmark tests at GB~TB scale some day to see 
the real performance improvements. Thanks.



[jira] [Commented] (HIVE-11521) Loop optimization for SIMD in logical operators

2015-08-11 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682246#comment-14682246
 ] 

Teddy Choi commented on HIVE-11521:
---

[~Ferd] Please check your configuration and distribution. When I use Hive on 
Tez, its execution plan for TPC-H Q2 shows twelve vectorizations. In 
contrast, Hive on MapReduce shows only two vectorizations in its execution 
plan, just for local work. Some Hadoop distributions (such as Hortonworks 
Data Platform) ship with Apache Tez, while others don't.

{code}
set hive.execution.engine=tez;
set hive.vectorized.execution.enabled=true;
{code}



[jira] [Commented] (HIVE-11521) Loop optimization for SIMD in logical operators

2015-08-11 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682308#comment-14682308
 ] 

Gopal V commented on HIVE-11521:


Repeating data is a click-stream quirk - the TPC benchmarks rarely show up with 
long sequences of de-normalized identical data.



[jira] [Commented] (HIVE-11521) Loop optimization for SIMD in logical operators

2015-08-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692247#comment-14692247
 ] 

Hive QA commented on HIVE-11521:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749842/HIVE-11521.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9348 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testTempTable
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4924/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4924/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4924/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749842 - PreCommit-HIVE-TRUNK-Build



[jira] [Commented] (HIVE-11521) Loop optimization for SIMD in logical operators

2015-08-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708756#comment-14708756
 ] 

Ashutosh Chauhan commented on HIVE-11521:
-

+1
