[ 
https://issues.apache.org/jira/browse/SPARK-21314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-21314:
---------------------------------
    Labels: bulk-closed performance  (was: performance)

> ByteArrayMethods.arrayEquals could use some optimizations
> ---------------------------------------------------------
>
>                 Key: SPARK-21314
>                 URL: https://issues.apache.org/jira/browse/SPARK-21314
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Sumedh Wale
>            Priority: Minor
>              Labels: bulk-closed, performance
>
> ByteArrayMethods.arrayEquals is invoked frequently in queries, especially for 
> UTF8String comparisons. It shows up as a major contributor in many kinds of 
> queries involving string values, such as simple filters, so improving it will 
> help a wide range of queries.
> The current implementation:
> {code}
>     int i = 0;
>     while (i <= length - 8) {
>       if (Platform.getLong(leftBase, leftOffset + i) !=
>           Platform.getLong(rightBase, rightOffset + i)) {
>         return false;
>       }
>       i += 8;
>     }
>     while (i < length) {
>       if (Platform.getByte(leftBase, leftOffset + i) !=
>           Platform.getByte(rightBase, rightOffset + i)) {
>         return false;
>       }
>       i += 1;
>     }
> {code}
> can be optimized in two ways:
> a) use a getInt comparison for the remaining bytes when possible, which is 
> much faster than four single-byte comparisons
> b) advance the left and right offsets directly instead of adding "i" on every 
> loop iteration
> The above changes give numbers like the ones below for 15-byte strings:
> {noformat}
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11 on Linux 4.4.0-21-generic
> Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
> compare arrayEquals:                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> arrayEquals                                   1230 / 1255         81.3          12.3       1.0X
> arrayEquals2                                   830 /  846        120.4           8.3       1.5X
> {noformat}
> The gains vary from 1.2X to 1.6X for different sizes.
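
For illustration, the two proposed changes (a) and (b) could look roughly like the sketch below. This is not the reporter's arrayEquals2, whose source is not included in the issue. The class name ArrayEqualsSketch is hypothetical, and java.nio.ByteBuffer absolute reads stand in for Spark's Platform.getLong/getInt/getByte so that the example runs without Spark on the classpath.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; not Spark's ByteArrayMethods and not the reporter's code.
final class ArrayEqualsSketch {

    // Compares `length` bytes of `left` (from leftOffset) with `right` (from rightOffset).
    // ByteBuffer reads are an assumed stand-in for Platform.getLong/getInt.
    static boolean arrayEquals2(byte[] left, int leftOffset,
                                byte[] right, int rightOffset, int length) {
        ByteBuffer l = ByteBuffer.wrap(left);
        ByteBuffer r = ByteBuffer.wrap(right);

        // (b) advance the two offsets directly instead of adding a counter "i"
        int lo = leftOffset;
        int ro = rightOffset;
        final int end = leftOffset + length;

        // Compare 8 bytes at a time while at least 8 bytes remain.
        while (lo <= end - 8) {
            if (l.getLong(lo) != r.getLong(ro)) {
                return false;
            }
            lo += 8;
            ro += 8;
        }

        // (a) if at least 4 bytes remain, compare them with one int read
        if (lo <= end - 4) {
            if (l.getInt(lo) != r.getInt(ro)) {
                return false;
            }
            lo += 4;
            ro += 4;
        }

        // At most 3 bytes are left; compare them one by one.
        while (lo < end) {
            if (left[lo] != right[ro]) {
                return false;
            }
            lo += 1;
            ro += 1;
        }
        return true;
    }
}
```

For a 15-byte input this performs one long comparison, one int comparison and three byte comparisons, instead of one long comparison followed by seven byte comparisons in the current loop. Whether that yields the gains reported above depends on the JVM and hardware, and this sketch has not been benchmarked.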


