GitHub user wangyum opened a pull request:

    https://github.com/apache/spark/pull/23004

    [SPARK-26004][SQL] InMemoryTable support StartsWith predicate push down

    ## What changes were proposed in this pull request?
    
    [SPARK-24638](https://issues.apache.org/jira/browse/SPARK-24638) added support for `StartsWith` predicate push down for Parquet files.
    `InMemoryTable` can support this feature as well.
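
To illustrate the idea, here is a minimal sketch of how per-batch min/max statistics could be used to skip cached batches for a `StartsWith` filter. It is not the code in this patch: `BatchStats`, `may_contain_prefix`, and `batches_to_scan` are hypothetical names, and Python's code-point string ordering stands in for Spark's byte-wise string comparison. The check relies on the fact that truncating two strings to the same length preserves their lexicographic order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BatchStats:
    """Min/max of a string column within one cached batch (hypothetical type)."""
    lower: str
    upper: str


def may_contain_prefix(stats: BatchStats, prefix: str) -> bool:
    """Return False only when no value in [lower, upper] can start with prefix."""
    n = len(prefix)
    # If lower <= v <= upper and v starts with prefix, then truncating all
    # three to n characters gives lower[:n] <= prefix <= upper[:n].
    # A batch that fails this check can be skipped safely.
    return stats.lower[:n] <= prefix <= stats.upper[:n]


def batches_to_scan(batches: List[BatchStats], prefix: str) -> List[int]:
    """Indices of batches that cannot be skipped for `value LIKE 'prefix%'`."""
    return [i for i, b in enumerate(batches) if may_contain_prefix(b, prefix)]


# Three batches with made-up bounds, purely for illustration.
batches = [
    BatchStats("0", "0999"),
    BatchStats("1", "1999"),
    BatchStats("2", "2999"),
]
print(batches_to_scan(batches, "10"))
```

The check is conservative: it may keep a batch that holds no matching rows, but it never drops one that does, so the filter still has to be evaluated on the rows of the remaining batches.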
    
    
    ## How was this patch tested?
    
    Unit tests and benchmark tests.
    
    Benchmark test results:
    ```
    ================================================================================================
    Pushdown benchmark for StringStartsWith
    ================================================================================================
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6
    Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
    StringStartsWith filter: (value like '10%'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    InMemoryTable Vectorized                    12068 / 14198          1.3         767.3       1.0X
    InMemoryTable Vectorized (Pushdown)           5457 / 8662          2.9         347.0       2.2X
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6
    Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
    StringStartsWith filter: (value like '1000%'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    InMemoryTable Vectorized                      5246 / 5355          3.0         333.5       1.0X
    InMemoryTable Vectorized (Pushdown)           2185 / 2346          7.2         138.9       2.4X
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.12.6
    Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
    StringStartsWith filter: (value like '786432%'): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    InMemoryTable Vectorized                      5112 / 5312          3.1         325.0       1.0X
    InMemoryTable Vectorized (Pushdown)           2292 / 2522          6.9         145.7       2.2X
    ```
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-26004

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23004.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23004
    
----
commit 7bbdb0713056f387e49cf3921a226554e9af5557
Author: Yuming Wang <yumwang@...>
Date:   2018-11-11T03:56:36Z

    InMemoryTable support StartsWith predicate push down

----


