GitHub user wangyum opened a pull request:

    https://github.com/apache/spark/pull/21547

    [SPARK-24538][SQL] ByteArrayDecimalType support push down to the data sources

    ## What changes were proposed in this pull request?
    
    
    Support pushing down filters on [ByteArrayDecimalType](https://github.com/apache/spark/blob/e28eb431146bcdcaf02a6f6c406ca30920592a6a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L230) columns to the data sources.
    
    ## How was this patch tested?
    Unit tests and manual tests.
    
    **manual tests**:
    ```scala
    // Write 10M rows with decimal columns of several precisions and scales,
    // using 1 MB Parquet row groups.
    spark.range(10000000).selectExpr(
      "id",
      "cast(id as decimal(9)) as d1",
      "cast(id as decimal(9, 2)) as d2",
      "cast(id as decimal(18)) as d3",
      "cast(id as decimal(18, 4)) as d4",
      "cast(id as decimal(38)) as d5",
      "cast(id as decimal(38, 18)) as d6"
    ).coalesce(1).write.option("parquet.block.size", 1048576).parquet("/tmp/spark/parquet/decimal")
    val df = spark.read.parquet("/tmp/spark/parquet/decimal/")
    // Reads only about 1 MB of data
    df.filter("d6 = 10000").show
    // Reads 174.3 MB of data
    df.filter("d3 = 10000").show
    ```
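
    For context, here is a minimal sketch of the kind of conversion a push-down
    on byte-array decimals needs. It is not the code in this patch. It assumes
    the Parquet convention that a high-precision decimal is stored as the
    big-endian two's complement bytes of its unscaled value, sign-extended to a
    fixed length. The names `DecimalBytesSketch`, `minBytesForPrecision` and
    `toFixedLenBytes` are hypothetical and exist only for this illustration.

    ```scala
    import java.math.{BigDecimal => JBigDecimal}

    // Hypothetical helper, for illustration only (not part of Spark or this patch).
    object DecimalBytesSketch {

      // Smallest byte count whose signed range covers every unscaled value
      // of a decimal with the given precision.
      def minBytesForPrecision(precision: Int): Int = {
        val limit = BigInt(10).pow(precision)
        var numBytes = 1
        while (BigInt(2).pow(8 * numBytes - 1) < limit) numBytes += 1
        numBytes
      }

      // Big-endian two's complement bytes of the unscaled value,
      // sign-extended on the left to exactly numBytes bytes.
      def toFixedLenBytes(d: JBigDecimal, numBytes: Int): Array[Byte] = {
        val raw = d.unscaledValue().toByteArray
        require(raw.length <= numBytes, s"value needs more than $numBytes bytes")
        val pad: Byte = if (d.signum() < 0) -1 else 0
        val out = Array.fill[Byte](numBytes)(pad)
        System.arraycopy(raw, 0, out, numBytes - raw.length, raw.length)
        out
      }
    }

    // Usage: the literal in `d6 = 10000` for a decimal(38, 18) column.
    val literal = new JBigDecimal("10000").setScale(18)
    val encoded = DecimalBytesSketch.toFixedLenBytes(
      literal, DecimalBytesSketch.minBytesForPrecision(38))
    ```

    A filter value encoded this way can be compared against the column
    statistics of each row group, which is what would let a reader skip row
    groups that cannot contain a match.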


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-24538

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21547.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21547
    
----
commit 96066701ec75d3caa27994c47eab8ff64150b6a5
Author: Yuming Wang <yumwang@...>
Date:   2018-06-13T01:35:55Z

    ByteArrayDecimalType support push down to the data sources

----