[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

rdblue Wed, 04 Jul 2018 10:50:03 -0700

Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21556#discussion_r200181749
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
    @@ -82,6 +120,30 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith:
           (n: String, v: Any) => FilterApi.eq(
             intColumn(n),
             Option(v).map(date => dateToDays(date.asInstanceOf[Date]).asInstanceOf[Integer]).orNull)
    +
    +    case ParquetSchemaType(DECIMAL, INT32, decimal) if pushDownDecimal =>
    +      (n: String, v: Any) => FilterApi.eq(
    +        intColumn(n),
    +        Option(v).map(_.asInstanceOf[JBigDecimal].unscaledValue().intValue()
    +          .asInstanceOf[Integer]).orNull)
    +    case ParquetSchemaType(DECIMAL, INT64, decimal) if pushDownDecimal =>
    +      (n: String, v: Any) => FilterApi.eq(
    +        longColumn(n),
    +        Option(v).map(_.asInstanceOf[JBigDecimal].unscaledValue().longValue()
    +          .asInstanceOf[java.lang.Long]).orNull)
    +    // Legacy DecimalType
    +    case ParquetSchemaType(DECIMAL, FIXED_LEN_BYTE_ARRAY, decimal) if pushDownDecimal &&
    --- End diff --
    
    The binary used for the legacy type and for fixed-length storage should be 
the same, so I don't understand why there are two different conversion methods. 
Also, because this now uses the Parquet schema, there's no need to base the 
length of this binary on what older versions of Spark did. In other words, if 
the underlying Parquet type is fixed-length, just convert the decimal to a 
fixed-length array of that size without worrying about legacy types.
    
    I think this should pass in the fixed array's length and convert the 
BigDecimal value to an array of that length in all cases. That works no matter 
what the file contains.
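
    A minimal sketch of the conversion I have in mind, assuming the value is stored as the big-endian two's-complement bytes of the unscaled value; the object name `DecimalToFixed`, the method `decimalToFixedBytes`, and its `numBytes` parameter are hypothetical and not part of this PR:

```scala
import java.math.{BigDecimal => JBigDecimal}

object DecimalToFixed {
  // Hypothetical helper (not from the PR): returns the unscaled value of
  // `decimal` as big-endian two's-complement bytes, sign-extended to exactly
  // `numBytes`, where `numBytes` is the length declared by the Parquet
  // FIXED_LEN_BYTE_ARRAY column.
  def decimalToFixedBytes(decimal: JBigDecimal, numBytes: Int): Array[Byte] = {
    // BigInteger.toByteArray yields the minimal big-endian two's-complement form.
    val unscaled = decimal.unscaledValue().toByteArray
    require(unscaled.length <= numBytes,
      s"unscaled value needs ${unscaled.length} bytes, column holds $numBytes")

    if (unscaled.length == numBytes) {
      unscaled
    } else {
      // Pad on the left with the sign byte so the numeric value is preserved.
      val padByte: Byte = if (unscaled(0) < 0) 0xFF.toByte else 0x00.toByte
      val out = Array.fill[Byte](numBytes)(padByte)
      System.arraycopy(unscaled, 0, out, numBytes - unscaled.length, unscaled.length)
      out
    }
  }
}
```

    The result could then be wrapped in a Parquet `Binary` and compared against a binary column, with the length taken from the file's schema rather than from the decimal precision.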


