GitHub user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21556#discussion_r202240358
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
    @@ -225,12 +316,44 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith:
       def createFilter(schema: MessageType, predicate: sources.Filter): Option[FilterPredicate] = {
         val nameToType = getFieldMap(schema)
     
    +    def isDecimalMatched(value: Any, decimalMeta: DecimalMetadata): Boolean = value match {
    +      case decimal: JBigDecimal =>
    +        decimal.scale == decimalMeta.getScale
    +      case _ => false
    +    }
    +
    +    // Decimal type must make sure that filter value's scale matched the file.
    +    // If doesn't matched, which would cause data corruption.
    +    // Other types must make sure that filter value's type matched the file.
    --- End diff --
    
    I would word this along the lines of: the Parquet type in the given file must match the type of the value in the pushed filter in order for the filter to be pushed down to Parquet.
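
    To make the rule concrete, here is a minimal sketch of the scale check, not the PR's actual implementation. `DecimalMeta` is a hypothetical stand-in for Parquet's `DecimalMetadata` so that the snippet has no Parquet dependency, and `ScaleCheckSketch` and `pushDownIfMatched` are names made up for illustration.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Hypothetical stand-in for Parquet's DecimalMetadata (assumption, not the real class).
final case class DecimalMeta(precision: Int, scale: Int)

object ScaleCheckSketch {
  // Same shape as isDecimalMatched in the diff: only a java.math.BigDecimal
  // whose scale equals the scale recorded for the file column is a match.
  def isDecimalMatched(value: Any, meta: DecimalMeta): Boolean = value match {
    case decimal: JBigDecimal => decimal.scale() == meta.scale
    case _ => false
  }

  // Hypothetical helper: yields a description of the pushed filter when the
  // value matches the file's type, and None when push-down must be skipped.
  def pushDownIfMatched(value: Any, meta: DecimalMeta): Option[String] =
    if (isDecimalMatched(value, meta)) Some(s"eq($value)") else None
}
```

    The scale matters because `new JBigDecimal("1.0")` has unscaled value 10 while `new JBigDecimal("1.00")` has unscaled value 100. Assuming the file column is compared on unscaled values, a filter value with a different scale would compare against the wrong numbers, so skipping push-down is the safe choice.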


---
