Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r200181749 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -82,6 +120,30 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith: (n: String, v: Any) => FilterApi.eq( intColumn(n), Option(v).map(date => dateToDays(date.asInstanceOf[Date]).asInstanceOf[Integer]).orNull) + + case ParquetSchemaType(DECIMAL, INT32, decimal) if pushDownDecimal => + (n: String, v: Any) => FilterApi.eq( + intColumn(n), + Option(v).map(_.asInstanceOf[JBigDecimal].unscaledValue().intValue() + .asInstanceOf[Integer]).orNull) + case ParquetSchemaType(DECIMAL, INT64, decimal) if pushDownDecimal => + (n: String, v: Any) => FilterApi.eq( + longColumn(n), + Option(v).map(_.asInstanceOf[JBigDecimal].unscaledValue().longValue() + .asInstanceOf[java.lang.Long]).orNull) + // Legacy DecimalType + case ParquetSchemaType(DECIMAL, FIXED_LEN_BYTE_ARRAY, decimal) if pushDownDecimal && --- End diff -- The binary used for the legacy type and for fixed-length storage should be the same, so I don't understand why there are two different conversion methods. Also, because this is using the Parquet schema now, there's no need to base the length of this binary on what older versions of Spark did -- in other words, if the underlying Parquet type is fixed, then just convert the decimal to that size fixed without worrying about legacy types. I think this should pass in the fixed array's length and convert the BigDecimal value to that length array for all cases. That works no matter what the file contains.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org