This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 146f1873421  [SPARK-40212][SQL] SparkSQL castPartValue does not properly handle byte, short, or float

146f1873421 is described below

commit 146f187342140635b83bfe775b6c327755edfbe1
Author: Brennan Stein <brennan.st...@ekata.com>
AuthorDate: Mon Aug 29 10:55:30 2022 +0900

    [SPARK-40212][SQL] SparkSQL castPartValue does not properly handle byte, short, or float

    ### What changes were proposed in this pull request?

    The `castPartValueToDesiredType` function now returns a byte for ByteType and a short for ShortType rather than an int, and a float for FloatType rather than a double.

    ### Why are the changes needed?

    Previously, attempting to read back a file partitioned on one of these column types resulted in a ClassCastException at runtime (for Byte: `java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Byte`). I can't see this as anything but a bug, since returning the correct data type prevents the crash.

    ### Does this PR introduce _any_ user-facing change?

    Yes: it changes the observed behavior when reading back a byte/short/float-partitioned file.

    ### How was this patch tested?

    Added a unit test. Without the `castPartValueToDesiredType` updates, the test fails with the exception above.

    ===

    I'll note that I'm not familiar enough with the Spark repo to know whether this will have ripple effects elsewhere, but tests pass on my fork, and since the very similar https://github.com/apache/spark/pull/36344/files only needed to touch these two files, I expect this change is self-contained as well.

    Closes #37659 from BrennanStein/spark40212.

    Authored-by: Brennan Stein <brennan.st...@ekata.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 .../sql/execution/datasources/PartitioningUtils.scala  |  7 +++++--
 .../parquet/ParquetPartitionDiscoverySuite.scala       | 17 +++++++++++++++++
 2 files changed, 22 insertions(+), 2 deletions(-)
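To make the failure mode concrete before the diff, here is a minimal repro sketch. It is not part of the patch: it assumes a spark-shell session (so `spark` and `spark.implicits._` are in scope) and a writable temp directory.

    // Partition a dataset on a Byte column, then read it back with an
    // explicit schema. Before this fix, materializing the result crashed
    // with: java.lang.ClassCastException: java.lang.Integer cannot be
    // cast to java.lang.Byte
    import spark.implicits._
    val path = java.nio.file.Files.createTempDirectory("spark40212").toString
    Seq((1, 2.toByte)).toDF("a", "b")
      .write.mode("overwrite").partitionBy("b").parquet(path)
    spark.read.schema("a INT, b BYTE").parquet(path).show()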
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
index 3cc69656bb7..d4989606927 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
@@ -530,9 +530,12 @@ object PartitioningUtils extends SQLConfHelper {
     case _ if value == DEFAULT_PARTITION_NAME => null
     case NullType => null
     case StringType => UTF8String.fromString(unescapePathName(value))
-    case ByteType | ShortType | IntegerType => Integer.parseInt(value)
+    case ByteType => Integer.parseInt(value).toByte
+    case ShortType => Integer.parseInt(value).toShort
+    case IntegerType => Integer.parseInt(value)
     case LongType => JLong.parseLong(value)
-    case FloatType | DoubleType => JDouble.parseDouble(value)
+    case FloatType => JDouble.parseDouble(value).toFloat
+    case DoubleType => JDouble.parseDouble(value)
     case _: DecimalType => Literal(new JBigDecimal(value)).value
     case DateType =>
       Cast(Literal(value), DateType, Some(zoneId.getId)).eval()
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
index fb5595322f7..6151e1d7cb1 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
@@ -1095,6 +1095,23 @@ abstract class ParquetPartitionDiscoverySuite
       checkAnswer(readback, Row(0, "AA") :: Row(1, "-0") :: Nil)
     }
   }
+
+  test("SPARK-40212: SparkSQL castPartValue does not properly handle byte, short, float") {
+    withTempDir { dir =>
+      val data = Seq[(Int, Byte, Short, Float)](
+        (1, 2, 3, 4.0f)
+      )
+      data.toDF("a", "b", "c", "d")
+        .write
+        .mode("overwrite")
+        .partitionBy("b", "c", "d")
+        .parquet(dir.getCanonicalPath)
+      val res = spark.read
+        .schema("a INT, b BYTE, c SHORT, d FLOAT")
+        .parquet(dir.getCanonicalPath)
+      checkAnswer(res, Seq(Row(1, 2, 3, 4.0f)))
+    }
+  }
 }
 
 class ParquetV1PartitionDiscoverySuite extends ParquetPartitionDiscoverySuite {
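A closing observation on the root cause: judging by the exception quoted above, this is ordinary JVM boxing, reproducible in plain Scala with no Spark involved. A standalone sketch (illustration only, not from the patch; the same reasoning applies to JDouble.parseDouble boxing to java.lang.Double where FloatType needs a java.lang.Float):

    // Integer.parseInt returns an Int, which boxes to java.lang.Integer
    // when treated as Any; per the exception above, a ByteType partition
    // value must instead box to java.lang.Byte.
    val before: Any = Integer.parseInt("2")          // boxes to java.lang.Integer
    val after: Any  = Integer.parseInt("2").toByte   // boxes to java.lang.Byte
    after.asInstanceOf[java.lang.Byte]               // fine
    before.asInstanceOf[java.lang.Byte]              // java.lang.ClassCastException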