This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new c1888cdf5361 [SPARK-40876][SQL][TESTS][FOLLOWUP] Fix failed test in `ParquetTypeWideningSuite` when `SPARK_ANSI_SQL_MODE` is set to true

c1888cdf5361 is described below

commit c1888cdf53610909af996c7f41ee0cd7ee0691db
Author: yangjie01 <yangji...@baidu.com>
AuthorDate: Mon Dec 25 15:42:13 2023 -0800

[SPARK-40876][SQL][TESTS][FOLLOWUP] Fix failed test in `ParquetTypeWideningSuite` when `SPARK_ANSI_SQL_MODE` is set to true

### What changes were proposed in this pull request?

This PR changes the test inputs in `ParquetTypeWideningSuite` to valid integers in order to fix the tests that fail when `SPARK_ANSI_SQL_MODE` is set to true.

### Why are the changes needed?

Fix the daily test failures that occur when `SPARK_ANSI_SQL_MODE` is set to true:

- https://github.com/apache/spark/actions/runs/7318074558/job/19934321639
- https://github.com/apache/spark/actions/runs/7305312703/job/19908735746
- https://github.com/apache/spark/actions/runs/7311683968/job/19921532402

```
[info] - unsupported parquet conversion IntegerType -> TimestampType *** FAILED *** (68 milliseconds)
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 261.0 failed 1 times, most recent failure: Lost task 1.0 in stage 261.0 (TID 523) (localhost executor driver): org.apache.spark.SparkNumberFormatException: [CAST_INVALID_INPUT] The value '1.23' of the type "STRING" cannot be cast to "INT" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. I [...]
[info] == DataFrame ==
[info] "cast" was called from
[info] org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite.writeParquetFiles(ParquetTypeWideningSuite.scala:113)
[info]
[info]   at org.apache.spark.sql.errors.QueryExecutionErrors$.invalidInputInCastToNumberError(QueryExecutionErrors.scala:145)
[info]   at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.withException(UTF8StringUtils.scala:51)
[info]   at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.toIntExact(UTF8StringUtils.scala:34)
[info]   at org.apache.spark.sql.catalyst.util.UTF8StringUtils.toIntExact(UTF8StringUtils.scala)
[info]   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
[info]   at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
[info]   at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
[info]   at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:388)
[info]   at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:101)
[info]   at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:891)
[info]   at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:891)
[info]   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[info]   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
[info]   at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
[info]   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
[info]   at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
[info]   at org.apache.spark.scheduler.Task.run(Task.scala:141)
[info]   at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:628)
[info]   at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
[info]   at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
[info]   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:96)
[info]   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:631)
[info]   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info]   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info]   at java.base/java.lang.Thread.run(Thread.java:840)
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- Pass GitHub Actions
- Manual check

```
SPARK_ANSI_SQL_MODE=true build/sbt "sql/testOnly org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite"
```

**Before**

```
[info] Run completed in 27 seconds, 432 milliseconds.
[info] Total number of tests run: 34
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 31, failed 3, canceled 0, ignored 0, pending 0
[info] *** 3 TESTS FAILED ***
[error] Failed tests:
[error] 	org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite
```

**After**

```
[info] Run completed in 28 seconds, 880 milliseconds.
[info] Total number of tests run: 31
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 31, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #44481 from LuciferYang/SPARK-40876-FOLLOWUP.
Authored-by: yangjie01 <yangji...@baidu.com>
Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 .../execution/datasources/parquet/ParquetTypeWideningSuite.scala | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTypeWideningSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTypeWideningSuite.scala
index 72580f7078e2..0a8618944241 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTypeWideningSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTypeWideningSuite.scala
@@ -166,9 +166,9 @@ class ParquetTypeWideningSuite
       (Seq("1", "2", Int.MinValue.toString), LongType, IntegerType),
       (Seq("1.23", "10.34"), DoubleType, FloatType),
       (Seq("1.23", "10.34"), FloatType, LongType),
-      (Seq("1.23", "10.34"), LongType, DateType),
-      (Seq("1.23", "10.34"), IntegerType, TimestampType),
-      (Seq("1.23", "10.34"), IntegerType, TimestampNTZType),
+      (Seq("1", "10"), LongType, DateType),
+      (Seq("1", "10"), IntegerType, TimestampType),
+      (Seq("1", "10"), IntegerType, TimestampNTZType),
       (Seq("2020-01-01", "2020-01-02", "1312-02-27"), DateType, TimestampType)
     )
   }

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
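Editor's note on the failure mode the diff addresses: under ANSI mode, casting the string `'1.23'` to `INT` raises `CAST_INVALID_INPUT` instead of silently producing a value, while `try_cast` returns NULL, which is why the test inputs had to become valid integer strings. A minimal plain-Python sketch of that strict-vs-lenient contrast (illustrative only; the function names are hypothetical and this is not Spark's actual implementation):

```python
def ansi_cast_to_int(value: str) -> int:
    """Strict, ANSI-style cast: malformed input raises an error."""
    try:
        return int(value)  # "1.23" is not a valid integer literal
    except ValueError:
        raise ValueError(
            f"[CAST_INVALID_INPUT] The value '{value}' of the type \"STRING\" "
            "cannot be cast to \"INT\" because it is malformed."
        )


def try_cast_to_int(value: str):
    """Lenient, try_cast-style cast: malformed input yields None (NULL)."""
    try:
        return int(value)
    except ValueError:
        return None


# The old test inputs are malformed as INT; the new ones are valid.
assert try_cast_to_int("1.23") is None  # old input: NULL under try_cast
assert ansi_cast_to_int("1") == 1       # new input: valid under strict cast
assert ansi_cast_to_int("10") == 10
```

Under non-ANSI (legacy) behavior such inputs were tolerated, so the fractional strings only started failing once the suite ran with `SPARK_ANSI_SQL_MODE=true`.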