baibaichen opened a new pull request, #11281:
URL: https://github.com/apache/incubator-gluten/pull/11281
## What changes were proposed in this pull request?

This PR fixes compatibility issues in `GlutenParquetIOSuite` for Spark 4.0 by addressing the following Spark 4.0 shim layer changes:

1. **Respect `mapreduce.output.basename` configuration**: Updated `SparkWriteFilesCommitProtocol` to honor the `mapreduce.output.basename` configuration when generating output file names, aligning with [SPARK-49991](https://github.com/apache/spark/pull/48494).
2. **Proper error handling in write operations**: Replaced direct exception throwing with `GlutenFileFormatWriter.throwWriteError`, which uses Spark's standardized error-handling mechanism (`QueryExecutionErrors.taskFailedWhileWritingRowsError`).
3. **Code quality improvements**:
   - Added explicit type annotations to `sparkStageId`, `sparkPartitionId`, and `sparkAttemptNumber` for better type safety
   - Changed the `fileNames` initialization from `null` to the underscore idiom (`_`) for cleaner Scala style
   - Migrated from the deprecated `scala.collection.JavaConverters` to `scala.jdk.CollectionConverters`
   - Simplified `TextScan` instantiation by removing the redundant `new` keyword (applies to Scala 3/case class patterns)
4. **Test coverage**: Re-enabled 3 previously excluded tests in `VeloxTestSettings`:
   - `SPARK-49991: Respect 'mapreduce.output.basename' to generate file names`
   - `SPARK-6330 regression test`
   - `SPARK-7837 Do not close output writer twice when commitTask() fails`

## Why are the changes needed?

The Spark 4.0 upgrade introduced breaking changes in the shim layer:

- The file naming convention now supports a custom basename configured through `mapreduce.output.basename`
- Error-handling APIs were refactored to use centralized error builders
- The previous approach of throwing exceptions directly is incompatible with Spark 4.0's error-handling framework

Without these changes, `GlutenParquetIOSuite` tests fail due to:

1. Incorrect file name generation (missing basename support)
2. Incompatible exception types when write operations fail
3.
Deprecated Scala collection conversion APIs

## How was this patch tested?

- Re-enabled all 3 previously excluded tests and verified that they pass
- Existing `GlutenParquetIOSuite` tests continue to pass
- Validated file naming with custom `mapreduce.output.basename` configurations
- Confirmed error handling produces the correct exception types and messages

## Related Issue

Addresses #11088 (Track on Spark-4.0 failed unit tests)
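The file-naming behavior from SPARK-49991 can be sketched roughly as follows. This is an illustrative snippet, not the PR's actual code: `buildFileName` is a hypothetical helper, and a plain `Map` stands in for the Hadoop `Configuration` that the commit protocol would read in practice.

```scala
// Illustrative sketch (not the PR diff): honor "mapreduce.output.basename"
// when generating a part-file name, falling back to Spark's conventional
// "part" prefix when the key is unset.
def buildFileName(conf: Map[String, String], split: Int, ext: String): String = {
  val basename = conf.getOrElse("mapreduce.output.basename", "part")
  f"$basename-$split%05d$ext"
}

// Default naming is unchanged:
println(buildFileName(Map.empty, 3, ".parquet")) // prints "part-00003.parquet"
// A configured basename is respected:
println(buildFileName(Map("mapreduce.output.basename" -> "custom"), 3, ".parquet")) // prints "custom-00003.parquet"
```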

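The collection-converter migration mentioned under the code quality improvements can be sketched as below (an illustrative example, not code from the PR): only the import changes, while call sites such as `.asScala` and `.asJava` stay the same.

```scala
// The deprecated import:
//   import scala.collection.JavaConverters._
// becomes the supported Scala 2.13+ equivalent:
import scala.jdk.CollectionConverters._

val javaList = new java.util.ArrayList[String]()
javaList.add("a")
javaList.add("b")

// asScala wraps the Java list; toSeq yields an immutable Seq for Scala APIs.
val names: Seq[String] = javaList.asScala.toSeq
println(names.mkString(",")) // prints "a,b"
```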