RushabhK opened a new pull request, #9993: URL: https://github.com/apache/incubator-gluten/pull/9993
## What changes were proposed in this pull request?

This PR fixes the failed-task file-write issue with the Manifest committer: https://github.com/apache/incubator-gluten/issues/9801

Before this change, when creating the new task attempt temporary path for the Manifest committer, the file path creation defaulted to the base write path (the `.spark-staging` directory). Sample log:

`1749656453822 25/06/11 15:40:53 [Executor task launch worker for task 206.0 in stage 2.0 (TID 1801)] ERROR VeloxColumnarWriteFilesRDD: Velox staging write path: gs://<some_path>/.spark-staging-a0487498-59b2-4317-a70f-b72f303e3bfb`

This led to all the base files being copied to the target location as part of the commit task.

(Fixes: #GLUTEN-9801)

With this fix, I upgrade the Hadoop client version from `3.3.4` to `3.3.6`, which has `ManifestCommitter` support. I then handle the `ManifestCommitter` case in the new task attempt temporary path creation to get the work path, similar to the `FileOutputCommitter`. Sample log after the fix:

`Velox staging write path: gs://<some_path>/.spark-staging-c5287b54-5545-47b5-908a-584d09787d71/_temporary/f12b689f-8508-4422-b7cd-aa79864e6428/00/tasks/attempt_202506131552457352180433024741873_0001_m_000147_148`

## How was this patch tested?

1. I took the Gluten build with these changes and built a new Spark image from it.
2. I have a Spark job that writes Parquet with 300 tasks, configured with 8 cores per executor.
3. While it is writing from the 300 tasks, I kill 5 of the executors (40 failed tasks); the job retries the tasks and then finishes.
4. I then read the Parquet files back and run a `df.count()` to materialize the read. With this fix, I no longer hit the invalid Parquet exception while reading the files, and my data matches Vanilla Spark's run exactly. I have tested this over multiple runs to validate the fix.
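The work-path dispatch described above can be sketched as follows. This is a minimal, self-contained Java sketch, not the actual Gluten change: `WritePathResolver`, `resolveWritePath`, and the stub committer classes are hypothetical stand-ins for the real Hadoop `FileOutputCommitter`/`ManifestCommitter` classes, which both expose a per-task work path under the job's temporary directory.

```java
// Hypothetical sketch of the work-path resolution; the real fix dispatches
// on the actual Hadoop committer classes, stubbed out here so the example
// is self-contained.

interface OutputCommitter {}

// Stand-in for Hadoop's FileOutputCommitter.
class FileOutputCommitter implements OutputCommitter {
    private final String workPath;
    FileOutputCommitter(String workPath) { this.workPath = workPath; }
    String getWorkPath() { return workPath; }
}

// Stand-in for Hadoop's ManifestCommitter (available from hadoop 3.3.6).
class ManifestCommitter implements OutputCommitter {
    private final String workPath;
    ManifestCommitter(String workPath) { this.workPath = workPath; }
    String getWorkPath() { return workPath; }
}

public class WritePathResolver {
    /**
     * Resolve the staging write path for a task attempt. Before the fix,
     * only FileOutputCommitter was handled, so ManifestCommitter jobs fell
     * through to the base .spark-staging directory; with the fix, both
     * committers yield their task-attempt work path.
     */
    static String resolveWritePath(OutputCommitter committer, String basePath) {
        if (committer instanceof FileOutputCommitter) {
            return ((FileOutputCommitter) committer).getWorkPath();
        }
        if (committer instanceof ManifestCommitter) {
            return ((ManifestCommitter) committer).getWorkPath();
        }
        // Fallback: the pre-fix behavior for any unhandled committer.
        return basePath;
    }
}
```

With a `ManifestCommitter`, the resolved path is now the task-attempt temporary path under `_temporary/.../tasks/attempt_...` instead of the base staging directory, so a failed task's leftover files are no longer swept into the commit.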
I also added logs for the Velox staging write path: https://github.com/RushabhK/incubator-gluten/blob/v1.3.0-fixes/backends-velox/src/main/scala/org/apache/spark/sql/execution/VeloxColumnarWriteFilesExec.scala#L213 The write path has now changed from the earlier base `.spark-staging` directory to a temporary path inside the `.spark-staging` directory, as shown in the sample logs above.
