RushabhK opened a new pull request, #9993:
URL: https://github.com/apache/incubator-gluten/pull/9993

   ## What changes were proposed in this pull request?
   
   This PR fixes the failed-task file write issue with the Manifest committer: https://github.com/apache/incubator-gluten/issues/9801
   Before this change, when creating the new task attempt temporary path for the Manifest committer, the file path creation defaulted to the base write path (the `.spark-staging` directory).
   Sample log: `1749656453822   25/06/11 15:40:53 [Executor task launch worker for task 206.0 in stage 2.0 (TID 1801)] ERROR VeloxColumnarWriteFilesRDD: Velox staging write path: gs://<some_path>/.spark-staging-a0487498-59b2-4317-a70f-b72f303e3bfb`
   This led to all the base files being copied to the target location as part of the commit task. After the fix, the staging write path correctly points to a task-attempt temporary path inside the staging directory.
   Sample log: `Velox staging write path: gs://<some_path>/.spark-staging-c5287b54-5545-47b5-908a-584d09787d71/_temporary/f12b689f-8508-4422-b7cd-aa79864e6428/00/tasks/attempt_202506131552457352180433024741873_0001_m_000147_148`
   
   (Fixes: GLUTEN-9801)
   As part of this fix, I upgrade the Hadoop client version from `3.3.4` to `3.3.6`, which includes `ManifestCommitter` support. I then handle the `ManifestCommitter` case in the new task attempt temporary path creation to obtain its work path, similar to what is already done for the `FileOutputCommitter`.
   
   
   ## How was this patch tested?
   1. I took the Gluten build with these changes and built a new Spark image.
   2. I have a Spark job that writes Parquet with 300 tasks, configured with 8 cores per executor.
   3. While it is writing from the 300 tasks, I kill 5 of the executors (40 failed tasks); the job retries the tasks and then finishes.
   4. I then read the Parquet files back and run a `df.count()` on them to force materialization. With this fix, I no longer hit the invalid Parquet exception while reading, and the data matches vanilla Spark's output exactly. I have repeated this across multiple runs to validate the fix.
   5. I also added logs for the Velox staging write path: https://github.com/RushabhK/incubator-gluten/blob/v1.3.0-fixes/backends-velox/src/main/scala/org/apache/spark/sql/execution/VeloxColumnarWriteFilesExec.scala#L213
   This write path is now fixed from the earlier `.spark-staging` base directory to a temporary path inside the `.spark-staging` directory, as shown in the sample logs above.
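
   As a side note on step 4, the kind of corruption being checked for can also be spotted cheaply without a full read: a well-formed Parquet file both starts and ends with the 4-byte magic `PAR1`. The helper below is a hypothetical sanity check of my own for illustration, not part of this change:

   ```scala
   import java.nio.file.{Files, Path}

   object ParquetFooterCheck {
     // A Parquet file begins and ends with the 4-byte ASCII magic "PAR1".
     // Files truncated by failed task attempts typically lack the trailing
     // magic, which is what triggers the invalid-Parquet read errors.
     private val Magic: Array[Byte] = "PAR1".getBytes("US-ASCII")

     def looksLikeParquet(path: Path): Boolean = {
       val bytes = Files.readAllBytes(path)
       bytes.length >= 8 &&
         bytes.take(4).sameElements(Magic) &&
         bytes.takeRight(4).sameElements(Magic)
     }
   }
   ```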


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

