RushabhK commented on PR #9844:
URL: 
https://github.com/apache/incubator-gluten/pull/9844#issuecomment-2954629022

   @JkSelf I added some logs for better visibility into which files abortTask 
is deleting.
   In every reproducing scenario, abortTask always sees 0 files: 
https://github.com/RushabhK/incubator-gluten/blob/v1.3.0-fixes/backends-velox/src/main/scala/org/apache/spark/sql/execution/SparkWriteFilesCommitProtocol.scala#L104
   Sample log:
   ```
   ERROR SparkWriteFilesCommitProtocol: Filenames info: 0 files, file names: 
   ERROR SparkWriteFilesCommitProtocol: Filenames info: 0 files, file names: 
   ```
   
   The code in the catch block also reports 0 files: 
https://github.com/RushabhK/incubator-gluten/blob/v1.3.0-fixes/backends-velox/src/main/scala/org/apache/spark/sql/execution/VeloxColumnarWriteFilesExec.scala#L244
   Sample log:
   ```
   ERROR VeloxColumnarWriteFilesRDD: Commit failed, aborting task. fileNames 
size: 0Deleting staging files
   ERROR VeloxColumnarWriteFilesRDD: Commit failed, aborting task. fileNames 
size: 0Deleting staging files
   ```
   
   I also added more logs to check the state of fileNames at every point. In 
all of these logs the fileNames size is 1: 
https://github.com/RushabhK/incubator-gluten/blob/v1.3.0-fixes/backends-velox/src/main/scala/org/apache/spark/sql/execution/VeloxColumnarWriteFilesExec.scala#L139
   Sample logs:
   ```
   ERROR VeloxColumnarWriteFilesRDD: Current filenames size: 1, filenames: 
date_key=2025-05-26/hour=00/gluten-part-b37dc941-ec8f-4a26-a189-0b9119014c8b.zstd.parquet
   ERROR VeloxColumnarWriteFilesRDD: Current filenames size: 1, filenames: 
date_key=2025-05-26/hour=00/gluten-part-c924ab72-003b-4ae0-9f90-54695ce851d4.zstd.parquet
   ERROR VeloxColumnarWriteFilesRDD: Current filenames size: 1, filenames: 
date_key=2025-05-26/hour=00/gluten-part-6cac7566-982f-486a-b918-e2c0e2bed5a2.zstd.parquet
   ERROR VeloxColumnarWriteFilesRDD: Current filenames size: 1, filenames: 
date_key=2025-05-26/hour=00/gluten-part-6cac7566-982f-486a-b918-e2c0e2bed5a2.zstd.parquet
   ```
   
   
   The problem is that 0 files are collected when abortTask is called, which 
is why it does not delete any files. This is the issue that needs to be 
addressed.
   @JkSelf @FelixYBW Can you think of any possible reasons for this? Please let 
me know what additional steps or logs I can add to help us troubleshoot this 
further.
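   
   To make the symptom concrete, here is a minimal, self-contained sketch. It 
is not the Gluten code: `AbortSketch`, `writeStagingFile`, and this `abortTask` 
signature are hypothetical stand-ins, and the only assumption is that the abort 
path deletes just the files named in the list it is handed. With an empty list, 
the staging file written earlier stays on disk.

   ```scala
   import java.nio.file.{Files, Path}

   // Hypothetical illustration only; names and signatures are not from Gluten.
   object AbortSketch {

     // Stand-in for the write path: creates one staging file and returns its path.
     def writeStagingFile(stagingDir: Path, name: String): Path = {
       val file = stagingDir.resolve(name)
       Files.write(file, Array[Byte](1, 2, 3))
       file
     }

     // Stand-in for the abort path: deletes only the files named in `fileNames`
     // and returns how many files were actually removed.
     def abortTask(stagingDir: Path, fileNames: Seq[String]): Int = {
       println(s"Filenames info: ${fileNames.size} files, file names: ${fileNames.mkString(", ")}")
       fileNames.count(name => Files.deleteIfExists(stagingDir.resolve(name)))
     }

     def main(args: Array[String]): Unit = {
       val dir = Files.createTempDirectory("staging")
       val name = "gluten-part-example.zstd.parquet"
       val file = writeStagingFile(dir, name)

       // Abort is handed an empty list, as in the logs above: nothing is deleted.
       val deletedWithEmptyList = abortTask(dir, Seq.empty)
       println(s"deleted=$deletedWithEmptyList, stillExists=${Files.exists(file)}")

       // Abort is handed the file name: the staging file is removed.
       val deletedWithName = abortTask(dir, Seq(name))
       println(s"deleted=$deletedWithName, stillExists=${Files.exists(file)}")
     }
   }
   ```

   The sketch says nothing about why the list is empty at abort time; it only 
shows that an empty list at that point is enough to leave the files behind.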
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


