Hello all,

FileCommitProtocol<https://github.com/apache/spark/blob/6bbfb45ffe75aa6c27a7bf3c3385a596637d1822/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala>
 is the class that commits Spark job output (staging file and directory 
renaming, etc.). During Spark 3.2 development, we added new functions to this 
class to allow more flexible output file naming (the PR details are 
here<https://github.com/apache/spark/pull/33012>). We didn’t delete the 
existing file naming functions (newTaskTempFile(ext) & 
newTaskTempFileAbsPath(ext)), because we were aware that many downstream 
projects and codebases had already written their own custom implementations of 
FileCommitProtocol. Deleting the existing functions would be a breaking change 
for them when upgrading Spark, and we would like to avoid this unpleasant 
surprise for anyone if possible. But we also need to clean up legacy code as 
we evolve our codebase. The newly added functions supersede the legacy ones, 
and the cost of migrating should be fairly minimal.

So as the next step, I would like to propose:

  *   Spark 3.3 (now): Add the @deprecated annotation to the legacy functions in 
FileCommitProtocol - 
newTaskTempFile(ext)<https://github.com/apache/spark/blob/6bbfb45ffe75aa6c27a7bf3c3385a596637d1822/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala#L98>
 & 
newTaskTempFileAbsPath(ext)<https://github.com/apache/spark/blob/6bbfb45ffe75aa6c27a7bf3c3385a596637d1822/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala#L135>.
 Developers depending on the legacy functions would then get a deprecation 
warning and can take action to move to the new functions.
  *   Next Spark major release (or whenever people feel comfortable): delete 
the legacy functions mentioned above from our codebase.
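
To make the proposal concrete, here is a minimal Scala sketch of the pattern. 
It is a simplified stand-in, not Spark's actual FileCommitProtocol: the task 
context parameter is left out, and the names SketchCommitProtocol, NameSpec, 
LegacyProtocol and MigratedProtocol are hypothetical and only for illustration. 
The "3.3.0" version string is likewise an assumption based on the timeline 
above.

```scala
// Hypothetical analogue of the file naming spec introduced with the new functions.
final case class NameSpec(prefix: String, suffix: String)

// Simplified stand-in for the commit protocol (task context omitted).
abstract class SketchCommitProtocol {

  // Legacy function: kept for now, but flagged so implementors get a compiler warning.
  @deprecated("use newTaskTempFile(dir, spec)", "3.3.0")
  def newTaskTempFile(dir: Option[String], ext: String): String

  // New function: by default it falls back to the legacy one when no prefix is
  // requested, so existing custom implementations keep working unchanged.
  def newTaskTempFile(dir: Option[String], spec: NameSpec): String =
    if (spec.prefix.isEmpty) newTaskTempFile(dir, spec.suffix)
    else throw new UnsupportedOperationException(
      "this protocol does not support a file name prefix: " + spec.prefix)
}

// A downstream implementation that has not migrated yet: it only knows the legacy function.
class LegacyProtocol extends SketchCommitProtocol {
  override def newTaskTempFile(dir: Option[String], ext: String): String =
    dir.fold("")(_ + "/") + "part-00000" + ext
}

// A migrated implementation: the new function does the work, and the legacy
// one only delegates until it is deleted.
class MigratedProtocol extends SketchCommitProtocol {
  @deprecated("use newTaskTempFile(dir, spec)", "3.3.0")
  override def newTaskTempFile(dir: Option[String], ext: String): String =
    newTaskTempFile(dir, NameSpec("", ext))

  override def newTaskTempFile(dir: Option[String], spec: NameSpec): String =
    dir.fold("")(_ + "/") + spec.prefix + "part-00000" + spec.suffix
}
```

With this shape, an unmigrated implementation still compiles (with a 
deprecation warning) and still serves callers of the new function as long as 
they do not ask for a prefix. Once the legacy function is deleted in a later 
release, only the legacy override needs to be removed from a migrated 
implementation.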

The PR to add the @deprecated annotation is ready for review at 
https://github.com/apache/spark/pull/35311 . Feel free to comment here or on 
the PR for further discussion.

Thanks,
Cheng Su (@c21)
