LuciferYang edited a comment on pull request #33556:
URL: https://github.com/apache/spark/pull/33556#issuecomment-889606046


   > The assumption being made here is that if no objects are written, then 
nothing was written to the file.
   This is not a general assumption to make (there could be some header written 
for example - not just right now, but in future as we add support for other 
serializers/codecs/etc).
   
   
https://github.com/apache/spark/blob/0ece865ea4b78f8144defcadd143fccf3dc99743/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala#L293-L298
   
   
https://github.com/apache/spark/blob/0ece865ea4b78f8144defcadd143fccf3dc99743/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala#L301-L320
   
   To clarify: if execution reaches line 318, all data has already been 
`flush()`ed on line 311. Calling `revertPartialWritesAndClose()` at that point 
therefore does not change the file's contents, because after `flush()` we have 
`reportedPosition = committedPosition` and `numRecordsWritten = 0`.
   
   On the other hand, if we add some metadata to the file in the future, should 
an empty file also contain that metadata? If so, should we also manually 
`flush()` the metadata before calling `revertPartialWritesAndClose()`?
   
   If we call `revertPartialWritesAndClose()` directly without manually 
`flush()`ing the metadata first, the metadata will be discarded as well and the 
empty file will be truncated to length 0, because only `flush()` advances 
`committedPosition`.
   
   And if we manually `flush()` the metadata first, 
`revertPartialWritesAndClose()` will no longer change the file's contents, so a 
plain `close()` is enough.
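
   To make the argument concrete, here is a minimal toy model of the 
commit/revert behaviour described above. It is not Spark code: `ToyWriter`, 
`final_size`, and the `HEADER` bytes are hypothetical names for illustration, 
and the model only assumes that `flush()` advances the committed position and 
that reverting truncates the file back to it.

```python
import os
import tempfile


class ToyWriter:
    """Toy stand-in for a disk writer with commit/revert semantics (not Spark code)."""

    def __init__(self, path):
        self.path = path
        self.f = open(path, "wb")
        self.committed_position = 0   # bytes that survive a revert
        self.num_records_written = 0  # records written since the last commit

    def write(self, data, is_record=True):
        self.f.write(data)
        if is_record:
            self.num_records_written += 1

    def flush(self):
        # Commit: everything written so far becomes a permanent part of the file.
        self.f.flush()
        self.committed_position = self.f.tell()
        self.num_records_written = 0

    def close(self):
        self.f.close()

    def revert_partial_writes_and_close(self):
        # Drop everything written after the last commit, then close.
        self.f.flush()
        self.f.truncate(self.committed_position)
        self.f.close()


def final_size(flush_first, use_revert):
    """Write only a hypothetical metadata header (no records) and return the file size."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "spill")
        w = ToyWriter(path)
        w.write(b"HEADER", is_record=False)  # assumed future metadata, 6 bytes
        if flush_first:
            w.flush()
        if use_revert:
            w.revert_partial_writes_and_close()
        else:
            w.close()
        return os.path.getsize(path)


print(final_size(flush_first=False, use_revert=True))   # 0: metadata is truncated away
print(final_size(flush_first=True, use_revert=True))    # 6: revert is a no-op after flush
print(final_size(flush_first=True, use_revert=False))   # 6: plain close gives the same file
```

   In this model, reverting without a prior `flush()` loses the header, while 
after a `flush()` the revert and a plain `close()` produce the same file.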
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


