LuciferYang edited a comment on pull request #33556: URL: https://github.com/apache/spark/pull/33556#issuecomment-889606046
> The assumption being made here is that if no objects are written, then nothing was written to the file. This is not a general assumption to make (there could be some header written for example - not just right now, but in future as we add support for other serializers/codecs/etc).

https://github.com/apache/spark/blob/0ece865ea4b78f8144defcadd143fccf3dc99743/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala#L293-L298
https://github.com/apache/spark/blob/0ece865ea4b78f8144defcadd143fccf3dc99743/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala#L301-L320

I mean, if the code reaches line 318, all data has already been `flush()`ed on line 311, so calling `revertPartialWritesAndClose()` here does not change the structure of the file: after `flush()`, `reportedPosition == committedPosition` and `numRecordsWritten == 0`.

On the other hand, if we add some metadata to the file in the future, should an empty file also contain that metadata? If yes, should we also manually `flush()` the metadata before calling `revertPartialWritesAndClose()`?

If we call `revertPartialWritesAndClose()` directly without manually `flush()`ing the metadata first, the metadata will be truncated as well and the empty file will end up with length 0, because only `flush()` advances `committedPosition`. And if we manually `flush()` first, `revertPartialWritesAndClose()` no longer changes the file structure, so `close()` is enough.
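The argument above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `ToyWriter`, its `commit()` method, and the three helper functions are hypothetical names made up for illustration and are not Spark's `DiskBlockObjectWriter` API. `commit()` stands in for whatever advances `committedPosition` (the `flush()` discussed in the comment), and the "file" is just a byte vector.

```scala
object RevertSketch {

  // Toy stand-in for a disk writer (hypothetical, NOT Spark's DiskBlockObjectWriter).
  final class ToyWriter {
    private var file: Vector[Byte] = Vector.empty // simulated file contents
    private var committedPosition: Int = 0        // end of the last committed write
    private var closed: Boolean = false

    // Append bytes to the file without committing them.
    def write(bytes: Array[Byte]): Unit = {
      require(!closed, "writer is closed")
      file = file ++ bytes
    }

    // Assumed effect of the flush() in the comment: everything written
    // so far becomes committed.
    def commit(): Unit = {
      require(!closed, "writer is closed")
      committedPosition = file.length
    }

    // Truncate back to the last committed position, then close.
    // Returns the final file length.
    def revertPartialWritesAndClose(): Int = {
      file = file.take(committedPosition)
      closed = true
      file.length
    }

    // Close without truncating. Returns the final file length.
    def close(): Int = {
      closed = true
      file.length
    }
  }

  // Header written but never committed: the revert truncates it away.
  def lengthAfterRevertOnly(header: Array[Byte]): Int = {
    val w = new ToyWriter
    w.write(header)
    w.revertPartialWritesAndClose()
  }

  // Header committed first: the revert has nothing left to undo.
  def lengthAfterCommitThenRevert(header: Array[Byte]): Int = {
    val w = new ToyWriter
    w.write(header)
    w.commit()
    w.revertPartialWritesAndClose()
  }

  // Header committed first, then a plain close: same resulting length.
  def lengthAfterCommitThenClose(header: Array[Byte]): Int = {
    val w = new ToyWriter
    w.write(header)
    w.commit()
    w.close()
  }
}
```

In this model, reverting without a prior commit leaves a zero-length file, while committing first makes the revert and a plain close produce the same length, which is the point being made about `close()` being enough.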