LuciferYang opened a new pull request, #3450: URL: https://github.com/apache/parquet-java/pull/3450
### Rationale for this change

Fixes #3011.

After a write error (e.g. an OOM during page flush), `InternalParquetRecordWriter` sets its `aborted` flag to true and re-throws the exception. However, subsequent calls to `write()` are silently accepted without checking this flag. Since `close()` skips flushing when `aborted` is true, all data written after the error is silently discarded, producing a corrupted Parquet file without a footer. Users only discover the corruption when they attempt to read the file later.

### What changes are included in this PR?

Added an `aborted` state check at the beginning of `write()`. If the writer has been aborted due to a previous error, an `IOException` is thrown immediately with a clear error message, preventing further writes to a writer in an undefined state.

### Are these changes tested?

Yes. Added `testWriteAfterAbortShouldThrow` in `TestParquetWriterError` that verifies:

1. Writing to an aborted writer throws `IOException` with the expected message
2. `close()` on an aborted writer completes without throwing

All existing tests in `parquet-hadoop` pass without modification.

### Are there any user-facing changes?

Yes. Users who previously caught write exceptions and continued writing to the same `ParquetWriter` will now receive an `IOException` on subsequent write attempts. This is an intentional change to prevent silent data loss: the correct behavior after a write failure is to discard the writer and create a new one.

Closes #3011
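The fail-fast guard described above can be sketched as follows. This is a minimal illustration, not the actual patch: the class name `SketchRecordWriter`, the simulated failure, and the exact exception message are all hypothetical stand-ins for `InternalParquetRecordWriter` internals.

```java
import java.io.IOException;

// Hypothetical sketch of the aborted-state guard; names and messages are illustrative.
class SketchRecordWriter<T> {
  private volatile boolean aborted = false;

  public void write(T value) throws IOException {
    if (aborted) {
      // Fail fast instead of silently accepting records that close() would discard.
      throw new IOException("Writer was aborted by a previous error and cannot accept further writes");
    }
    try {
      doWrite(value);
    } catch (RuntimeException | Error e) {
      aborted = true; // mark the writer unusable after a write failure
      throw e;
    }
  }

  // Stand-in for the real write path; null simulates a failure mid-write.
  private void doWrite(T value) {
    if (value == null) {
      throw new OutOfMemoryError("simulated failure during page flush");
    }
  }

  public void close() throws IOException {
    if (aborted) {
      return; // skip flushing: the file has no valid footer, so there is nothing safe to finalize
    }
    // real writer would flush buffered data and write the footer here
  }

  public static void main(String[] args) throws IOException {
    SketchRecordWriter<String> w = new SketchRecordWriter<>();
    w.write("ok");
    try {
      w.write(null); // triggers the simulated write failure
    } catch (Error expected) {
      // the writer is now aborted
    }
    try {
      w.write("lost record");
      System.out.println("unexpected: write succeeded");
    } catch (IOException e) {
      System.out.println("write after abort throws: " + e.getMessage());
    }
    w.close();
    System.out.println("close after abort completed");
  }
}
```

Running `main` exercises both behaviors the new test checks: the second `write()` after the failure throws immediately, while `close()` returns quietly instead of emitting a truncated file.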

