mzzz-zzm opened a new issue, #961:
URL: https://github.com/apache/iceberg-go/issues/961
### Apache Iceberg version
main (development)
### Please describe the bug 🐞
## Bug
In `table/rolling_data_writer.go`, the `stream()` goroutine registers this
deferred cleanup, which runs on the error-exit path while `currentWriter` is
still non-nil:
```go
defer func() {
	if currentWriter != nil {
		currentWriter.Close() // <-- panics when zero rows have been written
	}
}()
```
`Close()` calls `pqWriter.Close()` internally, which tries to finalize the
Parquet file footer. When the writer was opened but no rows were written yet
(e.g. `openFileWriter` succeeded but the subsequent `ToRequestedSchema` or
`Write` call failed), the Arrow Parquet library panics because it cannot
finalize a footer for an empty/inconsistent file.
## Reproduction
1. Open a `RollingDataWriter` for any write operation.
2. Cause `ToRequestedSchema` (or `currentWriter.Write`) to return an error on
the first batch, so that `openFileWriter` has already been called but no rows
have been flushed.
3. `stream()` returns early via `r.sendError(err); return`.
4. The deferred `currentWriter.Close()` fires --> panic inside
`pqWriter.Close()`
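
The control flow in these steps can be imitated without Iceberg or Arrow. The sketch below is illustrative only: `zeroRowWriter`, `streamOnce`, and `didPanic` are hypothetical names, and the panic in `Close` is hard-coded to mimic the reported behavior rather than produced by the real Parquet library.

```go
package main

import (
	"errors"
	"fmt"
)

// zeroRowWriter is a hypothetical stand-in for the real file writer. Its
// Close panics when no rows were written, imitating the reported behavior.
type zeroRowWriter struct{ rows int }

func (w *zeroRowWriter) Write(fail bool) error {
	if fail {
		return errors.New("write failed on first batch")
	}
	w.rows++
	return nil
}

func (w *zeroRowWriter) Close() error {
	if w.rows == 0 {
		panic("cannot finalize footer: no rows written")
	}
	return nil
}

// streamOnce mimics the error-exit path: the writer is opened, the first
// write may fail, and the deferred Close runs either way.
func streamOnce(failFirstWrite bool) error {
	var currentWriter *zeroRowWriter
	defer func() {
		if currentWriter != nil {
			currentWriter.Close()
		}
	}()

	currentWriter = &zeroRowWriter{} // stands in for openFileWriter succeeding
	return currentWriter.Write(failFirstWrite)
}

// didPanic runs f and reports whether it panicked.
func didPanic(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	fmt.Println(didPanic(func() { _ = streamOnce(true) }))  // true
	fmt.Println(didPanic(func() { _ = streamOnce(false) })) // false
}
```

In this simulation the panic appears only when the first write fails, which matches the sequence described in steps 2 to 4.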
## Root Cause
`Close()` assumes the Parquet writer is in a finalizable state (at least one
row group written). On the error path this is not guaranteed.
## Fix
In the deferred cleanup, replace `Close()` with a separate `Abort()` method on
the `FileWriter` interface that closes only the underlying file handle without
touching the Parquet writer:
```go
// In the FileWriter interface:
Abort() error

// Implementation (ParquetFileWriter):
func (w *ParquetFileWriter) Abort() error {
	return w.fileCloser.Close()
}

// In the stream() deferred cleanup:
defer func() {
	if currentWriter != nil {
		_ = currentWriter.Abort()
	}
}()
```
`Abort()` bypasses `pqWriter.Close()` entirely, so it is always safe to call
regardless of how many rows have been written. The incomplete file left in
object storage becomes an orphan and should be cleaned up by the caller's
orphan-tracking path.