Re: [PR] feat(table): roll parquet files based on actual compressed size [iceberg-go]

via GitHub Tue, 03 Mar 2026 03:22:13 -0800


twuebi commented on code in PR #759:
URL: https://github.com/apache/iceberg-go/pull/759#discussion_r2877689088



##########
table/rolling_data_writer.go:
##########
@@ -141,70 +284,79 @@ func (r *RollingDataWriter) stream(outputDataFilesCh chan<- iceberg.DataFile) {
        defer r.wg.Done()
        defer close(r.errorCh)
 
-       recordIter := func(yield func(arrow.RecordBatch, error) bool) {
-               for record := range r.recordCh {
-                       if !yield(record, nil) {
-                               return
-                       }
+       var currentWriter tblutils.FileWriter
+       defer func() {
+               if currentWriter != nil {
+                       currentWriter.Close()
                }
-       }
+       }()
 
-       binPackedRecords := binPackRecords(recordIter, defaultBinPackLookback, r.factory.targetFileSize)

Review Comment:
   My understanding is that bin-packing was used primarily to keep files at or 
below the target file size, based on size estimates that do not account for 
compression. With this change we track actual file sizes, so we no longer need 
to bin-pack records based on estimates. This is modeled after iceberg-java's 
`RollingFileWriter` / `BaseRollingWriter`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


