joellubi opened a new issue, #39789:
URL: https://github.com/apache/arrow/issues/39789

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   With the following properties for a Parquet writer, the sum of `RowGroupTotalBytesWritten()` taken after each `Write()` call does not match the number of bytes actually received by the target `io.Writer`.
   
   ```go
   parquetProps := parquet.NewWriterProperties(
       parquet.WithAllocator(memory.DefaultAllocator),
       parquet.WithCompression(compress.Codecs.Snappy),
       parquet.WithCompressionLevel(flate.DefaultCompression),
       parquet.WithDictionaryDefault(false),
       parquet.WithStats(false),
       parquet.WithMaxRowGroupLength(math.MaxInt64),
   )
   arrowProps := pqarrow.NewArrowWriterProperties(pqarrow.WithAllocator(memory.DefaultAllocator))
   ```
   
   In this specific case, the `RowGroupTotalBytesWritten()` calls reported only about 10 MB written for a 13 MB file. Some of the discrepancy can be attributed to metadata that is not included in the row groups, but this likely doesn't explain the entire difference. We should investigate the root cause and either fix it or document the explanation for future users of this API.
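   
   For reference, one way to observe the "actual" byte count on the sink side is to wrap the target `io.Writer` in a small counting wrapper. The sketch below uses only the standard library; `countingWriter` is a hypothetical helper, not part of the Arrow Go API, and the `fmt.Fprint` calls stand in for whatever the Parquet writer would emit.
   
   ```go
   package main
   
   import (
   	"bytes"
   	"fmt"
   	"io"
   )
   
   // countingWriter is a hypothetical helper (not an Arrow API): it forwards
   // writes to w and tallies the bytes the underlying sink accepted.
   type countingWriter struct {
   	w io.Writer
   	n int64
   }
   
   // Write forwards p to the wrapped writer and adds the number of bytes
   // actually written to the running total.
   func (c *countingWriter) Write(p []byte) (int, error) {
   	written, err := c.w.Write(p)
   	c.n += int64(written)
   	return written, err
   }
   
   func main() {
   	var sink bytes.Buffer
   	cw := &countingWriter{w: &sink}
   
   	// Stand-ins for the writer's output; in the real reproduction the
   	// Parquet writer would be constructed on top of cw instead.
   	fmt.Fprint(cw, "header bytes")
   	fmt.Fprint(cw, "row group bytes")
   	fmt.Fprint(cw, "footer metadata")
   
   	// cw.n can then be compared against the accumulated
   	// RowGroupTotalBytesWritten() values.
   	fmt.Println("bytes seen by sink:", cw.n, "buffer length:", sink.Len())
   }
   ```
   
   Because the wrapper sits between the Parquet writer and the destination, its total includes the file header and footer metadata as well as row group data, so some gap relative to the row group totals is expected.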
   
   Related to [arrow-adbc@1456](https://github.com/apache/arrow-adbc/pull/1456)
   
   ### Component(s)
   
   Go, Parquet

