DuanWeiFan commented on issue #511:
URL: https://github.com/apache/arrow-go/issues/511#issuecomment-3357616720

   After some investigation, I realized that `RowGroupTotalCompressedBytes()` is 
populated as soon as one of the data pages is flushed. 
   Would it make sense to add `totalCompressedBytes` and 
`totalBytesWritten` fields to `FileWriter`, so that it tracks the total number of 
bytes across all row groups instead of just the current one?
   
   Two new methods, `TotalBytesWritten()` and `TotalCompressedBytes()`, would 
then report the total bytes written by the `FileWriter`:
   ```go
   type FileWriter struct {
     ...
     totalCompressedBytes int64
     totalBytesWritten    int64
   }
   // NewRowGroup does what it says on the tin, creates a new row group in the
   // underlying file.
   // Equivalent to `AppendRowGroup` on a file.Writer
   func (fw *FileWriter) NewRowGroup() {
     if fw.rgw != nil {
       fw.totalCompressedBytes += fw.rgw.TotalCompressedBytes()
       fw.totalBytesWritten += fw.rgw.TotalBytesWritten()
       fw.rgw.Close()
     }
     fw.rgw = fw.wr.AppendRowGroup()
     fw.colIdx = 0
   }
   
   // NewBufferedRowGroup starts a new memory Buffered Row Group to allow
   // writing columns / records without immediately flushing them to disk.
   // This allows using WriteBuffered to write records and decide where to
   // break your row group based on the TotalBytesWritten rather than on the
   // max row group len. If using Records, this should be paired with
   // WriteBuffered, while Write will always write a new record as a row
   // group in and of itself.
   func (fw *FileWriter) NewBufferedRowGroup() {
     if fw.rgw != nil {
       fw.totalCompressedBytes += fw.rgw.TotalCompressedBytes()
       fw.totalBytesWritten += fw.rgw.TotalBytesWritten()
       fw.rgw.Close()
     }
     fw.rgw = fw.wr.AppendBufferedRowGroup()
     fw.colIdx = 0
   }
   
   // TotalCompressedBytes returns the total number of bytes after compression
   // that have been written to the file so far. It includes all the closed row
   // groups and the current row group.
   func (fw *FileWriter) TotalCompressedBytes() int64 {
     return fw.totalCompressedBytes + fw.RowGroupTotalCompressedBytes()
   }
   
   // TotalBytesWritten returns the total number of bytes
   // that have been written to the file so far. It includes all the closed row
   // groups and the current row group.
   func (fw *FileWriter) TotalBytesWritten() int64 {
     return fw.totalBytesWritten + fw.RowGroupTotalBytesWritten()
   }
   ```
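
   To make the accounting easier to check in isolation, here is a minimal, self-contained sketch of the same idea. It does not use arrow-go at all: `rowGroup`, `fileWriter`, `newRowGroup`, and `flushPage` are hypothetical stand-ins that only model the byte counters, so treat it as an illustration of the proposal rather than the actual implementation.

   ```go
   package main

   import "fmt"

   // rowGroup is a hypothetical stand-in for a row group writer.
   // It only models the two byte counters discussed above.
   type rowGroup struct {
   	compressed int64 // bytes after compression
   	written    int64 // bytes written so far
   }

   // fileWriter is a hypothetical stand-in for the FileWriter in the proposal.
   type fileWriter struct {
   	rg               *rowGroup // current row group, nil before the first one
   	closedCompressed int64     // sum over already-closed row groups
   	closedWritten    int64     // sum over already-closed row groups
   }

   // newRowGroup folds the current row group's counters into the running
   // totals before replacing it, mirroring the proposed NewRowGroup change.
   func (fw *fileWriter) newRowGroup() {
   	if fw.rg != nil {
   		fw.closedCompressed += fw.rg.compressed
   		fw.closedWritten += fw.rg.written
   	}
   	fw.rg = &rowGroup{}
   }

   // flushPage simulates a data page being flushed into the current row group.
   // It assumes newRowGroup has been called at least once.
   func (fw *fileWriter) flushPage(compressed, written int64) {
   	fw.rg.compressed += compressed
   	fw.rg.written += written
   }

   // totalCompressedBytes covers closed row groups plus the current one.
   func (fw *fileWriter) totalCompressedBytes() int64 {
   	if fw.rg == nil {
   		return fw.closedCompressed
   	}
   	return fw.closedCompressed + fw.rg.compressed
   }

   // totalBytesWritten covers closed row groups plus the current one.
   func (fw *fileWriter) totalBytesWritten() int64 {
   	if fw.rg == nil {
   		return fw.closedWritten
   	}
   	return fw.closedWritten + fw.rg.written
   }

   func main() {
   	fw := &fileWriter{}
   	fw.newRowGroup()
   	fw.flushPage(100, 120) // first row group
   	fw.newRowGroup()
   	fw.flushPage(40, 50) // second row group, still open
   	fmt.Println(fw.totalCompressedBytes(), fw.totalBytesWritten()) // prints "140 170"
   }
   ```

   Each row group is counted exactly once: its counters are added to the running totals only at the moment it is replaced, and the open row group is added on read.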
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.