Re: [PR] feat(parquet): utilize memory allocator in `serializedPageReader` [arrow-go]

via GitHub Tue, 26 Aug 2025 19:09:40 -0700


joechenrh commented on PR #485:
URL: https://github.com/apache/arrow-go/pull/485#issuecomment-3226505040


   For background: we use this library to parse Parquet files and convert them 
to our own data format. I passed a simple allocator into the reader to track 
memory usage, but the allocated size was not zero after the program finished.
   
   ```go
   type testAllocator struct {
        allocated atomic.Int64
   }
   
   func (a *testAllocator) Allocate(n int) []byte {
        a.allocated.Add(int64(n))
        return make([]byte, n)
   }
   
   func (a *testAllocator) Free(b []byte) {
        a.allocated.Add(-int64(len(b)))
   }
   
   ...
   ```
   
   I think there are two problems in the `serializedPageReader`:
   - The buffer that stores a page is obtained from the allocator, but we 
retain this buffer only for data page v2, and we still use the memory after 
`buf.Release()`.
   - For data page v2, the buffer is freed on the next call to `Next`. If we 
only need to read a portion of the rows, we may never get the chance to release 
the last page or the buffer it retained.
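   
   The second problem can be reproduced in isolation. The sketch below is only an illustration under stated assumptions: `pageReader`, its `Close` method, and the page sizes are hypothetical stand-ins, not the arrow-go `serializedPageReader` or its API. It shows that when the free is deferred to the next `Next` call, a partial read leaves the last page's bytes outstanding unless something releases it explicitly.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// trackingAllocator mirrors the testAllocator above: it counts outstanding
// bytes, so a leak shows up as a non-zero total.
type trackingAllocator struct {
	allocated atomic.Int64
}

func (a *trackingAllocator) Allocate(n int) []byte {
	a.allocated.Add(int64(n))
	return make([]byte, n)
}

func (a *trackingAllocator) Free(b []byte) {
	a.allocated.Add(-int64(len(b)))
}

// pageReader is a hypothetical stand-in for a page reader that frees the
// previous page's buffer only when Next is called again.
type pageReader struct {
	mem       *trackingAllocator
	pageSizes []int  // sizes of the pages still to be read (made-up data)
	cur       []byte // buffer backing the current page
}

// Next frees the previous page's buffer, then allocates the next one.
func (r *pageReader) Next() bool {
	r.releaseCurrent()
	if len(r.pageSizes) == 0 {
		return false
	}
	r.cur = r.mem.Allocate(r.pageSizes[0])
	r.pageSizes = r.pageSizes[1:]
	return true
}

func (r *pageReader) releaseCurrent() {
	if r.cur != nil {
		r.mem.Free(r.cur)
		r.cur = nil
	}
}

// Close (hypothetical) releases the page that no later Next call will free.
func (r *pageReader) Close() { r.releaseCurrent() }

// outstandingAfterPartialRead reads only the first of three pages and
// reports how many bytes the allocator still considers allocated.
func outstandingAfterPartialRead(callClose bool) int64 {
	mem := &trackingAllocator{}
	r := &pageReader{mem: mem, pageSizes: []int{64, 128, 256}}
	r.Next()
	if callClose {
		r.Close()
	}
	return mem.allocated.Load()
}

func main() {
	fmt.Println(outstandingAfterPartialRead(false)) // 64: last page never freed
	fmt.Println(outstandingAfterPartialRead(true))  // 0: explicit release on close
}
```

   Releasing on close is just one possible way to balance the counter; whether it fits the real reader is exactly the open question here.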
   
   That is the purpose of this PR: it is a refactor, not a performance 
improvement. I'm not sure whether such a refactor is appropriate, though. Could 
you give me some suggestions? Thanks!
   


