joechenrh commented on PR #485:
URL: https://github.com/apache/arrow-go/pull/485#issuecomment-3226505040
For background: we use this library to parse Parquet files and convert them
to our own data format. I passed a simple allocator to the reader to track
memory usage, but the allocated size was not zero after the program
finished.
```go
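// testAllocator counts the bytes that have been allocated but not yet freed.
type testAllocator struct {
	allocated atomic.Int64
}

// Allocate records n bytes as outstanding and returns a fresh buffer.
func (a *testAllocator) Allocate(n int) []byte {
	a.allocated.Add(int64(n))
	return make([]byte, n)
}

// Free subtracts the length of b from the outstanding byte count.
func (a *testAllocator) Free(b []byte) {
	a.allocated.Add(-int64(len(b)))
}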
...
```
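To make the check concrete, here is a minimal standalone sketch of how such a counter could be verified. It uses only the standard library; the `trackingAllocator` name and the `Outstanding` helper are hypothetical and not part of arrow-go.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// trackingAllocator is a hypothetical stand-in for the allocator above.
type trackingAllocator struct {
	allocated atomic.Int64
}

// Allocate records n bytes as outstanding and returns a fresh buffer.
func (a *trackingAllocator) Allocate(n int) []byte {
	a.allocated.Add(int64(n))
	return make([]byte, n)
}

// Free subtracts the length of b from the outstanding byte count.
func (a *trackingAllocator) Free(b []byte) {
	a.allocated.Add(-int64(len(b)))
}

// Outstanding is a hypothetical helper that reports bytes not yet freed.
func (a *trackingAllocator) Outstanding() int64 {
	return a.allocated.Load()
}

func main() {
	alloc := &trackingAllocator{}
	page := alloc.Allocate(1024)
	fmt.Println(alloc.Outstanding()) // 1024
	alloc.Free(page)
	fmt.Println(alloc.Outstanding()) // 0
}
```

A non-zero value at the end of a run means that some buffer was allocated but never passed back to `Free`.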
I think there are a few problems in `serializedPageReader`:
- The buffer that stores a page is obtained from the allocator, but we only
retain it for data page v2, and we still use the memory after `buf.Release()`.
- For data page v2, the buffer is freed on the next call to `Next`. If we
only read a portion of the rows, we may never get the chance to release the
last page or the buffer it retained.
That is the purpose of this PR: not a performance improvement, just some
refactoring. I'm not sure whether this kind of refactor is appropriate,
though. Could you give me some suggestions? Thanks!