martin-g commented on code in PR #19599:
URL: https://github.com/apache/datafusion/pull/19599#discussion_r2657777540
##########
datafusion/common/src/stats.rs:
##########
@@ -321,6 +321,13 @@ impl Statistics {
}
}
+ /// Returns the memory size in bytes.
+ pub fn heap_size(&self) -> usize {
+ // column_statistics + num_rows + total_byte_size
+ self.column_statistics.capacity() * size_of::<ColumnStatistics>()
Review Comment:
IMO you need to iterate over `column_statistics` and add each element's heap
allocations:
https://github.com/mkleen/datafusion/blob/41b875d19d9c3671e141cec11814afd91a06a0f1/datafusion/common/src/stats.rs#L732-L736
There are `Precision<ScalarValue>` fields there, and the `ScalarValue` enum has
variants that use `String`, `Vec` and `Box`.
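A minimal sketch of the suggestion above, assuming simplified stand-in types. `Scalar`, `ColumnStats` and `Stats` are hypothetical and only mimic the shape of `ScalarValue`, `ColumnStatistics` and `Statistics`; the real types have more fields and variants:

```rust
use std::mem::size_of;

// Hypothetical stand-in for ScalarValue: some variants own heap buffers.
#[allow(dead_code)]
enum Scalar {
    Int64(Option<i64>),
    Utf8(Option<String>),
    Binary(Option<Vec<u8>>),
}

impl Scalar {
    // Bytes owned on the heap by this value (excluding the enum itself).
    fn heap_size(&self) -> usize {
        match self {
            Scalar::Int64(_) => 0,
            Scalar::Utf8(s) => s.as_ref().map_or(0, |s| s.capacity()),
            Scalar::Binary(b) => b.as_ref().map_or(0, |b| b.capacity()),
        }
    }
}

// Simplified copy of the Precision enum's shape.
#[allow(dead_code)]
enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

impl Precision<Scalar> {
    fn heap_size(&self) -> usize {
        match self {
            Precision::Exact(v) | Precision::Inexact(v) => v.heap_size(),
            Precision::Absent => 0,
        }
    }
}

// Hypothetical stand-in for ColumnStatistics.
struct ColumnStats {
    min_value: Precision<Scalar>,
    max_value: Precision<Scalar>,
}

impl ColumnStats {
    fn heap_size(&self) -> usize {
        self.min_value.heap_size() + self.max_value.heap_size()
    }
}

// Hypothetical stand-in for Statistics.
struct Stats {
    column_statistics: Vec<ColumnStats>,
}

impl Stats {
    // The Vec's own buffer plus whatever each element owns on the heap.
    fn heap_size(&self) -> usize {
        self.column_statistics.capacity() * size_of::<ColumnStats>()
            + self
                .column_statistics
                .iter()
                .map(ColumnStats::heap_size)
                .sum::<usize>()
    }
}
```

The first term covers the `Vec` buffer itself; the iterator term adds the string and byte buffers that `capacity() * size_of` alone would miss.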
##########
datafusion/common/src/stats.rs:
##########
@@ -321,6 +321,13 @@ impl Statistics {
}
}
+ /// Returns the memory size in bytes.
+ pub fn heap_size(&self) -> usize {
+ // column_statistics + num_rows + total_byte_size
+ self.column_statistics.capacity() * size_of::<ColumnStatistics>()
+ + size_of::<Precision<usize>>() * 2
Review Comment:
Here `Precision<usize>` is an enum with no heap-allocated fields, so it is
stored inline (on the stack or inside its parent struct), not on the heap.
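To illustrate the distinction, here is a small sketch with hypothetical types (`Stats` and its `column_sizes` field are made up for the example and are not the DataFusion definitions). The two `Precision<usize>` fields are already part of `size_of::<Stats>()`, so a heap-only measure would count just the `Vec` buffer:

```rust
use std::mem::size_of;

// Simplified copy of the Precision enum's shape.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

// Hypothetical struct: two inline Precision<usize> fields and one Vec.
#[allow(dead_code)]
struct Stats {
    num_rows: Precision<usize>,
    total_byte_size: Precision<usize>,
    column_sizes: Vec<u64>,
}

impl Stats {
    // Heap bytes only: the buffer behind the Vec.
    fn heap_size(&self) -> usize {
        self.column_sizes.capacity() * size_of::<u64>()
    }

    // Inline bytes (which already include both Precision<usize> fields)
    // plus heap bytes.
    fn total_size(&self) -> usize {
        size_of::<Self>() + self.heap_size()
    }
}
```

With an empty `Vec`, `heap_size()` is zero even though both `Precision<usize>` fields hold values, because those bytes live inside the struct.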
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]