alamb commented on a change in pull request #505:
URL: https://github.com/apache/arrow-rs/pull/505#discussion_r660128501
##########
File path: arrow/src/array/array.rs
##########
@@ -198,10 +198,14 @@ pub trait Array: fmt::Debug + Send + Sync + JsonEqual {
}
/// Returns the total number of bytes of memory occupied by the buffers owned by this array.
Review comment:
```suggestion
/// Returns the total number of bytes of memory pointed to by this array.
/// The buffers store bytes in the Arrow memory format, and include the
/// data as well as the validity map.
```
The distinction between `buffers` and `physically occupied` has always been
somewhat confusing to me. Perhaps we can take this opportunity to clarify what
they mean.
##########
File path: arrow/src/array/array.rs
##########
@@ -661,4 +666,63 @@ mod tests {
null_array.data().buffers()[0].len()
);
}
+
+ #[test]
+ fn test_memory_size_primitive() {
+ let arr = PrimitiveArray::<Int64Type>::from_iter_values(0..128);
+ let empty = PrimitiveArray::<Int64Type>::from(ArrayData::new_empty(arr.data_type()));
+
+ // subtract empty array to avoid magic numbers for the size of additional fields
+ assert_eq!(
+ arr.get_array_memory_size() - empty.get_array_memory_size(),
Review comment:
this is a cool calculation 👍
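
For readers skimming the thread, the baseline-subtraction idea in that test can be sketched without arrow-rs at all. Everything below is an assumption made for illustration: `ToyInt64Array`, its fields, and `data_bytes` are hypothetical names, and the size accounting is far simpler than the real `PrimitiveArray`.

```rust
use std::mem;

/// Hypothetical array type: fixed bookkeeping plus a heap buffer of i64 values.
/// This is NOT the arrow-rs `PrimitiveArray`; it only mirrors the shape of the test.
struct ToyInt64Array {
    null_count: usize,
    values: Vec<i64>,
}

impl ToyInt64Array {
    fn from_iter_values(iter: impl Iterator<Item = i64>) -> Self {
        Self { null_count: 0, values: iter.collect() }
    }

    fn new_empty() -> Self {
        Self { null_count: 0, values: Vec::new() }
    }

    /// Size of the struct itself plus the bytes used by the stored values.
    fn get_array_memory_size(&self) -> usize {
        mem::size_of_val(self) + self.values.len() * mem::size_of::<i64>()
    }
}

/// Subtracting the size of an empty array cancels the fixed overhead,
/// so the caller never hard-codes the size of the bookkeeping fields.
fn data_bytes(arr: &ToyInt64Array) -> usize {
    arr.get_array_memory_size() - ToyInt64Array::new_empty().get_array_memory_size()
}
```

With this toy type, `data_bytes` for 128 values is `128 * size_of::<i64>()`, independent of how many bookkeeping fields the struct later gains.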
##########
File path: arrow/src/array/array.rs
##########
@@ -198,10 +198,14 @@ pub trait Array: fmt::Debug + Send + Sync + JsonEqual {
}
/// Returns the total number of bytes of memory occupied by the buffers owned by this array.
- fn get_buffer_memory_size(&self) -> usize;
+ fn get_buffer_memory_size(&self) -> usize {
+ self.data_ref().get_buffer_memory_size()
+ }
/// Returns the total number of bytes of memory occupied physically by this array.
Review comment:
```suggestion
/// Returns the total number of bytes of memory occupied physically by this array.
/// This value will always be greater than the value returned by `get_buffer_memory_size()` and
/// includes the overhead of the data structures that contain the pointers to the various buffers.
```
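
As a rough illustration of the relationship described in that doc comment, here is a minimal sketch with hypothetical `ToyBuffer` and `ToyArrayData` types. These are assumptions for the example, not the arrow-rs structs, and the accounting is deliberately simplified.

```rust
use std::mem;

/// Hypothetical stand-in for an Arrow buffer: a heap allocation of raw bytes.
struct ToyBuffer {
    bytes: Vec<u8>,
}

/// Hypothetical stand-in for `ArrayData`: bookkeeping fields plus buffers.
struct ToyArrayData {
    len: usize,
    buffers: Vec<ToyBuffer>,
}

impl ToyArrayData {
    /// Heap bytes held by the buffers only (data, validity map, and so on).
    fn get_buffer_memory_size(&self) -> usize {
        self.buffers.iter().map(|b| b.bytes.capacity()).sum()
    }

    /// Buffer bytes plus the structures that point at them:
    /// the struct itself and the `Vec` slots holding each `ToyBuffer`.
    fn get_array_memory_size(&self) -> usize {
        mem::size_of_val(self)
            + self.buffers.capacity() * mem::size_of::<ToyBuffer>()
            + self.get_buffer_memory_size()
    }
}
```

In this model the array size is strictly larger than the buffer size because `size_of_val(self)` is never zero, which is the property the suggested wording spells out.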
##########
File path: arrow/src/array/data.rs
##########
@@ -354,12 +354,7 @@ impl ArrayData {
/// Returns the total number of bytes of memory occupied physically by this [ArrayData].
pub fn get_array_memory_size(&self) -> usize {
- let mut size = 0;
- // Calculate size of the fields that don't have [get_array_memory_size] method internally.
- size += mem::size_of_val(self)
- - mem::size_of_val(&self.buffers)
Review comment:
Since the results of `bitmap.get_array_memory_size()` and
`child.get_array_memory_size()` already include the size of their own `self`
for `null_bitmap` and `child_data`, I think we still need to subtract them off.
Perhaps a pattern such as:
```rust
if let Some(bitmap) = &self.null_bitmap {
size += bitmap.get_array_memory_size();
}
for child in &self.child_data {
size += child.get_array_memory_size();
}
```
would make the intent clearer
##########
File path: arrow/src/array/data.rs
##########
@@ -354,12 +354,7 @@ impl ArrayData {
/// Returns the total number of bytes of memory occupied physically by this [ArrayData].
pub fn get_array_memory_size(&self) -> usize {
- let mut size = 0;
- // Calculate size of the fields that don't have [get_array_memory_size] method internally.
- size += mem::size_of_val(self)
- - mem::size_of_val(&self.buffers)
Review comment:
Since the results of `bitmap.get_array_memory_size()` and
`child.get_array_memory_size()` already include the size of their own `self`
for `null_bitmap` and `child_data`, I think we still need to subtract them off.
Perhaps a pattern such as (*edited*):
```rust
if let Some(bitmap) = &self.null_bitmap {
size += bitmap.get_array_memory_size();
// `bitmap` is already a `&Bitmap`; `&bitmap` would measure the reference instead.
size -= mem::size_of_val(bitmap);
}
```
would make the intent clearer
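
To make the double counting concrete, here is a minimal sketch of the add-then-subtract pattern. `ToyBitmap` and `ToyData` are hypothetical types invented for this example, not the arrow-rs `Bitmap` or `ArrayData`. Note that the sketch passes `bitmap`, which is already a reference, straight to `mem::size_of_val` so that the struct is measured rather than a pointer to it.

```rust
use std::mem;

/// Hypothetical stand-in for Arrow's `Bitmap`.
struct ToyBitmap {
    bits: Vec<u8>,
}

impl ToyBitmap {
    /// Includes the size of the struct itself plus its heap allocation.
    fn get_array_memory_size(&self) -> usize {
        mem::size_of_val(self) + self.bits.capacity()
    }
}

/// Hypothetical stand-in for `ArrayData`, holding the bitmap inline.
struct ToyData {
    len: usize,
    null_bitmap: Option<ToyBitmap>,
}

impl ToyData {
    fn get_array_memory_size(&self) -> usize {
        // Counts every inline field once, including the inline `ToyBitmap`.
        let mut size = mem::size_of_val(self);
        if let Some(bitmap) = &self.null_bitmap {
            // Adds the bitmap's struct size a second time, plus its heap bytes.
            size += bitmap.get_array_memory_size();
            // Removes the duplicate struct size so only the heap bytes remain.
            size -= mem::size_of_val(bitmap);
        }
        size
    }
}
```

The net effect is `size_of::<ToyData>()` plus the bitmap's heap bytes, so the inline struct is counted exactly once.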