Re: [PR] GH-38836:[Go] Add Size() for ArrayData [arrow]

via GitHub Mon, 27 Nov 2023 07:56:20 -0800


zeroshade commented on code in PR #38839:
URL: https://github.com/apache/arrow/pull/38839#discussion_r1406370169



##########
go/arrow/array/data.go:
##########
@@ -190,9 +190,32 @@ func (d *Data) SetDictionary(dict arrow.ArrayData) {
        }
 }
 
+// Returns the size of Data in bytes
+func (d *Data) SizeInBytes() uint64 {

Review Comment:
   The convention for godoc is that the comment should start with the name of 
the method, i.e. `// SizeInBytes returns ....`
   
   I think we should also include the fact that this does the recursion into 
the children in the comment so that it's obvious in the documentation and users 
don't have to read the code to find that out.



##########
go/arrow/array/data.go:
##########
@@ -190,9 +190,32 @@ func (d *Data) SetDictionary(dict arrow.ArrayData) {
        }
 }
 
+// Returns the size of Data in bytes
+func (d *Data) SizeInBytes() uint64 {
+       var size uint64
+
+       if d == nil {
+               return 0
+       }
+
+       for _, b := range d.Buffers() {
+               size += uint64(b.Len())
+       }

Review Comment:
   should the documentation say that the return of `SizeInBytes` is an 
upper-bound since we don't take into account any offset?
   
   The reason i say this is that taking into account the offset would actually 
be quite complex since the offset value is based on the element, not the raw 
bytes, and thus would make this computation significantly more complex. I think 
it's not particularly worthwhile to incorporate the offset into this and is 
good enough to say that the `SizeInBytes` is just an upper-bound if offset is 0.



##########
go/arrow/array.go:
##########
@@ -81,6 +81,8 @@ type ArrayData interface {
        // Dictionary returns the ArrayData object for the dictionary if this 
is a
        // dictionary array, otherwise it will be nil.
        Dictionary() ArrayData
+       // SizeInBytes() returns the size of the ArrayData in bytes.

Review Comment:
   ```suggestion
        // SizeInBytes returns the size of the ArrayData buffers and any 
children and/or dictionary in bytes.
   ```



##########
go/arrow/array/data_test.go:
##########
@@ -49,3 +49,46 @@ func TestDataReset(t *testing.T) {
                data.Reset(&arrow.Int64Type{}, 5, data.Buffers(), nil, 1, 2)
        }
 }
+
+func TestSizeInBytes(t *testing.T) {
+
+       var buffers1 = make([]*memory.Buffer, 0, 3)
+
+       for i := 0; i < cap(buffers1); i++ {
+               buffers1 = append(buffers1, 
memory.NewBufferBytes([]byte("15-bytes-buffer")))
+       }
+
+       // test 1: data with buffers only
+       data := NewData(&arrow.StringType{}, 10, buffers1, nil, 0, 0)
+       expectedSize := uint64(45)
+       if actualSize := data.SizeInBytes(); actualSize != expectedSize {
+               t.Errorf("expected size %d, got %d", expectedSize, actualSize)
+       }

Review Comment:
   can you wrap these with `t.Run(<test name>, func(t *testing.T) {` so they 
show up as individual tests which can be run/debugged independantly? It makes 
it easier to find and diagnose if a test fails.



##########
go/arrow/array/data_test.go:
##########
@@ -49,3 +49,46 @@ func TestDataReset(t *testing.T) {
                data.Reset(&arrow.Int64Type{}, 5, data.Buffers(), nil, 1, 2)
        }
 }
+
+func TestSizeInBytes(t *testing.T) {

Review Comment:
   we should add a test for a sliced array, a sliced array with children, and 
an array with children which are slices. 
   
   As per my comment above, it should confirm that `SizeInBytes` returns values 
that are upper bounds, not taking the offset into account.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] GH-38836:[Go] Add Size() for ArrayData [arrow]

Reply via email to