[ https://issues.apache.org/jira/browse/ARROW-15065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462943#comment-17462943 ]
Vibhatha Lakmal Abeykoon edited comment on ARROW-15065 at 12/21/21, 3:10 AM:
-----------------------------------------------------------------------------

[~westonpace] with the updated description, can we replace the existing `nbytes` calculation logic with the function `TotalBufferSize`?

    /// \brief The sum of bytes in each buffer referenced by the array
    ///
    /// Note: An array may only reference a portion of a buffer.
    /// This method will overestimate in this case and return the
    /// byte size of the entire buffer.
    /// Note: If a buffer is referenced multiple times then it will
    /// only be counted once.
    int64_t ARROW_EXPORT TotalBufferSize(const ArrayData& array_data);
    /// \brief The sum of bytes in each buffer referenced by the array
    /// \see TotalBufferSize(const ArrayData& array_data) for details

Comparing the Python code for the existing `nbytes` calculation with the C++ `TotalBufferSize`, the two seem to differ in a component that accounts for dictionary data. I am not 100% sure I am reading the logic correctly, though.

If these two represent different concepts, should we open a new PR to expose this or use the same PR? Just curious.


was (Author: vibhatha):
[~westonpace] with the updated description, Can we remove the `nbytes` calculation logic with the function `TotalBufferSize`?

    /// \brief The sum of bytes in each buffer referenced by the array
    ///
    /// Note: An array may only reference a portion of a buffer.
    /// This method will overestimate in this case and return the
    /// byte size of the entire buffer.
    /// Note: If a buffer is referenced multiple times then it will
    /// only be counted once.
    int64_t ARROW_EXPORT TotalBufferSize(const ArrayData& array_data);
    /// \brief The sum of bytes in each buffer referenced by the array
    /// \see TotalBufferSize(const ArrayData& array_data) for details

> [Python][R] Expose ReferencedBufferSize to python/R
> ---------------------------------------------------
>
>                 Key: ARROW-15065
>                 URL: https://issues.apache.org/jira/browse/ARROW-15065
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python, R
>            Reporter: Weston Pace
>            Assignee: Vibhatha Lakmal Abeykoon
>            Priority: Major
>              Labels: good-first-issue
>
> This could be a method on arrays, chunked arrays, record batches, and tables.
> This method takes array offsets into account.
> This should replace the existing nbytes behavior. A new method should be
> created (total_buffer_bytes) that has the old behavior. We will need clear
> commenting about the difference between the two of them. Both can be useful
> depending on the need.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
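
The difference the issue description draws between the two size measures can be sketched with a small toy model. This is not Arrow code: `ToyBuffer`, `ToyArray`, `total_buffer_size` and `referenced_size` are hypothetical names invented for illustration, and the model assumes fixed-width values with a single data buffer (no validity bitmap, no dictionary data). It only mirrors the two notes in the `TotalBufferSize` doc comment (whole buffers are counted, and each buffer only once) against a measure that takes array offsets into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyBuffer:
    """Hypothetical stand-in for an allocated buffer; only its size matters."""
    name: str
    size_bytes: int


@dataclass(frozen=True)
class ToyArray:
    """Hypothetical fixed-width array viewing part of one buffer."""
    buffer: ToyBuffer
    offset: int      # first element of the view, in elements
    length: int      # number of elements in the view
    byte_width: int  # bytes per element


def total_buffer_size(arrays):
    """Sum whole buffers, counting each distinct buffer object once.

    Overestimates when an array only views part of its buffer.
    """
    sizes_by_identity = {}
    for arr in arrays:
        sizes_by_identity[id(arr.buffer)] = arr.buffer.size_bytes
    return sum(sizes_by_identity.values())


def referenced_size(arrays):
    """Sum only the bytes inside each array's offset/length window.

    Assumption of this sketch: a buffer shared by two arrays contributes
    once per array, for the portion each array views.
    """
    return sum(arr.length * arr.byte_width for arr in arrays)


# One buffer holding 100 eight-byte values, viewed by two slices.
buf = ToyBuffer("values", size_bytes=800)
slice_a = ToyArray(buf, offset=10, length=5, byte_width=8)
slice_b = ToyArray(buf, offset=50, length=20, byte_width=8)

total_one = total_buffer_size([slice_a])
referenced_one = referenced_size([slice_a])
total_both = total_buffer_size([slice_a, slice_b])
referenced_both = referenced_size([slice_a, slice_b])

print(total_one, referenced_one)    # whole buffer vs. the 5-element window
print(total_both, referenced_both)  # shared buffer counted once vs. per-slice windows
```

In this model a five-element slice reports the full 800-byte buffer under the first measure but only its own 40 bytes under the second, which is why both numbers can be useful depending on the need.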