[jira] [Comment Edited] (ARROW-15065) [Python][R] Expose ReferencedBufferSize to python/R

Vibhatha Lakmal Abeykoon (Jira) Mon, 20 Dec 2021 19:11:07 -0800


    [ https://issues.apache.org/jira/browse/ARROW-15065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462943#comment-17462943 ]


Vibhatha Lakmal Abeykoon edited comment on ARROW-15065 at 12/21/21, 3:10 AM:
-----------------------------------------------------------------------------

[~westonpace] With the updated description, can we replace the existing 
`nbytes` calculation logic with the function `TotalBufferSize`? 

 
/// \brief The sum of bytes in each buffer referenced by the array
///
/// Note: An array may only reference a portion of a buffer.
/// This method will overestimate in this case and return the
/// byte size of the entire buffer.
/// Note: If a buffer is referenced multiple times then it will
/// only be counted once.
int64_t ARROW_EXPORT TotalBufferSize(const ArrayData& array_data);
/// \brief The sum of bytes in each buffer referenced by the array
/// \see TotalBufferSize(const ArrayData& array_data) for details
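
To check my reading of the doc comment above, here is a minimal sketch of the
two notes (whole buffers are counted, and a shared buffer is counted once).
`ToyBuffer`, `ToyArrayData` and `ToyTotalBufferSize` are hypothetical stand-ins
written for this comment, not Arrow types, and walking the children and the
dictionary is my assumption about what a full count would need, not a claim
about the actual implementation.

```cpp
#include <cstdint>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a buffer: only its full allocation size matters.
struct ToyBuffer {
  int64_t size;  // size of the entire buffer, in bytes
};

// Hypothetical stand-in for array data (not arrow::ArrayData).
struct ToyArrayData {
  int64_t offset = 0;  // deliberately ignored by the total below
  int64_t length = 0;  // deliberately ignored by the total below
  std::vector<std::shared_ptr<ToyBuffer>> buffers;
  std::vector<std::shared_ptr<ToyArrayData>> child_data;
  std::shared_ptr<ToyArrayData> dictionary;
};

// Adds the full size of every buffer not seen before, then recurses.
void ToyAccumulate(const ToyArrayData& data,
                   std::unordered_set<const ToyBuffer*>* seen,
                   int64_t* total) {
  for (const auto& buf : data.buffers) {
    // insert().second is true only the first time a buffer is encountered,
    // so a buffer referenced multiple times is counted once.
    if (buf && seen->insert(buf.get()).second) {
      *total += buf->size;
    }
  }
  for (const auto& child : data.child_data) {
    if (child) ToyAccumulate(*child, seen, total);
  }
  // Assumption: dictionary data is walked like any other child.
  if (data.dictionary) ToyAccumulate(*data.dictionary, seen, total);
}

// Sum of the full sizes of all distinct buffers reachable from `data`.
int64_t ToyTotalBufferSize(const ToyArrayData& data) {
  std::unordered_set<const ToyBuffer*> seen;
  int64_t total = 0;
  ToyAccumulate(data, &seen, &total);
  return total;
}
```

In this sketch, an array with a non-zero offset over a 1024-byte buffer still
reports 1024 bytes, and two children sharing that buffer also report 1024 in
total rather than 2048.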
 

Comparing the Python code for the existing `nbytes` calculation with the C++ 
`TotalBufferSize`, there seems to be a dictionary data calculation component 
that differs between them?

But I am not 100% sure I am following the logic correctly. 

If these two represent two different concepts, should we open a new PR to 
expose this, or use the same PR? Just curious. 

 



> [Python][R] Expose ReferencedBufferSize to python/R
> ---------------------------------------------------
>
>                 Key: ARROW-15065
>                 URL: https://issues.apache.org/jira/browse/ARROW-15065
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python, R
>            Reporter: Weston Pace
>            Assignee: Vibhatha Lakmal Abeykoon
>            Priority: Major
>              Labels: good-first-issue
>
> This could be a method on arrays, chunked arrays, record batches, and tables. 
>  This method takes array offsets into account.
> This should replace the existing nbytes behavior.  A new method should be 
> created (total_buffer_bytes) that has the old behavior.  We will need clear 
> commenting about the difference between the two of them.  Both can be useful 
> depending on the need.
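
To make the distinction in the description concrete, here is a minimal sketch
for a sliced fixed-width array. `ToySlice`, `ToyReferencedBytes` and
`ToyTotalBytes` are hypothetical names for illustration only. The sketch
assumes a single data buffer and ignores the validity bitmap, children and
dictionaries, so it is not how Arrow computes either number.

```cpp
#include <cstdint>

// Hypothetical model of a slice over one fixed-width data buffer.
struct ToySlice {
  int64_t offset;       // index of the first element in the slice
  int64_t length;       // number of elements in the slice
  int64_t byte_width;   // bytes per element (e.g. 4 for int32)
  int64_t buffer_size;  // size of the entire backing buffer, in bytes
};

// Offset-aware count: only the bytes the slice actually covers.
int64_t ToyReferencedBytes(const ToySlice& s) {
  return s.length * s.byte_width;
}

// Whole-buffer count: the full allocation, regardless of offset and length.
int64_t ToyTotalBytes(const ToySlice& s) {
  return s.buffer_size;
}
```

For a 10-element slice starting at element 100 of a 1000-element int32 array
(a 4000-byte buffer), the first function gives 40 bytes and the second gives
4000 bytes, which is the kind of gap the two methods would need to document.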



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
