[
https://issues.apache.org/jira/browse/ARROW-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477904#comment-17477904
]
Jeroen van Straten edited comment on ARROW-7051 at 1/19/22, 2:39 PM:
---------------------------------------------------------------------
As it turns out, at least some of the functions using {{MakeArrayOfNull()}} are
mutating the resulting array (it should go without saying that this is a good
way to create garbage when different kinds of buffers share the same bit of
memory), as evidenced by the tests failing now that the new implementation just
so happens to actually return {{Buffers}} that are marked as immutable. There
was no documentation in the header about the result of {{MakeArrayOfNull()}}
being immutable, so this is not really surprising. The same is probably true
for {{MakeArrayFromScalar()}}, which calls {{MakeArrayOfNull()}} in some
special cases.
I've now made separate versions of {{MakeArrayOfNull()}} and
{{MakeArrayFromScalar()}} for mutable and immutable use cases, and am slowly
trying to figure out which of the invocations need the array to be mutable and
which don't. I'm also replacing {{MakeArrayOfNull(type, /*length=*/0)}}
invocations with {{MakeEmptyArray(type)}}, which seems more suitable in those
cases. I'm getting rather worried about poking around changing semantics all
over Arrow without fully understanding Arrow first, though...
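
To make the failure mode concrete, here is a minimal standalone sketch. It does not use Arrow: {{ToyBuffer}}, {{ToyInt8Array}}, {{MakeToyArrayOfNull()}} and {{SetValue()}} are hypothetical stand-ins, and the assumption is simply that the validity bitmap and the values buffer alias one zeroed allocation. Writing a value then silently flips validity bits, unless the buffer is marked immutable and refuses the write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a buffer; this is NOT arrow::Buffer.
struct ToyBuffer {
  std::vector<std::uint8_t> bytes;
  bool is_mutable;
  // Hands out a writable pointer only when the buffer is marked mutable.
  std::uint8_t* mutable_data() { return is_mutable ? bytes.data() : nullptr; }
};

// Hypothetical nullable int8 array: a validity bitmap plus one byte per value.
struct ToyInt8Array {
  std::shared_ptr<ToyBuffer> validity;
  std::shared_ptr<ToyBuffer> values;
  std::size_t length;
  bool IsValid(std::size_t i) const {
    return ((validity->bytes[i / 8] >> (i % 8)) & 1) != 0;
  }
};

// Builds an all-null array whose two buffers alias ONE zeroed allocation.
ToyInt8Array MakeToyArrayOfNull(std::size_t length, bool is_mutable) {
  auto zeros = std::make_shared<ToyBuffer>(
      ToyBuffer{std::vector<std::uint8_t>(length, 0), is_mutable});
  return ToyInt8Array{zeros, zeros, length};
}

// Writes a value into slot i without touching the validity bitmap on purpose.
// Returns false when the values buffer is immutable and the write is refused.
bool SetValue(ToyInt8Array& array, std::size_t i, std::uint8_t value) {
  assert(i < array.length);
  std::uint8_t* data = array.values->mutable_data();
  if (data == nullptr) return false;
  data[i] = value;
  return true;
}
```

With a mutable shared buffer, {{SetValue(array, 0, 1)}} succeeds and slot 0 suddenly reports as valid even though nobody touched the bitmap. With the immutable variant the write is rejected and the array stays all-null, which is roughly the kind of breakage the failing tests are surfacing.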
> [C++] Improve MakeArrayOfNull to support creation of multiple arrays
> --------------------------------------------------------------------
>
> Key: ARROW-7051
> URL: https://issues.apache.org/jira/browse/ARROW-7051
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Affects Versions: 0.14.0
> Reporter: Ben Kietzman
> Assignee: Jeroen van Straten
> Priority: Minor
> Labels: beginner, good-first-issue, pull-request-available
> Time Spent: 2h
> Remaining Estimate: 0h
>
> MakeArrayOfNull reuses a single buffer of {{0}} for all buffers in the array
> it creates. It could be extended to reuse that same buffer for all buffers in
> multiple arrays. This optimization will make RecordBatchProjector and
> ConcatenateTablesWithPromotion more memory efficient.
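
The proposed extension can be sketched outside of Arrow as follows. Everything here is hypothetical ({{ZeroBuffer}}, {{ToyNullColumn}}, {{MakeToyNullColumns()}}), and it assumes fixed-width columns with one validity buffer and one values buffer each. The idea is just to size one zeroed allocation for the largest buffer needed and hand the same pointer to every buffer slot of every column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical read-only zero-filled buffer; this is NOT arrow::Buffer.
using ZeroBuffer = std::shared_ptr<const std::vector<std::uint8_t>>;

// Hypothetical all-null column: every buffer slot points at the same zeros.
struct ToyNullColumn {
  std::size_t length;
  std::vector<ZeroBuffer> buffers;  // [0] = validity bitmap, [1] = values
};

// Builds one all-null column per entry in byte_widths, all backed by a single
// allocation that is large enough for the widest column.
std::vector<ToyNullColumn> MakeToyNullColumns(
    const std::vector<std::size_t>& byte_widths, std::size_t length) {
  assert(length > 0 && "sketch assumes non-empty columns");
  std::size_t max_bytes = (length + 7) / 8;  // bytes needed by a validity bitmap
  for (std::size_t width : byte_widths) {
    if (width * length > max_bytes) max_bytes = width * length;
  }
  ZeroBuffer zeros = std::make_shared<std::vector<std::uint8_t>>(
      max_bytes, std::uint8_t{0});
  std::vector<ToyNullColumn> columns;
  for (std::size_t i = 0; i < byte_widths.size(); ++i) {
    columns.push_back(ToyNullColumn{length, {zeros, zeros}});
  }
  return columns;
}
```

For example, {{MakeToyNullColumns({1, 4, 8}, 10)}} yields three columns that all point at one 80-byte buffer instead of allocating six separate ones. The buffer is exposed as {{const}} here because sharing like this is only safe if nobody writes through it.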