[
https://issues.apache.org/jira/browse/ARROW-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17658994#comment-17658994
]
Rok Mihevc commented on ARROW-1964:
-----------------------------------
This issue has been migrated to [issue
#17951|https://github.com/apache/arrow/issues/17951] on GitHub. Please see the
[migration documentation|https://github.com/apache/arrow/issues/14542] for
further details.
> [Python] Expose Builder classes
> -------------------------------
>
> Key: ARROW-1964
> URL: https://issues.apache.org/jira/browse/ARROW-1964
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Uwe Korn
> Assignee: Donal Simmie
> Priority: Major
> Labels: beginner, pull-request-available
> Fix For: 0.10.0
>
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> Having the builder classes available from Python would be very helpful.
> Currently, constructing an Arrow array always requires a Python list or
> NumPy array as an intermediate. Since the builders, in combination with
> jemalloc, are very efficient at building up non-chunked memory, it would be
> nice to use them directly in certain cases.
> The most useful builders are the
> [StringBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L714]
> and
> [DictionaryBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L872]
> as they provide functionality to create columns that are not easily
> constructed using NumPy methods in Python.
> The basic approach would be to wrap the C++ classes in
> https://github.com/apache/arrow/blob/master/python/pyarrow/includes/libarrow.pxd
> so that they can be used from Cython. Afterwards, we should start a new file
> {{python/pyarrow/builder.pxi}} containing classes that take typical Python
> objects like {{str}} and pass them on to the C++ classes. Finally, these
> classes should return Python-accessible {{pyarrow.Array}} instances.
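
To illustrate the kind of interface the description asks for, here is a minimal pure-Python sketch of a string builder. It is not the pyarrow API and not the proposed Cython wrapper: the class name {{SketchStringBuilder}} and its methods ({{append}}, {{append_values}}, {{finish}}) are hypothetical, and {{finish}} returns raw buffers rather than a {{pyarrow.Array}}. It only shows how values could be appended one at a time into a single contiguous data buffer plus offsets and a validity bitmap, following the Arrow variable-size binary layout, without first materializing a Python list or NumPy array of all values.

```python
import struct


class SketchStringBuilder:
    """Hypothetical stand-in for a string builder (not the pyarrow API)."""

    def __init__(self):
        self._data = bytearray()  # contiguous UTF-8 bytes of all values
        self._offsets = [0]       # n + 1 offsets into _data
        self._valid = []          # one flag per slot; False marks a null

    def __len__(self):
        return len(self._valid)

    def append(self, value):
        """Append a str, or None to record a null slot."""
        if value is None:
            self._valid.append(False)
        elif isinstance(value, str):
            self._data += value.encode("utf-8")
            self._valid.append(True)
        else:
            raise TypeError("expected str or None, got %r" % type(value))
        # A null slot repeats the previous offset (zero-length value).
        self._offsets.append(len(self._data))

    def append_values(self, values):
        """Append every item of an iterable of str/None."""
        for value in values:
            self.append(value)

    def finish(self):
        """Return (validity_bitmap, offsets, data) as bytes and reset."""
        # Validity bitmap: bit i (least-significant bit first) is set
        # when slot i holds a value.
        bitmap = bytearray((len(self._valid) + 7) // 8)
        for i, ok in enumerate(self._valid):
            if ok:
                bitmap[i // 8] |= 1 << (i % 8)
        # Offsets are packed as little-endian signed 32-bit integers.
        offsets = struct.pack("<%di" % len(self._offsets), *self._offsets)
        result = (bytes(bitmap), offsets, bytes(self._data))
        self.__init__()  # start over with empty buffers
        return result


builder = SketchStringBuilder()
builder.append("foo")
builder.append(None)
builder.append_values(["ba", "r"])
validity, offsets, data = builder.finish()
```

In the approach described above, the equivalent of {{finish}} would hand the buffers built by the C++ builder to a {{pyarrow.Array}} instead of returning raw bytes.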