[
https://issues.apache.org/jira/browse/ARROW-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated ARROW-1964:
----------------------------------
Labels: beginner pull-request-available (was: beginner)
> [Python] Expose Builder classes
> -------------------------------
>
> Key: ARROW-1964
> URL: https://issues.apache.org/jira/browse/ARROW-1964
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Uwe L. Korn
> Priority: Major
> Labels: beginner, pull-request-available
> Fix For: 1.0.0
>
>
> Having the builder classes available from Python would be very helpful.
> Currently, constructing an Arrow array always requires a Python list or
> NumPy array as an intermediate. As the builders in combination with jemalloc
> are very efficient at building up non-chunked memory, it would be nice to
> use them directly in certain cases.
> The most useful builders are the
> [StringBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L714]
> and
> [DictionaryBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L872]
> as they provide functionality to create columns that are not easily
> constructed using NumPy methods in Python.
> The basic approach would be to wrap the C++ classes in
> https://github.com/apache/arrow/blob/master/python/pyarrow/includes/libarrow.pxd
> so that they can be used from Cython. Afterwards, we should start a new file
> {{python/pyarrow/builder.pxi}} containing classes that take typical Python
> objects like {{str}} and pass them on to the C++ classes. In the end, these
> classes should also return (Python-accessible) {{pyarrow.Array}} instances.
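
As a rough illustration of the interface described above, here is a pure-Python sketch of an append-only string builder. It is not the pyarrow API and not the proposed Cython wrapper: the class name StringBuilderSketch and its methods are hypothetical, and finish() returns raw buffers rather than a pyarrow.Array. The only thing it models is the general shape of such a builder (append str values one at a time, then finish) and the variable-length string layout of n + 1 int32 offsets into one contiguous UTF-8 data buffer.

```python
import struct


class StringBuilderSketch:
    """Hypothetical pure-Python model of an append-only string builder."""

    def __init__(self):
        self._reset()

    def _reset(self):
        self._offsets = [0]       # n + 1 offsets into the data buffer
        self._data = bytearray()  # concatenated UTF-8 bytes of all values
        self._valid = []          # one validity flag per appended slot

    def append(self, value):
        """Append a Python str, or None for a null slot."""
        if value is None:
            self._valid.append(False)
        elif isinstance(value, str):
            self._data += value.encode("utf-8")
            self._valid.append(True)
        else:
            raise TypeError("expected str or None, got %r" % type(value))
        # A null slot repeats the previous offset, so its length is zero.
        self._offsets.append(len(self._data))

    def finish(self):
        """Return (length, null_count, offsets, data) and reset the builder."""
        length = len(self._valid)
        # Pack the offsets as little-endian int32 values.
        offsets = struct.pack("<%di" % (length + 1), *self._offsets)
        result = (length, self._valid.count(False), offsets, bytes(self._data))
        self._reset()
        return result


# Usage: no intermediate list or NumPy array of the full column is needed;
# values can be appended as they are produced.
builder = StringBuilderSketch()
for item in ["foo", None, "h\u00e9llo"]:
    builder.append(item)
length, null_count, offsets, data = builder.finish()
```

In a real wrapper the accumulation would happen in the C++ builder and finish() would hand back a pyarrow.Array; the Python class would only validate and forward each value.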