[ https://issues.apache.org/jira/browse/ARROW-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Hagerman reassigned ARROW-1964:
------------------------------------

    Assignee:     (was: Alex Hagerman)

> [Python] Expose Builder classes
> -------------------------------
>
>                 Key: ARROW-1964
>                 URL: https://issues.apache.org/jira/browse/ARROW-1964
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Uwe L. Korn
>            Priority: Major
>              Labels: beginner, pull-request-available
>             Fix For: 1.0.0
>
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Having the builder classes available from Python would be very helpful. 
> Currently, constructing an Arrow array always requires a Python list or 
> NumPy array as an intermediate. As the builders, in combination with 
> jemalloc, are very efficient at building up non-chunked memory, it would be 
> nice to use them directly in certain cases.
> The most useful builders are the 
> [StringBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L714]
>  and 
> [DictionaryBuilder|https://github.com/apache/arrow/blob/5030e235047bdffabf6a900dd39b64eeeb96bdc8/cpp/src/arrow/builder.h#L872]
>  as they provide functionality to create columns that are not easily 
> constructed using NumPy methods in Python.
> The basic approach would be to wrap the C++ classes in 
> https://github.com/apache/arrow/blob/master/python/pyarrow/includes/libarrow.pxd
>  so that they can be used from Cython. Afterwards, we should start a new file 
> {{python/pyarrow/builder.pxi}} containing classes that take typical Python 
> objects like {{str}} and pass them on to the C++ classes. When finished, 
> these classes should return (Python-accessible) {{pyarrow.Array}} instances.
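
To make the proposal concrete, here is a minimal pure-Python sketch of what a string builder accumulates, assuming Arrow's variable-length string layout: a validity bitmap (one bit per slot), an offsets buffer, and a concatenated UTF-8 data buffer. The class name SimpleStringBuilder and its methods are hypothetical illustrations, not the pyarrow or C++ API, and the sketch ignores the alignment and padding that real Arrow buffers use.

```python
from typing import List, Optional, Tuple


class SimpleStringBuilder:
    """Hypothetical pure-Python stand-in for arrow::StringBuilder.

    Values are written straight into the three buffers of Arrow's
    variable-length string layout instead of an intermediate list.
    """

    def __init__(self) -> None:
        self._reset()

    def _reset(self) -> None:
        self._validity = bytearray()     # one bit per slot, least significant bit first
        self._offsets: List[int] = [0]   # slot i spans data[offsets[i]:offsets[i + 1]]
        self._data = bytearray()         # concatenated UTF-8 bytes of all values
        self._length = 0
        self.null_count = 0

    def _set_valid(self, valid: bool) -> None:
        byte_index, bit = divmod(self._length, 8)
        if byte_index == len(self._validity):
            self._validity.append(0)     # start a new bitmap byte, all bits null
        if valid:
            self._validity[byte_index] |= 1 << bit
        self._length += 1

    def append(self, value: Optional[str]) -> None:
        if value is None:
            self.null_count += 1
            self._set_valid(False)
        else:
            self._data.extend(value.encode("utf-8"))
            self._set_valid(True)
        # A null slot repeats the previous offset, i.e. it has zero length.
        self._offsets.append(len(self._data))

    def __len__(self) -> int:
        return self._length

    def finish(self) -> Tuple[bytes, List[int], bytes]:
        """Return (validity bitmap, offsets, data) and reset the builder."""
        result = (bytes(self._validity), list(self._offsets), bytes(self._data))
        self._reset()
        return result
```

Appending writes directly into the final buffers, so no intermediate Python list of all values has to be kept around; that is the efficiency argument the issue makes for exposing the C++ builders.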

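Similarly, the dictionary-encoded columns a DictionaryBuilder produces can be sketched with a memo table, assuming plain dictionary encoding: each distinct value is stored once in a dictionary, and the column itself becomes a list of integer indices into it, with None marking a null slot. SimpleDictionaryBuilder is a hypothetical name used only for illustration.

```python
from typing import Dict, Hashable, List, Optional, Tuple


class SimpleDictionaryBuilder:
    """Hypothetical pure-Python stand-in for arrow::DictionaryBuilder."""

    def __init__(self) -> None:
        self._memo: Dict[Hashable, int] = {}     # value -> position in dictionary
        self._dictionary: List[Hashable] = []    # distinct values in first-seen order
        self._indices: List[Optional[int]] = []  # one entry per slot, None for null

    def append(self, value: Optional[Hashable]) -> None:
        if value is None:
            self._indices.append(None)
            return
        index = self._memo.get(value)
        if index is None:
            # First occurrence of this value: assign the next dictionary slot.
            index = len(self._dictionary)
            self._memo[value] = index
            self._dictionary.append(value)
        self._indices.append(index)

    def finish(self) -> Tuple[List[Optional[int]], List[Hashable]]:
        """Return (indices, dictionary)."""
        return list(self._indices), list(self._dictionary)
```

The memo dict keeps each append at average constant time. This per-value, stateful bookkeeping is the kind of work that is awkward to express with vectorized NumPy operations, which is why the issue singles out this builder.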


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
