[ https://issues.apache.org/jira/browse/ARROW-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758028#comment-16758028 ]

Uwe L. Korn commented on ARROW-4437:
------------------------------------

[~TennyZhuang] Yes, it would be nice to have the builder APIs exposed in 
Python. This is a really good beginner task. Would you like to have some 
guidance on how to expose them?
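For context, the buffers the reporter mentions in the issue below (value data, offsets, validity bitmap) are exactly what a string array builder manages internally. The following is a minimal pure-Python sketch of that bookkeeping. It assumes the Arrow columnar layout for strings, with little-endian int32 offsets and an LSB-first validity bitmap. `SketchStringBuilder` and its methods are hypothetical names for illustration; they are not pyarrow's StringBuilder.

```python
class SketchStringBuilder:
    """Hypothetical builder accumulating Arrow-layout buffers for a string array."""

    def __init__(self):
        self.data = bytearray()                          # concatenated UTF-8 bytes
        self.offsets = bytearray((0).to_bytes(4, "little"))  # int32 offsets, first is 0
        self.validity = bytearray()                      # LSB-first bitmap, 1 = valid
        self.length = 0
        self.null_count = 0

    def _mark(self, valid):
        # Slot i lives in byte i // 8, bit i % 8 (least significant bit first).
        byte_index, bit = divmod(self.length, 8)
        if byte_index == len(self.validity):
            self.validity.append(0)
        if valid:
            self.validity[byte_index] |= 1 << bit

    def append(self, value):
        if value is None:
            self._mark(False)
            self.null_count += 1
        else:
            self._mark(True)
            self.data += value.encode("utf-8")
        # Every slot, null or not, records its end offset into the data buffer.
        self.offsets += len(self.data).to_bytes(4, "little", signed=True)
        self.length += 1

    def finish(self):
        # Returns (validity bitmap, offsets, data) as immutable bytes.
        return bytes(self.validity), bytes(self.offsets), bytes(self.data)


b = SketchStringBuilder()
for v in ["ab", None, "cde"]:
    b.append(v)
validity, offsets, data = b.finish()
```

In recent pyarrow versions, buffers in this layout should be accepted by `pa.Array.from_buffers(pa.string(), length, [validity, offsets, data], null_count)` after wrapping each one with `pa.py_buffer`, without copying the values. The sketch skips Arrow's recommended 64-byte padding and does not guard against int32 offset overflow.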

> [Python] Add builder API
> ------------------------
>
>                 Key: ARROW-4437
>                 URL: https://issues.apache.org/jira/browse/ARROW-4437
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>         Environment: Python 3.7.0 pyarrow-0.12.0
>            Reporter: Zhuang Tianyi
>            Priority: Minor
>
> There is no [Array 
> Builder|https://arrow.apache.org/docs/cpp/api/builder.html#_CPPv3N5arrow12ArrayBuilderE]
>  API in the Python bindings. When I generate data from a stream, I have to build 
> a Python list (high overhead) or a pandas object, then finalize it by calling 
> pa.array, which copies the data. It seems we could instead build an Array 
> directly from two or three pa.ResizableBuffer objects in O(1) time.
> It's possible to maintain these buffers (value buffer, null bitmap, offset 
> buffer) manually with the currently exposed API, but that is not safe enough.
>  
> I found an undocumented StringBuilder API in 
> [python/pyarrow/builder.pxi|https://github.com/apache/arrow/blob/master/python/pyarrow/builder.pxi],
>  corresponding to 
> [https://arrow.apache.org/docs/cpp/api/builder.html#classarrow_1_1_string_builder].
>  Will other ArrayBuilder APIs be added to the Python bindings?
>  
> ----
> Something more:
> a BatchBuilder API would be even better, if possible.
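The BatchBuilder idea could look roughly like the sketch below: a row-oriented front end that fans each incoming row out to per-column buffers, so no intermediate Python list of rows is kept. This is a hypothetical design that assumes fixed-width numeric columns only, using typecodes from Python's `array` module (e.g. `'q'` for int64, `'d'` for float64). `SketchBatchBuilder`, `SketchColumn` and the shape of `finish()`'s result are invented for illustration and are not part of pyarrow.

```python
from array import array


class SketchColumn:
    """Fixed-width column: contiguous value buffer plus LSB-first validity bitmap."""

    def __init__(self, typecode):
        self.values = array(typecode)
        self.validity = bytearray()
        self.null_count = 0

    def append(self, value):
        i = len(self.values)
        # Append first so a wrongly typed value raises before the bitmap changes.
        # Nulls get a 0 placeholder slot that the bitmap masks out.
        self.values.append(0 if value is None else value)
        byte_index, bit = divmod(i, 8)
        if byte_index == len(self.validity):
            self.validity.append(0)
        if value is None:
            self.null_count += 1
        else:
            self.validity[byte_index] |= 1 << bit


class SketchBatchBuilder:
    """Hypothetical row-wise builder producing one column per schema field."""

    def __init__(self, schema):
        # schema: list of (field name, array typecode) pairs.
        self.names = [name for name, _ in schema]
        self.columns = {name: SketchColumn(code) for name, code in schema}
        self.num_rows = 0

    def append_row(self, row):
        if len(row) != len(self.names):
            raise ValueError("row length does not match schema")
        # Note: no rollback if a value fails partway through a row in this sketch.
        for name, value in zip(self.names, row):
            self.columns[name].append(value)
        self.num_rows += 1

    def finish(self):
        # Map each field name to (validity bitmap, raw value bytes, null count).
        return {
            name: (bytes(col.validity), col.values.tobytes(), col.null_count)
            for name, col in self.columns.items()
        }


builder = SketchBatchBuilder([("id", "q"), ("score", "d")])
builder.append_row((1, 0.5))
builder.append_row((2, None))
builder.append_row((None, 1.5))
columns = builder.finish()
```

With pyarrow available, each `(validity, values, null_count)` triple could presumably be wrapped with `pa.py_buffer`, passed to `pa.Array.from_buffers` with the matching type (`pa.int64()`, `pa.float64()`), and the resulting arrays combined with `pa.RecordBatch.from_arrays`; that step is not shown here.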



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
