aosingh opened a new issue, #478:
URL: https://github.com/apache/arrow-nanoarrow/issues/478

   Thanks to the Arrow community for developing this lightweight wrapper. 
   
   I am planning to add support for Apache Arrow in one of the projects I am working on. The aim is to leverage nanoarrow to support exporting tabular data in Arrow format.
   
   Users will have access to a function `to_arrow()`:
   
   ```python
   import nanoarrow as na
   
   def gen_name():
       for i in range(100):
           yield "John Doe"
   
   def gen_age():
       for i in range(100):
           yield 34
   
   
   def to_arrow():
       results = [
           na.c_array(gen_name(), na.string()),
           na.c_array(gen_age(), na.int64()),
       ]
       return results
   
   ```
   
   Users of the library can optionally install `pyarrow` and `pandas` to work with the exported data. And the export works fine!
   
   ```python
   import pyarrow as pa
   parray = pa.Table.from_arrays(to_arrow(), names=["name", "age"])
   print(parray.to_pandas())
   
   ```
   
   ```text
           name  age
   0   John Doe   34
   1   John Doe   34
   2   John Doe   34
   3   John Doe   34
   4   John Doe   34
   ..       ...  ...
   95  John Doe   34
   96  John Doe   34
   97  John Doe   34
   98  John Doe   34
   99  John Doe   34
   
   [100 rows x 2 columns]
   ```
   
   Adding a third field, `timestamp`, to the above list raises an error:
   
   ```python
   import datetime

   def gen_timestamp():
       for i in range(100):
           yield datetime.datetime.now().timestamp()
   
   result = [na.c_array(gen_name(), na.string()),
             na.c_array(gen_age(), na.int64()),
             na.c_array(gen_timestamp(), na.timestamp("s"))]
   
   parray = pa.Table.from_arrays(result, names=["name", "age", "timestamp"])
   
   print(parray.to_pandas())
   
   ``` 
   
   Error:
   ```
   Traceback (most recent call last):
     File "/Users/as/nanoarrow/simple.py", line 28, in <module>
       na.c_array(gen_timestamp(), na.timestamp("s"))]
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/Users/as/nanoarrow/arrow-nanoarrow/python/src/nanoarrow/c_array.py", line 131, in c_array
       raise ValueError(
   ValueError: An error occurred whilst converting generator to nanoarrow.c_array:
    Can't build array of type timestamp from iterable
   ```
   
   I understand the source of the error is the [mapping](https://github.com/apache/arrow-nanoarrow/blob/main/python/src/nanoarrow/c_array.py#L531C1-L552) maintained for each datatype.
   
   How can I add support for incrementally building arrays of more datatypes?
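
For context, a rough sketch of the workaround I am considering in the meantime. Arrow stores a `timestamp("s")` value as a 64-bit integer count of seconds since the UNIX epoch, while `datetime.timestamp()` returns a float, so the generator below yields whole-second integers instead. The names `gen_epoch_seconds` and `epoch_seconds_to_datetime` are hypothetical helpers for illustration, not part of nanoarrow.

```python
import datetime


def gen_epoch_seconds(n=100):
    """Yield the current time n times as whole seconds since the UNIX epoch."""
    for _ in range(n):
        now = datetime.datetime.now(datetime.timezone.utc)
        # timestamp() returns a float; int() drops the fractional seconds so
        # the value matches the integer storage of a second-resolution timestamp.
        yield int(now.timestamp())


def epoch_seconds_to_datetime(seconds):
    """Convert epoch seconds back to a timezone-aware UTC datetime."""
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)


values = list(gen_epoch_seconds(3))
```

These integers could presumably be built with `na.c_array(gen_epoch_seconds(), na.int64())`, the same path that works for the `age` column, and then reinterpreted as timestamps by the consumer. That last step is an assumption on my side and I have not verified it.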
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.