[ 
https://issues.apache.org/jira/browse/ARROW-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rok Mihevc updated ARROW-5286:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/21754

> [Python] support Structs in Table.from_pandas given a known schema
> ------------------------------------------------------------------
>
>                 Key: ARROW-5286
>                 URL: https://issues.apache.org/jira/browse/ARROW-5286
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Assignee: Joris Van den Bossche
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> ARROW-2073 implemented creating a StructArray from an array of tuples (in 
> addition to dicts). 
> This works in {{pyarrow.array}} when the proper type is specified:
> {code}
> In [2]: df = pd.DataFrame({'tuples': [(1, 2), (3, 4)]})
>
> In [3]: struct_type = pa.struct([('a', pa.int64()), ('b', pa.int64())])
>
> In [4]: pa.array(df['tuples'], type=struct_type)
>
> Out[4]: 
> <pyarrow.lib.StructArray object at 0x7f1b02ff6818>
> -- is_valid: all not null
> -- child 0 type: int64
>   [
>     1,
>     3
>   ]
> -- child 1 type: int64
>   [
>     2,
>     4
>   ]
> {code}
> But it does not yet work when converting a DataFrame to a Table while 
> specifying the type in a schema:
> {code}
> In [5]: pa.Table.from_pandas(df, schema=pa.schema([('tuples', struct_type)]))
>
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in get_logical_type(arrow_type)
>      68     try:
> ---> 69         return logical_type_map[arrow_type.id]
>      70     except KeyError:
> KeyError: 24
> During handling of the above exception, another exception occurred:
> NotImplementedError                       Traceback (most recent call last)
> <ipython-input-5-c18748f9b954> in <module>
> ----> 1 pa.Table.from_pandas(df, schema=pa.schema([('tuples', struct_type)]))
> ~/scipy/repos/arrow/python/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
>     483     metadata = construct_metadata(df, column_names, index_columns,
>     484                                   index_descriptors, preserve_index,
> --> 485                                   types)
>     486     return all_names, arrays, metadata
>     487 
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in construct_metadata(df, column_names, index_levels, index_descriptors, preserve_index, types)
>     207         metadata = get_column_metadata(df[col_name], name=sanitized_name,
>     208                                        arrow_type=arrow_type,
> --> 209                                        field_name=sanitized_name)
>     210         column_metadata.append(metadata)
>     211 
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in get_column_metadata(column, name, arrow_type, field_name)
>     149     dict
>     150     """
> --> 151     logical_type = get_logical_type(arrow_type)
>     152 
>     153     string_dtype, extra_metadata = get_extension_dtype_info(column)
> ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in get_logical_type(arrow_type)
>      77         elif isinstance(arrow_type, pa.lib.Decimal128Type):
>      78             return 'decimal'
> ---> 79         raise NotImplementedError(str(arrow_type))
>      80 
>      81 
> NotImplementedError: struct<a: int64, b: int64>
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
