Will Jones created ARROW-15247:
----------------------------------

             Summary: [Python] Convert array of Pandas dataframe to struct column
                 Key: ARROW-15247
                 URL: https://issues.apache.org/jira/browse/ARROW-15247
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
    Affects Versions: 6.0.1
            Reporter: Will Jones

Currently, converting a Pandas dataframe that has a column of dataframes to Arrow fails with "Could not convert <data> with type DataFrame: did not recognize Python value type when inferring an Arrow data type". We should be able to convert this to a List<Struct> array, similar to how [the R bindings do it|https://arrow.apache.org/docs/r/articles/arrow.html#r-to-arrow]. This could even be bi-directional, where structs could be parsed back into a column of dataframes in {{to_pandas()}}.

Here is an example that currently fails:

{code:python}
import pandas as pd
import pyarrow as pa

df1 = pd.DataFrame({
    'x': [1, 2, 3],
    'y': ['a', 'b', 'c']
})

# Each cell of the 'df' column holds a whole DataFrame
df = pd.DataFrame({
    'df': [df1]*10
})

pa.Table.from_pandas(df)
{code}

Here's what the other direction might look like for the same data:

{code:python}
import pyarrow as pa

sub_tab = [{'x': 1, 'y': 'a'}, {'x': 2, 'y': 'b'}, {'x': 3, 'y': 'c'}]
tab = pa.table({
    'df': pa.array([sub_tab]*10)
})

print(tab.schema)
# df: list<item: struct<x: int64, y: string>>
#   child 0, item: struct<x: int64, y: string>
#       child 0, x: int64
#       child 1, y: string

tab.to_pandas()
{code}