[ https://issues.apache.org/jira/browse/ARROW-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney updated ARROW-2814:
--------------------------------
    Summary: [Python] Type inference bug in Table.from_pandas  (was: Error inferring Arrow type for Python object array. Got Python object of type dict but can only handle these types: string, bool, float, int, date, time, decimal, list, array)

> [Python] Type inference bug in Table.from_pandas
> ------------------------------------------------
>
>                 Key: ARROW-2814
>                 URL: https://issues.apache.org/jira/browse/ARROW-2814
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.9.0
>            Reporter: rob
>            Priority: Blocker
>             Fix For: 0.10.0
>
>         Attachments: part-00000-8f03690f-736d-43a9-9287-6db9e228d59c.c000.gz.parquet
>
>
> There is a problem when trying to run pa.Table.from_pandas() on a parquet
> file that has a json string in it. I have attached the file to this ticket
> that is the source of the problem and the code below will show the error.
>
> h2. Reproducible code
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> pd.options.display.max_colwidth = 10000
> pq_table = pq.read_table("part-00000-8f03690f-736d-43a9-9287-6db9e228d59c.c000.gz.parquet")
> panda_table = pq_table.to_pandas()
> orginal_count = len(panda_table)
>
> h2. Fails
> table_output = pa.Table.from_pandas(panda_table)
> del panda_table['payload']
>
> h2. Works
> table_output = pa.Table.from_pandas(panda_table)
>
> h2. Payload is the faulty column. Print out data
> pq_table = pq.read_table("part-00000-8f03690f-736d-43a9-9287-6db9e228d59c.c000.gz.parquet")
> panda_table = pq_table.to_pandas()
> orginal_count = len(panda_table)
> table_output = pa.Table.from_pandas(panda_table[['payload']])
> panda_table[['payload']]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)