[ https://issues.apache.org/jira/browse/ARROW-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney updated ARROW-5169:
--------------------------------
    Summary: [Python] non-nullable fields are converted to nullable in {{Table.from_pandas}}  (was: non-nullable fields are converted to nullable in {{Table.from_pandas}})

> [Python] non-nullable fields are converted to nullable in {{Table.from_pandas}}
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-5169
>                 URL: https://issues.apache.org/jira/browse/ARROW-5169
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.13.0
>            Reporter: giacomo
>            Priority: Major
>             Fix For: 0.14.0
>
>
> In version 0.13.0, the {{Table.from_pandas}} function modifies the input schema by making all non-nullable fields nullable.
> This can cause problems, for example with this code:
> {code}
> import io
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> df = pd.DataFrame(list(range(200)), columns=['numcol'])
> schema = pa.schema([
>     pa.field('numcol', pa.int64(), nullable=False),
> ])
> writer = pq.ParquetWriter(io.BytesIO(), schema, version='2.0')
> # In 0.13.0 the returned table's 'numcol' field is nullable again.
> table = pa.Table.from_pandas(df, schema=schema)
> writer.write_table(table)
> {code}
> This fails because the writer schema and the table schema differ.
> I believe the direct cause could be [https://github.com/apache/arrow/blob/master/python/pyarrow/table.pxi#L622], where nullable is set to True by default, resulting in the table schema being modified.
>
> Thanks for your valuable work on this library.
> Giacomo



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
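
The suspected cause described in the report, a field constructor whose nullable argument defaults to True, can be illustrated without pyarrow. The sketch below is a toy model and not the pyarrow implementation: `Field`, `rebuild_lossy`, and `rebuild_preserving` are hypothetical names invented for this illustration. It shows how rebuilding a schema from names and types alone silently resets a non-nullable flag, so that a later equality check against the requested schema fails.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field; nullable defaults to True."""
    name: str
    type: str
    nullable: bool = True


def rebuild_lossy(schema: List[Field]) -> List[Field]:
    # Copies only name and type, so every field falls back to nullable=True.
    return [Field(f.name, f.type) for f in schema]


def rebuild_preserving(schema: List[Field]) -> List[Field]:
    # Forwards the nullable flag explicitly, so the schema round-trips.
    return [Field(f.name, f.type, nullable=f.nullable) for f in schema]


requested = [Field("numcol", "int64", nullable=False)]
lossy = rebuild_lossy(requested)
preserved = rebuild_preserving(requested)

print(lossy == requested)      # False: the flag was reset to nullable=True
print(preserved == requested)  # True: the flag survived
```

If the real conversion path behaves like `rebuild_lossy`, forwarding the flag from the user-supplied schema, as `rebuild_preserving` does, would be one plausible shape for a fix; whether that matches the actual change in 0.14.0 is not established here.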