[ https://issues.apache.org/jira/browse/ARROW-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17662192#comment-17662192 ]
Rok Mihevc commented on ARROW-5169:
-----------------------------------

This issue has been migrated to [issue #21648|https://github.com/apache/arrow/issues/21648] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [Python] non-nullable fields are converted to nullable in {{Table.from_pandas}}
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-5169
>                 URL: https://issues.apache.org/jira/browse/ARROW-5169
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.13.0
>            Reporter: giacomo
>            Assignee: Joris Van den Bossche
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In version 0.13.0, the {{Table.from_pandas}} function modifies the input
> schema by making all non-nullable fields nullable.
> This can cause problems, for example with this code:
> {code}
> import io
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> df = pd.DataFrame(list(range(200)), columns=['numcol'])
> schema = pa.schema([
>     pa.field('numcol', pa.int64(), nullable=False),
> ])
> writer = pq.ParquetWriter(io.BytesIO(), schema, version='2.0')
> table = pa.Table.from_pandas(df, schema=schema)
> # Fails in 0.13.0: the table's 'numcol' field has become nullable.
> writer.write_table(table)
> {code}
> This fails because the writer schema and the table schema differ.
> I believe the direct cause could be
> [https://github.com/apache/arrow/blob/master/python/pyarrow/table.pxi#L622]
> where nullable is set to True by default, resulting in the table schema being
> modified.
>
> Thanks for your valuable work on this library.
> Giacomo

--
This message was sent by Atlassian Jira
(v8.20.10#820010)