Wes McKinney updated ARROW-2020:
--------------------------------
    Summary: [Python] Parquet segfaults if coercing ns timestamps and writing 96-bit timestamps  (was: pyarrow: Parquet segfaults if coercing ns timestamps and writing 96-bit timestamps)

> [Python] Parquet segfaults if coercing ns timestamps and writing 96-bit timestamps
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-2020
>                 URL: https://issues.apache.org/jira/browse/ARROW-2020
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>         Environment: OS: Mac OS X 10.13.2
>                      Python: 3.6.4
>                      PyArrow: 0.8.0
>            Reporter: Diego Argueta
>            Priority: Major
>             Fix For: 0.9.0
>
>         Attachments: crash-report.txt
>
>
> If you try to write a PyArrow table containing nanosecond-resolution timestamps to Parquet using `coerce_timestamps` and `use_deprecated_int96_timestamps=True`, the Arrow library will segfault. The crash doesn't happen if you don't coerce the timestamp resolution, or if you don't use 96-bit timestamps.
>
> *To Reproduce:*
>
> {code:python}
> import datetime
>
> import pyarrow
> from pyarrow import parquet
>
> schema = pyarrow.schema([
>     pyarrow.field('last_updated', pyarrow.timestamp('ns')),
> ])
> data = [
>     pyarrow.array([datetime.datetime.now()], pyarrow.timestamp('ns')),
> ]
> table = pyarrow.Table.from_arrays(data, ['last_updated'])
>
> with open('test_file.parquet', 'wb') as fdesc:
>     parquet.write_table(table, fdesc,
>                         coerce_timestamps='us',  # 'ms' works too
>                         use_deprecated_int96_timestamps=True)
> {code}
>
> See the attached file for the crash report.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)