[ https://issues.apache.org/jira/browse/ARROW-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-2020:
--------------------------------
    Summary: [Python] Parquet segfaults if coercing ns timestamps and writing 
96-bit timestamps  (was: pyarrow: Parquet segfaults if coercing ns timestamps 
and writing 96-bit timestamps)

> [Python] Parquet segfaults if coercing ns timestamps and writing 96-bit 
> timestamps
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-2020
>                 URL: https://issues.apache.org/jira/browse/ARROW-2020
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>         Environment: OS: Mac OS X 10.13.2
> Python: 3.6.4
> PyArrow: 0.8.0
>            Reporter: Diego Argueta
>            Priority: Major
>             Fix For: 0.9.0
>
>         Attachments: crash-report.txt
>
>
> If you try to write a PyArrow table containing nanosecond-resolution 
> timestamps to Parquet using `coerce_timestamps` and 
> `use_deprecated_int96_timestamps=True`, the Arrow library will segfault.
> The crash doesn't happen if you don't coerce the timestamp resolution or if 
> you don't use 96-bit timestamps.
>  
>  
> *To Reproduce:*
>  
> {code:python}
>  
> import datetime
> import pyarrow
> from pyarrow import parquet
> schema = pyarrow.schema([
>     pyarrow.field('last_updated', pyarrow.timestamp('ns')),
> ])
> data = [
>     pyarrow.array([datetime.datetime.now()], pyarrow.timestamp('ns')),
> ]
> table = pyarrow.Table.from_arrays(data, ['last_updated'])
> with open('test_file.parquet', 'wb') as fdesc:
>     parquet.write_table(table, fdesc,
>                         coerce_timestamps='us',  # 'ms' also triggers the crash
>                         use_deprecated_int96_timestamps=True)
> {code}
>  
> See the attached crash-report.txt for the full crash report.
>  
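
For context on the two options involved: `coerce_timestamps` asks for timestamps to be stored at a coarser unit ('ms' or 'us'), while `use_deprecated_int96_timestamps=True` selects the legacy 12-byte INT96 representation used by Impala and Hive, which is 8 bytes of nanoseconds within the day followed by a 4-byte Julian day number. The stdlib-only sketch below illustrates that arithmetic. It is not pyarrow's implementation and does not reproduce the crash. The helpers `coerce_ns`, `encode_int96`, and `decode_int96` are hypothetical, and the sketch assumes coercion simply truncates (pyarrow may instead reject lossy conversions).

```python
import struct

# Julian Day Number of the Unix epoch (1970-01-01).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
NANOS_PER_DAY = 86_400 * 1_000_000_000


def coerce_ns(ts_ns: int, unit: str) -> int:
    """Hypothetical helper: truncate a nanosecond epoch timestamp to 'ms'
    or 'us' resolution, returning the result still expressed in ns.
    Assumption: truncation, not an error, on precision loss."""
    factor = {"ms": 1_000_000, "us": 1_000}[unit]
    return (ts_ns // factor) * factor


def encode_int96(ts_ns: int) -> bytes:
    """Pack a nanosecond epoch timestamp into the 12-byte INT96 layout:
    little-endian int64 nanoseconds within the day, then little-endian
    int32 Julian day number."""
    days, nanos_of_day = divmod(ts_ns, NANOS_PER_DAY)
    return struct.pack("<qi", nanos_of_day, days + JULIAN_DAY_OF_UNIX_EPOCH)


def decode_int96(raw: bytes) -> int:
    """Inverse of encode_int96: recover nanoseconds since the Unix epoch."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


# Usage: coerce a ns timestamp to 'us', then store it as INT96.
ts_ns = 1_516_000_000_123_456_789
coerced = coerce_ns(ts_ns, "us")
raw = encode_int96(coerced)
print(coerced, raw.hex(), decode_int96(raw))
```

Because INT96 is nanosecond-based, a value coerced to 'us' or 'ms' ends up with trailing zeros in the nanosecond field when written this way; the round trip in the sketch makes that visible.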



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)