[jira] [Commented] (ARROW-2082) [Python] SegFault in pyarrow.parquet.write_table with specific options

ASF GitHub Bot (JIRA) Mon, 16 Apr 2018 11:38:38 -0700

    [ 
https://issues.apache.org/jira/browse/ARROW-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439852#comment-16439852
 ]


ASF GitHub Bot commented on ARROW-2082:
---------------------------------------

joshuastorck opened a new pull request #456: ARROW-2082: Prevent segfault that 
was occurring when writing a nanosecond timestamp with arrow writer properties 
set to coerce timestamps and support deprecated int96 timestamps.
URL: https://github.com/apache/parquet-cpp/pull/456
 
 
   The bug occurred because the physical type was int64 but the 
WriteTimestamps function took a path that assumed the physical type was 
int96. This corrupted memory because the function wrote past the end of the 
array. The fix checks that coerce timestamps is disabled before writing 
int96. 
   
   A unit test was added for the regression.
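
   The mismatch described above can be illustrated with a small pure-Python 
model. This is only a sketch under stated assumptions, not the parquet-cpp 
code: every function name below is hypothetical, and the rule that coercion 
yields an int64 column is inferred from the description in this message. The 
value widths (8 bytes for INT64, 12 bytes for INT96) are those of the Parquet 
physical types.

```python
from typing import Optional

INT64_WIDTH = 8   # bytes per value for Parquet's INT64 physical type
INT96_WIDTH = 12  # bytes per value for Parquet's deprecated INT96 physical type


def physical_type(use_int96: bool, coerce_unit: Optional[str]) -> str:
    """Hypothetical model: a coerced timestamp column is stored as INT64."""
    if coerce_unit is not None:
        return "INT64"
    return "INT96" if use_int96 else "INT64"


def write_path_before_fix(use_int96: bool, coerce_unit: Optional[str]) -> str:
    # Hypothetical pre-fix dispatch: only the int96 flag is consulted,
    # so the coercion setting is ignored.
    return "INT96" if use_int96 else "INT64"


def write_path_after_fix(use_int96: bool, coerce_unit: Optional[str]) -> str:
    # Hypothetical post-fix dispatch: the int96 path is taken only when
    # timestamp coercion is disabled.
    return "INT96" if use_int96 and coerce_unit is None else "INT64"


def overrun_bytes(num_values: int, buffer_type: str, path: str) -> int:
    """Bytes written past the end of a buffer sized for buffer_type."""
    width = {"INT64": INT64_WIDTH, "INT96": INT96_WIDTH}
    return max(0, num_values * (width[path] - width[buffer_type]))


# Options from the report: int96 support enabled and coerce_timestamps='ms'.
opts = (True, "ms")
print(overrun_bytes(1000, physical_type(*opts), write_path_before_fix(*opts)))  # 4000
print(overrun_bytes(1000, physical_type(*opts), write_path_after_fix(*opts)))   # 0
```

   In this model, 1000 values written through the int96 path into a buffer 
sized for int64 overrun it by 4000 bytes, while the guarded dispatch writes 
nothing out of bounds.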

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] SegFault in pyarrow.parquet.write_table with specific options
> ----------------------------------------------------------------------
>
>                 Key: ARROW-2082
>                 URL: https://issues.apache.org/jira/browse/ARROW-2082
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>         Environment: tested on MacOS High Sierra with python 3.6 and Ubuntu 
> Xenial (Python 3.5)
>            Reporter: Clément Bouscasse
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.10.0
>
>
> I originally filed an issue in the pandas project, but we've tracked it down 
> to arrow itself, when called via pandas in specific circumstances:
> [https://github.com/pandas-dev/pandas/issues/19493]
> Basically, using
> {code:java}
>  df.to_parquet('filename.parquet', flavor='spark'){code}
> gives a segfault if `df` contains a datetime column.
> Under the covers, pandas translates this to the following call:
> {code:java}
> pq.write_table(table, 'output.parquet', flavor='spark', compression='snappy', 
> coerce_timestamps='ms')
> {code}
> which gives me an instant crash.
> There is a repro on the github ticket.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
