[ https://issues.apache.org/jira/browse/ARROW-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927800#comment-16927800 ]
Joris Van den Bossche commented on ARROW-6520:
----------------------------------------------

[~wesmckinn] ah, I started looking at it as well. Sorry, I should have noted that here (I only mentioned it on the standup). I went away for dinner, so I haven't written anything worthy of actual code yet, but I was planning to:

- Accept `pyarrow.Array` in `pyarrow.array` and, if a `type` is specified, try to `cast` it to that type. The example in the top post will then still fail, as we don't have a cast from variable-size binary to fixed-size binary, but this should also fix `from_dict` using `pyarrow.array(.., type=type)` on all input.
- Switch the order of the `names` and `schema` keywords of the `pyarrow.table` factory function to preserve backwards compatibility.

> [Python] Segmentation fault on writing tables with fixed size binary fields
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-6520
>                 URL: https://issues.apache.org/jira/browse/ARROW-6520
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.1
>        Environment: python(3.7.3), pyarrow(0.14.1), arrow-cpp(0.14.1), parquet-cpp(1.5.1), Arch Linux x86_64
>            Reporter: Furkan Tektas
>            Assignee: Wes McKinney
>            Priority: Critical
>              Labels: newbie
>             Fix For: 0.15.0
>
> I'm not sure if this should be reported to Parquet or here.
> When I try to serialize a pyarrow table with a fixed-size binary field (holding 16-byte UUID4 values) to a Parquet file, a segmentation fault occurs.
> Here is a minimal example to reproduce:
> {{import pyarrow as pa}}
> {{from pyarrow import parquet as pq}}
> {{data = \{"col": pa.array([b"1234" for _ in range(10)])}}}
> {{fields = [("col", pa.binary(4))]}}
> {{schema = pa.schema(fields)}}
> {{table = pa.table(data, schema)}}
> {{pq.write_table(table, "test.parquet")}}
> {{segmentation fault (core dumped) ipython}}
>
> Yet, it works if I don't specify the size of the binary field:
> {{import pyarrow as pa}}
> {{from pyarrow import parquet as pq}}
> {{data = \{"col": pa.array([b"1234" for _ in range(10)])}}}
> {{fields = [("col", pa.binary())]}}
> {{schema = pa.schema(fields)}}
> {{table = pa.table(data, schema)}}
> {{pq.write_table(table, "test.parquet")}}
>
> Thanks

--
This message was sent by Atlassian Jira
(v8.3.2#803003)