Joris Van den Bossche created ARROW-6762:
--------------------------------------------
Summary: [C++] JSON reader segfaults on newline
Key: ARROW-6762
URL: https://issues.apache.org/jira/browse/ARROW-6762
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Joris Van den Bossche
Using the {{SampleRecord.jl}} attachment from ARROW-6737, I notice that trying
to read this file on master crashes the process. Strictly speaking it is an
abort on a failed check in {{reader.cc}} rather than a segfault:
{code}
In [1]: from pyarrow import json
...: import pyarrow.parquet as pq
...:
...: r = json.read_json('SampleRecord.jl')
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1002 09:56:55.362766 13035 reader.cc:93] Check failed:
(string_view(*next_partial).find_first_not_of(" \t\n\r")) ==
(string_view::npos)
*** Check failure stack trace: ***
Aborted (core dumped)
{code}
while with 0.14.1 this works fine:
{code}
In [24]: from pyarrow import json
...: import pyarrow.parquet as pq
...:
...: r = json.read_json('SampleRecord.jl')
In [25]: r
Out[25]:
pyarrow.Table
_type: string
provider_name: string
arrival: timestamp[s]
berthed: timestamp[s]
berth: null
cargoes: list<item: struct<movement: string, product: string, volume: string,
volume_unit: string, buyer: null, seller: null>>
child 0, item: struct<movement: string, product: string, volume: string,
volume_unit: string, buyer: null, seller: null>
child 0, movement: string
child 1, product: string
child 2, volume: string
child 3, volume_unit: string
child 4, buyer: null
child 5, seller: null
departure: timestamp[s]
eta: null
installation: null
port_name: string
next_zone: null
reported_date: timestamp[s]
shipping_agent: null
vessel: struct<beam: null, build_year: null, call_sign: null, dead_weight:
null, dwt: null, flag_code: null, flag_name: null, gross_tonnage: null, imo:
string, length: int64, mmsi: null, name: string, type: null, vessel_type: null>
child 0, beam: null
child 1, build_year: null
child 2, call_sign: null
child 3, dead_weight: null
child 4, dwt: null
child 5, flag_code: null
child 6, flag_name: null
child 7, gross_tonnage: null
child 8, imo: string
child 9, length: int64
child 10, mmsi: null
child 11, name: string
child 12, type: null
child 13, vessel_type: null
In [26]: pa.__version__
Out[26]: '0.14.1'
{code}
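For reference, the condition in the failed check can be restated in plain Python. This is only a sketch of what the logged expression asserts (that {{next_partial}} contains nothing but the characters space, tab, newline and carriage return). The helper names {{is_whitespace_only}} and {{split_at_last_newline}} are hypothetical and not part of Arrow, and the chunking model (complete lines are parsed, the tail after the last newline is carried over) is an assumption about how the reader splits its input, not a description of the actual implementation.

```python
# Hypothetical helpers for illustration only; none of this is Arrow API.

# The character set that appears in the failed check: " \t\n\r"
JSON_WHITESPACE = b" \t\n\r"


def is_whitespace_only(partial: bytes) -> bool:
    """Python analogue of
    string_view(partial).find_first_not_of(" \t\n\r") == string_view::npos
    """
    return len(partial.strip(JSON_WHITESPACE)) == 0


def split_at_last_newline(block: bytes):
    """Assumed chunking model: everything up to and including the last
    newline is a run of complete lines, the remainder is a partial line."""
    cut = block.rfind(b"\n") + 1  # 0 when the block contains no newline
    return block[:cut], block[cut:]


if __name__ == "__main__":
    # A tail made only of whitespace satisfies the check.
    print(is_whitespace_only(b" \r\n\t"))
    # A tail holding the start of another object would violate it.
    complete, partial = split_at_last_newline(b'{"a": 1}\n{"b": ')
    print(complete, partial, is_whitespace_only(partial))
```

Under this model the check would fail whenever the leftover tail still holds non-whitespace bytes, which may be a useful starting point when looking for the line in {{SampleRecord.jl}} that triggers it.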
cc [~apitrou] [~bkietz]