Joris Van den Bossche created ARROW-6762:
--------------------------------------------

             Summary: [C++] JSON reader segfaults on newline
                 Key: ARROW-6762
                 URL: https://issues.apache.org/jira/browse/ARROW-6762
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Joris Van den Bossche


Using the {{SampleRecord.jl}} attachment from ARROW-6737, I notice that trying to read this file on master results in a segfault:

{code}
In [1]: from pyarrow import json
   ...: import pyarrow.parquet as pq
   ...:
   ...: r = json.read_json('SampleRecord.jl')
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1002 09:56:55.362766 13035 reader.cc:93] Check failed: (string_view(*next_partial).find_first_not_of(" \t\n\r")) == (string_view::npos)
*** Check failure stack trace: ***
Aborted (core dumped)
{code}

while with 0.14.1 this works fine:

{code}
In [24]: from pyarrow import json
    ...: import pyarrow.parquet as pq
    ...:
    ...: r = json.read_json('SampleRecord.jl')

In [25]: r
Out[25]:
pyarrow.Table
_type: string
provider_name: string
arrival: timestamp[s]
berthed: timestamp[s]
berth: null
cargoes: list<item: struct<movement: string, product: string, volume: string, volume_unit: string, buyer: null, seller: null>>
  child 0, item: struct<movement: string, product: string, volume: string, volume_unit: string, buyer: null, seller: null>
      child 0, movement: string
      child 1, product: string
      child 2, volume: string
      child 3, volume_unit: string
      child 4, buyer: null
      child 5, seller: null
departure: timestamp[s]
eta: null
installation: null
port_name: string
next_zone: null
reported_date: timestamp[s]
shipping_agent: null
vessel: struct<beam: null, build_year: null, call_sign: null, dead_weight: null, dwt: null, flag_code: null, flag_name: null, gross_tonnage: null, imo: string, length: int64, mmsi: null, name: string, type: null, vessel_type: null>
  child 0, beam: null
  child 1, build_year: null
  child 2, call_sign: null
  child 3, dead_weight: null
  child 4, dwt: null
  child 5, flag_code: null
  child 6, flag_name: null
  child 7, gross_tonnage: null
  child 8, imo: string
  child 9, length: int64
  child 10, mmsi: null
  child 11, name: string
  child 12, type: null
  child 13, vessel_type: null

In [26]: pa.__version__
Out[26]: '0.14.1'
{code}

cc [~apitrou] [~bkietz]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)