[ https://issues.apache.org/jira/browse/ARROW-10910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322162#comment-17322162 ]
Alessandro Molina commented on ARROW-10910:
-------------------------------------------

It seems to me that this no longer causes a segfault, not with the legacy implementation either:

```
>>> pq.read_table(None, use_legacy_dataset=True)
...
  File "pyarrow/io.pxi", line 1474, in pyarrow.lib.get_reader
    reader[0] = nf.get_random_access_file()
AttributeError: 'NoneType' object has no attribute 'get_random_access_file'
```

Nor when explicitly creating a `ParquetFile`:

```
>>> pq.ParquetFile(None)
...
  File "pyarrow/io.pxi", line 1474, in pyarrow.lib.get_reader
    reader[0] = nf.get_random_access_file()
AttributeError: 'NoneType' object has no attribute 'get_random_access_file'
```

I guess a possible improvement would be to check for unsupported arguments in `io.get_native_file` and throw a `ValueError` there instead of propagating the `None` value.

> [Python] Segmentation Fault when None given to read_table with legacy dataset
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-10910
>                 URL: https://issues.apache.org/jira/browse/ARROW-10910
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>   Affects Versions: 0.17.0
>         Environment: python: 3.8.3.final.0
> python-bits: 64
> OS: Linux
> OS-release: 5.4.0-56-generic
> machine: x86_64
> processor: x86_64
> byteorder: little
> LC_ALL: None
> LANG: en_US.UTF-8
> LOCALE: en_US.UTF-8
> pyarrow: 0.17.0
>            Reporter: Charles Burkland
>            Assignee: Ian Cook
>            Priority: Major
>              Labels: Bug:Generic, Python3, Segmenation_Fault, pyarrow
>             Fix For: 5.0.0
>
>
> h3. Code Sample (copy-pasteable)
> {code:python}
> import pyarrow.parquet as pq
> pq.read_table(None)
> {code}
> h3. Description
> The above snippet will produce a Segmentation Fault, which is highly
> undesirable. I discovered this because I had a function that was
> supposed to return a file path, but on my first iteration I forgot to return.
> Thus, when I ran my module with
> {code:python}
> pq.read_table(generate_fp())
> {code}
> it produced a Segmentation Fault.
> h3. Expected Output
> Ideally this will raise a *ValueError*, indicating to the user that *None*
> is an invalid source/file path. In my opinion, this is much more desirable
> than a violent segfault.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
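
The improvement suggested in the comment (rejecting unsupported arguments early and raising a `ValueError` instead of propagating `None`) could look roughly like the sketch below. This is only an illustration in pure Python: `check_source` is a hypothetical helper, not part of pyarrow, and the set of accepted types is an assumption, not what `io.get_native_file` actually accepts.

```python
import io
import os


def check_source(source):
    """Hypothetical guard, not pyarrow API: validate a source before opening it.

    Assumption: paths (str, bytes, os.PathLike) and objects exposing a
    ``read`` method count as supported; everything else is rejected.
    """
    if source is None:
        # Fail early with a clear message instead of passing None further down.
        raise ValueError("source must be a file path or file-like object, got None")
    if isinstance(source, (str, bytes, os.PathLike)):
        return source
    if hasattr(source, "read"):
        return source
    raise TypeError(f"unsupported source type: {type(source).__name__}")


def generate_fp():
    """Mimics the reporter's bug: the function forgets to return the path."""
    path = "data.parquet"  # computed but never returned


try:
    check_source(generate_fp())
except ValueError as exc:
    print(f"rejected: {exc}")
```

With a guard like this in front of the reader, a call such as `pq.read_table(check_source(generate_fp()))` would fail with a `ValueError` that names the problem, rather than an `AttributeError` raised deeper inside the reader code.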