[ https://issues.apache.org/jira/browse/ARROW-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172098#comment-17172098 ]
Joris Van den Bossche commented on ARROW-9662:
----------------------------------------------
[~ketzer] thanks for the report!
Two questions:
- Would you be able to test this with the latest pyarrow release (1.0.0 instead of 0.17.1)?
- Can you reproduce this with a dummy data file (e.g. if you generate some random data and write it to feather)? Or does it only happen with the specific file in question? If so, what does the file look like (or how was it written)?
> Python feather reader segfaults on some file without explicit columns
> ---------------------------------------------------------------------
>
> Key: ARROW-9662
> URL: https://issues.apache.org/jira/browse/ARROW-9662
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Alex
> Priority: Major
> Attachments: 2020-07-30_BF43CA09E5404F9_actuals.feather.zstd
>
>
> This code fails:
> {{from pyarrow import feather}}
> {{feather.read_feather('2020-07-30_BF43CA09E5404F9_actuals.feather.zstd', columns=None, use_threads=bool(True))}}
>
> >>> from pyarrow import feather
> >>> feather.read_feather('2020-07-30_BF43CA09E5404F9_actuals.feather.zstd', columns=None, use_threads=bool(True))
> [1] 37494 segmentation fault python
> and this does not:
> {{from pyarrow import feather}}
> {{feather.read_feather('2020-07-30_BF43CA09E5404F9_actuals.feather.zstd', columns=['prediction_id', 'class'], use_threads=bool(True))}}
>
> env:
> MacOS Catalina 10.15.5
> $ python -V
> Python 3.7.0
> $ pip list
> Package Version Location
> ------------------------------------ ------------- ------------------------------------
> adal 1.2.3
> aioredis 1.3.1
> amqp 2.5.2
> apipkg 1.5
> appdirs 1.4.3
> applicationinsights 0.11.9
> async-timeout 3.0.1
> asyncio 3.4.3
> asyncio-redis 0.15.1
> attrs 19.3.0
> azure-common 1.1.25
> azure-core 1.4.0
> azure-graphrbac 0.61.1
> azure-identity 1.2.0
> azure-mgmt-authorization 0.60.0
> azure-mgmt-containerregistry 2.8.0
> azure-mgmt-keyvault 2.2.0
> azure-mgmt-resource 8.0.1
> azure-mgmt-storage 9.0.0
> azureml-automl-core 1.3.0
> azureml-automl-runtime 1.3.0
> azureml-core 1.3.0.post2
> azureml-dataprep 1.4.3
> azureml-dataprep-native 14.1.0
> azureml-defaults 1.3.0
> azureml-explain-model 1.3.0
> azureml-interpret 1.3.0
> azureml-model-management-sdk 1.0.1b6.post1
> azureml-pipeline 1.3.0
> azureml-pipeline-core 1.3.0
> azureml-pipeline-steps 1.3.0
> azureml-sdk 1.3.0
> azureml-telemetry 1.3.0
> azureml-train 1.3.0
> azureml-train-automl 1.3.0
> azureml-train-automl-client 1.3.0
> azureml-train-automl-runtime 1.3.0
> azureml-train-core 1.3.0.post1
> azureml-train-restclients-hyperdrive 1.3.0
> backports.tempfile 1.0
> backports.weakref 1.0.post1
> beautifulsoup4 4.8.2
> billiard 3.6.3.0
> bleach 3.1.4
> boto 2.49.0
> boto3 1.12.34
> botocore 1.15.34
> cachetools 4.0.0
> celery 4.4.0
> certifi 2019.11.28
> cffi 1.14.0
> chardet 3.0.4
> click 7.1.1
> cloudpickle 1.4.1
> configparser 3.7.4
> contextlib2 0.6.0.post1
> coverage 5.0.4
> cryptography 2.9.2
> Cython 0.29.17
> dill 0.3.1.1
> distlib 0.3.0
> distro 1.5.0
> docker 4.2.0
> docutils 0.15.2
> dotnetcore2 2.1.14
> entrypoints 0.3
> execnet 1.7.1
> fastapi 0.53.2
> feather-format 0.4.1
> filelock 3.0.12
> fire 0.3.1
> flake8 3.7.9
> Flask 1.0.3
> freezegun 0.3.15
> fsspec 0.7.1
> fusepy 3.0.1
> gensim 3.8.3
> gevent 1.4.0
> google-api-core 1.16.0
> google-auth 1.12.0
> google-cloud-automl 0.10.0
> google-cloud-core 1.3.0
> google-cloud-storage 1.26.0
> google-resumable-media 0.5.0
> googleapis-common-protos 1.51.0
> greenlet 0.4.15
> grpcio 1.27.2
> gunicorn 19.9.0
> h11 0.9.0
> hiredis 1.0.1
> HLL 1.3.1
> httptools 0.1.1
> idna 2.9
> importlib-metadata 1.6.0
> interpret-community 0.9.2
> interpret-core 0.1.20
> isodate 0.6.0
> itsdangerous 1.1.0
> jeepney 0.4.3
> Jinja2 2.11.2
> jmespath 0.9.5
> joblib 0.14.1
> json-logging-py 0.2
> JsonForm 0.0.2
> jsonpickle 1.3
> jsonschema 3.2.0
> JsonSir 0.0.2
> keras2onnx 1.6.1
> keyring 21.2.0
> kombu 4.6.8
> liac-arff 2.4.0
> lightgbm 2.3.0
> lxml 4.5.0
> mangum 0.9.0
> MarkupSafe 1.1.1
> mccabe 0.6.1
> mock 4.0.2
> more-itertools 8.2.0
> msal 1.2.0
> msal-extensions 0.1.3
> msrest 0.6.13
> msrestazure 0.6.3
> multidict 4.7.5
> ndg-httpsclient 0.5.1
> nimbusml 1.7.0
> numpy 1.16.2
> oauthlib 3.1.0
> onnx 1.6.0
> onnxconverter-common 1.6.0
> onnxmltools 1.4.1
> packaging 20.3
> pandas 0.23.4
> pathspec 0.8.0
> patsy 0.5.1
> pip 10.0.1
> pkginfo 1.5.0.1
> pluggy 0.13.1
> pmdarima 1.1.1
> portalocker 1.7.0
> protobuf 3.11.3
> psutil 5.7.0
> py 1.8.1
> py-cpuinfo 5.0.0
> pyarrow 0.17.1
> pyasn1 0.4.8
> pyasn1-modules 0.2.8
> pycodestyle 2.5.0
> pycparser 2.20
> pydantic 1.4
> pyflakes 2.1.1
> Pygments 2.6.1
> PyJWT 1.7.1
> pyOpenSSL 19.1.0
> pyparsing 2.4.6
> pyrsistent 0.16.0
> pytest 5.4.1
> pytest-cov 2.8.1
> pytest-forked 1.1.3
> pytest-runner 5.2
> pytest-xdist 1.31.0
> python-dateutil 2.8.1
> python-dotenv 0.14.0
> Python-EasyConfig 0.1.7
> pytz 2019.3
> PyYAML 5.3.1
> readme-renderer 25.0
> redis 3.4.1
> requests 2.23.0
> requests-oauthlib 1.3.0
> requests-toolbelt 0.9.1
> Resource 0.2.1
> rsa 4.0
> ruamel.yaml 0.16.10
> ruamel.yaml.clib 0.2.0
> s3fs 0.4.2
> s3transfer 0.3.3
> scikit-learn 0.20.3
> scipy 1.1.0
> SecretStorage 3.1.2
> setuptools 46.1.3
> shap 0.34.0
> shortuuid 1.0.1
> simplejson 3.17.2
> six 1.14.0
> skl2onnx 1.4.9
> sklearn-pandas 1.7.0
> smart-open 1.9.0
> soupsieve 2.0
> starlette 0.13.2
> statsmodels 0.10.2
> termcolor 1.1.0
> toml 0.10.0
> tox 3.14.6
> tqdm 4.45.0
> twine 3.1.1
> typing-extensions 3.7.4.2
> urllib3 1.25.8
> uvicorn 0.11.3
> uvloop 0.14.0
> vcrpy 4.0.2
> vine 1.3.0
> virtualenv 20.0.15
> wcwidth 0.1.9
> webencodings 0.5.1
> websocket-client 0.57.0
> websockets 8.1
> Werkzeug 0.16.1
> wheel 0.30.0
> wrapt 1.12.1
> wsgi-intercept 1.9.2
> yarl 1.4.2
> zipp 3.1.0
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)