[ https://issues.apache.org/jira/browse/ARROW-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927613#comment-16927613 ]
david cottrell commented on ARROW-5156:
---------------------------------------

I had a go at the pandas change but hit other errors to do with schemas. For now, people should know that a workaround is to drop down to the pyarrow level and pass the filesystem explicitly, along these lines:

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq


def _is_s3_path(path):
    # Stand-in for the helper used here; any s3:// URL counts as S3.
    return path.startswith('s3://')


# df, filename, partition_cols and preserve_index come from the caller.
table = pa.Table.from_pandas(df, preserve_index=preserve_index)

filesystem = None
if _is_s3_path(filename):
    import s3fs
    filesystem = s3fs.S3FileSystem()

pq.write_to_dataset(table, root_path=filename,
                    partition_cols=partition_cols,
                    preserve_index=preserve_index,
                    use_dictionary=True,
                    filesystem=filesystem)
{code}

Building the s3fs.S3FileSystem up front means pyarrow receives a real filesystem object for S3 paths, rather than the None that triggers the AttributeError in the traceback below.

> [Python] `df.to_parquet('s3://...', partition_cols=...)` fails with `'NoneType' object has no attribute '_isfilestore'`
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-5156
>                 URL: https://issues.apache.org/jira/browse/ARROW-5156
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.1
>        Environment: Mac, Linux
>           Reporter: Victor Shih
>           Priority: Major
>             Labels: parquet
>            Fix For: 1.0.0
>
>
> According to [https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#partitioning-parquet-files], writing a Parquet file to S3 with `partition_cols` should work, but it fails for me. Example script:
> {code:python}
> import pandas as pd
> import sys
> print(sys.version)
> print(pd.__version__)
> df = pd.DataFrame([{'a': 1, 'b': 2}])
> df.to_parquet('s3://my_s3_bucket/x.parquet', engine='pyarrow')
> print('OK 1')
> df.to_parquet('s3://my_s3_bucket/x2.parquet', partition_cols=['a'], engine='pyarrow')
> print('OK 2')
> {code}
> Output:
> {noformat}
> 3.5.2 (default, Feb 14 2019, 01:46:27)
> [GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)]
> 0.24.2
> OK 1
> Traceback (most recent call last):
>   File "./t.py", line 14, in <module>
>     df.to_parquet('s3://my_s3_bucket/x2.parquet', partition_cols=['a'], engine='pyarrow')
>   File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pandas/core/frame.py", line 2203, in to_parquet
>     partition_cols=partition_cols, **kwargs)
>   File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pandas/io/parquet.py", line 252, in to_parquet
>     partition_cols=partition_cols, **kwargs)
>   File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pandas/io/parquet.py", line 118, in write
>     partition_cols=partition_cols, **kwargs)
>   File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pyarrow/parquet.py", line 1227, in write_to_dataset
>     _mkdir_if_not_exists(fs, root_path)
>   File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pyarrow/parquet.py", line 1182, in _mkdir_if_not_exists
>     if fs._isfilestore() and not fs.exists(path):
> AttributeError: 'NoneType' object has no attribute '_isfilestore'
> {noformat}
>
> Original issue: [https://github.com/apache/arrow/issues/4030]
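As a quick check that the explicit-filesystem workaround wrote a readable dataset, the partitioned files can be loaded back through the same filesystem. A minimal sketch, assuming the reporter's example bucket path and the legacy `ParquetDataset` API of pyarrow from that era; the path and partition column are illustrative only:

{code:python}
import pyarrow.parquet as pq
import s3fs

# Reuse an explicit s3fs filesystem, as in the workaround above, so
# pyarrow never has to fall back to filesystem=None.
fs = s3fs.S3FileSystem()

# Read the partitioned dataset back; the 'a' partition values encoded
# in the directory names are restored as a column of the result.
dataset = pq.ParquetDataset('s3://my_s3_bucket/x2.parquet', filesystem=fs)
print(dataset.read().to_pandas())
{code}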