[ https://issues.apache.org/jira/browse/ARROW-18228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628957#comment-17628957 ]
Vadym Dytyniak commented on ARROW-18228:
----------------------------------------

[~willjones127] It helped. Do you recommend using this strategy, or does it mean that we exceeded the rate limit and should review our implementation?

> AWS Error SLOW_DOWN during PutObject operation
> ----------------------------------------------
>
>                 Key: ARROW-18228
>                 URL: https://issues.apache.org/jira/browse/ARROW-18228
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 10.0.0
>            Reporter: Vadym Dytyniak
>            Priority: Major
>
> We use Dask to parallelise read/write operations and pyarrow to write datasets from worker nodes.
> After pyarrow released version 10.0.0, our data flows automatically switched to the latest version and some of them started to fail with the following error:
> {code:java}
> File "/usr/local/lib/python3.10/dist-packages/org/store/storage.py", line 768, in _write_partition
>     ds.write_dataset(
> File "/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py", line 988, in write_dataset
>     _filesystemdataset_write(
> File "pyarrow/_dataset.pyx", line 2859, in pyarrow._dataset._filesystemdataset_write
>     check_status(CFileSystemDataset.Write(c_options, c_scanner))
> File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
>     raise IOError(message)
> OSError: When creating key 'equities.us.level2.by_security/' in bucket 'org-prod': AWS Error SLOW_DOWN during PutObject operation: Please reduce your request rate.
> {code}
>
> In total the flow failed many times: most runs failed with the error above, but one failed with:
> {code:java}
> File "/usr/local/lib/python3.10/dist-packages/chronos/store/storage.py", line 857, in _load_partition
>     table = ds.dataset(
> File "/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py", line 752, in dataset
>     return _filesystem_dataset(source, **kwargs)
> File "/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py", line 444, in _filesystem_dataset
>     fs, paths_or_selector = _ensure_single_source(source, filesystem)
> File "/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py", line 411, in _ensure_single_source
>     file_info = filesystem.get_file_info(path)
> File "pyarrow/_fs.pyx", line 564, in pyarrow._fs.FileSystem.get_file_info
>     info = GetResultValue(self.fs.GetFileInfo(path))
> File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
>     return check_status(status)
> File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
>     raise IOError(message)
> OSError: When getting information for key 'ns/date=2022-10-31/channel=4/feed=A/9f41f928eedc431ca695a7ffe5fc60c2-0.parquet' in bucket 'org-poc': AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 28, Timeout was reached
> {code}
>
> Do you have any idea what changed for dataset writes between 9.0.0 and 10.0.0, to help us fix the issue?

-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)
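For context, the kind of client-side mitigation being discussed in this thread can be sketched generically as retry-with-exponential-backoff around a throttled S3 call. This is an illustrative sketch only: `with_backoff`, its parameters, and the `is_throttle` check are hypothetical helpers, not part of pyarrow's API (recent pyarrow releases do expose a `retry_strategy` argument on `pyarrow.fs.S3FileSystem`, which may be the strategy referenced above, but the exact suggestion in the earlier comment is not shown here).

```python
import random
import time


def with_backoff(operation, max_attempts=5, base_delay=0.5, is_throttle=None):
    """Retry `operation` with exponential backoff plus jitter.

    `is_throttle(exc)` decides whether an exception looks like a
    throttling error; by default it checks for the "SLOW_DOWN" text
    that pyarrow surfaces inside the OSError message.
    """
    if is_throttle is None:
        is_throttle = lambda exc: "SLOW_DOWN" in str(exc)
    for attempt in range(max_attempts):
        try:
            return operation()
        except OSError as exc:
            # Re-raise immediately for non-throttling errors, or when
            # the retry budget is exhausted.
            if not is_throttle(exc) or attempt == max_attempts - 1:
                raise
            # Exponential backoff with random jitter to spread retries
            # from many parallel workers apart in time.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

A call such as `with_backoff(lambda: ds.write_dataset(...))` would then retry only on SLOW_DOWN responses; whether that is preferable to reducing the overall request rate is exactly the question raised in the comment above.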