[ https://issues.apache.org/jira/browse/ARROW-18228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628957#comment-17628957 ]

Vadym Dytyniak commented on ARROW-18228:
----------------------------------------

[~willjones127] It helped. Do you recommend using this strategy, or does it 
mean that we are exceeding the rate limit and should review our implementation?
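
For reference, a minimal sketch of the change we applied, assuming "this 
strategy" refers to passing an explicit S3 retry strategy (region, bucket, 
and data below are placeholders):

{code:python}
import pyarrow as pa
import pyarrow.dataset as ds
from pyarrow.fs import S3FileSystem, AwsStandardS3RetryStrategy

# Retry throttled requests (e.g. SLOW_DOWN) with the standard AWS backoff.
fs = S3FileSystem(
    region="us-east-1",  # placeholder region
    retry_strategy=AwsStandardS3RetryStrategy(max_attempts=10),
)

table = pa.table({"security": ["A", "B"], "price": [1.0, 2.0]})
ds.write_dataset(
    table,
    "org-prod/equities.us.level2.by_security",  # placeholder bucket/prefix
    format="parquet",
    filesystem=fs,
)
{code}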

> AWS Error SLOW_DOWN during PutObject operation
> ----------------------------------------------
>
>                 Key: ARROW-18228
>                 URL: https://issues.apache.org/jira/browse/ARROW-18228
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 10.0.0
>            Reporter: Vadym Dytyniak
>            Priority: Major
>
> We use Dask to parallelise read/write operations and pyarrow to write 
> datasets from worker nodes (a sketch of the write call each worker performs 
> follows the traceback below).
> After pyarrow 10.0.0 was released, our data flows automatically picked up 
> the latest version, and some of them started to fail with the following 
> error:
> {code:java}
>   File "/usr/local/lib/python3.10/dist-packages/org/store/storage.py", line 768, in _write_partition
>     ds.write_dataset(
>   File "/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py", line 988, in write_dataset
>     _filesystemdataset_write(
>   File "pyarrow/_dataset.pyx", line 2859, in pyarrow._dataset._filesystemdataset_write
>     check_status(CFileSystemDataset.Write(c_options, c_scanner))
>   File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
>     raise IOError(message)
> OSError: When creating key 'equities.us.level2.by_security/' in bucket 'org-prod': AWS Error SLOW_DOWN during PutObject operation: Please reduce your request rate. {code}
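> For context, each worker issues a write roughly like this sketch (schema, 
> bucket, and partitioning here are placeholders, not our production values):
> {code:python}
> import pyarrow as pa
> import pyarrow.dataset as ds
> from pyarrow.fs import S3FileSystem
>
> table = pa.table({"security": ["A", "B"], "price": [1.0, 2.0]})
>
> # Many workers run this concurrently against the same bucket prefix.
> ds.write_dataset(
>     table,
>     "org-prod/equities.us.level2.by_security",  # placeholder path
>     format="parquet",
>     partitioning=["security"],  # placeholder partitioning
>     filesystem=S3FileSystem(region="us-east-1"),  # placeholder region
> )
> {code}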
> In total, the flow failed many times: most runs failed with the error 
> above, but one failed with:
> {code:java}
>   File "/usr/local/lib/python3.10/dist-packages/chronos/store/storage.py", line 857, in _load_partition
>     table = ds.dataset(
>   File "/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py", line 752, in dataset
>     return _filesystem_dataset(source, **kwargs)
>   File "/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py", line 444, in _filesystem_dataset
>     fs, paths_or_selector = _ensure_single_source(source, filesystem)
>   File "/usr/local/lib/python3.10/dist-packages/pyarrow/dataset.py", line 411, in _ensure_single_source
>     file_info = filesystem.get_file_info(path)
>   File "pyarrow/_fs.pyx", line 564, in pyarrow._fs.FileSystem.get_file_info
>     info = GetResultValue(self.fs.GetFileInfo(path))
>   File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
>     return check_status(status)
>   File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
>     raise IOError(message)
> OSError: When getting information for key 'ns/date=2022-10-31/channel=4/feed=A/9f41f928eedc431ca695a7ffe5fc60c2-0.parquet' in bucket 'org-poc': AWS Error NETWORK_CONNECTION during HeadObject operation: curlCode: 28, Timeout was reached {code}
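> The second failure looks like a client-side timeout rather than throttling; 
> one mitigation sketch would be raising the S3 client timeouts (values below 
> are illustrative only):
> {code:python}
> from pyarrow.fs import S3FileSystem
>
> # Illustrative values; both timeouts are in seconds.
> fs = S3FileSystem(
>     region="us-east-1",   # placeholder region
>     connect_timeout=30,   # TCP connect timeout
>     request_timeout=120,  # per-request timeout (HeadObject, PutObject, ...)
> )
> {code}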
>  
> Do you have any idea what changed in dataset writing between 9.0.0 and 
> 10.0.0 that could help us fix this issue?


