[ https://issues.apache.org/jira/browse/ARROW-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-2002:
----------------------------------
    Labels: pull-request-available  (was: )

> use pyarrow download file will raise queue.Full exceptions sometimes
> --------------------------------------------------------------------
>
>                 Key: ARROW-2002
>                 URL: https://issues.apache.org/jira/browse/ARROW-2002
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>         Environment: operating system: all
> platform: all
>            Reporter: kmiku7
>            Priority: Major
>              Labels: pull-request-available
>
> When we download a file from HDFS, if the writer thread writes data more 
> slowly than it is read, download() raises a queue.Full exception because 
> write_queue is full.
> I think that when we download a file, we can wait until write_queue has 
> space to enqueue a new item, as long as writer_thread is alive, as upload() 
> does.
> {code}
> >>> import pyarrow as pa
> >>> cli = pa.hdfs.connect(user='USERNAME')
> >>> cli.download('/REMOTE/HDFS/PATH', '/LOCAL/FILE/PATH')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/io-hdfs.pxi", line 428, in pyarrow.lib.HadoopFileSystem.download (/arrow/python/build/temp.linux-x86_64-3.4/lib.cxx:66399)
>   File "pyarrow/io-hdfs.pxi", line 429, in pyarrow.lib.HadoopFileSystem.download (/arrow/python/build/temp.linux-x86_64-3.4/lib.cxx:66351)
>   File "pyarrow/io.pxi", line 315, in pyarrow.lib.NativeFile.download (/arrow/python/build/temp.linux-x86_64-3.4/lib.cxx:52249)
>   File "/usr/lib/python3.4/queue.py", line 187, in put_nowait
>     return self.put(item, block=False)
>   File "/usr/lib/python3.4/queue.py", line 133, in put
>     raise Full
> queue.Full
> {code}
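
The blocking behaviour proposed in the description can be sketched with the standard library alone. This is a minimal illustration, not pyarrow's actual implementation: the function `copy_with_backpressure`, its parameters, and the helper `put_blocking` are hypothetical names, and only `write_queue` and `writer_thread` echo the names used in the report. Instead of `put_nowait`, the reader retries a timed `put` for as long as the writer thread is alive, so a slow writer stalls the reader rather than raising queue.Full.

```python
import queue
import threading
import time


def copy_with_backpressure(src, dst, chunk_size=4, queue_size=2,
                           slow_writer_delay=0.0):
    """Copy src to dst through a bounded queue without raising queue.Full.

    Hypothetical sketch: src and dst are any file-like objects.
    """
    write_queue = queue.Queue(maxsize=queue_size)
    done = object()  # sentinel telling the writer to stop
    errors = []      # exceptions raised inside the writer thread

    def writer():
        try:
            while True:
                item = write_queue.get()
                if item is done:
                    return
                time.sleep(slow_writer_delay)  # simulate a slow disk
                dst.write(item)
        except Exception as exc:
            errors.append(exc)

    writer_thread = threading.Thread(target=writer)

    def put_blocking(item):
        # Wait for free space, but give up if the writer has died,
        # so the reader can never block forever on a full queue.
        while writer_thread.is_alive():
            try:
                write_queue.put(item, timeout=0.1)
                return True
            except queue.Full:
                continue
        return False

    writer_thread.start()
    try:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            if not put_blocking(chunk):
                break  # writer is gone; stop reading
    finally:
        put_blocking(done)
        writer_thread.join()

    if errors:
        raise errors[0]
```

Polling with a timeout, rather than a plain blocking `put`, is a deliberate choice in this sketch: an unconditional block would hang if the writer thread had already exited with an error.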



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
