[ https://issues.apache.org/jira/browse/ARROW-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-2002:
----------------------------------
    Labels: pull-request-available  (was: )

> use pyarrow download file will raise queue.Full exceptions sometimes
> --------------------------------------------------------------------
>
>                 Key: ARROW-2002
>                 URL: https://issues.apache.org/jira/browse/ARROW-2002
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>         Environment: operating system: all
>                      platform: all
>            Reporter: kmiku7
>            Priority: Major
>              Labels: pull-request-available
>
> When we download a file from HDFS and the writer thread writes data more
> slowly than it is read, download() sometimes raises queue.Full because
> write_queue is full.
> I think download() could instead wait until write_queue has space to
> enqueue a new item, as long as writer_thread is alive, like upload() does.
> {code}
> >>> import pyarrow as pa
> >>> cli = pa.hdfs.connect(user='USERNAME')
> >>> cli.download('/REMOTE/HDFS/PATH', '/LOCAL/FILE/PATH')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/io-hdfs.pxi", line 428, in pyarrow.lib.HadoopFileSystem.download (/arrow/python/build/temp.linux-x86_64-3.4/lib.cxx:66399)
>   File "pyarrow/io-hdfs.pxi", line 429, in pyarrow.lib.HadoopFileSystem.download (/arrow/python/build/temp.linux-x86_64-3.4/lib.cxx:66351)
>   File "pyarrow/io.pxi", line 315, in pyarrow.lib.NativeFile.download (/arrow/python/build/temp.linux-x86_64-3.4/lib.cxx:52249)
>   File "/usr/lib/python3.4/queue.py", line 187, in put_nowait
>     return self.put(item, block=False)
>   File "/usr/lib/python3.4/queue.py", line 133, in put
>     raise Full
> queue.Full
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
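
The behaviour proposed in the description (wait for space in write_queue while writer_thread is alive, rather than calling put_nowait) can be illustrated with the standard library alone. This is only a minimal sketch under stated assumptions, not pyarrow's implementation or the attached pull request: copy_with_backpressure, enqueue, the sentinel, and the write_delay parameter are hypothetical names introduced for this example, and only write_queue and writer_thread are taken from the report.

```python
import io
import queue
import threading
import time


def copy_with_backpressure(source, sink, chunk_size=4, queue_size=2,
                           write_delay=0.0):
    """Copy source to sink through a bounded queue (hypothetical helper).

    The reader blocks while the queue is full instead of raising queue.Full.
    """
    write_queue = queue.Queue(maxsize=queue_size)
    done = object()  # sentinel that tells the writer thread to stop
    errors = []      # exceptions raised inside the writer thread

    def writer():
        try:
            while True:
                item = write_queue.get()
                if item is done:
                    return
                time.sleep(write_delay)  # simulate a slow local write
                sink.write(item)
        except Exception as exc:
            errors.append(exc)

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()

    def enqueue(item):
        # Retry with a short timeout so that a dead writer thread is
        # noticed instead of blocking forever on a queue nobody drains.
        while writer_thread.is_alive():
            try:
                write_queue.put(item, timeout=0.05)
                return True
            except queue.Full:
                continue
        return False

    try:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            if not enqueue(chunk):
                break  # writer died; stop reading
    finally:
        enqueue(done)
        writer_thread.join()

    if errors:
        raise errors[0]
```

With a two-slot queue and a deliberately slow writer, a call such as copy_with_backpressure(io.BytesIO(data), io.BytesIO(), queue_size=2, write_delay=0.001) completes without queue.Full, because the reading side simply waits for space. The timeout on put is an assumption of this sketch; it exists so the loop can re-check whether the writer thread is still alive.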