[jira] [Updated] (ARROW-13317) [Python] Improve documentation on what 'use_threads' does in 'read_feather'

Arun Joseph (Jira) Mon, 12 Jul 2021 13:27:06 -0700


     [ 
https://issues.apache.org/jira/browse/ARROW-13317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Arun Joseph updated ARROW-13317:
--------------------------------
    Description: 
The current documentation for 
[read_feather|https://arrow.apache.org/docs/python/generated/pyarrow.feather.read_feather.html]
 states the following:

*use_threads* (_bool_, _default True_) – Whether to parallelize reading using 
multiple threads.

If the underlying file uses compression, multiple threads can still be 
spawned even when *use_threads* is False. The wording of *use_threads* does 
not make clear whether the restriction on multiple threads applies only to the 
conversion from pyarrow to the pandas DataFrame, or also to the 
reading/decompression of the file itself, which might spawn additional threads.

[set_cpu_count|http://arrow.apache.org/docs/python/generated/pyarrow.set_cpu_count.html#pyarrow.set_cpu_count]
 might be worth mentioning as a way to actually limit the threads spawned.

  was:
The current documentation for 
[read_feather|https://arrow.apache.org/docs/python/generated/pyarrow.feather.read_feather.html]
 states the following:

*use_threads* (_bool__,_ _default True_) – Whether to parallelize reading using 
multiple threads.

if the underlying file uses compression, then multiple threads can still be 
spawned. The verbiage of the *use_threads* is ambiguous on whether the 
restriction on multiple threads is only for the conversion from pyarrow to the 
pandas dataframe vs the reading/decompression of the file itself which might 
spawn additional threads.

[set_cpu_count|http://arrow.apache.org/docs/python/generated/pyarrow.set_cpu_count.html#pyarrow.set_cpu_count]
 might good to mention as well


> [Python] Improve documentation on what 'use_threads' does in 'read_feather'
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-13317
>                 URL: https://issues.apache.org/jira/browse/ARROW-13317
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>    Affects Versions: 4.0.1
>            Reporter: Arun Joseph
>            Priority: Trivial
>              Labels: documentation
>
> The current documentation for 
> [read_feather|https://arrow.apache.org/docs/python/generated/pyarrow.feather.read_feather.html]
>  states the following:
> *use_threads* (_bool_, _default True_) – Whether to parallelize reading 
> using multiple threads.
> If the underlying file uses compression, multiple threads can still be 
> spawned even when *use_threads* is False. The wording of *use_threads* does 
> not make clear whether the restriction on multiple threads applies only to 
> the conversion from pyarrow to the pandas DataFrame, or also to the 
> reading/decompression of the file itself, which might spawn additional threads.
> [set_cpu_count|http://arrow.apache.org/docs/python/generated/pyarrow.set_cpu_count.html#pyarrow.set_cpu_count]
>  might be worth mentioning as a way to actually limit the threads spawned.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
