[jira] [Updated] (ARROW-15784) [C++][Python] Parallel parquet file reading disabled with single file reads

Weston Pace (Jira) Thu, 24 Feb 2022 18:37:05 -0800


     [ 
https://issues.apache.org/jira/browse/ARROW-15784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Weston Pace updated ARROW-15784:
--------------------------------
    Issue Type: Bug  (was: Improvement)

> [C++][Python] Parallel parquet file reading disabled with single file reads
> ---------------------------------------------------------------------------
>
>                 Key: ARROW-15784
>                 URL: https://issues.apache.org/jira/browse/ARROW-15784
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 7.0.0
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Major
>             Fix For: 7.0.1
>
>
> The flag {{enable_parallel_column_conversion}} was passed down from Python 
> to C++ when reading Parquet datasets and controlled whether columns were 
> read in parallel. Parallel column reads were allowed for a single file but 
> not for multiple files. This was an old check to help prevent nested 
> deadlock.
> Nested deadlock is no longer an issue, and the flag became mostly inert once 
> we removed the synchronous scanner.
> Unfortunately, when we removed the synchronous scanner we forgot to remove 
> this flag, and as a result a single-file read ended up with parallelism 
> disabled.
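
For illustration only, the following Python sketch models how a single boolean flag can decide whether columns are decoded on a thread pool or serially on the calling thread. It is not Arrow's code: decode_column and read_columns are hypothetical names, the doubling step is a placeholder for real column decoding, and the standard-library thread pool merely stands in for Arrow's CPU thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def decode_column(name, values):
    # Hypothetical stand-in for decoding one Parquet column chunk.
    # Also records which thread did the work so the two paths can be compared.
    return name, [v * 2 for v in values], threading.current_thread().name


def read_columns(columns, parallel_columns):
    """Decode every column, using worker threads only if the flag allows it."""
    if not parallel_columns:
        # Serial path: every column is decoded on the calling thread.
        return [decode_column(n, v) for n, v in columns.items()]
    # Parallel path: one task per column, results gathered in column order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(decode_column, n, v) for n, v in columns.items()]
        return [f.result() for f in futures]


columns = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
serial = read_columns(columns, parallel_columns=False)
parallel = read_columns(columns, parallel_columns=True)
```

Both calls return the same decoded data; only the threads that did the work differ. A flag like this that is left in place after its original purpose is gone can therefore reduce throughput without changing results, which makes the regression easy to miss.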



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
