Adam Shirey created PARQUET-1888:
------------------------------------

             Summary: Provide guidance on number of file descriptors needed to read a Parquet file
                 Key: PARQUET-1888
                 URL: https://issues.apache.org/jira/browse/PARQUET-1888
             Project: Parquet
          Issue Type: Task
         Environment: {code:bash}
$ rustc --version
rustc 1.47.0-nightly (6c8927b0c 2020-07-26)
{code}
Cargo.toml:
{code:yaml}
[dependencies]
parquet = "0.17"
rayon = "1.1"
...
{code}
            Reporter: Adam Shirey


I have a series of Parquet files that are 181 columns wide, and I'm processing
them in parallel (using [rayon|https://github.com/rayon-rs/rayon/]). I ran into
the OS limit on open file descriptors (1024 by default, per {{ulimit -n}})
because each open file was consuming 208 descriptors.

Is there a deterministic calculation for how many file descriptors a given file
will use, so that one can pick an appropriate degree of parallelism in a
situation like this?
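In the meantime, I'm working around it by capping parallelism from the numbers I observed. A minimal sketch, assuming the per-file descriptor count stays roughly constant (the 208-per-file figure is just what I measured, not a documented constant of the parquet crate):

```rust
// Sketch: derive a safe worker count from the fd budget.
// Assumptions: each open Parquet file holds ~`fds_per_file` descriptors
// (208 observed in my case), and the process limit is `fd_limit`
// (`ulimit -n`, 1024 by default). Both values are illustrative.
fn max_parallel_files(fd_limit: usize, fds_per_file: usize, reserved: usize) -> usize {
    // Keep `reserved` descriptors free for stdio, sockets, logs, etc.
    let budget = fd_limit.saturating_sub(reserved);
    std::cmp::max(1, budget / fds_per_file)
}

fn main() {
    let n = max_parallel_files(1024, 208, 32);
    println!("max parallel files: {}", n);
    // With rayon, the pool can then be capped up front, e.g.:
    // rayon::ThreadPoolBuilder::new().num_threads(n).build_global().unwrap();
}
```

With the numbers above this caps the pool at 4 concurrent files, which stays under the limit, but a documented formula would make this robust instead of empirical.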



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
