Adam Shirey created PARQUET-1888:
------------------------------------

             Summary: Provide guidance on number of file descriptors needed to read Parquet file
                 Key: PARQUET-1888
                 URL: https://issues.apache.org/jira/browse/PARQUET-1888
             Project: Parquet
          Issue Type: Task
         Environment: {code:bash}
$ rustc --version
rustc 1.47.0-nightly (6c8927b0c 2020-07-26)
{code}
Cargo.toml:
{code:none}
[dependencies]
parquet = "0.17"
rayon = "1.1"
...
{code}
            Reporter: Adam Shirey
I have a series of Parquet files that are 181 columns wide, and I'm processing them in parallel (using [rayon|https://github.com/rayon-rs/rayon/]). Doing so, I hit the OS limit on open file descriptors (1024 by default, per {{ulimit -n}}), because each file was consuming 208 descriptors. Is there a deterministic way to calculate how many file descriptors will be used to process a file, so that one can choose an appropriate degree of multithreading in a situation like this?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
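Pending a deterministic formula from the library, one workaround is to derive a safe degree of parallelism from the observed numbers and cap the thread pool accordingly. A minimal sketch, using only the figures from the report above (1024-descriptor limit, 208 descriptors per file); the function name {{max_parallel_files}} and the reserve of 16 descriptors are illustrative assumptions, not part of the parquet crate:

{code:rust}
/// Illustrative helper (not a parquet API): how many files can be read
/// concurrently given the OS descriptor limit, the observed per-file
/// descriptor usage, and a reserve for stdio and other handles.
fn max_parallel_files(fd_limit: usize, fds_per_file: usize, reserve: usize) -> usize {
    // saturating_sub avoids underflow if the reserve exceeds the limit.
    fd_limit.saturating_sub(reserve) / fds_per_file
}

fn main() {
    // Numbers from the report: ulimit -n = 1024, 208 descriptors per file.
    let n = max_parallel_files(1024, 208, 16);
    println!("{}", n); // prints 4
}
{code}

The resulting count could then be fed to rayon's {{ThreadPoolBuilder::new().num_threads(n)}} to keep the process under the limit, though a formula derived from the file's actual layout (e.g. column count) would be preferable.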