[ https://issues.apache.org/jira/browse/PARQUET-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166259#comment-17166259 ]
Gabor Szadovszky commented on PARQUET-1888:
-------------------------------------------

I guess this issue is about the Rust implementation of Parquet (parquet-rs). Based on the [README|https://github.com/sunchao/parquet-rs/blob/master/README.md] it has moved to Arrow, so you should create a JIRA [there|https://issues.apache.org/jira/projects/ARROW/issues].

> Provide guidance on number of file descriptors needed to read Parquet file
> --------------------------------------------------------------------------
>
>                 Key: PARQUET-1888
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1888
>             Project: Parquet
>          Issue Type: Task
>         Environment: {code:bash}
> $ rustc --version
> rustc 1.47.0-nightly (6c8927b0c 2020-07-26)
> {code}
> Cargo.toml:
> {code:yaml}
> [dependencies]
> parquet = "0.17"
> rayon = "1.1"
> ...
> {code}
>            Reporter: Adam Shirey
>            Priority: Trivial
>
> I have a series of Parquet files that are 181 columns wide, and I'm
> processing them in parallel (using [rayon|https://github.com/rayon-rs/rayon/]).
> I ran into the OS limit on open file descriptors (1024 by default, per
> {{ulimit -n}}) when doing this: each file was consuming 208 descriptors.
> Is there a deterministic calculation for how many file descriptors will be
> used to process a file, so that one can determine an appropriate degree of
> parallelism in a situation like this?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
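Until guidance exists, one workaround is to budget parallelism against the descriptor limit directly. A minimal sketch, assuming the 208-descriptors-per-file figure observed in this report (the helper name and the `reserved` margin are hypothetical, not part of any parquet-rs API):

```rust
// Derive a safe cap on concurrently processed files from the fd budget.
// FDS_PER_FILE (208) is the per-file cost observed in this report; it is
// not guaranteed and should be verified against your own files.

fn max_parallel_files(fd_limit: usize, fds_per_file: usize, reserved: usize) -> usize {
    // Hold back `reserved` descriptors for stdio, sockets, logs, etc.
    fd_limit.saturating_sub(reserved) / fds_per_file.max(1)
}

fn main() {
    // e.g. `ulimit -n` = 1024, 208 fds per file, 32 reserved:
    let cap = max_parallel_files(1024, 208, 32); // (1024 - 32) / 208 = 4
    println!("process at most {} files concurrently", cap);
}
```

With rayon, such a cap could then be applied by building a bounded pool via `rayon::ThreadPoolBuilder::new().num_threads(cap)` and running the per-file work inside `pool.install(...)`, so no more than `cap` files are open at once.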