[ https://issues.apache.org/jira/browse/PARQUET-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166259#comment-17166259 ]
Gabor Szadovszky commented on PARQUET-1888:
-------------------------------------------

I guess this issue is about the Rust implementation of Parquet (parquet-rs). Based on the [README|https://github.com/sunchao/parquet-rs/blob/master/README.md] it has moved to Arrow, so you should create a JIRA [there|https://issues.apache.org/jira/projects/ARROW/issues].

> Provide guidance on number of file descriptors needed to read Parquet file
> --------------------------------------------------------------------------
>
>                 Key: PARQUET-1888
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1888
>             Project: Parquet
>          Issue Type: Task
>         Environment: {code:bash}
> $ rustc --version
> rustc 1.47.0-nightly (6c8927b0c 2020-07-26)
> {code}
> Cargo.toml:
> {code:yaml}
> [dependencies]
> parquet = "0.17"
> rayon = "1.1"
> ...
> {code}
>            Reporter: Adam Shirey
>            Priority: Trivial
>
> I have a series of Parquet files that are 181 columns wide, and I'm
> processing them in parallel (using [rayon|https://github.com/rayon-rs/rayon/]).
> I ran into the OS limit on open file descriptors (1024 by default, per
> {{ulimit -n}}) when doing this: each file was consuming 208 descriptors.
> Is there a deterministic calculation for how many file descriptors will be
> used to process a file, so that one can determine an appropriate degree of
> parallelism in a situation like this?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
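Until guidance exists, one workaround is to budget parallelism against the descriptor limit directly. A minimal sketch, assuming the 208-descriptors-per-file figure observed in this report (the helper name and the `reserved` margin are hypothetical, not part of any parquet-rs API):

```rust
// Derive a safe cap on concurrently processed files from the fd budget.
// FDS_PER_FILE (208) is the per-file cost observed in this report; it is
// not guaranteed and should be verified against your own files.

fn max_parallel_files(fd_limit: usize, fds_per_file: usize, reserved: usize) -> usize {
    // Hold back `reserved` descriptors for stdio, sockets, logs, etc.
    fd_limit.saturating_sub(reserved) / fds_per_file.max(1)
}

fn main() {
    // e.g. `ulimit -n` = 1024, 208 fds per file, 32 reserved:
    let cap = max_parallel_files(1024, 208, 32); // (1024 - 32) / 208 = 4
    println!("process at most {} files concurrently", cap);
}
```

With rayon, such a cap could then be applied by building a bounded pool via `rayon::ThreadPoolBuilder::new().num_threads(cap)` and running the per-file work inside `pool.install(...)`, so no more than `cap` files are open at once.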