[ https://issues.apache.org/jira/browse/ARROW-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284027#comment-17284027 ]
Ahmed Riza edited comment on ARROW-6154 at 2/12/21, 11:46 PM:
--------------------------------------------------------------

I've come across the same error. In my case it is caused by the `try_clone` calls in [https://github.com/apache/arrow/blob/master/rust/parquet/src/util/io.rs#L82]. I have a Parquet file with 3000 columns (see attached example), and the `try_clone` calls eventually fail because they create too many open file descriptors.

Below is a stack trace from `gdb` that leads to the call in `io.rs`; it can be reproduced with the attached Parquet file. One could raise `ulimit -n` on Linux to get around this, but that is not really a solution, since this code path can still end up creating a very large number of open file descriptors.

This is the initial stack trace, taken when the footer is first read. The code in `io.rs` is then called again for every column when the columns are read (see `fn reader_tree` in `parquet/record/reader.rs`):

{code:java}
#0  parquet::util::io::FileSource<std::fs::File>::new<std::fs::File> (fd=0x7ffff7c3fafc, start=807191, length=65536)
    at /home/a/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-3.0.0/src/util/io.rs:82
#1  0x00005555558294ce in parquet::file::serialized_reader::{{impl}}::get_read (self=0x7ffff7c3fafc, start=807191, length=65536)
    at /home/a/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-3.0.0/src/file/serialized_reader.rs:59
#2  0x000055555590a3fc in parquet::file::footer::parse_metadata<std::fs::File> (chunk_reader=0x7ffff7c3fafc)
    at /home/a/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-3.0.0/src/file/footer.rs:57
#3  0x0000555555845db1 in parquet::file::serialized_reader::SerializedFileReader<std::fs::File>::new<std::fs::File> (chunk_reader=...)
    at /home/a/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-3.0.0/src/file/serialized_reader.rs:134
#4  0x0000555555845bb6 in parquet::file::serialized_reader::{{impl}}::try_from (file=...)
    at /home/a/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-3.0.0/src/file/serialized_reader.rs:81
#5  0x0000555555845c4a in parquet::file::serialized_reader::{{impl}}::try_from (path=0x7ffff0000d20)
    at /home/a/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-3.0.0/src/file/serialized_reader.rs:90
#6  0x0000555555845d34 in parquet::file::serialized_reader::{{impl}}::try_from (path="resources/parquet/part-00001-33e6c49b-d6cb-4175-bc41-7198fd777d3a-c000.snappy.parquet")
    at /home/a/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-3.0.0/src/file/serialized_reader.rs:98
#7  0x000055555577c7f5 in data_rust::parquet::parquet_demo::test::test_read_multiple_files ()
    at /work/rust/data-rust/src/parquet/parquet_demo.rs:103
{code}

> [Rust] [Parquet] Too many open files (os error 24)
> --------------------------------------------------
>
> Key: ARROW-6154
> URL: https://issues.apache.org/jira/browse/ARROW-6154
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust
> Reporter: Yesh
> Priority: Major
>
> Attachments: part-00009-33e6c49b-d6cb-4175-bc41-7198fd777d3a-c000.snappy.parquet
>
> Used the Rust *parquet-read* binary to read a deeply nested Parquet file and got the stack trace below. Unfortunately I won't be able to upload the file.
> {code:java}
> stack backtrace:
>    0: std::panicking::default_hook::{{closure}}
>    1: std::panicking::default_hook
>    2: std::panicking::rust_panic_with_hook
>    3: std::panicking::continue_panic_fmt
>    4: rust_begin_unwind
>    5: core::panicking::panic_fmt
>    6: core::result::unwrap_failed
>    7: parquet::util::io::FileSource<R>::new
>    8: <parquet::file::reader::SerializedRowGroupReader<R> as parquet::file::reader::RowGroupReader>::get_column_page_reader
>    9: <parquet::file::reader::SerializedRowGroupReader<R> as parquet::file::reader::RowGroupReader>::get_column_reader
>   10: parquet::record::reader::TreeBuilder::reader_tree
>   11: parquet::record::reader::TreeBuilder::reader_tree
>   12: parquet::record::reader::TreeBuilder::reader_tree
>   13: parquet::record::reader::TreeBuilder::reader_tree
>   14: parquet::record::reader::TreeBuilder::reader_tree
>   15: parquet::record::reader::TreeBuilder::build
>   16: <parquet::record::reader::RowIter as core::iter::traits::iterator::Iterator>::next
>   17: parquet_read::main
>   18: std::rt::lang_start::{{closure}}
>   19: std::panicking::try::do_call
>   20: __rust_maybe_catch_panic
>   21: std::rt::lang_start_internal
>   22: main
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
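To make the failure mode in the comment concrete: each `try_clone` in `io.rs` duplicates the underlying OS file descriptor, so a file with thousands of columns can exhaust `ulimit -n` no matter how high it is set. A minimal sketch of the descriptor-sharing alternative follows — `ColumnSlice` and `read_columns` are hypothetical names for illustration, not the parquet crate's actual API, and the real fix in the crate may differ:

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;
use std::sync::{Arc, Mutex};

// Hypothetical per-column reader: instead of calling `file.try_clone()`
// once per column chunk (a fresh OS file descriptor each time), every
// column shares a single handle behind a mutex and seeks to its own
// byte range before reading.
struct ColumnSlice {
    file: Arc<Mutex<File>>,
    start: u64,
    length: usize,
}

impl ColumnSlice {
    fn read_all(&self) -> io::Result<Vec<u8>> {
        let mut f = self.file.lock().unwrap();
        f.seek(SeekFrom::Start(self.start))?;
        let mut buf = vec![0u8; self.length];
        f.read_exact(&mut buf)?;
        Ok(buf)
    }
}

// Read `n` fixed-width 4-byte "columns" through one shared descriptor.
fn read_columns(path: &Path, n: u64) -> io::Result<Vec<String>> {
    let file = Arc::new(Mutex::new(File::open(path)?)); // one open(2), one fd
    (0..n)
        .map(|i| {
            let slice = ColumnSlice {
                file: Arc::clone(&file),
                start: i * 4,
                length: 4,
            };
            slice
                .read_all()
                .map(|b| String::from_utf8_lossy(&b).into_owned())
        })
        .collect()
}

fn main() -> io::Result<()> {
    // Scratch file standing in for a Parquet file with three "columns".
    let path = std::env::temp_dir().join("arrow6154_demo.bin");
    std::fs::write(&path, b"col0col1col2")?;
    for (i, col) in read_columns(&path, 3)?.iter().enumerate() {
        println!("column {}: {}", i, col);
    }
    std::fs::remove_file(&path)?;
    Ok(())
}
```

The trade-off is that the mutex serializes reads across columns, whereas `try_clone` gives each reader an independent cursor; a production design might instead use positioned reads (`read_at` on Unix) to keep concurrency without extra descriptors.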