Fabio Batista da Silva created ARROW-5317:
---------------------------------------------
Summary: [Rust] [Parquet] impl IntoIterator for SerializedFileReader
Key: ARROW-5317
URL: https://issues.apache.org/jira/browse/ARROW-5317
Project: Apache Arrow
Issue Type: Improvement
Components: Rust
Reporter: Fabio Batista da Silva
This is a follow-up to [https://github.com/apache/arrow/issues/4301].
The current row iterator implementation, *RowIter*, borrows the *FileReader*, so the user has to keep the file reader alive for as long as the iterator is alive.
This makes it hard to iterate over multiple *FileReader* / *RowIter* instances, for example:
{code:java}
use std::fs::File;
use std::path::Path;

use parquet::file::reader::SerializedFileReader;
use parquet::record::reader::RowIter;

fn main() {
    let path1 = Path::new("path-to/1.snappy.parquet");
    let path2 = Path::new("path-to/2.snappy.parquet");
    let vec = vec![path1, path2];
    let it = vec
        .iter()
        .map(|p| File::open(p).unwrap())
        .map(|f| SerializedFileReader::new(f).unwrap())
        .flat_map(|reader| -> RowIter {
            // Does not compile: `reader` is owned by this closure and dropped
            // when it returns, but the RowIter borrows it:
            //   error: returns a value referencing data owned by the current function
            //   (`reader` is borrowed here)
            RowIter::from_file(None, &reader).unwrap()
        });
    for r in it {
        println!("{}", r);
    }
}
{code}
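The error arises because each reader is a value local to the closure, while the returned iterator holds a borrow of it. With a borrowing iterator, the usual workaround is to own all readers in a collection that outlives the chain and borrow from that. A minimal std-only sketch; `Reader`, `RowIter::from_reader` and `open` are hypothetical stand-ins for *SerializedFileReader*, *RowIter::from_file* and opening a file, not the parquet crate's API, and rows are in-memory strings.

```rust
/// Hypothetical stand-in for a file reader that owns its data
/// (plays the role of SerializedFileReader; not the parquet API).
struct Reader {
    rows: Vec<String>,
}

/// Hypothetical borrowing row iterator, analogous to RowIter<'a>:
/// it holds `&'a Reader`, so the reader must outlive it.
struct RowIter<'a> {
    reader: &'a Reader,
    pos: usize,
}

impl<'a> RowIter<'a> {
    fn from_reader(reader: &'a Reader) -> Self {
        RowIter { reader, pos: 0 }
    }
}

impl<'a> Iterator for RowIter<'a> {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        // Return the next row (cloned), or None once the reader is exhausted.
        let row = self.reader.rows.get(self.pos)?.clone();
        self.pos += 1;
        Some(row)
    }
}

/// Hypothetical "open": fabricates two rows per name instead of reading a file.
fn open(name: &str) -> Reader {
    Reader {
        rows: vec![format!("{}:row0", name), format!("{}:row1", name)],
    }
}

/// Workaround: collect every reader into `readers` first, so the readers
/// outlive the iterator chain, then borrow each one via `iter()`.
fn all_rows(names: &[&str]) -> Vec<String> {
    let readers: Vec<Reader> = names.iter().map(|n| open(n)).collect();
    readers.iter().flat_map(RowIter::from_reader).collect()
}
```

The drawback is that all readers must be opened up front and kept alive by the caller, which is the inconvenience described above.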
One solution could be to implement a row iterator that takes ownership of the reader, perhaps by implementing *std::iter::IntoIterator* for *SerializedFileReader*, so that the chain could become:
{code:java}
....
    .map(|p| File::open(p).unwrap())
    .map(|f| SerializedFileReader::new(f).unwrap())
    // Proposed: SerializedFileReader implements IntoIterator,
    // yielding a row iterator that owns the reader.
    .flat_map(|r| r)
....
{code}
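The proposal can be sketched in miniature with std only: `Reader` implements `IntoIterator` by moving itself into a `RowIntoIter` that keeps the reader alive, so `.flat_map(|r| r)` compiles. All names here are hypothetical stand-ins (rows are in-memory strings rather than Parquet row groups); this is an illustration of the ownership pattern, not the parquet crate's API.

```rust
/// Hypothetical stand-in for SerializedFileReader (not the parquet API).
struct Reader {
    rows: Vec<String>,
}

/// Hypothetical owning row iterator: it stores the Reader by value,
/// so it borrows nothing and can be returned from a closure.
struct RowIntoIter {
    reader: Reader,
    pos: usize,
}

impl Iterator for RowIntoIter {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        // Return the next row (cloned), or None once the reader is exhausted.
        let row = self.reader.rows.get(self.pos)?.clone();
        self.pos += 1;
        Some(row)
    }
}

/// The proposal in miniature: consuming the reader yields its rows.
impl IntoIterator for Reader {
    type Item = String;
    type IntoIter = RowIntoIter;

    fn into_iter(self) -> RowIntoIter {
        RowIntoIter { reader: self, pos: 0 }
    }
}

/// Hypothetical "open": fabricates two rows per name instead of reading a file.
fn open(name: &str) -> Reader {
    Reader {
        rows: vec![format!("{}:row0", name), format!("{}:row1", name)],
    }
}

fn all_rows(names: &[&str]) -> Vec<String> {
    names
        .iter()
        .map(|n| open(n))
        // Each Reader is moved into its own RowIntoIter, so no borrow
        // of a closure-local value escapes.
        .flat_map(|r| r)
        .collect()
}
```

Because each iterator owns its reader, every reader lives exactly as long as its row iterator, and the caller keeps nothing alive. `.flat_map(|r| r)` is equivalent to `.flatten()` here.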
I'm happy to put out a PR for this.
Please let me know whether this makes sense, or whether you already have a way of doing this.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)