double-free opened a new issue, #6268:
URL: https://github.com/apache/arrow-rs/issues/6268
# Feature Description
I'm effectively using `parquet_derive` in my project, and I found that there
are two inconvenient constraints:
1. The `ParquetRecordReader` enforces the struct to organize fields exactly
in the **same order** in the parquet file.
2. The `ParquetRecordReader` enforces the struct to parse **all fields** in
the parquet file.
As describe in its document:
> Derive flat, simple RecordReader implementations. Works by parsing a
struct tagged with #[derive(ParquetRecordReader)] and emitting the correct
writing code for each field of the struct. Column readers are generated in the
order they are defined.
In my use cases (and I believe these are common requests), user should be
able to read pruned parquet file, and they should have the freedom to
re-organize fields' ordering in decoded struct.
# My Solution
I introduced a `HashMap` to map field name to its index. Of course, it
assumes field name is unique, and this is always true since the current
`parquet_derive` macro is applied to a flat struct without nesting.
# Pros and Cons
Obviously removing those two constraints makes `parquet_derive` a more handy
tool.
But it has some implied changes:
- previously, since the `ParquetRecordReader` relies only on the index of
fields, it allows that a field is named as `abc` to implicitly rename itself to
`bcd` in the encoded struct. After this change, user must guarantee that the
field name in `ParquetRecordReader` to exist in parquet columns.
- I think it is more intuitive and more natural to constrain the "field
name" rather than "index", if we use `ParquetRecordReader` to derive a decoder
macro.
- allowing reading partial parquet file may improve the performance for some
users, but introducing a `HashMap` in the parser may slowdown the function a
bit.
- when the `num_records` in a single parsing call is large enough, the
cost of `HashMap` lookup is negligible.
Both implied changes seem to have a more positive impact than negative
impact. Please review if this is a reasonable feature request.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]