kylebarron commented on code in PR #7370:
URL: https://github.com/apache/arrow-rs/pull/7370#discussion_r2023530167
##########
parquet/src/arrow/arrow_reader/mod.rs:
##########
@@ -43,14 +43,51 @@ mod filter;
 mod selection;
 pub mod statistics;
 
-/// Builder for constructing parquet readers into arrow.
+/// Builder for constructing Parquet readers that decode into [Apache Arrow]
+/// arrays.
 ///
 /// Most users should use one of the following specializations:
 ///
 /// * synchronous API: [`ParquetRecordBatchReaderBuilder::try_new`]
 /// * `async` API: [`ParquetRecordBatchStreamBuilder::new`]
 ///
+/// # Features
+/// * Projection pushdown: [`Self::with_projection`]
+/// * Cached metadata: [`ArrowReaderMetadata::load`]
+/// * Offset skipping: [`Self::with_offset`] and [`Self::with_limit`]
+/// * Row group filtering: [`Self::with_row_groups`]
+/// * Range filtering: [`Self::with_row_selection`]
+/// * Row level filtering: [`Self::with_row_filter`]
+///
+/// # Implementing Predicate Pushdown
+///
+/// [`Self::with_row_filter`] permits filter evaluation *during* the decoding
+/// process, which is efficient and allows the most low level optimizations.
+///
+/// However, most Parquet based systems will apply filters at many steps prior
+/// to decoding such as pruning files, row groups and data pages. This crate
+/// provides the low level APIs needed to implement such filtering, but not
+/// include any logic to actually evaluate predicates. For example:
+///
+/// * [`Self::with_row_groups`] for Row Group pruning

Review Comment:
```suggestion
/// * [`Self::with_row_groups`] for Row Group pruning
```
