alamb commented on code in PR #6068:
URL: https://github.com/apache/arrow-rs/pull/6068#discussion_r1684914725

##########
parquet/src/file/page_index/index_reader.rs:
##########
@@ -81,9 +82,9 @@ pub fn read_columns_indexes<R: ChunkReader>(
 /// Return an empty vector if this row group does not contain an
 /// [`OffsetIndex]`.
 ///
-/// See [Column Index Documentation] for more details.
+/// See [Page Index Documentation] for more details.
 ///
-/// [Column Index Documentation]: https://github.com/apache/parquet-format/blob/master/PageIndex.md
+/// [Page Index Documentation]: https://github.com/apache/parquet-format/blob/master/PageIndex.md
 pub fn read_pages_locations<R: ChunkReader>(

Review Comment:
   Any chance you can file a ticket that tracks the work to do (one never knows, sometimes other people show up and help with things 🎣 )

##########
parquet/src/file/page_index/offset_index.rs:
##########
@@ -0,0 +1,50 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! [`OffsetSizeIndex`] structure holding decoded [`OffsetIndex`] information
+
+use crate::errors::ParquetError;
+use crate::format::{OffsetIndex, PageLocation};
+
+/// [`OffsetIndex`] information for a column chunk. Contains offsets and sizes for each page
+/// in the chunk. Optionally stores fully decoded page sizes for BYTE_ARRAY columns.
+#[derive(Debug, Clone, PartialEq)]
+pub struct OffsetSizeIndex {

Review Comment:
   > OffsetIndex has the same data members as OffsetSizeIndex, so is it worth defining a new struct? It seems there's precedent for using the thrift objects from format elsewhere in the metadata.

   I think we should define a new Rust struct so that we aren't tied to whatever thrift generates. The existing naming of the metadata structs is terribly confusing in my mind, so whatever we can do to make it better (at least not worse) would be good

   Let's go with `OffsetIndexMetaData` for now, and maybe we can deprecate the existing `ParquetOffsetIndex` typedef in `file/metadata/mod.rs` with a note that we eventually plan to rename `OffsetIndexMetaData` to `ParquetOffsetIndex`

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
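
For illustration only, the suggestion in the review comment above (a crate-owned struct instead of the thrift-generated one, plus a deprecated typedef) could be sketched roughly as follows. This is not the code from the PR. The `PageLocation` and `OffsetIndex` types here are simplified stand-ins for the thrift-generated types in `crate::format`, the `try_new` constructor and its length check are hypothetical, a plain `String` error stands in for `ParquetError`, and the shape of the `ParquetOffsetIndex` alias and the wording of the deprecation note are assumptions.

```rust
// Simplified stand-ins for the thrift-generated types in `crate::format`.
// In the real crate these are generated code; here they are hand-written
// so the sketch compiles on its own.
#[derive(Debug, Clone, PartialEq)]
pub struct PageLocation {
    pub offset: i64,
    pub compressed_page_size: i32,
    pub first_row_index: i64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct OffsetIndex {
    pub page_locations: Vec<PageLocation>,
    pub unencoded_byte_array_data_bytes: Option<Vec<i64>>,
}

/// Crate-owned view of the offset index for one column chunk.
/// Fields are private so the public API is not tied to what thrift generates.
#[derive(Debug, Clone, PartialEq)]
pub struct OffsetIndexMetaData {
    page_locations: Vec<PageLocation>,
    unencoded_byte_array_data_bytes: Option<Vec<i64>>,
}

impl OffsetIndexMetaData {
    /// Hypothetical constructor: converts the decoded thrift struct.
    /// The length check is an assumption of this sketch (one size per page).
    pub fn try_new(index: OffsetIndex) -> Result<Self, String> {
        if let Some(sizes) = &index.unencoded_byte_array_data_bytes {
            if sizes.len() != index.page_locations.len() {
                return Err(format!(
                    "expected {} unencoded sizes, found {}",
                    index.page_locations.len(),
                    sizes.len()
                ));
            }
        }
        Ok(Self {
            page_locations: index.page_locations,
            unencoded_byte_array_data_bytes: index.unencoded_byte_array_data_bytes,
        })
    }

    /// Offset, compressed size, and first row index of each page.
    pub fn page_locations(&self) -> &[PageLocation] {
        &self.page_locations
    }

    /// Unencoded BYTE_ARRAY data sizes per page, if they were written.
    pub fn unencoded_byte_array_data_bytes(&self) -> Option<&[i64]> {
        self.unencoded_byte_array_data_bytes.as_deref()
    }
}

/// Existing typedef kept for compatibility (its shape here is an assumption).
#[deprecated(note = "use `OffsetIndexMetaData`; it may later be renamed to `ParquetOffsetIndex`")]
pub type ParquetOffsetIndex = Vec<Vec<Vec<PageLocation>>>;
```

Keeping the fields private and exposing accessors means a later change in the thrift definition would only touch the conversion in `try_new`, not downstream callers.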
