wgtmac commented on code in PR #17877:
URL: https://github.com/apache/arrow/pull/17877#discussion_r1113841086
##########
cpp/src/parquet/column_reader.h:
##########
@@ -364,10 +373,15 @@ class PARQUET_EXPORT RecordReader {
   }
   /// \brief Decoded values, including nulls, if any
+  /// FLBA and ByteArray types do not use this array and read into their own
+  /// builders.
   uint8_t* values() const { return values_->mutable_data(); }
-  /// \brief Number of values written including nulls (if any)
-  /// There is no read-ahead/buffering for values.
+  /// \brief Number of values written, including space left for nulls if any.
+  /// If this Reader was constructed with read_dense_for_nullable(), there is no space for

Review Comment:
   I didn't see any function named `read_dense_for_nullable()`. Should we use `read_dense_for_nullable_` instead?

##########
cpp/src/parquet/column_reader.cc:
##########
@@ -1803,7 +1805,104 @@ class TypedRecordReader : public TypedColumnReaderImpl<DType>,
     CheckNumberDecoded(num_decoded, values_to_read);
   }
-  // Return number of logical records read
+  // Reads repeated records and returns number of records read. Fills in
+  // values_to_read and null_count.
+  int64_t ReadRepeatedRecords(int64_t num_records, int64_t* values_to_read,
+                              int64_t* null_count) {
+    const int64_t start_levels_position = levels_position_;
+    // Note that repeated records may be required or nullable. If they have
+    // an optional parent in the path, they will be nullable, otherwise,
+    // they are required. We use leaf_info_->HasNullableValues() that looks
+    // at repeated_ancestor_def_level to determine if it is required or
+    // nullable. Even if they are required, we may have to read ahead and
+    // delimit the records to get the right number of values and they will
+    // have associated levels.
+    int64_t records_read = DelimitRecords(num_records, values_to_read);
+    if (!nullable_values()) {

Review Comment:
   When will this branch be hit?

--
This is an automated message from the Apache Git Service.