alamb commented on code in PR #2116:
URL: https://github.com/apache/arrow-rs/pull/2116#discussion_r927056687
##########
parquet/src/column/reader/decoder.rs:
##########
@@ -277,25 +288,25 @@ enum LevelDecoderInner {
impl ColumnLevelDecoder for ColumnLevelDecoderImpl {
type Slice = [i16];
- fn new(max_level: i16, encoding: Encoding, data: ByteBufferPtr) -> Self {
- let bit_width = num_required_bits(max_level as u64);
+ fn set_data(&mut self, encoding: Encoding, data: ByteBufferPtr) {
Review Comment:
I am not an expert in this area, but the new code structure makes sense
to me.
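
For readers following along, here is a rough sketch of the shape this hunk suggests: the bit width is derived once from `max_level` at construction, and `set_data` only swaps in the next page's bytes. The struct `LevelDecoderSketch`, its fields, the simplified `Encoding` enum, and the local `num_required_bits` helper are all hypothetical stand-ins written for illustration; they are not the types from the PR.

```rust
/// Hypothetical, simplified stand-in for the parquet `Encoding` enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Encoding {
    Rle,
    BitPacked,
}

/// Number of bits needed to represent `x` (0 when `x == 0`).
/// Local helper for this sketch only.
fn num_required_bits(x: u64) -> u8 {
    (64 - x.leading_zeros()) as u8
}

/// Hypothetical decoder that is created once per column and reused per page.
struct LevelDecoderSketch {
    /// Derived from the maximum level; does not change between pages.
    bit_width: u8,
    /// Encoding of the current page, if any page has been set.
    encoding: Option<Encoding>,
    /// Raw level bytes of the current page.
    data: Vec<u8>,
}

impl LevelDecoderSketch {
    /// Compute the per-column state once.
    fn new(max_level: i16) -> Self {
        Self {
            bit_width: num_required_bits(max_level as u64),
            encoding: None,
            data: Vec::new(),
        }
    }

    /// Replace only the per-page state; `bit_width` is left untouched.
    fn set_data(&mut self, encoding: Encoding, data: Vec<u8>) {
        self.encoding = Some(encoding);
        self.data = data;
    }
}
```

Under these assumptions, one decoder can be handed successive pages via `set_data` without recomputing anything that depends on `max_level`.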
##########
parquet/src/arrow/array_reader/byte_array.rs:
##########
@@ -127,15 +124,11 @@ impl<I: OffsetSizeTrait + ScalarValue> ArrayReader for ByteArrayReader<I> {
}
fn get_def_levels(&self) -> Option<&[i16]> {
- self.def_levels_buffer
- .as_ref()
- .map(|buf| buf.typed_data())
+ self.def_levels_buffer.as_ref().map(|buf| buf.typed_data())
Review Comment:
It is not entirely clear to me why the formatting changed on these lines.
It is not a bad change, but it does not look like a semantic change
either 🤷
##########
parquet/src/arrow/record_reader/mod.rs:
##########
@@ -76,33 +77,8 @@ where
{
/// Create a new [`GenericRecordReader`]
pub fn new(desc: ColumnDescPtr) -> Self {
- Self::new_with_options(desc, false)
- }
-
- /// Create a new [`GenericRecordReader`] with the ability to only generate the bitmask
- ///
- /// If `null_mask_only` is true only the null bitmask will be generated and
- /// [`Self::consume_def_levels`] and [`Self::consume_rep_levels`] will always return `None`
- ///
- /// It is insufficient to solely check that that the max definition level is 1 as we
- /// need there to be no nullable parent array that will required decoded definition levels
- ///
- /// In particular consider the case of:
- ///
- /// ```ignore
- /// message nested {
- /// OPTIONAL Group group {
- /// REQUIRED INT32 leaf;
- /// }
- /// }
- /// ```
- ///
- /// The maximum definition level of leaf is 1, however, we still need to decode the
- /// definition levels so that the parent group can be constructed correctly
- ///
- pub(crate) fn new_with_options(desc: ColumnDescPtr, null_mask_only: bool) -> Self {
let def_levels = (desc.max_def_level() > 0)
- .then(|| DefinitionLevelBuffer::new(&desc, null_mask_only));
+ .then(|| DefinitionLevelBuffer::new(&desc, packed_null_mask(&desc)));
Review Comment:
Is this the key change in this PR: that the decision to use a null mask
is pushed down to this level?
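
To make the question concrete, here is a minimal sketch of the kind of check a helper like `packed_null_mask(&desc)` could perform, going only by the doc comment removed in this hunk: a max definition level of 1 is not enough on its own, because a `REQUIRED` leaf inside an `OPTIONAL` group also has max definition level 1. The `ColumnInfo` struct and its fields are hypothetical stand-ins for the column descriptor, and the function body is my assumption, not code taken from the PR.

```rust
/// Hypothetical, simplified stand-in for a Parquet column descriptor.
struct ColumnInfo {
    /// Maximum definition level of the leaf column.
    max_def_level: i16,
    /// Maximum repetition level of the leaf column.
    max_rep_level: i16,
    /// Whether the leaf itself (not a parent group) is OPTIONAL.
    leaf_is_optional: bool,
}

/// Assumed rule: a packed null bitmask is enough only when the single
/// definition level comes from the leaf's own nullability and the column
/// is not repeated. Otherwise a parent needs the decoded levels.
fn packed_null_mask(col: &ColumnInfo) -> bool {
    col.max_def_level == 1 && col.max_rep_level == 0 && col.leaf_is_optional
}
```

With this rule, a top-level `OPTIONAL INT32` column would get the packed mask, while the `REQUIRED INT32 leaf` inside `OPTIONAL Group group` from the removed doc comment would not, even though both have a max definition level of 1.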
##########
parquet/src/column/reader.rs:
##########
@@ -195,7 +195,6 @@ where
///
/// `values` will be contiguously populated with the non-null values. Note that if the column
/// is not required, this may be less than either `batch_size` or the number of levels read
- #[inline]
Review Comment:
As in "when you leave `#[inline]` on, the benchmarks get slower"?