AshinGau commented on code in PR #17054:
URL: https://github.com/apache/doris/pull/17054#discussion_r1115206899
##########
be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:
##########
@@ -91,23 +105,42 @@ Status ColumnChunkReader::load_page_data() {
RETURN_IF_ERROR(_block_compress_codec->decompress(compressed_data, &_page_data));
} else {
RETURN_IF_ERROR(_page_reader->get_page_data(_page_data));
+ if (header.__isset.data_page_header_v2) {
Review Comment:
This is the same logic as for compressed data; maybe we can extract it into a function.
##########
be/src/vec/exec/format/parquet/level_decoder.cpp:
##########
@@ -63,6 +63,20 @@ doris::Status doris::vectorized::LevelDecoder::init(doris::Slice* slice,
return Status::OK();
}
+doris::Status doris::vectorized::LevelDecoder::init_v2(doris::Slice* levels,
+ doris::vectorized::level_t max_level,
+ uint32_t num_levels) {
+ _encoding = tparquet::Encoding::RLE;
+ _bit_width = BitUtil::log2(max_level + 1);
+ _max_level = max_level;
+ _num_levels = num_levels;
+ size_t byte_length = levels->size;
+ _rle_decoder = RleDecoder<level_t>((uint8_t*)levels->data, byte_length, _bit_width);
+ levels->data += byte_length;
+ levels->size -= byte_length;
Review Comment:
The parameter `levels` is not used again after this function returns, so there is
no need to update its `data` and `size`.
##########
be/src/vec/exec/format/parquet/level_decoder.h:
##########
@@ -27,6 +27,8 @@
namespace doris::vectorized {
+static const size_t V1_LEVEL_SIZE = 4;
Review Comment:
Use `constexpr`, and consider renaming it to `V1_SIZE_LENGTH`.
The constant should be a static member of `LevelDecoder`, or be moved into the
cpp file; otherwise it will be a global variable in the `doris::vectorized`
namespace.
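A minimal sketch of the static-member option, assuming the constant is the 4-byte little-endian length prefix that precedes RLE-encoded levels in a v1 data page. The `sketch` namespace and the `read_v1_length` helper are hypothetical and exist only to show the constant in use; they are not part of the PR.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {

class LevelDecoder {
public:
    // Scoped to the class, so it no longer leaks into the enclosing namespace.
    // Assumption: 4 is the byte width of the v1 level-length prefix.
    static constexpr std::size_t V1_SIZE_LENGTH = 4;

    // Hypothetical helper: decodes the little-endian length prefix.
    // Returns false when the buffer is shorter than the prefix.
    static bool read_v1_length(const std::uint8_t* data, std::size_t size,
                               std::uint32_t* out) {
        if (size < V1_SIZE_LENGTH) {
            return false;
        }
        std::uint32_t value = 0;
        for (std::size_t i = 0; i < V1_SIZE_LENGTH; ++i) {
            value |= static_cast<std::uint32_t>(data[i]) << (8 * i);
        }
        *out = value;
        return true;
    }
};

} // namespace sketch
```

The alternative is to define the constant in an unnamed namespace inside `level_decoder.cpp`, which keeps it out of the header entirely.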
##########
be/src/vec/exec/format/parquet/vparquet_column_chunk_reader.cpp:
##########
@@ -83,6 +87,16 @@ Status ColumnChunkReader::load_page_data() {
if (_block_compress_codec != nullptr) {
Review Comment:
Page v2 uses `is_compressed` to identify whether the page is compressed.
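A rough sketch of the check being suggested, using simplified stand-in structs rather than the Thrift-generated `tparquet` types. `has_data_page_header_v2` stands in for `header.__isset.data_page_header_v2`, and `page_needs_decompress` is a hypothetical helper name, not something in the PR.

```cpp
namespace sketch {

// Stand-in for the v2 data page header; the Parquet format defaults
// is_compressed to true when the field is absent.
struct DataPageHeaderV2 {
    bool is_compressed = true;
};

// Stand-in for the page header.
struct PageHeader {
    bool has_data_page_header_v2 = false;
    DataPageHeaderV2 data_page_header_v2;
};

// Decide whether the page payload must go through the codec.
// chunk_has_codec mirrors `_block_compress_codec != nullptr`.
inline bool page_needs_decompress(const PageHeader& header, bool chunk_has_codec) {
    if (!chunk_has_codec) {
        return false; // the column chunk is stored uncompressed
    }
    if (header.has_data_page_header_v2) {
        // A v2 page can opt out of compression even when the chunk has a codec.
        return header.data_page_header_v2.is_compressed;
    }
    return true; // v1 pages follow the chunk-level codec
}

} // namespace sketch
```

With a check like this, `load_page_data()` would branch on the result instead of only on the codec pointer.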
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]