LuciferYang opened a new pull request, #3446:
URL: https://github.com/apache/parquet-java/pull/3446
### Rationale for this change
Fixes #3307
When repetition/definition levels are empty (i.e., `maxLevel == 0` for
required, non-repeated fields), `DevNullValuesWriter` is used as a no-op writer
that produces zero bytes. However, its `getEncoding()` method returns the
deprecated `BIT_PACKED` encoding, which gets written into the `DataPageHeader`
metadata (`repetition_level_encoding` / `definition_level_encoding`) and the
column chunk encoding list.
Since parquet-java already writes levels with the `RLE` encoding, the
metadata for empty levels should report `RLE` as well rather than the
deprecated `BIT_PACKED`.
### What changes are included in this PR?
Changed `DevNullValuesWriter.getEncoding()` to return `Encoding.RLE` instead
of `Encoding.BIT_PACKED`.
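
For illustration, here is a minimal, self-contained sketch of the idea. It does not use the real parquet-java classes: `Encoding`, `LevelWriter`, `DevNullLevelWriter`, and `reportedEncodings` are simplified, hypothetical stand-ins, and the actual `DevNullValuesWriter` has more methods than shown here.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class DevNullEncodingSketch {
    // Simplified stand-in for the Parquet encoding enum (only two values shown).
    enum Encoding { RLE, BIT_PACKED }

    // Hypothetical minimal writer interface, reduced to what this sketch needs.
    interface LevelWriter {
        void writeInteger(int value);
        byte[] getBytes();
        Encoding getEncoding();
    }

    // No-op writer for the maxLevel == 0 case: accepts values, produces zero bytes.
    static final class DevNullLevelWriter implements LevelWriter {
        @Override public void writeInteger(int value) { /* value is discarded */ }
        @Override public byte[] getBytes() { return new byte[0]; }
        // Before the change described above, this returned Encoding.BIT_PACKED.
        @Override public Encoding getEncoding() { return Encoding.RLE; }
    }

    // Hypothetical helper: the set of level encodings that would be recorded in metadata.
    static Set<Encoding> reportedEncodings(LevelWriter repetition, LevelWriter definition) {
        Set<Encoding> encodings = new LinkedHashSet<>();
        encodings.add(repetition.getEncoding());
        encodings.add(definition.getEncoding());
        return encodings;
    }

    public static void main(String[] args) {
        LevelWriter repetition = new DevNullLevelWriter();
        LevelWriter definition = new DevNullLevelWriter();
        repetition.writeInteger(0);
        definition.writeInteger(0);
        System.out.println("level bytes written: " + repetition.getBytes().length);
        System.out.println("reported encodings: " + reportedEncodings(repetition, definition));
    }
}
```

In this sketch the only behavioral difference is the value returned by `getEncoding()`; the bytes produced stay empty either way.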
### Are these changes tested?
Yes. All existing tests in `parquet-column` and `parquet-hadoop` pass
without modification.
### Are there any user-facing changes?
Newly written Parquet files will report `RLE` instead of `BIT_PACKED` in
page header metadata for empty repetition/definition levels. This has no impact
on file compatibility: readers do not decode level data when the byte length
is zero, regardless of the encoding value in the header.
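
The compatibility argument can be sketched as follows. This is an illustrative model only, not the parquet-java reader: `LevelReadSketch` and `readLevels` are hypothetical names, and real level decoding is deliberately left out.

```java
class LevelReadSketch {
    // Simplified stand-in for the Parquet encoding enum (only two values shown).
    enum Encoding { RLE, BIT_PACKED }

    // Hypothetical reader helper: returns one level per value in the page.
    // When the level section is empty, every level is 0 and the declared
    // encoding is never consulted.
    static int[] readLevels(Encoding declared, byte[] levelBytes, int valueCount) {
        if (levelBytes.length == 0) {
            return new int[valueCount]; // Java zero-initializes the array
        }
        // Real decoding of non-empty level data is outside the scope of this sketch.
        throw new UnsupportedOperationException("decoding with " + declared + " omitted in this sketch");
    }

    public static void main(String[] args) {
        int[] oldHeader = readLevels(Encoding.BIT_PACKED, new byte[0], 3);
        int[] newHeader = readLevels(Encoding.RLE, new byte[0], 3);
        System.out.println(java.util.Arrays.equals(oldHeader, newHeader));
    }
}
```

Under this assumption, a page whose header declares `BIT_PACKED` and one that declares `RLE` yield the same all-zero levels when no level bytes are present.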