[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687228#comment-17687228 ]
ASF GitHub Bot commented on PARQUET-2241:
-----------------------------------------

emkornfield commented on PR #192:
URL: https://github.com/apache/parquet-format/pull/192#issuecomment-1426163629

   Seems OK to me.


> ByteStreamSplitDecoder broken in presence of nulls
> --------------------------------------------------
>
>                 Key: PARQUET-2241
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2241
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format, parquet-mr
>    Affects Versions: format-2.8.0
>            Reporter: Xuwei Fu
>            Priority: Major
>
>
> This problem is shown in this issue:
> https://github.com/apache/arrow/issues/15173
> Let me describe it briefly:
> * The encoder doesn't write "num_values" in the page payload for
> BYTE_STREAM_SPLIT, but it uses "num_values" as the stride of the encoding.
> * When decoding a DATA_PAGE_V2, the decoder can know the num_values and
> num_nulls in the page. In a DATA_PAGE_V1 without statistics, however, we
> have to read the def-levels and rep-levels to get the real number of values.
> Without that number, we aren't able to decode BYTE_STREAM_SPLIT correctly.
>
> The bug-reproducing code is in the issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
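
To illustrate the stride problem described in the report, here is a minimal Python sketch of BYTE_STREAM_SPLIT for 4-byte floats. The helper names (bss_encode, bss_decode) and the sample data are hypothetical and are not taken from parquet-mr or Arrow. The sketch only shows that the decoder has to use the number of non-null values as the stride, not the number of level slots in the page.

```python
import struct

WIDTH = 4  # bytes per FLOAT value


def bss_encode(values):
    """Hypothetical encoder: byte k of every value goes into stream k,
    and the streams are concatenated. No value count is written."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return b"".join(raw[k::WIDTH] for k in range(WIDTH))


def bss_decode(data, stride, count):
    """Hypothetical decoder: byte k of value i is read from
    data[k * stride + i], so the stride must equal the encoded count."""
    out = bytearray()
    for i in range(count):
        for k in range(WIDTH):
            out.append(data[k * stride + i])
    return list(struct.unpack(f"<{count}f", bytes(out)))


# A page with 7 level slots, one of which is null: only 6 values are encoded.
slots = [1.0, None, 2.0, 3.0, 4.0, 5.0, 6.0]
present = [v for v in slots if v is not None]
page = bss_encode(present)

# Correct: the stride is the number of non-null values.
good = bss_decode(page, stride=len(present), count=len(present))

# Wrong: the stride is the slot count including the null, so the bytes of
# the first value are gathered from the wrong offsets.
bad_first = bss_decode(page, stride=len(slots), count=1)

print(good)
print(bad_first)
```

With a DATA_PAGE_V1 and no statistics, the non-null count used as the stride above is what has to be derived from the definition levels before the values can be decoded.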