[ https://issues.apache.org/jira/browse/PARQUET-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686941#comment-17686941 ]
Gang Wu commented on PARQUET-2241:
----------------------------------

Have you seen any related issues in production? [~gershinsky] [~gszadovszky] [~sha...@uber.com] [~chaosun]

> ByteStreamSplitDecoder broken in presence of nulls
> --------------------------------------------------
>
>                 Key: PARQUET-2241
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2241
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>    Affects Versions: format-2.8.0
>            Reporter: Xuwei Fu
>            Priority: Major
>             Fix For: format-2.10.0
>
>
> This problem is shown in this issue:
> https://github.com/apache/arrow/issues/15173
> Let me describe it briefly:
> * The encoder doesn't write "num_values" into the page payload for
> BYTE_STREAM_SPLIT, but it uses "num_values" as the stride in BYTE_STREAM_SPLIT.
> * When decoding a DATA_PAGE_V2, the decoder can know the num_values and
> num_nulls in the page. In a DATA_PAGE_V1 without statistics, however, we have
> to read the def-levels and rep-levels to get the real number of values.
> Without the number of values, we aren't able to decode BYTE_STREAM_SPLIT
> correctly.
>
> The bug-reproducing code is in the issue.
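
The stride dependence described in the issue can be illustrated with a minimal Python sketch. It is not the Arrow or parquet-mr implementation: the helper names `bss_encode` and `bss_decode`, the float32 column, and the sample definition levels are all assumptions made for illustration. It only shows that a decoder using the row count (including nulls) as the stride reads bytes from the wrong positions, while one that first counts non-null values from the definition levels decodes correctly.

```python
import struct

WIDTH = 4  # bytes per float32 value


def bss_encode(values):
    """Hypothetical BYTE_STREAM_SPLIT encoder: stream k holds byte k of every value."""
    raw = [struct.pack("<f", v) for v in values]
    return b"".join(bytes(r[k] for r in raw) for k in range(WIDTH))


def bss_decode(data, stride, count):
    """Decode `count` float32 values, treating `stride` as the length of each stream."""
    out = []
    for i in range(count):
        raw = bytes(data[k * stride + i] for k in range(WIDTH))
        out.append(struct.unpack("<f", raw)[0])
    return out


# Assumed page: 7 rows, one of them null, so only 6 values are encoded.
def_levels = [1, 1, 0, 1, 1, 1, 1]  # 0 marks the null row (max definition level is 1)
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
page = bss_encode(values)

# Correct: the stride is the number of non-null values, derived from the def-levels.
num_non_null = sum(1 for d in def_levels if d == 1)
correct = bss_decode(page, stride=num_non_null, count=num_non_null)

# Wrong: the stride is the row count including the null. Only the first 3 values
# are read here so that every index stays inside the buffer.
wrong = bss_decode(page, stride=len(def_levels), count=3)

print(correct)
print(wrong)
```

In this sketch the payload carries no length field of its own, so the only way to recover the stride is from information outside the encoded bytes, which is the situation the report describes for DATA_PAGE_V1.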