Ty!

On Fri, Nov 17, 2023 at 9:12 PM wish maple <maplewish...@gmail.com> wrote:
> Hi,
>
> The Parquet implementation is divided into an Arrow part and a Parquet part.
>
> 1. At the lowest level of the Parquet part is the Parquet decoder, in [1].
>    Floating-point columns may use PLAIN, RLE_DICTIONARY, or BYTE_STREAM_SPLIT
>    encoding.
> 2. parquet::ColumnReader sits on top of the decoder. Within a row group, a
>    column may have one or two encodings (if dictionary encoding is chosen and
>    then falls back to PLAIN, the column has two encodings in that row group).
>    This is in [2].
>
> The other modules were mentioned by Bryce.
>
> Best,
> Xuwei Fu
>
> [1] https://github.com/apache/arrow/blob/main/cpp/src/parquet/encoding.cc
> [2] https://github.com/apache/arrow/blob/main/cpp/src/parquet/column_reader.cc
>
> Li Jin <ice.xell...@gmail.com> wrote on Sat, Nov 18, 2023 at 05:27:
>
> > Hi,
> >
> > I am currently investigating a null/NaN issue with Parquet and Arrow and
> > was wondering if someone could point me to the code that decodes a Parquet
> > row group into Arrow float/double arrays?
> >
> > Thanks,
> > Li
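For illustration, here is a minimal Python sketch of the float byte layouts behind points 1 and 2 above, written from the Parquet format's description of the encodings rather than from Arrow's C++ code in encoding.cc or column_reader.cc. All function names and the page tuples are hypothetical. Dictionary indices are assumed to be already decoded from the RLE/bit-packed hybrid format. Nulls (definition levels) are not modeled. The sketch only shows that the value bytes, NaN included, map one-to-one onto floats.

```python
import struct
from typing import List, Sequence, Tuple, Union

FLOAT_WIDTH = 4  # Parquet FLOAT is a 4-byte IEEE 754 value


def decode_plain_float(data: bytes, num_values: int) -> List[float]:
    # PLAIN: values are stored back to back as little-endian IEEE 754 floats.
    if len(data) < FLOAT_WIDTH * num_values:
        raise ValueError("not enough bytes for the requested number of values")
    return list(struct.unpack(f"<{num_values}f", data[: FLOAT_WIDTH * num_values]))


def encode_byte_stream_split_float(values: Sequence[float]) -> bytes:
    # BYTE_STREAM_SPLIT: byte k of value i goes to stream k at position i,
    # i.e. to offset k * n + i in the output buffer.
    n = len(values)
    plain = struct.pack(f"<{n}f", *values)
    out = bytearray(FLOAT_WIDTH * n)
    for i in range(n):
        for k in range(FLOAT_WIDTH):
            out[k * n + i] = plain[i * FLOAT_WIDTH + k]
    return bytes(out)


def decode_byte_stream_split_float(data: bytes, num_values: int) -> List[float]:
    # Inverse: gather byte k of value i from offset k * num_values + i,
    # rebuild the PLAIN layout, then reinterpret the bytes as floats.
    if len(data) < FLOAT_WIDTH * num_values:
        raise ValueError("not enough bytes for the requested number of values")
    plain = bytearray(FLOAT_WIDTH * num_values)
    for i in range(num_values):
        for k in range(FLOAT_WIDTH):
            plain[i * FLOAT_WIDTH + k] = data[k * num_values + i]
    return decode_plain_float(bytes(plain), num_values)


# Hypothetical page model (not an Arrow or Parquet API):
#   ("dict", indices)              -> indices into the dictionary, already decoded
#   ("plain", raw_bytes, count)    -> PLAIN pages written after the fallback
Page = Union[Tuple[str, List[int]], Tuple[str, bytes, int]]


def decode_chunk(dictionary: List[float], pages: List[Page]) -> List[float]:
    # One column in one row group: dictionary-encoded pages may be followed by
    # PLAIN pages, so both encodings are handled in a single pass.
    values: List[float] = []
    for page in pages:
        if page[0] == "dict":
            values.extend(dictionary[idx] for idx in page[1])
        elif page[0] == "plain":
            _, raw, count = page
            values.extend(decode_plain_float(raw, count))
        else:
            raise ValueError(f"unknown page kind: {page[0]!r}")
    return values
```

For example, `decode_chunk([0.5, 2.0], [("dict", [1, 0]), ("plain", struct.pack("<f", 3.0), 1)])` yields `[2.0, 0.5, 3.0]`. BYTE_STREAM_SPLIT only reorders bytes, which typically helps a later compression codec. It does not change the values, so a NaN written by the writer comes back as a NaN.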