Hi Vukasin,

The order is not fixed, and we should follow the same order as the *Stream*
fields in the *StripeFooter*.
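
To illustrate, here is a minimal Python sketch of how a reader might assign the
flat position list of one row index entry to a column's streams by walking the
footer in order. The names `Stream`, `split_positions`, and `counts` are
hypothetical and not part of any ORC library, and the number of positions per
stream is passed in by the caller because it depends on the stream kind,
encoding, and compression. The values in the example are made up.

```python
from typing import NamedTuple


class Stream(NamedTuple):
    """Simplified stand-in for a Stream entry in the StripeFooter."""
    kind: str    # e.g. "ROW_INDEX", "LENGTH", "DATA"
    column: int
    length: int


# Assumption: index-section streams carry no positions of their own.
INDEX_KINDS = {"ROW_INDEX", "BLOOM_FILTER", "BLOOM_FILTER_UTF8"}


def split_positions(footer_streams, column, positions, counts):
    """Split one row index entry's positions across the column's streams.

    The streams are visited in StripeFooter order, not in an order assumed
    from the column type. `counts` maps a stream kind to the number of
    positions it uses (supplied by the caller, hypothetical here).
    """
    result = []
    cursor = 0
    for stream in footer_streams:
        if stream.column != column or stream.kind in INDEX_KINDS:
            continue
        n = counts[stream.kind]
        result.append((stream.kind, positions[cursor:cursor + n]))
        cursor += n
    if cursor != len(positions):
        raise ValueError("position count does not match the footer streams")
    return result


# Example footer where LENGTH precedes DATA for string column 1.
footer = [
    Stream("ROW_INDEX", 1, 20),
    Stream("LENGTH", 1, 100),
    Stream("DATA", 1, 400),
]
# Illustrative counts only: 2 positions for LENGTH, 1 for DATA.
split = split_positions(footer, 1, [10, 3, 250], {"LENGTH": 2, "DATA": 1})
```

With this footer, the first two positions go to LENGTH and the last one to
DATA. If the footer listed DATA first, the same code would assign them in the
opposite order.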

Best,
Gang

On Thu, Mar 23, 2023 at 8:23 AM Vukasin Milovanovic
<vmilovano...@nvidia.com.invalid> wrote:

> Hi,
>
> I'm working on an issue in the ORC reader in
> https://github.com/rapidsai/cudf. This reader uses the row index to
> parallelize the reads of row groups on the GPU.
> I've found that the issue stems from the unexpected order of row index
> streams. Namely, the order does not seem to match the order of
> corresponding data stream descriptors in the file footer.
> In this specific case, the file footer contains the LENGTH stream of a string
> column before its DATA stream. However, the row index streams seem to be
> stored in the opposite order.
>
> So, my question is: what is the order of row index streams in an ORC file
> (within each column)? Is it fixed for the given TypeKind, or are they
> indeed ordered to correspond to the data stream order?
>
> Thank you,
> Vukasin
>
