scovich commented on PR #7307: URL: https://github.com/apache/arrow-rs/pull/7307#issuecomment-3313868161
> I think the biggest thing we need to do is to sort out the API for "how does a user request the (virtual) row number column", as today's `ProjectionMask` is insufficient.
>
> @scovich and @etseidl's idea to use some sort of Arrow metadata is interesting, but I am not quite sure how it would look.

The "standard" way in most engines I've seen would (in arrow-rs) be to include an extension type, which the parquet reader recognizes, in the parquet reader's read schema. This is nice because other readers could choose to honor the same extension type and produce row indexes as well.

But it does open the question of whether the parquet reader's output should strip away the metadata (since arguably the row indexes are just normal data once they've been produced) and, if not, how to prevent e.g. writing the values of a "row index" field back to parquet.

Is that a bearable approach? Or should we keep thinking of other ways?
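To make the extension-type idea a bit more concrete, here is a rough, std-only sketch of the three pieces involved: tagging a field in the read schema, having the reader recognize the tag, and the two open questions (stripping the marker on output, refusing to write the column back). It deliberately does not use the arrow-rs types. `Field` here is a simplified stand-in, the extension name `example.parquet.row_index` is made up, and every function name is hypothetical. The only real convention assumed is that Arrow stores an extension type's name under the `ARROW:extension:name` field-metadata key.

```rust
use std::collections::HashMap;

// Arrow's conventional field-metadata key for an extension type name.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";
// Hypothetical extension name; nothing in arrow-rs defines this.
const ROW_INDEX_EXTENSION: &str = "example.parquet.row_index";

/// Simplified stand-in for an Arrow field: a name plus string metadata.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

impl Field {
    fn new(name: &str) -> Self {
        Field { name: name.to_string(), metadata: HashMap::new() }
    }

    /// Tag this field as a virtual row-index column in the read schema.
    fn with_row_index_extension(mut self) -> Self {
        self.metadata
            .insert(EXTENSION_NAME_KEY.to_string(), ROW_INDEX_EXTENSION.to_string());
        self
    }

    /// True if the field carries the (hypothetical) row-index extension name.
    fn is_row_index(&self) -> bool {
        self.metadata.get(EXTENSION_NAME_KEY).map(String::as_str)
            == Some(ROW_INDEX_EXTENSION)
    }
}

/// Reader side: split the read schema into columns decoded from the file
/// and virtual columns the reader would synthesize.
fn split_read_schema(fields: &[Field]) -> (Vec<&Field>, Vec<&Field>) {
    fields.iter().partition(|f| !f.is_row_index())
}

/// One possible answer to the output question: drop the marker so the
/// produced column looks like ordinary data downstream.
fn strip_row_index_marker(field: &Field) -> Field {
    let mut out = field.clone();
    if out.is_row_index() {
        out.metadata.remove(EXTENSION_NAME_KEY);
    }
    out
}

/// The alternative: keep the marker and have a writer skip tagged fields.
fn writable_fields(fields: &[Field]) -> Vec<&Field> {
    fields.iter().filter(|f| !f.is_row_index()).collect()
}
```

The last two functions are mutually exclusive in practice: if the reader strips the marker, a writer can no longer tell the column was virtual, and if the marker is kept, every writer has to know to skip it. That is the trade-off raised in the comment above.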
