scovich commented on PR #7307:
URL: https://github.com/apache/arrow-rs/pull/7307#issuecomment-3313868161

   > I think the biggest thing we need to do is to sort out the API for "how 
does a user request the (virtual) row number column" as today's `ProjectionMask` 
is insufficient
   > 
   > @scovich and @etseidl's idea to use some sort of Arrow metadata is 
interesting, but I am not quite sure how it would look
   
   The "standard" way in most engines I've seen would, translated to arrow-rs, 
be to include in the parquet reader's read schema a field with an extension 
type that the reader recognizes. That's nice because other readers could choose 
to honor the same extension type and produce row indexes as well. But it does 
open the question of whether the parquet reader's output should strip away the 
metadata -- since arguably the row indexes are just normal data once they've 
been produced -- and if not, how to prevent e.g. writing the values of a 
"row index" field back to parquet.
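   
   To make that concrete, here is a rough std-only sketch of the idea. It is 
not arrow-rs code: `Field`, `ColumnSource`, `plan_columns`, `output_field`, 
`check_writable` and the extension name `example.row_index` are all 
hypothetical stand-ins. The only real convention it assumes is that Arrow marks 
extension types with the `ARROW:extension:name` field metadata key.

```rust
use std::collections::HashMap;

// Arrow's field-metadata key for naming an extension type.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";
// Hypothetical extension name; nothing defines this today.
const ROW_INDEX_EXTENSION: &str = "example.row_index";

// Simplified stand-in for an Arrow field: just a name plus metadata.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

impl Field {
    fn new(name: &str) -> Self {
        Field { name: name.to_string(), metadata: HashMap::new() }
    }

    // Tag the field with an extension type name.
    fn with_extension(mut self, extension: &str) -> Self {
        self.metadata
            .insert(EXTENSION_NAME_KEY.to_string(), extension.to_string());
        self
    }

    // True if the field carries the (hypothetical) row index marker.
    fn is_row_index(&self) -> bool {
        self.metadata.get(EXTENSION_NAME_KEY).map(String::as_str)
            == Some(ROW_INDEX_EXTENSION)
    }
}

// Where the reader gets each requested column from.
#[derive(Debug, PartialEq)]
enum ColumnSource {
    File(String), // decode the named column from the file
    RowIndex,     // synthesize row numbers instead
}

// The reader inspects the read schema and recognizes marked fields.
fn plan_columns(read_schema: &[Field]) -> Vec<ColumnSource> {
    read_schema
        .iter()
        .map(|f| {
            if f.is_row_index() {
                ColumnSource::RowIndex
            } else {
                ColumnSource::File(f.name.clone())
            }
        })
        .collect()
}

// Values for the virtual column in a batch starting at `first_row`.
fn row_indexes(first_row: i64, num_rows: usize) -> Vec<i64> {
    (first_row..first_row + num_rows as i64).collect()
}

// Option A: strip the marker on output, so the column is ordinary data.
fn output_field(field: &Field, strip_marker: bool) -> Field {
    let mut out = field.clone();
    if strip_marker && out.is_row_index() {
        out.metadata.remove(EXTENSION_NAME_KEY);
    }
    out
}

// Option B: keep the marker, and have a writer refuse marked fields.
fn check_writable(schema: &[Field]) -> Result<(), String> {
    match schema.iter().find(|f| f.is_row_index()) {
        Some(f) => Err(format!("refusing to write virtual column '{}'", f.name)),
        None => Ok(()),
    }
}

fn main() {
    let read_schema = vec![
        Field::new("id"),
        Field::new("row_idx").with_extension(ROW_INDEX_EXTENSION),
    ];
    println!("{:?}", plan_columns(&read_schema));
    println!("{:?}", row_indexes(100, 3));
    let stripped: Vec<Field> =
        read_schema.iter().map(|f| output_field(f, true)).collect();
    println!("{:?}", check_writable(&read_schema));
    println!("{:?}", check_writable(&stripped));
}
```

   The last two functions correspond to the two sides of the open question: 
either the reader strips the marker and the output is plain data, or the marker 
survives and something downstream has to check for it before writing.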
   
   Is that a bearable approach? Or should we keep thinking of other ways?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
