Re: [PR] Add support for file row numbers in Parquet readers [arrow-rs]

via GitHub Thu, 25 Sep 2025 12:46:48 -0700


scovich commented on code in PR #7307:
URL: https://github.com/apache/arrow-rs/pull/7307#discussion_r2380158074



##########
parquet/src/arrow/array_reader/builder.rs:
##########
@@ -52,12 +70,13 @@ fn build_reader(
     field: &ParquetField,
     mask: &ProjectionMask,
     row_groups: &dyn RowGroups,
+    row_number_column: Option<&str>,

Review Comment:
   That could potentially work, but the problem is that a row number column is
only meaningful in a read request schema. Once row numbers hit the output,
they're just normal int64 values from then on. Things get a lot harder to
reason about if the extension type persists. For example, in a join of multiple
tables where each scan produces row numbers for its respective files, one could
easily end up with two row number columns in the join's output. And the Parquet
writer would definitely need to block writing such columns, or at least strip
away the extension metadata?
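
   To make the concern concrete, here is a minimal std-only sketch. It does not
use the arrow-rs API: `Field`, `join_schema`, `strip_row_number_marker`, and the
extension name `example.row_number` are all hypothetical stand-ins. Only the
`ARROW:extension:name` metadata key follows the Arrow convention for extension
types. The sketch models a row number marker that persists in field metadata,
shows how a join's output schema ends up with two marked columns, and shows the
kind of stripping a writer would need to do.

```rust
use std::collections::HashMap;

// Arrow convention: extension types are tagged in field metadata under this key.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";
// Hypothetical extension name, invented for this sketch only.
const ROW_NUMBER_EXT: &str = "example.row_number";

// Simplified stand-in for a schema field (not arrow_schema::Field).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

impl Field {
    // A plain column with no extension metadata.
    fn new(name: &str) -> Self {
        Field { name: name.to_string(), metadata: HashMap::new() }
    }

    // A column tagged as a row number column via extension metadata.
    fn row_number(name: &str) -> Self {
        let mut field = Field::new(name);
        field
            .metadata
            .insert(EXTENSION_NAME_KEY.to_string(), ROW_NUMBER_EXT.to_string());
        field
    }

    // True if the field still carries the row number marker.
    fn is_row_number(&self) -> bool {
        self.metadata.get(EXTENSION_NAME_KEY).map(String::as_str) == Some(ROW_NUMBER_EXT)
    }
}

// A join's output schema is modeled as the concatenation of both inputs,
// with all field metadata carried along unchanged.
fn join_schema(left: &[Field], right: &[Field]) -> Vec<Field> {
    left.iter().chain(right.iter()).cloned().collect()
}

// Counts how many fields in a schema are still marked as row number columns.
fn count_row_number_fields(schema: &[Field]) -> usize {
    schema.iter().filter(|f| f.is_row_number()).count()
}

// What a writer (or the reader's output path) would have to do: drop the
// marker so the column is just an ordinary int64 column from then on.
fn strip_row_number_marker(schema: &[Field]) -> Vec<Field> {
    schema
        .iter()
        .cloned()
        .map(|mut field| {
            if field.is_row_number() {
                field.metadata.remove(EXTENSION_NAME_KEY);
            }
            field
        })
        .collect()
}
```

   Joining two scans that each emit one marked column yields a schema where
`count_row_number_fields` returns 2, which is the ambiguous state described
above. After `strip_row_number_marker` it returns 0 while the columns themselves
remain.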



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
