scovich commented on code in PR #7307:
URL: https://github.com/apache/arrow-rs/pull/7307#discussion_r2380158074
##########
parquet/src/arrow/array_reader/builder.rs:
##########
@@ -52,12 +70,13 @@ fn build_reader(
field: &ParquetField,
mask: &ProjectionMask,
row_groups: &dyn RowGroups,
+ row_number_column: Option<&str>,
Review Comment:
That could work, but the problem is that a row number column is only
meaningful in a read request schema. Once row numbers hit the output, they're
just normal int64 values from then on. Things get a lot harder to reason about
if the extension type persists. For example, in a join of multiple tables
where each scan produces row numbers for its respective files, one could
easily end up with two row number columns in the join's output. And the parquet
writer would definitely need to block writing such columns, or at least strip
away the metadata?
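
To make the join concern concrete, here is a rough std-only sketch. It does not use the real arrow-rs types: `Field`, `join_schema`, `strip_row_number_marker`, and the extension name `example.row_number` are all hypothetical stand-ins. The only real detail is the `ARROW:extension:name` metadata key that Arrow uses to tag extension types.

```rust
use std::collections::HashMap;

/// Metadata key Arrow uses to record a field's extension type name.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";
/// Hypothetical extension name, for illustration only; the PR does not define it.
const ROW_NUMBER_EXT: &str = "example.row_number";

/// Simplified stand-in for an Arrow field: a name plus string metadata.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

impl Field {
    /// An ordinary column with no metadata.
    fn plain(name: &str) -> Self {
        Field { name: name.to_string(), metadata: HashMap::new() }
    }

    /// A column tagged with the (hypothetical) row number extension type.
    fn row_number(name: &str) -> Self {
        let mut field = Field::plain(name);
        field
            .metadata
            .insert(EXTENSION_NAME_KEY.to_string(), ROW_NUMBER_EXT.to_string());
        field
    }

    fn is_row_number(&self) -> bool {
        self.metadata.get(EXTENSION_NAME_KEY).map(String::as_str) == Some(ROW_NUMBER_EXT)
    }
}

/// Assumed join behavior: the output schema is the left fields followed by
/// the right fields, with metadata carried through unchanged.
fn join_schema(left: &[Field], right: &[Field]) -> Vec<Field> {
    left.iter().chain(right.iter()).cloned().collect()
}

/// How many fields still claim to be row number columns.
fn count_row_number_fields(fields: &[Field]) -> usize {
    fields.iter().filter(|f| f.is_row_number()).count()
}

/// What a writer might do before writing: drop the extension marker so the
/// column is stored as a plain int64 column.
fn strip_row_number_marker(fields: &[Field]) -> Vec<Field> {
    fields
        .iter()
        .cloned()
        .map(|mut field| {
            if field.is_row_number() {
                field.metadata.remove(EXTENSION_NAME_KEY);
            }
            field
        })
        .collect()
}
```

With two scans that each contribute one tagged column, `count_row_number_fields` on the joined schema returns 2, which is exactly the ambiguous state I'd like to avoid. After `strip_row_number_marker` it returns 0 and the columns are plain values again.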