mbutrovich commented on issue #20132:
URL: https://github.com/apache/datafusion/issues/20132#issuecomment-4380997495

   I'd like to open a PR that lands a subset of the work @jkylling prototyped 
in #20133, as physical-layer plumbing below this epic.
   
   **What I'd build.** Extend `TableSchema` with a `virtual_columns: 
Vec<FieldRef>` list. Thread it into `ParquetOpener` via 
`ArrowReaderOptions::with_virtual_columns(...)`, handling the three arrow-rs 
edges (predicate-pushdown schema, projection mask, output stream schema) the 
same way #20133 did. No changes to `ListingTable`, SQL planning, 
`information_schema`, or any `_metadata` surface.
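
   As a rough illustration of the shape described above, here is a minimal, dependency-free sketch. `Field` and `TableSchema` below are hypothetical stand-ins, not the real arrow-rs or DataFusion types, and `split_projection` and `output_fields` are made-up helper names. Placing virtual columns after file columns in the output is also an assumption made only for this example. The sketch shows why the projection-mask edge needs care: a virtual column has no index in the file, so it must not end up in the mask.

```rust
// Hypothetical stand-in for arrow's `FieldRef`; only a name and a type label.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: &'static str,
}

impl Field {
    pub fn new(name: &str, data_type: &'static str) -> Self {
        Self { name: name.to_string(), data_type }
    }
}

// Hypothetical stand-in for DataFusion's `TableSchema`, reduced to two lists.
#[derive(Debug, Clone, Default)]
pub struct TableSchema {
    file_columns: Vec<Field>,
    virtual_columns: Vec<Field>,
}

impl TableSchema {
    pub fn new(file_columns: Vec<Field>) -> Self {
        Self { file_columns, virtual_columns: Vec::new() }
    }

    /// Builder-style setter for the reader-generated columns.
    pub fn with_virtual_columns(mut self, virtual_columns: Vec<Field>) -> Self {
        self.virtual_columns = virtual_columns;
        self
    }

    /// Fields the scan would emit: file columns first, then virtual columns
    /// (the ordering is an assumption of this sketch).
    pub fn output_fields(&self) -> Vec<Field> {
        let mut fields = self.file_columns.clone();
        fields.extend(self.virtual_columns.iter().cloned());
        fields
    }

    /// Split a projection into file-column indices (what a projection mask
    /// could cover) and the virtual columns the reader has to generate.
    /// Names matching neither list are ignored here for brevity.
    pub fn split_projection(&self, projected: &[&str]) -> (Vec<usize>, Vec<Field>) {
        let mut mask_indices = Vec::new();
        let mut requested_virtual = Vec::new();
        for name in projected {
            if let Some(i) = self.file_columns.iter().position(|f| f.name == *name) {
                mask_indices.push(i);
            } else if let Some(v) = self.virtual_columns.iter().find(|f| f.name == *name) {
                requested_virtual.push(v.clone());
            }
        }
        (mask_indices, requested_virtual)
    }
}
```

   With file columns `id` and `name` plus a virtual `row_number`, projecting `["name", "row_number"]` yields a mask over file index 1 only, and `row_number` is handed to the reader separately.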
   
   **Approach.** The core logic would follow @jkylling's #20133, credited in 
the commit message and PR description, but narrower in scope and focused on 
what downstream physical-plan consumers need today.
   
   **Why this is distinct from #20071.** The UDF-rewrite pattern in #20071 
works for file-constant metadata (`input_file_name`, modification time) because 
the opener substitutes a literal. It cannot express reader-generated virtual 
columns like row number or row group index, whose values change per row and are 
produced by the parquet reader itself via 
`ArrowReaderOptions::with_virtual_columns`. The two approaches are 
complementary, not overlapping.
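
   To make the distinction concrete, here is a small sketch in plain Rust. It is not the parquet reader's API; both function names are hypothetical. A file-constant value can be filled in as one literal per file, while a row number depends on which rows survive selection inside the reader, so it cannot be known from the file's identity alone.

```rust
/// File-constant metadata: one literal repeated for every row of the file,
/// so substituting a literal at the opener is enough.
pub fn file_constant_column(file_name: &str, num_rows: usize) -> Vec<String> {
    vec![file_name.to_string(); num_rows]
}

/// Row numbers for the rows that survive a selection. `selected[i]` says
/// whether row `i` of the file is kept; the result holds the original
/// positions of the kept rows, which only the reader can observe.
pub fn row_numbers(selected: &[bool]) -> Vec<i64> {
    selected
        .iter()
        .enumerate()
        .filter(|(_, keep)| **keep)
        .map(|(i, _)| i as i64)
        .collect()
}
```

   For a selection of `[true, false, true, true]`, the kept rows carry row numbers 0, 2, and 3, whereas the file name column would be the same string three times.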
   
   **Why it does not preempt this epic.** Any UX you settle on (relation-scoped 
hidden columns, `_metadata` struct, scalar UDFs) will need some way for the 
opener to produce these columns. `TableSchema::with_virtual_columns` is that 
seam. It makes no commitment to visibility, collision, or naming policy.
   
   **Who is ready to consume it today.**
   - Apache DataFusion Comet (apache/datafusion-comet#3432): the native scan 
currently falls back to Spark whenever `_tmp_metadata_row_index` is requested. 
Comet builds physical plans directly and does not need the SQL UX.
   - Vortex (per @AdamGS above).
   
   #20071 landed the same shape of change (minimal opener-boundary plumbing, 
deferring UX to this epic) with approval from @alamb.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

