kosiew commented on code in PR #21770:
URL: https://github.com/apache/datafusion/pull/21770#discussion_r3122224517


##########
datafusion/physical-plan/src/recursive_query.rs:
##########
@@ -317,6 +317,20 @@ impl RecursiveQueryStream {
         mut batch: RecordBatch,
     ) -> Poll<Option<Result<RecordBatch>>> {
         let baseline_metrics = self.baseline_metrics.clone();
+
+        // Rebind to the declared output schema. The recursive term is planned
+        // independently from the static term and its projection may leave
+        // columns un-aliased (e.g. `upper(r.val)` vs the anchor's
+        // `upper(val) AS val`); downstream consumers that key on
+        // `batch.schema().field(i).name()` (TopK, CSV/JSON writers, custom
+        // collectors) would otherwise see the recursive branch's names leak
+        // through. Logical-plan coercion guarantees matching types, so this
+        // is a zero-copy field rebind.
+        if batch.schema() != self.schema {

Review Comment:
   Nice fix. The schema rebind logic looks correct.
   
   One small thought: Arrow already exposes `RecordBatch::with_schema(...)` for 
this exact zero-copy operation. Using that here might make the intent a bit 
clearer, and it would let Arrow enforce the `target schema is a superset` 
contract directly instead of rebuilding the batch manually with `try_new(...)`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

