gstvg commented on issue #21231:
URL: https://github.com/apache/datafusion/issues/21231#issuecomment-4152648709

   ```
   case x@0
   when y@1 then array_transform(z@2, l@3 -> l@3 + 1)
   end
   ```
   That's how it will look once column capture support gets reintroduced.
   Before projecting, can't we just remove the indices that are out of bounds relative to the batch being projected? Since lambda variables are always pushed to the end of the batch, all of the out-of-bounds indices should refer to lambda variables that are not yet available.
   
   ```rust
       fn case_when_with_expr(
           &self,
           batch: &RecordBatch,
           projected: &ProjectedCaseBody,
       ) -> Result<ColumnarValue> {
           let return_type = self.data_type(&batch.schema())?;
           // projected.projection may include indices of lambda variables that are
           // not yet available in this batch
           let projection = projected
               .projection
               .iter()
               .copied()
               .filter(|index| *index < batch.num_columns())
               .collect::<Vec<_>>();
           if projection.len() < batch.num_columns() {
               let projected_batch = batch.project(&projection)?;
               projected
                   .body
                   .case_when_with_expr(&projected_batch, &return_type)
           } else {
               self.body.case_when_with_expr(batch, &return_type)
           }
       }
   ```
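To sanity-check the filtering rule on its own, here is a minimal sketch that works on plain index vectors instead of a `RecordBatch`. `available_projection` and `needs_projection` are hypothetical helpers, not DataFusion APIs, and the sketch assumes (as above) that every out-of-bounds index belongs to a lambda variable that is not yet available.

```rust
/// Hypothetical helper (not a DataFusion API): keep only the projection
/// indices that exist in a batch with `num_columns` columns. Indices past
/// the end are assumed to be lambda variables that are not yet available.
fn available_projection(projection: &[usize], num_columns: usize) -> Vec<usize> {
    projection
        .iter()
        .copied()
        .filter(|&index| index < num_columns)
        .collect()
}

/// Mirrors the branch above: only project when some batch columns are unused.
fn needs_projection(filtered: &[usize], num_columns: usize) -> bool {
    filtered.len() < num_columns
}

fn main() {
    // Batch holds x, y, z (3 columns); the CASE body also references l@3.
    let filtered = available_projection(&[0, 1, 2, 3], 3);
    println!("{:?} {}", filtered, needs_projection(&filtered, 3)); // [0, 1, 2] false

    // 5-column batch; the CASE uses columns 1 and 3 plus index 7, past the end.
    let filtered = available_projection(&[1, 3, 7], 5);
    println!("{:?} {}", filtered, needs_projection(&filtered, 5)); // [1, 3] true
}
```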
   
   There are cases where variables should be projected:
   ```
   array_transform(
       x@0, 
       a@2 -> case 
           when y@1 > 2 then array_transform(a@2, b@3 -> b@3 +1)
           else array_transform(a@2, c@3 -> c@3 * 2)
       end
   )
   ```
   Here, CaseWhen should project `a@2` but neither `b@3` nor `c@3`.
   To be stricter, we can also modify the tree traversal that collects column indices so that it keeps track of the new lambda variables introduced by lambda nodes and skips projecting them. In the example above, CaseWhen can see the lambdas `b -> b + 1` and `c -> c * 2`, store the names of their parameters (`b` and `c`), and skip collecting the indices of any variable with those names. Since it never sees the lambda `a -> case ...` (the CaseWhen is nested inside it), it should project the index of the variable named `a` normally.
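The stricter traversal could look roughly like the sketch below. It uses a hypothetical toy expression tree (`Expr`, `collect_indices` and `projection_for` are illustrative names, not DataFusion's `PhysicalExpr` API) in which `Var` stands for both regular columns and lambda variables. One design choice differs slightly from the description above: parameter names are bound only while visiting the lambda's own body rather than skipped everywhere, so a same-named variable outside that lambda would still be projected.

```rust
use std::collections::HashSet;

/// Hypothetical toy expression tree (not DataFusion's types).
/// `Var` covers both regular columns (`y@1`) and lambda variables (`a@2`).
enum Expr {
    Var { name: &'static str, index: usize },
    Lambda { params: Vec<&'static str>, body: Box<Expr> },
    /// Any other node (CASE, array_transform, comparisons, ...) reduced to its children.
    Node(Vec<Expr>),
}

fn var(name: &'static str, index: usize) -> Expr {
    Expr::Var { name, index }
}

fn lambda(param: &'static str, body: Expr) -> Expr {
    Expr::Lambda { params: vec![param], body: Box::new(body) }
}

/// Collects variable indices, skipping names bound by lambdas found during
/// the traversal. `bound` holds the lambda parameters currently in scope.
fn collect_indices(expr: &Expr, bound: &HashSet<&'static str>, out: &mut Vec<usize>) {
    match expr {
        Expr::Var { name, index } => {
            if !bound.contains(name) {
                out.push(*index);
            }
        }
        Expr::Lambda { params, body } => {
            // Parameters are only bound while visiting this lambda's body.
            let mut inner = bound.clone();
            inner.extend(params.iter().copied());
            collect_indices(body, &inner, out);
        }
        Expr::Node(children) => {
            for child in children {
                collect_indices(child, bound, out);
            }
        }
    }
}

/// Sorted, deduplicated indices the CaseWhen would project.
fn projection_for(expr: &Expr) -> Vec<usize> {
    let mut out = Vec::new();
    collect_indices(expr, &HashSet::new(), &mut out);
    out.sort_unstable();
    out.dedup();
    out
}

/// The CASE from the example above (literals omitted):
/// case when y@1 > 2 then array_transform(a@2, b@3 -> b@3 + 1)
///      else array_transform(a@2, c@3 -> c@3 * 2) end
fn example_case() -> Expr {
    Expr::Node(vec![
        Expr::Node(vec![var("y", 1)]),
        Expr::Node(vec![var("a", 2), lambda("b", Expr::Node(vec![var("b", 3)]))]),
        Expr::Node(vec![var("a", 2), lambda("c", Expr::Node(vec![var("c", 3)]))]),
    ])
}

fn main() {
    // b@3 and c@3 are bound inside the CASE; a@2 and y@1 are not.
    println!("{:?}", projection_for(&example_case())); // [1, 2]
}
```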
   
   ```
   case x@0
   when y@1 then array_transform(z@2, l@0' -> l@0' + 1)
   end
   ```
   That's how it looks today at https://github.com/apache/datafusion/pull/18921/changes/1b7f4bfd2c3aa9d9e3c431775e199dc4113e27d3. Since there's no column capture for now and no regular column can appear within a lambda body, we simply skip all lambda nodes, so none of the variables in their bodies are visited.
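For comparison, the current behavior (skipping every lambda node) amounts to something like this minimal sketch, again on a hypothetical toy tree rather than DataFusion's actual visitor. Lambda variables such as `l@0'` are modeled as a separate variant because, judging by the `'` suffix, they appear to use their own index space.

```rust
/// Hypothetical toy expression tree (not DataFusion's types).
#[allow(dead_code)] // the lambda variable index and lambda body are never read here
enum Expr {
    /// A regular column such as `x@0`.
    Column(usize),
    /// A lambda variable such as `l@0'`.
    LambdaVar(usize),
    Lambda { body: Box<Expr> },
    /// Any other node (CASE, array_transform, `+`, ...) reduced to its children.
    Node(Vec<Expr>),
}

fn collect_column_indices(expr: &Expr, out: &mut Vec<usize>) {
    match expr {
        Expr::Column(index) => out.push(*index),
        // Lambda variables only occur inside lambda bodies, which are never visited.
        Expr::LambdaVar(_) => {}
        // No column capture yet: nothing inside a lambda body needs projecting,
        // so the whole subtree is skipped.
        Expr::Lambda { .. } => {}
        Expr::Node(children) => {
            for child in children {
                collect_column_indices(child, out);
            }
        }
    }
}

fn column_indices(expr: &Expr) -> Vec<usize> {
    let mut out = Vec::new();
    collect_column_indices(expr, &mut out);
    out
}

/// case x@0 when y@1 then array_transform(z@2, l@0' -> l@0' + 1) end
fn example_case() -> Expr {
    Expr::Node(vec![
        Expr::Column(0),
        Expr::Column(1),
        Expr::Node(vec![
            Expr::Column(2),
            Expr::Lambda { body: Box::new(Expr::Node(vec![Expr::LambdaVar(0)])) },
        ]),
    ])
}

fn main() {
    println!("{:?}", column_indices(&example_case())); // [0, 1, 2]
}
```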
   

