rluvaton commented on issue #21231:
URL: https://github.com/apache/datafusion/issues/21231#issuecomment-4157922228

   The `CASE WHEN` I have a problem with is one inside a lambda function, like:
   ```
   transform(list, x => case when x > col('other_col') then col('other_col') when x < col('another_col') then x else -2 end)
   ```
   
   ------
   
   > Before projecting, can't we just remove the out-of-bounds indexes relative to the batch being projected? Since lambda variables are always pushed to the end of the batch, all of the out-of-bounds ones should refer to lambda variables not yet available.
   
   No, because what if the index is in bounds? A lambda variable's index can still fall inside the batch, and you still don't want to capture it.
   
   ------
   
   The problem, besides the API for detecting custom column implementations and recreating them with a different index, is that you should not traverse the expression through the `children` function, as it can go too deep.
   
   Let's say we have this expression:
   
   ```
   array_transform(
       very_nested_list,
       nested_list =>
           case
               when something
               then transform(
                   concat(nested_list, external_array),
                   list => len(list) + len(nested_list)
               )
           end
   )
   ```
   
   If you collect all expressions that implement `column_index` in the subtree under the `CASE WHEN`, you will get the following expressions:
   ```
   nested_list, external_array, list, nested_list
   ```
   
   The `list` here is not bound to the same schema as the `CASE WHEN`; it belongs to a different schema (that of the inner lambda body), so the index it references might not exist in the schema the `CASE WHEN` is evaluated against.
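   To make the problem concrete, here is a toy model in Rust. It is a sketch under stated assumptions, not DataFusion's `PhysicalExpr` API: the `Expr` enum, its `children` method, and the column indexes are all hypothetical. It assumes the `CASE WHEN` is evaluated against `[very_nested_list@0, external_array@1, nested_list@2]` and that the inner lambda appends `list` at index 3, following the "lambda variables are pushed to the end of the batch" convention quoted earlier; `something` is modeled as an opaque call with no column references. A naive pre-order walk through `children` returns the same four columns listed above, including `list@3`, which does not exist in the `CASE WHEN`'s schema.

   ```rust
   /// Hypothetical, simplified expression tree (NOT DataFusion's `PhysicalExpr`).
   #[allow(dead_code)] // the `name` fields on calls/lambdas only aid readability
   enum Expr {
       /// A column reference, resolved to an index in the schema it is bound to.
       Column { name: &'static str, index: usize },
       /// Any other expression: `concat`, `len`, `+`, `CASE WHEN`, ...
       Call { name: &'static str, args: Vec<Expr> },
       /// A higher-order function like `transform(input, param => body)`.
       /// Assumption: `input` is bound to the caller's schema, while `body` is
       /// bound to the caller's schema with the lambda parameter appended.
       Lambda { name: &'static str, input: Box<Expr>, body: Box<Expr> },
   }

   impl Expr {
       /// Naive `children`: every sub-expression, whatever schema it is bound to.
       fn children(&self) -> Vec<&Expr> {
           match self {
               Expr::Column { .. } => vec![],
               Expr::Call { args, .. } => args.iter().collect(),
               Expr::Lambda { input, body, .. } => vec![input.as_ref(), body.as_ref()],
           }
       }
   }

   /// Pre-order walk collecting every column reference reachable via `children`.
   fn collect_columns(expr: &Expr, out: &mut Vec<(&'static str, usize)>) {
       if let Expr::Column { name, index } = expr {
           out.push((*name, *index));
       }
       for child in expr.children() {
           collect_columns(child, out);
       }
   }

   fn col(name: &'static str, index: usize) -> Expr {
       Expr::Column { name, index }
   }

   fn call(name: &'static str, args: Vec<Expr>) -> Expr {
       Expr::Call { name, args }
   }

   /// The `CASE WHEN` subtree from the example expression.
   fn example_case_when() -> Expr {
       call("case_when", vec![
           call("something", vec![]), // opaque condition, no column references
           Expr::Lambda {
               name: "transform",
               input: Box::new(call("concat", vec![col("nested_list", 2), col("external_array", 1)])),
               // Bound to [.., nested_list@2, list@3], i.e. a *different* schema.
               body: Box::new(call("+", vec![
                   call("len", vec![col("list", 3)]),
                   call("len", vec![col("nested_list", 2)]),
               ])),
           },
       ])
   }

   fn main() {
       let case_schema_width = 3; // assumed width of the CASE WHEN's input schema
       let mut cols = Vec::new();
       collect_columns(&example_case_when(), &mut cols);
       // [("nested_list", 2), ("external_array", 1), ("list", 3), ("nested_list", 2)]
       println!("{:?}", cols);

       // `list@3` was reached through the lambda body and is not in the CASE WHEN's schema.
       let missing: Vec<_> = cols.iter().filter(|(_, i)| *i >= case_schema_width).collect();
       println!("{:?}", missing); // [("list", 3)]
   }
   ```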
   
   So you should **not** traverse/modify the tree through `children`, as that leads to exactly this problem. Instead, you should traverse/modify only through the children that have the same input schema as you; then the innermost `transform` lambda physical expression would not return the expressions inside the lambda body itself, only `concat(nested_list, external_array)`.
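   A sketch of the schema-scoped traversal, using a hypothetical toy expression tree in Rust (`Expr`, `same_schema_children`, and the column indexes are illustrative assumptions, not DataFusion's `PhysicalExpr` API). For a lambda, the accessor yields only the input argument, never the body, because the body is bound to the caller's schema with the lambda variable appended. Walking the `CASE WHEN` subtree from the example then reaches only the columns inside `concat(nested_list, external_array)`.

   ```rust
   /// Hypothetical, simplified expression tree (NOT DataFusion's `PhysicalExpr`).
   #[allow(dead_code)] // names and the lambda body are kept only for readability
   enum Expr {
       Column { name: &'static str, index: usize },
       Call { name: &'static str, args: Vec<Expr> },
       /// `input` shares the caller's schema; `body` is bound to an extended one.
       Lambda { name: &'static str, input: Box<Expr>, body: Box<Expr> },
   }

   impl Expr {
       /// Only the children evaluated against the same input schema as `self`.
       fn same_schema_children(&self) -> Vec<&Expr> {
           match self {
               Expr::Column { .. } => vec![],
               Expr::Call { args, .. } => args.iter().collect(),
               // Skip `body`: its column indexes refer to a different schema.
               Expr::Lambda { input, .. } => vec![input.as_ref()],
           }
       }
   }

   /// Collect the column references bound to the same schema as `expr`.
   fn collect_same_schema_columns(expr: &Expr, out: &mut Vec<(&'static str, usize)>) {
       if let Expr::Column { name, index } = expr {
           out.push((*name, *index));
       }
       for child in expr.same_schema_children() {
           collect_same_schema_columns(child, out);
       }
   }

   fn col(name: &'static str, index: usize) -> Expr {
       Expr::Column { name, index }
   }

   fn call(name: &'static str, args: Vec<Expr>) -> Expr {
       Expr::Call { name, args }
   }

   /// The `CASE WHEN` subtree, assuming its schema is
   /// [very_nested_list@0, external_array@1, nested_list@2] and `list` is @3.
   fn example_case_when() -> Expr {
       call("case_when", vec![
           call("something", vec![]), // opaque condition, no column references
           Expr::Lambda {
               name: "transform",
               input: Box::new(call("concat", vec![col("nested_list", 2), col("external_array", 1)])),
               body: Box::new(call("+", vec![
                   call("len", vec![col("list", 3)]),
                   call("len", vec![col("nested_list", 2)]),
               ])),
           },
       ])
   }

   fn main() {
       let mut cols = Vec::new();
       collect_same_schema_columns(&example_case_when(), &mut cols);
       println!("{:?}", cols); // [("nested_list", 2), ("external_array", 1)]
   }
   ```

   In a real implementation this accessor would presumably be a method on the physical expression trait that lambda-aware expressions override, so each expression decides which of its children share its input schema; that is an assumption about the design, not an existing API.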
   

