2010YOUY01 commented on issue #18070: URL: https://github.com/apache/datafusion/issues/18070#issuecomment-3419730649
> > Could you check the size of the inputs for the NLJoin? If this is NLJoin related it could be similar to [#17547](https://github.com/apache/datafusion/issues/17547) and [#17488](https://github.com/apache/datafusion/issues/17488).
>
> I ran `EXPLAIN ANALYZE`
>
> The relevant part of the plan I think is like this (tiny left input - 1 row, giant right input 22M rows)
>
> ```
> NestedLoopJoinExec
>   ProjectionExec (1 row)
>   CoalesceBatchesExec (21917655 rows)
> ```
>
> The full plan is here:
>
> Full Output

The single row on the left side is a lengthy aggregated array.

The NLJ implementation is:
```
for each right_batch:
    for each left_row:
        join(left_row, right_batch)
```
The `join(left_row, right_batch)` step first repeats `left_row` to the same length as `right_batch`, then applies the join filter. It is easier to implement this way because of a limitation in the current filter API.

In `50.0`, this repeating/copying of the row is done through `to_array_of_size()`:
https://github.com/apache/datafusion/blob/f199b000861360aca01d4f1b9104bf73e9d831cc/datafusion/physical-plan/src/joins/nested_loop_join.rs#L1668

Note that `49.0` also copies the row, but through a different API.

I suspect the cause of the slowdown is that `50.0` performs a deep copy of this lengthy array in the one-row left input, while `49.0` performs a shallow copy.

If that's the case, the easier fix would be to make `to_array_of_size()` do a shallow copy of the inner array/list.

An alternative fix would be to evaluate the filter directly on `row X batch`, without repeating/copying the row to the same size as the batch.
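To make the deep-copy versus shallow-copy distinction concrete, here is a minimal standard-library sketch. It does not use Arrow or DataFusion: `Vec<i64>` and `Arc<[i64]>` are stand-ins for an array payload, the function names `repeat_deep`, `repeat_shallow`, and `filter_row_x_batch` are hypothetical, and the membership predicate is only a placeholder for whatever the real join filter is.

```rust
use std::sync::Arc;

/// Deep copy (assumed model of the slow path): each of the `n` repeated
/// slots gets its own copy of the list payload, so the cost is
/// O(n * row.len()) per right batch.
fn repeat_deep(row: &[i64], n: usize) -> Vec<Vec<i64>> {
    (0..n).map(|_| row.to_vec()).collect()
}

/// Shallow copy (assumed model of the fast path): each slot only bumps a
/// reference count and shares one payload, so the cost is O(n).
fn repeat_shallow(row: &Arc<[i64]>, n: usize) -> Vec<Arc<[i64]>> {
    (0..n).map(|_| Arc::clone(row)).collect()
}

/// Alternative: evaluate the filter on `row X batch` directly, without
/// materializing any repeated copies of the left row.
/// The predicate here (membership in the left list) is a placeholder.
fn filter_row_x_batch(left_row: &[i64], right_batch: &[i64]) -> Vec<bool> {
    right_batch.iter().map(|r| left_row.contains(r)).collect()
}

fn main() {
    // Stand-in for the single left row holding a long aggregated array.
    let left_row: Arc<[i64]> = (0..100_000).collect::<Vec<i64>>().into();
    // Stand-in for one right-side batch.
    let right_batch: Vec<i64> = vec![5, 250_000, 99_999];

    let deep = repeat_deep(&left_row, right_batch.len());
    let shallow = repeat_shallow(&left_row, right_batch.len());

    // Total elements physically copied by the deep path.
    let deep_elems: usize = deep.iter().map(|r| r.len()).sum();
    println!("deep copy materialized {} elements", deep_elems);
    println!("shallow copy holds {} shared handles", shallow.len());
    println!("direct filter: {:?}", filter_row_x_batch(&left_row, &right_batch));
}
```

With a 22M-row right side split into batches, the deep path re-copies the whole left array once per right row, which would match the suspected regression; the shallow path and the direct `row X batch` evaluation both avoid that copy.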
