2010YOUY01 commented on PR #16996: URL: https://github.com/apache/datafusion/pull/16996#issuecomment-3173879649
> Btw, would it be possible to calculate the cost of the join like in https://www.youtube.com/watch?v=RcEW0P8iVTc ?
>
> The video shows multiple implementations of NLJ, explains how to calculate the cost, and describes pseudocode; it would be super useful for the community and for further improvements.
>
> From what I understood, the left side is scanned once and saved entirely in memory. What about right scans? Perhaps in the future we can play with blocks of input left batches to prevent OOM.

Yes, that's exactly the idea for the future memory-limited NLJ implementation: for each group of buffered left batches (kept under the memory limit), do one round of right scan.

I don't think there are many tuning opportunities here, though. Scanning the input would be expensive if it's a Parquet file, so the goal is to minimize the number of right scans, which means we should buffer as many left batches as possible.
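To make the "one right scan per buffered group of left batches" idea concrete, here is a minimal sketch. It is not the DataFusion implementation: `block_nlj`, the `max_buffered_rows` parameter (a row count standing in for a real memory limit), and the plain `Vec` batches are all hypothetical names chosen for illustration.

```rust
/// Hypothetical sketch of a memory-limited (block) nested loop join.
/// Batches are plain vectors and the "memory limit" is a row count;
/// a real operator would track bytes and stream the right side.
/// Returns the matching pairs and the number of right-side scans.
fn block_nlj<L: Copy, R: Copy>(
    left_batches: &[Vec<L>],
    right_batches: &[Vec<R>],
    max_buffered_rows: usize,
    pred: impl Fn(L, R) -> bool,
) -> (Vec<(L, R)>, usize) {
    let mut out = Vec::new();
    let mut right_scans = 0;
    let mut i = 0;

    while i < left_batches.len() {
        // Buffer as many left batches as fit under the limit.
        // At least one batch is always taken so the loop makes progress.
        let mut buffered: Vec<L> = Vec::new();
        while i < left_batches.len()
            && (buffered.is_empty()
                || buffered.len() + left_batches[i].len() <= max_buffered_rows)
        {
            buffered.extend_from_slice(&left_batches[i]);
            i += 1;
        }

        // One full pass over the right input for this buffered group.
        right_scans += 1;
        for right_batch in right_batches {
            for &r in right_batch {
                for &l in &buffered {
                    if pred(l, r) {
                        out.push((l, r));
                    }
                }
            }
        }
    }

    (out, right_scans)
}
```

In this sketch the number of right scans equals the number of buffered left groups, so a larger buffer directly means fewer passes over the right input. That is the reason to buffer as many left batches as the limit allows.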