leanken commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-662286709
> @leanken ... IMHO, this is getting close and really you have done an amazing job not just trying out all the different ideas and approaches but also paying attention to testing etc. > > As for my vote on the question of different node vs hacking at the BHJ: I think I am finally convinced that having a separate node is the way to go here since there is just too much difference wrt to BHJ. Your answer on `isUniqueKeyCodePath` was the clincher for me. It's indeed better to have a different node than making another core node harder to follow. It would also restrict the performance impact of this PR to avoid making any inadvertent regressions into core BHJ. > > My last remaining request is wrt to improving the unit testing coverage. I was wondering if we can copy paste some of the not-in query tests from SubQuerySuite and in addition to testing the results, also assert that the optimization kicks in or not. Ideally it would be great to reuse those tests without duplicating them, but its better to duplicate than not test them with this special optimization !. > > Thanks. add case in SubquerySuite. check on both result and whether the newExecNode hit. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org