Github user rezazadeh commented on the pull request:

https://github.com/apache/spark/pull/4934#issuecomment-77801646

Thank you for this PR @staple !

@mengxr I suggested that @staple first implement this without backtracking, to keep the PR as simple as possible. According to his plots (see JIRA), even without backtracking this PR needs fewer iterations at the same cost per iteration.

Backtracking, however, requires several additional map-reduces per iteration, which makes it unclear when backtracking is worth using. So I suggested first merging the case that is a clear win (fewer iterations at the same cost per iteration).

I think we should merge this without backtracking, and then open another PR to properly evaluate how backtracking affects total cost, with the goal of merging backtracking as well. It seems @staple has already implemented backtracking (he has results for it in the JIRA) but kept it out of this PR for simplicity, so we can tackle that afterwards.
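
To make the cost argument concrete, here is a minimal sketch of why a backtracking line search adds passes over the data. It is not code from the PR: the `PassCounter` class, the one-dimensional least-squares loss, and the step sizes are all hypothetical, plain gradient descent stands in for the PR's optimizer, and each call to `loss` or `grad` stands in for one map-reduce over the dataset.

```python
# Illustrative only: counts full passes over the data (a stand-in for
# map-reduce jobs) with a fixed step size versus a backtracking line search.
# All names and numbers here are hypothetical, not taken from the PR.


class PassCounter:
    """Least-squares problem y ~ w * x that counts full-data evaluations."""

    def __init__(self, xs, ys):
        self.xs, self.ys = xs, ys
        self.passes = 0

    def loss(self, w):
        self.passes += 1  # one full pass over the data
        n = len(self.xs)
        return sum((w * x - y) ** 2 for x, y in zip(self.xs, self.ys)) / (2 * n)

    def grad(self, w):
        self.passes += 1  # one full pass over the data
        n = len(self.xs)
        return sum((w * x - y) * x for x, y in zip(self.xs, self.ys)) / n


def fixed_step(data, w, step, iters):
    """Gradient descent with a fixed step: one pass per iteration."""
    for _ in range(iters):
        w -= step * data.grad(w)
    return w


def backtracking(data, w, step0, iters, shrink=0.5, c=1e-4):
    """Gradient descent with an Armijo backtracking line search.

    Each iteration costs one gradient pass, one loss pass at the current
    point, and one extra loss pass for every trial step size.
    """
    for _ in range(iters):
        g = data.grad(w)
        f = data.loss(w)
        step = step0
        # Shrink the step until the sufficient-decrease condition holds.
        while data.loss(w - step * g) > f - c * step * g * g:
            step *= shrink
        w -= step * g
    return w


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
    fixed = PassCounter(xs, ys)
    fixed_step(fixed, 0.0, 0.1, 20)
    searched = PassCounter(xs, ys)
    backtracking(searched, 0.0, 1.0, 20)
    print("passes with fixed step:  ", fixed.passes)
    print("passes with backtracking:", searched.passes)
```

In this toy setup the fixed-step loop makes exactly one pass per iteration, while the backtracking loop makes at least three. Whether the extra passes are repaid by fewer iterations depends on the problem, which is the evaluation proposed for the follow-up PR.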