Github user rezazadeh commented on the pull request:

    https://github.com/apache/spark/pull/4934#issuecomment-77801646
  
    Thank you for this PR @staple !
    
    @mengxr I suggested that @staple first implement this without backtracking, 
to keep the PR as simple as possible. According to his plots (see JIRA), even 
without backtracking, this PR converges in fewer iterations at the same cost 
per iteration.
    
    Note that backtracking requires several additional map-reduces per 
iteration, so it is unclear when backtracking is worth using. That is why I 
suggested first merging the case that is a clear win (fewer iterations at the 
same cost per iteration). I think we should merge this without backtracking, 
and then open another PR to properly evaluate how backtracking affects total 
cost, with the goal of merging backtracking as well.
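    
    To make the cost argument concrete, here is a minimal single-machine 
sketch in Python. It is not the code in this PR or anything from MLlib: 
`PassCounter`, `agd_fixed`, and `agd_backtracking` are hypothetical names, the 
data set is a toy one-dimensional least-squares problem, and one call to 
`loss_and_grad` stands in for one map-reduce over the data. The backtracking 
rule shown is a plain sufficient-decrease check, assumed here only for 
illustration.

```python
import math


class PassCounter:
    """Toy least-squares problem that counts full passes over the data."""

    def __init__(self, xs, ys):
        self.xs, self.ys = xs, ys
        self.passes = 0  # each pass stands in for one map-reduce

    def loss_and_grad(self, w):
        self.passes += 1
        n = len(self.xs)
        loss = sum((w * x - y) ** 2 for x, y in zip(self.xs, self.ys)) / (2 * n)
        grad = sum((w * x - y) * x for x, y in zip(self.xs, self.ys)) / n
        return loss, grad


def agd_fixed(data, step, iters):
    """Accelerated gradient descent with a fixed step: one pass per iteration."""
    w = y = 0.0
    t = 1.0
    for _ in range(iters):
        _, g = data.loss_and_grad(y)
        w_new = y - step * g
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = w_new + ((t - 1.0) / t_new) * (w_new - w)  # momentum step
        w, t = w_new, t_new
    return w


def agd_backtracking(data, step, iters, shrink=0.5, max_trials=50):
    """Same method, but each trial step costs one extra pass to evaluate the loss."""
    w = y = 0.0
    t = 1.0
    for _ in range(iters):
        f_y, g = data.loss_and_grad(y)
        for _ in range(max_trials):
            w_new = y - step * g
            f_new, _ = data.loss_and_grad(w_new)  # extra pass per trial step
            if f_new <= f_y - 0.5 * step * g * g:  # sufficient decrease
                break
            step *= shrink
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = w_new + ((t - 1.0) / t_new) * (w_new - w)
        w, t = w_new, t_new
    return w


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # y = 2x, so the optimum is w = 2
iters = 10

fixed = PassCounter(xs, ys)
w_fixed = agd_fixed(fixed, step=0.2, iters=iters)

bt = PassCounter(xs, ys)
w_bt = agd_backtracking(bt, step=1.0, iters=iters)

print("fixed step:  ", fixed.passes, "passes")
print("backtracking:", bt.passes, "passes")
```

    In this sketch the fixed-step variant makes exactly one pass per 
iteration, while the backtracking variant makes at least two. Whether the 
extra passes pay for themselves depends on how many iterations backtracking 
saves, which is what a follow-up PR would need to measure.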
    
    It seems @staple has already implemented backtracking (he has results for 
it in the JIRA) but left it out of this PR to keep things simple, so we can 
tackle that afterwards.


