GitHub user squito commented on the issue:

    https://github.com/apache/spark/pull/21346
  
    > is this effectively dead code at this point?
    
    Yes, that's right.  This PR by itself is not useful.  It's a step towards 
https://github.com/apache/spark/pull/21451
    
    This is a good point to put in the PR summary -- I'll do that, and also 
add your summary notes above, if you don't mind.
    
    > what are the major risks of this change in terms of introducing 
performance or correctness issues? If we identify risks (e.g. "this is a 
historically tricky area of code?"), can we mitigate those risks through 
correctness testing / load testing?
    
    I've made an effort to make minimal modifications to all existing code 
paths, to minimize the risk of introducing bugs in current functionality.  My 
intention is to only turn it on by default initially for cases we know would 
fail with the old code -- when the data is larger than 2 GB 
([SPARK-24297](https://issues.apache.org/jira/browse/SPARK-24297)).  I've added 
unit tests and shared the test I'm running on a cluster to find holes in 
functionality (posted on the parent JIRA here: 
https://issues.apache.org/jira/browse/SPARK-6235?focusedCommentId=16484069&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16484069).
  I have not done load testing yet but plan to.  Extra testing, of course, 
would certainly be good.


---

