Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3518#issuecomment-66868725
  
    IMO, this patch still needs a lot of work before it will be ready to merge. 
 I'm not convinced that telling me which RDD referenced the unserializable 
object, by itself, will be a helpful debugging tool.  In many cases, it's 
obvious which object is non-serializable: for instance, say that I try to 
serialize a database connection pool instance.  If it's an 
explicitly-referenced user-created object, then it's usually not too hard to 
find out the source of the reference.  The hard cases are where implicit 
references to non-serializable objects like SparkContext have been included in 
the closure.  In these cases, I might only have one RDD in my dependency chain 
and still run into serialization issues, in which case I don't feel that this 
patch's current approach will be very helpful to me for debugging.  It would be 
much more useful to print the chain of references that leads from the closure 
to the non-serializable object.
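    
    As a rough illustration of what such a reference chain could look like, 
here is a minimal Java sketch. It is not code from this patch or from Spark: 
`RefChain`, `FakeContext`, `Driver`, and `Task` are hypothetical names, with 
`FakeContext` standing in for something like SparkContext. It walks the object 
graph by reflection and ignores custom `writeObject` and `Externalizable` 
logic, so treat it as a simplification.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical helper: finds one path from a root object to the first
// reachable object that does not implement Serializable.
class RefChain {
    /** Stand-in for a context object; deliberately NOT Serializable. */
    static class FakeContext { }

    interface Task extends Serializable { int apply(int x); }

    /** Stand-in for user code that owns the context and creates closures. */
    static class Driver implements Serializable {
        final FakeContext ctx = new FakeContext();
        int offset = 1;

        // The anonymous class reads `offset`, so it implicitly captures the
        // enclosing Driver instance, and through it the non-serializable ctx.
        Task makeTask() {
            return new Task() {
                public int apply(int x) { return x + offset; }
            };
        }
    }

    /** Returns the chain as "field : type" steps, or null if none is found. */
    static List<String> find(Object root) {
        List<String> path = new ArrayList<>();
        path.add(root.getClass().getName());
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        return search(root, path, seen) ? path : null;
    }

    private static boolean search(Object obj, List<String> path, Set<Object> seen) {
        if (!(obj instanceof Serializable)) return true; // found the culprit
        if (!seen.add(obj)) return false;                // already visited
        for (Class<?> c = obj.getClass(); c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int m = f.getModifiers();
                // Static and transient fields are not serialized; primitives are always fine.
                if (Modifier.isStatic(m) || Modifier.isTransient(m)
                        || f.getType().isPrimitive()) continue;
                Object child;
                try {
                    f.setAccessible(true);
                    child = f.get(obj);
                } catch (RuntimeException | IllegalAccessException e) {
                    continue; // e.g. JDK internals that are closed to reflection
                }
                if (child == null) continue;
                path.add(f.getName() + " : " + child.getClass().getName());
                if (search(child, path, seen)) return true;
                path.remove(path.size() - 1);            // backtrack
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> chain = find(new Driver().makeTask());
        System.out.println(chain == null
                ? "no non-serializable reference found"
                : String.join("\n  -> ", chain));
    }
}
```

    With only one closure involved, the printed chain runs from the anonymous 
`Task`, through its synthetic outer-instance field, to the `ctx` field. That 
is the implicit-reference case described above, where knowing which RDD was 
involved would not narrow things down.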
    
    Do you mind running some examples in the `spark-shell` and pasting the 
output generated by this patch?  This would help me and other reviewers to 
assess whether this patch's current approach is useful.
    
    There are also many code style issues here, but I don't want to spend too 
much time commenting on them before we make sure that the high-level approach 
is okay.
    
    Other reviewers: please take a look at the JIRA and chime in here.  Do you 
think that this patch's current functionality is useful, or should we block / 
wait in favor of a more full-featured solution?  I think that we have plenty of 
time before 1.3.0, so I'm in favor of taking more time to implement a more 
full-featured approach since I don't think we're in a huge rush for this 
feature.

