GitHub user ilganeli opened a pull request:

    https://github.com/apache/spark/pull/3518

    [SPARK-3694] RDD and Task serialization debugging output

    Hi all - in addition to what was explicitly requested in the original JIRA, 
I also added a trace of the serialization for RDDs so that you can see which 
specific dependency is unserializable. For debugging task serialization, I 
added debug log output that shows the file and jar dependencies. However, I am 
unsure whether I can add more functionality there. For an RDD, it is possible 
to attempt to serialize each dependency in turn, which is how I identify the 
component that fails. For task debugging, I did not see a straightforward way 
to do the same thing. If anyone can suggest an approach here, I would be happy 
to implement it. 
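
To illustrate the "serialize each dependency in turn" idea, here is a minimal 
standalone sketch in plain Java. It is not the code in this pull request and 
does not use any Spark classes: the class name SerializationTrace, the helper 
names, and the map of named dependencies are all hypothetical and stand in for 
walking an RDD's dependencies.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Spark or of this patch.
public class SerializationTrace {

    /** Returns null if obj serializes cleanly, otherwise a short failure description. */
    static String trySerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return null;
        } catch (NotSerializableException e) {
            // The message is the name of the class that could not be serialized.
            return "not serializable: " + e.getMessage();
        } catch (IOException e) {
            return "I/O failure: " + e.getMessage();
        }
    }

    /** Tries each named dependency in turn and returns the names of those that failed. */
    static List<String> findUnserializable(Map<String, Object> dependencies) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Object> entry : dependencies.entrySet()) {
            String failure = trySerialize(entry.getValue());
            if (failure != null) {
                // Report which component failed and why, as a trace line.
                System.err.println(entry.getKey() + " -> " + failure);
                failed.add(entry.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the trace follows the order given.
        Map<String, Object> deps = new LinkedHashMap<>();
        deps.put("parentA", "a plain string");    // String is Serializable
        deps.put("parentB", new Object());        // java.lang.Object is not
        deps.put("parentC", Integer.valueOf(42)); // Integer is Serializable
        System.out.println(findUnserializable(deps)); // prints [parentB]
    }
}
```

Because each component is serialized on its own, one failure does not hide 
the others, and the report names every unserializable component rather than 
only the first one encountered.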

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ilganeli/spark SPARK-3694B

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3518.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3518
    
----
commit 6c997629e4d3bf9bccfbe9c3fa65aa1afa4bfca0
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2014-10-30T15:02:04Z

    Created class to traverse dependency graph of RDD

commit 47ccc227e5bdf14a1db20edfcf1b8f9c77b3b64a
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2014-10-30T22:06:04Z

    Started walker code

commit a8d5332a71fbad4cca0aa1a7ca73db8e1386e15f
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2014-11-06T18:40:38Z

    RDD WAlker updates

commit a63652f8240e0c370100ab05a11c95beaf47faa5
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2014-11-06T18:42:48Z

    Added debug output to task serialization. Added debug output to RDD 
serialization.

commit 05f2cc0665af3ca297936c8c4c5f6128be5a1ddc
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2014-11-06T18:51:50Z

    Rebase

commit cbb1d771f4576c6ba981252cd8b7490722317ddf
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2014-11-14T19:03:25Z

    Style errors

commit 183100019a0866e515edd0164db9c4c7fdf3ee5f
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2014-11-29T16:21:43Z

    Merge remote-tracking branch 'upstream/master'

commit 916a31c57d89bc6fb83b33fdf70dfc1b94192cc5
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2014-11-29T23:52:00Z

    Manual merge of updates

commit bfb723de65e60aabb9cccc3b45ccc4638f12583d
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2014-11-29T23:55:40Z

    Added helper files

commit e0a81537d5962f8bc79b8b9193a30b46827246ed
Author: Ilya Ganelin <ilya.gane...@capitalone.com>
Date:   2014-11-30T00:45:52Z

    Fixed whitespace errors

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
