[PySpark] - Broadcast Variable Pickle Registry Usage?

2017-05-24 Thread Michael Mansour (CS)
Hi all, I’m poking around the `pyspark.broadcast` module, and I notice that one can pass in a `pickle_registry` and a `path` when constructing a `Broadcast`. The documentation does not explain what the pickle registry is for, so I’m curious how to use it and whether it offers any advantages. Thanks, Michael Mansour

Re: [EXT] Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Michael Mansour (CS)
Debugging PySpark is admittedly difficult, and I agree that the docs can be lacking at times. PySpark docstrings are sometimes missing or incomplete. While I don’t write Scala, I find that the Scala Spark source code and docs can fill in the PySpark gaps, and can only point