Why SparkR didn't reuse PythonRDD
On behalf of Renyi Xiong - While reading the Spark codebase, it looks to me like PythonRDD.scala is reusable. I wonder why SparkR chose to implement its own RRDD.scala? Thanks, Daniel
Re: Why SparkR didn't reuse PythonRDD
PythonRDD.scala has a number of PySpark-specific conventions (for example, worker reuse and exception handling) and PySpark-specific protocols (e.g. for communicating accumulators and broadcasts between the JVM and Python). While it might be possible to refactor the two classes to share some more code, those PySpark-specific pieces made it simpler for SparkR to keep its own RRDD implementation.
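At the heart of both PythonRDD and RRDD is a framed byte stream between the JVM and a worker process. As an illustrative sketch only (this is not the actual PySpark or SparkR wire protocol, and the function names are hypothetical), length-prefixed framing of the kind such bridges rely on can look like this:

```python
import io
import struct

def write_frame(out, payload: bytes) -> None:
    # Write a 4-byte big-endian length header, then the payload bytes.
    out.write(struct.pack(">i", len(payload)))
    out.write(payload)

def read_frame(inp) -> bytes:
    # Read the 4-byte length header, then exactly that many payload bytes.
    (length,) = struct.unpack(">i", inp.read(4))
    return inp.read(length)

# Round-trip a record through an in-memory stream standing in for the socket.
buf = io.BytesIO()
write_frame(buf, b"hello")
buf.seek(0)
print(read_frame(buf))  # b'hello'
```

The point of the sketch is that even a simple channel like this accumulates language-specific decisions (how exceptions, accumulator updates, and end-of-stream markers are encoded), which is why sharing one bridge class across Python and R is harder than it first appears.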