sanha opened a new pull request #28: [NEMO-12] Frontend support for Scala Spark
URL: https://github.com/apache/incubator-nemo/pull/28
 
 
   JIRA: [NEMO-12: Frontend support for Scala Spark](https://issues.apache.org/jira/projects/NEMO/issues/NEMO-12)
   
   **Major changes:**
   - Implement the Scala-side Spark frontend
     - Implement our `RDD` and `PairRDDFunctions` in Scala (needed to support Scala's implicit conversions)
     - Move the bulk of the IR DAG construction logic from `JavaRDD` / `JavaPairRDD` to `RDD` / `PairRDDFunctions`
       - This is because Spark's `JavaRDD` wraps an `RDD`, and we must follow the same structure to extend `JavaRDD` and `JavaPairRDD`
   - Make `JavaRDD` and `JavaPairRDD` simply forward their function calls to `RDD`
     - The main implementations of transformations and actions in `RDD` take Java functions; the overridden methods that take Scala functions act as wrappers
     - These wrappers convert Scala functions to Java functions through `SparkFrontendUtils`
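
A minimal, self-contained sketch of the delegation described above, with no Spark dependency. Every name here (`ToyRDD`, `ToyJavaRDD`, `ToyFrontendUtils`, `mapWithJavaFunction`) is hypothetical, and `java.util.function.Function` stands in for Spark's Java function interfaces. The real classes build the IR DAG instead of evaluating eagerly over an in-memory `Seq`; the toy only records operation names as a stand-in.

```scala
import java.util.function.{Function => JFunction}

// Hypothetical stand-in for SparkFrontendUtils: wraps a Scala closure as a Java function.
object ToyFrontendUtils {
  def toJavaFunction[T, U](f: T => U): JFunction[T, U] =
    new JFunction[T, U] {
      override def apply(t: T): U = f(t)
    }
}

// Hypothetical Scala-side RDD. It records a lineage of operation names
// (a stand-in for building the IR DAG) and applies operations eagerly to a Seq.
final class ToyRDD[T](val data: Seq[T], val lineage: List[String]) {
  // Main implementation: takes a Java function.
  def mapWithJavaFunction[U](f: JFunction[T, U]): ToyRDD[U] =
    new ToyRDD(data.map(x => f.apply(x)), lineage :+ "map")

  // Scala-facing method: only converts the closure and delegates.
  def map[U](f: T => U): ToyRDD[U] =
    mapWithJavaFunction(ToyFrontendUtils.toJavaFunction(f))

  def collect(): List[T] = data.toList
}

// Hypothetical Java-facing wrapper: holds a ToyRDD and forwards every call to it.
final class ToyJavaRDD[T](val rdd: ToyRDD[T]) {
  def map[U](f: JFunction[T, U]): ToyJavaRDD[U] =
    new ToyJavaRDD(rdd.mapWithJavaFunction(f))

  def collect(): java.util.List[T] = {
    val out = new java.util.ArrayList[T]()
    rdd.collect().foreach(x => out.add(x))
    out
  }
}

object ToyDemo {
  def main(args: Array[String]): Unit = {
    val scalaSide = new ToyRDD(Seq(1, 2, 3), Nil).map(_ * 10)
    println(scalaSide.collect()) // List(10, 20, 30)

    val javaSide = new ToyJavaRDD(new ToyRDD(Seq("a", "bb"), Nil))
      .map(new JFunction[String, Integer] { override def apply(s: String): Integer = s.length })
    println(javaSide.collect()) // [1, 2]
  }
}
```

The Java-facing class holds no logic of its own, so each transformation is implemented once, on the Scala side, and both APIs share it.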
   
   **Minor changes to note:**
   - Merge the `java` and `scala` packages under `compiler.frontend.spark.core`
   - Support collecting primitive elements (not only `Object` elements) in `CollectTransform`
   - Add `SparkWordCount` written with the Scala RDD API
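
An illustration of why primitive elements can need separate handling, assuming the issue is the array type produced on collect: Scala's `Array[T]` is backed by a JVM primitive array (for example `int[]`) when `T` is `Int`, not by `Object[]`. The helper `collectToArray` below is hypothetical and is not Nemo's `CollectTransform`; it only shows how a `ClassTag` selects the runtime array type.

```scala
import scala.reflect.ClassTag

// Hypothetical helper: turns collected elements into an Array[T],
// letting the ClassTag choose the runtime (possibly primitive) array type.
object CollectSketch {
  def collectToArray[T: ClassTag](elements: Seq[T]): Array[T] = elements.toArray

  def main(args: Array[String]): Unit = {
    val ints: Array[Int] = collectToArray(Seq(1, 2, 3))
    val strs: Array[String] = collectToArray(Seq("a", "b"))
    // For Int the backing JVM array is int[], so code assuming Object[] would break.
    println(ints.getClass.getComponentType) // int
    println(strs.getClass.getComponentType) // class java.lang.String
  }
}
```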
   
   **Tests for the changes:**
   - New integration tests in `SparkScalaITCase`, which run `SparkPi` and `SparkWordCount` written with the Scala RDD API, cover these new features
   
   **Other comments:**
   - Pair-function calls on our Scala `RDD` (such as `groupByKey`) are automatically routed to our Scala `PairRDDFunctions` through implicit conversion, just as in Spark, so Spark Scala programs need no further code changes (other than replacing Spark's `SparkSession` with ours)
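
A sketch of the implicit-conversion mechanism mentioned above. Spark places a conversion named `rddToPairRDDFunctions` in the `RDD` companion object, which puts it in implicit scope for any `RDD[(K, V)]` without an extra import. The classes below (`MiniRDD`, `MiniPairRDDFunctions`) are hypothetical, work over an in-memory `Seq`, and only illustrate the technique; they are not Nemo's or Spark's code.

```scala
import scala.language.implicitConversions

// Hypothetical minimal RDD over an in-memory Seq.
final class MiniRDD[T](val data: Seq[T]) {
  def map[U](f: T => U): MiniRDD[U] = new MiniRDD(data.map(f))
  def collect(): Seq[T] = data
}

// Pair-only operations live in a separate class, as with PairRDDFunctions.
final class MiniPairRDDFunctions[K, V](self: MiniRDD[(K, V)]) {
  // Keeps keys in first-seen order so the result is deterministic.
  def reduceByKey(f: (V, V) => V): MiniRDD[(K, V)] = {
    val keys = self.data.map(_._1).distinct
    new MiniRDD(keys.map(k => (k, self.data.filter(_._1 == k).map(_._2).reduce(f))))
  }
}

object MiniRDD {
  // Defined in the companion, so it is in implicit scope for every MiniRDD[(K, V)].
  implicit def rddToPairRDDFunctions[K, V](rdd: MiniRDD[(K, V)]): MiniPairRDDFunctions[K, V] =
    new MiniPairRDDFunctions(rdd)
}

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val words = new MiniRDD(Seq("to", "be", "or", "not", "to", "be"))
    // reduceByKey is not a member of MiniRDD; the compiler inserts rddToPairRDDFunctions.
    val counts = words.map(w => (w, 1)).reduceByKey(_ + _).collect()
    println(counts) // List((to,2), (be,2), (or,1), (not,1))
  }
}
```

Because the conversion is found through the companion object, user code calls `reduceByKey` on the plain RDD type and never mentions the pair class.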
   
   resolves [NEMO-12: Frontend support for Scala Spark](https://issues.apache.org/jira/projects/NEMO/issues/NEMO-12)
   
