Hi all,

SparkR supports calling MLlib functionality with an R-friendly API. Since Spark 1.5, the (new) SparkML API, which is based on pipelines and parameters, has matured significantly. It allows users to build and maintain complicated machine learning pipelines. A lot of this functionality is difficult to expose using the simple formula-based API in SparkR.
I just submitted a SPIP <https://issues.apache.org/jira/browse/SPARK-21190> to propose a new R package, SparkML, to be distributed along with SparkR as part of Apache Spark. Please view the JIRA ticket and provide feedback & comments.

Thanks,
--Hossein