[ https://issues.apache.org/jira/browse/SPARK-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532101#comment-14532101 ]
Sun Rui commented on SPARK-7230:
--------------------------------

One question here: there are still some basic RDD API methods provided in DataFrame, like map()/flatMap()/mapPartitions() and foreach(). What's our policy on these methods? Will we also make them private for 1.4, or will we support them long term?

> Make RDD API private in SparkR for Spark 1.4
> --------------------------------------------
>
>                 Key: SPARK-7230
>                 URL: https://issues.apache.org/jira/browse/SPARK-7230
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>    Affects Versions: 1.4.0
>            Reporter: Shivaram Venkataraman
>            Assignee: Shivaram Venkataraman
>            Priority: Critical
>             Fix For: 1.4.0
>
>
> This ticket proposes making the RDD API in SparkR private for the 1.4 release. The motivation for doing so is discussed in a larger design document aimed at a more top-down design of the SparkR APIs. A first cut that discusses the motivation and proposed changes can be found at http://goo.gl/GLHKZI
>
> The main points in that document that relate to this ticket are:
> - The RDD API requires knowledge of the distributed system and is pretty low-level. This is not very suitable for a number of R users who are used to more high-level packages that work out of the box.
> - The RDD implementation in SparkR is not fully robust right now: we are missing features like spilling for aggregation, handling partitions which don't fit in memory, etc. There are further limitations, like the lack of hashCode for non-native types, which might affect user experience.
> The only change we will make for now is to not export the RDD functions as public methods in the SparkR package, and I will create another ticket to discuss the public API in more detail for 1.5

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
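For context, the distinction the design document draws can be sketched in a few lines of SparkR. The high-level DataFrame calls below match the Spark 1.4-era SparkR API; the commented RDD-style line is a hypothetical illustration of the kind of functional method whose visibility is in question (exact names and internals varied, hence the hedged `SparkR:::` form):

```r
library(SparkR)

# Spark 1.4-era entry points: a SparkContext plus a SQLContext.
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

# Build a DataFrame from a local R data.frame (built-in 'faithful' dataset).
df <- createDataFrame(sqlContext, faithful)

# High-level DataFrame API -- the style the design document favors,
# familiar to R users of packages like dplyr:
head(select(df, df$eruptions))
head(filter(df, df$waiting < 50))

# RDD-style functional access -- the kind of method under discussion.
# If made private, it would only be reachable via the ::: operator:
# rdd <- SparkR:::lapply(df, function(row) row$waiting * 2)  # hypothetical

sparkR.stop()
```

The practical effect of the proposal is visibility, not removal: un-exported functions remain callable through `SparkR:::`, but are no longer part of the supported public surface.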