[ 
https://issues.apache.org/jira/browse/SPARK-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563554#comment-14563554
 ] 

Vincent Warmerdam edited comment on SPARK-7230 at 5/28/15 7:52 PM:
-------------------------------------------------------------------

I can actually see some sense in this decision (even though I can imagine that
not all developers will be pleased), but if this decision is now final, it might
be good to explicitly communicate this API change on the old SparkR GitHub
page. At the moment the page seems to suggest that the API remains the same
because it was merged in April.

Although I should have been following the SparkR news here, I was using
https://github.com/amplab-extras/SparkR-pkg as a reference until now because
the documentation seemed better. Communicating this breaking change would
spare new developers some pain and prepare R users for what to expect in the
1.4 release.

I would have posted this issue on the GitHub page, but it seems to be closed.


> Make RDD API private in SparkR for Spark 1.4
> --------------------------------------------
>
>                 Key: SPARK-7230
>                 URL: https://issues.apache.org/jira/browse/SPARK-7230
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>    Affects Versions: 1.4.0
>            Reporter: Shivaram Venkataraman
>            Assignee: Shivaram Venkataraman
>            Priority: Critical
>             Fix For: 1.4.0
>
>
> This ticket proposes making the RDD API in SparkR private for the 1.4 
> release. The motivation for doing so is discussed in a larger design
> document aimed at a more top-down design of the SparkR APIs. A first cut that 
> discusses motivation and proposed changes can be found at http://goo.gl/GLHKZI
> The main points in that document that relate to this ticket are:
> - The RDD API requires knowledge of the distributed system and is fairly low
> level. This is not well suited to a number of R users who are used to more
> high-level packages that work out of the box.
> - The RDD implementation in SparkR is not fully robust right now: we are
> missing features like spilling for aggregation and handling partitions that
> don't fit in memory. There are further limitations, such as the lack of
> hashCode for non-native types, which might affect user experience.
> The only change we will make for now is to not export the RDD functions as
> public methods in the SparkR package. I will create another ticket to
> discuss the public API for 1.5 in more detail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
