[jira] [Comment Edited] (SPARK-7230) Make RDD API private in SparkR for Spark 1.4

2015-05-29 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564943#comment-14564943
 ] 

Vincent Warmerdam edited comment on SPARK-7230 at 5/29/15 3:49 PM:
---

[~shivaram] On it. You should see some pull requests. I've also added a pull 
request for the master GitHub branch, but it only edits the README file to 
show a similar message. 


was (Author: cantdutchthis):
[~shivaram] On it. You should see some pull requests soon. I'll also add a pull 
request for the master GitHub branch, but it will only edit the README file to 
show a similar message. 

 Make RDD API private in SparkR for Spark 1.4
 

 Key: SPARK-7230
 URL: https://issues.apache.org/jira/browse/SPARK-7230
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Affects Versions: 1.4.0
Reporter: Shivaram Venkataraman
Assignee: Shivaram Venkataraman
Priority: Critical
 Fix For: 1.4.0


 This ticket proposes making the RDD API in SparkR private for the 1.4 
 release. The motivation for doing so is discussed in a larger design 
 document aimed at a more top-down design of the SparkR APIs. A first cut that 
 discusses the motivation and proposed changes can be found at http://goo.gl/GLHKZI
 The main points in that document that relate to this ticket are:
 - The RDD API requires knowledge of the distributed system and is fairly low 
 level. This makes it unsuitable for many R users, who are used to 
 high-level packages that work out of the box.
 - The RDD implementation in SparkR is not yet fully robust: it is missing 
 features such as spilling during aggregation and handling partitions that 
 don't fit in memory. There are further limitations, such as the lack of 
 hashCode support for non-native types, which might affect the user experience.
 For now, the only change we will make is to stop exporting the RDD functions 
 as public methods in the SparkR package. I will create another ticket to 
 discuss the public API for 1.5 in more detail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-7230) Make RDD API private in SparkR for Spark 1.4

2015-05-28 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563554#comment-14563554
 ] 

Vincent Warmerdam edited comment on SPARK-7230 at 5/28/15 7:59 PM:
---

I can actually see some sense in this decision (even though I can imagine that 
not all developers will be pleased), but if this decision is now final, it might 
be good to explicitly communicate this API change on the old SparkR GitHub 
page. At the moment the page seems to suggest that the API remains the same, 
since it was merged in April.

Although I should have been following SparkR news here, I was using 
https://github.com/amplab-extras/SparkR-pkg as a reference until now because 
its documentation seemed better. Communicating this breaking change would 
spare new developers some pain and prepare R users for what to expect in the 
1.4 release. 

I would have posted this issue on the GitHub page, but it seems closed. 






[jira] [Comment Edited] (SPARK-7230) Make RDD API private in SparkR for Spark 1.4

2015-05-28 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563554#comment-14563554
 ] 

Vincent Warmerdam edited comment on SPARK-7230 at 5/28/15 7:52 PM:
---

I can actually see some sense in this decision (even though I can imagine that 
not all developers will be pleased), but if this decision is now final, it might 
be good to explicitly communicate this API change on the old SparkR GitHub 
page. At the moment the page seems to suggest that the API remains the same, 
since it was merged in April.

Although I should have been following SparkR news here, I was using 
https://github.com/amplab-extras/SparkR-pkg as a reference until now because 
its documentation seemed better. Communicating this breaking change would 
spare new developers some pain and prepare R users for what to expect in the 
1.4 release. 

I would have posted this issue on the GitHub page, but it seems closed. 


was (Author: cantdutchthis):
I can actually see some sense in this decision (even though I can imagine that 
not all developers will be pleased), but if this decision is now final, it might 
be good to explicitly communicate this API change on the old SparkR GitHub 
page. 

Although I should have been following SparkR news here, I was using 
https://github.com/amplab-extras/SparkR-pkg as a reference until now because 
its documentation seemed better. Communicating this breaking change would 
spare new developers some pain and prepare R users for what to expect in the 
1.4 release. 

I would have posted this issue on the GitHub page, but it seems closed. 




[jira] [Comment Edited] (SPARK-7230) Make RDD API private in SparkR for Spark 1.4

2015-05-28 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563554#comment-14563554
 ] 

Vincent Warmerdam edited comment on SPARK-7230 at 5/28/15 7:46 PM:
---

If this decision is now final, it might be good to explicitly communicate this 
API change on the old SparkR GitHub page. 

Although I should have been following SparkR news here, I was using 
https://github.com/amplab-extras/SparkR-pkg as a reference until now because 
its documentation seemed better. Communicating this change would spare new 
developers some pain. 


was (Author: cantdutchthis):
If this decision is now final, it might be good to explicitly communicate this 
API change on the old SparkR GitHub page. 

Although I should have been following SparkR news here, I was using 
https://github.com/amplab-extras/SparkR-pkg as a reference until now because 
its documentation was better. Communicating this change would spare new 
developers some pain. 




[jira] [Comment Edited] (SPARK-7230) Make RDD API private in SparkR for Spark 1.4

2015-05-28 Thread Vincent Warmerdam (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563554#comment-14563554
 ] 

Vincent Warmerdam edited comment on SPARK-7230 at 5/28/15 7:49 PM:
---

I can actually see some sense in this decision (even though I can imagine that 
not all developers will be pleased), but if this decision is now final, it might 
be good to explicitly communicate this API change on the old SparkR GitHub 
page. 

Although I should have been following SparkR news here, I was using 
https://github.com/amplab-extras/SparkR-pkg as a reference until now because 
its documentation seemed better. Communicating this breaking change would 
spare new developers some pain and prepare R users for what to expect in the 
1.4 release. 

I would have posted this issue on the GitHub page, but it seems closed. 


was (Author: cantdutchthis):
If this decision is now final, it might be good to explicitly communicate this 
API change on the old SparkR GitHub page. 

Although I should have been following SparkR news here, I was using 
https://github.com/amplab-extras/SparkR-pkg as a reference until now because 
its documentation seemed better. Communicating this change would spare new 
developers some pain. 




[jira] [Comment Edited] (SPARK-7230) Make RDD API private in SparkR for Spark 1.4

2015-04-29 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520635#comment-14520635
 ] 

Patrick Wendell edited comment on SPARK-7230 at 4/30/15 1:13 AM:
-

Yes, removing APIs is really difficult for existing users. That's why the 
proposal limits the number of exposed APIs: users of Spark expect that they 
will be supported. Part of merging into the upstream project is deciding 
which APIs the committers are comfortable supporting in the long term. As it 
stands, there isn't yet widespread support among the committers for 
supporting low-level ETL code in R in the long term. We'd rather have narrower 
and simpler APIs and add more enhancements over time according to user demand.

Of course, we'll make a good-faith effort to support APIs that are useful to 
existing projects.


was (Author: pwendell):
Yes, removing APIs is really difficult for existing users. That's why the 
proposal limits the number of exposed APIs: users of Spark expect that they 
will be supported. Part of merging into the upstream project is deciding 
which APIs the committers are comfortable supporting in the long term. As it 
stands, there isn't yet widespread support among the committers for 
supporting low-level ETL code in R in the long term. We'd rather have narrower 
and simpler APIs and add more enhancements over time according to user demand.

Of course, we'll make a good-faith effort to support APIs that are useful to 
existing projects.
