[ https://issues.apache.org/jira/browse/SPARK-7264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711973#comment-14711973 ]
Shivaram Venkataraman commented on SPARK-7264:
----------------------------------------------

Sorry for the delay in getting back on this, and thanks everybody for your comments. A number of issues are intermingled in the original design doc, so I have created two separate, smaller design docs that we can hopefully iterate on to build things out.

In this JIRA, I'd like to focus on running parallel R functions on Spark to provide functionality similar to `snow`, `doParallel`, etc. I've created a new document detailing the syntax and other aspects of this at https://docs.google.com/document/d/1oegI3OjmK_a-ME4m7sdL4ZlzY7wkXzfaX69GqQqK0VI/edit#

In https://issues.apache.org/jira/browse/SPARK-6817, I have posted a design doc that outlines running user-defined R functions on Spark DataFrames. Let's continue the discussion about that API in SPARK-6817.

> SparkR API for parallel functions
> ---------------------------------
>
> Key: SPARK-7264
> URL: https://issues.apache.org/jira/browse/SPARK-7264
> Project: Spark
> Issue Type: New Feature
> Components: SparkR
> Reporter: Shivaram Venkataraman
>
> This is a JIRA to discuss design proposals for enabling parallel R
> computation in SparkR without exposing the entire RDD API.
> The rationale for this is that the RDD API has a number of low level
> functions and we would like to expose a more light-weight API that is both
> friendly to R users and easy to maintain.
> http://goo.gl/GLHKZI has a first cut design doc.