[ https://issues.apache.org/jira/browse/SPARK-12173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343212#comment-15343212 ]
Sun Rui commented on SPARK-12173:
---------------------------------

[~rxin] Yes, R does not need compile-time type safety, but map/reduce-style functions are popular in R; for example, lapply() applies a function to each element of a list or vector. SparkR currently supports spark.lapply(), which is similar to lapply(), and its internal implementation depends on RDDs. We could change that implementation to use Dataset without exposing the Dataset API, along these lines (see the sketches after the issue summary below):

1. Convert the R vector/list to a Dataset.
2. Call Dataset functions on it.
3. Collect the result back as an R vector/list.

Not exposing the Dataset API means SparkR does not provide a distributed vector/list abstraction; SparkR users would have to use DataFrame for a distributed vector/list, which does not seem convenient for R users.

[~shivaram] What do you think?

> Consider supporting DataSet API in SparkR
> -----------------------------------------
>
> Key: SPARK-12173
> URL: https://issues.apache.org/jira/browse/SPARK-12173
> Project: Spark
> Issue Type: Sub-task
> Components: SparkR
> Reporter: Felix Cheung
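For context, a minimal usage sketch of the existing spark.lapply() API mentioned above (the computation itself is just an illustrative squaring function):

{code:r}
library(SparkR)
sparkR.session()

# spark.lapply() distributes the elements over the cluster, runs the function
# on each element in a worker R process, and returns the results as a local R list.
squares <- spark.lapply(1:10, function(x) {
  x * x
})

sparkR.session.stop()
{code}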
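And a rough sketch of the proposed direction, i.e. backing such an lapply()-like operation with the Dataset/DataFrame engine without exposing it to users. The helper name datasetLapply is hypothetical, and it only approximates the idea with the public createDataFrame()/dapplyCollect() APIs rather than the actual internals; it assumes an atomic numeric vector and a function returning one numeric value per element:

{code:r}
library(SparkR)
sparkR.session()

# Hypothetical helper, not the real SparkR implementation.
datasetLapply <- function(values, fn) {
  # 1. Turn the local R vector into a distributed DataFrame (one row per element)
  df <- createDataFrame(data.frame(value = values))
  # 2. Apply the user function partition-wise; dapplyCollect() runs the R
  #    function on each partition and collects the output back to the driver
  res <- dapplyCollect(df, function(part) {
    data.frame(result = vapply(part$value, fn, numeric(1)))
  })
  # 3. Hand the results back as a plain R list
  as.list(res$result)
}

doubled <- datasetLapply(1:5, function(x) x * 2)
{code}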