Re: SparkR interaction with R libraries (currently 1.5.2)

2016-06-07 Thread Sun Rui
Hi Ian,
You should not use the Spark DataFrame a_df inside your closure.
For an R function passed to lapplyPartition, the parameter is a list of
lists, where each inner list represents one row of the corresponding
partition.
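For illustration, a rough, untested sketch of that idea (it assumes a_df was
built from the airquality data, and that column names are lost in the
list-of-lists form and must be restored by hand; lapplyPartition is a
private API, so details may differ):

# Rebuild a local data.frame from the partition's rows, then fit the
# tree on that local data.frame instead of on the Spark DataFrame a_df.
treeParty <- function(rows) {
  local_df <- do.call(rbind.data.frame, rows)
  names(local_df) <- names(airquality)   # restore lost column names
  list(ctree(Ozone ~ ., data = local_df))
}
partied <- SparkR:::lapplyPartition(a_df, treeParty)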
In Spark 2.0, SparkR provides a new public API called dapply, which can apply
an R function to each partition of a Spark DataFrame. The input of the R
function is a data.frame corresponding to the partition's data, and the
output is also a data.frame.
You may download the Spark 2.0 preview release and give it a try.
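As an illustration, here is a minimal, untested sketch against the 2.0
preview API (the output schema and column names are my own assumptions):

# dapply ships the function to each partition; pdf arrives as an
# ordinary data.frame, and the result must match the declared schema.
df <- createDataFrame(na.omit(airquality))
schema <- structType(structField("Ozone", "integer"),
                     structField("pred", "double"))
fitParty <- function(pdf) {
  library(partykit)                     # load the package on the worker
  fit <- ctree(Ozone ~ ., data = pdf)
  data.frame(Ozone = pdf$Ozone, pred = as.numeric(predict(fit)))
}
result <- dapply(df, fitParty, schema)
head(collect(result))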
> On Jun 8, 2016, at 01:58, rachmaninovquartet wrote:
> 
> Hi,
> I'm trying to figure out how to work with R libraries in Spark properly.
> I've googled and done some trial and error. The main error I've been
> running into is "cannot coerce class "structure("DataFrame", package =
> "SparkR")" to a data.frame". I'm wondering if there is a way to use R
> data.frame functionality on the worker nodes, or if there is a way to
> "hack" the R function so that it accepts Spark DataFrames. Here is an
> example of what I'm trying to do, with a_df being a Spark DataFrame:
> 
> ***DISTRIBUTED***
> #0 filter out nulls
> a_df <- filter(a_df, isNotNull(a_df$Ozone))
> 
> #1 make closure
> treeParty <- function(x) {
>    # Fit a conditional inference tree with ctree from the partykit package
>    air.ct <- ctree(Ozone ~ ., data = a_df)
> }
> 
> #2 put package in context
> SparkR:::includePackage(sc, partykit)
> 
> #3 apply to all partitions
> partied <- SparkR:::lapplyPartition(a_df, treeParty)
> 
> 
> ***LOCAL***
> Here is R code that works with a local data.frame, local_df:
> local_df <- subset(airquality, !is.na(Ozone))
> air.ct <- ctree(Ozone ~ ., data = local_df)
> 
> Any advice would be greatly appreciated!
> 
> Thanks,
> 
> Ian






SparkR interaction with R libraries (currently 1.5.2)

2016-06-07 Thread rachmaninovquartet
Hi,
I'm trying to figure out how to work with R libraries in Spark properly.
I've googled and done some trial and error. The main error I've been
running into is "cannot coerce class "structure("DataFrame", package =
"SparkR")" to a data.frame". I'm wondering if there is a way to use R
data.frame functionality on the worker nodes, or if there is a way to
"hack" the R function so that it accepts Spark DataFrames. Here is an
example of what I'm trying to do, with a_df being a Spark DataFrame:

***DISTRIBUTED***
#0 filter out nulls
a_df <- filter(a_df, isNotNull(a_df$Ozone))

#1 make closure
treeParty <- function(x) {
   # Fit a conditional inference tree with ctree from the partykit package
   air.ct <- ctree(Ozone ~ ., data = a_df)
}

#2 put package in context
SparkR:::includePackage(sc, partykit)

#3 apply to all partitions
partied <- SparkR:::lapplyPartition(a_df, treeParty)


***LOCAL***
Here is R code that works with a local data.frame, local_df:
local_df <- subset(airquality, !is.na(Ozone))
air.ct <- ctree(Ozone ~ ., data = local_df)

Any advice would be greatly appreciated!

Thanks,

Ian



