Hi, Ian,
You should not use the Spark DataFrame a_df inside your closure.
For an R function passed to lapplyPartition, the parameter is a list of lists
representing the rows in the corresponding partition.
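So the closure has to rebuild a local data.frame from those rows before
calling ctree. A rough sketch against the private 1.x API (note SparkR:::
functions are internal and unsupported; this assumes each row arrives as a
named list of column values, and that includePackage has made partykit
available on the workers):

treeParty <- function(part) {
  # Convert each row (a named list) into a one-row data.frame, then bind them
  rows <- lapply(part, function(r) as.data.frame(r, stringsAsFactors = FALSE))
  local_df <- do.call(rbind, rows)
  # Return a list so lapplyPartition can treat the result as partition elements
  list(ctree(Ozone ~ ., data = local_df))
}
partied <- SparkR:::lapplyPartition(a_df, treeParty)

The key point is that the model is fit on local_df, built from the partition
rows, not on the Spark DataFrame a_df.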
In Spark 2.0, SparkR provides a new public API called dapply, which can apply
an R function to each partition of a Spark DataFrame. The input of the R
function is a data.frame corresponding to the partition data, and the output
is also a data.frame.
You may download the Spark 2.0 preview release and give it a try.
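On the 2.0 preview, a per-partition version of your ctree fit could look
roughly like this (a minimal sketch, assuming partykit is installed on every
worker node; the output schema and the choice to return predictions are
illustrative, since dapply must return a data.frame rather than a model
object):

library(SparkR)
sparkR.session()

# Recreate the example data; na.omit plays the role of your isNotNull filter.
# Column names with dots can be awkward in Spark, so keep a simple subset.
local <- na.omit(airquality)[, c("Ozone", "Wind", "Temp", "Month", "Day")]
a_df <- createDataFrame(local)

# dapply needs the output schema declared up front
schema <- structType(structField("Ozone", "double"),
                     structField("fitted", "double"))

partied <- dapply(a_df, function(part) {
  # 'part' is an ordinary R data.frame holding one partition's rows
  library(partykit)
  fit <- ctree(Ozone ~ ., data = part)
  data.frame(Ozone = as.numeric(part$Ozone),
             fitted = as.numeric(predict(fit)))
}, schema)

head(collect(partied))

Note that each partition is fit independently, so this produces one tree per
partition rather than a single global model.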
> On Jun 8, 2016, at 01:58, rachmaninovquartet wrote:
>
> Hi,
> I'm trying to figure out how to work with R libraries in Spark properly.
> I've googled and done some trial and error. The main error I've been
> running into is "cannot coerce class "structure("DataFrame", package =
> "SparkR")" to a data.frame". I'm wondering if there is a way to use R
> data.frame functionality on worker nodes, or if there is a way to "hack"
> the R function so that it accepts Spark DataFrames. Here is an example of
> what I'm trying to do, with a_df being a Spark DataFrame:
>
> ***DISTRIBUTED***
> #0 filter out nulls
> a_df <- filter(a_df, isNotNull(a_df$Ozone))
>
> #1 make closure
> treeParty <- function(x) {
>   # Fit a conditional inference tree with ctree from the partykit package
>   air.ct <- ctree(Ozone ~ ., data = a_df)
> }
>
> #2 put package in context
> SparkR:::includePackage(sc, partykit)
>
> #3 apply to all partitions
> partied <- SparkR:::lapplyPartition(a_df, treeParty)
>
>
> ***LOCAL***
> Here is R code that works with a local dataframe, local_df:
> local_df <- subset(airquality, !is.na(Ozone))
> air.ct <- ctree(Ozone ~ ., data = local_df)
>
> Any advice would be greatly appreciated!
>
> Thanks,
>
> Ian
>