Hi,
I'm trying to figure out how to work with R libraries in Spark properly.
I've googled and done some trial and error. The main error I've been
running into is "cannot coerce class "structure("DataFrame", package =
"SparkR")" to a data.frame". I'm wondering if there is a way to use R
data.frame functionality on the worker nodes, or if there is a way to "hack" the R
function so that it will accept Spark DataFrames. Here is an example of
what I'm trying to do, with a_df being a Spark DataFrame:

***DISTRIBUTED***
#0 filter out nulls
a_df <- filter(a_df, isNotNull(a_df$Ozone))

#1 make closure
treeParty <- function(x) {
    # Use the ctree function from the partykit package
    air.ct <- ctree(Ozone ~ ., data = a_df)
}

#2  put package in context
SparkR:::includePackage(sc, partykit)

#3 apply to all partitions
partied <- SparkR:::lapplyPartition(a_df, treeParty)
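
One idea I've been toying with (untested, so I may be off base) is that
lapplyPartition seems to hand the worker function a plain R list of rows
rather than a data.frame, so perhaps the closure should rebuild a local
data.frame itself before calling ctree, and the DataFrame should be turned
into an RDD first. Something like:

# sketch only: assumes each partition arrives as a list of row-lists with the
# same columns as airquality, and that SparkR:::toRDD is available in 1.5.2
treeParty <- function(part) {
    library(partykit)                              # make sure partykit is loaded on the worker
    local_part <- do.call(rbind.data.frame, part)  # list of rows -> local data.frame
    names(local_part) <- names(airquality)         # assumption: column names need restoring
    ctree(Ozone ~ ., data = local_part)
}

rdd <- SparkR:::toRDD(a_df)                        # assumption: internal DataFrame -> RDD conversion
partied <- SparkR:::lapplyPartition(rdd, treeParty)
models <- SparkR:::collect(partied)                # one ctree model per partition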


***LOCAL***
Here is R code that works with a local data.frame, local_df:
local_df <- subset(airquality, !is.na(Ozone))
air.ct <- ctree(Ozone ~ ., data = local_df)
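
(I realize that if the filtered data were small enough I could simply
collect() the Spark DataFrame into a local data.frame on the driver and run
ctree there, e.g.:

library(partykit)
local_df <- collect(filter(a_df, isNotNull(a_df$Ozone)))   # collect() returns a base R data.frame
air.ct <- ctree(Ozone ~ ., data = local_df)

but I'd like to keep the computation distributed if possible.)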

Any advice would be greatly appreciated!

Thanks,

Ian



