Hi, I'm trying to figure out how to work with R libraries in SparkR properly. I've googled and done some trial and error. The main error I've been running into is:

    cannot coerce class "structure("DataFrame", package = "SparkR")" to a data.frame

I'm wondering whether there is a way to use R data.frame functionality on the worker nodes, or whether there is a way to "hack" the R function so that it accepts Spark DataFrames.
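For reference, here is roughly how I'm setting things up (a minimal sketch; I'm assuming a local Spark initialized the way the 1.5.2 docs describe, and I build a_df from R's built-in airquality dataset just to make the example concrete):

library(SparkR)
library(partykit)

# standard SparkR 1.5.2 initialization
sc <- sparkR.init(master = "local[*]", appName = "ctree-test")
sqlContext <- sparkRSQL.init(sc)

# distribute the airquality data as a SparkR DataFrame
a_df <- createDataFrame(sqlContext, airquality)

# calling ctree() directly on the SparkR DataFrame reproduces the error:
# ctree(Ozone ~ ., data = a_df)
# => cannot coerce class "structure("DataFrame", package = "SparkR")" to a data.frame

Here is an example of what I'm trying to do, with a_df being that Spark DataFrame: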
***DISTRIBUTED***

# 0: filter out nulls
a_df <- filter(a_df, isNotNull(a_df$Ozone))

# 1: make a closure to apply to each partition
treeParty <- function(x) {
  # fit a conditional inference tree with ctree() from the partykit
  # package; note that this still references a_df from the enclosing
  # scope, and ctree() expects a local data.frame, which is where the
  # coercion error above comes from
  air.ct <- ctree(Ozone ~ ., data = a_df)
}

# 2: ship the package to the worker nodes
SparkR:::includePackage(sc, partykit)

# 3: apply the closure to all partitions
partied <- SparkR:::lapplyPartition(a_df, treeParty)

***LOCAL***

Here is the equivalent R code that works with a local data.frame, local_df:

library(partykit)

local_df <- subset(airquality, !is.na(Ozone))
air.ct <- ctree(Ozone ~ ., data = local_df)

Any advice would be greatly appreciated!

Thanks,
Ian
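P.S. The only workaround I've found so far is to collect() the Spark DataFrame back to the driver as a local data.frame and fit the tree there. That runs, but it gives up the distributed computation entirely (and assumes the filtered data fits in driver memory), so it's exactly what I'm hoping to avoid:

# pull the (filtered) data back to the driver as a plain data.frame
local_df <- collect(a_df)
air.ct <- ctree(Ozone ~ ., data = local_df)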