I am trying to use an existing R package from SparkR, following the
example at https://amplab-extras.github.io/SparkR-pkg/ in the section
"Using existing R packages".

Here is the sample from amplab-extras:
  generateSparse <- function(x) {
    # Use sparseMatrix function from the Matrix package
    sparseMatrix(i=c(1, 2, 3), j=c(1, 2, 3), x=c(1, 2, 3))
  }
  includePackage(sc, Matrix)
  sparseMat <- lapplyPartition(rdd, generateSparse)

My package (named 'galileo') consists of a number of clustering methods that
operate on input supplied as a dense matrix.

Here is my code prototype, based on the above sample:

t1 <- jsonFile(sqlContext, "/root/test1.txt")
runGalileo <- function(x) {
  galileo(x, model = "kmeans", dist = "maximum", K = 5)
}
SparkR:::includePackage(sc, galileo)
f <- SparkR:::lapplyPartition(t1, runGalileo)

I'm assuming t1 would be a DataFrame created from data coming from my
existing application as JSON (in the prototype from a file; ultimately from
MongoDB).
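
In case it helps, here is my guess at the step the prototype may be missing
(an assumption, not something I've confirmed): lapplyPartition operates on
RDDs rather than DataFrames, so the DataFrame presumably needs converting
first via the private toRDD:

  # My guess (assuming the private toRDD API in Spark 1.4): convert the
  # DataFrame to an RDD of rows before applying the partition function.
  rdd <- SparkR:::toRDD(t1)
  f <- SparkR:::lapplyPartition(rdd, runGalileo)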

So my first question is: what should that JSON look like to represent a
dense matrix (a dgeMatrix in R, perhaps)?
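
To make the question concrete, here is the kind of layout I have been
imagining, one matrix row per JSON line (the "row" field name is my own
invention):

  {"row": [1.0, 2.5, 3.1]}
  {"row": [0.4, 1.7, 2.2]}

and, on the worker side, something like this to rebuild the dense matrix
from one partition's parsed records (again just a sketch, assuming that
layout):

  # Rebuild a dense matrix from one partition, assuming each record is a
  # list carrying one numeric "row" vector.
  rowsToMatrix <- function(part) {
    do.call(rbind, lapply(part, function(rec) unlist(rec$row)))
  }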

Question two: I have noticed that some of the APIs in the example are no
longer publicly available (in Spark 1.4 I had to prefix lapplyPartition
with "SparkR:::"). Is there a different way I should be calling existing R
packages?
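
In case it matters, the fallback I have been considering is to skip
includePackage and load the package inside the function itself, which I
believe only works if galileo is already installed on every worker node:

  # Fallback sketch: load galileo on the worker inside the partition
  # function (galileo must already be installed on each worker).
  runGalileo <- function(part) {
    library(galileo)
    mat <- rowsToMatrix(part)  # sketch from above; depends on my assumed JSON layout
    galileo(mat, model = "kmeans", dist = "maximum", K = 5)
  }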

For background: I was developing a distributed-worker Akka/Scala framework
to scale out my use of R, running a large number of R methods on behalf of
a large number of users through multiple Rserve instances. The call
galileo(x, model="kmeans", dist="maximum", K=5), where x is the dense
matrix, is typical of the calls I was sending to Rserve. While developing
this I kept running into posts on the Spark user list whenever I googled
the troublesome stack traces I was hitting. As I became familiar with Spark
and saw that it includes SparkR, I came to see it as an alternative to
building my own system, with all the challenges I was anticipating.



