I think the answer to this depends on the granularity at which you want to run the algorithm. If it's on the entire Spark DataFrame, and you expect the data frame to be very large, then it isn't easy to use the existing R function. However, if you want to run the algorithm on smaller subsets of the data, you can look at the support for UDFs we have in SparkR at http://spark.apache.org/docs/latest/sparkr.html#applying-user-defined-function
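As a rough sketch of the per-group approach, something like the following should work with gapply() — assuming your data has columns like time, status, age, and a grouping key (all hypothetical names here), and that each group is small enough to fit in a single worker's memory:

```r
library(SparkR)
library(survival)

# localDF is an ordinary R data.frame with hypothetical columns:
# group (string), time, status, age (numeric)
df <- createDataFrame(localDF)

# Fit a separate Cox model per group; each group's rows are passed
# to the function as a plain R data.frame, so coxph() works unchanged.
result <- gapply(
  df,
  "group",
  function(key, x) {
    fit <- survival::coxph(Surv(time, status) ~ age, data = x)
    data.frame(group = as.character(key[[1]]),
               coef = unname(coef(fit)),
               stringsAsFactors = FALSE)
  },
  structType(structField("group", "string"),
             structField("coef", "double"))
)
head(collect(result))
```

If the fits are independent of the Spark data (e.g. a grid of model specifications over a small local data set), spark.lapply() is another option — it ships an R closure to the workers rather than partitioning a DataFrame.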
Thanks
Shivaram

On Tue, Nov 15, 2016 at 3:56 AM, pietrop <pietro.pu...@gmail.com> wrote:
> Hi all,
> I'm writing here after some intensive usage of pyspark and SparkSQL.
> I would like to use a well-known function from the R world: coxph() from
> the survival package.
> From what I understand, I can't parallelize a function like coxph()
> because it isn't provided by the SparkR package. In other words, I would
> have to implement a SparkR-compatible algorithm instead of using coxph().
> There is no way to make coxph() parallelizable, right?
> More generally, I think this is true for any non-Spark function that only
> accepts the data.frame format as its data input.
>
> Do you plan to implement a coxph() counterpart in Spark? The most useful
> version of this model is the Cox Regression Model for Time-Dependent
> Covariates, which is missing from ANY ML framework as far as I know.
>
> Thank you
> Pietro
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-R-guidelines-for-non-spark-functions-and-coxph-Cox-Regression-for-Time-Dependent-Covariates-tp28077.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> ---------------------------------------------------------------------