I think the answer depends on the granularity at which you want to run
the algorithm. If it's on the entire Spark DataFrame and you expect the
data to be very large, then it isn't easy to reuse the existing R
function. However, if you want to run the algorithm on smaller subsets
of the data, you can look at the support for user-defined functions
(UDFs) in SparkR at
http://spark.apache.org/docs/latest/sparkr.html#applying-user-defined-function
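As a rough sketch of the per-subset approach: SparkR's gapply() ships each
group of a Spark DataFrame to an executor as a local R data.frame, where an
ordinary R function such as coxph() can run. The column names (site, time,
status, age) and the one-coefficient model below are assumptions for
illustration, not part of the original question.

```r
library(SparkR)
library(survival)

sparkR.session()

# df is a Spark DataFrame with columns site, time, status, age (hypothetical).
df <- createDataFrame(myLocalData)

# gapply() requires the output schema of the function to be declared up front.
schema <- structType(structField("site", "string"),
                     structField("age_coef", "double"))

# Fit a separate Cox model on each site's subset; each pdf is a plain
# R data.frame, so coxph() works unmodified within a group.
result <- gapply(df, "site", function(key, pdf) {
  fit <- coxph(Surv(time, status) ~ age, data = pdf)
  data.frame(site = key[[1]],
             age_coef = unname(coef(fit)["age"]),
             stringsAsFactors = FALSE)
}, schema)

head(collect(result))
```

Note this parallelizes across groups, not within one model fit: each coxph()
call still runs single-node on one group's data, so every group must fit in
an executor's memory.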

Thanks
Shivaram

On Tue, Nov 15, 2016 at 3:56 AM, pietrop <pietro.pu...@gmail.com> wrote:
> Hi all,
> I'm writing here after some intensive usage on pyspark and SparkSQL.
> I would like to use a well known function in the R world: coxph() from the
> survival package.
> From what I understood, I can't parallelize a function like coxph() because
> it isn't provided by the SparkR package. In other words, I would have to
> implement a SparkR-compatible algorithm instead of using coxph().
> There is no way to make coxph() run in parallel, right?
> More generally, I think this is true for any non-Spark function that only
> accepts a data.frame as its data input.
>
> Do you plan to implement a coxph() counterpart in Spark? The most useful
> version of this model is the Cox regression model for time-dependent
> covariates, which, as far as I know, is missing from every ML framework.
>
> Thank you
>  Pietro
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-R-guidelines-for-non-spark-functions-and-coxph-Cox-Regression-for-Time-Dependent-Covariates-tp28077.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
