[ https://issues.apache.org/jira/browse/SPARK-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634621#comment-14634621 ]
Meihua Wu commented on SPARK-8518:
----------------------------------

[~mengxr] [~yanboliang] Sounds like a plan. We would start with something simple.

I agree that the Cox PH model is a non-parametric model. It is not easy to implement efficiently in Spark: to determine the contribution of a particular row in the RDD to the objective function, you need to reference other rows in the RDD (each event's term in the partial likelihood sums over every subject still at risk at that time), which effectively breaks the parallelism.

Log-linear survival models are often called Accelerated Failure Time (AFT) models (https://en.wikipedia.org/wiki/Accelerated_failure_time_model). For AFT there are again two flavors: parametric and non-parametric. For the parametric flavor, the commonly used models are based on the Weibull / exponential distribution. Under these models, each row in the RDD contributes to the objective function independently, so the computation is easily parallelizable.

> Log-linear models for survival analysis
> ---------------------------------------
>
>                 Key: SPARK-8518
>                 URL: https://issues.apache.org/jira/browse/SPARK-8518
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Xiangrui Meng
>            Assignee: Yanbo Liang
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> We want to add basic log-linear models for survival analysis. The
> implementation should match the result from R's survival package
> (http://cran.r-project.org/web/packages/survival/index.html).
> Design doc from [~yanboliang]:
> https://docs.google.com/document/d/1fLtB0sqg2HlfqdrJlNHPhpfXO0Zb2_avZrxiVoPEs0E/pub
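
To illustrate the point in the comment about parametric AFT models, here is a minimal sketch in plain Python (not Spark code, and not the implementation proposed in the ticket). It assumes the usual Weibull AFT form log T = x.beta + sigma * eps with eps following a standard extreme-value distribution, and a right-censoring indicator delta. The function names, the toy rows, and the parameter values are all hypothetical. Each row's log-likelihood term depends only on that row, so per-partition sums can simply be added together.

```python
import math

def row_loglik(log_t, x, delta, beta, sigma):
    """Log-likelihood contribution of ONE row under a Weibull AFT model.

    log_t : log of the observed time
    x     : feature list (include a leading 1.0 for the intercept)
    delta : 1 if the event was observed, 0 if right-censored
    Only this row's values are needed; no other rows are referenced.
    """
    z = (log_t - sum(b * xi for b, xi in zip(beta, x))) / sigma
    # Observed event: log density of T; censored: log survival = -exp(z).
    return delta * (z - math.log(sigma) - log_t) - math.exp(z)

def partition_loglik(rows, beta, sigma):
    """Sum of row contributions for one partition (hypothetical helper)."""
    return sum(row_loglik(lt, x, d, beta, sigma) for lt, x, d in rows)

# Hypothetical toy data: (log time, features, event indicator).
rows = [
    (math.log(5.0), [1.0, 0.2], 1),
    (math.log(8.0), [1.0, 1.5], 0),
    (math.log(3.0), [1.0, -0.7], 1),
    (math.log(12.0), [1.0, 0.9], 1),
]
beta, sigma = [1.5, 0.3], 0.8  # arbitrary illustrative parameters

# Whole data set versus two "partitions" combined afterwards.
total = partition_loglik(rows, beta, sigma)
split = partition_loglik(rows[:2], beta, sigma) + partition_loglik(rows[2:], beta, sigma)
```

Here `total` and `split` agree up to floating-point rounding, which is what makes a map-then-sum evaluation over an RDD straightforward. The Cox partial likelihood lacks this property, because each event's term needs a sum over all rows still at risk.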