[ https://issues.apache.org/jira/browse/SPARK-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641711#comment-14641711 ]
Meihua Wu commented on SPARK-8518:
----------------------------------

[~yanboliang] Here's my two cents :)

For the Cox model, you need to find the \beta that maximizes the log partial likelihood l(\beta) = ... (the 3rd formula on the wiki page https://en.wikipedia.org/wiki/Proportional_hazards_model). There are two summations. The first one sums over the records indexed by i. For each record i, you need a second summation over the records indexed by j, and that second summation costs O(n). Evaluated naively, the double summation is therefore O(n^2). I guess we might be able to improve this to O(n*log(n)) with a pre-processing step that sorts by Y_i, but that is still not O(n).

The exponential/Weibull model is like linear regression: there is only one summation in the objective function, and each term in the summation is O(1). So the overall complexity is O(n).

In the end, I am not saying the Cox model is not good (it is actually more flexible and robust). But I think that for our first step, the exponential/Weibull model is easier to implement and computationally scales better to massive data.

> Log-linear models for survival analysis
> ---------------------------------------
>
>                 Key: SPARK-8518
>                 URL: https://issues.apache.org/jira/browse/SPARK-8518
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Xiangrui Meng
>            Assignee: Yanbo Liang
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> We want to add basic log-linear models for survival analysis. The
> implementation should match the result from R's survival package
> (http://cran.r-project.org/web/packages/survival/index.html).
> Design doc from [~yanboliang]:
> https://docs.google.com/document/d/1fLtB0sqg2HlfqdrJlNHPhpfXO0Zb2_avZrxiVoPEs0E/pub
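
The complexity argument in the comment can be illustrated with a small sketch in plain Python (not Spark code). Everything named here is an assumption made for illustration: the function names, the toy data, the Breslow-style treatment of tied times (every tied event sees the full risk set {j : Y_j >= Y_i}), and the hazard parametrization exp(x . beta) for the exponential model. None of it is taken from the issue, the design doc, or R's survival package. The first function evaluates the Cox log partial likelihood with the naive double loop, the second sorts by Y_i and keeps a running risk-set sum, and the third shows the single-summation structure of the exponential model.

```python
import math


def dot(x, beta):
    """Inner product of one feature vector with the coefficient vector."""
    return sum(a * b for a, b in zip(x, beta))


def cox_loglik_naive(times, events, X, beta):
    """Naive O(n^2): for each event i, sum over the risk set {j : Y_j >= Y_i}."""
    n = len(times)
    total = 0.0
    for i in range(n):
        if not events[i]:
            continue  # censored records add no term of their own
        risk = sum(math.exp(dot(X[j], beta))
                   for j in range(n) if times[j] >= times[i])
        total += dot(X[i], beta) - math.log(risk)
    return total


def cox_loglik_sorted(times, events, X, beta):
    """Sort by time (O(n log n)), then one pass with a running risk-set sum."""
    order = sorted(range(len(times)), key=lambda k: times[k], reverse=True)
    n = len(order)
    total, running, pos = 0.0, 0.0, 0
    while pos < n:
        # Find the block of records tied at the current time.
        end = pos
        while end < n and times[order[end]] == times[order[pos]]:
            end += 1
        block = order[pos:end]
        # All tied records enter the risk set before any of them is scored.
        for k in block:
            running += math.exp(dot(X[k], beta))
        log_risk = math.log(running)
        for k in block:
            if events[k]:
                total += dot(X[k], beta) - log_risk
        pos = end
    return total


def exponential_loglik(times, events, X, beta):
    """Single pass: one term per record, O(1) each for a fixed feature count."""
    total = 0.0
    for t, d, x in zip(times, events, X):
        eta = dot(x, beta)  # log hazard rate (assumed parametrization)
        total += (eta if d else 0.0) - math.exp(eta) * t
    return total


# Hypothetical toy data: observed times, event indicators (1 = event,
# 0 = censored), and one feature per record.
toy_times = [5.0, 3.0, 3.0, 8.0, 1.0]
toy_events = [1, 1, 0, 0, 1]
toy_X = [[0.5], [1.0], [-0.3], [0.2], [0.0]]
toy_beta = [0.7]

naive_value = cox_loglik_naive(toy_times, toy_events, toy_X, toy_beta)
sorted_value = cox_loglik_sorted(toy_times, toy_events, toy_X, toy_beta)
```

The two Cox functions are meant to return the same value up to floating-point rounding. Note that the ordering by Y_i does not depend on \beta, so in an iterative optimizer the sort would only need to happen once; the global ordering is still an extra requirement that the single-summation exponential/Weibull objective does not have.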