[ 
https://issues.apache.org/jira/browse/SPARK-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641711#comment-14641711
 ] 

Meihua Wu commented on SPARK-8518:
----------------------------------

[~yanboliang] Here's my two cents :)

For the Cox model, you will need to find the \beta that maximizes the log 
partial likelihood (the 3rd formula on the Wikipedia page 
https://en.wikipedia.org/wiki/Proportional_hazards_model):

  l(\beta) = \sum_{i: C_i = 1} ( X_i \beta - \log \sum_{j: Y_j \ge Y_i} \exp(X_j \beta) )

There are two summations. The outer one runs over the records indexed by i 
(only uncensored records, C_i = 1, contribute). For each record i, the inner 
one runs over the risk set, i.e. the records j with Y_j >= Y_i, which costs 
O(n) per record. So the naive double summation is O(n^2). I think we can 
improve this to O(n*log(n)) with a pre-processing step that sorts the records 
by Y_i: in descending order of Y, each risk set is a prefix of the sorted 
records, so all the inner sums come out of a single running sum. But that is 
still not O(n).
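To make the complexity argument concrete, here is a minimal Python sketch (plain Python, not Spark code) that evaluates the log partial likelihood both ways. The function names are hypothetical; ties in Y are handled Breslow-style (every record with Y_j >= Y_i is in the risk set), and there is no overflow protection for large X_i \beta.

```python
import math


def risk_scores(X, beta):
    # eta_i = X_i . beta for every record
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def cox_loglik_naive(X, Y, C, beta):
    """O(n^2): for each event i, sum exp(eta_j) over the risk set {j : Y_j >= Y_i}."""
    eta = risk_scores(X, beta)
    n = len(Y)
    ll = 0.0
    for i in range(n):
        if C[i] == 1:
            denom = sum(math.exp(eta[j]) for j in range(n) if Y[j] >= Y[i])
            ll += eta[i] - math.log(denom)
    return ll


def cox_loglik_sorted(X, Y, C, beta):
    """O(n log n): after sorting by Y descending, each risk set is a prefix,
    so the inner sums become one running sum (the pass itself is O(n))."""
    eta = risk_scores(X, beta)
    order = sorted(range(len(Y)), key=lambda k: Y[k], reverse=True)
    n = len(order)
    ll = 0.0
    running = 0.0
    pos = 0
    while pos < n:
        # Add the whole group of tied times first, so tied records share a risk set.
        end = pos
        while end < n and Y[order[end]] == Y[order[pos]]:
            running += math.exp(eta[order[end]])
            end += 1
        for k in order[pos:end]:
            if C[k] == 1:
                ll += eta[k] - math.log(running)
        pos = end
    return ll
```

Both functions return the same value; the only difference is how the risk-set denominators are accumulated.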

The exponential/Weibull model is like linear regression: there is only one 
summation in the objective function and each term in the summation is O(1). So 
the overall complexity is O(n).
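For comparison, a minimal sketch of the O(n) objective, assuming a Weibull proportional-hazards parameterization h_i(t) = k t^(k-1) exp(X_i \beta) with a fixed shape k (k = 1 gives the exponential model). This is an illustrative choice, not necessarily the parameterization R's survival package uses (survreg fits an accelerated failure time form), and `weibull_ph_loglik` is a hypothetical name.

```python
import math


def weibull_ph_loglik(X, T, C, beta, shape=1.0):
    """Censored log-likelihood of a Weibull proportional-hazards model.

    Hazard h_i(t) = shape * t**(shape - 1) * exp(eta_i), eta_i = X_i . beta.
    Each record contributes one O(1) term:
      C_i * (log(shape) + (shape - 1) * log(T_i) + eta_i) - exp(eta_i) * T_i**shape
    """
    ll = 0.0
    for row, t, c in zip(X, T, C):
        eta = sum(x * b for x, b in zip(row, beta))
        if c == 1:
            # log hazard at the observed event time
            ll += math.log(shape) + (shape - 1.0) * math.log(t) + eta
        # minus the cumulative hazard, i.e. log survival, for every record
        ll -= math.exp(eta) * t ** shape
    return ll
```

Each record is visited once, so evaluating the objective (and, similarly, its gradient) is a single pass that maps naturally onto a distributed aggregation.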

In the end, I am not saying the Cox model is not good (it is actually more 
flexible and robust). But I think for our first step, the exponential/Weibull 
model is easier to implement and computationally scales better for massive 
data.

> Log-linear models for survival analysis
> ---------------------------------------
>
>                 Key: SPARK-8518
>                 URL: https://issues.apache.org/jira/browse/SPARK-8518
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Xiangrui Meng
>            Assignee: Yanbo Liang
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> We want to add basic log-linear models for survival analysis. The 
> implementation should match the result from R's survival package 
> (http://cran.r-project.org/web/packages/survival/index.html).
> Design doc from [~yanboliang]: 
> https://docs.google.com/document/d/1fLtB0sqg2HlfqdrJlNHPhpfXO0Zb2_avZrxiVoPEs0E/pub



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
