[ https://issues.apache.org/jira/browse/SPARK-19234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830037#comment-15830037 ]

Yanbo Liang edited comment on SPARK-19234 at 1/19/17 2:33 PM:
--------------------------------------------------------------

[~admackin] Nice catch. Just like [~srowen] said, the AFT survival model 
regresses on the log of the failure time, so a failure time of zero is invalid. 
I think the correct fix is to throw an error for non-positive failure times. 
Double-checking against R, {{survreg}} throws an error for a zero failure time:
{code}
library(survival)

# The second failure time below is 0.0, which survreg rejects outright.
data <- list(time = c(1.218, 0.0, 3.627, 0.273, 4.199),
             censor = c(1.0, 0.0, 0.0, 1.0, 0.0),
             a = c(1.56, 0.346, 1.38, 0.52, 0.795),
             b = c(-0.605, 2.158, 0.231, 1.151, -0.226))
model <- survreg(Surv(time, censor) ~ a + b, data)

Error in survreg(Surv(time, censor) ~ a + b, data) : 
  Invalid survival times for this distribution
{code}
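
For illustration, a fail-fast guard on the Spark side could look roughly like 
the following (the helper name and message are hypothetical, not an actual 
patch). The root cause is that the AFT likelihood works on {{log(label)}}, and 
{{log(0)}} is {{-Infinity}}, which is how NaNs reach the L-BFGS line search:
{code}
// Hypothetical Scala sketch: validate each training label before optimization.
// The AFT objective takes math.log(label); math.log(0.0) == Double.NegativeInfinity,
// which is what ultimately poisons StrongWolfeLineSearch in the reported failure.
def validateFailureTime(label: Double): Unit = {
  require(label > 0.0,
    s"AFTSurvivalRegression requires positive failure times, but got $label.")
}

validateFailureTime(1.218)  // passes silently
validateFailureTime(0.0)    // throws IllegalArgumentException, like R's survreg
{code}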


> AFTSurvivalRegression chokes silently or with confusing errors when any 
> labels are zero
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-19234
>                 URL: https://issues.apache.org/jira/browse/SPARK-19234
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.1.0
>         Environment: spark-shell or pyspark
>            Reporter: Andrew MacKinlay
>            Priority: Minor
>         Attachments: spark-aft-failure.txt
>
>
> If you try to use AFTSurvivalRegression and any label in your input data is 
> 0.0, you get coefficients of 0.0 returned and, in many cases, errors like 
> this:
> {{17/01/16 15:10:50 ERROR StrongWolfeLineSearch: Encountered bad values in 
> function evaluation. Decreasing step size to NaN}}
> Zero should, I think, be an allowed value for survival analysis. I don't know 
> whether this is a pathological case for AFT specifically, as I don't know 
> enough about it, but this behaviour is clearly undesirable. If you have any 
> labels of 0.0, you get either a) obscure error messages, with no indication of 
> the cause, and coefficients which are all zero, or b) no error messages at all 
> and coefficients of zero (arguably worse, since there is no console output to 
> tell you something has gone awry). A minimal repro is sketched below. If AFT 
> doesn't work with zero-valued labels, Spark should fail fast and let the 
> developer know why. If it does, we should get results here.
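
For reference, a minimal spark-shell repro along the lines below (the data 
mirrors the R example in the comment above; only the 0.0 label is essential) 
exhibits the reported behaviour:
{code}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.AFTSurvivalRegression

// Same data as the R example; the second row's 0.0 failure time is the trigger.
val training = spark.createDataFrame(Seq(
  (1.218, 1.0, Vectors.dense(1.560, -0.605)),
  (0.0,   0.0, Vectors.dense(0.346,  2.158)),  // zero-valued label
  (3.627, 0.0, Vectors.dense(1.380,  0.231)),
  (0.273, 1.0, Vectors.dense(0.520,  1.151)),
  (4.199, 0.0, Vectors.dense(0.795, -0.226))
)).toDF("label", "censor", "features")

val model = new AFTSurvivalRegression().fit(training)

// Per this report: StrongWolfeLineSearch errors (or silence) and
// all-zero coefficients instead of a fitted model.
println(model.coefficients)
{code}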


