Re: [R] Concordance Index - interpretation
K F Pearce wrote:
> Hello everyone.
>
> This is a question regarding generation of the concordance index (c index) in R using the function rcorr.cens. In particular, it is about the interpretation of its direction and the form of the 'predictor'.

Since Frank Harrell hasn't replied, I'll contribute my 2 cents.

> One of the arguments is a numeric predictor variable (presumably this is just a *single* predictor variable). Say this variable takes numeric values. Am I correct in thinking that if the c index is > 0.5 (with Somers' D positive) then this tells us that the higher the numeric values of the 'predictor', the greater the survival probability, and similarly if the c index is < 0.5 (with Somers' D negative) then this tells us that the higher the numeric values of the 'predictor', the lower the survival probability?

The c-index is a generalisation of the area under the ROC curve (AUC), so it measures how well your model discriminates between different responses, i.e., whether your predicted response is low for low observed responses and high for high observed responses. So C > 0.5 implies good predictive ability, C = 0.5 implies no predictive ability (no better than random guessing), and C < 0.5 implies good anti-prediction (worse than random, but if you flip the prediction direction it becomes a good prediction).

> The c index estimates the probability of concordance between predicted and observed responses. Harrell et al. (1996) says that in predicting time until death, concordance is calculated by considering all possible pairs of patients, at least one of whom has died. If the *predicted* survival time (probability) is larger for the patient who (actually) lived longer, the predictions for that pair are said to be concordant with the (actual) outcomes. I have read that the c index is defined as the proportion of all usable patient pairs in which the predictions and outcomes are concordant.
>
> Now, secondly, I'd like to ask what form the predictor can take.
> Presumably if the predictor was a continuous-type variable, e.g. 'age', then predicted survival probability (calculated internally via Cox regression?) would be compared with actual survival time for each specific age to get the c index? Now, if the predictor was an *ordinal categorical variable* where 1=worst group and 5=best group, I presume that the c index would be calculated similarly but this time there would be many ties in the predictor (as regards predicted survival probability). Hence, if I wanted to count all ties in such a case, I would keep the default argument outx=FALSE?

Both the predictor and the actual response can be either continuous or categorical, as long as they are ordinal (since it's a rank-based method). I don't know about the outx part.

> Does anyone have a clear reference which gives the formula used to generate the concordance index (with worked examples)?

I think the explanation in Harrell 1996, Section 5.5 is pretty clear, but perhaps could've used some pseudocode. Anyway, I understand it as:

1) Create all pairs of observed responses.

2) For all valid response pairs, i.e., pairs where one response y_1 is greater than the other y_2, test whether the corresponding predictions are concordant, i.e., yhat_1 > yhat_2. If so, add 1 to the running sum s. If yhat_1 = yhat_2, add 0.5 to the sum. Count the number n of valid response pairs.

3) Divide the total sum s by the number of valid response pairs n.
Here's my simple attempt; it is unoptimised and doesn't handle censoring:

    # yhat: predicted response
    # y: observed response
    concordance <- function(yhat, y)
    {
       s <- 0   # running sum of concordant (1) and tied (0.5) pairs
       n <- 0   # number of valid response pairs
       for(i in seq_along(y))
       {
          for(j in seq_along(y))
          {
             if(i != j)
             {
                # a pair is valid when the observed responses differ
                if(y[i] > y[j])
                {
                   s <- s + (yhat[i] > yhat[j]) + 0.5 * (yhat[i] == yhat[j])
                   n <- n + 1
                }
             }
          }
       }
       s / n
    }

See also Harrell's 2001 book Regression Modeling Strategies, and for the special case of binary outcomes (which is the AUC), Hanley and McNeil (1982), "The Meaning and Use of the Area under a Receiver Operating Characteristic (ROC) Curve", Radiology 143:29-36.

Cheers,
Gad

--
Gad Abraham
Dept. CSSE and NICTA
The University of Melbourne
Parkville 3010, Victoria, Australia
email: gabra...@csse.unimelb.edu.au
web: http://www.csse.unimelb.edu.au/~gabraham

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
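A small worked example may help with the direction question. The sketch below is a vectorised restatement of the pairwise definition described above, not code from Hmisc; the name c_index and the toy data are made up for illustration, and censoring is ignored.

```r
# Hypothetical helper (not part of Hmisc): pairwise concordance, no censoring.
# yhat: predicted response, y: observed response
c_index <- function(yhat, y) {
  valid <- outer(y, y, ">")          # valid[i, j] is TRUE when y[i] > y[j]
  conc  <- outer(yhat, yhat, ">")    # prediction ordered the same way
  tied  <- outer(yhat, yhat, "==")   # prediction tied, counts as 0.5
  sum(valid * (conc + 0.5 * tied)) / sum(valid)
}

# Toy data: 5 subjects, 10 valid pairs, one of which is discordant
y    <- c(1, 2, 3, 4, 5)
yhat <- c(0.1, 0.4, 0.3, 0.8, 0.9)

c_forward <- c_index(yhat, y)    # 0.9: higher yhat goes with higher y
c_flipped <- c_index(-yhat, y)   # 0.1: same predictor with its direction reversed
dxy       <- 2 * (c_forward - 0.5)  # Somers' D on the same scale: 0.8
```

With no ties in yhat, reversing the predictor turns C into 1 - C, which is why a C well below 0.5 still indicates a predictor that discriminates, only in the opposite direction. Somers' D is positive exactly when C > 0.5.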
Re: [R] Concordance Index - interpretation
Thanks for the great reply Gad.

outx=TRUE is used to not 'penalize' for ties in the predictions (or in the single variable given as x); this results in Goodman-Kruskal gamma-type rank correlation indexes. When comparing different predictions with different numbers of ties, it is especially not a good idea to discard ties in x.

The Fortran code that comes with Hmisc can also be viewed to see the exact algorithms.

Frank

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
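To make the effect of ties concrete, here is a rough sketch of the two conventions for an ordinal predictor with few distinct values. It is an illustration only, not the Hmisc algorithm: c_index_ties and its drop_x_ties argument are hypothetical names standing in for the idea behind outx, and censoring is ignored.

```r
# Hypothetical helper (not part of Hmisc), uncensored responses only.
# x: predictor (possibly with many ties), y: observed response
c_index_ties <- function(x, y, drop_x_ties = FALSE) {
  valid <- outer(y, y, ">")      # pairs whose observed responses differ
  conc  <- outer(x, x, ">")      # predictor ordered the same way
  tied  <- outer(x, x, "==")     # predictor tied within the pair
  if (drop_x_ties) {
    # ties in x are discarded from numerator and denominator (gamma-type)
    sum(valid & conc) / sum(valid & !tied)
  } else {
    # ties in x stay in the denominator and count as half-concordant
    sum(valid * (conc + 0.5 * tied)) / sum(valid)
  }
}

# Toy data: three ordinal groups, two subjects per group
y <- 1:6
x <- c(1, 1, 2, 2, 3, 3)

c_with_ties    <- c_index_ties(x, y)                      # 13.5 / 15 = 0.9
c_without_ties <- c_index_ties(x, y, drop_x_ties = TRUE)  # 12 / 12 = 1
```

Discarding the three tied pairs makes the coarse grouping look perfectly concordant, while keeping them lowers the index to reflect the pairs the predictor cannot separate. This is the reason for caution when comparing predictors that have different numbers of ties.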
[R] Concordance Index - interpretation
Hello everyone.

This is a question regarding generation of the concordance index (c index) in R using the function rcorr.cens. In particular, it is about the interpretation of its direction and the form of the 'predictor'.

One of the arguments is a numeric predictor variable (presumably this is just a *single* predictor variable). Say this variable takes numeric values. Am I correct in thinking that if the c index is > 0.5 (with Somers' D positive) then this tells us that the higher the numeric values of the 'predictor', the greater the survival probability, and similarly if the c index is < 0.5 (with Somers' D negative) then this tells us that the higher the numeric values of the 'predictor', the lower the survival probability?

The c index estimates the probability of concordance between predicted and observed responses. Harrell et al. (1996) says that in predicting time until death, concordance is calculated by considering all possible pairs of patients, at least one of whom has died. If the *predicted* survival time (probability) is larger for the patient who (actually) lived longer, the predictions for that pair are said to be concordant with the (actual) outcomes. I have read that the c index is defined as the proportion of all usable patient pairs in which the predictions and outcomes are concordant.

Now, secondly, I'd like to ask what form the predictor can take.

Presumably if the predictor was a continuous-type variable, e.g. 'age', then predicted survival probability (calculated internally via Cox regression?) would be compared with actual survival time for each specific age to get the c index? Now, if the predictor was an *ordinal categorical variable* where 1=worst group and 5=best group, I presume that the c index would be calculated similarly but this time there would be many ties in the predictor (as regards predicted survival probability). Hence, if I wanted to count all ties in such a case, I would keep the default argument outx=FALSE?

Does anyone have a clear reference which gives the formula used to generate the concordance index (with worked examples)?

Many thanks for your help on these interpretations.

Kind Regards,
Kim
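The "usable pairs" idea quoted from Harrell et al. (1996) in the question can be sketched for right-censored data as follows. This is a simplified reading, not the rcorr.cens algorithm: a pair is assumed usable only when the patient with the shorter follow-up time is known to have died, pairs with tied times are skipped, and the function name, argument names and data are all made up for illustration.

```r
# Hypothetical helper (not part of Hmisc).
# pred:  prediction where larger values mean longer predicted survival
# time:  observed follow-up time
# event: 1 if the patient died at `time`, 0 if censored
c_index_cens <- function(pred, time, event) {
  later  <- outer(time, time, ">")             # later[i, j]: i was followed longer than j
  usable <- later & (event[col(later)] == 1)   # and j's death was actually observed
  conc   <- outer(pred, pred, ">")             # i also predicted to live longer
  tied   <- outer(pred, pred, "==")            # tied predictions count as 0.5
  sum(usable * (conc + 0.5 * tied)) / sum(usable)
}

# Toy data: patients 2 and 5 are censored
time  <- c(2, 4, 6, 8, 10)
event <- c(1, 0, 1, 1, 0)
pred  <- c(1, 5, 2, 4, 3)

c_cens     <- c_index_cens(pred, time, event)      # 6 of 7 usable pairs concordant
c_all_died <- c_index_cens(pred, time, rep(1, 5))  # 6 of 10 pairs if nobody were censored
```

Pairs in which the shorter follow-up ended in censoring are left out because it is unknown which patient actually lived longer. Here that removes the three pairs anchored on patient 2, all of which would otherwise have counted as discordant.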