Re: [R] rpart with interval censored data crashes R
Thanks for such a complete answer, that is very helpful.

Best regards,
Keith Jewell

Terry Therneau thern...@mayo.edu wrote in message news:200901121858.n0ciw0g06...@hsrnfs-101.mayo.edu...
> Thank you for the input on rpart -- I just saw the message today.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] rpart with interval censored data crashes R
Thank you for the input on rpart -- I just saw the message today.

1. You are right, it should not crash. The reason rpart crashes is simply that I (the author) never ever tried using interval censored data in the call. Real users try the most amazing things. I'll fix it in my local version by putting in a "no no no" message. My local version and the R version, maintained by Brian, have drifted quite far apart, however.

2. Rpart deals with right censored data using the same trick as Cox models, by thinking of it as observation of a Poisson process: the number of events seen over a given time window. The fact that the number is always 0 or 1 doesn't hinder the mathematical trick, which is based on counting process theory. BUT - the trick only works for right censored data. Using the mid points of your intervals is the only approach that comes readily to mind.

Terry Therneau
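The Poisson "trick" in point 2 can be seen in a small example. This is only a sketch on simulated data (the variable names, rates and censoring scheme are all made up for illustration), and it uses survreg() and glm() rather than rpart's own internals: an exponential survival model for right censored data and a Poisson model for the 0/1 event count, with log follow-up time as an offset, have the same likelihood, so the coefficients agree up to sign.

```r
library(survival)

set.seed(1)                          # hypothetical simulated data, not from this thread
n      <- 200
x      <- rbinom(n, 1, 0.5)          # one binary covariate
t_true <- rexp(n, rate = exp(-1 + 0.7 * x))
t_cens <- runif(n, 0.5, 5)           # independent right-censoring times
time   <- pmin(t_true, t_cens)       # observed follow-up time
event  <- as.integer(t_true <= t_cens)  # 1 = event seen, 0 = right censored

# Exponential survival model; coefficients are on the log-time scale
fit_surv <- survreg(Surv(time, event) ~ x, dist = "exponential")

# Poisson model for the 0/1 event count over each subject's time window
fit_pois <- glm(event ~ x + offset(log(time)), family = poisson)

# Same likelihood, opposite sign convention (log time versus log rate)
print(cbind(survreg = coef(fit_surv), poisson = coef(fit_pois)))
```

The offset is what encodes "events seen over a given time window"; it needs a single follow-up time per subject, which is what right censored data provide and interval censored data do not.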
Re: [R] rpart with interval censored data crashes R
On a Leopard Mac with the Urbanek compiled 64 bit R, one sees this:

library(rpart)
library(survival)
Loading required package: splines
fit <- rpart(Surv(N,Y,type="interval2")~Salt+pH+Temp, data=myD)

 *** caught segfault ***
address 0x0, cause 'memory not mapped'

Traceback:
 1: .C(C_rpartexp2, as.integer(length(dtimes)), as.double(dtimes), as.double(.Machine$double.eps), keep = integer(length(dtimes)))
 2: (get(paste("rpart", method, sep = ".")))(Y, offset, , wt)
 3: rpart(Surv(N, Y, type = "interval2") ~ Salt + pH + Temp, data = myD)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

Choosing 4 does save the workspace.

--
David Winsemius

On Jan 9, 2009, at 9:04 AM, Keith Jewell wrote:
> Hi Everyone, This example code results in R 'crashing'; that is the R application closes with no warnings or error messages.
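Until a check exists inside rpart itself, one way to avoid reaching the C code at all is to inspect the Surv object before fitting. The sketch below is an assumption on my part, not anything from the rpart sources: safe_surv_rpart is a hypothetical helper name, and I have not verified what current rpart versions do with such input. It relies only on the "type" attribute that Surv() sets, which is "interval" for a type="interval2" response.

```r
library(survival)
library(rpart)

# Hypothetical helper (not part of rpart): raise an ordinary R error
# for anything other than a right censored Surv response.
safe_surv_rpart <- function(formula, data, ...) {
  y <- model.response(model.frame(formula, data = data, na.action = na.pass))
  if (inherits(y, "Surv") && attr(y, "type") != "right") {
    stop("only right censored Surv responses are handled; got type '",
         attr(y, "type"), "'")
  }
  rpart(formula, data = data, ...)
}

# Made-up rows in the shape of the posted data (N = lower bound, Y = upper bound)
d <- data.frame(N = c(1, 90, 7), Y = c(2, NA, 8), Temp = c(30, 10, 10))

res <- tryCatch(
  safe_surv_rpart(Surv(N, Y, type = "interval2") ~ Temp, data = d),
  error = function(e) conditionMessage(e)
)
print(res)   # the error message, rather than a segfault
```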
[R] rpart with interval censored data crashes R
Hi Everyone,

This example code results in R 'crashing'; that is, the R application closes with no warnings or error messages.

#---
myD <- read.table(stdin(), header=TRUE, nrows=20)
   Broth Salt   pH Temp    N  Y Growth
1    310  9.0 2.92   10 90.0 NA      0
2    615  6.0 7.82   30  1.0  2      1
3    217  2.0 7.34   10  7.0  8      1
4    338 10.0 4.44   10 90.0 NA      0
5    240  4.0 7.33   10 20.0 21      1
6    336 10.0 3.90   10 90.0 NA      0
7    279  7.0 6.73   10 90.0 NA      0
8   1021  9.0 5.03   45  8.0  9      1
9    974  7.0 4.01   45 90.0 NA      0
10   265  7.0 2.93   10 90.0 NA      0
11   934  4.0 5.28   45  0.1  1      1
12   669  9.0 5.03   30 90.0 NA      0
13   875 10.0 6.24   37  1.0  2      1
14   385  2.0 5.84   20  1.0  2      1
15   562  2.0 5.84   30  0.1  1      1
16   718  0.5 5.54   37  0.1  1      1
17   845  9.0 5.03   37  3.0  6      1
18   913  2.0 5.84   45  0.1  1      1
19   577  4.0 4.10   30 90.0 NA      0
20    20  0.5 7.44    8 24.0 27      1

library(rpart)
library(survival)
fit <- rpart(Surv(N,Y,type="interval2")~Salt+pH+Temp, data=myD)
#---

Professor Ripley helpfully pointed out that the documentation does not say that interval censoring is supported, and indeed the crash seems to happen only with interval censored data.

?rpart indicates that the dependent variable may be a survival object. Neither ?rpart nor "An Introduction to Recursive Partitioning Using the RPART Routines" (Therneau et al 1997) suggests that the dependent variable may contain interval censored data, but neither do they suggest it shouldn't; i.e. as far as I'm aware (!) this restriction is not documented.

This post has three purposes:
1) Bring this behaviour - especially the crash in response to 'bad' data - to the attention of the authors.
2) Seek an explanation of the restriction (if intentional). In my simplicity, it seems that interval censored data should be easier to handle than left or right censored - after all the information content is greater.
3) Seek guidance on how to work around the problem. I'm minded to replace the interval censored data by the mid points of the intervals. Does anyone have any comments on such an approach?

Any comments gratefully received.

Keith Jewell
==
Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status = Patched
 major = 2
 minor = 8.1
 year = 2009
 month = 01
 day = 07
 svn rev = 47502
 language = R
 version.string = R version 2.8.1 Patched (2009-01-07 r47502)

Windows Server 2003 x64 (build 3790) Service Pack 2

Locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

Search Path:
 .GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base
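For what it is worth, the mid point idea from purpose 3 might look something like the following. This is a rough sketch only: the data frame repeats four columns of the example data as I read them from the post, the column names time and event are my own, and xval = 0 is there just to skip cross-validation on such a small sample. Rows with Y = NA are kept as right censored at N; rows with both bounds are treated as events at the mid point of (N, Y).

```r
library(survival)
library(rpart)

# pH, Temp, N and Y from the example data (values as read from the post)
d <- data.frame(
  pH   = c(2.92, 7.82, 7.34, 4.44, 7.33, 3.90, 6.73, 5.03, 4.01, 2.93,
           5.28, 5.03, 6.24, 5.84, 5.84, 5.54, 5.03, 5.84, 4.10, 7.44),
  Temp = c(10, 30, 10, 10, 10, 10, 10, 45, 45, 10,
           45, 30, 37, 20, 30, 37, 37, 45, 30, 8),
  N    = c(90, 1, 7, 90, 20, 90, 90, 8, 90, 90,
           0.1, 90, 1, 1, 0.1, 0.1, 3, 0.1, 90, 24),
  Y    = c(NA, 2, 8, NA, 21, NA, NA, 9, NA, NA,
           1, NA, 2, 2, 1, 1, 6, 1, NA, 27)
)

# Event indicator: an upper bound exists only when growth was observed
d$event <- as.integer(!is.na(d$Y))

# Mid point of the interval for events, last observation time for censored rows
d$time <- ifelse(d$event == 1, (d$N + d$Y) / 2, d$N)

# Ordinary right censored response, which is what rpart expects
fit <- rpart(Surv(time, event) ~ pH + Temp, data = d,
             control = rpart.control(xval = 0))
print(fit)
```

The obvious cost is that the width of each interval is thrown away, so a 0.1 to 1 interval and a 24 to 27 interval are both treated as exactly known times.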