[R] X11 font
I get the following error out of R, on a newer Ubuntu installation.

    Error in `axis()`:
    ! X11 font -adobe-helvetica-%s-%s-*-*-%d-*-*-*-*-*-*-*, face 1 at size 12 could not be loaded
    Backtrace:
     1. graphics::matplot(...)
     3. graphics::plot.default(...)
     4. graphics (local) localAxis(...)
     6. graphics:::Axis.default(...)
     7. graphics::axis(side = side, at = at, labels = labels, ...)

The context is pretty simple:

    library(survival)
    matplot(60:100, 36525 * survexp.us[61:101, 1:2, "2014"], col = 2:1, lty = 1,
            lwd = 2, xlab = "Age", ylab = "Death rate per 100", log = 'y',
            type = 'l', yaxt = 'n', main = "US Death Rates")
    axis(2, c(1, 2, 5, 10, 20, 50), c(1, 2, 5, 10, 20, 50), las = 2)

This code works fine on my screen. The error comes up when I put it into an .Rmd file and apply rmarkdown::render() to it. Likely some font file is needed, but which one?

Terry

Version: R Under development (unstable) (2023-08-01 r84818) -- "Unsuffered Consequences"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: aarch64-unknown-linux-gnu

--
Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
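Editor's note for anyone hitting the same error: rmarkdown::render() uses a separate graphics device from the interactive screen, and two directions are commonly suggested. This is a sketch of possible workarounds, not a confirmed fix for this particular machine:

```r
# Option 1: in the .Rmd setup chunk, ask knitr for a cairo-based device,
# which uses modern font handling instead of the legacy X11 bitmap fonts
# named like "-adobe-helvetica-...":
knitr::opts_chunk$set(dev = "png", dev.args = list(type = "cairo"))
# or, for PDF output:
# knitr::opts_chunk$set(dev = "cairo_pdf")
```

Option 2, at the system level, is to install the legacy X11 bitmap fonts (on Debian/Ubuntu these are typically in the xfonts-75dpi and xfonts-100dpi packages) and restart the X server.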
Re: [R] extract from a list of lists
Thanks everyone for prodding my gray matter, which seems to be getting stiffer as I approach 70 (< 90 days).

-- I'll continue to use the $ or [[ forms. That will suffice.
-- I thought there might be a base R variant, e.g. something like extract(list, element-name); probably cross talk in my brain from the rstan library.
-- Gregg's note shows such a function in purrr. But I rather like having as few dependencies as possible in a package; one usage is normally not enough, at least for something this simple.

Terry T.

On 12/27/22 14:38, Bert Gunter wrote:
> Well, I prefer Greg's approach, but if you want to avoid calls to $ or
> `[[` then you could do:
>
> unlist(fits)[ rep(names(fits[[1]]) == 'iter', length(fits))]
>
> Cheers,
> Bert
>
> On Tue, Dec 27, 2022 at 9:46 AM Greg Snow <538...@gmail.com> wrote:
>> Another option is the map family of functions in the purrr package
>> (yes, this depends on another package being loaded, which may affect
>> things if you are including this in your own package, creating a
>> dependency).
>>
>> In map and friends, if the "function" is a string or integer, then it
>> is taken as the piece to be extracted, so you should be able to do
>> something like:
>>
>> library(purrr)
>> map(fits, 'iter')
>> # or
>> map_int(fits, 'iter')
>> # or
>> map_dbl(fits, 'iter')
>>
>> which of the last 2 to use depends on how `iter` is stored.
>>
>> On Tue, Dec 27, 2022 at 10:16 AM Therneau, Terry M., Ph.D. via R-help wrote:
>>> I not uncommonly have the following paradigm
>>> fits <- lapply(argument, function)
>>> resulting in a list of function results. Often, the outer call is to mclapply, and the
>>> function encodes some long calculation, e.g. multiple chains in an MCMC.
>>> Assume for illustration that each function returns a list with elements beta, loglik, iter.
>>> Then sapply(fits, function(x) x$iter)
>>> will give me a vector, with the number of iterations used by each instance.
>>> I've often been suspicious that there is some simple shorthand for the "grab all the
>>> elements named iter" that skips the explicit x$iter function. Am I indeed overlooking
>>> something? I don't expect a speed increase, just cleaner code.
>>>
>>> Terry T.
>>>
>>> --
>>> Terry M Therneau, PhD
>>> Department of Quantitative Health Sciences
>>> Mayo Clinic
>>> thern...@mayo.edu
>>>
>>> "TERR-ree THUR-noh"
>>
>> --
>> Gregory (Greg) L. Snow Ph.D.
>> 538...@gmail.com
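For the archives: base R does have the shorthand asked about, because the extra arguments of sapply/lapply are passed on to the function, and `[[` is itself a function. A small illustration (the fits list is constructed here just for the example):

```r
fits <- list(list(beta = 1, loglik = -10, iter = 3L),
             list(beta = 2, loglik = -8,  iter = 5L))

sapply(fits, `[[`, "iter")               # same result as sapply(fits, function(x) x$iter)
vapply(fits, `[[`, integer(1), "iter")   # type-safe variant: errors if iter isn't a single integer
```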
[R] MCMC tools
Is there a convenient package that computes standard convergence summaries for an MCMC run? This is something that I likely knew once and have now forgotten.

More detail: I'm trying to understand the MCMC done by a particular model called Subtype and Stage Inference (SuStaIn); suffice it to say that I am cautious of some details. The model doesn't fit into the standard packages, so I've set it up and run my own Metropolis chains. I don't want to expend energy also creating R-hat, ESS, and other sensible summaries; even more so to find the inevitable programming mistakes I'll make if I create them myself.

For the truly curious: I have measurements of tau deposition for 86 brain regions of interest (43 * left/right) from 1925 scans. One hypothesis in dementia research is that tau deposition occurs over time, in a pattern; there is likely more than one pattern. The algorithm is looking at permutations of the 86 regions, in search of a small collection that best fits all the subjects. There is a general background of statistical work showing that ranking is a hard problem, in the sense that small variances for individual ranks are difficult to achieve. SuStaIn tends to give small variances.

Terry T.

--
Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"
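Since the chains are hand-rolled, any package that accepts a plain matrix of draws will do. A sketch with the coda package (the choice of coda is the editor's, and rstan::monitor would be another option):

```r
library(coda)

# Fake example: four "chains" of 1000 draws each, standing in for the
# real Metropolis output.
set.seed(1)
chains <- replicate(4, cumsum(rnorm(1000)) / sqrt(1:1000), simplify = FALSE)

mlist <- mcmc.list(lapply(chains, mcmc))
effectiveSize(mlist)   # effective sample size (ESS)
gelman.diag(mlist)     # R-hat (potential scale reduction factor)
```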
[R] extract from a list of lists
I not uncommonly have the following paradigm:

    fits <- lapply(argument, function)

resulting in a list of function results. Often, the outer call is to mclapply, and the function encodes some long calculation, e.g. multiple chains in an MCMC. Assume for illustration that each function returns a list with elements beta, loglik, iter. Then

    sapply(fits, function(x) x$iter)

will give me a vector, with the number of iterations used by each instance. I've often been suspicious that there is some simple shorthand for the "grab all the elements named iter" that skips the explicit x$iter function. Am I indeed overlooking something? I don't expect a speed increase, just cleaner code.

Terry T.

--
Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"
Re: [R] how to find "first" or "last" record after sort in R
I prefer the duplicated() function, since the final code will be clear to a future reader. (Particularly when I am that future reader.)

    last <- !duplicated(mydata$ID, fromLast=TRUE)  # point to the last row for each ID
    mydata$date3[last] <- NA

Terry T. (I read the list once a day in digest form, so am always a late reply.)

On 9/10/21 5:00 AM, r-help-requ...@r-project.org wrote:

Hello List,

Please look at the sample data frame below:

    ID  date1       date2       date3
    1   2015-10-08  2015-12-17  2015-07-23
    2   2016-01-16  NA          2015-10-08
    3   2016-08-01  NA          2017-01-10
    3   2017-01-10  NA          2016-01-16
    4   2016-01-19  2016-02-24  2016-08-01
    5   2016-03-01  2016-03-10  2016-01-19

This data frame was sorted by ID and date1. I need to set the column date3 to missing for the "last" record for each ID. In the sample data set, IDs 1, 2, 4 and 5 have one row only, so each can be considered both the first and last record; date3 can be set to missing. But ID 3 has 2 rows. Since I sorted the data by ID and date1, the row with ID=3 and date1=2017-01-10 is the last record. I need to set date3=NA for that row only. The question is, how can I identify the "last" record and set date3 to NA for it?

Thank you,
Kai
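Putting the two pieces together on the poster's data (re-typed here, keeping only the ID and date3 columns needed for the illustration):

```r
mydata <- data.frame(
    ID    = c(1, 2, 3, 3, 4, 5),
    date3 = as.Date(c("2015-07-23", "2015-10-08", "2017-01-10",
                      "2016-01-16", "2016-08-01", "2016-01-19")))

last <- !duplicated(mydata$ID, fromLast = TRUE)  # TRUE at the last row of each ID
mydata$date3[last] <- NA
mydata   # date3 is now NA for every ID's last row; only ID 3 keeps one value
```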
Re: [R] coxph means not equal to means of model matrix
On 9/3/21 12:59 PM, Bond, Stephen wrote:
> I looked at the nocenter and it says (-1,0,1) values but it seems that any three-level
> factor is included in that (represented as 1,2,3 in R).

A factor is turned into a set of 0/1 dummy variables, so the nocenter applies. I will add more clarification to the documentation.

> Also, is the baseline curve now showing the reference level and not the fictional .428
> sex? If I predict the risk for a new row, should I multiply the coefficient shown in the
> output by 1 for a sex=1? It used to be (1-.428)*coef.

Yes, the "mean" component is the reference level for predict and survfit. If I could go back in time it would be labeled as "reference" instead of "mean". Another opportunity for me to make the documentation clearer.

Good questions,
  Terry T

> Thanks for clarifying.
>
> SB
>
> *From:* Therneau, Terry M., Ph.D.
> *Sent:* Friday, 3 September, 2021 12:37
> *To:* Bond, Stephen
> *Cc:* R-help
> *Subject:* Re: coxph means not equal to means of model matrix
>
> [EXTERNAL]
>
> See ?coxph, in particular the new "nocenter" option.
>
> Basically, the "mean" component is used to center later computations. This can be
> critical for continuous variables, avoiding overflow in the exp function, but is not
> necessary for 0/1 covariates. The fact that the default survival curve would be for a
> sex of .453, say, was off-putting to many.
>
> Terry T.
>
> On 9/3/21 11:01 AM, Bond, Stephen wrote:
>
> Hi,
>
> Please, help me understand what is happening with the means of a Cox model?
>
> I have:
>
> R version 4.0.2 (2020-06-22) -- "Taking Off Again"
> Copyright (C) 2020 The R Foundation for Statistical Computing
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> getOption("contrasts")
>          unordered       ordered
>  "contr.treatment"  "contr.poly"
>
> According to the help, coxph.object has a component holding the means of the X
> (model.matrix). This does not hold any more.
> ```
> library(survival)
> test1 <- list(time=c(4,3,1,1,2,2,3),
>               status=c(1,1,1,0,1,1,0),
>               x=c(0,2,1,1,1,0,0),
>               sex=factor(c(0,0,0,0,1,1,1)))
> m1 <- coxph(Surv(time, status) ~ x + sex, test1)
> m1$means
> ##         x      sex1
> ## 0.7142857 0.0000000
> colMeans(model.matrix(m1))
> ##         x      sex1
> ## 0.7142857 0.4285714
> ```
>
> Will new observations be scored using the zero mean from the object? Is this just a
> reporting change where $means shows the reference level and no longer the mean of
> the model matrix?
>
> Thanks everybody
Re: [R] coxph means not equal to means of model matrix
See ?coxph, in particular the new "nocenter" option.

Basically, the "mean" component is used to center later computations. This can be critical for continuous variables, avoiding overflow in the exp function, but is not necessary for 0/1 covariates. The fact that the default survival curve would be for a sex of .453, say, was off-putting to many.

Terry T.

On 9/3/21 11:01 AM, Bond, Stephen wrote:
>
> Hi,
>
> Please, help me understand what is happening with the means of a Cox model?
>
> I have:
>
> R version 4.0.2 (2020-06-22) -- "Taking Off Again"
> Copyright (C) 2020 The R Foundation for Statistical Computing
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> getOption("contrasts")
>          unordered       ordered
>  "contr.treatment"  "contr.poly"
>
> According to the help, coxph.object has a component holding the means of the X
> (model.matrix). This does not hold any more.
>
> ```
> library(survival)
> test1 <- list(time=c(4,3,1,1,2,2,3),
>               status=c(1,1,1,0,1,1,0),
>               x=c(0,2,1,1,1,0,0),
>               sex=factor(c(0,0,0,0,1,1,1)))
> m1 <- coxph(Surv(time, status) ~ x + sex, test1)
> m1$means
> ##         x      sex1
> ## 0.7142857 0.0000000
> colMeans(model.matrix(m1))
> ##         x      sex1
> ## 0.7142857 0.4285714
> ```
>
> Will new observations be scored using the zero mean from the object? Is this just a
> reporting change where $means shows the reference level and no longer the mean of the
> model matrix?
>
> Thanks everybody
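A side-by-side sketch of the centering behavior. The use of nocenter = NULL to reproduce the old center-everything behavior is the editor's reading of ?coxph, so verify against the help page:

```r
library(survival)

test1 <- list(time   = c(4, 3, 1, 1, 2, 2, 3),
              status = c(1, 1, 1, 0, 1, 1, 0),
              x      = c(0, 2, 1, 1, 1, 0, 0),
              sex    = factor(c(0, 0, 0, 0, 1, 1, 1)))

m1 <- coxph(Surv(time, status) ~ x + sex, test1)   # default: 0/1 dummies not centered
m2 <- coxph(Surv(time, status) ~ x + sex, test1,
            nocenter = NULL)                       # assumed: centers every column, as pre-3.0

m1$means   # sex1 reported as 0, the reference level
m2$means   # sex1 reported as the model-matrix mean
```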
Re: [R] svycoxph and cox.zph for testing the PH assumption
On 7/11/21 5:00 AM, r-help-requ...@r-project.org wrote:
> Hello, is it kosher to call cox.zph on a svycoxph model fit? I see that someone
> proposed a modified version of cox.zph that uses
> resid(fit, 'schoenfeld', weighted=TRUE).
> https://stats.stackexchange.com/questions/265307/assessing-proportional-hazards-assumption-of-a-cox-model-with-caseweights
> Is that all it takes?
> Thanks, Youyi

The cox.zph function does a formal score test. No, it does not account for robust variance; I hadn't considered that case, but will now think about it. It is quite easy to show that there is a problem: just give everyone a weight of 100.

The stackexchange conversation was new to me. The solution there won't work with the current code, which does not make use of resid(): the current version does the proper score test, while the older version of cox.zph, which they modified, used an approximation.

Terry T.
[R] inverse of the methods function
Is there a complement to the methods function, one that will list all the defined methods for a class? One solution is to look directly at the NAMESPACE file for the package that defines it, and parse out the entries. I was looking for something built-in, i.e., easier.

--
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"
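For the archives: the built-in complement exists. The methods() function itself accepts a class= argument and lists every S3 method defined for that class (showMethods() is the S4 counterpart):

```r
methods(class = "lm")        # all S3 methods defined for class "lm":
                             # anova.lm, predict.lm, summary.lm, ...
getS3method("print", "lm")   # retrieve one of them, even if unexported
```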
Re: [R] evil attributes
I wrote: "I confess to being puzzled WHY the R core has decided on this definition..."

After just a little more thought, let me answer my own question.

a. The as.vector() function is designed to strip off everything extraneous and leave just the core. (I have a mental image of Jack Webb saying "Just the facts, ma'am".) I myself use it frequently in the test suite for survival, in cases where I'm checking the correct numeric result and don't care about any attached names.

b. is.vector(x) essentially answers the question "does x look like a result of as.vector?"

Nevertheless I understand Roger's confusion.

--
Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"
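A short demonstration of points (a) and (b):

```r
x <- c(a = 1, b = 2)
is.vector(x)        # TRUE: a names attribute is the one attribute allowed
attr(x, "extra") <- "note"
is.vector(x)        # FALSE: any other attribute disqualifies it

y <- as.vector(x)   # "just the facts": strips all attributes, names included
attributes(y)       # NULL
is.vector(y)        # TRUE
```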
Re: [R] "exact" p-values
I am late to this discussion -- I read R-help as a once-a-day summary. A few comments.

1. In the gene-discovery subfield of statistics (SNP studies, etc.) there is a huge multiple-testing problem. In defense, the field thinks in terms of thresholds like 1e-5 or 1e-10 rather than the .05 or .01 most of us are used to. In that literature, they do care about 1e-16 vs 1e-20. We can all argue about whether that is a sensible approach or not, but it is what it is. I think that this is the context of the journal's request, i.e., they want the actual number, however you calculate it.

My own opinion is that these rarefied p-values are an arbitrary scale, one that no longer has a probability interpretation. For the central limit theorem to be correct that far from the mean requires a sample size that is beyond imagination ("number of atoms in the earth" order of size). Such a scale may still be useful, but it's not really a probability.

2. The label of "Fisher's exact test" has caused decades of confusion. In this context the word means "a particular test whose distribution can be completely enumerated": it does not mean either "correct" or "precise". The original enumeration methods had limitations with respect to the sample size or the presence of complications such as tied values; from the discussion so far it would appear that the 'exact' argument of wilcox.test uses such a method. Cyrus Mehta did nice work on improved algorithms that do not have these restrictions, methods that have been refined and expanded in the software offerings from Cytel among others. Perhaps someone could update R's code to use this, but see 3 below.

My own opinion is that permutation tests are an important tool, one "wrench" in our statistical toolbox. But they are only one tool out of many. I am quite put off by arguments that purposefully conflate "exact" and "correct".

3.
The concordance statistic C, the Wilcoxon test, and Somers' d are all the same statistic, just written a little differently. (Somers' d is essentially Kendall's tau, but with a slightly different rule for ties.) A test for C=.5 is the same as a Wilcoxon test. For a binary response, C is the area under the receiver operating characteristic curve (AUC). The concordance command in the survival library computes this statistic for continuous, binary, or censored responses. The variance is based on a jackknife argument, and is computed by organizing the data into a binary tree structure, very similar to the methods used by Mehta; it is efficient for large n and is valid for ties. Perhaps add a link in the wilcox.test help page?

Footnote: AUC is a special case of C but not vice versa. People sometimes try to extend AUC to the other data types, but IMHO with only moderate success.

--
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"
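A small sketch of the C = 0.5 vs Wilcoxon equivalence, using made-up data (the reported p-values will differ somewhat, since concordance uses a jackknife variance and wilcox.test a rank-sum reference distribution):

```r
library(survival)

set.seed(2)
grp  <- rep(0:1, each = 30)
yval <- rnorm(60) + grp / 2        # group 1 shifted upward

cfit <- concordance(yval ~ grp)
cfit$concordance                   # C; values above 0.5 reflect the shift
wilcox.test(yval ~ grp)            # tests the same null hypothesis as C = 0.5
```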
[R] temporary clipping
In one of my plot functions it is convenient to use clipping to restrict the range of some output. But at the end of the function I'd like to turn it off, i.e., so that a subsequent use of legend(), text() or whatever is not affected. I don't quite see how to do this -- it seems that the only way to turn off clipping is with a new plot.

--
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"
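One workaround, an assumption built from the note in ?clip that the clip region persists only until the next clip() call or high-level plot: reset the region to the full user coordinate area.

```r
plot(1:10)
clip(2, 8, 2, 8)                      # restrict drawing to a sub-region
abline(h = 5, col = 2)                # drawn only inside [2,8] x [2,8]

do.call(clip, as.list(par("usr")))    # par("usr") is c(x1, x2, y1, y2)
abline(h = 6, col = 4)                # back to clipping at the plot region only
```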
Re: [R] "chi-square" | "chi-squared" | "chi squared" | "chi, square"
Martin,

A fun question. Looking back at my oldest books, Feller (1950) used chi-square. Then I walked down the hall to our little statistics library and looked at Johnson and Kotz, "Continuous Univariate Distributions", since each chapter therein has comments about the history of the distribution.

a. They use 'chi-square' throughout their history section, tracing the distribution back to work in the 1800s. But those earliest papers apparently didn't name their results as chi-whatever, so an "origin" story didn't pan out.

b. They have 13 pages of references, and for fun I counted the occurrence of variants. The majority of papers don't have the word in the title at all, and the next most common is the Greek symbol. Here are the years of the others:

    chi-square:     73 43 65 80 86 73 82 73 69 69 78 64 64 86 65 86 82 82 76 82 88 81 74 77 87 86 93 69 60 88 88 80 77 41 59 79 31
    chi-squared:    72 76 82 83 89 79 69 67 77 78 69 77 83 88 87 89 78
    chi:            92 73 89 87
    chi-squares:    77 83
    chi-bar-square: 91

There doesn't look to be a trend over time. The 1922 Fisher reference uses the Greek symbol, by the way.

Terry T
Re: [R] conflicting results for a time-varying coefficient in a Cox model
This is an excellent question. The answer, in this particular case, mostly has to do with the outlier time values. (I've never been convinced that the death at time 999 isn't really a misplaced code for "missing", actually.) If you change the knots used by the spline you can get quite different values. For instance, using a smaller data set:

    fit1 <- coxph(Surv(time, status) ~ trt + prior + karno, veteran)
    zph1 <- cox.zph(fit1, transform='identity')
    plot(zph1[3])

    dtime <- unique(veteran$time[veteran$status == 1])  # all of the death times
    veteran2 <- survSplit(Surv(time, status) ~ ., data=veteran, cut=dtime)
    fit2 <- coxph(Surv(tstart, time, status) ~ trt + prior + karno +
                  karno:ns(time, df=4), data=veteran2)
    tx <- 0:100 * 10   # x positions for plot
    ncall <- attr(terms(fit2), "predvars")[[6]]
    ty <- eval(ncall, data.frame(time = tx)) %*% coef(fit2)[4:7] + coef(fit2)[3]
    lines(tx, ty, col=2)

Now it looks even worse! The only difference is that the ns() function has chosen a different set of knots.

The test used by the cox.zph function is based on a score test and is solid. The plot that it produces uses a smoothed approximation to the variance matrix and is approximate. So the diagnostic plot will never exactly match an actual fit. In this data set the outliers exacerbate the issue. To see this, try a different time scale.

    zph2 <- cox.zph(fit1, transform= sqrt)
    plot(zph2[3])
    veteran2$stime <- sqrt(veteran2$time)
    fit3 <- coxph(Surv(tstart, time, status) ~ trt + prior + karno +
                  karno:ns(stime, df=4), data=veteran2)
    ncall3 <- attr(terms(fit3), "predvars")[[6]]
    ty3 <- eval(ncall3, data.frame(stime= sqrt(tx))) %*% coef(fit3)[4:7] + coef(fit3)[3]
    lines(sqrt(tx), ty3, col=2)

The right tail is now better behaved. Eliminating the points >900 makes things even better behaved.

Terry T.

On 8/8/19 9:07 AM, Ferenci Tamas wrote:
> I was thinking of two possible ways to
> plot a time-varying coefficient in a Cox model.
>
> One is simply to use survival::plot.cox.zph which directly produces a
> beta(t) vs t diagram.
>
> The other is to transform the dataset to counting process format and
> manually include an interaction with time, expanded with spline (to be
> similar to plot.cox.zph). Plotting the coefficient produces the needed
> beta(t) vs t diagram.
>
> I understand that they're slightly different approaches, so I don't
> expect totally identical results, but nevertheless, they approximate
> the very same thing, so I do expect that the results are more or less
> similar.
>
> However:
>
> library( survival )
> library( splines )
>
> data( veteran )
>
> zp <- cox.zph( coxph(Surv(time, status) ~ trt + prior + karno,
>                      data = veteran ), transform = "identity" )[ 3 ]
>
> veteran3 <- survSplit( Surv(time, status) ~ trt + prior + karno,
>                        data = veteran, cut = 1:max(veteran$time) )
>
> fit <- coxph(Surv(tstart, time, status) ~ trt + prior + karno +
>              karno:ns( time, df = 4 ), data = veteran3 )
> cf <- coef( fit )
> nsvet <- ns( veteran3$time, df = 4 )
>
> plot( zp )
> lines( 0:1000, ns( 0:1000, df = 4, knots = attr( nsvet, "knots" ),
>        Boundary.knots = attr( nsvet, "Boundary.knots" ) ) %*% cf[
>        grep( "karno:ns", names( cf ) ) ] + cf["karno"],
>        type = "l", col = "red" )
>
> Where is the mistake? Something must be going on here, because the
> plots are vastly different...
>
> Thank you very much in advance,
> Tamas
Re: [R] Trying to understand the magic of lm (Still trying)
John,

The text below is cut out of a "how to write a package" course I gave at the R conference at Vanderbilt. I need to find a home for the course notes, because they have a lot of tidbits that are not well explained in the R documentation.

Terry T.

Model frames:

One of the first tasks of any modeling routine is to construct a special data frame containing the covariates that will be used, via a call to the model.frame function. The code to do this is found in many routines, and can be a little opaque on first view. The obvious code would be

\begin{verbatim}
coxph <- function(formula, data, weights, subset, na.action, init,
                  control, ties= c("efron", "breslow", "exact"),
                  singular.ok =TRUE, robust=FALSE,
                  model=FALSE, x=FALSE, y=TRUE, tt, method=ties, ...) {
    mf <- model.frame(formula, data, subset, weights, na.action)
\end{verbatim}

since those are the coxph arguments that are passed forward to the model.frame routine. However, this simple approach will fail with a ``not found'' error message if any of the data, subset, weights, etc. arguments are missing. Programs have to take the slightly more complicated approach of constructing a call.

\begin{verbatim}
Call <- match.call()
indx <- match(c("formula", "data", "weights", "subset", "na.action"),
              names(Call), nomatch=0)
if (indx[1] ==0) stop("A formula argument is required")
temp <- Call[c(1,indx)]              # only keep the arguments we wanted
temp[[1]] <- as.name('model.frame')  # change the function called
mf <- eval(temp, parent.frame())
Y <- model.response(mf)
etc.
\end{verbatim}

We start with a copy of the call to the program, which we want to save anyway as documentation in the output object. Then subscripting is used to extract only the portions of the call that we want, saving the result in a temporary variable. This is based on the fact that a call object can be viewed as a list whose first element is the name of the function to call, followed by the arguments to the call.
Note the use of \code{nomatch=0}; if any arguments on the list are missing they will then be missing in \code{temp}, without generating an error message. The \mycode{temp} variable will contain an object of type ``call'', which is an unevaluated call to a routine. Finally, the name of the function to be called is changed from ``coxph'' to ``model.frame'' and the call is evaluated.

In many of the core routines the result is stored in a variable ``m''. This is a horribly short and non-descriptive name. (The above used mf, which isn't much better.) Many routines also use ``m'' for the temporary variable, leading to \code{m <- eval(m, parent.frame())}, but I think that is unnecessarily confusing.

The list of names in the match call will include all arguments that should be evaluated within the context of the named dataframe. This can include more than the list above; the survfit routine for instance has an optional argument ``id'' that names an identifying variable (several rows of the data may represent a single subject), and this is included along with ``formula'' etc. in the list of choices in the match function. The order of names in the list makes no difference. The id is later retrieved with \code{model.extract(m, 'id')}, which will be NULL if the argument was not supplied. At the time that coxph was written I had not caught on to this fact and thought that all variables that came from a data frame had to be represented in the formula somehow, thus the use of \code{cluster(id)} as part of the formula, in order to denote a grouping variable.

On 5/11/19 5:00 AM, r-help-requ...@r-project.org wrote:
> A number of people have helped me in my mission to understand how lm (and
> other functions) are able to pass a dataframe and then refer to a specific
> column in the dataframe. I thank everyone who has responded. I now know a bit
> about deparse(substitute(xx)), but I still don't fully understand how it
> works.
> The program below attempts to print a column of a dataframe from a
> function whose parameters include the dataframe (df) and the column requested
> (col). The program works fine until the last print statement, where I receive
> an error: Error in `[.data.frame`(df, , col) : object 'y' not found. I hope
> someone can explain to me (1) why my code does not work, and (2) what I can
> do to fix it.
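The idiom from the notes above, boiled down to a self-contained toy. Here myfit is a hypothetical name, and this is a minimal sketch rather than the real coxph code:

```r
myfit <- function(formula, data, weights, subset, na.action) {
    Call <- match.call()
    indx <- match(c("formula", "data", "weights", "subset", "na.action"),
                  names(Call), nomatch = 0)
    if (indx[1] == 0) stop("a formula argument is required")
    temp <- Call[c(1, indx)]             # keep only the arguments we want
    temp[[1]] <- as.name("model.frame")  # change the function being called
    mf <- eval(temp, parent.frame())     # evaluate in the caller's frame
    list(y = model.response(mf), x = model.matrix(terms(mf), mf))
}

# Works even though weights, subset, and na.action are all missing:
fit <- myfit(Sepal.Length ~ Sepal.Width, data = iris)
dim(fit$x)   # model matrix, with an intercept column
```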
[R] adding a hex sticker to a package
I've created a hex sticker for survival. How should it be added to the package directory? It's temporarily in man/figures on the github page.

Terry T.

(Actually, the idea was from Ryan Lennon. I liked it, and we found someone with actual graphical skills to execute it.)