[R] X11 font

2023-08-16 Thread Therneau, Terry M., Ph.D. via R-help
I get the following error out of R, on a newer Ubuntu installation.

Error in `axis()`:
! X11 font -adobe-helvetica-%s-%s-*-*-%d-*-*-*-*-*-*-*, face 1 at size 12 could not be loaded
Backtrace:
  1. graphics::matplot(...)
  3. graphics::plot.default(...)
  4. graphics (local) localAxis(...)
  6. graphics:::Axis.default(...)
  7. graphics::axis(side = side, at = at, labels = labels, ...)

The context is pretty simple:

library(survival)
matplot(60:100, 36525* survexp.us[61:101, 1:2, "2014"], col=2:1, lty=1, lwd=2,
     xlab="Age", ylab="Death rate per 100", log='y', type='l', yaxt='n',
     main="US Death Rates")
axis(2, c(1,2,5,10,20, 50), c(1,2,5,10, 20, 50), las=2)

This code works fine on my screen.  The error comes up when I put it into an .Rmd file and apply rmarkdown::render() to it.

Likely some font file is needed, but which one?
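For the archive, a hedged sketch of two common fixes; both are assumptions about this machine, not verified.  The -adobe-helvetica names belong to the legacy X bitmap fonts, which newer Ubuntu installs often omit, and the rendering step can also sidestep X11 fonts entirely via cairo:

## shell (assumption: the classic bitmap fonts are simply not installed):
##   sudo apt-get install xfonts-75dpi xfonts-100dpi
##   xset fp rehash    # refresh the running X server's font path
## R alternative: render chunks with a cairo-based device instead of X11
knitr::opts_chunk$set(dev = "png", dev.args = list(type = "cairo"))
options(bitmapType = "cairo")   # same idea for bare png()/jpeg() calls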

Terry

Version:
R Under development (unstable) (2023-08-01 r84818) -- "Unsuffered Consequences"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: aarch64-unknown-linux-gnu

-- 
Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"



Re: [R] extract from a list of lists

2022-12-27 Thread Therneau, Terry M., Ph.D. via R-help
Thanks everyone for prodding my gray matter, which seems to be getting stiffer as I approach 70 (< 90 days).

-- I'll continue to use the $ or [[ forms.  That will suffice.

-- I thought there might be a base R variant, e.g. something like extract(list, element-name); probably cross talk in my brain from the rstan library.

-- Greg's note shows such a function in purrr.  But I rather like having as few dependencies as possible in a package; a single use is normally not enough to justify one, at least for something this simple.

Terry T.

On 12/27/22 14:38, Bert Gunter wrote:
> Well, I prefer Greg's approach, but if you want to avoid calls to $ or
> `[[` then you could do:
>
> unlist(fits)[ rep(names(fits[[1]]) == 'iter', length(fits))]
>
>
> Cheers,
> Bert
>
> On Tue, Dec 27, 2022 at 9:46 AM Greg Snow <538...@gmail.com> wrote:
>> Another option is the map family of functions in the purrr package
>> (yes, this depends on another package being loaded, which may affect
>> things if you are including this in your own package, creating a
>> dependency).
>>
>> In map and friends, if the "function" is a string or integer, then it
>> is taken as the piece to be extracted, so you should be able to do
>> something like:
>>
>> library(purrr)
>> map(fits, 'iter')
>> # or
>> map_int(fits, 'iter')
>> # or
>> map_dbl(fits, 'iter')
>>
>> which of the last 2 to use depends on how `iter` is stored.
>>
>> On Tue, Dec 27, 2022 at 10:16 AM Therneau, Terry M., Ph.D. via R-help
>>   wrote:
>>> I not uncommonly have the following paradigm
>>>  fits <- lapply(argument, function)
>>>
>>> resulting in a list of function results.  Often, the outer call is to mclapply, and the function encodes some long calculation, e.g. multiple chains in an MCMC.
>>> Assume for illustration that each function returns a list with elements beta, loglik, iter.
>>>
>>> Then sapply(fits, function(x) x$iter) will give me a vector with the number of iterations used by each instance.
>>>
>>> I've often been suspicious that there is some simple shorthand for the "grab all the elements named iter" that skips the explicit x$iter function.  Am I indeed overlooking something?  I don't expect a speed increase, just cleaner code.
>>>
>>> Terry T.
>>>
>>> --
>>> Terry M Therneau, PhD
>>> Department of Quantitative Health Sciences
>>> Mayo Clinic
>>> thern...@mayo.edu
>>>
>>> "TERR-ree THUR-noh"
>>>
>>
>>
>> --
>> Gregory (Greg) L. Snow Ph.D.
>> 538...@gmail.com
>>



[R] MCMC tools

2022-12-27 Thread Therneau, Terry M., Ph.D. via R-help
Is there a convenient package that computes standard convergence summaries for an MCMC run?  This is something that I likely knew once and have now forgotten.

  More detail: I'm trying to understand the MCMC done by a particular model called Subtype and Stage Inference (SuStaIn); suffice it to say that I am cautious of some details.  The model doesn't fit into the standard packages, so I've set it up and run my own Metropolis chains.  I don't want to expend energy also creating R-hat, ESS, and other sensible summaries; even more so to find the inevitable programming mistakes I'll make if I create them myself.
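(A hedged sketch of one candidate, the coda package; the chains below are random placeholders standing in for real Metropolis output:)

library(coda)
draws1 <- mcmc(matrix(rnorm(2000), ncol = 2))   # chain 1: iterations x parameters
draws2 <- mcmc(matrix(rnorm(2000), ncol = 2))   # chain 2, same layout
chains <- mcmc.list(draws1, draws2)
gelman.diag(chains)     # R-hat, the potential scale reduction factor
effectiveSize(chains)   # effective sample size for each parameter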

  For the truly curious: I have measurements of tau deposition for 86 brain regions of interest (43 * left/right) from 1925 scans.  One hypothesis in dementia research is that tau deposition occurs over time, in a pattern; there is likely more than one pattern.  The algorithm is looking at permutations of the 86 regions, in search of a small collection that best fits all the subjects.  There is a general background of statistical work showing that ranking is a hard problem, in the sense that a small variance for individual ranks is hard to achieve.  SuStaIn tends to give small variances.

Terry T.

-- 
Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"



[R] extract from a list of lists

2022-12-27 Thread Therneau, Terry M., Ph.D. via R-help
I not uncommonly have the following paradigm
    fits <- lapply(argument, function)

resulting in a list of function results.  Often, the outer call is to mclapply, and the function encodes some long calculation, e.g. multiple chains in an MCMC.
Assume for illustration that each function returns a list with elements beta, loglik, iter.

Then sapply(fits, function(x) x$iter) will give me a vector with the number of iterations used by each instance.

I've often been suspicious that there is some simple shorthand for the "grab all the elements named iter" that skips the explicit x$iter function.  Am I indeed overlooking something?  I don't expect a speed increase, just cleaner code.

Terry T.

-- 
Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"



Re: [R] how to find "first" or "last" record after sort in R

2021-09-10 Thread Therneau, Terry M., Ph.D. via R-help
I prefer the duplicated() function, since the final code will be clear to a future reader.  (Particularly when I am that future reader.)


last <- !duplicated(mydata$ID, fromLast=TRUE)  # point to the last row for each ID
mydata$date3[last] <- NA

Terry T.

(I read the list once a day in digest form, so am always a late reply.)

On 9/10/21 5:00 AM, r-help-requ...@r-project.org wrote:

Hello List,
Please look at the sample data frame below:

ID   date1         date2         date3
1    2015-10-08    2015-12-17    2015-07-23
2    2016-01-16    NA            2015-10-08
3    2016-08-01    NA            2017-01-10
3    2017-01-10    NA            2016-01-16
4    2016-01-19    2016-02-24    2016-08-01
5    2016-03-01    2016-03-10    2016-01-19
This data frame was sorted by ID and date1.  I need to set the column date3 to missing for the "last" record for each ID.  In the sample data set, IDs 1, 2, 4 and 5 have one row only, so each can be considered both the first and last record, and date3 can be set to missing.  But ID 3 has 2 rows.  Since I sorted the data by ID and date1, the row with ID=3 and date1=2017-01-10 should be the last record.  I need to set date3=NA for this row only.

The question is: how can I identify the "last" record and set date3 to NA for it?
Thank you,
Kai


Re: [R] coxph means not equal to means of model matrix

2021-09-03 Thread Therneau, Terry M., Ph.D. via R-help


On 9/3/21 12:59 PM, Bond, Stephen wrote:
>
> I looked at the nocenter and it says (-1,0,1) values, but it seems that any three-level factor is included in that (represented as 1,2,3 in R).
>
A factor is turned into a set of 0/1 dummy variables, so nocenter applies.  I will add more clarification to the documentation.

> Also, is the baseline curve now showing the reference level and not the fictional .428 sex?  If I predict the risk for a new row, should I multiply the coefficient shown in the output by 1 for sex=1?  It used to be (1-.428)*coef.
>
Yes, the "mean" component is the reference level for predict and survfit.� If I 
could go 
back in time it would be labeled as "reference" instead of "mean".�� Another 
opportunity 
for me to make the documentation clearer.

Good questions,
  Terry T.

> Thanks for clarifying.
>
> SB
>
> *From:* Therneau, Terry M., Ph.D. 
> *Sent:* Friday, 3 September, 2021 12:37
> *To:* Bond, Stephen 
> *Cc:* R-help 
> *Subject:* Re: coxph means not equal to means of model matrix
>
>
> See ?coxph, in particular the new "nocenter" option.
>
> Basically, the "mean" component is used to center later computations.� This 
> can be 
> critical for continuous variables, avoiding overflow in the exp function, but 
> is not 
> necessary for 0/1 covariates.�� The fact that the default survival curve 
> would be for a 
> sex of .453, say, was off-putting to many.
>
> Terry T.
>
> On 9/3/21 11:01 AM, Bond, Stephen wrote:
>
> Hi,
>
> Please, help me understand what is happening with the means of a Cox model?
>
> I have:
>
> R version 4.0.2 (2020-06-22) -- "Taking Off Again"
>
> Copyright (C) 2020 The R Foundation for Statistical Computing
>
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> getOption("contrasts")
>
>         unordered           ordered
> "contr.treatment"      "contr.poly"
>
> According to the help, coxph.object has a component holding the means of the X (model.matrix).  This does not hold any more.
>
> ```
>
> library(survival)
>
> test1 <- list(time=c(4,3,1,1,2,2,3),
>               status=c(1,1,1,0,1,1,0),
>               x=c(0,2,1,1,1,0,0),
>               sex=factor(c(0,0,0,0,1,1,1)))
>
> m1 <- coxph(Surv(time, status) ~ x + sex, test1)
>
> m1$means
> ##         x      sex1
> ## 0.7142857 0.0000000
>
> colMeans(model.matrix(m1))
> ##         x      sex1
> ## 0.7142857 0.4285714
>
> ```
>
> Will new observations be scored using the zero mean from the object?  Is this just a reporting change where $means shows the reference level and no longer the mean of the model matrix?
>
> Thanks everybody
>




Re: [R] coxph means not equal to means of model matrix

2021-09-03 Thread Therneau, Terry M., Ph.D. via R-help
See ?coxph, in particular the new "nocenter" option.

Basically, the "mean" component is used to center later computations.� This can 
be 
critical for continuous variables, avoiding overflow in the exp function, but 
is not 
necessary for 0/1 covariates.�� The fact that the default survival curve would 
be for a 
sex of .453, say, was off-putting to many.
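A small sketch of the effect (mine, not from the thread), using the lung data; factor(sex) becomes a 0/1 dummy, which the default nocenter = c(-1, 0, 1) leaves uncentered:

library(survival)
fit <- coxph(Surv(time, status) ~ age + factor(sex), data = lung)
fit$means                    # 0 for the 0/1 dummy, the ordinary mean for age
colMeans(model.matrix(fit))  # the actual column means, for comparison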

Terry T.


On 9/3/21 11:01 AM, Bond, Stephen wrote:
>
> Hi,
>
> Please, help me understand what is happening with the means of a Cox model?
>
> I have:
>
> R version 4.0.2 (2020-06-22) -- "Taking Off Again"
>
> Copyright (C) 2020 The R Foundation for Statistical Computing
>
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> getOption("contrasts")
>
>         unordered           ordered
> "contr.treatment"      "contr.poly"
>
> According to the help, coxph.object has a component holding the means of the X (model.matrix).  This does not hold any more.
>
> ```
>
> library(survival)
>
> test1 <- list(time=c(4,3,1,1,2,2,3),
>               status=c(1,1,1,0,1,1,0),
>               x=c(0,2,1,1,1,0,0),
>               sex=factor(c(0,0,0,0,1,1,1)))
>
> m1 <- coxph(Surv(time, status) ~ x + sex, test1)
>
> m1$means
> ##         x      sex1
> ## 0.7142857 0.0000000
>
> colMeans(model.matrix(m1))
> ##         x      sex1
> ## 0.7142857 0.4285714
>
> ```
>
> Will new observations be scored using the zero mean from the object?  Is this just a reporting change where $means shows the reference level and no longer the mean of the model matrix?
>
> Thanks everybody
>




Re: [R] svycoxph and cox.zph for testing the PH assumption

2021-07-12 Thread Therneau, Terry M., Ph.D. via R-help




On 7/11/21 5:00 AM, r-help-requ...@r-project.org wrote:

Hello, is it kosher to call cox.zph on a svycoxph model fit?  I see that someone proposed a modified version of cox.zph that uses resid(fit, 'schoenfeld', **weighted=TRUE**).

https://stats.stackexchange.com/questions/265307/assessing-proportional-hazards-assumption-of-a-cox-model-with-caseweights

Is that all it takes?
Thanks,
Youyi


The cox.zph function does a formal score test.  No, it does not account for robust variance.  I hadn't considered that case, but will now think about it.  It is quite easy to show that there is a problem: just give everyone a weight of 100.
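A sketch of that check (my construction, not code from the thread): a constant case weight leaves the coefficients untouched but inflates the score tests.

library(survival)
fit1 <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung)
fit2 <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung,
              weights = rep(100, nrow(lung)))
rbind(coef(fit1), coef(fit2))   # identical estimates
cox.zph(fit1)
cox.zph(fit2)                   # chi-square statistics roughly 100-fold larger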


The stackexchange conversation was new to me.  The solution there won't work with the current code, which does not make use of resid().  The current code has been updated to do the proper score test; the older version of cox.zph, which they modified, used an approximation.


Terry T.



[R] inverse of the methods function

2021-05-03 Thread Therneau, Terry M., Ph.D. via R-help
Is there a complement to the methods function that will list all the defined methods for a class?  One solution is to look directly at the NAMESPACE file of the package that defines the class, and parse out the entries.  I was looking for something built-in, i.e., easier.


-- 
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"




Re: [R] evil attributes

2021-04-11 Thread Therneau, Terry M., Ph.D. via R-help
I wrote: "I confess to being puzzled WHY the R core has decided on this 
definition..."
After just a little more thought let me answer my own question.

a. The as.vector() function is designed to strip off everything extraneous and leave just the core.  (I have a mental image of Jack Webb saying "Just the facts, ma'am".)  I myself use it frequently in the test suite for survival, in cases where I'm checking the correct numeric result and don't care about any attached names.

  b. is.vector(x) essentially answers the question "does x look like a result of as.vector?"

Nevertheless I understand Roger's confusion.
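A tiny sketch of both points:

x <- c(a = 1, b = 2)
is.vector(x)             # TRUE: names are the one attribute is.vector tolerates
attr(x, "units") <- "cm"
is.vector(x)             # FALSE: any other attribute disqualifies x
as.vector(x)             # strips every attribute, names included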

-- 
Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"




Re: [R] "exact" p-values

2021-03-20 Thread Therneau, Terry M., Ph.D. via R-help
I am late to this discussion -- I read R-help as a once-a-day summary.  A few 
comments.

1. In the gene-discovery subfield of statistics (SNP studies, etc.) there is a huge multiple-testing problem.  In defense, the field thinks in terms of thresholds like 1e-5 or 1e-10 rather than the .05 or .01 most of us are used to.  In that literature, they do care about 1e-16 vs 1e-20.  We can all argue about whether that is a sensible approach or not, but it is what it is.  I think that this is the context of the journal's request, i.e., they want the actual number, however you calculate it.

My own opinion is that these rarefied p-values are an arbitrary scale, one that no longer has a probability interpretation.  For the central limit theorem to be correct that far from the mean requires a sample size that is beyond imagination ('number of atoms in the earth' order of size).  Such a scale may still be useful, but it's not really a probability.

2. The label of "Fisher's exact test" has caused decades of confusion.  In this 
context 
the word means "a particular test whose distribution can be completely 
enumerated": it 
does not mean either "correct" or "precise".  The original enumeration methods 
had 
limitations with resspect to the sample size or the presence of complications 
such as tied 
values;  from the discussion so far it would appear that the 'exact' argument 
of 
wilcox.test uses such a method.   Cyrus Mehta did nice work on improved 
algorithms that do 
not have these restrictions, methods that have been refiined and expanded in 
the software 
offerings from Cytel among others. Perhaps someone could update R's code to use 
this, but 
see 3 below.

My own opinion is that permutation tests are an important tool, one "wrench" in our statistical toolbox.  But they are only one tool out of many.  I am quite put off by arguments that purposefully conflate "exact" and "correct".

3. The concordance statistic C, the Wilcoxon test, and Somers' d are all the same statistic, just written a little differently.  (Somers' d is essentially Kendall's tau, but with a slightly different rule for ties.)  A test for C=.5 is the same as a Wilcoxon.  For a binary response, C = the area under the receiver operating curve (AUC).  The concordance command in the survival library computes this statistic for continuous, binary, or censored responses.  The variance is based on a jackknife argument, and is computed by organizing the data into a binary tree structure, very similar to the methods used by Mehta; it is efficient for large n and is valid for ties.  Perhaps add a link in the wilcox.test help page?

Footnote: AUC is a special case of C but not vice versa.  People sometimes try to extend AUC to the other data types, but IMHO with only moderate success.
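A sketch of point 3 with the lung data (an illustration of mine, not from the original post):

library(survival)
cfit <- concordance(Surv(time, status) ~ ph.karno, data = lung)
cfit   # C with a jackknife-based standard error; C = .5 means no association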

-- 
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"




[R] temporary clipping

2020-10-26 Thread Therneau, Terry M., Ph.D. via R-help
In one of my plot functions it is convenient to use clipping to restrict the range of some output.  But at the end of the function I'd like to turn it off, so that a subsequent use of legend(), text(), or whatever is not affected.  I don't quite see how to do this -- it seems that the only way to turn off clipping is with a new plot.
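One workaround that appears to do it (a sketch; this resets the clip region rather than truly turning clipping off): call clip() again with the full user-coordinate region.

plot(1:10)
clip(2, 8, 2, 8)                    # restrict drawing to a sub-region
abline(v = 5, col = 2)              # drawn clipped
do.call(clip, as.list(par("usr")))  # reset the clip region to the whole plot
legend("topleft", legend = "not clipped")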

-- 
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"




Re: [R] "chi-square" | "chi-squared" | "chi squared" | "chi, square"

2019-10-19 Thread Therneau, Terry M., Ph.D. via R-help
Martin,
   A fun question.

Looking back at my oldest books, Feller (1950) used chi-square.
Then I walked down the hall to our little statistics library and looked at Johnson and Kotz, "Continuous Univariate Distributions", since each chapter therein has comments about the history of the distribution.

  a. They use 'chi-square' throughout their history section, tracing the distribution back to work in the 1800s.  But those earliest papers apparently didn't name their results as chi-whatever, so an "origin" story didn't pan out.

  b. They have 13 pages of references, and for fun I counted the occurrence of variants.  The majority of papers don't have the word in the title at all, and the next most common is the Greek symbol.  Here are the years of the others:

chi-square:     73 43 65 80 86 73 82 73 69 69 78 64 64 86 65 86 82 82 76 82 88 81 74 77 87 86 93 69 60 88 88 80 77 41 59 79 31
chi-squared:    72 76 82 83 89 79 69 67 77 78 69 77 83 88 87 89 78
chi:            92 73 89 87
chi-squares:    77 83
chi-bar-square: 91

There doesn't look to be a trend over time.  The 1922 Fisher reference uses the Greek symbol, by the way.
symbol, by the way.

Terry T




Re: [R] conflicting results for a time-varying coefficient in a Cox model

2019-08-08 Thread Therneau, Terry M., Ph.D. via R-help
This is an excellent question.
The answer, in this particular case, mostly has to do with the outlier time values.  (I've never been convinced that the death at time 999 isn't really a misplaced code for "missing", actually.)  If you change the knots used by the spline you can get quite different values.
For instance, using a smaller data set:

fit1 <- coxph(Surv(time, status) ~ trt + prior + karno, veteran)
zph1 <- cox.zph(fit1, transform='identity')
plot(zph1[3])

dtime <- unique(veteran$time[veteran$status ==1])    # all of the death times
veteran2 <- survSplit( Surv(time, status) ~ ., data=veteran, cut=dtime)
fit2 <- coxph(Surv(tstart, time, status) ~ trt + prior + karno +
     karno:ns(time, df=4),  data=veteran2)
tx <- 0:100 * 10    # x positions for plot
ncall <- attr(terms(fit2), "predvars")[[6]]
ty <- eval(ncall, data.frame(time = tx)) %*% coef(fit2)[4:7] + coef(fit2)[3]
lines(tx, ty, col=2)

-

Now it looks even worse!  The only difference is that the ns() function has chosen a different set of knots.

The test used by the cox.zph function is based on a score test and is solid.  The plot that it produces uses a smoothed approximation to the variance matrix and is approximate.  So the diagnostic plot will never exactly match an actual fit.  In this data set the outliers exacerbate the issue.  To see this, try a different time scale.


zph2 <- cox.zph(fit1, transform= sqrt)
plot(zph2[3])
veteran2$stime <- sqrt(veteran2$time)
fit3 <- coxph(Surv(tstart, time, status) ~ trt + prior + karno +
    karno:ns(stime, df=4),  data=veteran2)

ncall3 <- attr(terms(fit3), "predvars")[[6]]
ty3 <- eval(ncall3, data.frame(stime= sqrt(tx))) %*% coef(fit3)[4:7] + coef(fit3)[3]
lines(sqrt(tx), ty3, col=2)



The right tail is now better behaved.  Eliminating the points >900 makes things even better behaved.

Terry T.




On 8/8/19 9:07 AM, Ferenci Tamas wrote:
> I was thinking of two possible ways to plot a time-varying coefficient in a Cox model.
>
> One is simply to use survival::plot.cox.zph which directly produces a
> beta(t) vs t diagram.
>
> The other is to transform the dataset to counting process format and
> manually include an interaction with time, expanded with spline (to be
> similar to plot.cox.zph). Plotting the coefficient produces the needed
> beta(t) vs t diagram.
>
> I understand that they're slightly different approaches, so I don't
> expect totally identical results, but nevertheless, they approximate
> the very same thing, so I do expect that the results are more or less
> similar.
>
> However:
>
> library( survival )
> library( splines )
>
> data( veteran )
>
> zp <- cox.zph( coxph(Surv(time, status) ~ trt + prior + karno,
>   data = veteran ), transform = "identity" )[ 3 ]
>
> veteran3 <- survSplit( Surv(time, status) ~ trt + prior + karno,
> data = veteran, cut = 1:max(veteran$time) )
>
> fit <- coxph(Surv(tstart,time, status) ~ trt + prior + karno +
> karno:ns( time, df = 4 ), data = veteran3 )
> cf <- coef( fit )
> nsvet <- ns( veteran3$time, df = 4 )
>
> plot( zp )
> lines( 0:1000, ns( 0:1000, df = 4, knots = attr( nsvet, "knots" ),
> Boundary.knots = attr( nsvet, "Boundary.knots" ) )%*%cf[
>   grep( "karno:ns", names( cf ) ) ] + cf["karno"],
> type = "l", col = "red" )
>
> Where is the mistake? Something must be going on here, because the
> plots are vastly different...
>
> Thank you very much in advance,
> Tamas




Re: [R] Trying to understand the magic of lm (Still trying)

2019-05-13 Thread Therneau, Terry M., Ph.D. via R-help
John,

  The text below is cut out of a "how to write a package" course I gave at the R conference at Vanderbilt.  I need to find a home for the course notes, because they have a lot of tidbits that are not well explained in the R documentation.
Terry T.



Model frames:
One of the first tasks of any modeling routine is to construct a special data frame containing the covariates that will be used, via a call to the model.frame function.  The code to do this is found in many routines, and can be a little opaque on first view.  The obvious code would be
\begin{verbatim}
coxph <- function(formula, data, weights, subset, na.action,
     init, control, ties= c("efron", "breslow", "exact"),
     singular.ok =TRUE, robust=FALSE,
     model=FALSE, x=FALSE, y=TRUE,  tt, method=ties, ...) {

  mf <- model.frame(formula, data, subset, weights, na.action)
\end{verbatim}
since those are the coxph arguments that are passed forward to the model.frame routine.  However, this simple approach will fail with a ``not found'' error message if any of the data, subset, weights, etc. arguments are missing.  Programs have to take the slightly more complicated approach of constructing a call.
\begin{verbatim}
Call <- match.call()
indx <- match(c("formula", "data", "weights", "subset", "na.action"),
   names(Call), nomatch=0)
if (indx[1] ==0) stop("A formula argument is required")
temp <- Call[c(1,indx)]  # only keep the arguments we wanted
temp[[1]] <- as.name('model.frame')  # change the function called
mf <- eval(temp, parent.frame())

Y <- model.response(mf)
etc.
\end{verbatim}

We start with a copy of the call to the program, which we want to save anyway as documentation in the output object.  Then subscripting is used to extract only the portions of the call that we want, saving the result in a temporary variable.  This is based on the fact that a call object can be viewed as a list whose first element is the name of the function to call, followed by the arguments to the call.  Note the use of \code{nomatch=0}; if any arguments on the list are missing they will then be missing in \code{temp}, without generating an error message.  The \mycode{temp} variable will contain an object of type ``call'', which is an unevaluated call to a routine.  Finally, the name of the function to be called is changed from ``coxph'' to ``model.frame'' and the call is evaluated.  In many of the core routines the result is stored in a variable ``m''.  This is a horribly short and non-descriptive name.  (The above used mf, which isn't much better.)  Many routines also use ``m'' for the temporary variable, leading to \code{m <- eval(m, parent.frame())}, but I think that is unnecessarily confusing.

The list of names in the match call will include all arguments that should be evaluated within the context of the named data frame.  This can include more than the list above; the survfit routine for instance has an optional argument ``id'' that names an identifying variable (several rows of the data may represent a single subject), and this is included along with ``formula'' etc. in the list of choices in the match function.  The order of names in the list makes no difference.  The id is later retrieved with \code{model.extract(m, 'id')}, which will be NULL if the argument was not supplied.  At the time that coxph was written I had not caught on to this fact and thought that all variables that came from a data frame had to be represented in the formula somehow, thus the use of \code{cluster(id)} as part of the formula, in order to denote a grouping variable.
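A self-contained toy version of the pattern (my own sketch, not survival code), runnable outside any package:

myfit <- function(formula, data, weights, subset, na.action, extra = NULL) {
    Call <- match.call()
    indx <- match(c("formula", "data", "weights", "subset", "na.action"),
                  names(Call), nomatch = 0)
    if (indx[1] == 0) stop("a formula argument is required")
    temp <- Call[c(1, indx)]                # keep only the model-frame arguments
    temp[[1]] <- quote(stats::model.frame)  # change the function to be called
    eval(temp, parent.frame())              # a real fit would continue from here
}
head(myfit(mpg ~ wt, data = mtcars, subset = cyl == 4))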

On 5/11/19 5:00 AM, r-help-requ...@r-project.org wrote:
> A number of people have helped me in my mission to understand how lm (and other functions) are able to pass a dataframe and then refer to a specific column in the dataframe.  I thank everyone who has responded.  I now know a bit about deparse(substitute(xx)), but I still don't fully understand how it works.  The program below attempts to print a column of a dataframe from a function whose parameters include the dataframe (df) and the column requested (col).  The program works fine until the last print statement, where I receive an error: Error in `[.data.frame`(df, , col) : object 'y' not found.  I hope someone can explain to me (1) why my code does not work, and (2) what I can do to fix it.




[R] adding a hex sticker to a package

2019-01-21 Thread Therneau, Terry M., Ph.D. via R-help
I've created a hex sticker for survival.  How should that be added to the package directory?  It's temporarily in man/figures on the github page.

Terry T.

(Actually, the idea was from Ryan Lennon.  I liked it, and we found someone with actual graphical skills to execute it.)
