[R] scaling of nonbinROC penalties - accurate classification with random data?
of the penalty matrix. So, I would like to ask: ought there to be some constraint on the values of the penalty matrix? For example, (a) should the penalty matrix always contain at least one penalty with a value of 1; (b) should there be any other constraint on the sum of the penalties in the matrix (e.g. should the matrix sum to some multiple of the number of categories); or (c) is one free to use arbitrarily scaled penalty matrices?

I apologise if I am wasting your time by making an obvious mistake. I am a clinician, not a statistician, so I do not understand the mathematics.

Thanks, in advance, for your help,

Jonathan Williams

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] scaling of nonbinROC penalties
multiple of the number of categories), or (c) is one free to use arbitrarily scaled penalty matrices for estimates of the accuracy of an ordinal gold standard?

Thanks, in advance, for your help,

Jonathan Williams
[R] FW: NaN from function
Dear Helpers,

I wrote a simple function to standardise variables if they contain more than one value. If the elements of the variable are all identical, then I want the function to return zero. But when I submit variables whose elements are all identical, the function returns not zero but NaNs.

zt=function(x){if (length(table(x)>1)) y=(x-mean(x))/sd(x) else if (length(table(x)==1)) y=0; return(y)}
zt(c(1:10))
#[1] -1.4863011 -1.1560120 -0.8257228 -0.4954337 -0.1651446  0.1651446  0.4954337  0.8257228  1.1560120  1.4863011
zt(rep(1,10))
#[1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

Would you be so kind as to point out what I am doing wrong here? How can I obtain zeros from my function instead of NaNs? (I obtain NaNs also if I set the function to

zt=function(x){if (length(table(x)>1)) y=(x-mean(x))/sd(x) else if (length(table(x)==1)) y=rep(0, length(x)); return(y)}

.)

Thanks, in advance, for your help,

Jonathan Williams
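[Editor's note, not from the thread: the condition `length(table(x)>1)` takes the length of the logical vector `table(x)>1`, which equals the number of distinct values and is therefore 1 (treated as TRUE) even for constant input, so the sd branch runs and divides by sd(x)=0. A minimal corrected sketch, testing the number of distinct values instead:]

    # Standardise x; return zeros when all elements are identical (sd(x) would be 0).
    zt <- function(x) {
      if (length(unique(x)) > 1) {
        (x - mean(x)) / sd(x)
      } else {
        rep(0, length(x))   # vector of zeros, same length as x
      }
    }

    zt(1:10)        # standardised values
    zt(rep(1, 10))  # ten zeros, no NaN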
[R] repository of earlier Windows versions of R packages
Dear Helpers,

I was trying to find a repository of earlier Windows versions of R packages. However, while I can find the Archives for Linux versions (in the Old Sources section of each package's Downloads), I cannot find one for Windows versions. Does such a repository exist? If so, where can I find it?

Thanks,

Jonathan Williams
[R] nested factors different with/out brackets - is this a design feature?
Dear R Helpers,

I came across discrepancies in estimates for nested factors in linear models computed with and without brackets, and also when a crossing factor is defined as a factor or as a dummy (0/1) vector. These are probably design features that I do not understand, and I wonder if someone would be so kind as to explain them to me.

First, if I do not bracket the nested pair of factors A/C, then (1) I obtain coefficients for all 3 levels (0/1/2) of the crossing factor B (compare models m0 and m1, below). Moreover, the values of some corresponding coefficients in the two models are not the same. So, in m1, B0:A1:C1 = -0.13112 and, in m2, A1:C1 (the corresponding term in a model with no B0 coefficients) = -0.13112. But the coefficients for B1:A1:C1 in m1 and m2 are 0.08909 and 0.22021. Why do they differ here? Also, (2) if I bracket the nested pair of factors (A/C), then I obtain the B:C interaction, which I did not expect. I thought that bracketing (A/C) would constrain the model to generate only A, A:B and A:B:C, as in m1 (compare models m1 and m2).

Second, if I restrict the levels of the crossing factor B to 0 or 1 (via subset=B!=2) and compare this model with one using a dummy vector b with identical 0/1 values, then I again obtain different outputs - compare m3 and m4, below. Again, there is no simple correspondence between m3 and m4, even though all(dat$B==dat$b) returns TRUE.

So, I do not understand what is happening here, either when comparing m1 with m2 or when comparing m3 with m4.
With many thanks, in anticipation of your help in explaining this,

Jonathan Williams

Here is simple sample code to generate the discrepancies:

set.seed(1)
A=factor(rep(c(1:5),600))
B=factor(rep(c(0:2),each=1000))
b=as.numeric(as.character(B))
C=factor(rbinom(3000,1,0.5))
set.seed(2)
y=rnorm(3000)
dat=data.frame(A,B,b,C,y)
m0=lm(y~B*A); summary(m0)
m1=lm(y~B*A/C,dat); summary(m1)
m2=lm(y~B*(A/C),dat); summary(m2)
m3=lm(y~B*A/C,dat,subset=B!=2); summary(m3)
m4=lm(y~b*A/C,dat,subset=B!=2); summary(m4)
[R] subtracting 100 from strptime year vector generates missing values in POSIXct where none appear to exist in strptime year vector
Thanks, Don MacQueen, for this reply to my initial query - please see my replies to these ideas and further information below.

From: Don MacQueen [m...@llnl.gov]
Sent: 23 February 2010 21:25
To: Jonathan Williams; r-help@r-project.org
Subject: Re: [R] Problem with strptime generating missing values where none appear to exist

What happens if you do all that NA checking on dob *before* subtracting 100 from dob$year? What happens if you use difftime() before subtracting the 100? Do you get any NAs if you convert dob to POSIXct? (These are just investigative ideas, obviously.)
-Don

==

What happens if you use difftime() before subtracting the 100?

Good thought - if I use difftime before subtracting 100 from dob$year, then there are no missing values! But it is not at all obvious to me why this should be so. Here are the dob$year values for the dates that go through OK after subtracting 100:

table(dob$year[!is.na(difftime(sdate,dob))])
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 42 43 46 48
 2 12 18 20 24 32 40 52 44 16 30 20 40 62 41 46 60 33 15 16 28 21 23 16 16  4  4  4  4

and now here are the values for the dates that generate missing values:

table(dob$year[is.na(difftime(sdate,dob))])
17 23 25 27 28 29 30 31 38 39 40
 7  4 13  8  6  9  4  4  4  7  3

I see no obvious differences that could account for the generation of missing values - all of the years that generate missing values are represented among those that don't!
Converting dob and sdate to POSIXct does not make any difference to the basic problem:

dob[is.na(difftime(as.POSIXct(sdate),as.POSIXct(dob)))]
 [1] 1927-04-03 1927-04-03 1927-04-03 1927-04-03 1925-04-11 1925-04-11 1925-04-11 1925-04-11 1925-04-11
[10] 1939-04-03 1939-04-03 1939-04-03 1940-12-30 1940-12-30 1940-12-30 1917-10-14 1917-10-14 1917-10-14
[19] 1917-10-14 1925-04-16 1925-04-16 1925-04-16 1925-04-16 1927-04-05 1927-04-05 1927-04-05 1927-04-05
[28] 1939-04-08 1939-04-08 1939-04-08 1939-04-08 1938-10-24 1938-10-24 1938-10-24 1938-10-24 1930-10-16
[37] 1930-10-16 1930-10-16 1930-10-16 1923-04-17 1923-04-17 1923-04-17 1923-04-17 1929-04-17 1929-04-17
[46] 1929-04-17 1929-04-17 1929-04-17 1925-04-11 1925-04-11 1925-04-11 1925-04-11 1931-04-02 1931-04-02
[55] 1931-04-02 1931-04-02 1929-04-18 1929-04-18 1929-04-18 1929-04-18 1917-10-22 1917-10-22 1917-10-22
[64] 1928-03-28 1928-03-28 1928-03-28 1928-04-09 1928-04-09 1928-04-09

One good thing, though - the missing values (however they arise) are at least apparent in as.POSIXct(dob), whereas they are invisible in strptime(as.character(BDT),'%d-%b-%y'):

strptime(as.character(BDT),'%d-%b-%y')
  [1] 2022-07-14 2022-07-14 2022-07-14 2022-07-14 2021-03-23 2021-03-23 2021-03-23 2027-08-27 2027-08-27
 [10] 2027-08-27 2027-08-27 2040-04-05 2040-04-05 2040-04-05 2040-04-05 2023-12-15 2023-12-15 2023-12-15
 [19] 2023-12-15 2017-08-19 2017-08-19 2017-08-19 2017-08-19 2017-08-31 2017-08-31 2017-08-31 2017-08-31
 [28] 2031-05-12 2031-05-12 2031-05-12 2031-05-12 2031-05-07 2031-05-07 2031-05-07 2031-05-07 2026-12-31
 [37] 2026-12-31 2026-12-31 2026-12-31 2037-08-20 2037-08-20 2037-08-20 2037-08-20 2033-12-08 2033-12-08
 [46] 2033-12-08 2033-12-08 2038-07-17 2038-07-17 2038-07-17 2038-07-17 2020-10-09 2020-10-09 2020-10-09
 [55] 2020-10-09 2025-04-29 2025-04-29 2025-04-29 2025-04-29 2024-07-03 2024-07-03 2024-07-03 2024-07-03
 [64] 2030-09-21 2030-09-21 2030-09-21 2030-09-21 2023-08-03 2023-08-03 2023-08-03 2023-08-03 2024-05-10
 [73] 2024-05-10 2024-05-10 2024-05-10 2038-05-31 2038-05-31 2038-05-31 2038-05-31 2028-08-23 2028-08-23
 [82] 2028-08-23 2028-08-23 2031-11-19 2031-11-19 2022-12-12 2022-12-12 2022-12-12 2022-12-12 2023-09-14
 [91] 2023-09-14 2023-09-14 2023-09-14 2021-01-12 2021-01-12 2021-01-12 2021-01-12 2021-01-12 2018-11-04
[100] 2018-11-04 2018-11-04 2029-08-19 2029-08-19 2029-08-19 2029-08-19 2027-04-03 2027-04-03 2027-04-03
[109] 2027-04-03 2021-03-27 2021-03-27 2021-03-27 2021-03-27 2021-03-27 2030-07-04 2030-07-04 2030-07-04
[118] 2030-07-04 2030-07-04 2023-06-08 2023-06-08 2023-06-08 2023-06-08 2029-05-02 2029-05-02 2029-05-02
[127] 2029-05-02 2029-05-02 2023-12-20 2023-12-20 2023-12-20 2023-12-20 2037-05-25 2037-05-25 2037-05-25
[136] 2037-05-25 2037-05-25 2025-04-11 2025-04-11 2025-04-11 2025-04-11 2025-04-11 2032-08-12 2032-08-12
[145] 2032-08-12 2032-08-12 2024-08-16 2024-08-16 2024-08-16 2024-08-16 2043-09-17 2043-09-17 2043-09-17
[154] 2043-09-17 2028-09-12 2028-09-12 2028-09-12 2028-09-12 2036-08-18 2036-08-18 2036-08-18 2036-08-18
[163] 2018-07-16 2018-07-16 2032-11-10 2032-11-10 2032-11-10 2032-11-10 2032-05-18 2032-05-18 2032-05-18
[172] 2032-05-18 2023-05-08 2023-05-08 2023-05-08 2023-05-08 2020-11-02 2020-11-02 2020-11-02 2020-11-02
[181] 2031-12-03 2031-12-03 2031-12-03 2031-12-03 2030-06-13 2030-06-13 2030-06-13 2030-06-13 2019-06-16
[190] 2019-06-16 2019-06-16 2019-06
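[Editor's note, not from the thread, and only a speculation: the failing dates cluster in late March/April and October, which suggests daylight-saving transitions. Subtracting 100 from dob$year changes the wall-clock date but leaves the original isdst flag in place; if the shifted local time is invalid or mis-flagged for that year in the session's timezone, conversion yields NA. A sketch of the usual remedy, using a hypothetical input date and letting the system re-derive the DST flag:]

    # Hypothetical reconstruction: parse a two-digit year, shift back a century,
    # then clear the DST flag so as.POSIXct() can re-determine it.
    dob <- strptime("03-Apr-27", "%d-%b-%y")  # %y maps 27 to 2027 here
    dob$year <- dob$year - 100                # now 1927-04-03
    dob$isdst <- -1                           # -1 means "unknown"; let the system decide
    as.POSIXct(dob)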
[R] Mixed Latin, Greek and subscript characters in axis label
Dear R-helpers,

I have been trying to figure out how to plot a graph with an axis label consisting of a mixture of Latin, Greek and subscript characters. Specifically, I need to write A[beta]{1-42}, where A is the Latin letter A, [beta] is Greek lower-case beta and {1-42} is the subscript '1-42'.

I can use xlab=expression(beta[1-42]) and obtain the [beta]{1-42} part of the label. But I can't add the preceding Latin character A to this expression. I have tried xlab=expression(A,beta[1-42]), which simply prints 'A'. I have tried xlab=paste('A',expression(beta[1-42])), but this prints A (beta[1-42]). Anything else that I try returns an error (e.g. xlab=expression(A beta[1-42]) returns 'Error: unexpected symbol'). So, I would be very grateful if someone could tell me how to write my label.

Thanks,

Jonathan Williams
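[Editor's note, not from the thread: a sketch of one plotmath approach. In plotmath, adjacent pieces are juxtaposed with `*`; quoting the subscript keeps the hyphen from being typeset as a spaced minus sign:]

    # Label "A" followed by Greek beta with subscript 1-42, via plotmath juxtaposition.
    plot(1:10, xlab = expression(A*beta[1-42]))

    # If "1-42" should appear without spaces around the minus, quote it:
    plot(1:10, xlab = expression(A*beta["1-42"]))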
[R] is there a way generate correlated binomial data in R?
Dear R Helpers,

Is there a way to generate multivariate correlated binomial data in R, similar to how the rmvbin procedure in package bindata can generate multivariate correlated binary data?

Thanks for your help,

Jonathan Williams
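[Editor's note, not from the thread, assuming the bindata package is available: since a Binomial(n, p) variate is a sum of n Bernoulli(p) draws, one workaround is to sum n correlated binary vectors from rmvbin(); the resulting columns are correlated binomial variates with the same marginal probabilities:]

    library(bindata)  # provides rmvbin() for correlated binary data

    # Two correlated Binomial(size, p) variables built from correlated Bernoulli draws.
    size <- 10
    p    <- c(0.3, 0.6)
    rho  <- 0.4  # target correlation of the underlying binary draws
    bc   <- matrix(c(1, rho, rho, 1), 2, 2)

    draws <- replicate(size, rmvbin(1000, margprob = p, bincorr = bc))
    X <- apply(draws, c(1, 2), sum)   # 1000 x 2; column j is Binomial(size, p[j])

    colMeans(X) / size   # roughly p
    cor(X)               # positively correlated columns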
[R] AICs from lmer different with summary and anova
Dear R Helpers,

I have noticed that when I use lmer to analyse data, the summary function gives different values for the AIC, BIC and log-likelihood compared with the anova function. Here is a sample program:

#make some data
set.seed(1); datx=data.frame(array(runif(720),c(240,3),dimnames=list(NULL,c('x1','x2','y'))))
id=rep(1:120,2); datx=cbind(id,datx)
#give x1 a slight relation with y (only necessary to make the random effects non-zero in this artificial example)
datx$x1=(datx$y*0.1)+datx$x1
library(lme4)
#fit the data
fit0=lmer(y~x1+x2+(1|id), data=datx); print(summary(fit0),corr=F)
fit1=lmer(y~x1+x2+(1+x1|id), data=datx); print(summary(fit1),corr=F)
#compare the models
anova(fit0,fit1)

Now, look at the output below. You can see that the AIC from print(summary(fit0)) is 87.34, but the AIC for fit0 in anova(fit0,fit1) is 73.965. There are similar changes for the values of BIC and logLik. Am I doing something wrong here? If not, which are the real AIC and logLik values for the different models?

Thanks for your help,

Jonathan Williams

Output:

fit0=lmer(y~x1+x2+(1|id), data=datx); print(summary(fit0),corr=F)
Linear mixed model fit by REML
Formula: y ~ x1 + x2 + (1 | id)
   Data: datx
   AIC   BIC logLik deviance REMLdev
 87.34 104.7 -38.67    63.96   77.34
Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 0.016314 0.12773
 Residual             0.062786 0.25057
Number of obs: 240, groups: id, 120

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.50376    0.05219   9.652
x1           0.08979    0.06614   1.358
x2          -0.06650    0.06056  -1.098

fit1=lmer(y~x1+x2+(1+x1|id), data=datx); print(summary(fit1),corr=F)
Linear mixed model fit by REML
Formula: y ~ x1 + x2 + (1 + x1 | id)
   Data: datx
   AIC   BIC logLik deviance REMLdev
 90.56 114.9 -38.28    63.18   76.56
Random effects:
 Groups   Name        Variance  Std.Dev. Corr
 id       (Intercept) 0.0076708 0.087583
          x1          0.0056777 0.075351 1.000
 Residual             0.0618464 0.248689
Number of obs: 240, groups: id, 120

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.50078    0.05092   9.835
x1           0.09236    0.06612   1.397
x2          -0.06515    0.06044  -1.078

anova(fit0,fit1)
Data: datx
Models:
fit0: y ~ x1 + x2 + (1 | id)
fit1: y ~ x1 + x2 + (1 + x1 | id)
     Df    AIC     BIC  logLik  Chisq Chi Df Pr(>Chisq)
fit0  5 73.965  91.368 -31.982
fit1  7 77.181 101.545 -31.590 0.7839      2     0.6757
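[Editor's note, not from the thread: a plausible explanation is that summary() of an REML fit reports the REML criterion, whereas anova() refits both models by maximum likelihood before comparing them, so the two displays are computed on different scales. A sketch to check this, assuming lme4's refitML() and the datx/fit0 objects from the post above:]

    library(lme4)
    fit0 <- lmer(y ~ x1 + x2 + (1 | id), data = datx, REML = TRUE)

    logLik(fit0)           # REML log-likelihood, as shown by summary()
    logLik(refitML(fit0))  # ML log-likelihood, as used by anova()
    AIC(refitML(fit0))     # should match the AIC column of anova()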
[R] lmer overdispersion
I got a similar problem when I used family=quasibinomial with my data, but the problem disappeared when I used family=binomial. I assumed that Douglas Bates et al. had amended the lmer program to detect over-dispersion, so that it is no longer necessary to specify its possible presence with family=quasi... But I may be wrong. If you get more information about this from the great man, then would you please let me know?

Thanks,

Jonathan Williams
[R] why does regexpr not work with '.'
Dear R Helpers,

I am running R 2.6.2 on a Windows XP machine. I am trying to use regexpr to locate full stops in strings, but without success. Here is an example:

f=a,[EMAIL PROTECTED]  #define an arbitrary test string

regexpr(',',f) #find the occurrences of ',' in f - should be one at location 2
# and this is what regexpr finds
#[1] 2
#attr(,"match.length")
#[1] 1

regexpr('@',f) #find occurrences of '@' in f - should be one at location 6
# and this is what regexpr finds
#[1] 6
#attr(,"match.length")
#[1] 1

regexpr('.',f) #find the occurrences of '.' in f - should be one at location 4
# but regexpr gives 1 at location 1
#[1] 1
#attr(,"match.length")
#[1] 1

Sorry if I am missing something obvious. I'd be very grateful if someone would please show me how to use regexpr to locate '.' in my string!

Thanks,

Jonathan Williams
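[Editor's note, not from the thread: in a regular expression, '.' is a metacharacter that matches any single character, so it matches at position 1. Escape it, or ask for a literal match. A sketch with a hypothetical string laid out so that ',' falls at 2, '.' at 4 and '@' at 6, since the original string was mangled by the archive:]

    f <- "a,b.c@def"               # hypothetical string: ',' at 2, '.' at 4, '@' at 6

    regexpr("\\.", f)              # escape the metacharacter: match starts at 4
    regexpr(".", f, fixed = TRUE)  # or request a literal match: also 4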