[R] scaling of nonbinROC penalties - accurate classification with random data?

2013-01-24 Thread Jonathan Williams
 of the penalty matrix.

So, I would like to ask, ought there to be some constraint on the values of the 
penalty matrix? For example, (a) should the penalty matrix always contain at 
least one penalty with a value of 1 and/or (b) should there be any other 
constraint on the sum of penalties in the matrix (e.g. should the matrix sum to 
some multiple of the number of categories), or (c) is one free to use 
arbitrarily-scaled penalty matrices?
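
For what it is worth, a small numerical illustration of why option (c) is 
ambiguous: any accuracy estimate that is linear in the penalties is simply 
rescaled when the penalty matrix is multiplied by a constant, so the ranking of 
classifiers is unchanged but the absolute value of the estimate is not. The 
sketch below uses a hypothetical 3-category penalty matrix, not nonbinROC's own 
API, whose conventions the package documentation would need to confirm:

#hypothetical penalties for confusing ordinal categories 1-3
pen=matrix(c(0,1,2,
             1,0,1,
             2,1,0),nrow=3,byrow=TRUE)
pen2=pen/max(pen) #rescaled so the largest penalty is 1, as in option (a)
#an expected-penalty estimate computed with pen is exactly max(pen)=2 times
#the one computed with pen2, so only its scale, not its meaning, changes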

I apologise if I am wasting your time by making an obvious mistake. I am a 
clinician, not a statistician, so I do not understand the mathematics.

Thanks, in advance, for your help,

Jonathan Williams 


[R] scaling of nonbinROC penalties

2013-01-18 Thread Jonathan Williams
 
multiple of the number of categories), or (c) is one free to use 
arbitrarily-scaled penalty matrices for estimates of the accuracy of an ordinal 
gold standard?

Thanks, in advance, for your help,

Jonathan Williams


  


[R] (no subject)

2012-02-23 Thread Jonathan Williams
Dear Helpers,

I wrote a simple function to standardise variables if they contain more than 
one value. If the elements of the variable are all identical, then I want the 
function to return zero.

When I submit variables whose elements are all identical to the function, it 
returns not zero, but NaNs.

zt=function(x){if (length(table(x)>1)) y=(x-mean(x))/sd(x) else if 
(length(table(x)==1)) y=0; return(y)}

zt(c(1:10))
#[1] -1.4863011 -1.1560120 -0.8257228 -0.4954337 -0.1651446  0.1651446  0.4954337  0.8257228  1.1560120  1.4863011

zt(rep(1,10))
#[1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

Would you be so kind as to point out what I am doing wrong here? How can I 
obtain zeros from my function, instead of NaNs? (I obtain NaNs also if I set 
the function to zt=function(x){if (length(table(x)>1)) y=(x-mean(x))/sd(x) else 
if (length(table(x)==1)) y=rep(0, length(x)); return(y)} ).
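
For reference, a likely fix, assuming the intent is to test the number of 
distinct values: the comparison belongs outside length(). As written, 
length(table(x)>1) is just the number of distinct values in x, which is at 
least 1 and therefore always treated as TRUE, so the zero branch is never 
reached and a constant vector gives (x-mean(x))/sd(x) = 0/0 = NaN. A minimal 
sketch:

zt=function(x){
  if (length(table(x))>1) {
    (x-mean(x))/sd(x)  #more than one distinct value: standardise
  } else {
    rep(0,length(x))   #constant vector: sd(x) is 0, so return zeros
  }
}
zt(rep(1,10)) #now returns 0 0 0 0 0 0 0 0 0 0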

Thanks, in advance, for your help,

Jonathan Williams




[R] repository of earlier Windows versions of R packages

2010-11-15 Thread Jonathan Williams

Dear Helpers,
I was trying to find a repository of earlier Windows versions of R packages. 
However, while I can find the Archives for Linux versions (in the Old Sources 
section of each package's Downloads), I cannot find one for Windows versions. 
Does such a repository exist? If so, where can I find it?

Thanks,

Jonathan Williams


[R] nested factors different with/out brackets - is this a design feature?

2010-05-06 Thread Jonathan Williams
Dear R Helpers,

I came across discrepancies in estimates for nested factors in linear models 
computed with brackets and also when a crossing factor is defined as a factor 
or as a dummy (0/1) vector. These are probably designed features that I do not 
understand. I wonder if someone would be so kind as to explain them to me.

First, if I do not bracket the nested pair of factors A/C, then (1) I obtain 
coefficients for all 3 levels (0/1/2) of the crossing factor B (compare models 
m0 and m1, below). Moreover, the values of some corresponding coefficients in 
the two models are not the same. So, in m1 B0:A1:C1 = -0.13112 and in m2, A1:C1 
(the corresponding term in a model with no B0 coefficients) = -0.13112. But the 
coefficients for B1:A1:C1 in m1 and m2 are 0.08909 and 0.22021. Why do they 
differ here? Also, (2) if I bracket the nested pair of factors (A/C), then I 
obtain the B:C interaction - which I did not expect. I thought that bracketing 
(A/C) would constrain the model to generate only the A, A:B and A:B:C terms, as 
in m1 (compare models m1 and m2).

Second, if I restrict the levels of the crossing factor B to 0 or 1 (via 
subset=B!=2) and compare this model with one that uses a dummy (0/1) vector b 
with identical values, then I again obtain different outputs - compare m3 and 
m4, below. Again, there is no simple correspondence between m3 and m4, even 
though all(dat$B==dat$b) returns TRUE.

So, I do not understand what is happening here, either when comparing m1 with 
m2 or when comparing m3 with m4.

With many thanks, in anticipation of your help in explaining this,

Jonathan Williams

Here is a simple sample code to generate the discrepancies:-

set.seed(1)
A=factor(rep(c(1:5),600))       #5-level factor, within which C is nested
B=factor(rep(c(0:2),each=1000)) #3-level crossing factor
b=as.numeric(as.character(B))   #same values as B, but as a numeric dummy
C=factor(rbinom(3000,1,0.5))    #binary factor nested within A
set.seed(2)
y=rnorm(3000)                   #pure-noise response
dat=data.frame(A,B,b,C,y)
m0=lm(y~B*A); summary(m0)           #crossing only
m1=lm(y~B*A/C,dat); summary(m1)     #nesting without brackets
m2=lm(y~B*(A/C),dat); summary(m2)   #nesting with brackets

m3=lm(y~B*A/C,dat,subset=B!=2); summary(m3) #B restricted to a 2-level factor
m4=lm(y~b*A/C,dat,subset=B!=2); summary(m4) #b as the numeric 0/1 dummy
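
One way to see exactly where the parameterisations diverge is to compare the 
design-matrix columns that each formula generates (all base R, run after the 
code above):

colnames(model.matrix(m1)) #terms generated by B*A/C
colnames(model.matrix(m2)) #terms generated by B*(A/C)
setdiff(colnames(model.matrix(m2)),colnames(model.matrix(m1))) #columns only in m2

Coefficients that share a label can still estimate different quantities when 
the surrounding columns differ, because each coefficient is adjusted for the 
other columns in its own design matrix; the same applies to m3 versus m4, where 
replacing the factor B by the numeric dummy b changes which columns enter.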



[R] subtracting 100 from strptime year vector generates missing values in POSIXct where none appear to exist in strptime year vector

2010-02-23 Thread Jonathan Williams
Thanks to Don MacQueen for this reply to my initial query - please see my 
replies to these ideas, and further information, below.

From: Don MacQueen [m...@llnl.gov]
Sent: 23 February 2010 21:25
To: Jonathan Williams; r-help@r-project.org

Subject: Re: [R] Problem with strptime generating missing values where none 
appear to exist

What happens if you do all that NA checking on dob  *before* subtracting 100 
from dob$year?

What happens if you use difftime() before subtracting the 100?

Do you get any NAs if you convert dob to POSIXct?

(these are just investigative ideas, obviously)

-Don

==

> What happens if you use difftime() before subtracting the 100?
Good thought - if I use difftime before subtracting 100 from dob$year, then 
there are no missing values!

But, it is not at all obvious to me why this should be so. Here are the values 
of dob$year for the dates that go through OK after subtracting 100:-

> table(dob$year[!is.na(difftime(sdate,dob))])
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 42 43 46 48
 2 12 18 20 24 32 40 52 44 16 30 20 40 62 41 46 60 33 15 16 28 21 23 16 16  4  4  4  4

and now here are the values for the dates that generate missing values:-

> table(dob$year[is.na(difftime(sdate,dob))])
17 23 25 27 28 29 30 31 38 39 40
 7  4 13  8  6  9  4  4  4  7  3

I see no obvious differences that could account for the generation of missing 
values - all of the years that generate missing values are represented in those 
that don't!

Converting dob and sdate to POSIXct does not make any difference to the basic 
problem:-

> dob[is.na(difftime(as.POSIXct(sdate),as.POSIXct(dob)))]
 [1] 1927-04-03 1927-04-03 1927-04-03 1927-04-03 1925-04-11 
1925-04-11 1925-04-11 1925-04-11 1925-04-11
[10] 1939-04-03 1939-04-03 1939-04-03 1940-12-30 1940-12-30 
1940-12-30 1917-10-14 1917-10-14 1917-10-14
[19] 1917-10-14 1925-04-16 1925-04-16 1925-04-16 1925-04-16 
1927-04-05 1927-04-05 1927-04-05 1927-04-05
[28] 1939-04-08 1939-04-08 1939-04-08 1939-04-08 1938-10-24 
1938-10-24 1938-10-24 1938-10-24 1930-10-16
[37] 1930-10-16 1930-10-16 1930-10-16 1923-04-17 1923-04-17 
1923-04-17 1923-04-17 1929-04-17 1929-04-17
[46] 1929-04-17 1929-04-17 1929-04-17 1925-04-11 1925-04-11 
1925-04-11 1925-04-11 1931-04-02 1931-04-02
[55] 1931-04-02 1931-04-02 1929-04-18 1929-04-18 1929-04-18 
1929-04-18 1917-10-22 1917-10-22 1917-10-22
[64] 1928-03-28 1928-03-28 1928-03-28 1928-04-09 1928-04-09 
1928-04-09

One good thing, though - the missing values (however they arise) are at least 
apparent in as.POSIXct(dob), whereas they are invisible in 
strptime(as.character(BDT),'%d-%b-%y'):-

> strptime(as.character(BDT),'%d-%b-%y')
  [1] 2022-07-14 2022-07-14 2022-07-14 2022-07-14 2021-03-23 2021-03-23 2021-03-23 2027-08-27 2027-08-27
 [10] 2027-08-27 2027-08-27 2040-04-05 2040-04-05 2040-04-05 2040-04-05 2023-12-15 2023-12-15 2023-12-15
 [19] 2023-12-15 2017-08-19 2017-08-19 2017-08-19 2017-08-19 2017-08-31 2017-08-31 2017-08-31 2017-08-31
[... remaining values omitted: every date in the listing parses into the 21st century, and the output is truncated in the archive]
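
For what it is worth, the strptime output above shows the underlying problem: 
with %y, two-digit years 00-68 parse as 20xx and 69-99 as 19xx (see ?strptime), 
so all of these birth dates land in the future. The NAs that appear after 
subtracting 100 from dob$year are consistent with a time-zone effect: editing 
one field of a POSIXlt leaves the others (such as the DST flag) unadjusted, and 
the result can name an invalid local time; whether that is the cause here 
depends on the machine's time zone. Working in the time-zone-free Date class 
side-steps both issues - a minimal sketch, with hypothetical values standing in 
for BDT (month abbreviations assume an English locale):

bdt=c('03-Apr-27','30-Dec-40','17-Sep-43') #hypothetical strings like BDT's
dob=as.Date(bdt,format='%d-%b-%y')         #parses as 2027, 2040, 2043
wrong=dob>Sys.Date()                       #birth dates cannot lie in the future
dob[wrong]=as.Date(format(dob[wrong],'19%y-%m-%d')) #rewrite the century to 19xx
dob #1927-04-03 1940-12-30 1943-09-17
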

[R] Mixed Latin, Greek and subscript characters in axis label

2009-06-05 Thread Jonathan Williams
Dear R-helpers,

I have been trying to figure out how to plot a graph with an axis label
consisting of a mixture of Latin, Greek and subscript characters.
Specifically, I need to write A[beta]{1-42}, where A is Latin script A,
[beta] is Greek lower case beta and {1-42} is subscript '1-42'.
I can use xlab=expression(beta[1-42]) and obtain the [beta]{1-42} part of
the label. But, I can't add the preceding Latin character A to this
expression.
I have tried xlab=expression(A,beta[1-42]), which simply prints 'A'. I have
tried xlab=paste('A',expression(beta[1-42])), but this prints A
(beta[1-42]).
Anything else that I try returns an error (e.g. xlab=expression(A 
beta[1-42]) returns 'Error: unexpected symbol').
So, I would be very grateful if someone can tell me how to write my label.
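
For reference, the usual plotmath idiom is to juxtapose terms with * and to 
quote the subscript so that '1-42' prints literally rather than as a 
subtraction - a minimal sketch:

plot(1:10, xlab=expression(A*beta['1-42'])) #Latin A, Greek beta, subscript 1-42

Without the quotes, beta[1-42] is accepted but renders the subscript as the 
expression 1 - 42, with spaces around the minus sign.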

Thanks,

Jonathan Williams



[R] is there a way generate correlated binomial data in R?

2009-04-27 Thread Jonathan Williams
Dear R Helpers,

Is there a way to generate multivariate correlated binomial data in R, similar 
to how the rmvbin procedure in package bindata can generate multivariate 
correlated binary data?
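
One workable approach, since a binomial count is a sum of binary trials: draw 
correlated binary vectors with bindata::rmvbin and sum them across trials. With 
independent, identically distributed trials the counts inherit the binary 
correlation, though the correlations achievable are constrained by the marginal 
probabilities, so it is worth checking the result empirically. A minimal 
sketch:

library(bindata) #provides rmvbin
n.obs=500; n.trials=10
bc=matrix(c(1,0.5,0.5,1),2,2) #correlation of each pair of binary draws
#one n.obs x 2 binary matrix per trial, stacked into an n.obs x 2 x n.trials array
x=replicate(n.trials,rmvbin(n.obs,margprob=c(0.3,0.6),bincorr=bc))
y=apply(x,c(1,2),sum) #n.obs x 2 matrix of correlated binomial(10, p) counts
cor(y) #empirical check of the induced correlation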

Thanks for your help,

Jonathan Williams



[R] AICs from lmer different with summary and anova

2009-04-15 Thread Jonathan Williams
Dear R Helpers,

I have noticed that when I use lmer to analyse data, the summary function
gives different values for the AIC, BIC and log-likelihood compared with the
anova function.

Here is a sample program

#make some data
set.seed(1);
datx=data.frame(array(runif(720),c(240,3),dimnames=list(NULL,c('x1','x2','y'))))
id=rep(1:120,2); datx=cbind(id,datx)

#give x1 a slight relation with y (only needed to make the random effects
#non-zero in this artificial example)
datx$x1=(datx$y*0.1)+datx$x1

library(lme4)

#fit the data
fit0=lmer(y~x1+x2+(1|id), data=datx); print(summary(fit0),corr=F)
fit1=lmer(y~x1+x2+(1+x1|id), data=datx); print(summary(fit1),corr=F)

#compare the models
anova(fit0,fit1)


Now, look at the output, below. You can see that the AIC from
print(summary(fit0)) is 87.34, but the AIC for fit0 in anova(fit0,fit1)
is 73.965. There are similar changes for the values of BIC and logLik.

Am I doing something wrong, here? If not, which are the real AIC and logLik
values for the different models?

Thanks for your help,

Jonathan Williams


Output:-

> fit0=lmer(y~x1+x2+(1|id), data=datx); print(summary(fit0),corr=F)
Linear mixed model fit by REML 
Formula: y ~ x1 + x2 + (1 | id) 
   Data: datx 
   AIC   BIC logLik deviance REMLdev
 87.34 104.7 -38.67    63.96   77.34
Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 0.016314 0.12773 
 Residual             0.062786 0.25057 
Number of obs: 240, groups: id, 120

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.50376    0.05219   9.652
x1           0.08979    0.06614   1.358
x2          -0.06650    0.06056  -1.098

> fit1=lmer(y~x1+x2+(1+x1|id), data=datx); print(summary(fit1),corr=F)
Linear mixed model fit by REML 
Formula: y ~ x1 + x2 + (1 + x1 | id) 
   Data: datx 
   AIC   BIC logLik deviance REMLdev
 90.56 114.9 -38.28    63.18   76.56
Random effects:
 Groups   Name        Variance  Std.Dev. Corr  
 id       (Intercept) 0.0076708 0.087583       
          x1          0.0056777 0.075351 1.000 
 Residual             0.0618464 0.248689       
Number of obs: 240, groups: id, 120

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.50078    0.05092   9.835
x1           0.09236    0.06612   1.397
x2          -0.06515    0.06044  -1.078

> anova(fit0,fit1)
Data: datx
Models:
fit0: y ~ x1 + x2 + (1 | id)
fit1: y ~ x1 + x2 + (1 + x1 | id)
     Df    AIC     BIC  logLik  Chisq Chi Df Pr(Chisq)
fit0  5 73.965  91.368 -31.982                        
fit1  7 77.181 101.545 -31.590 0.7839      2    0.6757
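
For what it is worth, a likely explanation, worth checking against the lme4 
documentation for your version: summary() reports the criteria of the REML fit, 
while anova() refits both models by maximum likelihood before comparing them, 
so the two displays describe different fits rather than either being wrong. 
Fitting by ML directly should make the two agree - a sketch:

fit0.ml=lmer(y~x1+x2+(1|id), data=datx, REML=FALSE) #ML rather than REML fit
fit1.ml=lmer(y~x1+x2+(1+x1|id), data=datx, REML=FALSE)
print(summary(fit0.ml),corr=F) #AIC/BIC/logLik should now match the anova() row
anova(fit0.ml,fit1.ml)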



[R] lmer overdispersion

2009-04-12 Thread Jonathan Williams
I ran into a similar problem when I used family=quasibinomial with my data, but 
the problem disappeared when I used family=binomial. I assumed that Douglas 
Bates et al. had amended the lmer program to detect over-dispersion, so that it 
is no longer necessary to specify its possible presence with family=quasi... 
But I may be wrong. If you get more information about this from the great man, 
then would you please let me know?

Thanks,

Jonathan Williams



[R] why does regexpr not work with '.'

2008-04-15 Thread Jonathan Williams
Dear R Helpers,

I am running R 2.6.2 on a Windows XP machine.

I am trying to use regexpr to locate full stops in strings, but, without
success.

Here an example:-

f='a,b.c@d' #an arbitrary test string; the original was redacted by the list 
            #archiver, this reconstruction has ',' at 2, '.' at 4 and '@' at 6
regexpr(',',f) #find the occurrences of ',' in f - should be one at location 2
   # and this is what regexpr finds
#[1] 2
#attr(,match.length)
#[1] 1

regexpr('@',f) #find occurrences of '@' in f - should be one at location 6
   # and this is what regexpr finds
#[1] 6
#attr(,match.length)
#[1] 1

regexpr('.',f) #find the occurrences '.' in f - should be one at location 4
   # but regexpr gives 1 at location 1
#[1] 1
#attr(,match.length)
#[1] 1

Sorry if I am missing something obvious. I'd be very grateful if someone
would
please show me how to use regexpr to locate '.' in my string!
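
For reference, the likely explanation: in a regular expression '.' is a 
metacharacter that matches any single character, so regexpr('.',f) matches at 
location 1. To match a literal full stop, escape it or switch regular-expression 
interpretation off - a minimal sketch:

regexpr('\\.',f)          #escaped metacharacter: returns 4
regexpr('.',f,fixed=TRUE) #fixed=TRUE treats the pattern as a literal string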

Thanks,

Jonathan Williams
