[R] Question about logistic regression with ordered factor variable using the rms package (prev.Design)

2012-09-11 Thread Tom Willems
Dear R users,
Hopefully someone can help me; 
maybe I just misunderstand the functions in the package.
I am working with a logistic regression model.
Until now I have always worked with the basic glm function, where the model 
was:
glm( disease ~ test.value + cnct , family=binomial(link='logit') ).

This works fine when test.value and concentration (cnct) are continuous 
variables.
However, concentration is in fact a grouping variable over 5 experiments 
with 5 concentrations
(25, 50, 100, 200, 400).

Therefore I believe concentration should be an ordered factor (in the model: 
cnct_o).
To fit this model I used the "rms" library (previously known as Design) 
and the functions lrm (or Glm).
lrm (or Glm) returns the odds for disease (on the logit scale), and "inv.logit(odds)" 
gives the probability of disease, but I have to do this with the Predict 
function of the "rms" package.

#
The resulting model (with lrm or Glm) would be:
            Coef     S.E.    Wald Z  Pr(>|Z|)
Intercept   23.800   0.8891   2.68    0.0074
test.value  20.806   0.3409   6.10   <0.0001
cnct_o      -0.1127  0.0268  -4.21   <0.0001
cnct_o=100  77.393   17.542   4.41   <0.0001
cnct_o=200  204.291  45.080   4.53   <0.0001
cnct_o=400  427.829  98.180   4.36   <0.0001
#
The results of the standard glm function  are very different :

#
Standard glm
Deviance Residuals: 
    Min       1Q   Median       3Q      Max 
-2.7361  -0.2750   0.2177   0.5143   1.6897 

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -0.9022     0.3370  -2.677 0.007427 ** 
test.value     2.0806     0.3409   6.103 1.04e-09 ***
cnct_o.L       1.4363     0.4722   3.042 0.002352 ** 
cnct_o.Q       1.2208     0.4934   2.474 0.013359 *  
cnct_o.C      -2.0649     0.5610  -3.681 0.000232 ***
cnct_o^4       0.5599     0.4760   1.176 0.239485    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 252.32  on 220  degrees of freedom
Residual deviance: 167.68  on 215  degrees of freedom
AIC: 179.68

Number of Fisher Scoring iterations: 6
#

As I use the model parameters of the standard glm model to calculate the 
odds and probability manually, I believe cnct_o = 25 to be the reference 
category and that cnct_o.L = 50, cnct_o.Q = 100, cnct_o.C = 200 and 
cnct_o^4 = 400. But I am not sure of this. The formulas used are:
Odds <- intercept + slope * test.value + cnct_o, where cnct_o is the 
corresponding value for the given concentration.
Probability <- inv.logit(Odds), with the function inv.logit from package 
"car".
The results of the glm are in the tables below: first the odds 
and then the probabilities (inv.logit(odds)).

#
glm odds    cntc: 
test.value  25        cnct_o.L  cnct_o.Q  cnct_o.C  cnct_o^4
0           -0.90218  0.534169  0.318649  -2.96706  -0.34231
0.5         0.138132  1.574477  1.358957  -1.92675  0.698001
1           1.17844   2.614785  2.399265  -0.88644  1.738309
1.5         2.218747  3.655093  3.439572  0.153864  2.778616
2           3.259055  4.6954    4.47988   1.194172  3.818924
#
glm prob.   cntc: 
test.value  25        cnct_o.L  cnct_o.Q  cnct_o.C  cnct_o^4
0           0.288604  0.630455  0.578995  0.048936  0.415249
0.5         0.534478  0.828421  0.79559   0.127111  0.667744
1           0.764667  0.931807  0.916771  0.291844  0.850472
1.5         0.90192   0.974793  0.968919  0.53839   0.941509
2           0.962997  0.990946  0.988792  0.767486  0.97852
#

If I compare this with the result of the Predict function in rms, the 
results seem very different. This may be because I misinterpret the glm 
model parameters for the ordered factor. How can I be sure which model 
parameter corresponds to which factor level in the standard glm?
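
On the question of which glm parameter belongs to which level: for an ordered factor, R by default uses orthogonal polynomial contrasts (contr.poly), so .L, .Q, .C and ^4 are linear, quadratic, cubic and quartic trend terms across all five levels rather than one coefficient per concentration. The sketch below is a minimal illustration, assuming the default contrasts were in effect and using the coefficients printed in the glm summary above; it shows how the contribution of each level is recovered from the contrast matrix.

```r
# Ordered factor with the five concentrations from the post
cnct_o <- factor(c(25, 50, 100, 200, 400), ordered = TRUE)

# Default coding for ordered factors: orthogonal polynomial contrasts.
# Rows are the levels, columns are .L, .Q, .C and ^4.
C <- contrasts(cnct_o)
print(round(C, 4))

# Coefficients copied from the glm summary in the post
intercept <- -0.9022
slope     <- 2.0806
b_cnct    <- c(.L = 1.4363, .Q = 1.2208, .C = -2.0649, `^4` = 0.5599)

# Contribution of each concentration level to the linear predictor
level_effect <- drop(C %*% b_cnct)

# Linear predictor (log odds) and probability at test.value = 1
log_odds <- intercept + slope * 1 + level_effect
prob     <- plogis(log_odds)   # base-R inverse logit
print(round(cbind(log_odds, prob), 4))
```

Because every contrast column sums to zero, the intercept here is the average over the five levels and not the value for a reference level of 25. Note also that contr.poly treats the levels as equally spaced, whatever their numeric labels.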

#
Results of lrm:
Predict.lrm cntc: 
test.value  25        50        100       200       400
0           -0,43815  -3,25628  -1,15323  0,264036  0,072751
0,50580154  0,614227  -2,2039   -0,10085  1,316414  1,125129
1,01160308  1,06      -1,15153  0,951526  2,368793  2,177508
1,505068    2,693316  -0,12482  1,978236  3,395503  3,204218
2,01086954  3,745695  0,927563  3,030615  4,447882  4,256597
#
Prob. (inv.logit(odds)) cntc: 
test.value  25        50        100       200       400
0           0,392182  0,037102  0,239899  0,565628  0,51818
0,50580154  0,648904  0,0994    0,474808  0,788585  0,754939
1,01160308  0,841123  0,24021   0,721422  0,914416  0,898211
1,505068

[R] Logit regression, I observed different results for glm or lrm (Design) for ordered factor variables

2012-09-06 Thread Tom Willems
Dear useR's,

I was comparing results for a logistic regression model between different 
libraries.

The model formula is arranged as follows: 

response ~  (intercept) +  value +  group

OR:
glm( response ~  (intercept) +  value +  group , 
family=binomial(link='logit'))
lrm( response ~  (intercept) +  value +  group )
ROC( from = response ~  (intercept) +  value +  group , plot='ROC')

The response is a binary variable, 
the independent predictor 'value' is a continuous variable,
and the grouping factor is an ordered factor (with 5 levels: 
25, 50, 100, 200, 400).

When I compare the GLM model with the ROC model and the LRM model, setting 
'group' as a factor variable,
the resulting coefficients are similar to each other.

When I set 'group' as an ordered factor variable (as it should be), the GLM 
and ROC model coefficients are still comparable,
but the LRM coefficients are completely different.

I have looked at the Design package, and there is a function 'cr.setup' 
which sets up an ordinal logistic response; this is however not the case 
here.
The model has a binary response (0 or 1), a continuous predictor and an 
ordered grouping factor.

Does anybody know what I am doing wrong?
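
A possible way to see what is going on is to check whether the fitted probabilities differ, or only the coefficients. The sketch below uses simulated data (hypothetical, not the data from the post) and plain glm only: switching 'group' from factor to ordered factor changes the coding of the predictor and therefore the coefficients, while the fitted model stays the same. The same comparison of fitted values could be made against lrm, whose output in the related post suggests yet another coding (a linear term plus indicator terms).

```r
set.seed(1)
n     <- 300
value <- runif(n, 0, 2)
conc  <- sample(c(25, 50, 100, 200, 400), n, replace = TRUE)

# Simulated binary response (hypothetical data, for illustration only)
response <- rbinom(n, 1, plogis(-1 + 1.5 * value + 0.3 * log2(conc / 25)))

group_f <- factor(conc)                   # unordered: treatment contrasts
group_o <- factor(conc, ordered = TRUE)   # ordered: polynomial contrasts

fit_f <- glm(response ~ value + group_f, family = binomial(link = "logit"))
fit_o <- glm(response ~ value + group_o, family = binomial(link = "logit"))

# The group coefficients differ because the coding differs ...
print(coef(fit_f))
print(coef(fit_o))

# ... but fitted probabilities and deviance agree
max_diff <- max(abs(fitted(fit_f) - fitted(fit_o)))
print(max_diff)
print(c(deviance(fit_f), deviance(fit_o)))
```

If the fitted probabilities agree, the models are reparameterisations of each other and nothing is wrong; only the interpretation of the individual coefficients changes.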

Thanks for your time,
Tom







__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-sig-ME] lmer() - no applicable method for 'profile' under R version 2.15.1

2012-07-24 Thread Tom Willems
Hi all,

I was working with the MEMSS and lme4 libraries under R version 2.15.1.
Apparently some practical functions do not work under R 2.15.1.

After searching the archives I found a mail thread on this subject,
stating that these problems were partially solved for R 2.12.0, but only 
for lmer(), not for glmer().

Is anyone aware of an available update of these libraries?
Or should I install R 2.12.0 and lme4a?

Kind regards,
Tom.


ref:

List:   r-sig-mixed-models
Subject:Re: [R-sig-ME] lmer() - no applicable method for 'profile'
From:   Ben Bolker bbolker () gmail ! com
Date:   2011-01-06 15:03:52
Message-ID: 4D25D9D8.6030402 () gmail ! com
[Download message RAW]

   I believe you're stuck for the time being: profiling is not yet
implemented for GLMMs.
   REML is not implemented for GLMMs either: there is some debate as to
whether a useful analogue of REML can be defined: see
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2009q1/002104.html
for example.
   I don't know of any canned approach to computing likelihood profiles
for GLMMs: there are MCMC approaches (e.g. MCMCglmm, or AD Model Builder
followed by MCMC sampling) which give you a marginal posterior
distribution ... in principle AD Model Builder can profile over the
marginal likelihood, although the last time I checked profiling didn't
actually work with random-effects models.

   If you are simply trying to get confidence intervals on your
parameters, your best *simple* bet is to take the Wald test (results of
summary()).  If you want a better answer than that, then I think your
choices are either an MCMC-based approach or bootstrapping (see
http://glmm.wikidot.com/basic-glmm-simulation to get started).

  (Since you have 16 variables in the model, I hope you have at least
200-300 observations -- and that's assuming you have only main effects 
...)



On 11-01-06 09:35 AM, sam steyaert wrote:
 Thank you for the helping out before. I could install lme4a, and  it ran
 fine for all chunks of chapter 1. Anyhow, if i try with my own data, it
 works, until i specify REML = FALSE in the model script, or use the 
update()
 function.
 
 Then, i get the following error message (it is in fact a warning 
message):
 
 In glmer(mymodelstructure),  :
 extra arguments REML are disregarded
 
 I can still get the parameter estimates by calling the model name.
 I would like to get the confidence intervals around the parameter 
estimates,
 and this appears not to work.
 
 prM1 = profile(Model1)
 Error: is(fm@resp, "lmerResp") is not TRUE
 confint(prM1)  (this function logically does not work after the former one)
 Error in UseMethod("vcov") :
   no applicable method for 'vcov' applied to an object of class "data.frame"
 
 So i guess there is something with my data structure? I use logistic
 regression to model habitat use, and have 16 variables included in the
 model, and one random factor (as a character).
 
 Does anyone has some advice?
 
 Thanks a lot,
 
 Sam
 
 
 2010/12/29 Douglas Bates ba...@stat.wisc.edu
 




[R] font size on graphics

2009-08-19 Thread Tom Willems
Dear R users,

My question is about finding the proper font size for graphics.

For this I wrote code that creates 4 different graphics and saves 
them as PNG files.

From these PNG graphics, I select one of the proper size and paste it into a 
Word document.
I have experimented with lots of settings and lost track a bit.
There are cex, cex.axis, cex.lab and so on, and I lost track of which cex does 
what exactly.

Furthermore, these graphs work well in an R window, yet when I open the 
PNG file the labels of the bottom x axis are not printed.

One other problem I have is that often '10' on the bottom x axis is not 
printed.
I tried to patch it up this way (below), yet it is unsatisfactory: sometimes 
it works, sometimes it doesn't. 
 axis(1,at=c(28:30),c('','10','') , padj=-1.5 , 
cex.axis=ifelse((rol==1),setting[rol,8]-0.15,setting[rol,8]-0.1))

And finally, when creating the graphs, I prefer the size of graf_1_small; 
it pastes easily into a Word document (2 on one line),
yet the labels are not always clear to read, so I copy the one without 
suffix to Word and then resize it to 75%, which means it loses some quality.
The question here is: can anyone help me with the right cex sizes?

I guess that is the question in general, since it is the cex-scaled text that is 
either printed or not on all these graphs.
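
As a rough orientation on which cex does what, here is a minimal sketch (file name, data and sizes are made up for illustration). It assumes that fixing the physical size in inches together with a resolution is acceptable instead of three pixel sizes: the text then keeps the same relative size, and a higher res gives a sharper picture without rescaling in Word. axis() leaves out tick labels that would overlap their neighbours, which may be why '10' sometimes disappears.

```r
out_file <- file.path(tempdir(), "means_small.png")   # hypothetical file name

# Size in inches plus res (pixels per inch): text keeps its physical size
png(out_file, width = 4, height = 4, units = "in", res = 150, pointsize = 10)

set.seed(1)
plot(1:30, rnorm(30), xaxt = "n", pch = 19,
     xlab = "Trial", ylab = "Mean", main = "Means 1",
     cex      = 1,     # plotting symbols
     cex.lab  = 0.9,   # axis titles (xlab, ylab)
     cex.axis = 0.8,   # tick labels drawn by plot()
     cex.main = 1.1)   # main title

# One axis() call for all ten trial labels; las = 2 turns the labels
# perpendicular to the axis so that neighbouring labels do not collide
axis(1, at = seq(2, 29, by = 3), labels = 1:10, cex.axis = 0.8, las = 2)

dev.off()
```

All cex.* values are multipliers of the device pointsize, so changing pointsize in png() rescales every label at once. Opening x11() before png() is not needed for writing the file.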

Thanks in advance,
T

here is my example:

# saving Directory

WinDir <- "C:/Documents and Settings/towil/Projecten"



# data

trials <- rep(1:10,3)
test <- c(rep("Low",10),rep("Normal",10),rep("Randomised",10))
means <- rbeta(1:30,11,3)+rbeta(1:30,8,3)
ci <- rbeta(1:30,1,2)

meanstable <- data.frame( Trial=trials,Test=test,Means= 
means,Upper=means+ci ,Lower=means-ci)

graf_1 <- "Means 1"

graf_2 <- "Means 2"

# ylimits and lg10_label are used below but are not defined in this example;
# the two values here are assumptions so that the code runs
ylimits <- range(meanstable$Lower, meanstable$Upper)
lg10_label <- "log10"

# settings for different graphs

setting <- 
data.frame('adname'=c(paste(graf_1,'_small'),graf_1,paste(graf_1,'_big')),'width'=c(300,400,500),'height'
 
= c(300,400,500), 'pointsetting' = c(10,12,14),'Directory'=rep(WinDir,3), 
'cex.lab'=  c(0.85,0.9,1),'cexsize'=  c(0.8,1,1.2), 'cex.axis'= 
c(0.6,0.9,1))

# loop for different graphs

for(rol in 1:3){
save_at <- WinDir
setwd( save_at)
x11()
png(filename = paste(graf_1,setting[rol,1],".png",sep=""),width = 
setting[rol,2], height = setting[rol,3],pointsize = setting[rol,4], bg = 
"white", res = NA)

 plot(1:nrow(meanstable),meanstable$Mean, xlim=c(0,nrow(meanstable)),ylim= 
ylimits, xaxt='n', pch = 19, 
frame.plot=F,xlab='',ylab=lg10_label,cex.lab=setting[rol,6],cex.axis=setting[rol,8])
  arrows(1:nrow(meanstable) , meanstable$Upper , 1:nrow(meanstable) , 
meanstable$Lower, lty=3, code = 3,  angle = 45, length = .1)
 
  axis(3,at=1:2,unique(meanstable$Test)[c(1,2)],las=1 ,hadj=0.5, 
padj=c(1,0),cex.axis=setting[rol,8])   # upper X axis
  axis(3,at=2:3,unique(meanstable$Test)[c(2,2)],las=1 ,hadj=0.5, 
padj=c(0,-1),cex.axis=setting[rol,8])
  axis(3,at=3,unique(meanstable$Test)[3],las=1, 
hadj=0,padj=-1,cex.axis=setting[rol,8])
 
  # lower x axis: this does not show in the PNG files, yet it works in an R 
  # graphics window
  axis(1,at=c(1:3),c('','1','') , padj=-1 , cex.axis=setting[rol,8])
  axis(1,at=c(4:6),c('','2','') , padj=-1 , cex.axis=setting[rol,8])
  axis(1,at=c(7:9),c('','3','') , padj=-1 , cex.axis=setting[rol,8])
  axis(1,at=c(10:12),c('','4','') , padj=-1 , cex.axis=setting[rol,8])
  axis(1,at=c(13:15),c('','5','') , padj=-1 , cex.axis=setting[rol,8])
  axis(1,at=c(16:18),c('','6','') , padj=-1 , cex.axis=setting[rol,8])
  axis(1,at=c(19:21),c('','7','') , padj=-1 , cex.axis=setting[rol,8])
  axis(1,at=c(22:24),c('','8','') , padj=-1 , cex.axis=setting[rol,8])
  axis(1,at=c(25:27),c('','9','') , padj=-1 , cex.axis=setting[rol,8])
  axis(1,at=c(28:30),c('','10','') , padj=-1 , cex.axis=setting[rol,8])


dev.off()
}

#

graphics.off()

End of example.






[R] font size on graphics question (correction in example,sorry)

2009-08-19 Thread Tom Willems
Dear R users,

My question is about finding the proper font size for graphics.

For this I wrote code that creates 4 different graphics and saves 
them as PNG files.

From these PNG graphics, I select one of the proper size and paste it into a 
Word document.
I have experimented with lots of settings and lost track a bit.
There are cex, cex.axis, cex.lab and so on, and I lost track of which cex does 
what exactly.

Furthermore, these graphs work well in an R window, yet when I open the 
PNG file the labels of the bottom x axis are not printed.

One other problem I have is that often '10' on the bottom x axis is not 
printed.
I tried to patch it up this way (below), yet it is unsatisfactory: sometimes 
it works, sometimes it doesn't. 
 axis(1,at=c(28:30),c('','10','') , padj=-1.5 , 
cex.axis=ifelse((rol==1),setting[rol,8]-0.15,setting[rol,8]-0.1))

And finally, when creating the graphs, I prefer the size of graf_1_small; 
it pastes easily into a Word document (2 on one line),
yet the labels are not always clear to read, so I copy the one without 
suffix to Word and then resize it to 75%, which means it loses some quality.
The question here is: can anyone help me with the right cex sizes?

I guess that is the question in general, since it is the cex-scaled text that is 
either printed or not on all these graphs.

Thanks in advance,
T

here is my example:


# saving Directory

WinDir <- "C:/Documents and Settings/Project"
dir.create(file.path("C:/Documents and Settings","Project"))



# data

trials <- rep(1:10,3)
test <- c(rep("Low",10),rep("Normal",10),rep("Randomised",10))
means <- rbeta(1:30,11,3)+rbeta(1:30,8,3)
ci <- rbeta(1:30,1,2)

meanstable <- data.frame( Trial=trials,Test=test,Means= 
means,Upper=means+ci ,Lower=means-ci)

graf_1 <- "Means 1"

graf_2 <- "Means 2"

# ylimits and lg10_label are used below but are not defined in this example;
# the two values here are assumptions so that the code runs
ylimits <- range(meanstable$Lower, meanstable$Upper)
lg10_label <- "log10"

# settings for different graphs

setting <- 
data.frame('adname'=c(paste(graf_1,'_small'),graf_1,paste(graf_1,'_big')),'width'=c(300,400,500),'height'
 
= c(300,400,500), 'pointsetting' = c(10,12,14),'Directory'=rep(WinDir,3), 
'cex.lab'=  c(0.85,0.9,1),'cexsize'=  c(0.8,1,1.2), 'cex.axis'= 
c(0.6,0.9,1))

# loop for different graphs

for(rol in 1:3){
save_at <- WinDir
setwd( save_at)
x11()
png(filename = paste(graf_1,setting[rol,1],".png",sep=""),width = 
setting[rol,2], height = setting[rol,3],pointsize = setting[rol,4], bg = 
"white", res = NA)

 plot(1:nrow(meanstable),meanstable$Mean, xlim=c(0,nrow(meanstable)),ylim= 
ylimits, xaxt='n', pch = 19, 
frame.plot=F,xlab='',ylab=lg10_label,cex.lab=setting[rol,6],cex.axis=setting[rol,8])
  arrows(1:nrow(meanstable) , meanstable$Upper , 1:nrow(meanstable) , 
meanstable$Lower, lty=3, code = 3,  angle = 45, length = .1)

  axis(3,at=1:2,unique(meanstable$Test)[c(1,2)],las=1 ,hadj=0.5, 
padj=c(1,0),cex.axis=setting[rol,8])   # upper X axis
  axis(3,at=2:3,unique(meanstable$Test)[c(2,2)],las=1 ,hadj=0.5, 
padj=c(0,-1),cex.axis=setting[rol,8])
  axis(3,at=3,unique(meanstable$Test)[3],las=1, 
hadj=0,padj=-1,cex.axis=setting[rol,8])

  # lower x axis: this does not show in the PNG files, yet it works in an R 
  # graphics window
  axis(1,at=c(1:3),c('','1','') , padj=-1 , cex.axis=setting[rol,8])
  axis(1,at=c(4:6),c('','2','') , padj=-1 , cex.axis=setting[rol,8])
  axis(1,at=c(7:9),c('','3','') , padj=-1 , cex.axis=setting[rol,8])
  axis(1,at=c(10:12),c('','4','') , padj=-1 , cex.axis=setting[rol,8])
  axis(1,at=c(13:15),c('','5','') , padj=-1 , cex.axis=setting[rol,8])
  axis(1,at=c(16:18),c('','6','') , padj=-1 , cex.axis=setting[rol,8])
  axis(1,at=c(19:21),c('','7','') , padj=-1 , cex.axis=setting[rol,8])
  axis(1,at=c(22:24),c('','8','') , padj=-1 , cex.axis=setting[rol,8])
  axis(1,at=c(25:27),c('','9','') , padj=-1 , cex.axis=setting[rol,8])
  axis(1,at=c(28:30),c('','10','') , padj=-1 , cex.axis=setting[rol,8])


dev.off()
}

#

graphics.off()


End of example.






[R] Bootstrap problem

2008-10-20 Thread Tom Willems
Dear R-users,
 
I'm having a small problem while bootstrapping data.
What I would like to do is resample the data and calculate a function on 
the resamples, so I can estimate a measure of reproducibility for this data.
 
The function I wrote works fine, even while bootstrapping.
The only problem is the bootstrapping itself. 
 
The dataset consists of 10 trials, each divided into 3 groups: high (3), 
medium (2) and low (1).
A bootstrap sample (trial) should always consist of 5 observations taken from each 
group population, so as to be representative.
 
example:
original data:
trial 1 : group(1) = (0,0,1,0,0);group(2) = (0,1,1,0,1);group(3) = 
(1,1,1,1,1)
...
bootstrapped data:
trial 1 : group(1) = (0,0,0,0,1);group(2) = (1,1,0,0,1);group(3) = 
(1,0,1,1,1)
 
NOT
bootstrapped data:
trial 1 : group(1) = (0,0,0,0,1,1,0);group(2) = (1,0,1);group(3) = 
(1,0,1,1,1,1,1,0,1,1)
 
Now I am familiar with the function bootstrap (pkg bootstrap), but I 
read about a function called boot (pkg boot); however, I can't seem to 
master it.
The explanation (help('boot')) isn't making me any wiser.
I know I can always split the data up (which is what I am doing), but I was 
wondering whether this would have an effect on the bootstrap; maybe it is 
better to keep all the groups together? 
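
What is described here is a stratified bootstrap: resampling happens separately inside every trial-by-group cell, so each cell keeps its 5 observations. The sketch below is one way to do that in base R on a made-up data set with the same layout (the column names and the helper stratified_indices are hypothetical). The boot() function in package boot has a strata argument intended for this kind of resampling.

```r
# Hypothetical layout matching the post: 10 trials x 3 groups x 5 observations
dat <- expand.grid(obs = 1:5, group = 1:3, trial = 1:10)
set.seed(42)
dat$protection <- rbinom(nrow(dat), 1, c(0.3, 0.6, 0.9)[dat$group])

# Resample row indices with replacement inside every stratum
stratified_indices <- function(strata) {
  cells <- split(seq_along(strata), strata)
  unlist(lapply(cells, function(i) i[sample.int(length(i), replace = TRUE)]),
         use.names = FALSE)
}

strata      <- interaction(dat$trial, dat$group, drop = TRUE)
idx         <- stratified_indices(strata)
boot_sample <- dat[idx, ]

# Every trial-by-group cell still holds exactly 5 observations
cell_sizes <- table(boot_sample$trial, boot_sample$group)
print(cell_sizes)
```

The index vector idx can be passed to the statistic in place of a plain sample(1:150, replace = TRUE), which is what lets the cell sizes drift away from 5.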
 
 
Here is a (this time WORKING) code example of what I did.
## proc
library(bootstrap)   # provides bootstrap()
## generate data
datas <- 
data.frame(protection=c(rep(c(0,1,0,1,0,0,1,0,1,1,1,0,1,1,1),2),c(0,0,0,0,1,0,1,0,1,1,1,0,1,0,1),rep(c(0,1,1,1,0,0,1,1,0,1,1,1,1,0,1),2),c(0,1,0,0,1,0,1,1,1,1,1,1,1,1,1),rep(c(0,1,0,1,0,0,1,1,1,1,1,0,1,1,1),2),c(0,1,0,0,0,0,1,1,0,1,1,1,1,1,1),c(0,0,1,1,0,0,1,1,1,0,0,1,1,0,1))
 
,group=rep(1:3,50),trial=c(rep(1,15),rep(2,15),rep(3,15),rep(4,15),rep(5,15),rep(6,15),rep(7,15),rep(8,15),rep(9,15),rep(10,15)))
## describe Function
Vacc.Vcon <- function (dataset1 , trialdata , groupdata ) {

groups <- unique (groupdata)
trials <- unique (trialdata)
Tr <- length(trials) 
G <- length(groups)
Gl <- length(dataset1)/(G*Tr) 
Tl <- length(dataset1)/(Tr) 
iterg <- data.frame(1:G,as.vector(groups))
trials <- unique (trialdata)
Tr <- length(trials) 
itert <- data.frame(1:Tr,as.vector(trials))
triallist <- c()
grouplist <- c() 
for (x in 1:G){ 
y <- if (x==1) x else y+Tr   # start index of the block for group x
grouplist[c(y:(y+Tr-1))] <- rep(iterg[x,2],Tr)} 
iter <- data.frame(1:(Tr),rep(trials,G),grouplist)
VACC <- data.frame()
VACC.sub <- function (dataset1,trialn,groupn){
p0 <- sum(   ifelse(dataset1==1 & trialdata==trialn & 
groupdata==groupn, 1,0)  )
p1 <- sum(   ifelse(dataset1==0 & trialdata==trialn & 
groupdata==groupn, 1,0)  ) 
p <- p0+p1
VACC.group <- 
list('Trial'=trialn,'Group'=groupn,'Vacc'=sum((p0/p)^2 
,(p1/p)^2),p0=(p0/p)  , p1=(p1/p) 
,n0=as.numeric(p0),n1=as.numeric(p1),'n'=as.numeric(p))
VACC.group}

for (i in 1:(G*Tr) ) {
  VACC[i,1] <- VACC.sub (dataset1,iter[i,2],iter[i,3])[1] 
  VACC[i,2] <- VACC.sub (dataset1,iter[i,2],iter[i,3])[2] 
  VACC[i,3] <- VACC.sub (dataset1,iter[i,2],iter[i,3])[3]
  VACC[i,4] <- VACC.sub (dataset1,iter[i,2],iter[i,3])[4] 
  VACC[i,5] <- VACC.sub (dataset1,iter[i,2],iter[i,3])[5]
  VACC[i,6] <- VACC.sub (dataset1,iter[i,2],iter[i,3])[6]
  VACC[i,7] <- VACC.sub (dataset1,iter[i,2],iter[i,3])[7]
  VACC[i,8] <- VACC.sub (dataset1,iter[i,2],iter[i,3])[8]
  VACC} 
  rownames(VACC) <- NULL
  rownames(VACC) <- paste(iter[,2],iter[,3],sep='_')
Pcalc <- function(x) { 
out<-(1/(Tr)) * sum(x)
out} 
P0 <- tapply( VACC$p0,VACC$Group,Pcalc)
P1 <-  tapply( VACC$p1,VACC$Group,Pcalc)
Vcon <- mean(cbind(P0^2 + P1^2))
 
  Vacc.total <- mean (tapply( VACC$Vacc,VACC$Group,mean))
  out <- 
list(all=VACC,N=G,P0=P0,P1=P1,Vcon=Vcon*100,Vacc.total=Vacc.total*100)
  out   }
## end describe Function 
Vacc.Vcon (datas[,1] , datas[,3], datas[,2]) # example of how fun works

## data needs to be in matrix form for bootstrap function
xdata <- matrix( 
cbind(datas$protection,datas$group,datas$trial),ncol=3,byrow=F) 
## function for bootstrap
  vacc.boot <- function(x,xdata){ 
Vacc.Vcon(xdata[x,1],xdata[x,3],xdata[x,2]) }
bootk <- 10 
results <- bootstrap(1:150,bootk,vacc.boot,xdata) 
 
taccs <- list() ;Vaccs <- vector();Vcons <- vector()
  boot.amp.vac2<- for(i in 1:bootk) {
 m.i <- results$thetastar[[i]] 
 taccs[i] <- list(m.i )
 G.Vacc <- round( 
tapply(taccs[[i]]$all$Vacc,rownames(taccs[[i]]$all),mean)*100 ,digits=3)
 Vaccs <- round( mean(taccs[[i]]$Vacc.total),digits=3)
 Vcons<-round( mean(taccs[[i]]$Vcon ),digits=3)
 tacc <- list( 
data=taccs,Booted.means=list(Vacc.grouped=G.Vacc 
,Vacc.Total=Vaccs,Vcon.Total=Vcons)) 
 tacc} 
 
 
Rep.table <- tacc$Booted.mean 
Rep.table 
## problem area: n should always be 5 in each group, as in the original 
## data
# values based on the original data; last column: n = 5
Vacc.Vcon (datas[,1] ,datas[,3], datas[,2])$all [1:5,]
# values based on the bootstrapped data: n is not 5!
[R] PNG file don't run on mac's?

2008-09-18 Thread Tom Willems
Dear R users,

I'm having problems with creating PNG graphic outputs.
Usually I create reports in HTML format, containing PNG graphics, so they 
can easily be exported to Word, Excel and so on.
On the Windows PC that I use at work all works fine, but it never works on 
my Mac.
The HTML files I create on Windows open in Safari, but the graphics never 
open. Is there a way of creating GIFs, for example?

Then I was wondering whether there is a less coding-intensive way to 
create plots.
What I mean is, when I create a plot consisting of 4 subplots and I 
want to save it as a PNG to incorporate it into an HTML output,
I need to create every plot under the PNG declaration.

# png(filename..
# parset <- par(mfrow = c(2,2), oma = c(0,0,1,0),..)
# plot(...
# plot(...
# plot(...
# plot(...
# par(parset)
# dev.off() 

Now, because I create graphs for different people, I usually generate them 
in different sizes and configurations,
so they just have to choose which one to use in their text.
Therefore I am looking for a way of programming that can save time and 
space,
yet the example below does not work; it only writes tables of data.
# plot.1 <- plot(...
# plot.2 <- plot(...
# plot.3 <- plot(...
# plot.4 <- plot(...
# 
# png(..
# plot.1
# dev.off()
# ...
# png(filename..
# parset <- par(mfrow = c(2,2), oma = c(0,0,1,0),..)
# plot.1
# plot.2
# plot.3
# plot.4
# par(parset)
# dev.off() 
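
The approach above stores nothing useful because base graphics functions draw as a side effect; an assignment such as plot.1 <- plot(...) only keeps the function's return value, not the picture. One possible pattern, sketched below with toy data, is to wrap each panel in a function and call those functions under whatever device and layout is needed. The helper name save_png and the file names are made up for this illustration.

```r
# Each panel is a function that draws when called (toy data for illustration)
x <- seq(0, 3, length.out = 50)
set.seed(3)
panels <- list(
  function() plot(x, sin(x), type = "l", main = "Panel 1"),
  function() hist(rnorm(100), main = "Panel 2"),
  function() boxplot(rnorm(50), main = "Panel 3"),
  function() barplot(c(a = 2, b = 5, c = 3), main = "Panel 4")
)

# Hypothetical helper: open a PNG device, draw the given panels, close it
save_png <- function(file, panels, layout = c(1, 1), width = 400, height = 400) {
  png(file, width = width, height = height)
  on.exit(dev.off())                      # device is closed even on error
  par(mfrow = layout, oma = c(0, 0, 1, 0))
  for (draw in panels) draw()
  invisible(file)
}

out_dir <- tempdir()
f_single <- save_png(file.path(out_dir, "plot1.png"), panels[1])
f_all    <- save_png(file.path(out_dir, "all_plots.png"), panels,
                     layout = c(2, 2), width = 500, height = 500)
```

The same list of panel functions can then be reused for every size and configuration by looping over a table of widths, heights and layouts.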

Here is an example of how I create the PNG files that I use in the HTML 
outputs.

# size <- data.frame(adname=c("_small","","_big","_html"), 
# width=c(300,400,500,300), height = c(300,400,500,300), pointsize = 
# c(10,12,14,10),Directory=c(rep("graph",3),"html" ), cexsize= 
# c(0.6,0.65,0.7,0.65) )
# 
# for(rol in 1:4) {
# save_at <- file.path(ResultDir,as.character(size[rol,5]))
# setwd( save_at) 
# x11()
# png(filename = paste(graf_1,size[rol,1],".png",sep=""),width = 
# size[rol,2], height = size[rol,3],pointsize = size[rol,4], bg = "white", 
# res = NA, restoreConsole =  TRUE) 
# parset <- par(mfrow = c(2,2), oma = 
# c(0,0,1,0),cex.axis=size[rol,6],cex=size[rol,6])
# boxplot(bxpdtrail$value,ylab="log10 (x)", 
# main=paste("Mean",testname,"at",fase,sep=' '))
# points(1 ,mean(bxpdtrail$value,na.rm=T),  pch = 19)
# arrows(1 , upper(bxpdtrail$value,na.rm=T) , 1 , 
# lower(bxpdtrail$value,na.rm=T), lty=3, code = 3,  angle = 45, length = .1)
# pyramid.plot(pyramiddata$yes,pyramiddata$no,main="Observation of 
# Protection",labels=pyramiddata$L1,labelcex=size[rol,6], top.labels = 
# c("Protected", "log10(x)", "Unprotected"),xycol=xycol,xxcol=xxcol,gap=8 
# ,unit = "# animals")
# barplot(pyramiddata$prob_dens,names=pyramiddata$L1,ylab="EPP %", 
# xlab="log10(x)" ,main="Probability density plot")
# plot(pyramiddata$L1,pyramiddata$prob_cum, ylab="EPP %", xlab="log10(x)" 
# ,main="Cumulative Probability plot", type="S")
# par(parset)
# dev.off() 
#  } 

kind regards,
Tom.




[R] help with SQL, how can i use functions in sql (pkg :sqldf)

2008-09-16 Thread Tom Willems
Dear R users,

I was trying to summarise data with SQL, using the sqldf pkg.

It seemed like a promising solution, yet all I can do with it is 
calculate avg, count and sum,

whereas I'd like to use confidence intervals and standard deviations as well.

Now I was trying to find a solution myself, but the closest I got was 
sqlite3_create_function16,
explained on 
  http://www.sqlite.org/c3ref/create_function.html
Sadly I don't understand much of the explanation.

Now I hope somebody could give me another SQL solution for this.

The function I hoped to use is this one.

 mean.CI <-
function (X,na.rm=T)
{   names(X)<-NULL
if (is.vector(X))  {nn <- length(X)}
else   {nn <- nrow(X)}
NAs <- sum(is.na(X))
n <- nn-NAs
# compute the mean in every case, so that avg exists when na.rm is FALSE
avg  <- mean(X,na.rm=na.rm)
Sd  <- sd(X,na.rm=T)
Var  <- var(X,na.rm=T)
if (is.matrix(X)) {
   # assign the result, so that confin exists for matrix input as well
   confin <-qt(0.975,df=n-1)*(apply(X, 2, sd, na.rm = T)/sqrt(n))}
else if (is.vector(X))  {
   confin <-qt(0.975,df=n-1)*(sd(X, na.rm = T)/sqrt(n))}
else if (is.data.frame(X)){
   confin <-qt(0.975,df=n-1)*((sapply(X, sd, na.rm = T))/sqrt(n))}
else {confin <- qt(0.975,df=n-1)*(sd(as.vector(X), na.rm = 
T))/sqrt(n)}
   out <-round (c( avg-confin ,avg+confin) ,digits=3)
   out 
}
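
As an alternative to registering a custom SQL function, grouped summaries with a confidence interval can be computed directly in R. The sketch below assumes that a per-group summary is what the SQL query was meant to produce; the data, the column names and the helper mean_ci are made up for illustration.

```r
# t-based confidence interval for a mean, returned together with n, mean and sd
mean_ci <- function(x, level = 0.95) {
  x    <- x[!is.na(x)]
  n    <- length(x)
  half <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(n = n, mean = mean(x), sd = sd(x),
    lower = mean(x) - half, upper = mean(x) + half)
}

# Toy data (hypothetical column names)
set.seed(7)
d <- data.frame(group = rep(c("a", "b", "c"), each = 20),
                value = rnorm(60, mean = rep(c(1, 2, 3), each = 20)))

# One row per group; do.call(data.frame, ...) flattens the matrix column
# that aggregate() creates when FUN returns a vector
summ <- do.call(data.frame, aggregate(value ~ group, data = d, FUN = mean_ci))
print(summ)
```

The result is an ordinary data frame, so it can still be joined or filtered with sqldf afterwards if SQL is preferred for the rest of the work.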


kind regards,
Tom




[R] how do i save data to txt file? are there marcos to save tables to word?

2008-03-17 Thread Tom Willems
Dear R users,

I would like to save a newly created data set from R to a text 
file.
It is a rather big data set, and recalculating the new variables takes a 
long time. 
The quickest way to read data is when it is saved as .txt;
this is why I hope to read the data from the old .txt file, then calculate a new 
set of variables based on the old set,
and save them in a new data file, also in .txt format.
One other thing I'd like to know is how to save table output from R 
directly to a table in Word.
Copy and paste destroys the layout of the table.
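
The read, calculate and save cycle described above can be sketched with write.table() and read.table(). File names and the derived variable are made up for illustration; saveRDS() is shown as a binary alternative that avoids re-parsing text. For Word, one option that is assumed to be acceptable here is writing a tab-delimited file and converting that text to a table inside Word.

```r
# Toy data standing in for the large data set
old <- data.frame(id = 1:5, x = c(1.5, 2.25, 3, 4.75, 5))

txt_old <- file.path(tempdir(), "old_data.txt")   # hypothetical file names
txt_new <- file.path(tempdir(), "new_data.txt")
write.table(old, txt_old, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the old file, derive new variables, write the result to a new file
dat <- read.table(txt_old, header = TRUE, sep = "\t")
dat$x_log <- log10(dat$x)
write.table(dat, txt_new, sep = "\t", row.names = FALSE, quote = FALSE)

# Binary alternative: keeps column types and needs no parsing on reload
rds_new <- file.path(tempdir(), "new_data.rds")
saveRDS(dat, rds_new)
dat2 <- readRDS(rds_new)

# A table for Word: tab-delimited text that Word can convert to a table
tab <- table(group  = c("a", "b", "a", "c"),
             result = c("pos", "neg", "pos", "pos"))
tab_file <- file.path(tempdir(), "table.txt")
write.table(as.data.frame.matrix(tab), tab_file,
            sep = "\t", quote = FALSE, col.names = NA)
```

With col.names = NA the header row gets an empty first cell, so the row names line up with their own column after conversion.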

Many thanks in advance,
Tom




[R] how do i creat multiple back to back histograms?

2008-01-29 Thread Tom Willems
Dear R users,
I would like to create a graph in which my data is presented as vertically 
oriented histograms, which give the frequency of the measured values, 
grouped per measurement method used.
So the x axis should hold the grouping variable and the y axis a continuous 
variable. Well, not really continuous, but it should show the values 
representing the class intervals.
A small example to clarify this:
This one I can create: graph 1
only the positive results per test
[ASCII sketch of graph 1: vertical histograms of the positive results, one column each for test1, test2 and test3; y axis from 0 to 5]
This is what I want to create: graph 2
a back-to-back histogram plot of the pos/negative results, grouped per 
test
[ASCII sketch of graph 2: back-to-back histograms for test1, test2 and test3, each with Neg. on the left and Pos. on the right; y axis from 0 to 5]


I'd like to create the figure of graph 2, a back-to-back plot of the positive 
and negative results of a test,
with the 3 tests in one graph.

I have been searching for examples; the only trouble is that the one I found is way too 
complex.
(http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=109)
Here is one other example of a back-to-back plot (graph 2):
histbackback(fool)

After trying to understand what happens in the complex example, I 
identified a part that does what I need.
It creates a graph similar to what I want (graph 1); see the code 
below. 
Yet I do not understand it well enough, so I can't create the more complex 
graph 2.
What can I do to understand these graph functions better, without spending 
too much time on them?

Kind regards,
Tom.

This is the code which I use to create graph 1
# data for plot:

freqs <- data.frame(value= c( 
0.000,1.204,1.301,1.362,1.447,1.505,1.602,1.653,1.756,1.806,1.903,1.959,
  2.053,2.107,2.204,2.258,2.354,2.408,2.505,2.559,2.656,2.709,2.806) ,
  tp1= c( 8,1,0,13,0,6,0,25,0,5,0,15,0,4,0,7,0,0,0,1,0,0,0)  ,
  tn1= c( 17,0,0,2,0,0,0,1,0,2,0,1,0,0,0,0,0,0,0,0,0,0,0),
  tp2= c( 10,0,2,0,9,0,8,0,19,0,4,0,5,0,2,0,5,0,2,0,1,0,2)   ,
  tn2= c( 13,0,1,0,1,0,2,0,2,0,0,0,2,0,0,0,0,0,0,0,0,0,0),
  tp3= c( 9,0,0,0,0,0,0,0,0,10,0,10,0,21,0,10,0,11,0,8,0,5,0),
  tn3= c( 15,0,0,0,0,0,0,0,0,3,0,2,0,2,0,1,0,0,0,0,0,0,0))

 test<-c(1,2,3,4,5,6)
testname <-c('test1 p','test1 n','test2 p','test2 n','test3 p','test3 n')  
 


# parameters for plot 
 xlim = c(min(test),max(test))
 ylim = c(0,length(freqs$value))
 barscale = 0.2
 barcol = 8
 # plot 
 win.graph() 
 # one bar series per count column; freqs has 6 of them besides 'value'
 for (i in seq_along(test))
  {
 par(new = TRUE)
 xmin <- -test[i] + xlim[1]
 xmax <- xlim[2] - test[i]
 ser <- freqs[, i+1]
 ser <- ser/max(ser) * barscale
 barplot(ser, horiz = TRUE, axes = FALSE, xlim = c(xmin, 
xmax),
 ylim = ylim, col = barcol, space = 0)
}
 axis(1,labels=testname,at=c(0,0.2,0.4,0.6,0.8,1))
 axis(2,labels=freqs$value ,at=c((0:22)/23) )

This is the code which I hoped would create graph 2, but it doesn't work

library(Hmisc)   # provides histbackback()
for (i in 1:length(freqs))
  {
 par(new = TRUE)
 xmin <- -test[i] + xlim[1]
 xmax <- xlim[2] - test[i]
 serx <- freqs[, i+1]
 sery <- freqs[, i+2]
 ser <- list((serx/sum(serx) * barscale),(sery/sum(sery) * 
barscale))
 histbackback(ser,axes=FALSE ,xlim = c(xmin, xmax), ylim = 
ylim)
}
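
Since the counts per class are already tabulated, one possible route to graph 2 that avoids histbackback() is plain barplot(): draw the negative counts with a flipped sign to the left of zero and add the positive counts to the right. This is a sketch in base graphics using the counts from the post; the helper name back_to_back is made up for this illustration.

```r
value <- c(0.000, 1.204, 1.301, 1.362, 1.447, 1.505, 1.602, 1.653, 1.756,
           1.806, 1.903, 1.959, 2.053, 2.107, 2.204, 2.258, 2.354, 2.408,
           2.505, 2.559, 2.656, 2.709, 2.806)
counts <- list(
  test1 = list(pos = c(8,1,0,13,0,6,0,25,0,5,0,15,0,4,0,7,0,0,0,1,0,0,0),
               neg = c(17,0,0,2,0,0,0,1,0,2,0,1,0,0,0,0,0,0,0,0,0,0,0)),
  test2 = list(pos = c(10,0,2,0,9,0,8,0,19,0,4,0,5,0,2,0,5,0,2,0,1,0,2),
               neg = c(13,0,1,0,1,0,2,0,2,0,0,0,2,0,0,0,0,0,0,0,0,0,0)),
  test3 = list(pos = c(9,0,0,0,0,0,0,0,0,10,0,10,0,21,0,10,0,11,0,8,0,5,0),
               neg = c(15,0,0,0,0,0,0,0,0,3,0,2,0,2,0,1,0,0,0,0,0,0,0))
)

# Hypothetical helper: one back-to-back panel, Neg. to the left, Pos. to the right
back_to_back <- function(neg, pos, labels, main) {
  lim <- max(neg, pos)
  mid <- barplot(-neg, horiz = TRUE, space = 0, xlim = c(-lim, lim),
                 col = "grey80", axes = FALSE, main = main,
                 xlab = "Neg. | Pos.")
  barplot(pos, horiz = TRUE, space = 0, col = "grey40",
          axes = FALSE, add = TRUE)
  ticks <- pretty(c(-lim, lim))
  axis(1, at = ticks, labels = abs(ticks))            # counts shown as positive
  axis(2, at = mid, labels = labels, las = 1, cex.axis = 0.6)
  invisible(mid)                                      # bar midpoints on the y axis
}

old_par <- par(mfrow = c(1, 3), mar = c(4, 4, 3, 1))
for (nm in names(counts)) {
  mid <- back_to_back(counts[[nm]]$neg, counts[[nm]]$pos, value, nm)
}
par(old_par)
```

Keeping the three tests in separate panels of one device avoids the par(new = TRUE) overlay and the manual shifting of xlim.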




[R] biserial correlation with pkg polycor

2007-10-29 Thread Tom Willems
Dear R users,

While looking for a way to calculate the association between a continuous 
and a binary variable, 
I found a procedure called point-biserial correlation.
Not being a mathematician, I did my very best to understand what it 
was all about, and then I found an easily understandable paper (by Steve 
Simon) on how to calculate this. ref ## 
http://www.childrens-mercy.org/stats/definitions/biserial.htm (this page 
has the same example)
Further, I discovered the polycor package in R.
Now I'm having trouble with the fact that the polycor pkg never gives me 
the same output as the manual application of the formula.

In the example below I found, manually, r(biserial) = 0.49 between fb and age, 
and, using the function polyserial (polycor pkg), r(biserial) = -0.8591.
This is a rather big difference, not due to rounding or floating point.

Is there someone who is familiar with biserial correlation and the 
appropriate way to calculate it?

Kind regards,
Tom.

here is the example, at the end is the R file. 

First I create the input:

library(abind)
library(polycor)
### data input
no <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8)
fb <- c(19, 30, 20, 19, 29, 25, 21, 24, 50, 25, 21, 17, 15, 14, 14, 22, 17)
ss <- c(14, 41, 18, 11, 16, 24, 18, 21, 37, 17, 10, 16, 22, 12, 14, 12, 18)
age <- c('elderly', 'elderly', 'elderly', 'elderly', 'elderly',
         'elderly', 'elderly', 'elderly', 'elderly', 'young', 'young', 'young',
         'young', 'young', 'young', 'young', 'young')

dataset <- data.frame(no,fb,ss,age)
dataset <- subset(dataset,select=c(fb:age))
nrow(dataset)
[1] 17
data_eld <- subset(dataset,age=='elderly',select=fb)
data_young <- subset(dataset,age=='young',select=fb)
 

Here I calculate R_bis (the biserial correlation) manually:

### point biserial correlation

fb <- subset(dataset,select=fb)
fb0 <- subset(dataset,age!='elderly',select=fb)
fb1 <- subset(dataset,age=='elderly',select=fb)
meanfb0 <- mean(fb0,na.rm=T)
meanfb1 <- mean(fb1,na.rm=T)
sdfb <- sd(dataset$fb,na.rm=T)

ss <- subset(dataset, select=ss)
ss0 <- subset(dataset,age!='elderly',select=ss)
ss1 <- subset(dataset,age=='elderly',select=ss)
meanss0 <- mean(ss0,na.rm=T)
meanss1 <- mean(ss1,na.rm=T)
sdss <- sd(dataset$ss,na.rm=T)
age <- subset(dataset,select=age)
n <- nrow(dataset)
 
This is the formula from ref ## 
http://www.childrens-mercy.org/stats/definitions/biserial.htm

R_bis <- function(x,x1,x0,n){  p <- (nrow(x1)/n)
  (((mean(x1,na.rm=T)-mean(x0,na.rm=T))/sd(x,na.rm=T)) *sqrt(p*(1-p)))  }

This is the corrected formula from ref ## 
http://en.wikipedia.org/wiki/Point-biserial_correlation_coefficient

R_bis2 <- function(x,x1,x0,n){
  ((mean(x1,na.rm=T)-mean(x0,na.rm=T))/sd(x,na.rm=T)) *
    sqrt((nrow(x1)*nrow(x0))/(n*(n-1))) }
 
  R_bis(fb,fb1,fb0,n) 
   fb 
0.4798873 

result in paper was 0.49 
 
   R_bis2(fb,fb1,fb0,n)
   fb 
0.4946565 

equals result in paper  0.49 
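
As a cross-check, the point-biserial coefficient is the Pearson correlation between the continuous variable and a 0/1 indicator, so base R's cor() should reproduce the R_bis2 value. A minimal sketch with the fb and age values from above (the name is_elderly is introduced here for illustration):

```r
fb  <- c(19, 30, 20, 19, 29, 25, 21, 24, 50, 25, 21, 17, 15, 14, 14, 22, 17)
age <- rep(c('elderly', 'young'), times = c(9, 8))

is_elderly <- as.numeric(age == 'elderly')   # 1 = elderly, 0 = young
r_pb <- cor(fb, is_elderly)                  # point-biserial = Pearson r
round(r_pb, 4)                               # should agree with R_bis2 (0.4946565)
```

Coding 'young' as 1 instead flips the sign but leaves the magnitude unchanged.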

Then I use the polycor package;
the function hetcor gives all the different correlation results:

 
 hetcor(dataset$fb,dataset$ss,dataset$age ,ML=TRUE)

Maximum-Likelihood Estimates

Correlations/Type of Correlation:
            dataset$fb dataset.ss dataset.age
dataset$fb           1    Pearson  Polyserial
dataset.ss       0.703          1  Polyserial
dataset.age    -0.8591    -0.6685           1

Standard Errors:
            dataset$fb dataset.ss
dataset$fb
dataset.ss      0.1215
dataset.age     0.1106     0.2497

n = 17 

P-values for Tests of Bivariate Normality:
            dataset$fb dataset.ss
dataset$fb
dataset.ss      0.1782
dataset.age     0.4269     0.4034

hetcor(dataset,ML=TRUE)

Maximum-Likelihood Estimates

Correlations/Type of Correlation:
         fb      ss        age
fb        1 Pearson Polyserial
ss    0.703       1 Polyserial
age -0.8591 -0.6685          1

Standard Errors:
        fb     ss
fb
ss  0.1215
age 0.1106 0.2497

n = 17 

P-values for Tests of Bivariate Normality:
        fb     ss
fb
ss  0.1782
age 0.4269 0.4034

Here a quick two-step method is used to calculate the polyserial 
correlation:

 polyserial(dataset$fb,dataset$age)
[1] -0.6205737
 polyserial(dataset$fb,dataset$age, ML=TRUE, std.err=TRUE) 

Same method as in hetcor, only for the indicated variables:

Polyserial Correlation, ML est. = -0.8591 (0.1106)
Test of bivariate normality: Chisquare = 4.91, df = 5, p = 0.4269

   1
Threshold 0.1811
Std.Err.  0.1849
 


### for side to side (ss): in case no 9 is an outlier in fb, this will not be the case in ss
 
 R_bis(ss,ss1,ss0,n)
   ss 
0.4153681 

result in paper was 0.43 

 R_bis2(ss,ss1,ss0,n)
   ss 
0.4281516 

equals result in paper  0.43

 polyserial(dataset$ss,dataset$age) 
[1] -0.5371397

 polyserial(dataset$ss,dataset$age, ML=TRUE, std.err=TRUE)

Polyserial Correlation, ML est. = -0.6685 (0.2497)
Test of bivariate normality: Chisquare = 5.103, df = 5, p = 0.4034

   1
Threshold 0.1504
Std.Err.  0.2583
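
A possible explanation for part of the gap, given as a hedged sketch and not as a statement about polycor internals: the polyserial (biserial) coefficient assumes a normally distributed latent variable behind the binary one, so it is a different quantity from the point-biserial coefficient. The textbook conversion is r_b = r_pb * sqrt(p*(1-p)) / dnorm(qnorm(p)). In base R, with the fb data from above:

```r
fb  <- c(19, 30, 20, 19, 29, 25, 21, 24, 50, 25, 21, 17, 15, 14, 14, 22, 17)
age <- factor(rep(c('elderly', 'young'), times = c(9, 8)))

# Factor levels are sorted alphabetically, so 'young' is the higher category
# and the correlation with fb comes out negative.
r_pb <- cor(fb, as.numeric(age))

p   <- mean(age == 'elderly')                      # share in the lower category
r_b <- r_pb * sqrt(p * (1 - p)) / dnorm(qnorm(p))  # biserial from point-biserial
round(c(point_biserial = r_pb, biserial = r_b), 4)
```

The biserial value should be close to the two-step polyserial result of -0.6205737 shown above. The ML estimate (-0.8591) comes from a different estimator of the same latent correlation, and the negative sign reflects only the ordering of the factor levels.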

Re: [R] plot for binomial glm

2007-10-29 Thread Tom Willems
Dear John,
there is probably an easier way, but I find this gives nice smooth 
plots.
Good luck with it.

### R-file

library(boot)   # logit() and inv.logit() are available in e.g. the boot package

alive <- data$num - data$numdead
numdead <- data$numdead
temp <- data$temp

data.table <- cbind(numdead, alive)
points.graph <- alive/data$num   # 'alive' is the vector computed above

glm.mort <- glm(data.table ~ temp, family=binomial)

fit <- predict(glm.mort, type='response')


a <- glm.mort$coef[1]   # writes model parameters to named variables; you
                        # can also use them directly in a function, as you like
b <- glm.mort$coef[2]

x2 <- c((logit(fit)-(a))/b)
p2 <- c((inv.logit(a+b*x2)))
y2 <- c(a+b*x2)


plot(c(30,55), c(0,1), type="n", main="survival", xlab="Log x",
     ylab="Probability")
lines(sortedXyData((logit(p2)-(a))/b, p2), type="l", lty=1,
      col="blue", ylim=c(0,1.2))
points(temp, fit, pch=4, type="p", col="black")

## This will plot a smooth curve

x <- c(x=(rep(33:55,1)))
p <- c((inv.logit(a+b*x)))
y <- c(a+b*x)

plot(c(30,55), c(0,1), type="n", main="survival", xlab="Log x",
     ylab="Probability")
lines(sortedXyData((logit(p)-(a))/b, p), type="l", lty=1,
      col="blue", ylim=c(0,1.2))
points(temp, fit, pch=4, type="p", col="black")

### END 
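
An alternative sketch for the smooth curve, assuming simulated data because the original data set is not shown: predict() with newdata evaluates the fitted model on a fine grid, which avoids the logit/inv.logit round trip. The data frame d, its values, and the axis labels are made up for illustration.

```r
set.seed(1)
# Made-up example data: 20 animals per temperature, deaths increase with temp
d <- data.frame(temp = 33:55, num = 20)
d$numdead <- rbinom(nrow(d), size = d$num, prob = plogis(-8 + 0.18 * d$temp))

glm.mort <- glm(cbind(numdead, num - numdead) ~ temp,
                family = binomial, data = d)

# Fitted probabilities on a fine grid give a smooth curve
grid <- data.frame(temp = seq(30, 55, by = 0.1))
grid$p <- predict(glm.mort, newdata = grid, type = 'response')

plot(d$temp, d$numdead / d$num, pch = 4, ylim = c(0, 1),
     xlab = 'Temperature', ylab = 'Probability of death')
lines(grid$temp, grid$p, col = 'blue')
```

Because the response is cbind(numdead, alive), the fitted probability is the probability of death; swap the two columns to model survival instead.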



Willems Tom


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Question: Rcmdr and macbooks

2007-10-23 Thread Tom Willems
Hello dear R users,

I have noticed that certain very handy functions, like plotMeans and 
ci.plot,
only run under Rcmdr. 
Rcmdr does not run on a Mac, so I hope someone out there knows about an
Rcmdr module for Macs.

Kind regards,
Tom



Willems Tom

www.var.fgov.be 




[R] Q: confidence intervals plotMeans, how do i discard NA's

2007-09-26 Thread Tom Willems
Dear R users,


I noticed a small problem when using, for example, the function  
plotMeans under Rcmdr.
This function plots a graph with the means on the Y-axis, by a 
factor on the X-axis.
All is correct when using 'sd' for the error bars.

The problem occurs when using confidence intervals for the error bars.
Here, R assumes that every factor level on the X-axis has the same number of observations,
and calculates the confidence interval as follows:

N <- length(X)
means <- mean(X, na.rm=T)
if (is.matrix(X))
    apply(X, 2, sd, na.rm = T)
else if (is.vector(X))
    qt(0.975, df=N-1)*(sd(X, na.rm = T)/sqrt(N))
else if (is.data.frame(X))
    qt(0.975, df=N-1)*((sapply(X, sd, na.rm = T))/sqrt(N))
else qt(0.975, df=N-1)*(sd(as.vector(X), na.rm = T))/sqrt(N)
yrange <- if (error.bars != "none")
    c(min(means - sds, na.rm=TRUE), max(means + sds, na.rm=TRUE))

Now this works fine when you have a well-balanced data set, yet in most 
cases you don't!
Still, R will calculate the confidence interval for each mean per factor level 
using the length of the largest group.
Example:
when you have 3 groups, A with 10 measurements, B with 5, and C 
with 8,
all confidence intervals will be calculated using a total N of 10.

I would like to correct this, so that the plots I create have correct CI 
error bars, yet I did not find a way to omit NAs in the functions  
length, NROW, or nrow.
Can anybody help me solve this problem, please?
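
One way to get a per-group N that ignores NAs is sum(!is.na(x)) inside tapply(). A minimal base-R sketch, using made-up measurements with the group sizes from the example above (A = 10, B = 5, C = 8). The helper ci_halfwidth is hypothetical and not part of Rcmdr.

```r
set.seed(42)
# Made-up data, padded with NA so that each group has 10 slots
dat <- data.frame(
  group = rep(c('A', 'B', 'C'), each = 10),
  y     = c(rnorm(10), rnorm(5), rep(NA, 5), rnorm(8), rep(NA, 2))
)

# Hypothetical helper: half-width of a 95% t confidence interval
ci_halfwidth <- function(x) {
  n <- sum(!is.na(x))                       # count only non-missing values
  qt(0.975, df = n - 1) * sd(x, na.rm = TRUE) / sqrt(n)
}

n_obs <- tapply(dat$y, dat$group, function(x) sum(!is.na(x)))
means <- tapply(dat$y, dat$group, mean, na.rm = TRUE)
half  <- tapply(dat$y, dat$group, ci_halfwidth)

cbind(n = n_obs, lower = means - half, upper = means + half)
```

Each interval now uses its own group size, so group B with 5 measurements gets a wider t quantile than group A with 10.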

Kind regards,
Tom



