[R] embedFonts() and bounding box

2007-06-28 Thread Christoph Buser
Dear R gurus

I have a question regarding the function embedFonts().
I assume that in this function, which calls gs, the bounding box
of the eps file is changed. Is that intentional? Do I have to
set some gs options explicitly to avoid it and, if yes, how?
Thank you very much for your help.

Best regards,

Christoph Buser

## R example
postscript("test.eps", width = 14, height = 8, 
 onefile = FALSE, horizontal=FALSE, paper="special")
plot(1:10)
dev.off()
embedFonts(file = "test.eps", outfile = "test1.eps")


------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Preconditions for a variance analysis

2007-06-20 Thread Christoph Buser
Dear David

I would not recommend the tests (for normality, equal variances)
as they are described at your second link (Wikipedia).

I would use graphical tools such as the Tukey-Anscombe plot
(residuals against fitted values), the quantile (or normal) plot,
and the leverage plot.

See also ?plot.lm for short descriptions of these plots in R.
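
A minimal sketch (using a built-in dataset as a stand-in for your
model):

fit <- lm(Volume ~ Girth, data = trees)  # stand-in for your own fit
par(mfrow = c(2, 2))
plot(fit)  # Tukey-Anscombe (residuals vs fitted), normal QQ plot, ...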

Best regards,

Christoph

Daniel Tahin writes:
 > Thanx for your answer. I don't have the book, but found something on 
 > the web:
 > http://www.basic.northwestern.edu/statguidefiles/oneway_anova.html   
 > and
 > http://en.wikipedia.org/wiki/Analysis_of_variance#Assumptions
 > 
 > Seems to be the same on both sites :-)
 > Is this what was meant?
 > 
 > Thanx again,
 > Daniel
 > 
 > 
 > 
 > 
 > > Dear David
 > > 
 > > Yes. There are assumptions that should be verified in an
 > > analysis of variance. Without checking them, the results are not
 > > reliable. 
 > > I'd recommend e.g.
 > > 
 > > Robert O. Kuehl, Design of Experiments: Statistical Principles
 > >   of Research Design and Analysis, Duxbury Press, 2000
 > > 
 > > You will find a chapter about assumptions and how to check them
 > > by residual analysis.
 > > 
 > > And also
 > > 
 > > W. N. Venables and B. D. Ripley, Modern Applied Statistics 
 > >   with S, Springer-Verlag, New York, 2002
 > > 
 > > in which you find residual analysis and how to obtain it in R.
 > > 
 > > Best regards,
 > > 
 > > Christoph
 > > 
 > > ------
 > > 
 > > Credit and Surety PML study: visit our web page www.cs-pml.org
 > > 
 > > --
 > > Christoph Buser <[EMAIL PROTECTED]>
 > > Seminar fuer Statistik, LEO C13
 > > ETH Zurich 8092 Zurich  SWITZERLAND
 > > phone: x-41-44-632-4673fax: 632-1228
 > > http://stat.ethz.ch/~buser/
 > > --
 > > 
 > > 
 > > Daniel Tahin writes:
 > >  > Hello everybody,
 > >  > 
 > >  > I'm currently using the anova() test for a small data.frame of 40 
 > >  > rows and 2 columns. It works well, but are there any preconditions 
 > >  > for a valid analysis of variance that I should consider?
 > >  > 
 > >  > Thank you for your answer,
 > >  > Daniel
 > >  > 
 > >  > __
 > >  > R-help@stat.math.ethz.ch mailing list
 > >  > https://stat.ethz.ch/mailman/listinfo/r-help
 > >  > PLEASE do read the posting guide 
 > > http://www.R-project.org/posting-guide.html
 > >  > and provide commented, minimal, self-contained, reproducible code.
 >

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Preconditions for a variance analysis

2007-06-20 Thread Christoph Buser
Dear David

Yes. There are assumptions that should be verified in an
analysis of variance. Without checking them, the results are not
reliable. 
I'd recommend e.g.

Robert O. Kuehl, Design of Experiments: Statistical Principles
  of Research Design and Analysis, Duxbury Press, 2000

You will find a chapter about assumptions and how to check them
by residual analysis.

And also

W. N. Venables and B. D. Ripley, Modern Applied Statistics 
  with S, Springer-Verlag, New York, 2002

in which you find residual analysis and how to obtain it in R.

Best regards,

Christoph

--

Credit and Surety PML study: visit our web page www.cs-pml.org

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Daniel Tahin writes:
 > Hello everybody,
 > 
 > I'm currently using the anova() test for a small data.frame of 40 
 > rows and 2 columns. It works well, but are there any preconditions for 
 > a valid analysis of variance that I should consider?
 > 
 > Thank you for your answer,
 > Daniel
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Lines connecting the boxes in a boxplot

2007-06-18 Thread Christoph Buser
Dear Arne

I'd recommend saving the information of your boxplots:

a <- boxplot(...)
str(a)

Then you have the information that you need about your boxplot
(e.g. the value of the median) and can use segments() to draw
the lines you want.
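
For instance, a minimal sketch with made-up data (three groups at the
default positions 1:3, medians joined in red):

set.seed(1)
x <- data.frame(val = rnorm(60), grp = factor(rep(1:3, each = 20)))
a <- boxplot(val ~ grp, data = x)
med <- a$stats[3, ]   # row 3 of a$stats holds the medians
segments(1:2, med[1:2], 2:3, med[2:3], col = "red", lwd = 2)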

Hope this helps

Best regards,

Christoph

--

Credit and Surety PML study: visit our web page www.cs-pml.org

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Arne Brutschy writes:
 > Hello,
 > 
 > I'm currently using a boxplot to visualize data for three different
 > models. As I have three models, I'm plotting three parallel boxplots
 > for each factor.
 > 
 > This works fine - what I need now is a line connecting the medians of
 > each boxplot of each model. I want to do this in order to visualize
 > the trend that one of the models exhibit. Basically, I want to plot a
 > curve for each model (slightly offset on the x axis), with a boxplot
 > on each datapoint.
 > 
 > It's only an idea, and I don't know if it's not too confusing after
 > adding the lines... Is it possible? Has anyone done this before?
 > 
 > Sorry if this has been asked before or is a standard feature, I simply
 > have now clue how to name the feature I want. Ergo: I cannot search
 > for it.. :\
 > 
 > Regards,
 > Arne
 > 
 > PS: this is my current code
 > 
 > require(gplots)
 > boxwex=0.15
 > 
 > data <- read.table("all_runs_fitness.data");
 > colnames(data)=c("model","matrix","fitness")
 > 
 > boxplot(fitness ~ matrix,
 > data=data, boxwex=boxwex, at=(1:7 - 0.2),
 > main="Fitness for Matrix/Models", xlab="Matrixtype",
 > ylab="Fitness", ylim=c(20,100), 
 > subset=(model=="dyn"), col="lightblue", xaxt="n", whisklty=1)
 > boxplot(fitness ~ matrix,
 > data=data, boxwex=boxwex, at = 1:7, add=TRUE, 
 > subset=(model=="dl3"), col="mistyrose", xaxt="n", whisklty=1)
 > boxplot(fitness ~ matrix,
 > data=data, boxwex=boxwex, at=(1:7 + 0.2), add=TRUE, 
 > subset=(model=="dl4"), col="lightcyan", xaxt="n", whisklty=1)
 > 
 > axis(1, 1:8-0.5, labels=FALSE)
 > axis(1, 1:7, tck=FALSE, labels=levels(data[,2]))
 > 
 > smartlegend(x="left", y="bottom", inset = 0.01,
 > c("dyn","dl3","dl4"), fill = c("lightblue", "mistyrose", 
 > "lightcyan"))
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data transformation for chi-square test.

2007-06-12 Thread Christoph Buser
Dear Charlie

dat <- data.frame(id = 1:10, member = c(4,2,3,5,6,2,4,6,3,4),
  cost = c(320,150,420,330,540,310,169,647,347,567))  

dat[,"costF"] <- cut(dat[,"cost"], breaks = seq(100, 700, by=100))
table(dat[,"costF"], dat[,"member"])

This should create the table you like.
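
If the goal is the chi-square test itself, it can then be applied
directly to that table; a sketch (the empty (200,300] class is dropped
first, since an all-zero row breaks the test statistic, and with counts
this small chisq.test() will warn that its approximation is
questionable):

## drop unused cost classes, then test for independence
tab <- table(factor(dat[,"costF"]), dat[,"member"])
chisq.test(tab)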

Best regards,

Christoph

--

Credit and Surety PML study: visit our web page www.cs-pml.org

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Charlie Chi writes:
 > Dear all R users
 > :
 > I am an IT student with little statistical background and a new R user 
 > with only two months' experience. I have a dataset named medcost, 
 > imported with read.table(), as in the following example (the real 
 > dataset has 500 cases). The header id means case id, member means 
 > members in a family, and cost is what the family pays for medical cost 
 > every 6 months.
 > 
 > id  member  cost
 > 1   4       320
 > 2   2       150
 > 3   3       420
 > 4   5       330
 > 5   6       540
 > 6   2       310
 > 7   4       169
 > 8   6       647
 > 9   3       347
 > 10  4       567
 > 
 > I would like to use this dataset with a chi-square analysis to see if 
 > there is any relationship between family members and medical cost (do 
 > more members in a family raise the medical cost?). I have found the 
 > package called stats, but I think I need to transform the dataset into 
 > a contingency table, as I read in books. I am not sure if I am correct; 
 > I think the table should look like:
 >            member
 > cost       [2] [3] [4] [5] [6] Total
 > [0,100]     1   0   0   0   0     1
 > [100,200]   0   0   1   0   0     1
 > [200,300]   0   0   0   0   0     0
 > [300,400]   1   1   1   1   0     4
 > [400,500]   0   1   0   0   0     1
 > [500,600]   0   0   1   0   1     2
 > [600,700]   0   0   0   0   1     1
 > Total       2   2   3   1   2    10
 > 
 > I did try to use the method in chapter 5.0 of "R Introduction" to create 
 > a frequency table, but it did not work. I am wondering if anyone can 
 > help me with it? Thank you for your help.
 > 
 > Regards
 > 
 > Charlie
 > ..
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] prop.test or chisq.test ..?

2007-02-28 Thread Christoph Buser
Hi

Some comments are inside.

Dylan Beaudette writes:
 > Hi everyone,
 > 
 > Suppose I have a count the occurrences of positive results, and the total 
 > number of occurrences:
 > 
 > 
 > pos <- 14
 > total <- 15
 > 
 > testing that the proportion of positive occurrences is greater than 0.5 
 > gives 
 > a p-value and confidence interval:
 > 
 > prop.test( pos, total, p=0.5, alternative='greater')
 > 
 > 1-sample proportions test with continuity correction
 > 
 > data:  14 out of 15, null probability 0.5 
 > X-squared = 9.6, df = 1, p-value = 0.0009729
 > alternative hypothesis: true p is greater than 0.5 
 > 95 percent confidence interval:
 >  0.706632 1.00 
 > sample estimates:
 > p 
 > 0.933
 > 

First of all by default there is a continuity correction in
prop.test(). If you use 

prop.test(pos, total, p=0.5, alternative="greater", correct = FALSE)

1-sample proportions test without continuity correction

data:  pos out of total, null probability 0.5 
X-squared = 11.2667, df = 1, p-value = 0.0003946
alternative hypothesis: true p is greater than 0.5 
95 percent confidence interval:
 0.7492494 1.000 
sample estimates:
p 
0.933 

Remark that now the X-squared is identical to your result from
chisq.test(), but the p-value is 0.0007891/2.

The reason is that you are testing here against the
alternative "greater".

If you use a two-sided test

prop.test(pos, total, p=0.5, alternative="two.sided", correct = FALSE)

then you reproduce the results from chisq.test().


 > 
 > My question is how does the use of chisq.test() differ from the above 
 > operation. For example:
 > 
 > chisq.test(table( c(rep('pos', 14), rep('neg', 1)) ))
 > 
 > Chi-squared test for given probabilities
 > 
 > data:  table(c(rep("pos", 14), rep("neg", 1))) 
 > X-squared = 11.2667, df = 1, p-value = 0.0007891
 > 
 > ... gives slightly different results. Am I correct in interpreting that the 
 > chisq.test() function in this case is giving me a p-value associated with 
 > the 
 > test that the probabilities of pos are *different* than the probabilities of 
 > neg -- and thus a larger p-value than the prop.test(... , p=0.5, 
 > alternative='greater') ? 
 > 

Yes. In your example chisq.test() tests the null hypothesis that
all population probabilities are equal.

?chisq.test says:
"In this case, the hypothesis tested is whether the population
probabilities equal those in 'p', or are all equal if 'p' is not
given." 

which means p1 = p2 = 0.5 in your two population case against
the alternative p1 != p2.

This is similar to the test in prop.test() p=0.5 against p!=0.5 
and therefore you get identical results if you choose
alternative="two.sided" in prop.test().


 > I realize that this is a rather elementary question, and references to a 
 > text 
 > would be just as helpful. Ideally, I would like a measure of how much I 
 > can 'trust' that a larger proportion is also statistically meaningful. Thus 
 > far the results from prop.test() match my intuition, but
 > affirmation would be 

Your intuition was correct. Nevertheless, in your original
results (with the continuity correction), the p-value of
prop.test() (0.0009729) was larger than the p-value of
chisq.test() (0.0007891) and therefore ran against your intuition.

 > great.
 > 
 > Cheers,
 > 
 > 
 > -- 
 > Dylan Beaudette
 > Soils and Biogeochemistry Graduate Group
 > University of California at Davis
 > 530.754.7341
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.


Hope this helps

Christoph Buser

--

Credit and Surety PML study: visit our web page www.cs-pml.org

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] the value of Delta

2007-01-23 Thread Christoph Buser

Or better

slot(sum.sam.out, "row.sig.genes")



jing writes:
 > 
 > 
 > 
 > 
 > Dear all,
 > 
 > I am running R 2.4.1.
 > 
 > > library(siggenes);
 > > library(multtest);
 > > cl<-rep(c(0,1),c(3,3));
 > > sub<-exprs(AffyExpData[,c(1:3,7:9)]);
 > > gn<-geneNames(AffyRAwData);
 > > sam.out<-sam(sub,cl,rand=123,gene.names=gn);
 > 
 > We're doing 20 complete permutations
 > 
 > > sam.out
 > SAM Analysis for the Two-Class Unpaired Case Assuming Unequal Variances 
 >  
 >    Delta    p0  False Called   FDR
 > 1    0.1 0.929 292.25    293 0.927
 > 2    0.4 0.929  43.60     56 0.724
 > 3    0.7 0.929  12.25     20 0.569
 > 4    1.0 0.929   7.25     14 0.481
 > 5    1.3 0.929   2.60      7 0.345
 > 6    1.7 0.929   1.30      5 0.242
 > 7    2.0 0.929   1.30      5 0.242
 > 8    2.3 0.929   0.45      2 0.209
 > 9    2.6 0.929   0.45      2 0.209
 > 10   2.9 0.929   0.45      2 0.209
 > 
 > > sum.sam.out<-summary(sam.out,1,ll=FALSE);
 > > sum.sam.out$row.sig.genes;
 > NULL
 > > sum.sam.out<-summary(sam.out,0.1,ll=FALSE);
 > > sum.sam.out$row.sig.genes;
 > NULL
 > > sum.sam.out<-summary(sam.out,0.001,ll=FALSE);
 > > sum.sam.out$row.sig.genes;
 > NULL
 > > sum.sam.out<-summary(sam.out,1000,ll=FALSE);
 > > sum.sam.out$row.sig.genes;
 > NULL
 >
 > 
 > Whatever value of Delta I chose (1000, 1, 0.1, 0.001), the outcome is NULL.
 > 
 > Any help or suggestions that you can provide will be greatly appreciated.
 > 
 > Qing
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] the value of Delta

2007-01-23 Thread Christoph Buser
I have not installed the package "siggenes", but from the manual
on

http://www.ugrad.stat.ubc.ca/R/library/siggenes/

I assume it is programmed as an S4 class since they use

sum.sam.out@row.sig.genes

instead of "$".

Hope this helps

Christoph

--

Credit and Surety PML study: visit our web page www.cs-pml.org

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


jing writes:
 > 
 > 
 > 
 > 
 > Dear all,
 > 
 > I am running R 2.4.1.
 > 
 > > library(siggenes);
 > > library(multtest);
 > > cl<-rep(c(0,1),c(3,3));
 > > sub<-exprs(AffyExpData[,c(1:3,7:9)]);
 > > gn<-geneNames(AffyRAwData);
 > > sam.out<-sam(sub,cl,rand=123,gene.names=gn);
 > 
 > We're doing 20 complete permutations
 > 
 > > sam.out
 > SAM Analysis for the Two-Class Unpaired Case Assuming Unequal Variances 
 >  
 >    Delta    p0  False Called   FDR
 > 1    0.1 0.929 292.25    293 0.927
 > 2    0.4 0.929  43.60     56 0.724
 > 3    0.7 0.929  12.25     20 0.569
 > 4    1.0 0.929   7.25     14 0.481
 > 5    1.3 0.929   2.60      7 0.345
 > 6    1.7 0.929   1.30      5 0.242
 > 7    2.0 0.929   1.30      5 0.242
 > 8    2.3 0.929   0.45      2 0.209
 > 9    2.6 0.929   0.45      2 0.209
 > 10   2.9 0.929   0.45      2 0.209
 > 
 > > sum.sam.out<-summary(sam.out,1,ll=FALSE);
 > > sum.sam.out$row.sig.genes;
 > NULL
 > > sum.sam.out<-summary(sam.out,0.1,ll=FALSE);
 > > sum.sam.out$row.sig.genes;
 > NULL
 > > sum.sam.out<-summary(sam.out,0.001,ll=FALSE);
 > > sum.sam.out$row.sig.genes;
 > NULL
 > > sum.sam.out<-summary(sam.out,1000,ll=FALSE);
 > > sum.sam.out$row.sig.genes;
 > NULL
 >
 > 
 > Whatever value of Delta I chose (1000, 1, 0.1, 0.001), the outcome is NULL.
 > 
 > Any help or suggestions that you can provide will be greatly appreciated.
 > 
 > Qing
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] aov y lme

2007-01-22 Thread Christoph Buser
Dear Tomas

You can produce the results in Montgomery with lme. Please
remark that you should indicate the nesting with the levels of
your nested factor. Therefore I recreated your data, but used
1,...,12 for the levels of batch instead of 1,...,4.

purity<-c(1,-2,-2,1,-1,-3,0,4, 0,-4, 1,0, 1,0,-1,0,-2,4,0,3,
  -3,2,-2,2,2,-2,1,3,4,0,-1,2,0,2,2,1)
suppli<-factor(c(rep(1,12),rep(2,12),rep(3,12)))
batch<-factor(c(rep(1:4,3), rep(5:8,3), rep(9:12,3)))
material<-data.frame(purity,suppli,batch)

As you remarked you can use aov

summary(material.aov<-aov(purity~suppli+suppli:batch,data=material))
             Df Sum Sq Mean Sq F value  Pr(>F)  
suppli        2 15.056   7.528  2.8526 0.07736 .
suppli:batch  9 69.917   7.769  2.9439 0.01667 *
Residuals    24 63.333   2.639
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Remark that the test of "suppli" is not the same as in
Montgomery. Here it is wrongly tested against the Residuals;
you should calculate the test against the batch mean square:

(Fsuppi <- summary(material.aov)[[1]][1,"Mean Sq"]/
  summary(material.aov)[[1]][2,"Mean Sq"])
## p-value: upper tail of the F distribution
pf(Fsuppi, df1 = 2, df2 = 9, lower.tail = FALSE)

To use lme the correct level numbering is now important to
indicate the nesting. You should specify your random component
as 

random=~1|batch

If you use "random=~1|suppli/batch" instead, random components
for batch AND suppli are included in the model and you specify a 
model that incorporates suppli as random and fixed. Therefore
the correct syntax is

library(nlme)
material.lme<-lme(purity~suppli,random=~1|batch,data=material)
## You obtain the F-test for suppli using "anova"
anova(material.lme)
summary(material.lme)
## Remark that in the summary output the random effects are
## standard deviations, not variance components, and you
## should square them to compare them with Montgomery:
## 1.307622^2 = 1.71, 1.624466^2 = 2.64

## Or you can use 
VarCorr(material.lme)

I hope this helps you.

Best regards,

Christoph Buser

--

Credit and Surety PML study: visit our web page www.cs-pml.org

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Tomas Goicoa writes:
 > 
 > 
 > 
 > Dear R user,
 > 
 > I am trying to reproduce the results in Montgomery D.C (2001, chap 13, 
 > example 13-1).
 > 
 > Briefly, there are three suppliers, four batches nested within suppliers 
 > and three determinations of purity (response variable) on each batch. It is 
 > a two stage nested design, where suppliers are fixed and batches are random.
 > 
 > y_ijk=mu+tau_i+beta_j(nested in tau_i)+epsilon_ijk
 > 
 > Here are the data,
 > 
 > purity<-c(1,-2,-2,1,
 >   -1,-3, 0,4,
 >0,-4, 1, 0,
 >1,0,-1,0,
 >-2,4,0,3,
 >-3,2,-2,2,
 >2,-2,1,3,
 >4,0,-1,2,
 >0,2,2,1)
 > 
 > suppli<-factor(c(rep(1,12),rep(2,12),rep(3,12)))
 > batch<-factor(rep(c(1,2,3,4),9))
 > 
 > material<-data.frame(purity,suppli,batch)
 > 
 > If I use the function aov, I get
 > 
 > material.aov<-aov(purity~suppli+suppli:batch,data=material)
 > summary(material.aov)
 >              Df Sum Sq Mean Sq F value  Pr(>F)
 > suppli        2 15.056   7.528  2.8526 0.07736 .
 > suppli:batch  9 69.917   7.769  2.9439 0.01667 *
 > Residuals    24 63.333   2.639
 > ---
 > Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
 > 
 > and I can estimate the variance component for the batches as
 > 
 > (7.769- 2.639)/3=1.71
 > 
 > which is the way it is done in Montgomery, D.
 > 
 > I want to use the function lme because I would like to make a diagnosis of 
 > the model, and I think it is more appropriate.
 > 
 > Looking at Pinheiro and Bates, I have tried the following,
 > 
 > library(nlme)
 > material.lme<-lme(purity~suppli,random=~1|suppli/batch,data=material)
 > VarCorr(material.lme)
 > 
 >             Variance StdDev
 > suppli =    pdLogChol(1)
 > (Intercept) 1.563785 1.250514
 > batch =     pdLogChol(1)
 > (Intercept) 1.709877 1.307622
 > Residual    2.638889 1.624466
 > 
 > material.lme
 > 
 > Linear mixed-effects model fit by REML
 >Data: material
 >Log-restricted-likelihood: -71.42198
 >Fixed: purity ~ suppli
 > (Interce

Re: [R] mathematical symbols in plots

2007-01-03 Thread Christoph Buser
Dear Sebastian

plot(1:10, 1:10)
text(4, 9, expression(paste("<", k, ">")))

should work here.

Best regards,

Christoph

--

Credit and Surety PML study: visit our web page www.cs-pml.org

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--



Sebastian Weber writes:
 > Hello everyone!
 > 
 > I'm trying to plot some mathematical expression along my axis, but
 > demo(plotmath) did not have the symbol I was looking for. In particular,
 > I would like to denote the mean of an observable by writing
 > 
 > <k>
 > 
 > which I tried to enter with
 > 
 > expression(group("<", k, ">"))
 > 
 > However, my naive try doesn't work and the help doesn't want to tell me,
 > does someone know?
 > 
 > And here another one: How can I specify which fonts R uses when it
 > prints those mathematical symbols? Since I finally include my plots in
 > latex-documents as eps, I would love to use the same font-encoding for
 > all postscript stuff. A problem in the past has been that R embedded
 > its own font within the generated ps-files. These were not compatible
 > with the fonts used at the magazine where I published my document. This
 > led to quite some confusion as \gamma became g and so on. Any solution
 > to this problem? Any hint? As I'm not too much into font-encoding, I
 > actually have no real clue where to even start searching.
 > 
 > Thank you very much for any help.
 > 
 > Greetings,
 > 
 > Sebastian Weber
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to compute p-value?

2006-11-15 Thread Christoph Buser
Dear Michael

It calls the function pt() to calculate the p-value based on
your t statistic and the degrees of freedom.
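
For a two-sided test the calculation amounts to the following sketch
(the numbers are made up; in practice the estimate, standard error,
and degrees of freedom come from your own fit):

beta_hat <- 1.8; std_beta_hat <- 0.6; df <- 48   # made-up values
tstat <- abs(beta_hat / std_beta_hat)            # as in your Matlab code
2 * pt(tstat, df, lower.tail = FALSE)            # two-sided p-value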

If you are interested how it is calculated in details, you can
have a look into the source code:

 /R-2.4.0/src/nmath/pt.c

There you can find the C-code.

Hope this helps

Christoph Buser

--

Credit and Surety PML study: visit our web page www.cs-pml.org

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Michael writes:
 > Hi all,
 > 
 > I just want to understand how R computes p-value in a simple linear
 > regression model? The reason is that in Matlab in the
 > 
 > function which "evaluate standard errors for multivariate normal
 > regression", it just provide estimates and standard errors, without giving
 > out p-value,
 > 
 > It computes t-statistics as follows:
 > 
 > abs(beta_hat/std_beta_hat)
 > 
 > how to go further to get p-value?
 > 
 > Thanks
 > 
 >  [[alternative HTML version deleted]]
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question on multilevel modeling

2006-11-07 Thread Christoph Buser
Dear Christine

I think the problem in your second model is that you are
including "CORUMTO" both as a fixed effect and as a random
effect. 

Regards,

Christoph 

--

Credit and Surety PML study: visit our web page www.cs-pml.org

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Christine A. Calmes writes:
 > Hi,
 >  
 > I am trying to run a multilevel model with time nested in people and
 > people nested in dyads (3 levels of nesting) by initially running a
 > series of models to test whether the slope/intercept should be fixed or
 > random.  The problem that I am experiencing appears to arise between the
 > random intercept, fixed slope equation AND.
 >  
 > (syntax:
 > rint<-lme(BDIAFTER~BDI+WEEK+CORUMTO, random=~1|DYADID/PARTICIP,
 > data=new) 
 > summary(rint))
 >  
 > the random slope, random intercept model 
 >  
 > (syntax: 
 > rslint<-lme(BDIAFTER~BDI+WEEK+CORUMTO, random=~CORUMTO|DYADID/PARTICIP,
 > data=new) 
 > summary(rslint))
 >  
 > at which point I obtain the exact same results for each model suggesting
 > that one of the model is not properly specifying the slope or intercept.
 >  
 > Or, I receive the following error message when I try to run the random
 > slope/random intercept model.
 >  
 > Error in solve.default(pdMatrix(a, fact = TRUE)) : 
 > system is computationally singular: reciprocal condition number
 > = 6.77073e-017
 >  
 > (whether I receive an error message or the same results depends on the
 > specific variables in the model).
 >  
 > It has been suggested that I may need to change the default starting
 > values in the model because I may be approaching a boundary-is this a
 > plausible explanation for my difficulties?  If so, how do I do this in R
 > and can you refer me to a source that might highlight what would be
 > reasonable starting values?
 > If this does not seem like the problem, any idea what the problem may be
 > and how I might fix it?
 >  
 > Thank you so much for your assistance,
 > Christine Calmes
 >  
 >  
 > Christine A. Calmes, MA
 > Dept of Psychology
 > University at Buffalo: The State University of New York
 > Park Hall 216
 > Buffalo, NY 14260
 > [EMAIL PROTECTED]
 > (716) 645-3650 x578
 >  
 > 
 >  [[alternative HTML version deleted]]
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] comparing 2 dataframes

2006-11-06 Thread Christoph Buser
Hi

Maybe this example can help you to find your solution:

dat1 <- data.frame(CUSTOMER_ID = c("1000786BR", "1002047BR", "10127BR",
 "1004166834BR"," 1004310897BR", "1006180BR",
 "10064798BR", "1007311BR", "1007621BR",
 "1008195BR", "10126BR", "95323994BR"),
   CUSTOMER_RR = c("5+", "4", "5+", "2", "X", "4", "4", "5+",
 "4", "4-", "5+", "4"))

dat2 <- data.frame(CUSTOMER_ID = c("1200786BR", "1802047BR", "1027BR",
 "10166834BR", "107BR", "100BR", "164798BR", "1008195BR",
 "10126BR"),
   CUSTOMER_RR = c("6+", "4", "1+", "2", "X", "4", "4", "4",
 "5+"))

## Merge, but only by "CUSTOMER_ID"
datM <- merge(dat1, dat2, by = "CUSTOMER_ID")
datM
## Select only cases where "CUSTOMER_RR" agrees in both data.frames
datM1 <- datM[as.character(datM[, "CUSTOMER_RR.x"]) ==
              as.character(datM[,"CUSTOMER_RR.y"]), ]
datM1

Regards,

Christoph

--

Credit and Surety PML study: visit our web page www.cs-pml.org

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--



Priya Kanhai writes:
 > Hi,
 > 
 > I've a question about comparing 2 dataframes, RRC_db1 and RRC_db2, of
 > different lengths.
 > 
 > For example:
 > 
 > RRC_db1:
 > 
 >    CUSTOMER_ID  CUSTOMER_RR
 > 1  1000786BR    5+
 > 2  1002047BR    4
 > 3  10127BR      5+
 > 4  1004166834BR 2
 > 5  1004310897BR X
 > 6  1006180BR    4
 > 7  10064798BR   4
 > 8  1007311BR    5+
 > 9  1007621BR    4
 > 10 1008195BR    4-
 > 11 10126BR      5+
 > 12 95323994BR   4
 > 
 >  RRC_db2:
 > 
 >   CUSTOMER_ID CUSTOMER_RR
 > 1 1200786BR   6+
 > 2 1802047BR   4
 > 3 1027BR      1+
 > 4 10166834BR  2
 > 5 107BR       X
 > 6 100BR       4
 > 7 164798BR    4
 > 8 1008195BR   4-
 > 9 10126BR     5+
 > 
 > 
 > I want to pick the CUSTOMER_ID of RRC_db1 which also exist in RRC_db2:
 > third <- merge(RRC_db1, RRC_db2) or  third <-subset(RRC_db1, CUSTOMER_ID%in%
 > RRC_db2$CUSTOMER_ID)
 > 
 > But I also want to check if the CUSTOMER_RR is correct. I had tried this:
 > 
 > > test <- function(RRC_db1,RRC_db2)
 > + {
 > + noteq <- c()
 > + for( i in 1:length(RRC_db1$CUSTOMER_ID)){
 > + for( j in 1:length(RRC_db2$CUSTOMER_ID)){
 > + if(RRC_db1$CUSTOMER_ID[i] == RRC_db2$CUSTOMER_ID[j]){
 > + if(RRC_db1$CUSTOMER_RR[i] != RRC_db2$CUSTOMER_RR[j]){
 > + noteq <- c(noteq,RRC_db1$CUSTOMER_ID[i]);
 > + }
 > + }
 > + }
 > + }
 > + noteq;
 > + }
 > >
 > > test(RRC_db1, RRC_db2)
 > Error in Ops.factor(RRC_db1$CUSTOMER_ID[i], RRC_db2$CUSTOMER_ID[j]) :
 > level sets of factors are different
 > 
 > 
 > But then I got this error.
 > 
 > I don't only want the CUSTOMER_ID to be the same but also the CUSTOMER_RR.
 > 
 > Can you please help me?
 > 
 > Thanks in advance.
 > 
 > Regards,
 > 
 > Priya
 > 
 >  [[alternative HTML version deleted]]
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple question on a function

2006-10-02 Thread Christoph Buser
Dear Serguei

There might be more efficient ways, but this should work:   

## Define function that you want to optimize. In your case I
## copied your code, but included freq as a second argument:
fun <- function(x, freq)
{
  (freq[1]-(1-x[1])*(1-x[2])-x[3]*sqrt(x[1]*(1-x[1])*x[2]*(1-x[2])))^2/freq[1]+
(freq[2]-(1-x[1])*x[2]+x[3]*sqrt(x[1]*(1-x[1])*x[2]*(1-x[2])))^2/freq[2]+
  (freq[3]-x[1]*(1-x[2])+x[3]*sqrt(x[1]*(1-x[1])*x[2]*(1-x[2])))^2/freq[3]+
(freq[4]-x[1]*x[2]-x[3]*sqrt(x[1]*(1-x[1])*x[2]*(1-x[2])))^2/freq[4]
}

## Define mat with values for freq (your code)
mat<-matrix(c(0.4,0.1,0.1,0.4), byrow=TRUE, nrow=10, ncol=4)

## Use apply on mat
apply(mat, 1, function(freq, start) optim(start, fun,
method="BFGS", freq = freq)$par, start = c(0.1,0.1,0.1))

You still can use t() to transpose the matrix if you want the
solutions by row instead of columns.


Please remark that in general optim returns a list including
several components, e.g. convergence, which indicates whether
optim has converged.
Since you wanted a matrix I only returned optim(...)$par. This
might be dangerous since the additional information gets
lost. Maybe it is better to save the output in a list. You can
try:  

apply(mat, 1, function(freq, start) optim(start, fun,
method="BFGS", freq = freq), start = c(0.1,0.1,0.1))

to see the difference.

Hope this helps

Christoph

--

Credit and Surety PML study: visit our web page www.cs-pml.org

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Serguei Kaniovski writes:
 > I would like to apply the following function to the rows of the matrix 
 > "mat", so that freq[1],...,freq[4] are the four elements of each row.
 > 
 > min_chi2<-function(freq){
 > 
 > obj<-function(x){
 > (freq[1]-(1-x[1])*(1-x[2])-x[3]*sqrt(x[1]*(1-x[1])*x[2]*(1-x[2])))^2/freq[1]+
 > (freq[2]-(1-x[1])*x[2]+x[3]*sqrt(x[1]*(1-x[1])*x[2]*(1-x[2])))^2/freq[2]+
 > (freq[3]-x[1]*(1-x[2])+x[3]*sqrt(x[1]*(1-x[1])*x[2]*(1-x[2])))^2/freq[3]+
 > (freq[4]-x[1]*x[2]-x[3]*sqrt(x[1]*(1-x[1])*x[2]*(1-x[2])))^2/freq[4]
 > }
 > 
 > optim(c(0.1,0.1,0.1),obj,NULL,method="BFGS")$par
 > }
 > 
 > mat<-matrix(c(0.4,0.1,0.1,0.4), byrow=TRUE, nrow=10, ncol=4)
 > 
 > Questions:
 > 1. How to do this using the "apply" function?
 > 2. Can "opmit" be used directly, i.e. without needing to define the 
 > function "min_chi2"?
 > 3. How to pass the vector of initial conditions (c(0.1,0.1,0.1)) as an 
 > argument to "apply"?
 > 
 > The output should be a 10x3 matrix containing 0.5 0.5 0.6 in each row.
 > 
 > Thanks a lot,
 > Serguei
 > -- 
 > ___
 > 
 > Austrian Institute of Economic Research (WIFO)
 > 
 > Name: Serguei Kaniovski  P.O.Box 91
 > Tel.: +43-1-7982601-231  Arsenal Objekt 20
 > Fax:  +43-1-7989386  1103 Vienna, Austria
 > Mail: [EMAIL PROTECTED]
 > 
 > http://www.wifo.ac.at/Serguei.Kaniovski
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] contrasts in aov

2006-09-25 Thread Christoph Buser
Dear John

?ordered 

will help you.
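
For instance (a sketch with made-up levels):

f <- factor(c("low", "mid", "high"), levels = c("low", "mid", "high"))
of <- as.ordered(f)   # or factor(..., ordered = TRUE)
is.ordered(of)        # TRUE, so contr.poly is used in aov()/lm()
options("contrasts")  # shows the unordered/ordered contrast defaults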

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


John Vokey writes:
 > useRs,
 >A no doubt simple question, but I am baffled.  Indeed, I think I  
 > once knew the answer, but can't recover it.  The default contrasts  
 > for aov (and lm, and...) are contr.treatment and contr.poly for  
 > unordered and ordered factors, respectively.  But, how does one  
 > invoke the latter?  That is, in a data.frame, how does one indicate  
 > that a factor is an *ordered* factor such that contr.poly is invoked  
 > in the aov or lm call?
 > --
 > Please avoid sending me Word or PowerPoint attachments.
 > See <http://www.gnu.org/philosophy/no-word-attachments.html>
 > 
 > -Dr. John R. Vokey
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merge problem

2006-09-22 Thread Christoph Buser
Dear Tova

There is no reason why the merged data.frame should have
exactly 8000 or fewer rows.

The "all = FALSE" option only says that no new rows are
generated for cases that appear in only one of the two
data.frames.

Have a look at this sample

> dat1 <- data.frame(a = c(1,2,3,4), b = letters[1:4])
> dat2 <- data.frame(a = c(1,2,3,4,5,6,7,8,1), b = LETTERS[1:9])

> merge(dat1,dat2, by = "a", all = FALSE)
  a b.x b.y
1 1   a   A
2 1   a   I
3 2   b   B
4 3   c   C
5 4   d   D


Since "1" appears twice in the large data.frame it is repeated
as the help page ?merge says:

"If there is more than one match, all possible  matches
contribute one row each."

To compare have a look what the option "all = TRUE" changes

> merge(dat1,dat2, by = "a", all = TRUE)

Probably some rows in your large data frame have identical
target ids and get repeated. It should be easy to check with
unique().
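
For instance, a one-line check (assuming the object and column names
from your code):

sum(duplicated(TransTable$Target))  # > 0 means some target ids repeat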

Hope this helps

Christoph

--

Credit and Surety PML study: visit our web page www.cs-pml.org

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Tova Fuller writes:
 > Hello all,
 > 
 > I have read as many merge issues as I possibly could tonight and  
 > although I presume this is a small error, I have not found the  
 > solution to my problem.
 > 
 > I'm trying to merge two data sets: dat0 and TransTable.  As you can  
 > see below, dat0 has 8000 rows, whereas TransTable has 47296 rows.  I  
 > would expect when I merge the two data sets, with all.x=F, and  
 > all.y=F, that the intersection would yield 8000 rows, considering  
 > dat0 is a subset of TransTable.
 > 
 > However, I get a neat little surprise when I check the dimensions of  
 > the resultant data frame - dat0merge, the merged data frame has 8007  
 > rows!  How can this be?  Where did these extra 7 rows come from?   
 > This appears to defy logic!
 > 
 > Thank you in advance for your help.  I've put my code below for  
 > reference.
 > 
 > Tova Fuller
 > 
 >  > dim(dat0)
 > [1] 8000   60
 >  > dim(TransTable)
 > [1] 47296 9
 >  > dat0merge=merge(TransTable,dat0,  
 > by.x="Target",by.y="TargetID",all.x=F,all.y=F)
 >  > dim(dat0merge)
 > [1] 8007   68
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lmer applied to a wellknown (?) example

2006-08-31 Thread Christoph Buser
Dear Henrik

There is an article in the R-News "Fitting linear mixed models
in R" in which you can find some examples for the syntax of
nested and non-nested design.

http://cran.r-project.org/doc/Rnews/Rnews_2005-1.pdf

Hope this helps

Christoph 

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Henrik Parn writes:
 > Dear all,
 > 
 > During my pre-R era I tried (yes, tried) to understand mixed models by 
 > working through the 'rat example' in Sokal and Rohlf's Biometry (2000), 
 > 3rd ed., pp. 288-292. The same example was later used by Crawley (2002) 
 > in his Statistical Computing, pp. 363-373, and I have seen the same data 
 > used elsewhere in the literature.
 > 
 > Because this example is so thoroughly described, I thought it would be 
 > very interesting to analyse it also using lmer and to see how the 
 > different approaches and outputs differ - from the more or less manual 
 > old-school (?) approach in Sokal, to aov in Crawley, to mixed models 
 > with lmer.
 > 
 > In the example, there are three treatments (Treatment) with two rats 
 > (Rat) each (i.e. six unique rats in total). Three liver preparations 
 > (Liver) are taken from each rat (i.e. 18 unique liver preparations), and 
 > two glycogen readings (Glycogen) are taken from each liver preparation 
 > (36 readings).
 > 
 > We want to test if treatments has affected the glycogen levels. The 
 > readings are nested in preparation and the preparations nested in rats.
 > 
 > The data can be found here (or on p. 289 in Sokal):
 > http://www.bio.ic.ac.uk/research/mjcraw/statcomp/data/rats.txt
 > //
 > I was hoping to use the rat example as some kind of reference on my way 
 > to understand mixed models and using lmer. However, first I wish someone 
 > could check my suggested models!
 > 
 > My suggestions:
 > 
 > attach(rats)
 > rats$Glycogen <- as.numeric(Glycogen)
 > rats$Treatment <- as.factor(Treatment)
 > rats$Rat <- as.factor(Rat)
 > rats$Liver <- as.factor(Liver)
 > str(rats)
 > 
 > model1 <- lmer(Glycogen ~ Treatment + (1|Liver) + (1|Rat), data=rats)
 > summary(model1)
 > 
 > Was that it?
 > 
 > I also tried to make the 'liver-in-rat' nesting explicit  (as suggested 
 > in 'Examples from...')
 >  
 > model2 <- lmer(Glycogen ~ Treatment + (1|Rat:Liver) + (1|Rat), data=rats)
 > summary(model2)
 > 
 > but then the random effects differs from model1.
 > 
 > Does the non-unique coding of rats and preparations in the original data 
 > set mess things up? Do I need to recode the ids to unique levels...
 > 
 > rats$rat2 <- as.factor(rep(1:6, each=6))
 > rats$liver2 <- as.factor(rep(1:18, each=2))
 > str(rats)
 > 
 > ...and then:
 > 
 > model3 <- lmer(Glycogen ~ Treatment + (1|liver2) + (1|rat2), data=rats)
 > # or maybe
 > model3 <- lmer(Glycogen ~ Treatment + (1|rat2:liver2) + (1|rat2), data=rats)
 >  
 > 
 > Can anyone help me to get this right! Thanks in advance!
 > 
 > P.S.
 > Thanks to all contributors to lme/lmer topic on the list (yes, I have 
 > searched for 'lmer nested'...) and also the examples provided by John 
 > Fox' 'Linear mixed models, Appendix to An R and S-PLUS companion...' and 
 > Douglas Bates' 'Examples from Multilevel Software...' and R-news 5/1. 
 > Very helpful, but as usual I bet I missed something... Sorry.
 > 
 > Regards,
 > 
 > Henrik 
 > 
 > -- 
 > 
 > Henrik Pärn
 > Department of Biology
 > NTNU
 > 7491 Trondheim
 > Norway
 > 
 > +47 735 96282 (office)
 > +47 909 89 255 (mobile)
 > +47 735 96100 (fax)
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] removing intercept from lm() results in oddly high Rsquared

2006-08-07 Thread Christoph Buser
Dear Joseph

Have a look at the questions and answers in the two links 
below. There the topic has been discussed.

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/68905.html

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/6943.html
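
The short version: without an intercept, summary.lm() computes R^2
against the uncentered total sum of squares, sum(y^2), instead of
sum((y - mean(y))^2), so the two values are not comparable. A sketch
with made-up data:

set.seed(1)
x <- 1:20
y <- 5 + 0.1 * x + rnorm(20)
summary(lm(y ~ x))$r.squared      # centered total SS: moderate
summary(lm(y ~ x - 1))$r.squared  # uncentered total SS: much larger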

Best regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Joseph LeBouton writes:
 > Can anyone help me understand why an lm model summary would return an 
 > r.squared of ~0.18 with an intercept term, and an r.squared of ~0.98 
 > without the intercept?   The fit is NOT that much better, according to 
 > plot.lm: residuals are similar between the two models, and a plot of 
 > observed x predicted is almost identical.
 > 
 > Thanks,
 > 
 > -Joseph
 > 
 > -- 
 > 
 > Joseph P. LeBouton
 > Forest Ecology PhD Candidate
 > Department of Forestry
 > Michigan State University
 > East Lansing, Michigan 48824
 > 
 > Office phone: 517-355-7744
 > email: [EMAIL PROTECTED]
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help

2006-07-28 Thread Christoph Buser
Dear Claire

Thank you for providing an example, but it is not totally clear
to me how you want to transform your data. For example, in row 1
you have the value 1 in columns g and n, but the results after
the transformation are not the same: "A G" and "A A". A similar
problem exists for columns h and i: they both transform to "G G"
although the original values 0 and 2 differ.

Furthermore it is not clear whether you want a vector with two
elements as the result of the transformation -> "A" "G"
or one character "A G".
In the first case you cannot work with data.frames or matrices,
but you must use a list. An entry of a data.frame is always a
single element.

Regarding changing elements if they fulfil a condition, it is
not necessary to work with loops. E.g. for a vector

v[v==5] <- 7

changes all values of v which are 5 to 7. 
Please read "An Introduction to R" on the R home page about
indexing vectors, matrices, and so on.

The following code shows an example of how I understood your
problem, and one possible solution.

## Create data.frame
DF <- data.frame(a = c(14,58), b = c(24,42), c = c("rcvf", "grde"),
 d = c("AG","AC"), e = c(5,2), f = c(2,5), g = c(1,0),
 h = c(0,5), i = c(2,1), n = c(1,0))
## work with matrix to avoid problems with factors
DF <- as.matrix(DF)
## transformation function
myfun <- function(v)
{
  v[v=="5"] <- "00"
  v[v=="1"] <- v["d"]
  v[v=="0"] <- paste(rep(substr(v["d"],1,1), 2), collapse = "")
  v[v=="2"] <- paste(rep(substr(v["d"],2,2), 2), collapse = "")
  v
}
## apply transformation function to all rows
result <- t(apply(DF, 1, myfun))


One final point. Providing examples is a good way to obtain fast
help from others. Could you please try to give some R-code in
the example as well to ease the handling for others, e.g. if you
provide an example data.frame, please include the code to
produce it:

## Create data.frame
DF <- data.frame(a = c(14,58), b = c(24,42), c = c("rcvf", "grde"),
     d = c("AG","AC"), e = c(5,2), f = c(2,5), g = c(1,0),
 h = c(0,5), i = c(2,1), n = c(1,0))

Have a nice week and enjoy further working with R.

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


claire pujoll writes:
 > Dear R members,
 > 
 > Sorry for this question. I have a dataframe (please see the example) and i
 > should do the following transformation:
 > 
 > DF
 > a   b   c     d   e  f  g  h  i  n
 > 14  24  rcvf  AG  5  2  1  0  2  1
 > 58  42  grde  AC  2  5  0  5  1  0
 > 
 > 
 > I should transform my DF like:
 > 
 > a   b   c     d    e    f    g    h    i    n
 > 14  24  rcvf  A G  0 0  G G  A G  G G  G G  A A
 > 58  42  grde  A C  C C  0 0  A A  0 0  A C  A A
 > 
 > i.e.: when DF[i,j]==5 => DF[i,j] <- 0 0
 > When DF[i,j] == k (!=0) I should look at the d column to do the
 > transformation.
 > 
 > DF is 20 * 1 so I can't use loops.
 > 
 > Thanks for your help,
 > Claire
 > 
 >  [[alternative HTML version deleted]]
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] KS Test Warning Message

2006-07-10 Thread Christoph Buser
Dear Justin

Ties means that you have identical values in
"Year5.lm$residuals". Please remark that you can have a large
R^2, but your residuals are not normally distributed. A large
R^2 shows a strong linear relationship, but that does not say
anything about the error distribution (see example below).

So to answer your question. Yes it can take away validity of
your model if the residuals are not normally distributed,
especially tests and confidence intervals for your parameters
are based on the normal assumption.
I'd recommend to verify model assumptions by graphical tools,
such as qqplot, Tukey-Anscombe Plot, ... 
Try:

plot(Year5.lm)

The power of KS-Test is quite small and graphical tools will
give you a hint about your true error distribution instead of
giving you only a p-value that "tells you" that the errors are
not normal.

set.seed(3)
x <- 1:100
## t-distributed errors
y <- x + rt(100,2)
## Strong linear relationship
plot(x,y)

## High R^2 due to strong linear relationship
summary(reg <- lm(y~x))
## The residuals are not normal distributed
qqnorm(resid(reg))
## Small power of KS-Test. Violation of model assumption is not detected
ks.test(resid(reg), "pnorm")

Best regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


justin rapp writes:
 > All,
 > 
 > Happy World Cup and Wimbledon.   This morning finds me with the first
 > of my many daily questions.
 > 
 > I am running a ks.test on residuals obtained from a regression model.
 > 
 > I use this code:
 > > ks.test(Year5.lm$residuals,pnorm)
 > 
 > and obtain this output
 >  One-sample Kolmogorov-Smirnov test
 > 
 > data:  Year5.lm$residuals
 > D = 0.7196, p-value < 2.2e-16
 > alternative hypothesis: two.sided
 > 
 > Warning message:
 > cannot compute correct p-values with ties in: ks.test(Year5.lm$residuals, 
 > pnorm)
 > 
 > I am wondering if anybody can tell me what this error message means.
 > 
 > Also, could anybody clarify how I could have a regression model with a
 > high Rsquared, roughly 0.67, but with non-normal residuals?  Does this
 > take away from the validity of my model?
 > 
 > jdr
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] nested mixed-effect model: variance components

2006-06-19 Thread Christoph Buser
Dear John

I've put your mail back on the R list since I have no
explanation for the "lmer" result. Maybe someone else has an
idea. I adapted it to show something else that I do not understand.

## Creation of the data (habitat is nested in lagoon):
set.seed(1)
dat <- data.frame(y = rnorm(100), lagoon = factor(rep(1:4,each = 25)),
  habitat = factor(rep(1:20, each = 5)))

## I do not understand how the random intercepts for lagoon and
## lagoon:habitat can both be estimated. It seems a little bit
## strange to me that they are identical (0.46565).
library(lme4)
summary(reg3 <- lmer(y~habitat+(1|lagoon)+(1|lagoon:habitat), data = dat))

## Furthermore I do not understand why the standard errors for
## the fixed effect of habitat are 1.131 for some habitats
## and 1.487 for the others???


## If one removes (1|lagoon), the variance component
## (1|lagoon:habitat) does not change its value (still 0.46565)???
summary(reg3a <- lmer(y~habitat+(1|lagoon:habitat), data = dat))

## Now all standard errors for the fixed factor habitat are 1.131.

## Altogether it seems a little bit strange to me, and given the
## warnings and errors of the lme and aov calls, I'd be careful
## about using the output of lmer in that case. In addition I do
## not understand the interpretation of the random effect lagoon
## on top of the nested FIXED factor habitat.

summary(aov(y~habitat + Error(lagoon/habitat), data = dat))

detach(package:Matrix)
detach(package:lme4)
library(nlme)
summary(reg2 <- lme(y~habitat, random = ~1|lagoon/habitat, data = dat))
anova(reg2)


Best regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--



>> Christoph,
>> 
>> I am sending this off list because I tried 'lmer' and
>> it seems to work with your contrived data, but I don't know why.
>> Can you explain it ?
>> 
>> 
>> John
>> 
>> 
>> 
>> 
>> > detach(package:nlme)
>> > library(Matrix)
>> 
>> summary(lmer(y~habitat+(1|lagoon)+(1|lagoon:habitat), data = dat))
>> 
>> Linear mixed-effects model fit by REML
>> Formula: y ~ habitat + (1 | lagoon) + (1 | lagoon:habitat)
>>   Data: dat
>>       AIC      BIC    logLik MLdeviance REMLdeviance
>>  292.3627 349.6764 -124.1814   280.0245     248.3627
>> Random effects:
>>  Groups         Name        Variance Std.Dev.
>>  lagoon:habitat (Intercept) 0.46565  0.68239
>>  lagoon (Intercept) 0.46565  0.68239
>>  Residual   0.87310  0.93440
>> number of obs: 100, groups: lagoon:habitat, 20; lagoon, 4
>> 
>> Fixed effects:
>>   Estimate Std. Error  t value
>> (Intercept)  0.1292699  1.0516317  0.12292
>> habitat2 0.0058658  1.1316138  0.00518
>> habitat3-0.0911469  1.1316138 -0.08055
>> habitat4 0.3302971  1.1316138  0.29188
>> habitat5-0.0480394  1.1316138 -0.04245
>> habitat6-0.4778469  1.4872319 -0.32130
>> habitat7-0.0867301  1.4872319 -0.05832
>> habitat8 0.0696507  1.4872319  0.04683
>> habitat9-0.0998728  1.4872319 -0.06715
>> habitat100.1096064  1.4872319  0.07370
>> habitat11   -0.0430979  1.4872319 -0.02898
>> habitat120.0714719  1.4872319  0.04806
>> habitat130.3380993  1.4872319  0.22733
>> habitat140.3057808  1.4872319  0.20560
>> habitat15   -0.4915582  1.4872319 -0.33052
>> habitat16   -0.2624539  1.4872319 -0.17647
>> habitat17   -0.2203461  1.4872319 -0.14816
>> habitat180.2165269  1.4872319  0.14559
>> habitat190.6932896  1.4872319  0.46616
>> habitat20   -0.7271468  1.4872319 -0.48893
>>  anova(summary(lmer(y~habitat+(1|lagoon)+(1|lagoon:habitat), data = dat)))
>> 
>> Analysis of Variance Table
>> Df  Sum Sq Mean Sq
>> habitat 19 2.65706 0.13985
>> 
>> > VarCorr(summary(summary(lmer(y~habitat+(1|lagoon)+(1|lagoon:habitat), data
>> = dat)))
>> + )
>> $`lagoon:habitat`
>> 1 x 1 Matrix of class "dpoMatrix"
>> (Intercept)
>> (Intercept)   0.4656545
>> 
>> $lagoon
>> 1 x 1 Matrix of class "dpoMatrix"
>> (Intercept)
>> (Intercept)   0.4656545
>> 
>> attr(,"sc")
>> [1] 0.9343993
>> 
>> Christoph Buser wrote ---
>> 
>> Dear Eric
>> 
>> Do you really have habitats nested within lagoons or are they
>> partially crossed (meaning that you have

Re: [R] nested mixed-effect model: variance components

2006-06-12 Thread Christoph Buser
Dear Eric

Do you really have habitats nested within lagoons or are they
partially crossed (meaning that you have the same habitats in
different lagoons)?

If you have them perfectly nested, I think that you cannot
calculate both a fixed effect for habitats and a random effect
for lagoon (see the example below, lme and aov).

You can compare e.g. two lagoons by defining a contrast of the
habitats of one lagoon against the habitats of the other (if you
think that this is a meaningful test to interpret), but you
cannot estimate a random effect for lagoon in the presence of a
nested FIXED effect habitat.

aov() will not return the test and will warn you about the
singular model.

lme() will estimate a variance component for lagoon, but does
not provide a test for the fixed factor.

Regards,

Christoph Buser



set.seed(1)
dat <- data.frame(y = rnorm(100), lagoon = factor(rep(1:4,each = 25)),
  habitat = factor(rep(1:20, each = 5)))

summary(aov(y~habitat + Error(lagoon/habitat), data = dat))

library(nlme)
summary(lme(y~habitat, random = ~1|lagoon/habitat, data = dat))


--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--



Eric Pante writes:
 > Dear listers,
 > 
 > I am trying to assess variance components for a nested, mixed-effects 
 > model. I think I got an answer that make sense from R, but I have a 
 > warning message and I wanted to check that what I am looking at is 
 > actually what I need:
 > 
 > my data are organized as transects within stations, stations within 
 > habitats, habitats within lagoons.
 > lagoons: random, habitats: fixed
 > the question is: how much variation is due to lagoons? habitats? 
 > lagoons*habitat? transects?
 > 
 > Here is my code:
 > 
 > res <- aov(COVER ~ HABITAT + Error(HABITAT+LAGOON+LAGOON/HABITAT), 
 > data=cov)
 > summary(res)
 > 
 > and I get Sum Sq for each to calculate variance components:
 > 
 > Error: STRATE
 > Df Sum Sq Mean Sq
 > STRATE  5 4493.1   898.6
 > 
 > Error: ATOLL
 >Df Sum Sq Mean Sq F value Pr(>F)
 > Residuals  5 3340.5   668.1
 > 
 > Error: STRATE:ATOLL
 >Df  Sum Sq Mean Sq F value Pr(>F)
 > Residuals 18 2442.71  135.71
 > 
 > Error: Within
 > Df Sum Sq Mean Sq F value Pr(>F)
 > Residuals 145 6422.0    44.3
 > 
 > My error message seems to come from the LAGOON/HABITAT term; the Error 
 > strata are computed nonetheless.
 > Warning message: Error() model is singular in: aov(COVER ~ HABITAT + 
 > Error(HABITAT+LAGOON+LAGOON/HABITAT), data=cov),
 > 
 > THANKS !!!
 > eric
 > 
 > 
 > 
 > Eric Pante
 > 
 > College of Charleston, Grice Marine Laboratory
 > 205 Fort Johnson Road, Charleston SC 29412
 > Phone: 843-953-9190 (lab)  -9200 (main office)
 > 
 > 
 >  "One does not force curiosity, one awakens it ..."
 >  Daniel Pennac
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] variable row names

2006-05-16 Thread Christoph Buser
Hi

I was confused by your example, too, and still am a little bit
confused, so please excuse me if my example does not answer your 
question:

You have a data.frame "dat"
> dat <- data.frame(colname1 = 1:4, colname2 = 5:8)

You have a vector with the colnames of "dat"
> namen <- colnames(dat)

You want to access the values in dat by using the vector "namen"
> for (i in namen)
>   print(dat[,i])

or

> for (i in 1:length(namen))
>   print(dat[,namen[i]])
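or, without any loop (a sketch; the character vector indexes the
columns directly):

> dat[, namen]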

Does that answer your question or is it more complicated?

Best regards,

Christoph

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


COMTE Guillaume writes:
 > Thks for your answer,
 > 
 > I wish to access colnames by using vars, like in my example nom="toto": if I 
 > type toto in R it will display the column named toto with the values that 
 > the column keeps. I wish the same by typing nom, but in my case it shows the 
 > content of the var nom, i.e. "toto", not the values of the column named toto.
 > 
 > I can't be more accurate because I'm unable to formulate it another way.
 > 
 > another example:
 > 
 > I've got a set of colnames :
 > 
 > > nomalarme1
 > [1] "AirFlowFailure"      "AirSensorFailure"    "GasLeek"
 > [4] "SuddenPressureDrop"  "InletPressureToLow"  "PressureRiseFailure"
 > [7] "HoseLeakTestFailure" "o50DCVoltageToLow"
 > 
 > All of these names identify colnames (like AirFlowFailure).
 > 
 > My problem is: if I do 
 > 
 > for (i in 1:length(nomalarme1))
 > {
 > nomalarme1[i]
 > }
 > it will display the names of the columns, not the data. Is there 
 > something that can tell R to show the data that have the colname nomalarme1[i], 
 > and not just the colname?
 > 
 > 
 > Thks for your answer, I've looked into S Poetry and it seems that it isn't 
 > explained there, but my "poor" English may have missed it...
 > 
 > 
 > COMTE Guillaume
 > 
 > 
 >  
 > 
 > 
 > -Original Message-
 > From: Patrick Burns [mailto:[EMAIL PROTECTED] 
 > Sent: Tuesday, May 16, 2006 17:12
 > To: COMTE Guillaume
 > Subject: Re: [R] variable row names
 > 
 > I don't understand your question, but perhaps the
 > subscripting section of chapter 1 of S Poetry can
 > help you.
 > 
 > 
 > Patrick Burns
 > [EMAIL PROTECTED]
 > +44 (0)20 8525 0696
 > http://www.burns-stat.com
 > (home of S Poetry and "A Guide for the Unwilling S User")
 > 
 > COMTE Guillaume wrote:
 > 
 >  > >Hy all,
 >  > >
 >  > >I wish to use a variable as rownames for a set of datas.
 >  > >
 >  > >By example :
 >  > >
 >  > >> nom <- "toto"
 >  > >> prenom <- "tutu"
 >  > >> res <- c(1,2)
 >  > >> res <- t(res)
 >  > >> res
 >  > >     [,1] [,2]
 >  > >[1,]    1    2
 >  > >> colnames(res) <- c(nom, prenom)
 >  > >> res
 >  > >     toto tutu
 >  > >[1,]    1    2
 >  > >> nom
 >  > >[1] "toto"
 >  > >
 >  > >I wish to call the rowname by the variable name, in this case it would
 >  > >be < nom > or < prenom >, but if I do that it gives me the name of the
 >  > >row not the values of it.
 >  > >
 >  > >Is this possible or not?
 >  > >
 >  > >Thks all.
 >  > >
 >  > >COMTE Guillaume
 >  > >
 >  > >[[alternative HTML version deleted]]
 >  > >
 >  > >__
 >  > >R-help@stat.math.ethz.ch mailing list
 >  > >https://stat.ethz.ch/mailman/listinfo/r-help
 >  > >PLEASE do read the posting guide!
 >  > >http://www.R-project.org/posting-guide.html
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] t-test with autocorrelation correction

2006-05-12 Thread Christoph Buser
Dear Denis

I do not know the correction by Dale and Fortin, but assuming
that it is closely related to page 15 of Cressie (1993), the
following approach should be similar.

In Cressie page 15, it's assumed that the process has an AR(1)
structure. Therefore you could use the function gls() from the
"nlme" package with this correlation structure.

library(nlme)
set.seed(1)
n <- 100
## Creating the data (AR(1) structure)
x <- rnorm(1, 0, 0.3)
for(i in 2:n)
  x[i] <- x[i-1]*0.5 + rnorm(1, 0, 0.3)
## Calculating a model (only intercept)
res1 <- gls(model = x ~ 1, correlation = corAR1(), method = "ML")
summary(res1)

>                  Value  Std.Error  t-value p-value
> (Intercept) 0.06358824 0.04800817 1.324530  0.1884

## The test for the intercept is a one-sample t-test for the
## Null Hypothesis mu=0, using the correct standard error.

## Compare it to an ordinary t-test, whose standard error is too
## small because it ignores the autocorrelation, and whose
## p-value is therefore too small.
t.test(x)

> t = 2.3069, df = 99, p-value = 0.02314

There might be easier ways to implement a t-test with
autocorrelation. The advantage of this approach is that you can
fit more general regression models with autocorrelation once you
know the gls syntax.
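For comparison, here is a rough sketch of the effective-sample-size
idea itself (my reading of Cressie, 1993, p. 15; please check it
against Dale and Fortin before relying on it): under AR(1)
correlation rho, the mean of n observations carries roughly
n*(1-rho)/(1+rho) independent observations' worth of information.

rho <- acf(x, plot = FALSE)$acf[2]     # lag-1 autocorrelation estimate
n.eff <- n * (1 - rho) / (1 + rho)     # approximate effective sample size
t.stat <- mean(x)/(sd(x)/sqrt(n.eff))  # t-test recomputed with n.eff
2 * pt(-abs(t.stat), df = n.eff - 1)   # approximate two-sided p-value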

Best regards,

Christoph

--

Credit and Surety PML study: visit our web page www.cs-pml.org

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--




[EMAIL PROTECTED] writes:
 > Has anyone implemented a t-test with the effective sample size
 > correction proposed by Dale and Fortin, Ecoscience 9(2):162-167, 2002,
 > using a discussion by Cressie, 1993, page 15?
 > 
 > thanks,
 > Denis
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] summary.lme: argument "adjustSigma"

2006-05-03 Thread Christoph Buser
Dear Spencer 

Thank you for your answer. You are correct. The "adjustSigma"
argument is only used for the "ML" method. In the code there is: 

stdFixed <- sqrt(diag(as.matrix(object$varFix)))

if (object$method == "ML" && adjustSigma == TRUE) {
stdFixed <- sqrt(object$dims$N/(object$dims$N - length(stdFixed))) * 
stdFixed
}

But my question is still open:

Looking into the code above, "stdFixed" is adapted for the "ML"
method if "adjustSigma" is TRUE. Therefore the standard errors
and the test results for the fixed effects are affected.

But I would expect that the residual standard error should be
adapted, too. But "object$sigma" of the lme object is unchanged
and therefore in the summary output, the estimate of the
residual standard error is identical, independent of the value
of "adjustSigma".

To my understanding this is a discrepancy with the help page,
which says:

adjustSigma: an optional logical value. If 'TRUE' and the
  estimation method used to obtain 'object' was maximum
  likelihood, the residual standard error is multiplied
  by sqrt(nobs/(nobs - npar)), converting it to a
  REML-like estimate. This argument is only used when a
  single fitted object is passed to the
  function. Default is 'TRUE'. 


Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--



Spencer Graves writes:
 >It appears that the "adjustSigma" argument of 'summary.lme' does 
 > nothing with the default method, "REML".  To check, I tried the 
 > following modification of the 'summary.lme' example:
 > 
 > fm1 <- lme(distance ~ age, Orthodont, random = ~ age | Subject)
 > fm1sa <- summary(fm1)
 > fm1s <- summary(fm1, adjustSigma=FALSE)
 > fm1saa <- summary(fm1, adjustSigma=TRUE)
 > all.equal(fm1s, fm1sa)
 > TRUE
 > all.equal(fm1s, fm1saa)
 > TRUE
 > #
 > 
 >When I changed 'method' to "ML" in this example, the result suggested 
 > that the adjustment to sigma also affected the standard errors and t 
 > values for the fixed effects.  If the p-values had not been 0, they also 
 > would have been affected:
 > 
 > fm2 <- lme(distance ~ age, Orthodont, random = ~ age | Subject,
 > method="ML")
 > (fm2sa <- summary(fm2))
 > Linear mixed-effects model fit by maximum likelihood
 >   Data: Orthodont
 > AIC  BIClogLik
 >451.2116 467.3044 -219.6058
 > 
 > Random effects:
 >   Formula: ~age | Subject
 >   Structure: General positive-definite, Log-Cholesky parametrization
 >  StdDevCorr
 > (Intercept) 2.1940998 (Intr)
 > age 0.2149245 -0.581
 > Residual1.3100399
 > 
 > Fixed effects: distance ~ age
 >  Value Std.Error DF   t-value p-value
 > (Intercept) 16.76 0.7678975 80 21.827277   0
 > age  0.660185 0.0705779 80  9.353997   0
 >   Correlation:
 >  (Intr)
 > age -0.848
 > 
 > Standardized Within-Group Residuals:
 >   Min   Q1  Med   Q3  Max
 > -3.305969237 -0.487429631  0.007597973  0.482237063  3.922789795
 > 
 > Number of Observations: 108
 > Number of Groups: 27
 >  > (fm2s <- summary(fm2, adjustSigma=FALSE))
 > Linear mixed-effects model fit by maximum likelihood
 >   Data: Orthodont
 > AIC  BIClogLik
 >451.2116 467.3044 -219.6058
 > 
 > Random effects:
 >   Formula: ~age | Subject
 >   Structure: General positive-definite, Log-Cholesky parametrization
 >  StdDevCorr
 > (Intercept) 2.1940998 (Intr)
 > age 0.2149245 -0.581
 > Residual1.3100399
 > 
 > Fixed effects: distance ~ age
 >  Value Std.Error DF  t-value p-value
 > (Intercept) 16.76 0.7607541 80 22.03223   0
 > age  0.660185 0.0699213 80  9.44183   0
 >   Correlation:
 >  (Intr)
 > age -0.848
 > 
 > Standardized Within-Group Residuals:
 >   Min   Q1  Med   Q3  Max
 > -3.305969237 -0.487429631  0.007597973  0.482237063  3.922789795
 > 
 > Number of Observations: 108
 > Number of Groups: 27
 >  >
 >Does this answer the question?
 >spencer graves
 >  
 > Christoph Buser wrote:
 > 
 > > Dear R-list
 > > 
 > > I have a questio

[R] summary.lme: argument "adjustSigma"

2006-04-25 Thread Christoph Buser
Dear R-list

I have a question concerning the argument "adjustSigma" in the
function "lme" of the package "nlme". 

The help page says:

"the residual standard error is multiplied by sqrt(nobs/(nobs - 
npar)), converting it to a REML-like estimate."

Having a look into the code I found:

stdFixed <- sqrt(diag(as.matrix(object$varFix)))

if (object$method == "ML" && adjustSigma == TRUE) {
stdFixed <- sqrt(object$dims$N/(object$dims$N - length(stdFixed))) * 
stdFixed
}

tTable <- data.frame(fixed, stdFixed, object$fixDF[["X"]], 
fixed/stdFixed, fixed)


To my understanding, only the standard error for the fixed
coefficients is adapted and not the residual standard error. 

Therefore only the tTable of the output is affected by the
argument "adjustSigma", but not the estimate for residual
standard error (see the artificial example below). 

May someone explain to me if there is an error in my
understanding of the help page and the R code? 
Thank you very much.  

Best regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Example
---

library(nlme)
set.seed(1)
dat <- data.frame(y = rnorm(16), fac1 = rep(1:4, each = 4),
                  fac2 = rep(1:2, each = 8))

telme <- lme(y ~ fac1, data = dat, random = ~ 1 | fac2, method = "ML")
summary(telme)
summary(telme, adjustSigma = FALSE)
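## A by-hand version of the adjustment described on the help page
## (a sketch based on the example above; fixef() is from nlme):
N <- telme$dims$N                 # number of observations
p <- length(fixef(telme))         # number of fixed-effect coefficients
telme$sigma                       # residual standard error as stored
telme$sigma * sqrt(N/(N - p))     # the REML-like value I would expect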

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] difftime arguments

2006-03-16 Thread Christoph Buser
Hi Fred

First of all, I have put the email back on the R-list. Please
answer by reply to all (including the list). Others might have
similar problems, and others might have additional proposals.

You should not confuse typeof() and class().
Try 
> class(g)
and you will see that your object has the class that you have
assumed. 

typeof() has to do with the internal storage mode of an
object. See ?class and ?typeof for more details.

Finally use
> as.numeric(g)

But if you want to remove the units you should use the argument
"units" in difftime(). Otherwise you can get strange results if
the unit is chosen automatically and removed afterwards.
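For example (a self-contained sketch; the times are chosen to
reproduce your 43.61667):

z1 <- as.POSIXct("2006-01-12 15:59:45")
z2 <- as.POSIXct("2006-01-12 16:43:22")
g <- difftime(z2, z1, units = "mins")
as.numeric(g)   # 43.61667, without the surrounding text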

Regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

--text follows this line--
Fred J. writes:
 > z <- as.POSIXct(strptime(ts, "%m/%d/%Y %I:%M:%S %p")) 
 > z then has the format #"2006-01-12 16:00:00 Pacific Standard Time"
 > one would expect typeof(z) to be an object of class POSIXct, but notice
 > > typeof(z)
 > [1] "double"
 > 
 > now when I put
 > > g <- difftime(z[11745], z[11746])
 > > g
 > Time difference of 43.61667 mins
 > > typeof(g)
 > [1] "double"
 > > 
 > how can I just have 43.61667 with out the text stuff around it in the object 
 > "g"?
 > 
 > 
 > 
 > Christoph Buser <[EMAIL PROTECTED]> wrote: "format" is not an argument of 
 > the function difftime(). It is an
 > argument of the function as.difftime().
 > Therefore I think that the direct way without converting the
 > dates cannot be done, but I am not a specialist in time
 > questions in R :-)
 > 
 > Fred J. writes:
 >  > what about this
 >  > 
 >  > > difftime(c("1/12/2006 3:59:45 PM", "1/12/2006 3:59:57 PM"), format 
 > ="%m/%d/%Y %I:%M:%S %p")
 >  > Error in difftime(c("1/12/2006 3:59:45 PM", "1/12/2006 3:59:57 PM"), 
 > format = "%m/%d/%Y %I:%M:%S %p") : 
 >  > unused argument(s) (format ...)
 >  > >
 >  > 
 >  > Christoph Buser  wrote: Dear Fred
 >  > 
 >  > You should change your code from
 >  >  >   x <- strptime(ts, "%m/%d/%y %I:%M:%S %p")
 >  > 
 >  > to
 >  >  >   x <- strptime(ts, "%m/%d/%Y %I:%M:%S %p")
 >  > 
 >  > "Y" instead of "y", since your year includes the century 
 >  > (2006 and not 06)
 >  > 
 >  > Then it should work.
 >  > 
 >  > Regards,
 >  > 
 >  > Christoph
 >  > 
 >  > --
 >  > Christoph Buser 
 >  > Seminar fuer Statistik, LEO C13
 >  > ETH (Federal Inst. Technology) 8092 Zurich  SWITZERLAND
 >  > phone: x-41-44-632-4673  fax: 632-1228
 >  > http://stat.ethz.ch/~buser/
 >  > --
 >  > 
 >  > 
 >  > 
 >  > Fred J. writes:
 >  >  >   Hi
 >  >  >  > I just started using RGui.exe under Windows.
 >  >  >   I have a text file containing date arranged in columns and rows, 
 > each column has the same format, each row with different formats. 3 of the 
 > columns are something like this 1/12/2006 3:59:45 PM
 >  >  >   I need to calculate the difference in seconds between 2 selected 
 > periods using their row's index
 >  >  >
 >  >  >   My solution:
 >  >  >   Read the file in a data frame and collect the columns in concern in 
 > a variable.
 >  >  >   data1 <- read.table("C:\\path\\data.txt", header=TRUE)
 >  >  >   ts <- paste(data1[[1]], data1[[3]], data1[[7]])   
 >  >  >
 >  >  >   #ts now looks like this 
 >  >  >   #... "1/12/2006 3:59:45 PM"  "1/12/2006 3:59:57 PM" ...
 >  >  >
 >  >  >   #now convert between character representations and object of classes 
 > "POSIXct"
 >  >  >   x <- strptime(ts, "%m/%d/%y %I:%M:%S %p")
 >  >  >
 >  >  >   this last code line is putting out NA, what did I do wrong?
 >  >  >
 >  >  >   After this then I do
 >  >  >   z <- as.POSIXct(x)
 >  >  >
 >  >  >   Is my whole approach efficient?
 >  >  >   Then I can

Re: [R] difftime arguments

2006-03-15 Thread Christoph Buser
Dear Fred

You should change your code from
 >   x <- strptime(ts, "%m/%d/%y %I:%M:%S %p")

to
 >   x <- strptime(ts, "%m/%d/%Y %I:%M:%S %p")

"Y" instead of "y", since your year includes the century 
(2006 and not 06)

Then it should work.
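A quick check of the two formats (a sketch; note that %p parsing can
depend on your locale):

strptime("1/12/2006 3:59:45 PM", "%m/%d/%y %I:%M:%S %p")  # NA
strptime("1/12/2006 3:59:45 PM", "%m/%d/%Y %I:%M:%S %p")  # parses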

Regards,

Christoph

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--



Fred J. writes:
 >   Hi
 > I just started using RGui.exe under Windows.
 >   I have a text file containing date arranged in columns and rows, each 
 > column has the same format, each row with different formats. 3 of the 
 > columns are something like this 1/12/2006 3:59:45 PM
 >   I need to calculate the difference in seconds between 2 selected periods 
 > using their row's index
 >
 >   My solution:
 >   Read the file in a data frame and collect the columns in concern in a 
 > variable.
 >   data1 <- read.table("C:\\path\\data.txt", header=TRUE)
 >   ts <- paste(data1[[1]], data1[[3]], data1[[7]])   
 >
 >   #ts now looks like this 
 >   #... "1/12/2006 3:59:45 PM"  "1/12/2006 3:59:57 PM" ...
 >
 >   #now convert between character representations and object of classes 
 > "POSIXct"
 >   x <- strptime(ts, "%m/%d/%y %I:%M:%S %p")
 >
 >   this last code line is putting out NA, what did I do wrong?
 >
 >   After this then I do
 >   z <- as.POSIXct(x)
 >
 >   Is my whole approach efficient?
 >   Then I can use difftime, because using it without the above preparations 
 > gives this
 >
 >   > difftime(c("1/12/2006 3:59:45 PM", "1/12/2006 3:59:57 PM"), format 
 > ="%m/%d/%y %I:%M:%S %p")
 >   Error in difftime(c("1/12/2006 3:59:45 PM", "1/12/2006 3:59:57 PM"), 
 > format = "%m/%d/%y %I:%M:%S %p") : 
 >   unused argument(s) (format ...)
 >
 >   Thanks
 >   
 > __
 > 
 > 
 > 
 >  [[alternative HTML version deleted]]
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] lmer

2006-03-13 Thread Christoph Buser
Dear Erlend

For an answer to your first question about the degrees of
freedom, have a look at:

https://stat.ethz.ch/pipermail/r-help/2006-January/016384.html

To answer the second question: I think that the "AGQ" method is
not yet implemented, as the error message says. "lmer" is a
function under development, which needs some time.

Regards,

Christoph

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Erlend Myre Bremset writes:
 > Hi!
 > I've just updated my version of R, from 2.0.1 to 2.2.1.
 > When I run my old lmer models in the new version, I get no p-values for
 > the fixed factors; I got them with the older version of R.
 > Also, when I try to run a binomial model in lmer, with the AGQ method I
 > get the following message:
 > Error in lmer(RECRUIT ~ H + I(H^2) + LNRHDAY + (1 | year:isl:nest) + (1
 > |  :
 > method = "AGQ" not yet implemented for supernodal representation
 > 
 > 
 > Erlend Myre Bremset
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] nls number of explantory variables

2006-03-13 Thread Christoph Buser
Hi Harsh

As indicated in the answer to your first post, it is not so easy
to debug code without the code itself.

1) Have you checked the names of your variables so that you are
   sure that they are correct? For example, unwanted white space
   from automatically created variable names (e.g. with paste())
   leads to names being changed:

dat <- data.frame(x1 = rnorm(10), "x 2" = rnorm(10), y = rnorm(10))
## this call fails, since there is no variable 'x2' in dat:
nls(y ~ beta*x1 + gamma*x2, data = dat, start = list(beta = 1, gamma = 1))
## variable name "x 2" has been changed to "x.2":
str(dat)

2) If you believe that nls cannot handle a certain number of
   variables, you can check it easily yourself. E.g. if your
   code works for the variables x1,x2,...,x170, replace one of
   them by the variable x181 and check if it still works.
   If it does not, the number of variables is not the problem
   (the variable x181 itself is).

In general (also as an answer to your first post: "Non-linear
Regression : Error in eval(expr, envir, enclos)"), I would
recommend debugging by starting with code that works and going
stepwise until an error occurs. If there is a conflict between
two or more variables or a problem with overparameterization,
you have a better chance to find it this way.
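A quick preliminary check that all names used in the formula really
exist in the data frame can also help (a sketch; "dat" and the
x1...x300 naming scheme are assumptions based on your description):

vars <- paste("x", 1:300, sep = "")
setdiff(vars, names(dat))   # should be character(0)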

Regards,

Christoph 

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Cal Stats writes:
 > Hi..
 >   
 >  is there a limit on the number of explanatory variables in nls ?
 >   
 >   i have a dataframe with the columns names x1,x2..,x300
 >   when i run nls it gives the error: " x181 not found"
 >   
 >   thought it does run when i have x1,x2,...,x170 variables.
 >   
 >   Thanks
 >   
 >   Harsh
 >   
 >   
 >   
 >  
 > -
 > 
 > 
 >  [[alternative HTML version deleted]]
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Help - lm, glm, aov results inconsistent with other stati stical package

2006-03-01 Thread Christoph Buser
Dear Prof. Ripley

My apologies; I was a little too short in my first answer. My
guess is that the line equations have been produced by Ben
(since JMP does not give me that line equation as output), using
the coefficients from JMP under the wrong assumption that JMP
sets one coefficient for each factor to zero, instead of the
true constraint that the coefficients sum to zero.

The following example shows what JMP does:


## create A with 2 levels - 2 and 4
A <-factor(c(rep(2,times=30),rep(4,times=42))) 
set.seed(1)
## generate 72 random x points
x <-rnorm(72)   
## create different slopes & intercepts for A=2 and A=4
## add a random error term 
y <-c(x[(1:30)]+1,2*x[(31:72)]-2)+rnorm(72,mean=0,sd=2)   
dat <- data.frame(A, x, y)

options(contrasts = c("contr.sum", "contr.poly" ))
## y_1 = mu + Beta*x  (with alpha_1 = 0, Beta_int_1 = 0)  [1]
## y_2 = mu + alpha + Beta*x + Beta_int*x                 [2]
## ([1] + [2])/2 = mu + alpha/2 + (Beta + Beta_int/2)*x   [3]
test.lm1 <-lm(y~A*x, dat = dat)
## check the output
summary(test.lm1)
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)    
> (Intercept)  -0.6397     0.2051  -3.119  0.00266 ** 
> A1            1.5912     0.2051   7.759 5.99e-11 ***
> x             1.6821     0.2226   7.555 1.41e-10 ***
> A1:x         -0.4372     0.2226  -1.964  0.05367 .  
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

In JMP when I calculate a linear model I get:

> Term              Estimate    Std error  t Ratio  Prob>|t|
> Intercept         -0.639704   0.205071   -3.12    0.0027
> A[2]               1.5260508  0.203069    7.51    <.001*
> x                  1.6821048  0.22264     7.56    <.001*
> A[2]*(x-0.14909)  -0.437177   0.22264    -1.96    0.0537
## y_1 = mu + alpha/2 + Beta*x + Beta_int/2*(x - mean(x))   (1a)
## y_2 = mu - alpha/2 - Beta*x - Beta_int/2*(x - mean(x))   (2a)

mean(dat[,"x"])
> 0.1490923

1.5912 = 1.5260508 - (-0.437177)*0.1490923


Going back to the original post with the line equations:

 y = 7072.09-1024.94 x (for A=0) and
 y = 7875.58-970.088 x (for A=1)
 
from JMP. And from R I get

 y = 6276.7-1259.8 x (for A=0) and
 y = 7867.5-1150.1 x (for A=1).


If one computes the line equations in the belief that the
coefficients were estimated using treatment contrasts, while in
reality they were estimated using sum contrasts, I would expect
the difference of the slopes of the wrong equations to be half
the difference of the slopes of the true line equations:

1024.94 - 970.088   = 54.852
(1259.8 - 1150.1)/2 = 54.85
  
Furthermore, I would expect the intercept of the first wrong
equation to be the mean of the intercepts of the two correct ones:

(7867.5 + 6276.7)/2 = 7072.1

Therefore I assume that it was not JMP that generated the wrong
equations, but rather Ben who misinterpreted the coefficients.
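For reference, here are the two contrast matrices for a factor with
two levels (a sketch; the R output is shown in the comments):

contr.treatment(2)   # R default: first level is the baseline
##   2
## 1 0
## 2 1
contr.sum(2)         # JMP-style: coefficients sum to zero
##   [,1]
## 1    1
## 2   -1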

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Prof Brian Ripley writes:
 > Similar results for what?
 > 
 > The original posting quoted fitted lines for the two groups, which do not 
 > depend on the parametrization.  If that is what you want, it is best to 
 > parametrize the model directly to give them, which was one of Berwin's 
 > alternatives.  I don't have JMP, so do not know how to do that in JMP 
 > (and I still think the solution here is to seem help on using JMP).
 > 
 > On Wed, 1 Mar 2006, Christoph Buser wrote:
 > 
 > > Dear Ben
 > >
 > > Berwin is correct in his answer about different
 > > parameterizations.
 > >
 > > After changing the contrasts in R from treatment to sum
 > >
 > > options(contrasts = c("contr.sum", "contr.poly" ))
 > > test.lm<-lm(y~A*x)
 > > summary(test.lm)
 > >
 > > I got similar results to JMP. Be careful in doing a correct
 > > interpretation of your coefficients, especially when you have an
 > > interaction in your model.
 > >
 > > Regards,
 > >
 > > Christoph Buser
 > >
 > > --
 > > Christoph Buser <[EMAIL PROTECTED]>
 > > Seminar fuer Statistik, LEO C13
 > > ETH (Federal Inst. Technology) 8092 Zurich  SWITZERLAND
 > > phone: x-41-44-632-4673fax: 632-1228
 > > http://stat.ethz.ch/~buser/
 > > -

Re: [R] Help - lm, glm, aov results inconsistent with other stati stical package

2006-03-01 Thread Christoph Buser
Dear Ben

Berwin is correct in his answer about different
parameterizations.

After changing the contrasts in R from treatment to sum

options(contrasts = c("contr.sum", "contr.poly" ))
test.lm<-lm(y~A*x)
summary(test.lm)

I got similar results to JMP. Be careful to interpret your
coefficients correctly, especially when you have an interaction
in your model.

Regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--




Ben Ridenhour writes:
 > Okay, I took the data to SAS and it gives me the same answer that R does.  I 
 > don't know why JMP is giving me an incorrect answer, but it seems to be.  (Either 
 > that or I have made the same mistake in SAS and R.) Any ideas what JMP might 
 > be doing?
 >  
 >  Ben
 >  
 > ---
 > Benjamin Ridenhour
 > School of Biological Sciences
 > Washington State University
 > P.O. Box 644236
 > Pullman, WA 99164-4236
 > Phone (509)335-7218
 >  
 > "Nothing in biology makes sense except in the light of evolution."
 > -T. Dobzhansky
 > 
 > 
 > - Original Message 
 > 
 > To: "Liaw, Andy" <[EMAIL PROTECTED]>; r-help@stat.math.ethz.ch
 > Sent: Tuesday, February 28, 2006 10:50:18 PM
 > Subject: Re: [R] Help - lm, glm, aov results inconsistent with other stati 
 > stical package
 > 
 > Alright, I'll try to give some sample code.
 >  
 >  # create A with 2 levels - 2 and 4
 >  > A<-c(rep(2,times=30),rep(4,times=42)) 
 >  
 >  # make A a factor
 >  > A<-as.factor(A)  
 >  
 >#generate 72 random x points
 >  > x<-rnorm(72)   
 >  
 >   #create different slopes & intercepts for A=2 and A=4
 >  #add a random error term 
 >  > y<-c(x[(1:30)]+1,2*x[(31:72)]-2)+rnorm(72,mean=0,sd=2)   
 >  
 >  #use model y~A*x for lm (or glm)
 >  > test.lm<-lm(y~A*x)
 >  
 >  #check the output
 >  > summary(test.lm)
 >  
 >  This essentially creates something like my data set and uses the same 
 > model.  In response to (1), I was just using 0/1 because these are codings 
 > for the 2 levels, correct? (i.e., when A=2 the dummy variable=0, when A=4 
 > the dummy variable=1?).  In response to (2), yes, I do want different slopes 
 > for the two categories (that is what I am interested in testing).
 >  
 >  If I export the data created above to JMP and run what I believe to be the 
 > same model, I get a different answer for my equations :(
 >  
 >  
 > ---
 > Benjamin Ridenhour
 > School of Biological Sciences
 > Washington State University
 > P.O. Box 644236
 > Pullman, WA 99164-4236
 > Phone (509)335-7218
 >  
 > "Nothing in biology makes sense except in the light of evolution."
 > -T. Dobzhansky
 > 
 > 
 > - Original Message 
 > From: "Liaw, Andy" <[EMAIL PROTECTED]>
 > 
 > Sent: Tuesday, February 28, 2006 5:14:57 PM
 > Subject: RE: [R] Help - lm, glm, aov results inconsistent with other stati 
 > stical package
 > 
 > 1. You have levels(A) as "2" and "4", yet you showed equations for A=0 and
 > A=1?
 > 
 > 2. y = A + X + A*X means you're allowing the different groups of A to have
 > different slopes.  Probably not what you intended.
 > 
 > 3. It's probably best to provide a small sample of the data (and R code) so
 > we know how you got what you got.
 > 
 > Andy
 > 
 > From: Ben Ridenhour
 > > 
 > > Hello,
 > >  
 > >  I'm sure there must be a simple explanation for what I'm 
 > > doing wrong, but I am stumped.  I am a novice R user and this 
 > > has shaken my confidence in what I'm doing! I am trying to 
 > > run a simple ANCOVA using the model y~A*x, where y and x are 
 > > continuous and A has two levels.  Everything seems fine until 
 > > I compare the output with what I get from another statistical 
 > > package (JMP).  JMP has the model y=A+x+A*x (this should be 
 > > the same as what I specified to R, correct?).  In comparison 
 > > I get the line equations
 > >  
 > >  y = 7072.09-1024.94 x (for A=0) and
 > >  y = 7875.58-970.088 x (for A=1)
 > >  
 > >  from JMP.  And from R I get
 > >  
 > >  y = 6276.7-1259.8

Re: [R] Extracting variance components from lmer

2006-02-21 Thread Christoph Buser
Hi Rick 

There may be a better way, but the following should work: 
 
attributes(vc.fit)$sc
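Equivalently (a sketch; "sc" holds the residual standard deviation,
so square it if you want the variance):

attr(vc.fit, "sc")                  # same value via attr()
as.numeric(attr(vc.fit, "sc"))^2    # residual variance as a plain scalar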

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Rick DeShon writes:
 > Hi All.
 > 
 > I need a bit of help extracting the residual error variance from the VarCorr
 > structure from lmer.
 > 
 > #Here's a 2-way random effects model
 > lmer.1<- lmer(rating ~ (1|person)+(1|rater), data = dat)
 > 
 > #Get the structure
 > vc.fit <- VarCorr(lmer.1)
 > 
 > #results in.
 > $person
 > 1 x 1 Matrix of class "dpoMatrix"
 > (Intercept)
 > (Intercept)   0.7755392
 > 
 > $rater
 > 1 x 1 Matrix of class "dpoMatrix"
 > (Intercept)
 > (Intercept)   0.2054469
 > 
 > attr(,"sc")
 > [1] 0.5051518
 > 
 > #I can pull out the person and rater variance components easy enough. For
 > example...
 > vc.person <- [EMAIL PROTECTED]
 > 
 > I'm sure it's simple but I have not been able to grab the residual variance
 > in the last matrix.  I simply wish to asign the residual to a scalar
 > variable.  Any suggestions would be appreciated!
 > 
 > Thanks!
 > 
 > Rick DeShon
 > 
 > 
 > --
 > Rick DeShon
 > 306 Psychology Building
 > Department of Psychology
 > Michigan State University
 > East Lansing, MI 48824-1116
 > 
 >  [[alternative HTML version deleted]]
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] logical condition in vector operation

2006-02-09 Thread Christoph Buser
Dear Federico

From your example it is not clear to me what you would like to obtain.
Please have a look at the slightly changed example here (I
changed two values to show a potentially undesired side effect of
your coding).


test <- data.frame(rbind(c(3,3,10,21,0), c(2,3,11,12,0), c(3,4,12,23,0),
 c(3,5,13,24,0)))
names(test) <- c("x","y","p","d","su")
test

>>   x y  p  d su
>> 1 3 3 10 21  0
>> 2 2 3 11 12  0
>> 3 3 4 12 23  0
>> 4 3 5 13 24  0

j <- 3
test[test[,1] == j, 5] <- test[test[,1] == j,2] + test[test[,2] == j,1]

>> Warning message:
>> longer object length
>>  is not a multiple of shorter object length in: 
>> test[test[, 1] == j, 2] + test[test[, 2] == j, 1] 

Your code example now produces a warning for the adapted
data frame "test", since one tries to add two vectors of length 2
and 3, respectively. The result is based on recycling of the
shorter vector. In your example there was no warning, since the
second selection had only one entry.
The result with the adapted data frame is:

test
>>   x y  p  d su
>> 1 3 3 10 21  6
>> 2 2 3 11 12  0
>> 3 3 4 12 23  6
>> 4 3 5 13 24  8

Is that kind of recycling desired in your application? If not,
you should be careful with the coding example above.
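A defensive variant that fails loudly instead of recycling could look
like this (a sketch):

i1 <- which(test[,1] == j)
i2 <- which(test[,2] == j)
stopifnot(length(i1) == length(i2))   # stop instead of recycling
test[i1, 5] <- test[i1, 2] + test[i2, 1]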

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--






Federico Calboli writes:
 > HI All,
 > 
 > I have a data frame such as:
 > 
 > > test
 >  x y  p  d
 > [1,] 1 0 10 21 0
 > [2,] 2 3 11 12 0
 > [3,] 3 4 12 23 0
 > [4,] 3 5 13 24 0
 > 
 > 
 > and I want to perfor some operations on the first two coulums,
 > conditional on the uneqaulity values on the 3rd and 4th columns.
 > 
 > For instance:
 > 
 > j = 3
 > test[test[,1] == j, 5] = test[test[,1] == j,2] + test[test[,2] == j,1]
 > 
 > gives me the result:
 > 
 > test:
 > 
 >  x y  p  d
 > [1,] 1 0 10 21 0
 > [2,] 2 3 11 12 0
 > [3,] 3 4 12 23 6
 > [4,] 3 5 13 24 7
 > 
 > 
 > My problem is the following: I want to perform the operation
 > test[test[,1] == j,2] + test[test[,2] == j,1] only if the values of
 > column p and column d are different at the positions where x or y = j.
 > In practice, I don't want to perform the first operation because
 > test[2,4] is 12 and test[1,3] is 12 as well.
 > 
 > I tried an if statement with little success:
 > 
 > if(test[test[,1] == j,3] != test[test[,2] == j,4]){
 > test[test[,1] == j, 5] = test[test[,1] == j,2] + test[test[,2] == j,1]
 > }
 > Warning message:
 > the condition has length > 1 and only the first element will be used in:
 > if (test[test[, 1] == j, 3] != test[test[, 2] == j, 4]) {
 > 
 > Could anyone lend some advice?
 > 
 > Cheers,
 > 
 > Federico
 > -- 
 > Federico C. F. Calboli
 > Department of Epidemiology and Public Health
 > Imperial College, St Mary's Campus
 > Norfolk Place, London W2 1PG
 > 
 > Tel  +44 (0)20 7594 1602 Fax (+44) 020 7594 3193
 > 
 > f.calboli [.a.t] imperial.ac.uk
 > f.calboli [.a.t] gmail.com
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Mixed models and missing p-value...

2006-02-06 Thread Christoph Buser
Dear Simon

There was a discussion about degrees of freedom in linear mixed
models on the R-list last week. Have a look at

https://stat.ethz.ch/pipermail/r-help/2006-January/085560.html

and following. There have been earlier discussions about that
topic. Try also:

RSiteSearch("degree freedom mixed")


Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Simon Blanchet writes:
 > Dear R-users,
 > 
 > I computed a simple mixed models which was:
 > 
 >  mod<-lmer(nb ~ site + (1|patelle),tr)
 > 
 > The output was:
 > 
 > Linear mixed-effects model fit by REML
 > Formula: nb ~ site + (1 | patelle)
 > Data: tr
 >        AIC      BIC    logLik MLdeviance REMLdeviance
 >   1157.437 1168.686 -574.7184   1164.523     1149.437
 > Random effects:
 >   Groups   Name        Variance Std.Dev.
 >   patelle  (Intercept)   34.995   5.9157
 >   Residual              744.736  27.2899
 > # of obs: 123, groups: patelle, 33
 > 
 > Fixed effects:
 >  Estimate Std. Error t value
 > (Intercept)  60.3483 4.3929 13.7378
 > siteLCN -20.1969 7.8070 -2.5870
 > siteLCS -18.2154 6.1514 -2.9612
 > 
 > Correlation of Fixed Effects:
 >  (Intr) sitLCN
 > siteLCN -0.563
 > siteLCS -0.714  0.402
 > 
 > I don't understand why D.F. and p-values associated with the fixed-effects 
 > coefficients are missing.
 > Could anyone help me?
 > 
 > When I tried another model (mod2<-lmer(nb ~ site + 
 > (1|patelle),tr,family=poisson)), D.F. and p-values were given...
 > 
 > Thank you in advance.
 > 
 > Very sincerely, Simon
 > 
 > 
 > 
 > BLANCHET Simon
 > PhD student
 > Université Laval - Québec-Océan / CIRSA
 > Pavillon Alexandre-Vachon
 > Local 8022
 > Québec (Québec), Canada G1K 7P4
 > Téléphone : (418) 656-2131 poste 8022
 > courriel : [EMAIL PROTECTED]
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Randomised Block Design

2006-02-02 Thread Christoph Buser
Hi

try ?aov

That's a good starting point with references to other functions
in R and some literature.
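If it helps, here is a minimal sketch of a randomised block design
with two treatment factors and a block factor (artificial data; all
names are mine):

set.seed(1)
dat <- data.frame(y     = rnorm(24),
                  block = factor(rep(1:4, each = 6)),
                  A     = factor(rep(rep(1:2, each = 3), 4)),
                  B     = factor(rep(1:3, 8)))
summary(aov(y ~ A * B + Error(block), data = dat))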

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Krzysztof Suwada writes:
 > Hi, I'm studying math, and I have to make an analysis using a Randomised 
 > Block Design. I have two factors; I know how to do this in Statistica, but 
 > not in R. I read some manuals, but the only thing that I 
 > found was 2-factor ANOVA.
 > 
 > Could someone please help me, or give some useful links?
 > 
 > 
 > Krzytsztof Suwada
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Type II SS for fixed-effect in mixed model

2006-01-31 Thread Christoph Buser
Dear Julien

Have a look at the "type" argument in ?anova.lme
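E.g. (a sketch, assuming an lme fit called "fit"):

## anova(fit, type = "sequential")  # the default: sequential tests
## anova(fit, type = "marginal")    # marginal (Type III-like) tests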

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Martin Julien writes:
 > Hi
 > In mixed-model with lme()
 > How can I obtain Type II SS or Type III SS for fixed effect?
 > Thanks 
 > Julien
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] linear contrasts with anova

2006-01-23 Thread Christoph Buser
Dear Marco

If you are interested in a comparison of a reference level
against each other level (in your case level 1 against level 2
and level 1 against level 3), you can use the summary.lm(),
since this is the default contrast (see ?contr.treatment)

ar <- data.frame(GROUP = factor(rep(1:3, each = 8)),
 DIP = c(3.0, 3.0, 4.0, 5.0, 6.0, 7.0, 3.0, 2.0, 1.0, 6.0, 5.0,
   7.0, 2.0, 3.0, 1.5, 1.7, 17.0, 12.0, 15.0, 16.0, 12.0, 23.0,
   19.0, 21.0))


r.aov10 <- aov(DIP ~  GROUP, data = ar)
anova(r.aov10)
summary.lm(r.aov10)

As a result you will get the comparison of GROUP 2 against GROUP 1,
denoted GROUP2, and the comparison of GROUP 3 against GROUP 1,
denoted GROUP3.

Be careful: in your example you include both GROUP and C1 (or
C2, respectively) in your model. This results in an
overparameterized model, and you get a warning that not all
coefficients could be estimated, due to singularities.

It is possible to use contrasts other than contr.treatment,
contr.sum, contr.helmert, and contr.poly, but then you have to
specify them correctly in the model.

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Posta Univ. Cagliari writes:
 > I have some doubts about the validity of my procedure to estimate linear 
 > contrasts in a factorial design.
 > For the sake of simplicity, let's imagine a one-way ANOVA with three levels. I 
 > am interested in testing the significance of the difference between the first 
 > and third level (called here contrast C1) and between the first and 
 > second level (called here contrast C2). I used the following procedure:
 > 
 > 
 > --- reading data from a text file ---
 > 
 > > ar <-read.table("C:/Programmi/R/myworks/contrasti/cont1.txt",header=TRUE)
 > 
 > > ar
 >      CC GROUP
 > 1   3.0     0
 > 2   3.0     0
 > 3   4.0     0
 > 4   5.0     0
 > 5   6.0     0
 > 6   7.0     0
 > 7   3.0     0
 > 8   2.0     0
 > 9   1.0     1
 > 10  6.0     1
 > 11  5.0     1
 > 12  7.0     1
 > 13  2.0     1
 > 14  3.0     1
 > 15  1.5     1
 > 16  1.7     1
 > 17 17.0     2
 > 18 12.0     2
 > 19 15.0     2
 > 20 16.0     2
 > 21 12.0     2
 > 22 23.0     2
 > 23 19.0     2
 > 24 21.0     2
 > 
 >  
 > 
 > --- creating a new array of data---
 > 
 > > ar<-data.frame(GROUP=factor(ar$GROUP),DIP=ar$CC)
 > 
 > > ar
 >    GROUP  DIP
 > 1      0  3.0
 > 2      0  3.0
 > 3      0  4.0
 > 4      0  5.0
 > 5      0  6.0
 > 6      0  7.0
 > 7      0  3.0
 > 8      0  2.0
 > 9      1  1.0
 > 10     1  6.0
 > 11     1  5.0
 > 12     1  7.0
 > 13     1  2.0
 > 14     1  3.0
 > 15     1  1.5
 > 16     1  1.7
 > 17     2 17.0
 > 18     2 12.0
 > 19     2 15.0
 > 20     2 16.0
 > 21     2 12.0
 > 22     2 23.0
 > 23     2 19.0
 > 24     2 21.0
 > 
 >  
 > 
 > --- creating two dummy variables (C1 and C2) for linear 
 > contrasts---
 > 
 > > ar<-data.frame(GROUP=factor(ar$GROUP),C1=factor(ar$GROUP),C2=factor(ar$GROUP),DIP=ar$DIP)
 > 
 > > ar
 >    GROUP C1 C2  DIP
 > 1      0  0  0  3.0
 > 2      0  0  0  3.0
 > 3      0  0  0  4.0
 > 4      0  0  0  5.0
 > 5      0  0  0  6.0
 > 6      0  0  0  7.0
 > 7      0  0  0  3.0
 > 8      0  0  0  2.0
 > 9      1  1  1  1.0
 > 10     1  1  1  6.0
 > 11     1  1  1  5.0
 > 12     1  1  1  7.0
 > 13     1  1  1  2.0
 > 14     1  1  1  3.0
 > 15     1  1  1  1.5
 > 16     1  1  1  1.7
 > 17     2  2  2 17.0
 > 18     2  2  2 12.0
 > 19     2  2  2 15.0
 > 20     2  2  2 16.0
 > 21     2  2  2 12.0
 > 22     2  2  2 23.0
 > 23     2  2  2 19.0
 > 24     2  2  2 21.0
 > 
 >  
 > 
 > --- selecting the contrast levels---
 > 
 > > ar$C1 <- C(ar$C1, c(1,0,-1), how.many = 1)
 > 
 > > ar$C2 

Re: [R] F-test degree of freedoms in lme4 ?

2006-01-12 Thread Christoph Buser
Dear Wilhelm

There is an article (the first link below) with a part about
fitting linear mixed models, in which the problem with the
degrees of freedom is described.
You can have a look at the second link, too, which discusses the
problem as well.

http://cran.r-project.org/doc/Rnews/Rnews_2005-1.pdf
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/67414.html

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Wilhelm B. Kloke writes:
 > I have a problem moving from multistratum aov analysis to lmer.
 > 
 > My dataset has observations of ampl at 4 levels of gapf and 2 levels of bl
 > on 6 subjects levels VP, with 2 replicates wg each, and is balanced.
 > 
 > Here is the summary of this set with aov:
 > >> summary(aov(ampl~gapf*bl+Error(VP/(bl*gapf)),hframe2))
 > >
 > >Error: VP
 > >          Df Sum Sq Mean Sq F value Pr(>F)
 > >Residuals  5    531     106               
 > >
 > >Error: VP:bl
 > >          Df Sum Sq Mean Sq F value Pr(>F)   
 > >bl         1   1700    1700    37.8 0.0017 **
 > >Residuals  5    225      45                  
 > >---
 > >Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
 > >
 > >Error: VP:gapf
 > >          Df Sum Sq Mean Sq F value  Pr(>F)
 > >gapf       3    933     311    24.2 5.3e-06 ***
 > >Residuals 15    193      13                
 > >---
 > >Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
 > >
 > >Error: VP:bl:gapf
 > >          Df Sum Sq Mean Sq F value Pr(>F)  
 > >gapf:bl    3   93.9    31.3    3.68  0.036 *
 > >Residuals 15  127.6     8.5                 
 > >---
 > >Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
 > >
 > >Error: Within
 > >          Df Sum Sq Mean Sq F value Pr(>F)
 > >Residuals 48    318       7               
 > >
 > This is mostly identical to the analysis by BMDP 4V, except for the
 > Greenhouse-Geisser epsilons, which are not estimated this way.
 > 
 > I have to analyse a similar dataset, which is not balanced. So I need to
 > change the method. Following Pinheiro/Bates p.90f, I tried
 > > hf2.lme <- 
 > > lme(ampl~gapf*bl,hframe2,random=list(VP=pdDiag(~gapf*bl),bl=pdDiag(~gapf)))
 > and some variations of this to get the same F tests generated. At least,
 > I got the F-test on error stratum VP:bl this way, but not the other two:
 > >> anova(hf2.lme)
 > >            numDF denDF F-value p-value
 > >(Intercept)     1    78  764.86  <.0001
 > >gapf            3    78   17.68  <.0001
 > >bl              1     5   37.81  0.0017
 > >gapf:bl         3    78    2.99  0.0362
 > 
 > Then I tried to move to lmer.
 > I tried to find something equivalent to the above lme call, with no
 > success at all.
 > 
 > In case, that the problem is in the data, here is the set:
 > 
 > VP ampl wg bl gapf
 > 1 WJ 22 w s 144
 > 2 CR 23 w s 144
 > 3 MZ 25 w s 144
 > 4 MP 34 w s 144
 > 5 HJ 36 w s 144
 > 6 SJ 26 w s 144
 > 7 WJ 34 w s 80
 > 8 CR 31 w s 80
 > 9 MZ 33 w s 80
 > 10 MP 36 w s 80
 > 11 HJ 37 w s 80
 > 12 SJ 32 w s 80
 > 13 WJ 34 w s 48
 > 14 CR 37 w s 48
 > 15 MZ 38 w s 48
 > 16 MP 38 w s 48
 > 17 HJ 40 w s 48
 > 18 SJ 32 w s 48
 > 19 WJ 36 w s 16
 > 20 CR 40 w s 16
 > 21 MZ 39 w s 16
 > 22 MP 40 w s 16
 > 23 HJ 40 w s 16
 > 24 SJ 38 w s 16
 > 25 WJ 16 g s 144
 > 26 CR 28 g s 144
 > 27 MZ 18 g s 144
 > 28 MP 33 g s 144
 > 29 HJ 37 g s 144
 > 30 SJ 28 g s 144
 > 31 WJ 28 g s 80
 > 32 CR 33 g s 80
 > 33 MZ 24 g s 80
 > 34 MP 34 g s 80
 > 35 HJ 36 g s 80
 > 36 SJ 30 g s 80
 > 37 WJ 32 g s 48
 > 38 CR 38 g s 48
 > 39 MZ 34 g s 48
 > 40 MP 37 g s 48
 > 41 HJ 39 g s 48
 > 42 SJ 30 g s 48
 > 43 WJ 36 g s 16
 > 44 CR 34 g s 16
 > 45 MZ 36 g s 16
 > 46 MP 40 g s 16
 > 47 HJ 40 g s 16
 > 48 SJ 36 g s 16
 > 49 WJ 22 w b 144
 > 50 CR 24 w b 144
 > 51 MZ 20 w b 144
 > 52 MP 26 w b 144
 > 53 HJ 22 w b 144
 > 54 SJ 16 w b 144
 > 55 WJ 26 w b 80
 > 56 CR 24 w b 80
 > 57 MZ 26 w b 80
 > 58 MP 27 w b 80
 > 59 HJ 26 w b 80
 > 60 SJ 18 w b 80
 > 61 WJ 28 w b 48
 > 62 CR 23 w b 48
 > 63 MZ 28 w b 48
 > 64 MP 29 w b 48
 > 65 HJ 27 w b 48
 > 66 SJ 24 w b 48
 > 67 WJ 32 w b 16
 >

Re: [R] Difficulty with 'merge'

2006-01-05 Thread Christoph Buser
Dear Michael

Please note that merge creates all possible combinations if
you have repeated elements, as you can see in the example below. 

?merge

"... If there is more than one match, all possible matches
contribute one row each. ..."

Maybe you can first apply "aggregate" in a reasonable way on
your data.frame to summarize your repeated values into unique
ones and then proceed with merge, but that depends on your
problem. 

Regards,

Christoph

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

example with repeated values


v1 <- c("a", "b", "a", "b", "a")
n1 <- 1:5
v2 <- c("b", "b", "a", "a", "a")
n2 <- 6:10
(f1  <- data.frame(v1, n1))
(f2 <- data.frame(v2, n2))
(m12 <- merge(f1, f2, by.x = "v1", by.y = "v2", sort = F))
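A sketch of the aggregate-then-merge idea, using the names from the
example above (mean() as the summary is my arbitrary choice):

a1 <- aggregate(f1["n1"], by = list(v1 = f1$v1), FUN = mean)
a2 <- aggregate(f2["n2"], by = list(v1 = f2$v2), FUN = mean)
merge(a1, a2, by = "v1")   # now one row per key, no blow-up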





Michael Kubovy writes:
 > Dear R-helpers,
 > 
 > Happy New Year to all the helpful members of the list.
 > 
 > Here is the behavior I'm looking for:
 >  > v1 <- c("a","b","c")
 >  > n1 <- c(0, 1, 2)
 >  > v2 <- c("c", "a", "b")
 >  > n2 <- c(0, 1 , 2)
 >  > (f1  <- data.frame(v1, n1))
 >v1 n1
 > 1  a  0
 > 2  b  1
 > 3  c  2
 >  > (f2 <- data.frame(v2, n2))
 >v2 n2
 > 1  c  0
 > 2  a  1
 > 3  b  2
 >  > (m12 <- merge(f1, f2, by.x = "v1", by.y = "v2", sort = F))
 >v1 n1 n2
 > 1  c  2  0
 > 2  a  0  1
 > 3  b  1  2
 > 
 > Now to my data:
 >  > summary(pL)
 >  pairL
 > a fondo   :  41
 > alto  :  41
 > ampio :  41
 > angoloso  :  41
 > aperto:  41
 > appoggiato:  41
 > (Other)   :1271
 > 
 >  > pL$pairL[c(1,42)]
 > [1] appoggiato dentro
 > 37 Levels: a fondo alto ampio angoloso aperto appoggiato asimmetrico  
 > complicato convesso davanti dentro destra ... verticale
 > 
 >  > summary(oppN)
 >  pairL  pairR subject
 > LLLRR   M
 > a fondo   :  41   a galla:  41   S1 :  37   Min.   :0.3646
 > Min.   :0.02083   Min.   :0.0010   Min.   :0.
 > alto  :  41   acuto  :  41   S10:  37   1st Qu.:0.5521
 > 1st Qu.:0.37500   1st Qu.:0.1771   1st Qu.:0.1042
 > ampio :  41   arrotondato:  41   S11:  37   Median :0.6354
 > Median :0.47917   Median :0.2708   Median :0.2292
 > angoloso  :  41   basso  :  41   S12:  37   Mean   :0.6403
 > Mean   :0.46452   Mean   :0.2760   Mean   :0.2598
 > aperto:  41   chiuso :  41   S13:  37   3rd Qu.:0.7188
 > 3rd Qu.:0.55208   3rd Qu.:0.3750   3rd Qu.:0.3854
 > appoggiato:  41   compl  :  41   S14:  37   Max.   :0.9375
 > Max.   :0.92708   Max.   :0.6042   Max.   :0.7812
 > (Other)   :1271   (Other):1271   (Other): 
 > 1295  NA's   :3.   NA's   : 
 > 3.
 >asym polarpolar_a1  clust
 > Min.   :-0.   Min.   :-1.2410   Min.   :-2.949e+00   c1:492
 > 1st Qu.: 0.2091   1st Qu.: 0.4571   1st Qu.:-1.902e-01   c2:287
 > Median : 0.   Median : 1.1832   Median :-1.110e-16   c3: 82
 > Mean   : 0.6265   Mean   : 1.3428   Mean   :-5.745e-02   c4:246
 > 3rd Qu.: 0.9383   3rd Qu.: 2.0712   3rd Qu.: 1.168e-01   c5: 82
 > Max.   : 2.7081   Max.   : 4.6151   Max.   : 4.218e+00   c6:328
 > NA's   : 3.   NA's   : 3.000e+00
 > 
 >  > oppN$pairL[c(1,42)]
 > [1] spesso fine
 > 37 Levels: a fondo alto ampio angoloso aperto appoggiato asimmetrico  
 > complicato convesso davanti dentro destra ... verticale
 > 
 >  > unique(sort(oppM$pairL)) == unique(sort(pL$pairL))
 > [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  
 > TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 > [26] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
 > 
 > In other words I think that pL$pairL and oppN$pairL consist of 37  
 > blocks of 41 repetitions of names, and that these blocks are  
 > permutations of each other,
 > 
 > However:
 > 
 >  > summary(m1 <- merge(oppM, pairL, by.x = "pairL", by.y = "pairL",  
 > sort = F))
 >  pairL   pairR  subject 
 > LLLRR 

Re: [R] Newbie - Summarize function

2005-12-21 Thread Christoph Buser
Dear Andrew

Could you provide a reproducible example, please? Since we do
not have your data "test.csv", we cannot reproduce your code,
and that complicates bug fixing a lot.
You can use set.seed to create artificial reproducible examples.
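For instance (a sketch that only mimics the structure you describe;
all values are invented):

set.seed(1)
test <- data.frame(Stratum = rep(1:2, each = 8),
                   plot    = rep(rep(1:2, each = 4), 2),
                   top     = rep(c(2, 4, 2, 3), each = 4),
                   x       = rpois(16, 20))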

The only error in your code that I recognized:
 > > test.2 <- with(test,summarize(test$x,llist(test$Stratum,test$plot),g))

You have a llist(...) instead of list(...)

If this is only a copy-paste error, then you should provide a
reproducible example.
If it already solves the problem, you were lucky :-)

Regards,

Christoph 

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


[EMAIL PROTECTED] writes:
 > Dear R Users,
 > 
 > I have searched through the archives but I am still struggling to find a 
 > way to process the dataset below. I have a dataset that has stratum and 
 > plot identifiers. Within each plot there is a variable (top) stating the 
 > number of measurements that should be used to calculate the mean of the 
 > largest "top" elements within one of the vectors (x). I would like to 
 > process this summary statistic by groups. At this stage I have been trying 
 > to use the "summarize" function within the Hmisc library but I am getting 
 > the following error: "Error in eval(expr, envir, enclos) : numeric 'envir' 
 > arg not of length one In addition: Warning message: no finite arguments to 
 > max; returning -Inf".
 > 
 > Any suggestions on how I can fix this would be greatly appreciated.
 > 
 > Kind regards
 > 
 > Andrew
 > 
 > test <- read.table("test.csv", header=TRUE, sep=",")
 > #function to calculate mean of "top" elements within a plot
 > > g<-function(y) {
 > + top_no <-max(y$top)
 > + weight <- with(y,as.numeric(x>=x[order(x,decreasing=TRUE)[top_no]]))
 > + wtd.mean(y$x,weight)
 > + }
 > > g(test)
 > [1] 172.6667
 > #call to summarize function - use function g and summarise by stratum plot
 > > test.2 <- with(test,summarize(test$x,llist(test$Stratum,test$plot),g))
 > Error in eval(expr, envir, enclos) : numeric 'envir' arg not of length one
 > In addition: Warning message:
 > no finite arguments to max; returning -Inf 
 > 
 > >traceback() 
 > 9: eval(substitute(expr), data, enclos = parent.frame())
 > 8: with.default(y, as.numeric(x >= x[order(x, decreasing = 
 > TRUE)[top_no]]))
 > 7: with(y, as.numeric(x >= x[order(x, decreasing = TRUE)[top_no]]))
 > 6: FUN(X, ...)
 > 5: summarize(test$x, llist(test$Stratum, test$plot), g)
 > 4: eval(expr, envir, enclos)
 > 3: eval(substitute(expr), data, enclos = parent.frame())
 > 2: with.default(test, summarize(test$x, llist(test$Stratum, test$plot), 
 >g))
 > 1: with(test, summarize(test$x, llist(test$Stratum, test$plot), 
 >g))
 > 
 > The version im running on is 
 > 
 > $platform
 > [1] "i386-pc-mingw32"
 > 
 > $arch
 > [1] "i386"
 > 
 > $os
 > [1] "mingw32"
 > 
 > $system
 > [1] "i386, mingw32"
 > 
 > $status
 > [1] ""
 > 
 > $major
 > [1] "2"
 > 
 > $minor
 > [1] "2.0"
 > 
 > $year
 > [1] "2005"
 > 
 > $month
 > [1] "10"
 > 
 > $day
 > [1] "06"
 > 
 > $"svn rev"
 > [1] "35749"
 > 
 > $language
 > [1] "R"
 > 
 > > 
 > 
 > 
 > Stratum  plot  id  top  x
 > 1   1   1   2   12
 > 1   1   2   2   41
 > 1   1   3   2   12
 > 1   1   4   2   43
 > 1   1   5   2   12
 > 1   1   6   2   14
 > 1   1   7   2   43
 > 1   1   8   2   12
 > 1   2   1   4   42
 > 1   2   2   4   12
 > 1   2   3   4   432
 > 1   2   4   4   12
 > 1   2   5   4   12
 > 1   2   6   4   14
 > 1   2   7   4   41
 > 1   2   8   4   1
 > 2   1   1   2   12
 > 2   1   2   2   41
 > 2   1   3   2   12
 > 2   1   4   2   43
 > 2   1   5   2   12
 > 2   1   6   2   14
 > 2   1   7   2   43
 > 2   1   8   2   12
 > 2   2   1   3   42
 > 2   2 

Re: [R] Find main effect in 2-way ANOVA

2005-11-28 Thread Christoph Buser
Hi

It is not clear to me what your intention is. Could you
provide a reproducible example to show what you want to do?

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Gao Fay writes:
 > Hi,
 > 
 > I use anova() to find interaction effect and then I need to find data with 
 > main effect from those data with interaction effect. How to do that?
 > 
 > I used : anova(lm(data~factor1*factor2)), then select data with interaction 
 > effect. Then I need to select those data also with main effect of factor1 
 > or factor2, from the previously selected data. How to do that? Many thanks for 
 > your time on my question!
 > 
 > Fay
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Compiled name to object

2005-11-18 Thread Christoph Buser
Dear Claus

You can write

earlySessions[,"seconds"] instead of earlySessions$seconds and
with that syntax you can also use:

col1 <- "seconds"
earlySessions[,col1]

and you will get what you are looking for.
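A sketch of your full function with that change (untested,
assuming the data layout from your mail):

check <- function(frame, column) {
  sessionNo <- c("s01", "s02", "s03", "s04", "s05", "s06")
  early <- frame$session %in% sessionNo
  t.test(frame[early, column], frame[!early, column])
}
## check(testFrame, "seconds")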

Best regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Claus Atzenbeck writes:
 > Hi,
 > 
 > I have a tricky problem with composing a name that should represent an
 > object:
 > 
 > I want to pass a data frame and a string that contains a column name to
 > a function, e.g.:
 > 
 > check(testFrame, "seconds")
 > 
 > The function should now divide the testFrame into two subsets and call a
 > t-test on the numbers that are in the given column:
 > 
 > check <- function(frame, column) {
 >   sessionNo <- c("s01", "s02", "s03", "s04", "s05", "s06")
 >   earlySessions <- subset(frame, frame$session %in% sessionNo)
 >   lateSessions <- subset(frame, !frame$session %in% sessionNo)
 >   t.test(earlySessions$column, lateSessions$column)
 > }
 > 
 > How can I get R to change "earlySessions$column" to whatever was passed
 > as string, in the above example "earlySessions$seconds"?
 > 
 > Thanks for any hint.
 > Claus
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] R questions

2005-11-17 Thread Christoph Buser
Hi

There is a very good introductory manual for R at 

http://www.r-project.org/

under "Manuals", including an index of useful functions. For
example you will find the basic plots, how to sort vectors, and
so on.

Furthermore have a look at

?RSiteSearch

It is very useful to search the R archives for similar
problems. Lots of answers are already there and only wait to be
found. 
So there is a good chance that you will find the information
that you are looking for in these archives.
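For your third question, a minimal Box-Muller sketch (one direct
implementation; rnorm() would be the standard tool):

u1 <- runif(50); u2 <- runif(50)
z <- c(sqrt(-2 * log(u1)) * cos(2 * pi * u2),
       sqrt(-2 * log(u1)) * sin(2 * pi * u2))
sort(z)  ## 100 standard normal deviates, smallest to largest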

Best regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--




Yuying Shi writes:
 > Dear Sir/Madam,
 >  I am a beginner in R. Here is my questions.
 >  1. Can you give me one test for randomness (a name and descriptive
 > paragraph is sufficient).
 > 2. I have learned a uniform random number generator [e.g. not the
 > algorithms: i)Wichmann-Hill, ii) Marsaglia-Multicarry, iii) Super-Duper
 > (Marsaglia), iv) Mersenne-Twister, v) TAOCP-1997 (Knuth), or vi) TAOCP-2002
 > (Knuth)] . Is there any other method besides that?
 > 3. How to generate 100 random standard normal deviates using the Box-Muller
 > method for standard normal random deviates and sort the sequence, smallest
 > to largest?
 >   Your kind help is greatly appreciated. Thanks in advance!
 >  best wishes
 > yuying shi
 > 
 >  [[alternative HTML version deleted]]
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] solve the quadratic equation ax^2+bx+c=0

2005-11-07 Thread Christoph Buser
Hi

Have a look at ?polyroot. This might be helpful.
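For your vectors, a small sketch (assuming each triple a[i],
b[i], c[i] defines one equation a*x^2 + b*x + c = 0; polyroot()
takes the coefficients in increasing order):

a <- c(1, 1, 0, 0); b <- c(4, 4, 0, 0); cc <- c(3, 5, 5, 6)
for (i in seq_along(a)) {
  coef <- c(cc[i], b[i], a[i])
  coef <- coef[seq_len(max(which(coef != 0)))]  ## drop zero high-order terms
  if (length(coef) > 1) print(polyroot(coef)) else print("no root")
}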

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Yuying Shi writes:
 > If I have matrices as follows:
 > > a <- c(1,1,0,0)
 > > b <- c(4,4,0,0)
 > > c <- c(3,5,5,6)
 > How can I use R code to solve the equation ax^2+bx+c=0?
 > thanks!
 >  yuying shi
 > 
 >  [[alternative HTML version deleted]]
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Random effect models

2005-10-28 Thread Christoph Buser
Dear Jacques

You may be interested in the answer of Doug Bates to another
post. The question is different, but in the answer there is a
part about testing if a variance component may be zero:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/51080.html
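For illustration, a hedged sketch of such a comparison with lmer
(assuming the lme4 package and your mca2 data; note that the
p-value for a variance component on the boundary of the
parameter space is conservative):

library(lme4)
full <- lmer(Rendement ~ (1|Pollinisateur) + (1|Lignee) +
             (1|Pollinisateur:Lignee), data = mca2)
red  <- lmer(Rendement ~ (1|Pollinisateur) + (1|Lignee), data = mca2)
anova(full, red)  ## likelihood-ratio test for the interaction component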

Hope this helps.

Best regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--



Jacques VESLOT writes:
 > Dear R-users,
 > 
 > Sorry for reposting. I put it in another way :
 > 
 > I want to test random effects in this random effect model :
 > Rendement ~ Pollinisateur (random) + Lignee (random) + Pollinisateur:Lignee 
 > (random)
 > 
 > Of course :
 > summary(aov(Rendement ~ Pollinisateur * Lignee, data = mca2))
 > gives wrong tests for random effects.
 > 
 > But :
 > summary(aov1 <- aov(Rendement ~ Error(Pollinisateur * Lignee), data = mca2))
 > gives no test at all, and I have to do it with mean squares lying in 
 > summary(aov1).
 > 
 > With "lme4" package (I did'nt succeed in writing a working formula with lme 
 > from "nlme" package),
 > I can "see" standard deviations of random effects (I don't know how to find 
 > them) but I can't find F 
 > tests for random effects.
 > 
 > I only want to know if there is an easy way (a function ?) to do F tests for 
 > random effects in 
 > random effect models.
 > 
 > Thanks in advance.
 > 
 > Best regards,
 > 
 > Jacques VESLOT
 > 
 > 
 > Data and output are as follows :
 > 
 >  > head(mca2)
 >    Lignee Pollinisateur Rendement
 > 1      L1            P1      13.4
 > 2      L1            P1      13.3
 > 3      L2            P1      12.4
 > 4      L2            P1      12.6
 > 5      L3            P1      12.7
 > 6      L3            P1      13.0
 > 
 > 
 >  > names(mca2)
 > [1] "Lignee""Pollinisateur" "Rendement"
 > 
 >  > dim(mca2)
 > [1] 100   3
 > 
 >  > replications(Rendement ~ Lignee * Pollinisateur, data = mca2)
 >               Lignee        Pollinisateur Lignee:Pollinisateur
 >                   20                   10                    2
 > 
 >  > summary(aov1 <- aov(Rendement ~ Error(Pollinisateur * Lignee), data = 
 > mca2))
 > 
 > Error: Pollinisateur
 >Df  Sum Sq Mean Sq F value Pr(>F)
 > Residuals  9 11.9729  1.3303
 > 
 > Error: Lignee
 >Df  Sum Sq Mean Sq F value Pr(>F)
 > Residuals  4 18.0294  4.5074
 > 
 > Error: Pollinisateur:Lignee
 >Df Sum Sq Mean Sq F value Pr(>F)
 > Residuals 36 5.1726  0.1437
 > 
 > Error: Within
 >Df Sum Sq Mean Sq F value Pr(>F)
 > Residuals 50 3.7950  0.0759
 > 
 > 
 > # F tests :
 > 
 >  > Femp <- c(tab1[1:3, 3]/tab1[c(3,3,4), 3])
 >  > names(Femp) <- c("Pollinisateur", "Lignee", "Interaction")
 >  > Femp
 > Pollinisateur        Lignee   Interaction
 >   9.258709 31.370027  1.893061
 > 
 >  > 1 - pf(Femp, tab1[1:3,1], tab1[c(3,3,4),1])
 > Pollinisateur        Lignee   Interaction
 >   4.230265e-07  2.773448e-11  1.841028e-02
 > 
 > # Standard deviation :
 > 
 >  > variances <- c(c(tab1[1:3, 3] - tab1[c(3,3,4), 3]) / c(2*5, 2*10, 2), 
 > tab1[4,3])
 >  > names(variances) <- c(names(Femp), "Residuelle")
 >  > variances
 > Pollinisateur        Lignee   Interaction    Residuelle
 >    0.11866389    0.21818333    0.03389167        0.0759
 > 
 > # Using lmer :
 > 
 >  > library(lme4)
 >  > lme1 <- lmer(Rendement ~ (1|Pollinisateur) + (1|Lignee) + 
 > (1|Pollinisateur:Lignee), data = mca2)
 >  > summary(lme1)
 > Linear mixed-effects model fit by REML
 > Formula: Rendement ~ (1 | Pollinisateur) + (1 | Lignee) + (1 | 
 > Pollinisateur:Lignee)
 > Data: mca2
 >AIC  BIClogLik MLdeviance REMLdeviance
 >   105.3845 118.4104 -47.69227   94.35162 95.38453
 > Random effects:
 >   Groups               Name        Variance Std.Dev.
 >   Pollinisateur:Lignee (Intercept) 0.033892 0.18410
 >   Pollinisateur(Intercept) 0.118664 0.34448
 >   Lignee   (Intercept) 0.218183 0.46710
 >   Residual 0.075900 0.27550
 > # of obs: 100, groups: Pollinisateur:Lignee, 50; Pollinisateur, 10; Lignee, 5
 > 
 > Fixed effects:
 >  Estimate Std. Error DF t value  Pr(>|t|)
 > (Intercept) 12.60100    0.23862 99  52.808

Re: [R] Backtransforming regression coefficient for scaled covariate

2005-09-12 Thread Christoph Buser
Dear Gregor

The solution of Andres was correct, but in reproducing his
example you made a copy-paste error. In the model lm2 you should
scale your variable after the polynomial transformation

I(scale(x^2)) and not I(scale(x)^2)

Then the backtransformation in your example should work.
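A short sketch to verify it, reusing the quadratic example from
your code below:

set.seed(1)
e <- rnorm(n = 100, sd = 10)
x <- runif(n = 100, min = 1, max = 100)
y <- 2 + 2 * x - 0.01 * x^2 + e
lm2 <- lm(y ~ scale(x) + I(scale(x^2)))
coef(lm2)[2] / sd(x)    ## recovers b1 (approx.  2)
coef(lm2)[3] / sd(x^2)  ## recovers b2 (approx. -0.01)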

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Gorjanc Gregor writes:
 > Andres, this seems not to be the case. Look below 
 > the coefficients. They are not the same as in unscaled
 > regression.
 > 
 > R> (lm1 <- lm(y ~ x + I(x^2)))
 > 
 > Call:
 > lm(formula = y ~ x + I(x^2))
 > 
 > Coefficients:
 > (Intercept)x   I(x^2)  
 > 4.62069  1.78811 -0.00751  
 > 
 > R> ## Fit regression with transformed i.e. standardized covariate

Wrong line

 > R> (lm2 <- lm(y ~ scale(x) + I(scale(x)^2)))

Correct one

 > R> (lm2 <- lm(y ~ scale(x) + I(scale(x^2

 > 
 > Call:
 > lm(formula = y ~ scale(x) + I(scale(x)^2))
 > 
 > Coefficients:
 >   (Intercept)   scale(x)  I(scale(x)^2)  
 > 75.12  29.86  -6.21  
 > 
 > R> coef(lm2)[3]/sd(x^2)
 > I(scale(x)^2) 
 >-0.0020519 
 > 
 > R> coef(lm2)[2]/sd(x)
 > scale(x) 
 >   1.0384 
 > 
 > -Original Message-
 > From: Andres Legarra [mailto:[EMAIL PROTECTED]
 > Sent: Mon 2005-09-12 08:53
 > To: Gorjanc Gregor; r-help@stat.math.ethz.ch
 > Subject: Re: [R] Backtransforming regression coefficient for scaled covariate
 >  
 > [R] Backtransforming regression coefficient for scaled covariate
 > 
 > Your
 > covariate in the second part of the polynomial is x^2 and not x. Therefore
 > the transformation should be applied to x^2.
 > Like this:
 > (lm2 <- lm(y ~ scale(x) + I(scale(x^2)) )
 > then you would use
 > coef(lm2)[3]/sd(x^2)
 > 
 > Andres
 > --
 > Andres Legarra 
 > NEIKER
 > Apdo. 46 
 > Vitoria-Gasteiz 01080 Spain
 > --
 > 
 > 
 > 
 > - Original Message - 
 > From: Gorjanc Gregor
 > To: r-help@stat.math.ethz.ch
 > Sent: Sunday, September 11, 2005 10:25 PM
 > Subject: [R] Backtransforming regression coefficient for scaled covariate
 > 
 > 
 > Hello!
 > Scaling i.e. (x - mean(x)) / sd(x) of covariates in the model
 > can improve the efficiency of estimation. That is nice, but
 > sometimes one needs to report estimates for original scale. I
 > was able to backtransform estimates of linear regression quite
 > easily, but I got stumped on higher polynomials. Is there a general
 > rule that I am not aware of, or is my algebra so bad?
 > I apologize for the not purely R question, but I hope others will also
 > benefit. I attached the R code for the example below.
 > ## --- Generate data for linear regression --- 
 > e <- rnorm(n = 100, sd = 10)
 > x <- rnorm(n = 100, mean = 100, sd = 10)
 > b <- 3
 > mu <- 2
 > y <- mu + b * x + e
 > plot(y = y, x = x)
 > ## Fit linear regression
 > (lm1 <- lm(y ~ x))
 > ## Fit linear regression with transformed i.e. standardized covariate
 > (lm2 <- lm(y ~ scale(x)))
 > ## Backtransform estimate of regression coefficient
 > coef(lm2)[2] / sd(x)
 > ## --- Generate data for quadratic regression --- 
 > e <- rnorm(n = 100, sd = 10)
 > x <- runif(n = 100, min = 1, max = 100)
 > b1 <- 2
 > b2 <- -0.01
 > mu <- 2
 > y <- mu + b1 * x + b2 * x^2 + e
 > plot(y = y, x = x)
 > ## Fit regression
 > (lm1 <- lm(y ~ x + I(x^2)))
 > ## Fit regression with transformed i.e. standardized covariate
 > (lm2 <- lm(y ~ scale(x) + I(scale(x)^2)))
 > ## Backtransform estimates of regression coefficients
 > ## ??
 > Lep pozdrav / With regards,
 > Gregor Gorjanc
 > -- 
 > University of Ljubljana
 > Biotechnical FacultyURI: http://www.bfro.uni-lj.si/MR/ggorjan
 > Zootechnical Department mail: gregor.gorjanc  bfro.uni-lj.si
 > Groblje 3   tel: +386 (0)1 72 17 861
 > SI-1230 Domzale fax: +386 (0)1 72 17 888
 > Slovenia, Europe
 > -- 
 > "One must learn by doing the thing; for though you think you know it,
 >  you have no certainty until you try." Sophocles ~ 450 B.C.
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help

Re: [R] test for exponential,lognormal and gammadistribution

2005-09-11 Thread Christoph Buser
Hi Nadja

It depends on your purpose. Often people are using tests to show
that a sample follows a distribution (normal, exponential,
lognormal, ...).
If a test rejects the null hypothesis that the sample comes from
the specified distribution, you are on the safe side, since you
are controlling the significance level (e.g. 5%) and therefore
know the alpha error.
But if a test does not reject the null hypothesis, you have
generally NOT shown that the sample has the specified
distribution. This is related to the power of your test (to
detect differences).
If the power of a test is lousy, the conclusion that "your
sample has the distribution ...", based on the nonsignificant
test result, is misleading or even wrong.

As you mentioned below the kolmogorov-smirnov test does not
adapt for the fact that the parameters of the distribution you
test against are estimated from the data sample.
It assumes that the parameters are known. But in practice that's
generally not the case.
Since the parameters are estimated from the data, while the test
assumes that they are fixed, known quantities, the test is too
conservative and has small power to detect differences.
Therefore it is quite dangerous to conclude that a sample has a
specified distribution, based on the kolmogorov-smirnov test.

An alternative way might be using graphical tools, e.g. quantile
plots (see ?qqplot and ?qqnorm).
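For example, a hedged sketch for a gamma distribution (assuming
the MASS package for the parameter estimation):

library(MASS)
set.seed(1)
x   <- rgamma(50, shape = 2, rate = 0.5)  ## example data
fit <- fitdistr(x, "gamma")
qqplot(qgamma(ppoints(length(x)), shape = fit$estimate["shape"],
              rate = fit$estimate["rate"]),
       x, xlab = "theoretical quantiles", ylab = "sample quantiles")
abline(0, 1, col = 2)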
Obviously you have the same difficulty in interpreting the
plots, since nobody can tell you for sure if a deviation from
the straight line is significant or just due to chance.
But if you conclude that a sample has a distribution by looking
at the plot, you will be aware of this subjectivity, which cannot
be avoided.
The test result will often give you the wrong impression of
objectivity.
The best example to see this is if you have a very small
sample. In general any test has small power if your sample is
small, and it is most probable that the test is
nonsignificant. If we look at the quantile plot (with a small
sample) we often cannot judge if it is a straight line or not
(since the sample is too small), and in this case it is the
correct conclusion that we cannot say anything about the
distribution. 

I hope this will be helpful.

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


[EMAIL PROTECTED] writes:
 > 
 > hello!
 > i don't want to test my sample data for normality, but for an exponential, 
 > lognormal, 
 > or gamma distribution.
 > as i've learnt, the anderson-darling test in R is only for normality, and i am 
 > not supposed to use the kolmogorov-smirnov test of R with parameter estimates 
 > from sample data; is that true?
 > can you help me how to do this anyway!
 > thank you very much!
 > nadja
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 > 
 > 
 > !DSPAM:43219660106581956711619!

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Dummy variables model

2005-09-05 Thread Christoph Buser
Hi

If you'd like to fit a fixed effect model without random
effects, you can use lm() or aov() (see ?lm and ?aov). If your
variable is a factor (?factor) then you can specify your model
in lm() without coding all dummy variables.
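A minimal sketch (with made-up data; lm() creates the dummies
from the factor itself):

set.seed(1)
panel <- data.frame(y = rnorm(60), x = rnorm(60),
                    firm = factor(rep(1:6, each = 10)))
fit <- lm(y ~ x + firm, data = panel)  ## one dummy per firm except the first
summary(fit)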

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Tobias Muhlhofer writes:
 > Hi, all!
 > 
 > Anyone know an easy way to specify the following model.
 > 
 > Panel dataset, with stock through time, by firm.
 > 
 > I want to run a model of y on a bunch of explanatory variables, and one 
 > dummy for each firm, which is 1 for observations that come from firm i, 
 > and 0 everywhere else. I have over 200 firms (and a factor variable that 
 >   contains a firm identifier).
 > 
 > Any easy way of going about this, without having to define all these 
 > dummies? I checked lme() with random = ~ 1|firm, but the problem is that 
 > these are random effects, i.e. that there are firm-by-firm disturbance 
 > terms and overall disturbance terms, whereas I want just overall 
 > disturbance terms. This is generally called a "fixed effects" model, 
 > although it seems like the term "fixed effects" is being used somewhat 
 > differently in the context of the nlme package.
 > 
 > Toby
 > 
 > -- 
 > **
 > When Thomas Edison invented the light bulb he tried over 2000
 > experiments before he got it to work. A young reporter asked
 > him how it felt to have failed so many times. He said
 > "I never failed once. I invented the light bulb.
 > It just happened to be a 2000-step process."
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 > 
 > 
 > !DSPAM:431c4675196241771238468!

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Doubt about nested aov output

2005-09-05 Thread Christoph Buser
Hi

I think the problem is related to specifying "Tratamento" both
as a fixed factor and in the error term. I attached a script
with a reproducible example that shows a similar output.
I do not know the details of the original data and the questions
of interest, but maybe a model including "Tratamento" is more
what you wanted to implement.

Regards,

Christoph

## R-Script

library(nlme)

## Generating the data
set.seed(1)
ziel <- rep(c(-6,8,20), each = 40) + rep(rnorm(15, 0, 20), each = 4) +
  rep(rnorm(60, 0, 10), each = 2) + rnorm(120, 0, 3)
dat <- data.frame(y = ziel,
  fix = factor(rep(1:3, each = 40)),
  R1 = factor(rep(1:15, each = 8)),
  R2 = factor(rep(1:60, each = 2)))

## Model including "fix" as fixed and random effect.
res2 <- aov(y ~ fix + Error(fix/R1/R2), data = dat)
summary(res2)

reslme2 <- lme(y ~ fix , data = dat, random = ~ 1|fix/R1/R2)
summary(reslme2)
anova(reslme2)

## Model including "fix" as fixed factor.
res1 <- aov(y ~ fix + Error(R1/R2), data = dat)
summary(res1)

reslme <- lme(y ~ fix , data = dat, random = ~ 1|R1/R2)
summary(reslme)
anova(reslme)

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--



Spencer Graves writes:
 >Others may know the answer to your question, but I don't.  However, 
 > since I have not seen a reply, I will offer a few comments:
 > 
 >1.  What version of R are you using?  I just tried superficially 
 > similar things with the examples in ?aov in R 2.1.1 patched and 
 > consistently got F and p values.
 > 
 >2.  My preference for this kind of thing is to use lme in 
 > library(nlme) or lmer in library(lme4).  Also, I highly recommend 
 > Pinheiro and Bates (2000) Mixed-Effects Models in S and S-Plus (Springer).
 > 
 >    3.  If you still want to use aov and are getting this problem in R 2.1.1, 
 > could you please provide this list with a small, self contained example 
 > that displays the symptoms that concern you?  And PLEASE do read the 
 > posting guide "http://www.R-project.org/posting-guide.html".  It might 
 > increase the speed and utility of replies.
 > 
 >spencer graves
 > 
 > Ronaldo Reis-Jr. wrote:
 > 
 > > Hi,
 > > 
 > > I have two doubts about the nested aov output.
 > > 
 > > 1) I have this:
 > > 
 > >>anova.ratos <- aov(Glicogenio~Tratamento+Error(Tratamento/Rato/Figado))
 > >>summary(anova.ratos)
 > > 
 > > 
 > > Error: Tratamento
 > >Df  Sum Sq Mean Sq
 > > Tratamento  2 1557.56  778.78
 > > 
 > > Error: Tratamento:Rato
 > >   Df Sum Sq Mean Sq F value Pr(>F)
 > > Residuals  3 797.67  265.89   
 > > 
 > > Error: Tratamento:Rato:Figado
 > >   Df Sum Sq Mean Sq F value Pr(>F)
 > > Residuals 12  594.0    49.5   
 > > 
 > > Error: Within
 > >   Df Sum Sq Mean Sq F value Pr(>F)
 > > Residuals 18 381.00   21.17   
 > > 
 > > R doesn't make the F and P automatically; is it possible?
 > > 
 > > I Like an output like this:
 > > 
 > > Error: Tratamento
 > >Df  Sum Sq Mean Sq F value Pr(>F)
 > > Tratamento  2 1557.56  778.78   2.929 0.197
 > > 
 > > Error: Tratamento:Rato
 > >   Df Sum Sq Mean Sq F value Pr(>F)
 > > Residuals  3 797.67  265.89   5.372 0.0141  
 > > 
 > > Error: Tratamento:Rato:Figado
 > >   Df Sum Sq Mean Sq F value Pr(>F)
 > > Residuals 12  594.0    49.5   2.339 0.0503
 > > 
 > > Error: Within
 > >   Df Sum Sq Mean Sq F value Pr(>F)
 > > Residuals 18 381.00   21.17 
 > > 
 > > Why does it not make the calculations automatically? Is it possible?
 > > 
 > > 
 > > 2) Can I say that Error: Tratamento:Rato means an interaction between 
 > > Tratamento and Rato? Normally the : represents an interaction, but in this 
 > > output I think that it doesn't mean the interaction. 
 > > 
 > > Any explanation are welcome.
 > > 
 > > Thanks
 > > Ronaldo
 > 
 > -- 
 > Spencer Graves, PhD
 > Senior Development Engineer
 > PDF Solutions, Inc.
 > 333 West San Carlos Street Suite 700
 > San Jose, CA 95110, USA
 > 
 > [EMAIL PROTECTED]
 > www.pdf.com <http://www.pdf.com>
 > Tel:  408-938-4420
 > Fax: 408-280-7915
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 > 
 > 
 > !DSPAM:431b8510220677348368323!

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] plotting 3 functions on same graph

2005-08-02 Thread Christoph Buser
Hi 

Maybe matplot (see ?matplot) can help you.
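A minimal sketch (with made-up functions):

x <- seq(0, 2 * pi, length = 101)
matplot(x, cbind(sin(x), cos(x), sin(2 * x)), type = "l",
        lty = 1, col = 1:3, ylab = "y")  ## three curves in one plot

Alternatively, repeated calls of curve(..., add = TRUE) draw into
the same plot.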

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


[EMAIL PROTECTED] writes:
 > hi all,
 > 
 > I wish to draw on the same graphic device 3 functions.
 > 
 > But I don't want them to be on different graphs, I want to compare them on 
 > the same one.
 > 
 > I don't need mfrow or mfcol, I need something else... 
 > 
 > 1 graph on 1 device inside this graph 3 ploted function.
 > 
 > I saw something using data.frame, but I think it's overkill, and something 
 > less complicated must exist; if not, why?
 > 
 > why not plot(func1,func2,func3) ??
 > 
 > thks.
 > 
 > 
 > 
 > // Webmail Oreka : http://www.oreka.com
 > 
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] gamma distribution

2005-07-28 Thread Christoph Buser
Hi 

As Uwe mentioned, be careful about the difference between the
significance level alpha and the power of a test.

To do power calculations you should specify an alternative
hypothesis H_A. E.g., if we have two populations we want to
compare and we assume that they are normally distributed (equal
unknown variance for simplicity), we are interested in whether there is
a difference in the means and want to use the t.test.
Our Null hypothesis H_0: there is no difference in the means

To do a power calculation for our test, we first have to specify
an alternative H_A: the mean difference is 1 (unit).
Now, for a fixed number of observations, we can calculate the power
of our test, which in that case is the probability that (if the
true unknown difference is 1, meaning that H_A is true) our test
is significant. If I repeat the test many times (always
taking samples with a mean difference of 1), the number of
significant tests divided by the total number of tests is an
estimate of the power.


In your case the situation is a little bit more complicated. You
need to specify an alternative hypothesis.
In one of your first examples you drew samples from two gamma
distributions with different shape parameters and the same
scale. But by varying the shape parameter, the two distributions
differ not only in their mean but also in their form.
 
I got an email from Prof. Ripley in which he explained in
detail and very precisely some examples of tests and what they
are testing. It was in addition to the first posts about t tests
and the wilcoxon test. 
I attached the email below and recommend reading it carefully. It
might be helpful for you, too.

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--
 


From: Prof Brian Ripley <[EMAIL PROTECTED]>
To: Christoph Buser <[EMAIL PROTECTED]>
cc: "Liaw, Andy" <[EMAIL PROTECTED]>
Subject: Re: [R] Alternatives to t-tests (was Code Verification)
Date: Thu, 21 Jul 2005 10:33:28 +0100 (BST)

I believe there is rather more to this than Christoph's account.  The 
Wilcoxon test is not testing the same null hypothesis as the t-test, and 
that may very well matter in practice and it does in the example given.

The (default in R) Welch t-test tests a difference in means between two 
samples, not necessarily of the same variance or shape.  A difference in 
means is simple to understand, and is unambiguously defined at least if 
the distributions have means, even for real-life long-tailed 
distributions.  Inference from the t-test is quite accurate even a long 
way from normality and from equality of the shapes of the two 
distributions, except in very small sample sizes.  (I point my beginning 
students at the simulation study in `The Statistical Sleuth' by Ramsey and 
Schafer, stressing that the unequal-variance t-test ought to be the 
default choice as it is in R.  So I get them to redo the simulations.)

The Wilcoxon test tests a shift in location between two samples from 
distributions of the same shape differing only by location.  Having the 
same shape is part of the null hypothesis, and so is an assumption that 
needs to be verified if you want to conclude there is a difference in 
location (e.g. in means).  Even if you assume symmetric distributions (so 
the location is unambiguously defined) the level of the test depends on 
the shapes, tending to reject equality of location in the presence of 
difference of shape.  So you really are testing equality of distribution, 
both location and shape, with power concentrated on location-shift 
alternatives.

Given samples from gamma(shape=2) and gamma(shape=20) distributions, we 
know what the t-test is testing (equality of means).  What is the Wilcoxon 
test testing?  Something hard to describe and less interesting, I believe.

BTW, I don't see the value of the gamma simulation as this 
simultaneously changes mean and shape between the samples.  How about
checking holding the mean the same:

n <- 1000
z1 <- z2 <- numeric(n)
for (i in 1:n) {
   x <- rgamma(40, 2.5, 0.1)
   y <- rgamma(40, 10, 0.1*10/2.5)
   z1[i] <- t.test(x, y)$p.value
   z2[i] <- wilcox.test(x, y)$p.value
}
## Level
1 - sum(z1>0.05)/1000  ## 0.049
1 - sum(z2>0.05)/1000  ## 0.15

? -- the Wilcoxon test is shown to be a poor test of equality of means. 
Christoph's simulation shows that it is able to use difference in shape as 
well as location in the test of these two distributions, whereas the 
t-test is designed only to use the difference in means.  Why compare the 
power of two tests testing different null hypotheses?

I would say a very

Re: [R] gamma distribution

2005-07-28 Thread Christoph Buser
Hi 

Coming back again to the question of why you don't get identical
p-values for the untransformed and the transformed data:

I ran your script below and I always get 2 identical tests per
loop. In your text you are talking about the first 1000 values
for the untransformed and the next 1000 values for the
transformed data. 

But did you consider that in each loop there is a test for the
untransformed and one for the transformed data, so that the tests
are printed alternately? 
This might be a reason why you did not get equal results.

Hope this helps,

Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


[EMAIL PROTECTED] writes:
 > Hi R Users
 > 
 > 
 > This is a code I wrote and just want to confirm if the first 1000 values are 
 > raw
 > gamma (z) and the next 1000 values are transformed gamma (k) or not. As I get
 > 2000 rows once I import into excel, the p-values beyond 1000 don't look that
 > good, they are very high.
 > 
 > 
 > --
 > sink("a1.txt");
 > 
 > for (i in 1:1000)
 > {
 > x<-rgamma(10, 2.5, scale = 10)
 > y<-rgamma(10, 2.5, scale = 10)
 > z<-wilcox.test(x, y, var.equal = FALSE)
 > print(z)
 > x1<-log(x)
 > y1<-log(y)
 > k<-wilcox.test(x1, y1, var.equal = FALSE)
 > print(k)
 > }
 > 
 > ---
 > any suggestions are welcome
 > 
 > thanks
 > 
 > -devarshi
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] how to overlook the zero in the denominator

2005-07-27 Thread Christoph Buser
What about

x<-c(0,0,0.28,0.55,1.2,2,1.95,1.85,
 1.6,0.86,0.78,0.6,0.21,0.18)
y<-c(0,0,0,0.53,1.34,1.79,2.07,1.88,
1.52,0.92,0.71,0.55,0.32,0.19)

sum(((x-y)^2/x)[x!=0])

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Chun-Ying Lee writes:
 > Dear R users:
 > 
 > I have two set of data, as follow:
 > x<-c(0,0,0.28,0.55,1.2,2,1.95,1.85,
 >  1.6,0.86,0.78,0.6,0.21,0.18)
 > y<-c(0,0,0,0.53,1.34,1.79,2.07,1.88,
 > 1.52,0.92,0.71,0.55,0.32,0.19)
 > i<-1:length(x)
 > 
 > I want to sum each (x[i]-y[i])^2/x[i] together, 
 > like:
 > >Sum <-sum((x[i]-y[i])^2/x[i])
 > >Sum
 > [1] NaN
 > 
 > Because the denominator should not be zero.
 > So I want to overlook those when x[i]=0,
 > and just to sum those x[i] not equal to 0.
 > What should I do?
 > Any suggestion.
 > Thanks in advance !!
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] gamma distribution

2005-07-27 Thread Christoph Buser
Hi

I am a little bit confused. You create two samples (from a gamma
distribution) and you do a wilcoxon test with these two samples.
Then you use the same monotone transformation (log) for both
samples and redo the wilcoxon test.
But since the transformation keeps the order of your samples,
the second wilcoxon test is identical to the first one:

x<-rgamma(10, 2.5, scale = 10)
y<-rgamma(10, 2.5, scale = 10)
wilcox.test(x, y, var.equal = FALSE)
x1<-log(x)
y1<-log(y)
wilcox.test(x1, y1, var.equal = FALSE)

Maybe you can give some more details about the hypothesis you'd
like to test.

Regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--



[EMAIL PROTECTED] writes:
 > Hi R Users
 > 
 > 
 > This is a code I wrote and just want to confirm if the first 1000 values are 
 > raw
 > gamma (z) and the next 1000 values are transformed gamma (k) or not. As I get
 > 2000 rows once I import into excel, the p-values beyond 1000 don't look that
 > good, they are very high.
 > 
 > 
 > --
 > sink("a1.txt");
 > 
 > for (i in 1:1000)
 > {
 > x<-rgamma(10, 2.5, scale = 10)
 > y<-rgamma(10, 2.5, scale = 10)
 > z<-wilcox.test(x, y, var.equal = FALSE)
 > print(z)
 > x1<-log(x)
 > y1<-log(y)
 > k<-wilcox.test(x1, y1, var.equal = FALSE)
 > print(k)
 > }
 > 
 > ---
 > any suggestions are welcome
 > 
 > thanks
 > 
 > -devarshi
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Question about 'text' (add lm summary to a plot)

2005-07-21 Thread Christoph Buser
Dear Dan

I can only help you with your third problem, expression and
paste. You can use:

plot(1:5,1:5, type = "n")
text(2,4,expression(paste("Slope : ", 3.45%+-%0.34, sep = "")), pos = 4)
text(2,3.8,expression(paste("Intercept : ", -10.43%+-%1.42)), pos = 4)
text(2,3.6,expression(paste(R^2,": ", "0.78", sep = "")), pos = 4)

I do not have an elegant solution for the alignment.
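One workaround (a hedged sketch: fixed-width formatting via
sprintf plus a monospaced font, at the price of losing the
plotmath symbols):

lab <- c(sprintf("Slope     : %7.2f +/- %4.2f",   3.45, 0.34),
         sprintf("Intercept : %7.2f +/- %4.2f", -10.43, 1.42),
         sprintf("R^2       : %7.2f",             0.78))
plot(1:5, 1:5, type = "n")
text(2, c(4, 3.8, 3.6), lab, pos = 4, family = "mono")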

Regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Dan Bolser writes:
 > 
 > I would like to annotate my plot with a little box containing the slope,
 > intercept and R^2 of a lm on the data.
 > 
 > I would like it to look like...
 > 
 >  ++
 >  | Slope :   3.45 +- 0.34 |
 >  | Intercept : -10.43 +- 1.42 |
 >  | R^2   :   0.78 |
 >  ++
 > 
 > However I can't make anything this neat, and I can't find out how to
 > combine this with symbols for R^2 / +- (plus minus).
 > 
 > Below is my best attempt (which is frankly quite poor). Can anyone
 > improve on the below?
 > 
 > Specifically, 
 > 
 > aligned text and numbers, 
 > aligned decimal places, 
 > symbol for R^2 in the text (expression(R^2) seems to fail with
 > 'paste') and +- 
 > 
 > 
 > 
 > Cheers,
 > Dan.
 > 
 > 
 > dat.lm <- lm(dat$AVG_CH_PAIRS ~ dat$CHAINS)
 > 
 > abline(coef(dat.lm),lty=2,lwd=1.5)
 > 
 > 
 > dat.lm.sum <- summary(dat.lm)
 > dat.lm.sum
 > 
 > attributes(dat.lm.sum)
 > 
 > my.text.1 <-
 >   paste("Slope : ", round(dat.lm.sum$coefficients[2],2),
 > "+/-",  round(dat.lm.sum$coefficients[4],2))
 > 
 > my.text.2 <-
 >   paste("Intercept : ", round(dat.lm.sum$coefficients[1],2),
 > "+/-",  round(dat.lm.sum$coefficients[3],2))
 > 
 > my.text.3 <-
 >   paste("R^2 : ",   round(dat.lm.sum$r.squared,2))
 > 
 > my.text.1
 > my.text.2
 > my.text.3
 > 
 > 
 > ## Add legend
 > text(x=3,
 >  y=300,
 >  paste(my.text.1,
 >my.text.2,
 >my.text.3,
 >sep="\n"),
 >  adj=c(0,0),
 >  cex=1)
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Code Verification

2005-07-20 Thread Christoph Buser
Dear Andy

Since the power of the t-test decreases when there are
discrepancies between the data and the normal distribution,
while the wilcoxon test loses only little power if the data are
normally distributed, the only reason to use the t.test is its
simplicity and the easier interpretation.
Generally I'd prefer the wilcoxon test even if the data are
normally distributed.
But I agree that for a gamma distribution there is no huge loss
of power.

## example simulation:

n <- 1000
z1 <- numeric(n)
z2 <- numeric(n)

## gamma distribution
for (i in 1:n)
{
x<-rgamma(40, 2.5, 0.1)  
y<-rgamma(40, 3.5, 0.1)  
z1[i]<-t.test(x, y)$p.value
z2[i]<-wilcox.test(x, y)$p.value
}
## Power
1 - sum(z1>0.05)/1000  ## 0.71
1 - sum(z2>0.05)/1000  ## 0.76

##

## t distribution
for (i in 1:n)
{
x<-rt(40, df = 3)  
y<-1 + rt(40, df = 3)  
z1[i]<-t.test(x, y)$p.value
z2[i]<-wilcox.test(x, y)$p.value
}
## Power
1 - sum(z1>0.05)/1000  ## 0.76
1 - sum(z2>0.05)/1000  ## 0.91

##

## normal distribution
for (i in 1:n)
{
x<-rnorm(40, 0, 3)  
y<-1 + rnorm(40, 1, 3)  
z1[i]<-t.test(x, y)$p.value
z2[i]<-wilcox.test(x, y)$p.value
}
## Power
1 - sum(z1>0.05)/1000  ## 0.83
1 - sum(z2>0.05)/1000  ## 0.81

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
------

Liaw, Andy writes:
 > > From: Christoph Buser
 > > 
 > > Hi
 > > 
 > > "t.test" assumes that your data within each group has a normal
 > > distribution. This is not the case in your example.
 > 
 > Eh?  What happened to the CLT?
 > 
 > Andy
 > 
 > 
 > > I would recommend a non-parametric test like "wilcox.test" if
 > > you want to compare the means of two samples that are not
 > > normally distributed. 
 > > see ?wilcox.test
 > > 
 > > Be careful. Your example produces two gamma distributed samples
 > > with rate = 10, not scale = 10. 
 > > rate = 1/scale.
 > > If you want to use scale, you need to specify this argument 
 > > x<-rgamma(40, 2.5, scale = 10)
 > > see ?rgamma
 > > 
 > > I do not see the interpretation of your result. Since you do
 > > know the distribution and the parameters of your sample, you
 > > know the true means and that they are different. It is only a
 > > question of the sample size and the power of your test whether this
 > > difference is detected.
 > > Is that something you are investigating? Maybe a power
 > > calculation or something similar.
 > > 
 > > Regards,
 > > 
 > > Christoph Buser
 > > 
 > > --
 > > Christoph Buser <[EMAIL PROTECTED]>
 > > Seminar fuer Statistik, LEO C13
 > > ETH (Federal Inst. Technology) 8092 Zurich  SWITZERLAND
 > > phone: x-41-44-632-4673fax: 632-1228
 > > http://stat.ethz.ch/~buser/
 > > --
 > > 
 > > [EMAIL PROTECTED] writes:
 > >  > Hi R Users
 > >  > I have a code which I am running for my thesis work. Just 
 > > want to make sure that
 > >  > its ok. Its a t test I am conducting between two gamma 
 > > distributions with
 > >  > different shape parameters.
 > >  > 
 > >  > the code looks like:
 > >  > 
 > >  > sink("a1.txt");
 > >  > 
 > >  > for (i in 1:1000)
 > >  > {
 > >  > x<-rgamma(40, 2.5, 10)  # n = 40, shape = 2.5, Scale = 10
 > >  > y<-rgamma(40, 2.8, 10)  # n = 40, shape = 2.8, Scale = 10
 > >  > z<-t.test(x, y)
 > >  > print(z)
 > >  > }
 > >  > 
 > >  > 
 > >  > I will appreciate it if someone could tell me if its alrite or not.
 > >  > 
 > >  > thanks
 > >  > 
 > >  > -dev
 > >  > 
 > >  > __
 > >  > R-help@stat.math.ethz.ch mailing list
 > >  > https://stat.ethz.ch/mailman/listinfo/r-help
 > >  > PLEASE do read the posting guide! 
 > > http://www.R-project.org/posting-guide.html
 > > 
 > > 
 > > __
 > > R-help@stat.math.ethz.ch mailing list
 > > https://stat.ethz.ch/mailman/listinfo/r-help
 > > PLEASE do read the posting guide! 
 > > http://www.R-project.org/posting-guid

Re: [R] Code Verification

2005-07-20 Thread Christoph Buser
Hi

"t.test" assumes that your data within each group has a normal
distribution. This is not the case in your example.
I would recommend you a non parametric test like "wilcox.test" if
you want to compare the mean of two samples that are not normal
distributed. 
see ?wilcox.test

Be careful. Your example produces two gamma distributed samples
with rate = 10, not scale = 10. 
rate = 1/scale.
If you want to use scale, you need to specify this argument 
x<-rgamma(40, 2.5, scale = 10)
see ?rgamma

I do not see the interpretation of your result. Since you do
know the distribution and the parameters of your sample, you
know the true means and that they are different. It is only a
question of the sample size and the power of your test whether this
difference is detected.
Is that something you are investigating? Maybe a power
calculation or something similar.

Regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

[EMAIL PROTECTED] writes:
 > Hi R Users
 > I have a code which I am running for my thesis work. Just want to make sure 
 > that
 > it's ok. It's a t-test I am conducting between two gamma distributions with
 > different shape parameters.
 > 
 > the code looks like:
 > 
 > sink("a1.txt");
 > 
 > for (i in 1:1000)
 > {
 > x<-rgamma(40, 2.5, 10)  # n = 40, shape = 2.5, Scale = 10
 > y<-rgamma(40, 2.8, 10)  # n = 40, shape = 2.8, Scale = 10
 > z<-t.test(x, y)
 > print(z)
 > }
 > 
 > 
 > I will appreciate it if someone could tell me if its alrite or not.
 > 
 > thanks
 > 
 > -dev
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] initial points for arms in package HI

2005-07-20 Thread Christoph Buser
Dear Martyn

Thank you for your fast and helpful answer. It pointed out
my wrong thinking.

I confused the previous value of the Markov Chain with the
initial values to construct the envelope.

In the original C code by W. Gilks there are the arguments
"xinit" and "xprev". The first is used to construct the
envelope, the second is the previous value of the Markov chain.
There are comments from the author on how to set the initial values
"xinit". It is important to choose these values independently of
the current parameter being updated (in the case of
non-log-concave functions).

In the R code and the C code behind the function arms the
argument "xinit" is missing, since it is set in the function
itself and the user cannot change it.

Since I confused the two arguments I didn't realize that
"y.start" is the value "xprev", the state of the Markov chain,
and set this value to 1 or runif(...). But for an implementation of
a Gibbs sampler this makes no sense.

My example was a little bit misleading, since I simulated 1000
points from the same distribution. In reality my distribution
is changing at each step and I always need only 1 new value in
my Gibbs Sampler. 
But by implementing it correctly with the previous value of the
chain it should work in my context.

Best regards,

Christoph


[EMAIL PROTECTED] writes:
 > Quoting Christoph Buser <[EMAIL PROTECTED]>:
 > 
 > > Dear R-users
 > >
 > > I have a problem choosing initial points for the function arms()
 > > in the package HI
 > > I intend to implement a Gibbs sampler and one of my conditional
 > > distributions is nonstandard and not logconcave.
 > > Therefore I'd like to use arms.
 > >
 > > But there seem to be a strong influence of the initial point
 > > y.start. To show the effect I constructed a demonstration
 > > example. It is reproducible without further information.
 > >
 > > Please note that my target density is not logconcave.
 > >
 > > Thanks for all comments or ideas.
 > >
 > > Christoph Buser
 > 
 > Dear Christoph,
 > 
 > There is a Metropolis step at each iteration of the ARMS sampler, in
 > which it may choose to reject the proposed move to a new point and stick
 > at the current point (This is what the "M" in "ARMS" stands for)  If you
 > do repeated calls to arms with the same starting point, then the
 > iterations where the Metropolis step rejects a move will create a spike
 > in the sample density at your initial value. If you use a uniform random
 > starting point, then your sample density will be a mixture of the
 > target distribution (Metropolis accepts move) and a uniform distribution
 > (Metropolis rejects move).
 > 
 > You should be doing something like this:
 > 
 > res1 <- arms(runif(1,0,100), logDichteGam, function(x) (x>0)&(x<100), 1)
 > for(i in 2:1000)
 >   res1[i] <- arms(res1[i-1], logDichteGam, function(x) (x>0)&(x<100), 1)
 > 
 > i.e., using each sampled point as the starting value for the next
 > iteration.  The sequence of values in res1 will then be a correlated
 > sample from the given distribution:
 > 
 > acf(res1)
 > 
 > The bottom line is that you can't use ARMS to draw a single sample
 > from a non-log-concave density.
 > 
 > If you are still worried about using ARMS, you can verify your results
 > using the random walk Metropolis sampler (MCMCmetrop1R) in the package
 > MCMCpack.
 > 
 > Martyn
 > 
 > > ## R Code:
 > >
 > > library(HI)
 > > ## parameter for the distribution
 > > para <- 0.1
 > >
 > > ## logdensity
 > > logDichteGam <- function(x, u = para, v = para) {
 > >   -(u*x + v*1/x) - log(x)
 > > }
 > > ## density except for the constant
 > > propDichteGam <- function(x, u = para, v = para) {
 > >   exp(-(u*x + v*1/x) - log(x))
 > > }
 > > ## calculating the constant
 > > (c <- integrate(propDichteGam, 0, 1000, rel.tol = 10^(-12))$value)
 > > ## density
 > > DichteGam <- function(x, u = para, v = para) {
 > >   exp(-(u*x + v*1/x) - log(x))/c
 > > }
 > >
 > > ## calculating 1000 values by repeating a call of arms (this would
 > > ## be the situation in an Gibbs Sample. Of course in a Gibbs sampler
 > > ## the distribution would change. This is only for demonstration
 > > res1 <- NULL
 > > for(i in 1:1000)
 > >   res1[i] <- arms(runif(1,0,100), logDichteGam, function(x) (x>0)&(x<100), 
 > > 1)
 > >
 > > ## Generating a sample of thousand observations with 1 call of arms
 > > res2 <- arms(runif(1,0,100), logDichteGam, function(x) (x>0)&(x<100), 1000

[R] initial points for arms in package HI

2005-07-19 Thread Christoph Buser
Dear R-users

I have a problem choosing initial points for the function arms()
in the package HI
I intend to implement a Gibbs sampler and one of my conditional
distributions is nonstandard and not logconcave.
Therefore I'd like to use arms.

But there seems to be a strong influence of the initial point
y.start. To show the effect I constructed a demonstration
example. It is reproducible without further information.  

Please note that my target density is not logconcave.

Thanks for all comments or ideas.

Christoph Buser

## R Code:

library(HI)
## parameter for the distribution
para <- 0.1

## logdensity
logDichteGam <- function(x, u = para, v = para) {
  -(u*x + v*1/x) - log(x)
}
## density except for the constant
propDichteGam <- function(x, u = para, v = para) {
  exp(-(u*x + v*1/x) - log(x))
}
## calculating the constant 
(c <- integrate(propDichteGam, 0, 1000, rel.tol = 10^(-12))$value)
## density
DichteGam <- function(x, u = para, v = para) {
  exp(-(u*x + v*1/x) - log(x))/c
}

## calculating 1000 values by repeating a call of arms (this would
## be the situation in a Gibbs sampler; of course in a Gibbs sampler
## the distribution would change. This is only for demonstration)
res1 <- NULL
for(i in 1:1000)
  res1[i] <- arms(runif(1,0,100), logDichteGam, function(x) (x>0)&(x<100), 1)

## Generating a sample of thousand observations with 1 call of arms
res2 <- arms(runif(1,0,100), logDichteGam, function(x) (x>0)&(x<100), 1000)

## Plot of the samples 
par(mfrow = c(2,2))  ## or mult.fig(4) from package sfsmisc
plot(res1, log = "y")
plot(res2, log = "y")
hist(res1, freq = FALSE, xlim = c(0,4), breaks = seq(0,100,by = 0.1),
 ylim = c(0,1))
curve(DichteGam, 0,4, add = TRUE, col = 2)
hist(res2, freq = FALSE, xlim = c(0,4), breaks = seq(0,100,by = 0.1),
 ylim = c(0,1))
curve(DichteGam, 0,4, add = TRUE, col = 2)


## If we repeat the procedure, using the fix intial value 1,
## the situation is even worse
res3 <- NULL
for(i in 1:1000)
  res3[i] <- arms(1, logDichteGam, function(x) (x>0)&(x<100), 1)

## Generating a sample of thousand observations with 1 call of arms
res4 <- arms(1, logDichteGam, function(x) (x>0)&(x<100), 1000)

## Plot of the samples 
par(mfrow = c(2,2))
plot(res3, log = "y")
plot(res4, log = "y")
hist(res3, freq = FALSE, xlim = c(0,4), breaks = seq(0,100,by = 0.1),
 ylim = c(0,1))
curve(DichteGam, 0,4, add = TRUE, col = 2)
hist(res4, freq = FALSE, xlim = c(0,4), breaks = seq(0,100,by = 0.1),
 ylim = c(0,1))
curve(DichteGam, 0,4, add = TRUE, col = 2)


## If I generate the sample in a for-loop (one by one) I do not
## get the correct density. But this is exactly the situation in 
## my Gibbs Sampler. Therefore I am concerned about the correct 
## application of arms

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Predict

2005-07-19 Thread Christoph Buser
Dear Matthias

Can you provide an example to demonstrate what you did? Two
remarks about your email; maybe they already answer your question.

1) Using predict() you will get the estimated value for each
   observation or for new data. You can reproduce this value by
   using the coefficients from your estimated model (see the
   example below). 
   For the interval you can get a confidence interval for the
   expected value under fixed conditions of the explanatory
   variables or you can obtain a prediction interval for a
   single new observation. The latter is of course wider, since
   you try to catch a single observation and not the expected
   value. 

2) Using confint() you will get the estimated parameters (which
   are random variables, too) and their confidence interval. 
   You can use the estimated values to calculate the predicted
   values.

But you can NOT use the upper values from confint to
estimate the upper values from predict by just putting them into
your regression model. That's not how confidence
intervals are constructed.
(I am not sure if this was your intention. Maybe if you show a
reproducible example you can correct me if you meant something
different.) 

## R Code
## Creation of a dataframe 
set.seed(1)
x1 <- runif(40)
f1 <- rep(c("a", "b", "c","d"), each = 10)
y <- 2*x1 + rep(c(0.5, 0.1, -0.6, 1.5), each = 10) + rnorm(40, 0, 2)
dat <- data.frame(y = y, f1 = f1, x1 = x1)
## regression model
reg <- lm(y~ x1 + f1, data = dat)
summary(reg)

confint(reg)
predict(reg, type=c("response"), interval = "confidence")

## calculation of predicted values using the estimated
## coefficients 

## estimated coefficients
co <- summary(reg)$coefficients[,"Estimate"]
## Using the regression model with that coefficients
## for observation 11 
co["(Intercept)"] + dat[11,"x1"]*co["x1"] + co["f1b"]
## prediction of observation 11 
predict(reg, type=c("response"))[11]


Regards,

Christoph 

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--



Matthias Eggenberger writes:
 > When I calculate a linear model, I can compute the confidence
 > intervals via confint. The interval level can be chosen. As a result, I get
 > the parameters of the model according to the interval level. 
 > 
 > On the other hand, I can compute the prediction values for my model as well
 > with predict(object, type=c("response")) etc. Here I also have the
 > possibility to choose a level for the confidence intervals. The output is
 > the calculated values for the fit, the lower and the upper level. 
 > 
 > The problem now is that when I calculate the values through the linear
 > model function with the parameter values I get from confint() and I compare
 > them with the values I get from predict(), these values differ extremely. Why
 > is that so? Does the command predict() calculate the values through another
 > routine? That means the command predict() doesn't use the same parameters to
 > calculate the prediction values as the ones given by confint()?
 > 
 > Greetings Matthias
 > 
 > -- 
 > GMX DSL = Maximale Leistung zum minimalen Preis!
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Contingency-Coefficient, Variance

2005-07-18 Thread Christoph Buser
Hi

You can easily implement it by:

Mag. Ferri Leberl writes:
 > Dear Everybody!
 > Excuse me for these newbie questions, but I have not found answers to the 
 > following questions:
 > 
 > - Is there a command calculating the variance if the whole group is known 
 > (thus dividing by n instead of n-1)?

var(x)*(n-1)/n
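
Wrapped as a small function (a sketch; the name pvar is only
illustrative):

pvar <- function(x) {
  n <- length(x)
  var(x)*(n-1)/n   ## divide by n instead of n-1
}
pvar(1:10)   ## 8.25, whereas var(1:10) is 9.1667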

 > 
 > - Which is the command to calculate the
 >   contingency coefficient?

Have a look at ?chisq.test
There you can calculate and extract the chi-square statistic and
then it is easy to program the needed formula.
Please note that there is an argument "correct" in
chisq.test(). You can set it to FALSE if you do not want to
apply the Yates continuity correction.
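
For example (a sketch with a made-up 2x2 table, using Pearson's
contingency coefficient C = sqrt(X^2/(X^2 + n))):

tab <- matrix(c(10, 20, 30, 25), nrow = 2)
X2 <- chisq.test(tab, correct = FALSE)$statistic
n <- sum(tab)
unname(sqrt(X2/(X2 + n)))   ## contingency coefficient C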

Regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


 > 
 > Thank you in advance.
 > Mag. Ferri Leberl
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] maps drawing

2005-07-14 Thread Christoph Buser
There is also the package maptools if you want or need to read
ESRI shapefiles.
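
For plain coastlines and boundaries, the maps package itself may
already be enough (a minimal sketch):

library(maps)
map("world")               ## coastlines and country boundaries
map("rivers", add = TRUE)  ## major rivers as an overlay
map("state")               ## US state boundaries (new plot)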

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


m p writes:
 > Hello,
 > is there a package in R that would allow map drawing:
 > coastlines, country/state boundaries, maybe
 > topography,
 > rivers etc?
 > Thanks for any guidance,
 > Mark
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Name for factor's levels with contr.sum

2005-07-14 Thread Christoph Buser
This is exactly what I had been looking for (without success)
when I spoke about a more elegant and general solution. 

I agree with your argument that labels might be misleading.
Nevertheless, if a user is aware of what contr.sum calculates,
it is practical to have an annotation. 

Thank you very much for the solution.
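
A minimal sketch of such a renamed copy (the body below is my
reconstruction along the recipe quoted underneath, not Prof Ripley's
own code):

mycontr.sum <- function(n) {
  levels <- as.character(if (is.numeric(n) && length(n) == 1) seq_len(n) else n)
  lenglev <- length(levels)
  cont <- array(0, c(lenglev, lenglev - 1),
                list(levels, levels[-lenglev]))
  for (i in seq_len(lenglev - 1)) cont[i, i] <- 1  ## +1 for each level
  cont[lenglev, ] <- -1   ## last level contrasts with the rest
  cont
}
mycontr.sum(factor(letters[1:3]))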

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--



Prof Brian Ripley writes:
 > One way to do this generally is to make a copy of contr.sum, rename it, 
 > and set the dimnames appropriately.
 > 
 > I think contr.treatment is misleading (it labels contrasts of two levels 
 > by just one of them), and Christoph's labels are informative but 
 > impractically long.  But if you want to label each contrast by the level 
 > it contrasts with the rest,
 > 
 >  cont <- array(0, c(lenglev, lenglev - 1),
 >list(levels, levels[-lenglev]))
 > 
 > in a modified contr.sum will do it. E.g.
 > 
 > > z <- factor(letters[1:3])
 > > contr.treatment(z)
 >   b c
 > a 0 0
 > b 1 0
 > c 0 1
 > > contr.sum(z)
 >   [,1] [,2]
 > a    1    0
 > b    0    1
 > c   -1   -1
 > > mycontr.sum(z)
 >    a  b
 > a  1  0
 > b  0  1
 > c -1 -1
 > 
 > 
 > 
 > 
 > On Wed, 13 Jul 2005, Christoph Buser wrote:
 > 
 > > Dear Ghislain
 > >
 > > I do not know a general elegant solution, but for some
 > > applications the following example may be helpful:
 > >
 > > ## Artificial data for demonstration: group is fixed, species is random
 > > dat <- data.frame(group = c(rep("A",20),rep("B",17),rep("C",24)),
 > >  species = c(rep("sp1", 4), rep("sp2",5),   rep("sp3",5),
 > >rep("sp4",6),  rep("sp5",2),  rep("sp6",5),  
 > > rep("sp7",3),
 > >rep("sp8",3), rep("sp9",4), rep("sp10",6),  
 > > rep("sp11",6),
 > >rep("sp12",6), rep("sp13",6)),
 > >  area = rnorm(61))
 > >
 > > ## You can attach a contrast at your fixed factor of interest "group"
 > > ## Create the contrast you like to test (in our case contr.sum for 3
 > > ## levels)
 > > mat <- contr.sum(3)
 > > ## You can add the names you want to see in the output
 > > ## Be careful to give the correct names to the corresponding
 > > ## columns. Otherwise there is a big danger of misinterpretation.
 > > colnames(mat) <- c(": A against rest", ": B against rest")
 > > ## Attach the contrast to your factor "group"
 > > dat[,"group"] <- C(dat[,"group"],mat)
 > > ## Now calculate the lme
 > > library(nlme)
 > > reg.lme <- lme(area ~ group, data = dat, random = ~ 1|species)
 > > summary(reg.lme)
 > >
 > > Maybe someone has a better idea how to do it generally.
 > >
 > > Hope this helps
 > >
 > > Christoph Buser
 > >
 > > --
 > > Christoph Buser <[EMAIL PROTECTED]>
 > > Seminar fuer Statistik, LEO C13
 > > ETH (Federal Inst. Technology) 8092 Zurich  SWITZERLAND
 > > phone: x-41-44-632-4673fax: 632-1228
 > > http://stat.ethz.ch/~buser/
 > > --
 > >
 > >
 > > Ghislain Vieilledent writes:
 > > > Good morning,
 > > >
 > > > I used in R contr.sum for the contrast in a lme model:
 > > >
 > > > > options(contrasts=c("contr.sum","contr.poly"))
 > > > > Septo5.lme<-lme(Septo~Variete+DateSemi,Data4.Iso,random=~1|LieuDit)
 > > > > intervals(Septo5.lme)$fixed
 > > > lower est. upper
 > > > (Intercept) 17.0644033 23.106110 29.147816
 > > > Variete1 9.5819873 17.335324 25.088661
 > > > Variete2 -3.3794907 6.816101 17.011692
 > > > Variete3 -0.5636915 8.452890 17.469472
 > > > Variete4 -22.8923812 -10.914912 1.062558
 > > > Variete5 -10.7152821 -1.865884 6.983515
 > > > Variete6 0.2743390 9.492175 18.710012
 > > > Variete7 -23.7943250 -15.070737 -6.347148
 > > > Variete8 -21.7310554 -12.380475 -3.02989

Re: [R] Name for factor's levels with contr.sum

2005-07-13 Thread Christoph Buser
Dear Ghislain

I do not know a general elegant solution, but for some
applications the following example may be helpful:

## Artificial data for demonstration: group is fixed, species is random 
dat <- data.frame(group = c(rep("A",20),rep("B",17),rep("C",24)),
  species = c(rep("sp1", 4), rep("sp2",5),   rep("sp3",5),
rep("sp4",6),  rep("sp5",2),  rep("sp6",5),  rep("sp7",3),
rep("sp8",3), rep("sp9",4), rep("sp10",6),  rep("sp11",6),
rep("sp12",6), rep("sp13",6)),
  area = rnorm(61))

## You can attach a contrast at your fixed factor of interest "group"
## Create the contrast you like to test (in our case contr.sum for 3
## levels)
mat <- contr.sum(3)
## You can add the names you want to see in the output
## Be careful to give the correct names to the corresponding
## columns. Otherwise there is a big danger of misinterpretation.
colnames(mat) <- c(": A against rest", ": B against rest")
## Attach the contrast to your factor "group"
dat[,"group"] <- C(dat[,"group"],mat)
## Now calculate the lme
library(nlme)
reg.lme <- lme(area ~ group, data = dat, random = ~ 1|species)
summary(reg.lme)

Maybe someone has a better idea how to do it generally.

Hope this helps

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Ghislain Vieilledent writes:
 > Good morning,
 > 
 > I used in R contr.sum for the contrast in a lme model:
 > 
 > > options(contrasts=c("contr.sum","contr.poly"))
 > > Septo5.lme<-lme(Septo~Variete+DateSemi,Data4.Iso,random=~1|LieuDit)
 > > intervals(Septo5.lme)$fixed
 > lower est. upper
 > (Intercept) 17.0644033 23.106110 29.147816
 > Variete1 9.5819873 17.335324 25.088661
 > Variete2 -3.3794907 6.816101 17.011692
 > Variete3 -0.5636915 8.452890 17.469472
 > Variete4 -22.8923812 -10.914912 1.062558
 > Variete5 -10.7152821 -1.865884 6.983515
 > Variete6 0.2743390 9.492175 18.710012
 > Variete7 -23.7943250 -15.070737 -6.347148
 > Variete8 -21.7310554 -12.380475 -3.029895
 > Variete9 -27.9782575 -17.480555 -6.982852
 > DateSemi1 -5.7903419 -1.547875 2.694592
 > DateSemi2 3.6571596 8.428417 13.199675
 > attr(,"label")
 > [1] "Fixed effects:"
 > 
 > How is it possible to obtain a return with the name of my factor's levels as 
 > with contr.treatment ?
 > 
 > Thanks for you help.
 > 
 > -- 
 > Ghislain Vieilledent
 > 30, rue Bernard Ortet 31 500 TOULOUSE
 > 06 24 62 65 07
 > 
 >  [[alternative HTML version deleted]]
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] adding a factor column based on levels of another factor

2005-07-12 Thread Christoph Buser
Hi Karen 

I am not sure I understand your question correctly. If not,
please ignore this answer.
Do you want a new factor "group" which contains the same
information as "species", just with other names, e.g.
1,2,... or "A","B",... ?

If yes you can do it like this

## Your data.frame (without ht & diam)
dat <- data.frame(uniqueID = factor(1:8),
  species = c("sp2", "sp2", "sp3", "sp65", "sp43", "sp2",
"sp12", "sp3"),
  elev = c(3.5, 4.2, 3.2, 2.2, 5.4, 2.5, 1.1, 3.4))
str(dat)
## new factor group (copy of species)
dat[,"group"] <- dat[,"species"]
## rename the levels into "1", "2", ... or whatever you want:
levels(dat[,"group"]) <- list("3" = "sp12", "4" = "sp2", "2" = "sp3",
  "5" = "sp43", "1" = "sp65")
## control
dat[,"species"]
dat[,"group"]

Please be careful. This only changes the labels into 1,2,...
If you use for example 

as.numeric(dat[,"group"])

you will get the codes that follow the original alphabetical
ordering of the levels, meaning "sp12" is 1, "sp2" is 2, etc.
You can change this too, if necessary, using as.character() and
as.numeric().
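
For example (a sketch continuing the code above):

## take the numbers from the labels themselves, not from the
## internal level codes
as.numeric(as.character(dat[,"group"]))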

I hope this is helpful for your problem.

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Karen Kotschy writes:
 > Hi R users
 > 
 > Does anyone out there have a better/quicker way of adding a factor column
 > to a data frame based on levels of another factor?
 > 
 > I have a (large) data frame consisting of records for individual plants,
 > each represented by a unique ID number. The species of each plant is
 > indicated in the column "species", which is a factor column with many
 > different levels (species). There are multiple records for each species,
 > and there is no pattern to the order in which the species names appear in
 > the data frame.
 > 
 > e.g.
 >uniqueID species   elev   ht   diam
 > 1  1sp2   3.51.3  55
 > 2  2sp2   4.20.5  15
 > 3  3sp3   3.21.0  13
 > 4  4sp65  2.22.0  14
 > 5  5sp43  5.45.7  20
 > 6  6sp2   2.54.1  32
 > 7  7sp12  1.10.9  5
 > 8  8sp3   3.43.6  2
 > 
 > I would like to add a factor column to this data frame, indicating to
 > which group each individual belongs. All individuals of the same species
 > will belong to the same group.
 > 
 > Is there a quick way of saying "for all instances of species1, give the
 > value 5, for all instances of species2, give the value 4, etc" (where 5
 > and 4 are levels of a factor)?
 > 
 > The only way I can think of doing it is to split the data frame by
 > species, then add a column to each subset showing the group, then
 > re-join all the subsets. This seems clumsy and prone to errors. Anyone
 > know a better way?
 > 
 > I've looked at expand.grid and gl but they don't seem to do what I want.
 > 
 > Thanks!
 > 
 > Karen Kotschy
 > Centre for Water in the Environment
 > University of the Witwatersrand
 > Johannesburg
 > South Africa
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] How to get the minimum ?

2005-07-07 Thread Christoph Buser
Dear Philippe

You can use optimize (see ?optimize), e.g. :

funToMin <- function(x, data, a = 1, b = 1) {
  sum((data[data[,"group"]=="A","y"] - x)^2) +
sum((data[data[,"group"]=="B","y"] - a*x - b)^2)
}

dat <- data.frame(y = rnorm(100), group = rep(c("A","B"), each = 50))
(m <- optimize(function(x) funToMin(x,dat), interval = c(-10,10)))

Please be careful: this function is for demonstration purposes
only and is badly programmed. It works if x is a single number,
but if you call the function with a vector instead of a single
number (and I do not prevent this by checking the input), you
will get warnings or errors. It is therefore better to use your
own, more carefully programmed function in optimize.
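
A sketch of such a variant (funToMin2 is only an illustrative name;
same toy model as above):

funToMin2 <- function(x, data, a = 1, b = 1) {
  stopifnot(is.numeric(x), length(x) == 1)  ## guard against vector input
  yA <- data[data[,"group"] == "A", "y"]
  yB <- data[data[,"group"] == "B", "y"]
  sum((yA - x)^2) + sum((yB - a*x - b)^2)
}
optimize(funToMin2, interval = c(-10, 10), data = dat)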

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Philippe Lamy writes:
 > Hi,
 > 
 > I have a model with differents observations Xi.
 > Each observation belongs to a group, either A or B.
 > I would like to minimize a fonction like :
 > 
 > sum( Xi - Z)^2 + sum (Xi - aZ -b)^2
 >  AB
 > 
 > The first sum contains all observations from group A and the second all
 > observations from group B.
 > I want to find the Z-value wich minimize this function. a and b are 
 > predefined
 > parameters.
 > 
 > Thanks for help.
 > 
 > Philippe
 > 
 > __
 >

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] ks.test() output interpretation

2005-06-28 Thread Christoph Buser
Hi

I would recommend graphical methods to compare two samples from
possibly different distributions. See ?qqplot
Since the Kolmogorov-Smirnov test has very small power in many
cases, you cannot conclude that two samples come from the same
distribution only because the ks.test is not significant.

The following example shows you one problem:
In a short simulation we generate 1000 pairs of samples (with
100 observations per sample). The first sample has a standard
normal distribution, the second a t-distribution with 1 degree
of freedom. For each of these 1000 pairs we calculate the
ks.test and save the p.value.

x1 <- matrix(nrow = 100, ncol = 1000)
y1 <- matrix(nrow = 100, ncol = 1000)
test1 <- numeric(1000)
for(i in 1:1000) {
  set.seed(i)
  x1[,i] <- rnorm(100)
  y1[,i] <- rt(100, df = 1)
  test1[i] <- ks.test(x1[,i],y1[,i])$p.value
}
sum(test1<0.05)


In only 309 of 1000 cases does the test show a significant
difference between the two samples. In all other cases we would
(wrongly) conclude that the two samples have the same distribution.
This is an example with 100 observations per group. If you have
smaller groups, the power is even worse.

If we look at 9 randomly drawn pairs out of the 1000 simulations
and plot the qqplot:

par(mfrow = c(3,3))
ind <- sample(1:1000, 9)
tmp <- sapply(ind, function(j) qqplot(x1[,j],y1[,j], xlab = paste("x1[,",j,"]"),
  ylab = paste("y1[,",j,"]")))

In many cases we see that the two distributions are
different. Compare this to the qqplot of two normally distributed
random variables:

x2 <- matrix(rnorm(900), nrow = 100, ncol = 9)
y2 <- matrix(rnorm(900), nrow = 100, ncol = 9)
par(mfrow = c(3,3))
tmp <- sapply(1:9, function(j) qqplot(x2[,j],y2[,j], xlab = paste("x2[,",j,"]"),
  ylab = paste("y2[,",j,"]")))

Of course there are situations in which the graphical methods
fail, too, but they give a descriptive view of the two
distributions.
The Kolmogorov-Smirnov test, in contrast, suggests a clear-cut
result (that the two distributions are the same), which is wrong
or at least misleading.

Best regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


kapo coulibaly writes:
 > I'm using ks.test() to compare two different
 > measurement methods. I don't really know how to
 > interpret the output in the absence of critical value
 > table of the D statistic. I guess I could use the
 > p-value when available. But I also get the message
 > "cannot compute correct p-values with ties ..." does
 > it mean I can't use ks.test() for these data or I can
 > still use the D statistic computed to make a decision
 > whether the two samples come from the same
 > distribution.
 > 
 > Thanks!!
 > 
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] is.all.equal

2005-06-26 Thread Christoph Buser
Hi

It is corrected in the newer R version. The help page now says: 

'all.equal(x,y)' is a utility to compare R objects 'x' and 'y'
testing "near equality".  If they are different, comparison is
still made to some extent, and a report of the differences is
returned.  Don't use 'all.equal' directly in 'if'
expressions - either use 'isTRUE(all.equal())' or 'identical' if
appropriate.
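
A quick illustration of that advice (a sketch):

x <- 0.1 + 0.2
identical(x, 0.3)          ## FALSE: exact comparison fails
isTRUE(all.equal(x, 0.3))  ## TRUE: near equality within tolerance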

My R version is:

> version

platform x86_64-unknown-linux-gnu
arch     x86_64
os       linux-gnu
system   x86_64, linux-gnu
status
major    2
minor    1.1
year     2005
month    06
day      20
language R

Best regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


hadley wickham writes:
 > Hi,
 > 
 > The description of all.equal states "is.all.equal should be used for
 > programming, typically in if expressions. It is a simple wrapper using
 > identical as shown in the documentation there.", but is.all.equal is
 > not explicitly defined there (although there is a hint in the comments
 > that is.all.equal <- function(x,y) isTRUE(all.equal(x,y))).
 > 
 > Could the documentation be corrected? (or even better, how about
 > defining is.all.equal by default)
 > 
 > Thanks,
 > 
 > Hadley
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] abbreviate

2005-06-16 Thread Christoph Buser
Down to 17 if we carry on your idea using the matching property
on the left hand side, too. :-)

p$h=pmax(p$h,p$s)


Gabor Grothendieck writes:
 > Agree that this definitely should be pursued. :)  In fact,
 > we can shave off several keystrokes by 
 > 
 > - replacing p[[1]] with p[1] on the left hand side
 > - p$high and p$settle with p$h and p$s on the right hand
 >   side (which makes use of the matching property of $)
 > - '=' instead of '<-'
 > - remove all the spaces
 > 
 > This gets it down to 18 characters:
 > 
 > p[1]=pmax(p$h,p$s)
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Simplify formula for iterative programming

2005-06-03 Thread Christoph Buser
Dear Stef

Please check my code below carefully, because it is Friday
afternoon and there might be mistakes in it:

## initialize two random vectors (length n = 100)
a <- rnorm(100)
b <- rnorm(100)
## computation of your formula (see ?outer which is very
## useful here) 
(c <- sum(sqrt(outer(a,a,"-")^2 + outer(b,b,"-")^2)))
## expand a and b to length n+1
a1 <- c(a,rnorm(1))
b1 <- c(b,rnorm(1))
## computation calculating the whole sum
(c1 <- sum(sqrt(outer(a1,a1,"-")^2 + outer(b1,b1,"-")^2)))
## computation using the result of the smaller vectors
c + 2*sum(sqrt((a1 - a1[length(a1)])^2 + (b1 - b1[length(b1)])^2))
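
## quick check (a sketch): the new point's distance to itself is
## zero, so including it in the sum above is harmless
all.equal(c1, c + 2*sum(sqrt((a1 - a1[length(a1)])^2 +
                             (b1 - b1[length(b1)])^2)))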

Regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Stefaan Lhermitte writes:
 > Dear R-ians,
 > 
 > I am looking for the simplification of a formula to improve the 
 > calculation speed of my program. Therefore I want to simplify the 
 > following formula:
 > 
 > H = Si (Sj ( sqrt [ (Ai - Aj)² + (Bi - Bj)²  ] ) )
 > 
 > where:
 > A, B = two vectors (with numerical data) of length n
 > sqrt = square root
 > Si = summation over i  (= 0 to n)
 > Sj = summation over j (= 0 to n)
 > Ai = element of A with index i
 > Aj = element of A with index j
 > Bi = element of B with index i
 > Bj = element of B with index j
 > 
 > n is not fixed, but it changes with every run of my program. Therefore 
 > I am looking for a simplification of H in order to calculate it when 
 > my A and B get extended by 1 element (n = n + 1).
 > 
 > I know a computationally simpler formula exists for the standard 
 > deviation (sd) that is much easier in iterative programming. Therefore I 
 > wondered if anybody knew about analogous simplifications to simplify H:
 > 
 > sd = sqrt [ ( Si (Xi - mean(X) )² ) / n ]  -> simplified computation -> 
 > sqrt [ (n * Si( X² ) - ( Si( X ) )² ) / n² ]
 > 
 > This simplified formula is much easier in iterative programming, since I 
 > don't have to keep every element of X.
 > E.g.: I have a vector X[1:10] and I have already calculated Si( X[1:10]² 
 > ) (I will call this A) and Si( X ) (I will call this B).
 > When X gets extended by 1 element (e.g. X[11]) it is fairly simple to 
 > calculate sd(X[1:11]) without having to reuse the elements of X[1:10].
 > I just have to calculate:
 > 
 > sd = sqrt [ (n * (A + X[11]²) - (B + X[11])² ) / n² ]
 > 
 > This is fairly easy in an iterative process, since before we continue 
 > with the next step we set:
 > A = (A + X[11]²)
 > B = (B + X[11])
 > 
 > Can anybody help me to do something comparable for H? Any other help to 
 > calculate H easily in an iterative process is also welcome!
 > 
 > Kind regards,
 > Stef
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] mplot :how to deal with missing data

2005-06-02 Thread Christoph Buser
If you have to deal with missing values, you might be interested
in na.omit().
?na.omit
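
A minimal sketch with a made-up data frame containing NAs:

d <- data.frame(y = c(9.6, NA, 7.4), trt = c("o", "c", "o"))
na.omit(d)   ## drops every row that contains an NA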

Regards,

Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


NATALIA F TCHETCHERINA writes:
 > Hello all,
 > I have  data:
 > 
 >Genes   time rep vart dye y trt
 > 130911   sa1-d07 030min   1  col   g  9.636244   o
 > 145771   sa1-d07 030min   1  col   r  8.107577   c
 > 93335sa1-d07 030min   1  ler   g  7.409566   o
 > 94821sa1-d07 030min   1  ler   r  5.107160   c
 > 10119101 sa1-d07 030min   2  col   g  8.336862   o
 > 11605101 sa1-d07 030min   2  col   r  7.824530   c
 > 725313   sa1-d07 030min   2  ler   g  8.249347   o
 > 740171   sa1-d07 030min   2  ler   r  7.565084   c
 > 1160522  sa1-d07 030min   3  col   gNA   c
 > 1011922  sa1-d07 030min   3  col   rNA   o
 > 562232   sa1-d07 030min   3  ler   g  9.974227   c
 > 547362   sa1-d07 030min   3  ler   r 10.341149   o
 > ..
 > ..
 > ..
 > 
 > I would like to get graphs means for two-way factor combinations
 > I used Rlab package:
 > > mplot(data$y[which(data$Genes=="sa1-d07")],
 > data$time[which(data$Genes=="sa1-d07")], data$trt[which(data$Genes=="sa1-d07")])
 > 
 > However, I have the following error message:
 > 
 > plot window will lay out plots in a 3 by 1 matrix 
 > Error in plot.window(xlim, ylim, log, asp, ...) : 
 > need finite 'ylim' values
 > In addition: Warning messages:
 > 1: no finite arguments to min; returning Inf 
 > 2: no finite arguments to max; returning -Inf 
 > > 
 > I think this is because of some y='NA'.
 > My question is: how I can deal with this problem?
 > 
 > Sincerely, Natalia.
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] How to name variables in a single plot

2005-06-01 Thread Christoph Buser
Dear Amari

Why not use legend?

y<- c(1:100)
x1<-seq(0.1,10, by=0.1) 
x2<-seq(0.5,50,by=0.5)
mydata<- data.frame(  y=y,  x1=x1, x2=x2)
matplot(mydata, type = "l" ,xlab="Time",ylab="MSE ", col = 1:3, lty = 1:3)
legend(10,90,c("line1", "line2", "line3"), col = 1:3, lty = 1:3)

You can work with text() to place any text within the plot.
See ?legend and ?text

Regards,

Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Amir Safari writes:
 > 
 >  
 > 
 > Dear R Friends ,
 > 
 > I want to name my variables (more than 2 variables in a single plot) within 
 > a plot to distinguish them from each other, but I can't. How is it possible? I 
 > don't mean the x and y axes using xlab or ylab. Below follow some 
 > lines, only as an example that you could try, please, if it is possible. I 
 > really thank you for your attention.
 > 
 > Amir
 > 
 >  
 > 
 >  
 > 
 > library(graphics)
 > 
 > y<- c(1:100)
 > 
 > x1<-seq(0.1,10, by=0.1) 
 > 
 > x2<-seq(0.5,50,by=0.5)
 > 
 > mydata<- data.frame(  y=y,  x1=x1, x2=x2)
 > 
 > matplot(mydata, type = "l" )
 > 
 > matplot(mydata, type = "l" ,xlab="Time")
 > 
 > matplot(mydata, type = "l" ,xlab="Time",ylab="MSE ")
 > 
 > 
 > 
 > __
 > 
 > 
 > 
 >  [[alternative HTML version deleted]]
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] many chr2factors ?

2005-06-01 Thread Christoph Buser
Dear Christian

If you create your data frame by using data.frame(), all characters
are automatically transformed into factors unless you force them
to stay character. Maybe that solves your problem easily.

dat <- data.frame(a=1:10, b=letters[1:10])
str(dat)
  `data.frame': 10 obs. of  2 variables:
  $ a: int  1 2 3 4 5 6 7 8 9 10
  $ b: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10
 
Assuming that doesn't solve your problem, due to the way your
data frames are created, you can do it afterwards.

There are two problems with your code. 

First: (and that causes the error) you use in your repeat 

if(!is.character(df[,i]))
  next

Imagine that the last column of your data frame is not a
character: you jump to the next cycle and then you are outside
the range of your data frame. Your break condition is ignored.

Second: You change your data frame inside of a
function. Variables that are created or changed within a
function are local. Their life ends with the end of the
function. Therefore all changes you do will have no effect on
the global data frame you want to change. See the example:

dat1 <- structure(list(a = 1:10, b = letters[1:10]), .Names = c("a", "b"),
  row.names = as.character(1:10), class = "data.frame")
str(data.frame(dat1))
  `data.frame': 10 obs. of  2 variables:
  $ a: int  1 2 3 4 5 6 7 8 9 10
  $ b: chr  "a" "b" "c" "d" ...
tofac(dat1)
  [1] 2
str(data.frame(dat1))
  `data.frame': 10 obs. of  2 variables:
  $ a: int  1 2 3 4 5 6 7 8 9 10
  $ b: chr  "a" "b" "c" "d" ...

You can use the following code instead

tofac <- function(x){
  for(i in 1:length(x)) {
if(is.character(x[,i]))
  x[,i] <- factor(x[,i])
  }
  x
}

dat1 <- tofac(dat1)
str(dat1)
  `data.frame': 10 obs. of  2 variables:
  $ a: int  1 2 3 4 5 6 7 8 9 10
  $ b: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10

The for loop avoids the problem with the index. Therefore it
also works in examples that have a non-character variable in the
last column, and by returning x at the end you make sure that
the changed object persists.

Regards,

Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

christian schulz writes:
 > Hi,
 > 
 > I would like to transform 
 > characters from a data.frame to factors automatically.
 > 
 >  > tofac <- function(df){
 > + i=0
 > + repeat{
 > + i <- i+1
 > + if(!is.character(df[,i]))
 > + next
 > + df[,i] <- as.factor(df[,i])
 > + print(i)
 > + if(i == length(df))
 > + break }
 > + }
 >  >
 >  > tofac(abrdat)
 > [1] 7
 > [1] 8
 > [1] 9
 > [1] 11
 > [1] 13
 > [1] 15
 > Error in "[.data.frame"(df, , i) : undefined columns selected
 > 
 > These are the correct columns, and I get the idea to put into the loop
 > an empty matrix with dimensions like df and return it!?
 > 
 > Another check?
 > abrdat2 <- apply(abrdat,2,function(x) 
 > ifelse(is.character(x),as.factor(x),x))
 > 
 > 
 > many thanks & regards,
 > christian
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] POSIX problem

2005-05-31 Thread Christoph Buser
Dear David

I tried to reproduce your example. It is not exactly the same. When I create
a data frame, characters are transformed to factors unless I use I() to
protect them.

> PeopleData.df <- data.frame(StartDate = c("29/10/2001", "7/12/2001",
"16/11/2001", "28/11/2001", "2/11/2001",
"26/11/2001"),
  StartTime = c("15:26", "10:32", "13:58", "14:00",
"15:22", "11:15"))
> str(PeopleData.df)
`data.frame':   6 obs. of  2 variables:
 $ StartDate: Factor w/ 6 levels "16/11/2001","2/..",..: 5 6 1 4 2 3
 $ StartTime: Factor w/ 6 levels "10:32","11:15",..: 6 1 3 4 5 2

So there is a small difference from your data frame. The following
then produces what you want:

> as.POSIXct(strptime(paste(PeopleData.df$StartDate,PeopleData.df$StartTime),
  format="%d/%m/%Y %H:%M"))
[1] "2001-10-29 15:26:00 CET" "2001-12-07 10:32:00 CET"
[3] "2001-11-16 13:58:00 CET" "2001-11-28 14:00:00 CET"
[5] "2001-11-02 15:22:00 CET" "2001-11-26 11:15:00 CET"

Regards,

Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--



David Scott writes:
 > 
 > I am having trouble with creating a POSIXct object. If I create a variable 
 > of class Date first out of the date part of my data, I am ok, but if I 
 > just paste the date and time parts together and try and create the POSIXct 
 > object, I have problems.
 > 
 > Here is a toy example created from the actual data which caused the 
 > problem. I am using R 2.0.1 on Windows XP.
 > 
 > > # Data frame with dates and times, as character
 > > PeopleData.df
 > StartDate StartTime
 > 1 29/10/2001 15:26
 > 2  7/12/2001 10:32
 > 3 16/11/2001 13:58
 > 4 28/11/2001 14:00
 > 5  2/11/2001 15:22
 > 6 26/11/2001 11:15
 > > str(PeopleData.df)
 > `data.frame':   6 obs. of  2 variables:
 >   $ StartDate: chr  "29/10/2001" "7/12/2001" "16/11/2001" "28/11/2001" ...
 >   $ StartTime: chr  "15:26" "10:32" "13:58" "14:00" ...
 > > dput(PeopleData.df)
 > structure(list(StartDate = c("29/10/2001", "7/12/2001", "16/11/2001",
 > "28/11/2001", "2/11/2001", "26/11/2001"), StartTime = c("15:26",
 > "10:32", "13:58", "14:00", "15:22", "11:15")), .Names = c("StartDate",
 > "StartTime"), row.names = c("1", "2", "3", "4", "5", "6"), class = 
 > "data.frame")
 > > BeginDate <- as.Date(PeopleData.df$StartDate,format="%d/%m/%Y")
 > > BeginDate
 > [1] "2001-10-29" "2001-12-07" "2001-11-16" "2001-11-28" "2001-11-02"
 > [6] "2001-11-26"
 > > # Create POSIXct date-time object without difficulty
 > > BeginTime <- as.POSIXct(format(paste(BeginDate,PeopleData.df$StartTime),
 > + format="%Y/%m/%d %H:%M"))
 > > BeginTime
 > [1] "2001-10-29 15:26:00 New Zealand Standard Time"
 > [2] "2001-12-07 10:32:00 New Zealand Standard Time"
 > [3] "2001-11-16 13:58:00 New Zealand Standard Time"
 > [4] "2001-11-28 14:00:00 New Zealand Standard Time"
 > [5] "2001-11-02 15:22:00 New Zealand Standard Time"
 > [6] "2001-11-26 11:15:00 New Zealand Standard Time"
 > > # But not directly from the dates and times
 > > BeginTime <- 
 > as.POSIXct(format(paste(PeopleData.df$StartDate,PeopleData.df$StartTime),
 > + format="%d/%m/%Y %H:%M"))
 > > BeginTime
 > [1] "0029-10-20 New Zealand Standard Time"
 > [2] "0007-12-20 New Zealand Standard Time"
 > [3] "0016-11-20 New Zealand Standard Time"
 > [4] "0028-11-20 New Zealand Standard Time"
 > [5] "0002-11-20 New Zealand Standard Time"
 > [6] "0026-11-20 New Zealand Standard Time"
 > > # Format looks correct to me
 > > paste(PeopleData.df$StartDate,PeopleData.df$StartTime)
 > [1] "29/10/2001 15:26" "7/12/2001 10:32"  &

Re: [R] adjusted p-values with TukeyHSD?

2005-05-17 Thread Christoph Buser
Dear Christoph

You can use the multcomp package. Please have a look at the
following example:

library(multcomp)

The first two lines were already proposed by Erin Hodgess:

summary(fm1 <- aov(breaks ~ wool + tension, data = warpbreaks))
TukeyHSD(fm1, "tension", ordered = TRUE)

Tukey multiple comparisons of means
95% family-wise confidence level
factor levels have been ordered
 
Fit: aov(formula = breaks ~ wool + tension, data = warpbreaks)

$tension
     diff        lwr      upr
M-H  4.72 -4.6311985 14.07564
L-H 14.72  5.3688015 24.07564
L-M 10.00  0.6465793 19.35342
 

By using the functions simtest or simint you can get the
p-values, too:

summary(simtest(breaks ~ wool + tension, data = warpbreaks, whichf="tension",
type = "Tukey"))

 Simultaneous tests: Tukey contrasts 

Call: 
simtest.formula(formula = breaks ~ wool + tension, data = warpbreaks, 
whichf = "tension", type = "Tukey")

 Tukey contrasts for factor tension, covariable:  wool 

Contrast matrix:
                      tensionL tensionM tensionH
tensionM-tensionL 0 0       -1        1        0
tensionH-tensionL 0 0       -1        0        1
tensionH-tensionM 0 0        0       -1        1


Absolute Error Tolerance:  0.001 

Coefficients:
                  Estimate t value Std.Err. p raw p Bonf p adj
tensionH-tensionL  -14.722  -3.802    3.872 0.000  0.001 0.001
tensionM-tensionL  -10.000  -2.582    3.872 0.013  0.026 0.024
tensionH-tensionM   -4.722  -1.219    3.872 0.228  0.228 0.228



or if you prefer to get the confidence intervals, too, you can
use:

summary(simint(breaks ~ wool + tension, data = warpbreaks, whichf="tension",
type = "Tukey"))

Simultaneous 95% confidence intervals: Tukey contrasts

Call: 
simint.formula(formula = breaks ~ wool + tension, data = warpbreaks, 
whichf = "tension", type = "Tukey")

 Tukey contrasts for factor tension, covariable:  wool 

Contrast matrix:
                      tensionL tensionM tensionH
tensionM-tensionL 0 0       -1        1        0
tensionH-tensionL 0 0       -1        0        1
tensionH-tensionM 0 0        0       -1        1

Absolute Error Tolerance:  0.001 

 95 % quantile:  2.415 

Coefficients:
                  Estimate   2.5 % 97.5 % t value Std.Err. p raw p Bonf p adj
tensionM-tensionL  -10.000 -19.352 -0.648  -2.582    3.872 0.013  0.038 0.034
tensionH-tensionL  -14.722 -24.074 -5.370  -3.802    3.872 0.000  0.001 0.001
tensionH-tensionM   -4.722 -14.074  4.630  -1.219    3.872 0.228  0.685 0.447

-
Please be careful: the resulting confidence intervals from
simint are not associated with the p-values from simtest, as
described in the help pages of the two functions.
-

I did not have the time to check the differences between the
functions or to read the references given on the help page.
If you are interested in the functions, you can check those to
find out which one you prefer.

Best regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--


Christoph Strehblow writes:
 > hi list,
 > 
 > i have to ask you again, having tried and searched for several days...
 > 
 > i want to do a TukeyHSD after an Anova, and want to get the adjusted  
 > p-values after the Tukey Correction.
 > i found the p.adjust function, but it can only correct for "holm",  
 > "hochberg", bonferroni", but not "Tukey".
 > 
 > Is it not possible to get adjusted p-values after Tukey correction?
 > 
 > sorry, if this is an often-answered question, but i didn't find it on  
 > the list archive...
 > 
 > thx a lot, list, Chris
 > 
 > 
 > Christoph Strehblow, MD
 > Department of Rheumatology, Diabetes and Endocrinology
 > Wilhelminenspital, Vienna, Austria
 > [EMAIL PROTECTED]
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] filename

2005-05-10 Thread Christoph Buser
Hi 

the following works for me:

j <- "box"
z <- "plot"

jpeg(file=paste("YOURPATH",j,z,".jpg", sep = ""))
boxplot(rnorm(100))
dev.off()
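
An alternative sketch, building the name with file.path() (the
directory is the one from your question):

fname <- file.path("d:/data/images", paste(j, z, ".jpeg", sep = ""))
jpeg(file = fname)
boxplot(rnorm(100))
dev.off()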

Regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--



Lars writes:
 > Hey,
 > 
 > I'm generating a .jpeg file for a web application, passing parameters to 
 > R via cgi. R gets the parameters by using commandArgs; they are passed 
 > to variables named j and z
 > 
 > j<-commandArgs()[3]
 > z<-commandArgs()[4]
 > 
 > Later I want to use the character strings of the arguments which the 
 > variables are holding for my filename, e.g.:
 > 
 > jpeg(file="d:/data/images/jz.jpeg",...)
 > 
 > i tried to use paste and as.character, but neither worked out the way i 
 > want it to.
 > 
 > thanks for any advice, Lars Claussen
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Pb with ks.test pvalue

2005-03-22 Thread Christoph Buser
Dear Anthony

I don't know how SAS calculates the p-value, but in R the
p-value is calculated under the assumption that the parameters
of the distribution (you want to compare with your samples) are
known and not estimated from the data.

In your example you estimate them from the data (by mean(w) and
sd(w)), and therefore the p-values are not reliable. 
You fit the theoretical distribution too well to your data
(using the mean and sd estimated from the data).
Hence the test is too conservative and the p-values are too large.
Maybe SAS does a correction for the estimation of the parameters
and therefore gets smaller p-values, but this is pure
speculation, since I don't know how SAS does the
calculation.

I did a simulation and created 10'000 samples from a normal
distribution and calculated the ks.test for each. I expected
around 500 significant results (at the level 0.05) by chance
and got only 1 or 2.
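
A sketch of such a simulation (18 observations per sample, as in
your data):

pv <- replicate(10000, {
  w <- rnorm(18)
  ks.test(w, pnorm, mean(w), sd(w))$p.value
})
sum(pv < 0.05)   ## far below the ca. 500 expected at level 0.05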

I recommend using graphical methods (e.g. normal plots) to
check the normality of your data instead of testing it.
See also ?qqnorm or ?qqplot.

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/
--



Anthony Landrevie writes:
 > 
 > Hello,
 > 
 > While doing tests of normality under R and SAS, in order to prove the 
 > efficiency of R to my company, I noticed
 > 
 > that the Anderson-Darling, Cramer-von Mises and Shapiro-Wilk test results are 
 > about the same under the two environments,
 > 
 > but the Kolmogorov-Smirnov p-value really is different.
 > 
 > Here is what I do:
 > 
 > > ks.test(w,pnorm,mean(w),sd(w))
 > 
 > One-sample Kolmogorov-Smirnov test
 > 
 > data: w 
 > 
 > D = 0.2143, p-value = 0.3803
 > 
 > alternative hypothesis: two.sided 
 > 
 > > w
 > 
 > [1] 3837 3334 2208 1745 2576 3208 3746 3523 3430 3480 3116 3428 2184 2383 
 > 3500 3866 3542
 > 
 > [18] 3278
 > 
 >  
 > 
 > SAS results:
 > 
 > Kolmogorov-Smirnov D 0.214278 Pr > D 0.0271
 > 
 > Why is the p-value so high under R? Much higher than with other tests.
 > 
 > Best regards,
 > 
 > Anthony Landrevie (French Student)
 > 
 > 
 >  
 > -
 > 
 > 
 >  [[alternative HTML version deleted]]
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] setAs between a new S4 class and "POSIXlt"

2005-03-22 Thread Christoph Buser
Dear R gurus

I've a question concerning the transformation of a new S4 class
(that I defined) and the existing class "POSIXlt". I did the
following: 

## Definition of the new class:
setClass("dtime",
 representation(data = "POSIXlt"),
 prototype(data = as.POSIXlt("2004/06/01")))

## Transformation between the new class "dtime" and "POSIXlt":
setAs("dtime", "POSIXlt", def = function(from) {
  from@data
})
setAs("POSIXlt", "dtime", def = function(from) {
  new("dtime", data = from)
})

## Create a new "dtime" object:
(x1 <- new("dtime"))
str(x1)
## Transformation to "POSIXlt" class works well:
(y1 <- as(x1, "POSIXlt"))
str(y1)

## Transformation back to the "dtime" class fails
as(y1, "dtime")
> Error in insertMethod(methods, sig, args, def, TRUE) : 
inserting method into non-methods-list object (class "NULL")

## This works properly
new("dtime", data = y1)



## Now I put another setAs for the Subclass "POSIXt"
setAs("POSIXt", "dtime", def = function(from) {
  new("dtime", data = from)
})

## and the transformation to the "dtime" class works:
as(y1, "dtime")


I tried to understand what happened, without success. Could
someone give me a hint or a reference (a help page or other
material) to improve my understanding, please?
Thank you very much.

I work with:

platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status   Under development (unstable)
major    2
minor    1.0
year     2005
month    03
day      22
language R


Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] wilcox.test statistics

2005-03-03 Thread Christoph Buser
Hi,

a value of 0 for the test statistic is possible. The test
statistic is not just the sum of ranks, but this sum - n*(n+1)/2,
where n is the number of observations in the group for which the
rank sum is built.
This statistic is equivalent to the rank sum, since it differs
from it only by a constant, which depends on the number of
observations.
Look at the following situation:

> x <- 1:10
> y <- 11:20
> wilcox.test(x,y)
Wilcoxon rank sum test

data:  x and y 
W = 0, p-value = 1.083e-05
alternative hypothesis: true mu is not equal to 0 

When every observation of group1 is smaller than those of group2,
the rank sum of the smaller group is
sum(1:n1) = sum(1:10) = 10*(10+1)/2 = n1*(n1+1)/2.
If you compare this to the test statistic, you'll observe that
in this case the test statistic is 0.
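
You can verify this by hand (a sketch continuing the example above):

r <- rank(c(x, y))[seq_along(x)]        ## ranks of x in the pooled sample
sum(r) - length(x)*(length(x) + 1)/2    ## reproduces W = 0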

Regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/
--


Liu Ting-Yuan writes:
 > 
 > Hi,
 > 
 > Could anyone provide the formula of the statistic which wilcox.test 
 > uses for the two-sample rank-sum test?  I got some statistics of 0 values, 
 > but it is impossible to have a 0 "rank-sum".  Does the function use the 
 > Mann-Whitney U test statistic?  Thanks.
 > 
 > Ting-Yuan Liu
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] problem with se.contrast()

2005-02-22 Thread Christoph Buser
Dear Prof Ripley, Dear Prof Dalgaard 

Thank you both for your help. I tried it with Helmert contrasts
and got a result that is consistent with lme. I didn't realize
that the parameterization of the model has an influence on the
contrasts that I tried to test. 
It seems that I should read a little bit more about this issue
to improve my understanding.

I have one last point to propose:

You could include (as an interim solution) a warning in
se.contrast() (that there might be an efficiency loss) if one
uses non-orthogonal contrasts and a multi-stratum model.

Best regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/
--


Prof Brian Ripley writes:
 > On Mon, 21 Feb 2005, Peter Dalgaard wrote:
 > 
 > > Prof Brian Ripley <[EMAIL PROTECTED]> writes:
 > >
 > >>>> test.aov <- with(testdata,aov(Measurement ~ Material + 
 > >>>> Error(Lab/Material)))
 > >>>> se.contrast(test.aov,
 > >>>> 
 > >>>> list(Material=="A",Material=="B",Material=="C",Material=="D"),
 > >>>> coef=c(0.5,0.5,-0.5,-0.5),data=testdata)
 > >>>> [1] 0.1432572
 > >>>
 > >>> I got a different result and I have admit that I didn't understand why
 > >>> there is a differnce between the lme model and this one. There are some
 > >>> comments in the help pages but I'm not sure if this is the answer.
 > >>
 > >> It is.  You used the default `contrasts', which are not actually
 > >> contrasts.  With
 > >>
 > >> options(contrasts=c("contr.helmert", "contr.poly"))
 > >>
 > >> it gives the same answer as the other two.  Because you used
 > >> non-contrasts there was an efficiency loss (to the Intercept stratum).
 > >
 > > Brian,
 > >
 > > I'm not sure how useful that contrasts-that-are-not-contrasts line is.
 > 
 > I agree, it was not precise enough (too late at night for me).  Try 
 > `non-orthogonal contrasts'.  The issue was correct though, it is the 
 > choice of contrasts, and as I would automatically use orthogonal contrasts 
 > with aov() I had not encountered it and it took me a while to pick up on 
 > what Christoph had done differently from me (I had run the example and got 
 > the same result as the randomized-block analysis before my original post).
 > 
 > There's a comment in the code for aov:
 > 
 >  ##  helmert contrasts can be helpful: do we want to force them?
 >  ##  this version does for the Error model.
 > 
 > and perhaps we should make them the default for the per-stratum fits.
 > 
 > > It certainly depends on your definition of "contrasts". Contrast
 > > matrices having zero column sums was not part of the definition I was
 > > taught. I have contrasts as "representations of the group mean
 > > structure that are invariant to changes of the overall level", so
 > > treatment contrasts are perfectly good contrasts in my book.
 > 
 > I don't think Yates thought in those terms, and the whole idea of dividing 
 > into strata (and the generalized Yates algorithm) is based on contrasts 
 > being orthogonal to the overall mean (and to things in other strata).
 > 
 > It was that, not zero-sum, that I was taught, but in balanced cases such 
 > as here it is the same thing.
 > 
 > > The zero-sum condition strikes me as a bit arbitrary: after all there
 > > are perfectly nice orthogonal designs where some levels of a factor
 > > occur more frequently than others.
 > 
 > Balance and orthogonality are not quite the same, though.
 > 
 > > This in turn makes me a bit wary of what is going on inside 
 > > se.contrasts, but it's gotten too late for me to actually study the code 
 > > tonight.
 > >
 > > Could you elaborate on where precisely the condition on the contrast
 > > matrices comes into play?
 > 
 > In finding the projection lengths in eff.aovlist, here
 > 
 >  proj.len <-
 >  lapply(aovlist, function(x)
 > {
 > asgn <- x$assign[x$qr$pivot[1:x$rank]]
 > sp <- split(seq(along=asgn), attr(terms(x), "term.labels")[asgn])
 > sapply(sp, function(x, y) sum(y[x]), y=diag(x$qr$qr)^2)
 > })
 > 
 > using only the diagonal requires orthogonal contrasts.

Re: [R] bivariate empirical cdf

2005-02-18 Thread Christoph Buser
Hi

I changed your function a little bit, so that a vector-valued z
no longer conflicts with the if condition:

mecdf <- function(u,v,z) {
  u <- sort(u)
  v <- sort(v)
  n <- length(u)
  m <- length(z)
  nb <- numeric(m)

  for(j in seq(1,m)) {
    nb.temp <- 0
    for (i in seq(1,n)) {
      ## the comparison was lost in the mail archive (the "<" was
      ## swallowed as an HTML tag); presumably it read:
      if (u[i] <= z[j] && v[i] <= z[j]) {
        nb.temp <- nb.temp + 1
      }
    }
    nb[j] <- nb.temp/n
  }
  nb
}
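
A vectorised alternative (a sketch, under the same assumed
comparison):

mecdf2 <- function(u, v, z)
  sapply(z, function(zj) mean(u <= zj & v <= zj))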

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/
--

MEYFREDI Jean-Christophe writes:
 > Dear R users,
 >  
 > 
 >  I'm trying to write a small function in order to compute empirical 
 > cumulative density function.All seems to work but when I try to plot the 
 > function, I always get error messages.
 >   
 >  This is the function I use
 >   
 >  mecdf<-function(u,v,z) {
 >  u=sort(u)
 >  v=sort(v)
 >  n=length(u)
 >  nb=0
 >   
 >  for (i in seq(1,n)) {
 >  if (u[i] <= z && v[i] <= z) {   ## comparison lost in the archive
 >  nb<-nb+1
 >  }
 >  }
 >  nb=nb/n
 >  return(nb)
 >  }
 >   
 >  In fact if I try to obtain mecdf(u,v,0.1) or mecdf(u,v,0.2), all is 
 > good, but with mecdf(u,v,c(0.1,0.2)), for example, I get errors and wrong 
 > results for both mecdf(u,v,0.1) and mecdf(u,v,0.2). I think that it 
 > consitutes the key point of my plot difficulty. Can someone help me ?
 >   
 >  Best regards
 >   
 >  JCM
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] problem with se.contrast()

2005-02-18 Thread Christoph Buser
Dear Jamie

I already suspected that your data structure could be more
complicated than in the example. 
I would be careful in any case. Since there is a difference
between the results of se.contrast() in R-devel and the results
from lme (and the results from the aggregated data, which are
consistent with lme) in this nice balanced design, I am
suspicious, especially if your real design is more complicated.
Even if you get no error message for your data, I'd run the
analysis a second time, using for example lme, to check the
results. 
I wish you all the best for your analysis.

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/
--


Jamie Jarabek writes:
 > Christoph,
 > 
 > Thank you for your advice. My actual design is indeed more complicated than 
 > what 
 > I have indicated here. I was just using this as a toy example to illustrate my 
 > particular problem. As suggested by Prof. Ripley I will download R-devel and 
 > see 
 > if the fixes included within alleviate my problems.
 > 
 > Jamie Jarabek
 >

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] problem with se.contrast()

2005-02-17 Thread Christoph Buser
Dear Jamie

As Prof. Ripley explained, your analysis is equivalent to the fixed-effects
model on the means, so you can calculate it as follows (if this is your design): 

> Lab <- factor(rep(c("1","2","3"),each=12))
> Material <- factor(rep(c("A","B","C","D"),each=3,times=3))
> Measurement <- c(12.20,12.28,12.16,15.51,15.02,15.29,18.14,18.08,18.21,
>  18.54,18.36,18.45,12.59,12.30,12.67,14.98,15.46,15.22,
>  18.54,18.31,18.60,19.21,18.77,18.69,12.72,12.78,12.66,
>  15.33,15.19,15.24,18.00,18.15,17.93,18.88,18.12,18.03)
> testdata <- data.frame(Lab,Material,Measurement)
> rm(list=c("Lab","Material","Measurement"))
 
You can aggregate your data

> dat.mean <- aggregate(testdata$Measurement,
>   by = list(Material=testdata$Material,Lab=testdata$Lab),
>   FUN = mean)
> names(dat.mean)[3] <- "Measurement"

> test.red.aov1 <- aov(Measurement ~ Lab + Material, data = dat.mean)
> se.contrast(test.red.aov1,
> list(Material=="A",Material=="B",Material=="C",Material=="D"),
> coef=c(0.5,0.5,-0.5,-0.5),dat.mean)
> [1] 0.1220339

By aggregating the data you bypass the problem in se.contrast and you do
not need R-devel.

-

The second way to get the same result is to set your contrast for the factor
"Material", calculate your model with this contrast, and use summary.lm: 

> dat.mean$Material <- C(dat.mean$Material, c(-0.5,-0.5,0.5,0.5), 
>how.many = 3) 
> test.red.aov2 <- aov(Measurement ~ Lab + Material, data = dat.mean)
> summary.lm(test.red.aov2)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 16.02000    0.10568 151.583 5.56e-12 ***
Lab2         0.25833    0.14946   1.728    0.135    
Lab3         0.06583    0.14946   0.440    0.675    
Material1    4.52278    0.12203  37.062 2.58e-08 ***
Material2    1.21056    0.12203   9.920 6.06e-05 ***
Material3    1.55389    0.12203  12.733 1.44e-05 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

Material1 is now the contrast you are interested in, and besides the
Std. Error you get the Estimate, too. Material2 and Material3 are just
orthogonal contrasts to complete the set. 

-

The third way (which I prefer) is using lme from the package nlme

> library(nlme)

Here I use the original data set (not aggregated) and set the desired
contrast, too.

> testdata$Material <- C(testdata$Material, c(-0.5,-0.5,0.5,0.5), 
>how.many = 3) 
> test.lme <- lme(fixed = Measurement ~ Material, data = testdata, 
> random = ~1|Lab/Material)

With anova you get the F-Test for the fixed factor "Material"

> anova(test.lme)
>   numDF denDF  F-value p-value
  (Intercept) 124 43301.14  <.0001
  Material3 6   544.71  <.0001

and with the summary you have your contrast with standard error: 

> summary(test.lme)
Fixed effects: Measurement ~ Material 
                Value  Std.Error DF   t-value p-value
(Intercept) 16.128056 0.07750547 24 208.08925   0e+00
Material1    4.522778 0.12203325  6  37.06185   0e+00
Material2    1.210556 0.12203325  6   9.91988   1e-04
Material3    1.553889 0.12203325  6  12.73332   0e+00

-

Last but not least I tried it with R-devel and the original data frame:
First I reset the contrast on the default value:

testdata$Material <- C(testdata$Material, "contr.treatment", how.many = 3) 

I used your syntax and se.contrast():

> test.aov <- with(testdata,aov(Measurement ~ Material + Error(Lab/Material)))
> se.contrast(test.aov,
> list(Material=="A",Material=="B",Material=="C",Material=="D"),
> coef=c(0.5,0.5,-0.5,-0.5),data=testdata)
> [1] 0.1432572

I got a different result, and I have to admit that I don't understand
why there is a difference between the lme model and this one. There are
some comments in the help pages, but I'm not sure whether they are the
answer.

-

I hope some of the code above helps you to analyse your data. Maybe
Prof. Ripley was right and you have another design; then you can just
ignore all this and your life is much easier :-)

Best regards,

Christoph Buser


--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/
--

Re: [R] (no subject)

2005-02-16 Thread Christoph Buser
Hi Jim

I'm not sure if I understand your problem correctly. Is this a solution?

(li <- list(a=1,b=200.44))
as.numeric(paste(unlist(li), collapse = ""))
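If your numbers are character strings containing thousands separators
(as "1,200.44" suggests), another possibility is to strip the commas
before converting; a small sketch:

x <- c("1,200.44", "12.30")
as.numeric(gsub(",", "", x))   ## remove the commas, then convert
## [1] 1200.44   12.30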

Best regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/
--

Jim Gustafsson writes:
 > 
 > R-people
 > 
 > I wonder if one could change a list of table with number of the form 
 > 1,200.44 , to 1200.44
 > 
 > Regards
 > JG
 > 
 > 
 > --
 > This e-mail and any attachment may be confidential and may also be 
 > privileged.
 > If you are not the intended recipient, please notify us immediately and then
 > delete this e-mail and any attachment without retaining copies or disclosing
 > the contents thereof to any other person.
 > Thank you.
 > --
 >  [[alternative HTML version deleted]]
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] How to generate labels or names?

2005-01-27 Thread Christoph Buser
Hi Eric

If you want to produce a vector with names, you can use

v <- rnorm(20)
names(v) <- paste("Lab",1:20, sep="")

Regards,

Christoph

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/
--


Eric Rodriguez writes:
 > Hi,
 > 
 > I'm new to R and I would like to generate labels like data.frame does
 > : "V1 V2 V3...".
 > I'm trying to generate a N vector with label such as "Lab1 Lab2 ... LabN".
 > 
 > I guess this is pretty easy when you know R ;)
 > 
 > Thanks for help
 > 
 > Eric
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] sw

2005-01-27 Thread Christoph Buser
Dear Mahdi

Mahdi Osman writes:
 > Hi list,
 > 
 > I am just a new user of R.
 > 
 > How can I run stepwise regression in R?

If you are looking for a stepwise procedure for linear regression
models or generalized linear models, you can use step()
(see ?step).
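For example, a minimal sketch with simulated data (the variable names
are made up for illustration):

set.seed(1)
dat <- data.frame(y = rnorm(50), x1 = rnorm(50), x2 = rnorm(50),
                  x3 = rnorm(50))
fit <- lm(y ~ x1 + x2 + x3, data = dat)
step(fit)                      ## backward elimination by AIC (default)
step(fit, direction = "both")  ## allow adding and dropping terms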

Regards,

Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/
--



 > Is there a graphic user interphase for any of the spatial packages inculded
 > in R, such as gstat, geoR and someothers. I am mainly interested interactive
 > variogram modelling and mapping.
 > 
 > Thanks
 > Mahdi
 > 
 > -- 
 > ---
 > Mahdi Osman (PhD)
 > E-mail: [EMAIL PROTECTED]
 > ---
 > 
 > 10 GB Mailbox, 100 FreeSMS http://www.gmx.net/de/go/topmail
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] r square values for independent variables in multiple linear regression model -- newbie

2005-01-26 Thread Christoph Buser
Dear Avneet

If you fit a multiple linear regression model, your independent
variables will generally not be orthogonal, and therefore it is
difficult to apportion the r square among the different variables
with a meaningful interpretation.
In a balanced analysis of variance design the situation is
different: as a result of the balanced design the explanatory
variables are orthogonal, and you can decompose the sum of squares
over the different variables.
But if you have covariables you must be careful. I wouldn't try
to assign r square values to each variable; the sketch below shows
why.
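A small simulated sketch: with correlated regressors the sequential
sums of squares, and hence any r square split, depend on the order of
the terms.

set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100)        ## x2 is correlated with x1
y  <- x1 + x2 + rnorm(100)
anova(lm(y ~ x1 + x2))       ## SS attributed to x1 first
anova(lm(y ~ x2 + x1))       ## a different split of the same total SS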

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/
--


Singh, Avneet writes:
 > Hello
 > 
 > Could you please suggest a way to find out the r square values for each
 > independent variable while using "lm" for developing a multiple linear
 > regression model.
 > 
 > Thank you
 > avneet
 > 
 > "I have no data yet. It is a capital mistake to theorize before one has
 > data. Insensibly one begins to twist facts to suit theories instead of
 > theories to suit facts."
 > ~ Sir Arthur Conan Doyle (1859-1930), Sherlock Holmes
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Wilcoxon test for mixed design (between-within subjects)

2005-01-24 Thread Christoph Buser
Hello Marco

Marco Chiarandini writes:
 > Hallo,
 > 
 > is there any extension of the pairwise Wilcoxon test to a dependent
 > samples layout with replicates (or, in other terms, a one-way layout
 > with blocking and replicates)?
 > 

One possibility is always to summarize the replicates within each
block and then calculate an ordinary paired Wilcoxon test, as
sketched below.
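A small sketch with simulated data (a hypothetical design with 10
blocks, 2 treatments and 3 replicates per cell):

set.seed(1)
dat <- data.frame(block = factor(rep(1:10, each = 6)),
                  trt   = factor(rep(rep(c("A","B"), each = 3), times = 10)),
                  y     = rnorm(60))
## average the replicates within each block and treatment
m <- aggregate(dat$y, by = list(trt = dat$trt, block = dat$block),
               FUN = mean)
## ordinary paired Wilcoxon test on the block means
wilcox.test(m$x[m$trt == "A"], m$x[m$trt == "B"], paired = TRUE)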


 > The Wilcoxon method with matched pairs works for the case of dependent
 > samples with one observation per block, while the Mann-Whitney test
 > works for independent samples, thus one single block and replicated
 > observations. Is there a method which mixes this two cases considering a
 > depedent sample design in which each block has more than one
 > observation? I know it exists a Friedman test for this case but in the
 > Friedman test ranks are constructed considering all subjects jointly,
 > while in Wilcoxon only the pair of subject currently considered are
 > ranked, thus resulting in a more powerful test.
 > 

The Friedman test calculates the ranks within the blocks, for each
block separately; it does not consider all subjects jointly.
In R only the case of unreplicated blocked data is implemented, so
this doesn't help you.

 > If no such method exists in R, I am anyway intersted in possible
 > references.

Look at:

 Myles Hollander & Douglas A. Wolfe (1999), _Nonparametric
 statistical methods_. New York: John Wiley & Sons. 

There you find the generalization to the replicated case, but
you would have to implement it yourself.

Regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/
--



 > Thank you for the consideration,
 > 
 > Regards,
 > 
 > Marco
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Cholesky Decomposition

2005-01-21 Thread Christoph Buser
Dear Kolluru

For a real symmetric positive-definite square matrix you can use
chol (see ?chol) in the base package.

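A minimal example:

A <- matrix(c(4, 2, 2, 3), nrow = 2)  ## symmetric positive definite
R <- chol(A)                          ## upper triangular factor R
t(R) %*% R                            ## recovers A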
help.search("cholesky") gives some more alternatives:

chol.new(assist)          A Modified Cholesky Decomposition
chol.reduce(kernlab)      Incomplete Cholesky decomposition
gchol(kinship)            Generalized Cholesky decompostion
solve.bdsmatrix(kinship)  Solve a matrix equation using the generalized
                          Cholesky decompostion
solve.gchol(kinship)      Solve a matrix equation using the generalized
                          Cholesky decompostion
Cholesky-class(Matrix)    Cholesky decompositions
sscChol-class(Matrix)     Cholesky decompositions of sscMatrix objects
chol(base)                The Choleski Decomposition
chol2inv(base)            Inverse from Choleski Decomposition

Hope there is something for you.

Christoph

-- 
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/


kolluru ramesh writes:
 > Can we do Cholesky Decompositon in R for any matrix
 > 
 >  
 > -
 > 
 > 
 >  [[alternative HTML version deleted]]
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Dropping a digit with scan() on a connection

2005-01-19 Thread Christoph Buser
Dear Tim

You can use

cat("TITLE extra line", "235 335 535 735", "115 135 175", file="ex.data", 
sep="\n")
cn.x <- file("ex.data", open="r")

a <- scan(cn.x, skip=1, n=2, sep = " ")
> Read 2 items
a
> [1] 235 335
b <- scan(cn.x, n=2, sep = " ")
> Read 2 items
b
> [1] 535 735
c <- scan(cn.x, n=2, sep = " ")
> Read 2 items
c
> [1] 115 135
d <- scan(cn.x, n=1, sep = " ")
> Read 1 items
d
> [1] 175

Regards,

Christoph Buser

-- 
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/


Tim Howard writes:
 > R gurus,
 > 
 > My use of scan() seems to be dropping the first digit of sequential
 > scans on a connection. It looks like it happens only within a line:
 > 
 > > cat("TITLE extra line", "235 335 535 735", "115 135 175",
 > file="ex.data", sep="\n")
 > > cn.x <- file("ex.data", open="r")
 > > a <- scan(cn.x, skip=1, n=2)
 > Read 2 items
 > > a
 > [1] 235 335
 > > b <- scan(cn.x, n=2)
 > Read 2 items
 > > b
 > [1]  35 735
 > > c <- scan(cn.x, n=2)
 > Read 2 items
 > > c
 > [1] 115 135
 > > d <- scan(cn.x, n=1)
 > Read 1 items
 > > d
 > [1] 75
 > > 
 > 
 > Note in b, I should get 535, not 35 as the first value. In d, I should
 > get 175.  Does anyone know how to get these digits?
 > 
 > The reason I'm not scanning the entire file at once is that my real
 > dataset is much larger than a Gig and I'll need to pull only portions of
 > the file in at once. I got readLines to work, but then I have to figure
 > out how to convert each entire line into a data.frame. Scan seems a lot
 > cleaner, with the exception of the funny character dropping issue.
 > 
 > Thanks so much!
 > Tim Howard
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] merge data frames taking mean/mode of multiple matches

2005-01-17 Thread Christoph Buser
Dear Avneet

the function aggregate (see also ?aggregate) could be useful for
your problem. See the short example I've written below.

dat1 <- data.frame(lot = factor(1:10),y1 = rnorm(10))
str(dat1)
dat2 <- data.frame(nr = 1:100, lot = factor(rep(1:10, each = 10)),y2 = 
rnorm(100))
str(dat2)

dat2.agr <- aggregate(dat2$y2, by = list(lot = dat2$lot), FUN = mean)
names(dat2.agr)[2] <- "y2"

dat.mer <- merge(dat1, dat2.agr)
str(dat.mer)

Be careful when merging data frames: there should always be a
check that the right cases are merged together, for example:
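A small sketch of such a check, continuing the example above:

stopifnot(nrow(dat.mer) == nrow(dat1))   ## expect exactly one row per lot
setdiff(levels(dat1$lot), dat2.agr$lot)  ## lots without any pieces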

Regards,

Christoph Buser

-- 
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/


avneet singh writes:
 > Hello :)
 > 
 > I have two data frames, one has properties taken on a
 > piece by piece basis and the other has performance on
 > a lot by lot basis. I wish to combine these two data
 > frames but the problem is that each lot has multiple
 > pieces and hence i need to take a mean of the
 > properties of multiple pieces and match it to the row
 > having data about the lot.
 > 
 > I was wondering if there is a simple commmand, an
 > extension of "merge", or an option of merge i do not
 > know which could easily do this work.
 > 
 > Thank you  :)
 > 
 > =
 > I believe in equality for everyone, except reporters and photographers.
 > ~Mahatma Gandhi
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Kolmogorov-Smirnov test for lognormal distribution with estimated parameters

2005-01-12 Thread Christoph Buser
Hi Kwabena

I once did a simulation, generating normally distributed values
(500 values each time) and calculating a KS test with estimated
parameters. Repeating this test 10'000 times, I got far fewer
significant tests than the roughly 500 expected by chance at
level alpha = 0.05.
So I think if you estimate the parameters from the data, the fit
is too good, and the distribution used for the test statistic is
not adequate, as indicated in the help page you cited. The help
page gives some literature, but it is not easy stuff to read.
Furthermore I know of no implementation of the KS test which
accounts for this estimation of the parameters.

I recommend a graphical tool instead of a test:

x <- rlnorm(100)
qqnorm(log(x))

See also ?qqnorm and ?qqplot.

If you insist on testing a theoretical distribution, be aware
that a non-significant test does not mean that your data follow
the tested distribution (especially if you have few data, the
test has no power to detect deviations from the theoretical
distribution, and the conclusion that the data fit well is
deceptive).

If there are enough data I'd prefer a chi square test to the KS
test (but even there I use graphical tools instead). 

See ?chisq.test

For this test you have to specify classes and this is 
subjective (you can't avoid this).

You can reduce the DF of the expected chi square distribution
(under H_0) by the number of estimated parameters from the data
and will get better results. 

DF = number of classes - 1 - estimated parameters

I think this test is more powerful than the KS test,
particularly if you must estimate the parameters from data.
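A hedged sketch of such a test for lognormality (the number of
classes is an arbitrary choice here):

set.seed(1)
x <- rlnorm(200)
mu <- mean(log(x)); s <- sd(log(x))        ## parameters estimated from data
br <- quantile(x, probs = seq(0, 1, 0.1))  ## 10 classes with equal counts
obs <- table(cut(x, breaks = br, include.lowest = TRUE))
p <- diff(plnorm(br, meanlog = mu, sdlog = s))
expd <- length(x) * p / sum(p)             ## expected counts, renormalized
stat <- sum((obs - expd)^2 / expd)         ## chi square statistic
df <- length(obs) - 1 - 2                  ## classes - 1 - estimated param.
pchisq(stat, df = df, lower.tail = FALSE)  ## approximate p-value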

Regards,

Christoph

-- 
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/



Kwabena Adusei-Poku writes:
 > Hello all,
 > 
 > Would somebody be kind enough to show me how to do a KS test in R for a
 > lognormal distribution with ESTIMATED parameters. The R function
 > ks.test()says "the parameters specified must be prespecified and not
 > estimated from the data" Is there a way to correct this when one uses
 > estimated data?
 > 
 > Regards,
 > 
 > Kwabena.
 > 
 > 
 > Kwabena Adusei-Poku
 > University of Goettingen
 > Institute of Statistics and Econometrics
 > Platz der Goettingen Sieben 5
 > 37073 Goettingen
 > Germany
 > Tel: +49-(0)551-394794
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Getting empirical percentiles for data

2005-01-07 Thread Christoph Buser
Dear Mike

Mike Cheung writes:
 > Dear List,
 > 
 > I have some discrete data and want to calculate the percentiles and the 
 > percentile ranks for each of the unique scores. I can calculate the 
 > percentiles with quantile().
 > 
 > I know that "ecdf" can be used to calculate the empirical cumulative 
 > distribution. However, I don't know how to exact the cumulative 
 > probabilities for each unique element. The requirement is similar to the 
 > "FREQUENCIES" in SPSS. Could someone help me in exacting the cumulative 
 > probabilities from the ecdf object? Thanks in advance!

You can use the following function:

f.freq <- function(x)
{
  ## frequency table with percentages and cumulative percentages
  tab <- data.frame(table(x))
  tab$Percent <- tab$Freq * 100 / length(x)
  tab$Cum.Percent <- cumsum(tab$Percent)
  tab
}

x <- round( rnorm(50, mean=50, sd=10) )
f.freq(x)

This should give you a table analogous to FREQUENCIES in SPSS.
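Alternatively, the object returned by ecdf is itself a function, so
you can evaluate it directly at the unique scores; a short sketch:

x <- round( rnorm(50, mean=50, sd=10) )
Fn <- ecdf(x)
Fn(sort(unique(x)))   ## cumulative proportion up to each unique score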


 > 
 > # Generating artificial data
 > x <- round( rnorm(50, mean=50, sd=10) )
 > probs <- seq(0.1, 0.9, by=0.1)
 > # Calculating percentiles
 > x.percentiles <- quantile(x, probs)
 > # Calculating percentile ranks
 > x.ranks <- ecdf(x)
 > 
 > Best,
 > Mike
 > 
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Best,
Christoph

-- 
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] integer factorization

2005-01-05 Thread Christoph Buser
Hi Robin

There is a function factorize() in the package conf.design

library(conf.design)
factorize(60)
[1] 2 2 3 5

Hope this helps
Christoph

-- 
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] different DF in package nlme and lme4

2005-01-03 Thread Christoph Buser
Hi all

I tried to reproduce an example with lme and used the Orthodont
dataset.

library(nlme)
fm2a.1 <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1 | Subject)
anova(fm2a.1)
>             numDF denDF  F-value p-value
> (Intercept)     1    80 4123.156  <.0001
> age             1    80  114.838  <.0001
> Sex             1    25    9.292  0.0054

or alternatively (to get the same result)

fm2a.2 <- lme(distance ~ age + Sex, data = Orthodont, random = list(Subject = ~ 1))
anova(fm2a.2)
>             numDF denDF  F-value p-value
> (Intercept)     1    80 4123.156  <.0001
> age             1    80  114.838  <.0001
> Sex             1    25    9.292  0.0054

---
Then I restarted!!! R to use the lme4 package instead of nlme.
---

library(lme4)
fm2b.1 <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1 | Subject)
anova(fm2b.1)
> Analysis of Variance Table
>            Df  Sum Sq Mean Sq   Denom  F value    Pr(>F)
> age         1 235.356 235.356 105.000 114.8383 < 2.2e-16 ***
> Sex         1  19.044  19.044 105.000   9.2921  0.002912 **

or alternatively (to get the same result)

fm2b.2 <- lme(distance ~ age + Sex, data = Orthodont, random = list(Subject = ~ 1))
anova(fm2b.2)
> Analysis of Variance Table
>            Df  Sum Sq Mean Sq   Denom  F value    Pr(>F)
> age         1 235.356 235.356 105.000 114.8383 < 2.2e-16 ***
> Sex         1  19.044  19.044 105.000   9.2921  0.002912 **


I got different DF for the denominator. Do I have to use lme in
another way in the package lme4?

I use R 2.0.1 under linux and
Package:   nlme
Version:   3.1-53
Date:  2004-11-03
Package:   lme4
Version:   0.6-11
Date:      2004-12-16

Thanks for help.

Regards, 

Christoph Buser

-- 
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Random intercept model with time-dependent covariates, results different from SAS

2004-12-21 Thread Christoph Buser
Answering a mail from
From: Keith Wong
Date: Sun 04 Jul 2004 - 17:21:36 EST
Subject: [R] Random intercept model with time-dependent
         covariates, results different from SAS

Hi all

I've got a question about the degrees of freedom in a mixed model,
calculated with lme from the lme4 package. 
Since I've no access to the original data of Keith Wong, I generated a
dataframe with the same structure (but balanced) by

> set.seed(1)
> dat <- data.frame(y = rnorm(50), group = rep(c("a","b"), each = 25),
  time = factor(rep(1:5,10)), w = rnorm(50), z = rnorm(50),
  id = factor(rep(1:10, each = 5)))
> str(dat)

The subject id is nested in group. We have 5 repeated measures for each
subject. 

Surely there is no meaningful interpretation of the results, but for
demonstration purposes it will do fine.

I do the model without the covariates:

> library(nlme)
> options(contrasts = c("contr.SAS", "contr.poly")) 
> g2 <- lme(y ~ time+group+time:group, data = dat, random = ~ 1 | id)
> anova(g2)

Analysis of Variance Table
            numDF denDF   F-value p-value
(Intercept)     1    32 0.6340507  0.4317
time            4    32 0.1103619  0.9780
group           1     8 0.2924309  0.6034
time:group      4    32 0.4614766  0.7634

I quit R and restart it and now use library(lme4)

> library(lme4)
> options(contrasts = c("contr.SAS", "contr.poly")) 
> g2 <- lme(y ~ time+group+time:group, data = dat, random = ~ 1 | id)
> anova(g2)

Analysis of Variance Table
   Df Sum Sq Mean Sq  Denom F value Pr(>F)
time4  0.351   0.088 40.000  0.1104 0.9782
group   1  0.233   0.233 40.000  0.2925 0.5916
time:group  4  1.468   0.367 40.000  0.4615 0.7635

I get different degrees of freedom for the denominator. How can I
tell lme that id is nested in the fixed factor group, so as to get
8 as the denominator DF for group? In my example the difference is
small (I've generated the data randomly), but in more meaningful
analyses the different DF can change the results.

I Use R 2.0.1 under Linux.
Package:   nlme
Version:   3.1-53
Package:   lme4
Version:   0.6-11

In the original message the example was also calculated with SAS and there
was the problem of the degrees of freedom for group, too, but the syntax
wasn't correct. One must specify

> PROC MIXED;
> CLASS id time group;
> MODEL y = time group time*group /solution;
> RANDOM int /subject = id(group);
> RUN;

It is important to tell SAS that the subject id is nested in group and you
can do this by using 
subject = id(group)
instead of
subject = id
Then you will get the correct degrees of freedom for the test of group:

Type 3 Tests of Fixed Effects

              Num   Den
Effect         DF    DF   F Value   Pr > F

time            4    32      0.11   0.9780
group           1     8      0.29   0.6033
time*group      4    32      0.46   0.7634


Thanks in advance for an answer. I'd like to know whether I must use
the lme syntax differently for the lme4 package.

Regards,

Christoph Buser

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Friedman test for replicated blocked data

2004-12-20 Thread Christoph Buser
Hi Marco

> I would need to extend the Friedman test to a replicated
> design. Currently the function: friedman.test(y, ...) only works for
> unreplicated designs.

You'll find something about this topic in:
 
Myles Hollander & Douglas A. Wolfe (1999), _Nonparametric
statistical methods_. 2nd edition, John Wiley & Sons. Chapter 7 

In Chapter 7.9 there is an extension to the balanced replicated
design. Notice also the comments 70 and 74 about the
motivation of the test and the approximation of the test
statistic and comment 77 for an extension to the general case
(unbalanced). 
They give further references for more details, too. 

Hope this will help you

Christoph

-- 
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] drop1 with contr.treatment

2004-08-20 Thread Christoph Buser
Dear R Core Team

I have a proposal to improve drop1(). The function should change the
contrasts from the default ("treatment") to "sum". Suppose you fit a
model with an interaction (which is not significant) and you display
the main effects with

drop1(   , scope = .~., test = "F")

If you remove the interaction, then everything is okay. But there is
no way to fit a model, leave a non-significant interaction in the
model, and still get interpretable main effects with the
"treatment" contrasts.

One solution would be to change the contrasts automatically to
the "sum" contrasts.
Another possibility is to produce a warning when drop1 is used
with the scope argument to get main effects in the presence of
an interaction (even a non-significant one) with "treatment"
contrasts.

An example:

library(MASS)
##- Data "genotype"
names(genotype)
##- > [1] "Litter" "Mother" "Wt"

##- to be sure the contrasts are "treatment"
options(contrasts = c("contr.treatment", "contr.poly" ))

##- model with interaction
gen.int <- aov(Wt ~ Litter*Mother, data = genotype)
drop1(gen.int, scope = .~., test = "F")
##- Model:
##- Wt ~ Litter * Mother
##-               Df Sum of Sq    RSS    AIC F value   Pr(F)
##- <none>                     2440.8  257.0
##- Litter         3     591.7 3032.5  264.3  3.6362 0.01968 *
##- Mother         3     582.3 3023.1  264.1  3.5782 0.02099 *
##- Litter:Mother  9     824.1 3264.9  256.8  1.6881 0.12005
##- ---
##- Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

##- the interaction is not significant, Litter is significant,
##- but is that really true?

##- main effect model
gen.main <- aov(Wt ~ Litter + Mother, data = genotype)
drop1(gen.main, scope = .~., test = "F")
##- Model:
##- Wt ~ Litter + Mother
##-        Df Sum of Sq    RSS    AIC F value   Pr(F)
##- <none>              3264.9  256.8
##- Litter  3      63.6 3328.5  252.0  0.3508 0.78870
##- Mother  3     775.1 4040.0  263.8  4.2732 0.00886 **
##- ---
##- Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

## Dramatic change of the sum of squares for the factor Litter

## With contrast = "sum" the whole thing looks better
options(contrasts = c("contr.sum", "contr.poly" ))

##- model with interaction
gen.int1 <- aov(Wt ~ Litter*Mother, data = genotype)
drop1(gen.int1, scope = .~., test = "F")
##- Model:
##- Wt ~ Litter * Mother
##-               Df Sum of Sq    RSS    AIC F value   Pr(F)
##- <none>                     2440.8  257.0
##- Litter         3      27.7 2468.5  251.7  0.1700 0.91612
##- Mother         3     671.7 3112.6  265.9  4.1282 0.01142 *
##- Litter:Mother  9     824.1 3264.9  256.8  1.6881 0.12005
##- ---
##- Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

##- the interaction is not significant, Litter is NOT significant.

gen.main1 <- aov(Wt ~ Litter + Mother, data = genotype)
drop1(gen.main1, scope = .~., test = "F")
##- Wt ~ Litter + Mother
##-        Df Sum of Sq    RSS    AIC F value   Pr(F)
##- <none>              3264.9  256.8
##- Litter  3      63.6 3328.5  252.0  0.3508 0.78870
##- Mother  3     775.1 4040.0  263.8  4.2732 0.00886 **
##- ---
##- Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

##- Litter remains not significant.


Best regards

Christoph Buser

-- 
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO D6
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-3505  fax: 632-1228
http://stat.ethz.ch/~buser/

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html