Re: [R] Factor levels

2007-08-28 Thread Peter Alspach
Sebastain

Does the following work for you?

seb - read.table(file='clipboard', header=T)
seb$c
 [1] w k r s k p l u z s j j x r d j x w q f
Levels: d f j k l p q r s u w x z
seb$c - factor(seb$c, levels=unique(seb$c))
seb$c
 [1] w k r s k p l u z s j j x r d j x w q f
Levels: w k r s p l u z j x d q f

Peter Alspach
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Sébastien
 Sent: Wednesday, 29 August 2007 9:00 a.m.
 To: Gabor Grothendieck
 Cc: R-help
 Subject: Re: [R] Factor levels
 
 Ok, I cannot send to you one of my dataset since they are 
 confidential. 
 But I can produce a dummy mini dataset to illustrate my question. 
 Let's say I have a csv file with 3 columns and 20 rows which 
 content is reproduced by the following line.
 
   mydata-data.frame(a=1:20,
 b=sample(100:200,20,replace=T),c=sample(letters[1:26], 20, 
 replace = T))   mydata
 a   b c
 1   1 176 w
 2   2 141 k
 3   3 172 r
 4   4 182 s
 5   5 123 k
 6   6 153 p
 7   7 176 l
 8   8 170 u
 9   9 140 z
 10 10 194 s
 11 11 164 j
 12 12 100 j
 13 13 127 x
 14 14 137 r
 15 15 198 d
 16 16 173 j
 17 17 113 x
 18 18 144 w
 19 19 198 q
 20 20 122 f
 
 If I had to read the csv file, I would use something like: 
 mydata-data.frame(read.table(file=c:/test.csv,header=T))
 
 Now, if you look at mydata$c, the levels are alphabetically ordered.
   mydata$c
  [1] w k r s k p l u z s j j x r d j x w q f
 Levels: d f j k l p q r s u w x z
 
 What I am trying to do is to reorder the levels as to have 
 them in the order they appear in the table, ie
 Levels: w k r s p l u z j x d q f
 
 Again, keep in mind that my script should be used on datasets 
 which content are unknown to me. In my example, I have used 
 letters for mydata$c, but my code may have to handle factors 
 of numeric or character values (I need to transform specific 
 columns of my dataset into factors for plotting purposes). My 
 goal is to let the code scan the content of each factor of my 
 data.frame during or after the read.table step and reorder 
 their levels automatically without having to ask the user to 
 hard-code the level order.
 
 In a way, my problem is more related to the way the factor 
 levels are ordered than to the read.table function, although 
 I guess there is a link...
 
 Gabor Grothendieck a écrit :
  Its not clear from your description what you want

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] simple coding question

2007-07-30 Thread Peter Alspach

Kirsten

One way to do this:

kirsten - c(123, 1234, 12345)
100*as.numeric(paste(substring(kirsten, 1, 3), substring(kirsten, 4, 5),
sep='.'))

HTH 

Peter Alspach
  

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Kirsten Beyer
 Sent: Tuesday, 31 July 2007 9:31 a.m.
 To: r-help@stat.math.ethz.ch
 Subject: [R] simple coding question
 
 I have a list of ICD9 (disease) codes with various formats - 3 digit,
 4 digit, 5 digit.  The first three digits of these codes are 
 what I am most interested in.  I would like to either add 
 zeros to the 3 and 4 digit codes to make them 5 digit codes 
 or add decimal points to put them all in the format ###.##.  
 I did not see a function that allows me to do this in the 
 formatting command.  This seems simple - can someone help?
 
 Thanks,
 K.Beyer
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Aggregate to find majority level of a factor

2007-05-31 Thread Peter Alspach

Jon

One way:  assuming your data.frame is 'jon'

aggregate(jon[,2], list(jon[,1]), function(x)
levels(x)[which.max(table(x))])
  Group.1 x
1   Plot1   big
2   Plot2 small
3   Plot3 small 

HTH 

Peter Alspach

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Thompson, Jonathan
 Sent: Friday, 1 June 2007 7:26 a.m.
 To: r-help@stat.math.ethz.ch
 Subject: [R] Aggregate to find majority level of a factor
 
 I want to use the aggregate function to summarize data by a 
 factor (my field plots), but I want the summary to be the 
 majority level of another factor.
 
  
 For example, given the dataframe:
 
 Plot1 big
 Plot1 big
 Plot1 small
 Plot2 big
 Plot2 small
 Plot2 small
 Plot3 small
 Plot3 small
 Plot3 small
 
 
 My desired result would be:
 Plot1 big
 Plot2 small
 Plot3 small
 
 
 I can't seem to find a scalar function that will give me the 
 majority level. 
 
 Thanks in advance,
 
 Jonathan Thompson
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A particular shuffling on a vector

2007-04-19 Thread Peter Alspach

Emmanuel

One option which appears to work:

emmanuel - c(1,1,1,2,2,3,3,3)
#emmanuel - c(1,1,2,3,4,5,6,6,6,6,6,6,6,6,7,8)
runs - rle(emmanuel)[[1]]
shuffle - sample(1:length(runs))
newEmm - rep(emmanuel[cumsum(runs)[shuffle]], runs[shuffle])
startPos - sample(1:length(emmanuel), 1)
if (startPos==1) newEmm else
newEmm[c(startPos:length(newEmm),1:(startPos-1))]

Peter Alspach

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Emmanuel Levy
 Sent: Friday, 20 April 2007 1:03 p.m.
 To: r-help@stat.math.ethz.ch
 Subject: [R] A particular shuffling on a vector
 
 Hello,
 
 I was wondering if anyone can think of a straightforward way (without
 loops) to do the following shuffling:
 
 Let's imagine a vector:
 c(1,1,1,2,2,3,3,3)
 
 I would like to derive shuffled vectors __where the same 
 digits are never separated__, although they can be at both 
 ends (periodicity).
 
 So the following shuffled vectors are possible:
 
 c(2,2,1,1,1,3,3,3)
 c(2,1,1,1,3,3,3,2)
 c(3,3,3,1,1,1,2,2)
 c(3,1,1,1,2,2,3,3)
 etc ...
 
 I should mention that there can be any number of different 
 numbers, and any number of repetition of each number.
 
 So the vectors I have to deal with could look like
 c(1,1,2,3,4,5,6,6,6,6,6,6,6,6,7,8) for example
 
 Since I have to derive many shuffled versions for each 
 vector, I am looking for an efficient way of doing it. Can 
 you think of a way without nested loops?
 
 Many thanks for your help,
 
 Best,
 
 Emmanuel
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Moving plot labels

2007-04-04 Thread Peter Alspach

Dean

Try:

plot(c(-0.25,18),c(0, max(patient10)),type=n, ylab=SD of POST
estimator, xlab=)
title(xlab='Scans\n(a)', line=3) 

Peter Alspach

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
 Sent: Thursday, 5 April 2007 12:21 p.m.
 To: r-help@stat.math.ethz.ch
 Subject: [R] Moving plot labels
 
 Hi.
 
 I'm sure this is a simple problem, yet I can't seem to get 
 simple help for it.
 
 I am simply trying to move my xlab in plot().
 
 I am currently using the following commands:
 
 plot(c(-0.25,18),c(0, max(patient10)),type=n, ylab=SD of 
 POST estimator, xlab=Scans \n (a))
 
 But when the plot prints, the xlab is printed over top the 
 xaxis. I tried adjusting mar(), but this just works with the 
 margins outside the plotting area. What is a simple fix? Is 
 there a single command I can add that adjusts the size of the 
 plotting area within the area controlled by mar()?
 
 Thanks very much.
 
 Dean
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] hwo can i get a vector that...

2007-03-07 Thread Peter Alspach

Check out which.max 

Peter Alspach


 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of bunny 
 , lautloscrew.com
 Sent: Thursday, 8 March 2007 11:20 a.m.
 To: R-help@stat.math.ethz.ch
 Subject: Re: [R] hwo can i get a vector that...
 
 
 
 
  dear all,
 
  how can i get a vector that shows the number of the column 
 of matrix 
  that contains the maximum of the row ??
  can´t believe in need a loop for this...
 
  i have a  100 x 3 matrix and want to get a 100 x 1 vector 
 with values 
  1,2,3 .
 
  there must be a simple solution. i just cannot find it. i think am 
  searching on the wrong end.
 
  thx for help in advance.
 
  m.
 
 
 EDIT: ok,  i know the following by now :)
 
 apply(for18[,-1], 1, max, na.rm=T)
 
 but this doesn´t get me the number of the column - which is 
 what i need... 
   [[alternative HTML version deleted]]
 
 
 

__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] If you had just one book on R to buy...

2007-02-25 Thread Peter Alspach

Julien

This is quite a common question.  Within R, try

RSiteSearch('one good book for R') 

and follow the threads ..

Peter Alspach


 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Julien Barnier
 Sent: Monday, 26 February 2007 8:51 a.m.
 To: r-help@stat.math.ethz.ch
 Subject: [R] If you had just one book on R to buy...
 
 Hi,
 
 I am starting a new job as a study analyst for a social 
 science research unit. I would really like to use R as my 
 main tool for data manipulation and analysis. So I'd like to 
 ask you, if you had just one book on R to buy (or to keep), 
 which one would it be ? I already bought the Handbook of 
 Statistical Analysis Using R, but I'd like to have something 
 more complete, both on the statistical point of view and on R usage.
 
 I thought that Modern applied statistics with S-Plus would 
 be a good choice, but maybe some of you could have 
 interesting suggestions ?
 
 Thanks in advance,
 
 --
 Julien
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] identify selected substances across individuals

2007-01-21 Thread Peter Alspach

Jenny

   Below is an example data, which contains three id-numbers.
 For each id there are three substances in each of the three
 blocks. Some substances are repeated twice.The subsatances
 are the same for all ids. The id number 3 is actually a
 control so all responses (y) that are equal or greater than 4
 are supposed to be removed from this id number. This I can do
 easily in R but what I need help with is I want to have those
 substances that are removed from id number 3 also removed
 from other ids as well. I could do an algorithm like : for id
 in 1:2, if substance = c(abc,dgf) then delete but if the
 substances to be removed have long strings and are more than
 2 (for example 20 substances) then it would take long time to
 list the substances manually. Can you guys please show me a
 clever way to do what I described above ?

Use subsetting to identify that substances, and then !(...%in%...) to
remove records with these substances:

yourData[!(yourData$substance %in% yourData[yourData$id==3 
yourData$y=4, 'substance']),]

The above is untested and will need modification is yourData$id or
yourData$y contains missing values.
   
Peter Alspach

__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] making a grid of points

2006-12-07 Thread Peter Alspach

Ross

I think you want

?expand.grid

BTW, help.search('grid') finds this.

Cheers ..

Peter Alspach


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Ross Boylan
 Sent: Friday, 8 December 2006 8:03 a.m.
 To: r-help
 Subject: [R] making a grid of points

 I'd like to evaluate a function at each point on a 2 or 3-D
 grid.  Is there some function that already does this, or
 generates the grid of points?

 My search has led me to the grid and lattice packages, and I
 found a reference to the sp package (e.g., SpatialGrid) for
 this.  There are things in there that might be relevant, but
 at first blush many of them are embedded in other concepts
 (grobs, shingles, rugs) and don't obviously solve the problem.

 I know this is not a hard thing to program, but I suspect
 someone has already done it.  Any pointers?

 Thanks.
 --
 Ross Boylan  wk:  (415) 514-8146
 185 Berry St #5700   [EMAIL PROTECTED]
 Dept of Epidemiology and Biostatistics   fax: (415) 514-8150
 University of California, San Francisco
 San Francisco, CA 94107-1739 hm:  (415) 550-1062

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] getting a title in a plot during an lapply

2006-11-16 Thread Peter Alspach

Mark

 In my code below tempa and tempb are numeric vectors that I
 combined into a dataframe along with the deciles of tempa.
 I have an lapply statement that goes through the dataframe
 and does ten plots according to the appropriate decile.
 The code is below and it works fine. There are no bugs so I
 figure there was no need to include structure statements on the data.
 Also, I don't want to use coplot because the axes get sort of
 messed up because of outliers.

 tempa.deciles-cut(tempa,breaks=quantile(tempa,probs=seq(0,1,0
 .1)),inclu
 de.lowest=TRUE)
 df-data.frame(tempa,tempb,tempa.deciles)
 levels(df$tempa.deciles)-paste(Decile ,1:10,sep=)
 lapply(split(df, df$tempa.deciles), function(x) {
 print(cor(x$tempa,x$tempb)); plot(x$tempa,x$tempb);

 I would like to include a title on each plot so that I can
 recognize which plot goes with which decile but I don't know
 how to do that basically because I can't figure out what
 lapply is looping over by number. I definitely looked around
 in various R books but I was not successful. Thanks for your help.

Try

plot(x$tempa, x$tempb, main=x$tempa.deciles[1])

Peter Alspach

__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating a table

2006-11-14 Thread Peter Alspach

Michael

One solution:

df-data.frame(loc=c(A,B,A,A,A),
   year=c(1970,1970,1970,1976,1980))
df[,3] - cut(df$year, c(1969.5,1974.5,1979.5,1984.5),
c('1970-74','1975-79','1980-85'))
with(df, addmargins(table(loc, V3)))


Peter Alspach

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Michael Graber
 Sent: Wednesday, 15 November 2006 9:21 a.m.
 To: R-Mailingliste
 Subject: [R] Creating a table

 Dear R List,

 I am a new to R,  so my question may be easy to answer for you:

 I have a dataframe, for example:

 df-data.frame(loc=c(A,B,A,A,A),
 year=as.numeric(c(1970,1970,1970,1976,1980)))

 and I want to create the following table without using loops:

   1970-74 ; 1975-79 ; 1980-85; rowsum
 A 2   1   1  4
 B 1   00  1
 colsum   31   15

 so that the frequencies of df$loc are shown in the table for
 different time intervals.

 Thanks in advance for any hint,

 Michael Graber

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] summary linear regression

2006-11-09 Thread Peter Alspach

Gorka

See the message from Brian Ripley, which is the first item from

RSiteSearch('R-squared')

Peter Alspach


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Gorka Merino
 Sent: Friday, 10 November 2006 4:54 a.m.
 To: r-help@stat.math.ethz.ch
 Subject: [R] summary linear regression


 Good morning,
 
 I am a PhD student in Barcelona working with R. My question
 is about
 the summary of linear regressions.
 The output of that summary gives some statistical parameters of the
 regression. One of them is the R-squared. In the help menu
 i have read
 that the manner to calculate the R-squared is
 
 R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2), and  when the linear
 regression is forced to start at the point (0,0)  y* is 0.
 
 My question is if the R-squared obtained when the linear regression
 crosses the origin has the same meaning as when it does with an
 intercept. I'm faced to a theoretical model which brings
 some values
 for a variable that compared to the observed ones gives
 some R*squared
 of
 0.32 when the regression has an intercept and 0.7 when it
 is forced to
 cross the origin.
 To sum up, it may be stated that the theoretical methodolgy
 followed
 explains the 70% of  of the observed variance?
 
 Thank you very much.
 
 Gorka Merino.

 Gorka Merino
 Institut de Ciències del Mar, CMIMA-CSIC Psg. Marítim de la
 Barceloneta 37-49 08003-BARCELONA (Spain)

 Tel.: (34) 932 30 95 48
 e-mail: [EMAIL PROTECTED]

 CMIMA:
 Tel.: (34) 932 30 95 00
 Fax:  (34) 932 30 95 55

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] barplot question

2006-10-17 Thread Peter Alspach

Mark

 i'm doing a bar plot and there are 16 column variables. is
 there a way to make the variable names go down instead of
 across when you do the barplot ?
 because the names are so long, the barplot just shows 3 names
 and leaves the rest out. if i could rotate the names 90
 degrees, it would probably fit a lot more.

Is this the sort of thing you mean:

temp - barplot(rnorm(16, 3))
text(temp, rep(-0.2, 16), paste('trt', 1:16), srt=90, adj=1)

Peter Alspach

 or maybe i can use space to make the horizontal width longer
 ? I looed up ?barlot but i'm not sure. when 1st and 2nd are
 on the bottom, things look fine but i'm not as interesed in
 those 2 barplots. 
 
 i didn't use any special options. i just did
 
 barplot(probsignmatrix)
 barplot(t(probsignmatrix))
 
 barplot(probsignmatrix,beside=T)
 barplot(t(probsignmatrix),beside=T)
 
 
 
 i put probsignmatrix below in case someone wants to see what
 i mean because it may not be clear. i don't expect anyone to
 type it in but rounding would still show what i mean.
 thanks a lot.
 
 
  pcount pmpppcount pmmppcount pmmmpcount pcount
 mcount pppmmcount ppmmmcount ppmppcount ppmmpcount
 pppmpcount ppmpmcount pmpmpcount pmpmmcount pmmpmcount
 pmppmcount 1st 0.03477157 0.02842640 0.03157360 0.03365482
 0.04010152 0.03553299
 0.03989848 0.04182741 0.02817259 0.03203046 0.02781726 0.02218274
 0.01771574 0.02289340 0.02583756 0.02390863 2nd 0.04648895
 0.02901495 0.03092490 0.03064044 0.04108420 0.03998700
 0.03958062 0.04059655 0.03039662 0.03027471 0.02901495 0.02170026
 0.01601105 0.02287874 0.02165962 0.02267555
 

__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] confusing about contrasts concept

2006-08-16 Thread Peter Alspach

Tian

Bill Venables wrote an excellent explanation to the S list back in 1997.
I saved it as a pdf file and attach it herewith ...

Peter Alspach
  

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of T Mu
 Sent: Thursday, 17 August 2006 3:23 a.m.
 To: R-Help
 Subject: [R] confusing about contrasts concept

 Hi all,

 Where can I find a thorough explanation of Contrasts and
 Contrasts Matrices?
 I read some help but still confused.

 Thank you,
 Tian

   [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__

The contents of this e-mail are privileged and/or confidential to the
named recipient and are not to be used by any other person and/or
organisation. If you have received this e-mail in error, please notify
the sender and delete all material pertaining to this e-mail.

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] confusing about contrasts concept [long]

2006-08-16 Thread Peter Alspach

Tian

It appears the attachment might not have worked so I'll embed Bill's
message at the end.

Peter Alspach


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Peter Alspach
 Sent: Thursday, 17 August 2006 8:02 a.m.
 To: T Mu; R-Help
 Subject: Re: [R] confusing about contrasts concept


 Tian

 Bill Venables wrote an excellent explanation to the S list
 back in 1997.
 I saved it as a pdf file and attach it herewith ...

 Peter Alspach
  

  -Original Message-
  From: [EMAIL PROTECTED]

  [mailto:[EMAIL PROTECTED] On Behalf Of T Mu
  Sent: Thursday, 17 August 2006 3:23 a.m.
  To: R-Help
  Subject: [R] confusing about contrasts concept
 

  Hi all,
 

  Where can I find a thorough explanation of Contrasts and

  Contrasts Matrices?
  I read some help but still confused.
 

  Thank you,
  Tian


Date sent:  Sat, 29 Mar 1997 17:23:31 +1030
From:   Bill Venables [EMAIL PROTECTED]
To: Wei Qian [EMAIL PROTECTED]
Copies to:  [EMAIL PROTECTED]
Subject:Re: test contrasts [long]

Wei Qian writes:

I am new to Splus, so please don't be offended if my question is too
naive.

We're all friends here, Wei. It's not that kind of group...mostly.

Does anyone know how to test contrasts in Splus? To be specific, suppose
I have a one-way layout experiment with 3 treatments, and I want to test
the contrasts of treatment 1 vs. treatment 2, treatment 1 vs. treatment
3, etc. In SAS, I could use the following:

 proc gim;
   class trt;
  model y = trt;
  contrasts! vs. 2 1-10;
  contrasts 2 vs. 3 10-1;
run;

 One way I can think of is that to construct my design matrix and obtain
the contrast sum of squares through a series of matrix operations, but
is there any easy way to do it or any built-in function in Splus can do
it?

The answer is 'yes' but hardly anyone seems to know how to do it. The
explanation in the 'white book' for example, seems a little incomplete
to me and not quite adequate to settle the case you raise. (The
explanation in the yellow book is also inadequate, but I hope come July
we will have fixed that.) Since this is one of the most frequent
questions people ask me in direct email, too, let me try (again) to sort
it out in some detail.

A formula such as y ~ f, where f is a factor in principle generates a
single classification model in the form

*y_{ij} == mu + phi_i + e_{ij}

Write the design matrix in the form X = [1 Xf], where, assuming f has p
levels, Xf is the n x p dummy variable (ie binary) matrix corresponding
to the phi_i's. So in matrix terms the model is written as

*y = 1 mu + Xf phi + e

(a) If you remove the intercept term, using y ~ f -1, then Xf is the
design matrix you get and the coefficients correspond to the class
means;
(b) If the intercept term is left in, then the design matrix X does
not have full column rank, so an adjustment has to be made to Xf to make
it so.

The redundancy comes about because the columns of Xf add to the the
1-vector, that is
Xf l_p = l_n. So let C be any p x (p -1) matrix such that [1 C] is
nonsingular. It can easily be seen that Xc = [1 (Xf C)] will have full
column rank and that fitting the model using this model matrix is
equivalent to the original redundantly specified model. The matrix C is
called the *coding matrix* for the factor f.

The linear model is actually fitted in the form

*y = 1 mu + (Xf C) alpha + e

where alpha has (p-1) components, of course. In order to make sense of
the alpha's we need to relate them back to the phi's.

For any such C there will be a vector, v, such the v'C = 0' (using ' for
transpose). (In fact v spans the orthogonal complement to the column
space of C). Clearly phi and alpha are related by

*C alpha = phi

but since v'C = 0', it follows that an identification constraint
applies, namely v'phi = 0. By multiplying both sides by (C'C)^{-1} C',
it also follows that

*alpha =(C'C)^{-1}C'phi

which provides an interpretation for the alpha's in terms of the
(constrained) phi's. For example take the Helmert contrasts.

 contr.helmert(4)
[,1][,2][,3]
1   -1  -1  -1
2   1   -1  -1
3   0   2   -1
4   0   0   3

The constraint vector is clearly v= (1,1,1,1), since the columns add to
zero. In this case the columns are also mutually orthogonal, so the
matrix (C'C^{-l} C' (the generalized inverse of C) has a similar form
apart from a few row divisors:

fractions(ginverse(contr.helmert(4)))
[,1][,2][,3][,4]
[1,]-1/21/2 0   0
[2,]-1/6-1/61/3 0
[3,]-1/12   -1/12   -1/12   1/4

(ginverse() will be available in S+4.0 and fractions(), now available in
the MASS2 library, simply displays numbers in fractional form so that
patterns are more obvious).

Thus the phi's are identified by requiring that they add to zero, and

*alpha_l = (phi_2 - phi_l )/2,
*alpha_2 = [phi_3 - (phi_l + phi_2)/2] / 3

c. When the columns of C are not mutually orthogonal

Re: [R] How to choose columns in data.frame by parts of columns' names?

2006-05-30 Thread Peter Alspach

Wei-wei

 I have a data.frame which has names as following.
 [1] XG1  YG1  XEST YEST
 [2] XNOEMP1  XNOEMP2 YNOEMP1  YNOEMP2
 [3] XBUS10   XBUS10A XBUS10B  XBUS10C
 [4] YBUS10   YBUS10A  YBUS10B  YBUS10C
 [5] XOWNBUS  XSELFEST  YOWNBUS  YSELFEST

 Those columns have names beginning with X or Y. Each X
 is paired by a Y, e.g. XG1 and YG1, but they are not in
 the order of X Y X Y  I want to combine X* and Y* like this:

 data.new[,G1]  - (data.old[,XG1] + endata.use[,YG1])/2

 How to choose columns by parts of names? For example, I can pick out
 XG1 and YG1 because they have the common part G1.

Not entirely sure what you mean but one approach might be to re-order
the columns so that they are in order.

yourNames
 [1] XG1  YG1  XEST YEST XNOEMP1  XNOEMP2
 [7] YNOEMP1  YNOEMP2  XBUS10   XBUS10A  XBUS10B  XBUS10C
[13] YBUS10   YBUS10A  YBUS10B  YBUS10C  XOWNBUS  XSELFEST
[19] YOWNBUS  YSELFEST
yourNames[order(substring(yourNames,2), substring(yourNames, 1,1))]
 [1] XBUS10   YBUS10   XBUS10A  YBUS10A  XBUS10B  YBUS10B
 [7] XBUS10C  YBUS10C  XEST YEST XG1  YG1
[13] XNOEMP1  YNOEMP1  XNOEMP2  YNOEMP2  XOWNBUS  YOWNBUS
[19] XSELFEST YSELFEST

gives an idea of what I mean ...

Peter Alspach



__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] factors and mca

2006-05-02 Thread Peter Alspach

Carlos

?mca states that mca works on a dataframe.  As you've written it

is.data.frame(de) returns FALSE

Try

de - data.frame(d,e)  instead of  de - factor(c(d,e))

HTH

Peter Alspach



 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Carlos
 Mauricio Cardeal Mendes
 Sent: Wednesday, 3 May 2006 8:23 a.m.
 To: 'R-Help help'
 Subject: [R] factors and mca

 Hi ! Wonder if I have this code below:

 a - c(1, 2, 3, 2, 1)
 b - c(3, 2, 3, 1, 1)
 d -as.factor(a)
 e -as.factor(b)
 table(d,e)
 is.factor(a)
 is.factor(b)
 is.factor(d)
 is.factor(e)
 de - factor(c(d,e))
 is.factor(de)
 require(MASS)
 mca(de)

 I'd like to perform a correspondence analysis, but I can't
 understand why the message error after mca.

 Aren't those (d and e) factors as is.factors shows ?

   Erro em mca(de) : all variables must be factors

 I'm running R 2.3.0 under windows XP


 Thank you !
 Mauricio

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html


__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Special characters: plus/minus - a method that works

2006-03-20 Thread Peter Alspach

Gabor Grothendieck wrote:

 Another way that works on my Windows XP system is to use \261 .

Please note Windows escape codes are not portable thus not recommended as 
Martin Maechler pointed out in a response to a suggestion of mine:

On Tue, 14 Jun 2005, Martin Maechler wrote:


 Peter == Peter Alspach PAlspach at hortresearch.co.nz
 on Tue, 14 Jun 2005 14:11:47 +1200 writes:

 Peter Ben

 Peter Others have pointed out plotmath. However, for some
 Peter superscripts (including 2) it may be easier to use
 Peter the appropriate escape sequence (at in Windows):

 Peter ylab = 'BA (m\262/ha)'

 but please refrain from doing that way.
 You should write R code that is portable, and ``plotmath''
 has been designed to be portable. Windows escape codes are not,
 and may not even work in future (or current?) versions of
 Windows with `unusual' locale settings {fonts, etc}.


Peter Alspach

__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] superscript in figures - basic question

2005-06-13 Thread Peter Alspach

Ben

Others have pointed out plotmath.  However, for some superscripts (including 2) 
it may be easier to use the appropriate escape sequence (at in Windows):

ylab = 'BA (m\262/ha)'

Cheers 

Peter Alspach


 Benjamin M. Osborne [EMAIL PROTECTED] 14/06/05 13:42:53 
Although I see similar, but more complex, questions addressed in the help
archive, I'm having trouble adding superscripted text to the y-axis labels of
some figures, and I can't find anything in the R documentation on this.

I have:

ylab=BA (m2/ha)

but I want the 2 to be superscripted.
Thanks in advence for the help, or for pointing out the appropriate help file.

-Ben Osborne

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Adding up intervals by activity

2005-02-23 Thread Peter Alspach

Lorin

You could use rle():  Say your Activity and Interval data is in lorin, then:

tmp.rle - rle(as.vector(lorin[,1]))[[1]]
tapply(lorin[,2], rep(1:length(tmp.rle), tmp.rle), sum)
 1  2  3 
16  8  6 

HTH

Peter Alspach

 Lorin Hochstein [EMAIL PROTECTED] 24/02/05 16:18:43 
Hello all,

I have a data frame with the following columns: timestamp, activity, 
interval.

Each row contains information about an activity that occured over a time 
interval. Timestamp is when the interval ended, Activity is a factor 
that represents the type of activity, and Interval is a double which 
represents an interval in time. The data frame is sorted by timestamp. 
For example:

TimestampActivity  Interval
2004-02-12 08:59:08  A 5
2004-02-12 09:23:03  A 7
2004-02-12 09:56:04  A 4
2004-02-12 10:39:00  B 5
2004-02-12 11:23:06  B 3
2004-02-12 12:56:01  A 4
2004-02-12 12:59:01  A 2
...

I would like to know how much time was spent on an activity before the 
activity changed (to compute time, I just care about the lengths of the 
intervals, not the values of the timestamps). For example, given the 
table above, I would like to get an output that looks like:


Activity  TotalTime
A 16
B 8
A 6

Is this possible to do in R without resorting to a for loop or recursion 
(i.e. what's the R-ish way to do this?) I know I can use the diff 
function to identify which rows represent a change in activity, but is 
there anything I can do with that?

Lorin

--
Lorin Hochstein
Graduate Research Assistant
Experimental Software Engineering Group
Computer Science Dept., University of Maryland
email:[EMAIL PROTECTED]  tel:301-405-2721

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Positive log-likelihood in lme

2005-02-16 Thread Peter Alspach

Kia ora

I'm a using lme (from nlme package) with data similar to the Orthodont dataset 
and am getting positive log-likelihoods (100).  This seems usual and I 
wondered if someone could offer a possible explanation.

I can supply a sample dataset if requested, but I feel almost certain that this 
question has been asked and answered recently.  However, I can find no trace of 
it in the mail archives (although I have spent several hours reading lots of 
other interesting things :-)).

Thanks .

Peter Alspach


__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] unbalanced design

2004-12-01 Thread Peter Alspach

Damián

I asked a similar question a few months ago (3 August 2004):

 temp.aov - aov(S~rep+trt1*trt2*trt3, data=dummy.data)
 model.tables(temp.aov, type='mean', se=T)

 Returns the means, but states Design is unbalanced - use se.contrasts
 for se's which is a little surprising since the design is balanced. 

To which Prof Ripley replied: If you used the default treatment contrasts, it 
is not.  Try Helmert
contrasts with aov().

If I recall correctly, following Prof Ripley's suggestion led aov() to accept 
the design was balanced, but model.tables() still did not (but that could have 
been my error).  However, se.contrast() worked.

Cheers 

Peter Alspach



 Damián Cirelli [EMAIL PROTECTED] 02/12/04 09:50:13 
Hi all,
I'm new to R and have the following problem:
I have a 2 factor design (a has 2 levels, b has 3 levels). I have an
object kidney.aov which is an aov(y ~ a*b), and when I ask for
model.tables(kidney.avo, se=T) I get the following message along with
the table of effects:

Design is unbalanced - use se.contrast() for se's

but the design is NOT unbalanced... each fator level combination has the
same n

I' d appreciate any help.
Thanks.

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] dropping rows

2004-12-01 Thread Peter Alspach

Tobias

I remember finding Patrick Burns' S Poetry (see http://www.burns-stat.com/ ) 
worth reading - and it covers this sort of thing nicely.

Peter Alspach


 Tobias Muhlhofer [EMAIL PROTECTED] 02/12/04 13:57:47 
Thanks.

The problem is that there is extremely little on dataframes or matrices 
in An Intro to R, which I did read and I frankly don't know where else 
to go.

Once I know a function like subset() exists, I can then read the help 
files on it and that's fine, but I would never dream this function up 
myself...

As for indexing, I DID read An Introduction to R and I did NOT catch 
the part where it says you can use any variable in the dataframe to 
index it, nor would I have thought of it by myself. From that 
documentation, I only learned about using row-labels to index things...

But I am definitely thankful for the quick help given to me by people on 
this list, and so I guess being RTFM'ed is a small price to pay for 
figuring out how to solve the problem I need to solve.

Toby


Jeff Laake wrote:
 Here's an example:
 
 earlydata=data[data$year1960,]
 
 Lookup help and read manuals on manipulating dataframes.
 
 
 Tobias Muhlhofer wrote:
 
 Hi!

 Sorry for asking a trivial questions, but I can't seem to figure this 
 out.

 I have a dataframe called master containing 30-odd variables.

 In this dataframe, I have observations across these 30 variables from 
 1930 to 2003 (I've made a year variable). How can I drop all rows 
 for which the year is less than 1960? I'm assuming something with 
 ifelse() but I can't quite figure it out.

 I would appreciate a suggestion of some syntax.

 Thanks!
 Toby

 __
 [EMAIL PROTECTED] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help 
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html 
 
 
 
 
 

-- 
**
When Thomas Edison invented the light bulb he tried over 2000
experiments before he got it to work. A young reporter asked
him how it felt to have failed so many times. He said
I never failed once. I invented the light bulb.
It just happened to be a 2000-step process.

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] ifelse() question

2004-10-28 Thread Peter Alspach

Francisco

Did you try changing the factors to character, with as.character?

Also you don't really need ifelse() for this.  Something like the
following (untested) should do it:

dat[,4] - as.character(dat[,4])
dat[,5] - as.character(dat[,5])
dat[dat[,4]=='POR',4] - dat[dat[,4]=='POR',5]
dat[,4] - as.factor(dat[,4])
dat[,5] - as.factor(dat[,5])


Peter Alspach

 F Z [EMAIL PROTECTED] 29/10/04 12:48:54 
Hi

I have a data.frame with dim = 18638 (rows) 6 (cols)

names(dat)
[1] id  longlat species typesize

Variable species and type are factors.  Species has 5 levels BOV
CAP 
CER OVI POR
Variable type has 11 levels BRD CL ... OTHER

I would like to replace the values on species by the values on types
only if 
species is == POR
I tried:

x-ifelse(dat$species %in% POR,dat$type,dat$species)
dat[,4]-x
but levels(x)
[1] 1  2  3  4  5  6  8  9  10 11 12

So x changes the factor names by numbers.  I can not use factor() to
recover 
the names since the resulting factors in x are a mixture of factors
from 
species and type.

I also tried

x-gsub(pattern = POR,replacement= factor(dat$type),dat$species) 
with 
same behavior.

Apparently I did not have my granola bar today so I can't find a
solution!  
Any help is greatly appreciated

Thanks!

Francisco

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] as.list.matrix

2004-10-28 Thread Peter Alspach

Kjetil

Isn't a data.frame as special type of list, and thus one could use
as.data.frame?

Peter Alspach


 Kjetil Brinchmann Halvorsen [EMAIL PROTECTED] 29/10/04
14:16:03 
I found the need of converting a matrix into a list of its columns
(for use with do.call), and was surprised there was no method
as.list.matrix, it could easily be a part of as.list default

I wrote

my.as.list.matrix - function(mat) {
   if(!is.matrix(mat))stop(Argument must be a matrix)
   n - NCOL(mat)
   res - vector(mode=list, length=n)
   for (i in seq(along=res))
 res[[i]] - mat[,i]
   res
}

but still think there must be some in-built function doing this!

By the way, my use is

 with(VRS,do.call(scatter.smooth,
c(my.as.list.matrix(na.omit(cbind(Tmed, 
resid(VRS.mod1,type=response,
xlab=, ylab=)))

there shoud be an easier way for such a daily task?

Kjetil

-- 

Kjetil Halvorsen.

Peace is the most effective weapon of mass construction.
   --  Mahdi Elmandjra

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Standard errors from glm

2004-08-03 Thread Peter Alspach

Dear Prof Ripley

Thanks for your reply and clarification.  However:

1.  Regarding model.tables() returning Design is unbalanced.  Setting contrasts to 
Helmert does indeed make the design balanced, but model.tables() still returns Design 
is unbalanced:

 options()$contrasts
unordered   ordered 
contr.treatment  contr.poly 
 aov(S~rep+trt1*trt2*trt3, data=dummy.data)
Call:
...
Residual standard error: 14.59899 
Estimated effects may be unbalanced
 options(contrasts=c(contr.helmert, contr.treatment))
 aov(S~rep+trt1*trt2*trt3, data=dummy.data)
Call:
...
Residual standard error: 14.59899 
Estimated effects are balanced
 model.tables(aov(S~rep+trt1*trt2*trt3, data=dummy.data), se=T)
Design is unbalanced - use se.contrasts for se's
Tables of effects
...

However, this is a relatively minor issue, and covered in ?model.tables which clearly 
states that The implementation is incomplete, and only the simpler cases have been 
tested thoroughly.

2.  You point out that In either case you can predict something you want to estimate 
and use
predict(, se=TRUE).  Doesn't this give the standard error of the predicted value, 
rather than the mean for, say, trt1 level 0?  For example:
 predict(temp.lm, newdata=data.frame(rep='1', trt1='0', trt2='1', trt3='0'), se=T)
$fit
[1] 32

$se.fit
[1] 10.53591

$df
[1] 23

$residual.scale
[1] 14.59899

Whereas from the analysis of variance table we can get the standard error of the mean 
for trt1 as being sqrt(anova(temp.lm)[9,3]/12) = 4.214365.  It is the equivalent of 
this latter value that I'm after in the glm() case.


 Prof Brian Ripley [EMAIL PROTECTED] 03/08/04 18:10:56 
On Tue, 3 Aug 2004, Peter Alspach wrote:

[Lines wrapped for legibility.]

 I'm having a little difficulty getting the correct standard errors from
 a glm.object (R 1.9.0 under Windows XP 5.1).  predict() will gives
 standard errors of the predicted values, but I am wanting the standard
 errors of the mean.
 
 To clarify:
 
 Assume I have a 4x3x2 factorial with 2 complete replications (i.e. 48
 observations, I've appended a dummy set of data at the end of this
 message).  Call the treatments trt1 (4 levels), trt2 (3 levels) and trt3
 (2 levels) and the replications rep - all are factors.  The observed
 data is S.  Then:
 
 temp.aov - aov(S~rep+trt1*trt2*trt3, data=dummy.data)
 model.tables(temp.aov, type='mean', se=T)
 
 Returns the means, but states Design is unbalanced - use se.contrasts
 for se's which is a little surprising since the design is balanced.  

If you used the default treatment contrasts, it is not.  Try Helmert 
contrasts with aov().

 Nevertheless, se.contrast gives what I'd expect:
 
 se.contrast(temp.aov, list(trt1==0, trt1==1), data=dummy.data)
 [1] 5.960012
 
 i.e. standard error of mean is 5.960012/sqrt(2) = 4.214, which is the
 sqrt(anova(temp.aov)[9,3]/12) as expected.  Similarly for interactions,
 e.g.:
 
 se.contrast(temp.aov, list(trt1==0  trt2==0, trt1==1  trt2==1), 
 data=dummy.data)/sqrt(2)
 [1]  7.299494
 
 How do I get the equivalent of these standard errors if I have used
 lm(), and by extension glm()?  I think I should be able to get these
 using predict(..., type='terms', se=T) or coef(summary()) but can't
 quite see how.

In either case you can predict something you want to estimate and use
predict(, se=TRUE).

-- 
Brian D. Ripley,  [EMAIL PROTECTED] 
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/ 
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595


__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Standard errors from glm

2004-08-02 Thread Peter Alspach

Kia ora list members:

I'm having a little difficulty getting the correct standard errors from a glm.object 
(R 1.9.0 under Windows XP 5.1).  predict() will gives standard errors of the predicted 
values, but I am wanting the standard errors of the mean.

To clarify:

Assume I have a 4x3x2 factorial with 2 complete replications (i.e. 48 observations, 
I've appended a dummy set of data at the end of this message).  Call the treatments 
trt1 (4 levels), trt2 (3 levels) and trt3 (2 levels) and the replications rep - all 
are factors.  The observed data is S.  Then:

temp.aov - aov(S~rep+trt1*trt2*trt3, data=dummy.data)
model.tables(temp.aov, type='mean', se=T)

Returns the means, but states Design is unbalanced - use se.contrasts for se's which 
is a little surprising since the design is balanced.  Nevertheless, se.contrast gives 
what I'd expect:

se.contrast(temp.aov, list(trt1==0, trt1==1), data=dummy.data)
[1] 5.960012

i.e. standard error of mean is 5.960012/sqrt(2) = 4.214, which is the 
sqrt(anova(temp.aov)[9,3]/12) as expected.  Similarly for interactions, e.g.:

se.contrast(temp.aov, list(trt1==0  trt2==0, trt1==1  trt2==1), 
data=dummy.data)/sqrt(2)
[1]  7.299494

How do I get the equivalent of these standard errors if I have used lm(), and by 
extension glm()?  I think I should be able to get these using predict(..., 
type='terms', se=T) or coef(summary()) but can't quite see how.

predict(lm(S~rep+trt1*trt2*trt3, data=dummy.data), type='terms', se=T)
or
predict(glm(cbind(S, 100-S)~rep+trt1*trt2*trt3, data=dummy.data, family='binomial'), 
type='terms', se=T)
or, as in my case,
predict(glm(cbind(S, 100-S)~rep+trt1*trt2*trt3, data=dummy.data, 
family='quasibinomial'), type='terms', se=T)


Thanks 

Peter Alspach
HortResearch

dummy.data
trt1trt2trt3rep S
0   0   0   1   33
0   0   0   2   55
0   0   1   1   18
0   0   1   2   12
0   1   0   1   47
0   1   0   2   16
0   1   1   1   22
0   1   1   2   33
0   2   0   1   22
0   2   0   2   18
0   2   1   1   60
0   2   1   2   40
1   0   0   1   38
1   0   0   2   24
1   0   1   18
1   0   1   2   14
1   1   0   1   69
1   1   0   2   42
1   1   1   1   42
1   1   1   2   44
1   2   0   1   48
1   2   0   2   26
1   2   1   1   46
1   2   1   2   33
2   0   0   1   48
2   0   0   2   46
2   0   1   1   24
2   0   1   28
2   1   0   1   69
2   1   0   2   33
2   1   1   1   22
2   1   1   2   33
2   2   0   1   33
2   2   0   2   18
2   2   1   1   26
2   2   1   2   42
3   0   0   1   12
3   0   0   2   42
3   0   1   1   16
3   0   1   2   22
3   1   0   1   14
3   1   0   2   60
3   1   1   1   40
3   1   1   2   55
3   2   0   1   47
3   2   0   2   38
3   2   1   1   18
3   2   1   2   44


__

The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Standard errors from glm

2004-08-02 Thread Peter Alspach

Roger

Thanks - I have tried that but it doesn't give the standard errors I required.

Using my example:

coef(summary(temp.lm)) gives:

  Estimate Std. Error t value Pr(|t|)
(Intercept)   44.5  10.535912  4.22364968 0.0003224616
rep2  -1.0   4.214365 -0.23728369 0.8145375583
trt11-13.0  14.598987 -0.89047272 0.3824325293
trt12  3.0  14.598987  0.20549370 0.8389943428
trt13-17.0  14.598987 -1.16446432 0.2561726432
.. etc ...

that is, standard error for the (intercept) is 10.54, rep 4.21, main effects 14.60, 
2-way interactions 20.65 and 3-way interaction 29.20.  These can also be obtained via
sqrt(diag(vcov(temp.lm))).

It is not clear to me how to go from these estimates to those from the aov() call.  I 
have tried pre- and post- multiplying vcov() by the design matrix but this gives the 
same standard errors as predict(temp.lm, se=T); i.e. those of the predicted values.

Regards 

Peter Alspach


 Roger D. Peng [EMAIL PROTECTED] 03/08/04 12:22:44 
Try summary(glm.object)$coefficients.

-roger

Peter Alspach wrote:
 Kia ora list members:
 
 I'm having a little difficulty getting the correct standard
 errors from a glm.object (R 1.9.0 under Windows XP 5.1).
 predict() will gives standard errors of the predicted values,
 but I am wanting the standard errors of the mean.
 
 To clarify:
 
 Assume I have a 4x3x2 factorial with 2 complete replications
 (i.e. 48 observations, I've appended a dummy set of data at
 the end of this message).  Call the treatments trt1 (4
 levels), trt2 (3 levels) and trt3 (2 levels) and the
 replications rep - all are factors.  The observed data is S.
 Then:
 
 temp.aov - aov(S~rep+trt1*trt2*trt3, data=dummy.data) 
 model.tables(temp.aov, type='mean', se=T)
 
 Returns the means, but states Design is unbalanced - use
 se.contrasts for se's which is a little surprising since the
 design is balanced.  Nevertheless, se.contrast gives what I'd
 expect:
 
 se.contrast(temp.aov, list(trt1==0, trt1==1), data=dummy.data)
  [1] 5.960012
 
 i.e. standard error of mean is 5.960012/sqrt(2) = 4.214, which
 is the sqrt(anova(temp.aov)[9,3]/12) as expected.  Similarly
 for interactions, e.g.:
 
 se.contrast(temp.aov, list(trt1==0  trt2==0, trt1==1 
 trt2==1), data=dummy.data)/sqrt(2) [1]  7.299494
 
 How do I get the equivalent of these standard errors if I have
 used lm(), and by extension glm()?  I think I should be able
 to get these using predict(..., type='terms', se=T) or
 coef(summary()) but can't quite see how.
 
 predict(lm(S~rep+trt1*trt2*trt3, data=dummy.data),
 type='terms', se=T) or predict(glm(cbind(S,
 100-S)~rep+trt1*trt2*trt3, data=dummy.data,
 family='binomial'), type='terms', se=T) or, as in my case, 
 predict(glm(cbind(S, 100-S)~rep+trt1*trt2*trt3,
 data=dummy.data, family='quasibinomial'), type='terms', se=T)
 
 
 Thanks 
 
 Peter Alspach HortResearch
 
 dummy.data trt1   trt2trt3rep S 0 0   0   1   33 0   
  0   0   2   55 00   1   1
 18 0  0   1   2   12 01   0   1   47 01   0  
  2   16 01   1   1   22 01   1   2   33 02
 0 1   22 02   0   2   18 02   1   1   60 0   
  2   1   2   40 10   0   1   38 10   0   2   
 24 
 1 0   1   18 10   1   2   14 11   0  
  1   69 11   0   2   42 11   1   1   42 11   
 1   2
 44 1  2   0   1   48 12   0   2   26 12   1  
  1   46 12   1   2   33 20   0   1   48 20
 0 2   46 20   1   1   24 20   1   28 2   
  1   0   1   69 21   0   2   33 21   1   1   
 22 
 2 1   1   2   33 22   0   1   33 22   0  
  2   18 22   1   1   26 22   1   2   42 30   
 0   1
 12 3  0   0   2   42 30   1   1   16 30   1  
  2   22 31   0   1   14 31   0   2   60 31
 1 1   40 31   1   2   55 32   0   1   47 3   
  2   0   2   38 32   1   1   18 32   1   2   
 44
 
 
 __
 
 The contents of this e-mail are privileged and/or
 confidenti...{{dropped}}
 
 __ 
 [EMAIL PROTECTED] mailing list 
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE
 do read the posting guide!
 http://www.R-project.org/posting-guide.html 
 

__

The contents of this e-mail are privileged

[R] Scoping Rules: Summary

2003-10-09 Thread Peter Alspach
Thanks to Andy Liaw, Roger Peng, Thomas Lumley, Brian Ripley and Peter
Dalgaard, all of whom addressed my questions or threads arising from
them.  The full messages were posted to the list so this is a brief
summary:

Andy Liaw explained the difference between lexical and dynamic scoping
and the rationale behind the choice of lexical scoping for R.  Roger
Peng showed how to modify fnB.  Brian Ripley suggested that it is
generally better to pass functions an object rather than just the name,
and warned of the dangers of using get() on the result of
deparse(substitute()).

Thanks all

Peter Alspach


Original question below:

I'm using R1.7.1 (Windows 2000) and having difficulty with scoping. 
I've studied the FAQ and on-line manuals and think I have identified
the
source of my difficulty, but cannot work out the solution.

For the purposes of illustration.  I have three functions as defined
below:

fnA - function(my.x)
{
  list(first=as.character(substitute(my.x)), second=sqrt(my.x))
}

fnB - function(AA)
{
  tmp.x - get(AA$first)
  tmp.x^2
}

fnC - function()
{
  x - 1:2
  y - fnA(x)
  z - fnB(y)
  c(x,y,z)
}

fnA() has a vector as an argument and returns the name of the vector
and the square root of its elements in a list.  fn(B) takes the result
of fn(A) as its argument, gets the appropriate vector and computes the
square of its elements.  These work fine when called at the command
line.

fnC() defines a local vector x and calls fnA() which operates on this
vector.  Then fnB() is called, but it operates on a global vector x in
GlobalEnv (or returns an error is x doesn't exist there) - but I want
it
to operate on the local vector.

I think this is related to the enclosing environment of all three
functions being GlobalEnv (since they were created at the command
line),
but the parent environment of fnB() being different when invoked from
within fnC().

My questions:

1  Have I correctly understood the issue ?
2  How do I make fnB() operate on the local vector rather than the
global one ?
3  And, as an aside, I have used as.character(substitute(my.x)) to
pass
the name - but deparse(substitute(my.x)) also works.  Is there any
reason to prefer one over the other?

Thank you ...



__
The contents of this e-mail are privileged and/or confidenti...{{dropped}}

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] Scoping rules

2003-10-08 Thread Peter Alspach
Dear List members:

I'm using R1.7.1 (Windows 2000) and having difficulty with scoping. 
I've studied the FAQ and on-line manuals and think I have identified
the
source of my difficulty, but cannot work out the solution.

For the purposes of illustration.  I have three functions as defined
below:

fnA - function(my.x)
{
  list(first=as.character(substitute(my.x)), second=sqrt(my.x))
}

fnB - function(AA)
{
  tmp.x - get(AA$first)
  tmp.x^2
}

fnC - function()
{
  x - 1:2
  y - fnA(x)
  z - fnB(y)
  c(x,y,z)
}

fnA() has a vector as an argument and returns the name of the vector
and the square root of its elements in a list.  fn(B) takes the result
of fn(A) as its argument, gets the appropriate vector and computes the
square of its elements.  These work fine when called at the command
line.

fnC() defines a local vector x and calls fnA() which operates on this
vector.  Then fnB() is called, but it operates on a global vector x in
GlobalEnv (or returns an error is x doesn't exist there) - but I want
it
to operate on the local vector.

I think this is related to the enclosing environment of all three
functions being GlobalEnv (since they were created at the command
line),
but the parent environment of fnB() being different when invoked from
within fnC().

My questions:

1  Have I correctly understood the issue ?
2  How do I make fnB() operate on the local vector rather than the
global one ?
3  And, as an aside, I have used as.character(substitute(my.x)) to
pass
the name - but deparse(substitute(my.x)) also works.  Is there any
reason to prefer one over the other?

Thank you ...


Peter Alspach



__
The contents of this e-mail are privileged and/or confidential to the
named recipient and are not to be used by any other person and/or
organisation. If you have received this e-mail in error, please notify 
the sender and delete all material pertaining to this e-mail.

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help