Re: [R] Finding matches in 2 files

2007-07-26 Thread Christophe Pallier
Maybe with 'merge', but your message is too vague (see
http://www.catb.org/~esr/faqs/smart-questions.html).
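
A minimal sketch of that approach, assuming two tab-delimited files that each
contain a 'gene' column (the file and column names here are made up):

## Hypothetical file and column names; adjust to the real data.
a <- read.table("method1.txt", header = TRUE)  # results of the first analysis
b <- read.table("method2.txt", header = TRUE)  # results of the second analysis

## Genes appearing in both analyses:
common <- intersect(a$gene, b$gene)

## Or keep the full rows for the shared genes:
both <- merge(a, b, by = "gene")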

On 7/26/07, jenny tan <[EMAIL PROTECTED]> wrote:
>
>
>
> I have 2 files containing data analysed by 2 different methods. I would
> like to find out which genes appear in both analyses. Can someone show me
> how to do this?
> _
> [[trailing spam removed]]
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multilevel package: Obtaining significance for waba within-group correlation?

2007-07-23 Thread Christophe Pallier
On 7/23/07, Bertolt Meyer <[EMAIL PROTECTED]> wrote:
>
> And another question: Does anybody know whether it is possible to save
> individual group-mean-centered values into a new variable?


If data is in 'x' and grouping factor is 'group', use "x - tapply(x, group,
mean)[group]"

For example:

group <- gl(2,10)
x <- rnorm(20)
x - tapply(x, group, mean)[group]




-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using R for a reaction-time experiment

2007-07-23 Thread Christophe Pallier
I also recommend python/pygame for stimulus presentation.
With the "psychopy" and "visionegg "toolboxes, you'll write a simple
reaction time experiment in just a few lines.
Be sure to save the data in a tabular format immediatly readable by R's "
read.table".
Alternatively, you might use "Rpy" to analyse the data.

Christophe Pallier

On 7/23/07, Mike Lawrence <[EMAIL PROTECTED]> wrote:
>
> Psych grad student here, R user for 3 years. Using R for experiments
> is likely not advisable as it has no fine control of display timing
> (ex synching stimuli to the screen refresh, etc). On recommendation
> of colleagues I'm learning the Python language, which has a module
> called 'pygame' that is fantastic for psych experiments.
>
> Mike
>
>
> On 22-Jul-07, at 1:38 PM, Seth Roberts wrote:
>
> >
> > I want to use R to run a reaction-time experiment: Something
> > appears on the
> > screen, I respond by typing something (one keystroke), the system
> > measures
> > the speed of my response. R would be great for this if only I
> > didn't have to
> > hit Enter to enter that keystroke. I am doing such experiments now
> > but they
> > require two actions per trial: hit keystroke, hit Enter.
> >
> > Is there some way that R can be made to respond to a single
> > keystroke (other
> > than Enter)?
> > --
> > View this message in context: http://www.nabble.com/using-R-for-a-
> > reaction-time-experiment-tf4125643.html#a11732474
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Mike Lawrence
> Graduate Student, Department of Psychology, Dalhousie University
>
> Website: http://memetic.ca
>
> Public calendar: http://icalx.com/public/informavore/Public
>
> "The road to wisdom? Well, it's plain and simple to express:
> Err and err and err again, but less and less and less."
> - Piet Hein
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] list

2007-07-18 Thread Christophe Pallier
'c' also works with lists:

> a=list(1,2,3)
> b=list(1,2,3)
> c(a,b)
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 1

[[5]]
[1] 2

[[6]]
[1] 3


On 7/18/07, elyakhlifi mustapha <[EMAIL PROTECTED]> wrote:
>
> Hello,
> in using vector() we can create a vector and fill in like this
>
> v <- vector()
> v <- c(v,2)
> v <- c(v,c(5,10,23))
>
> but I wanna know if it's possible to do the same with the list I don't
> fond how?
> Can you help me?
> Thanks.
>
>
>
>
>
>
>
> ___
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to remove the quote "" in the data frame?

2007-07-14 Thread Christophe Pallier
Beware: you are not working with data.frames but with a vector and a
matrix (see ?cbind).

Solution: convert 'res' to data.frame.
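
For instance (a sketch reusing the poster's original vectors):

## Sketch: keep the numeric columns numeric by building a data.frame
## directly, instead of cbind()-ing a character vector onto a numeric
## matrix (cbind coerces everything to character, hence the quotes).
res.df <- data.frame(obv.value, p.value, mean.sim, ress)
res.df[1:10, ]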

Christophe

On 7/14/07, Zhang Jian <[EMAIL PROTECTED]> wrote:
>
> If I do not add "ress" into the data frame "res", there is no quote in the
> data frame. However, I add "ress", all column were found the quote.
> How to remove it?
> If you can delete the quote in the file "ress", that is better.
> Thanks.
>
> > ress[1:10]
> [1] "ABHO.ABNE" "ABHO.ACBA" "ABHO.ACGI" "ABHO.ACKO" "ABHO.ACMA" "ABHO.ACMO
> "
> [7] "ABHO.ACPS" "ABHO.ACSE" "ABHO.ACTE" "ABHO.ACTR"
> > res=cbind(obv.value,p.value,mean.sim)
> > res[1:10,]
>   obv.value p.value mean.sim
> [1,] 2 1.0  6.0
> [2,] 0 1.0  0.0
> [3,]66 0.5 49.6
> [4,] 3 1.0  3.0
> [5,] 0 1.0 64.7
> [6,] 0 1.0  0.0
> [7,] 0 1.0  0.0
> [8,]51 0.5 39.8
> [9,] 0 1.0 47.4
> [10,]59 0.7 72.0
>
> > ress=cbind(res,ress)
> > ress[1:10,]
>   obv.value p.value mean.sim ress
> [1,] "2"   "1" "6"  "ABHO.ABNE"
> [2,] "0"   "1" "0"  "ABHO.ACBA"
> [3,] "66"  "0.5"   "49.6"   "ABHO.ACGI"
> [4,] "3"   "1" "3"  "ABHO.ACKO"
> [5,] "0"   "1" "64.7"   "ABHO.ACMA"
> [6,] "0"   "1" "0"  "ABHO.ACMO"
> [7,] "0"   "1" "0"  "ABHO.ACPS"
> [8,] "51"  "0.5"   "39.8"   "ABHO.ACSE"
> [9,] "0"   "1" "47.4"   "ABHO.ACTE"
> [10,] "59"  "0.7"   "72" "ABHO.ACTR"
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] aov() question

2007-07-12 Thread Christophe Pallier
On 7/11/07, Leigh E Alexander <[EMAIL PROTECTED]> wrote:
>
> i have a 3 x 4 x 2 repeated measures design.  All of the IVs are within
> subjects.I do also have missing values (unequal N), as I have to
> remove any incorrect trials for each subject.


Hum, you mean there are empty cells in table(sid, tran, block, half)?
That is probably the reason for the error message. It is not clear to me
that 'aov' can handle empty cells in such a design (maybe 'lmer' can).
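
A hedged sketch of the mixed-model alternative, assuming the 'lme4' package
and only a by-subject random intercept (the poster's variable names are
reused):

## Sketch only: a random-intercept model tolerates missing cells,
## unlike the aov()/Error() decomposition.
library(lme4)
m <- lmer(log(rt) ~ tran * block * half + (1 | sid), data = mydata2)
summary(m)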

Christophe








> Here is the code I entered and the error message:
>
> a<-aov(log(rt)~(tran*block*half) + Error (sid/ (tran*block*half)),
> data=mydata2)
>
> Warning message:
> Error() model is singular in: aov(log(rt) ~ (tran * block * half) +
> Error(sid/(tran * block *
>
> I then do summary(a) and am able to get an output, but I am not sure
> whether or not I can trust that output since I got the error message.
> Any body have any thoughts/solutions for this?
>
> Also, are there any benefits of you aov() vs. use some of the linear
> model functions or vice versa?
>
> Thanks for any help you can offer!!
>
> ~Leigh Alexander
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ANOVA: Does a Between-Subjects Factor belong in the Error Term?

2007-07-09 Thread Christophe Pallier
On 7/9/07, Alex Baugh <[EMAIL PROTECTED]> wrote:
>
> I am executing a Repeated Measures Analysis of Variance with 1 DV
> (LOCOMOTOR
> RESPONSE),  2 Within-Subjects Factors (AGE, ACOUSTIC CONDITION), and 1
> Between-Subjects Factor (SEX).
>
> Does anyone know whether the between-subjects factor (SEX) belongs in the
> Error Term of the aov or not?



It does not.

If you have between-subjects factors A, B and within-subjects factors X, Y,
Z, use:

aov(dv ~ a*b*x*y*z + Error(subj/(x*y*z)))

The subj/(x*y*z) formula includes subj:x subj:y subj:z and all their
interactions as error terms.

The effect of a within subject factor 'x' is assessed against the error term
subj:x
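
A small simulated example of this layout (factor names chosen to match the
formula above; one within factor only, to keep it short):

## 10 subjects, between-subject factor a (a1/a2), within-subject factor x.
d <- expand.grid(subj = factor(1:10), x = factor(1:3))
d$a  <- factor(ifelse(as.integer(d$subj) <= 5, "a1", "a2"))
d$dv <- rnorm(nrow(d))

## a is tested in the 'subj' stratum, x and a:x in the 'subj:x' stratum.
summary(aov(dv ~ a * x + Error(subj/x), data = d))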

-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] import data

2007-06-30 Thread Christophe Pallier
Oops, sorry for the previous empty reply.

You do not tell us which Operating System you are using, so I assume it must
be Windows...
You should check the "R import/export Data" available from the Help/Manuals
menu
(in French: "Aide/Manuels"...)

Christophe


On 6/30/07, Christophe Pallier <[EMAIL PROTECTED]> wrote:
>
> ha
>
> On 6/30/07, eric zoukekang <[EMAIL PROTECTED]> wrote:
>
> > Hello!
> > I wonder if you might help me with informations about how to import data
> > with a 2.4.1 R version without the menu "Import data".
> > Best regards.
> >
> > Eric Duplex ZOUKEKANG
> > Ingénieur Zootechnicien
> > Montpellier SupAgro
> > Master2 AAA-PARC
> > tel : +33(0)661432340
> > [EMAIL PROTECTED]
> >
> >
> >
> >
> >
> >
> > ___
> >
> >
> >
> >
> >
> >
> > [[alternative HTML version deleted]]
> >
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>
>
> --
> Christophe Pallier (http://www.pallier.org)




-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] import data

2007-06-30 Thread Christophe Pallier
ha

On 6/30/07, eric zoukekang <[EMAIL PROTECTED]> wrote:
>
> Hello!
> I wonder if you might help me with informations about how to import data
> with a 2.4.1 R version without the menu "Import data".
> Best regards.
>
> Eric Duplex ZOUKEKANG
> Ingénieur Zootechnicien
> Montpellier SupAgro
> Master2 AAA-PARC
> tel : +33(0)661432340
> [EMAIL PROTECTED]
>
>
>
>
>
>
>
> ___
>
>
>
>
>
> [[alternative HTML version deleted]]
>
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Repeat if

2007-06-29 Thread Christophe Pallier
On 6/29/07, Birgit Lemcke <[EMAIL PROTECTED]> wrote:
>
> Hello Jim,
>
> thanks for your answer. At the moment I am using this code:
>
> Range0 <- sapply(1:85, function(i) eval(parse(text=paste("range(V", i,
> ", na.rm=T)", sep=""))))



It is a matter of taste, but I tend to prefer:

a <- list()
for (i in 1:85) a[[i]] = get(paste("V", i, spe=""))

sapply(a, range, na.rm=T)

The first two lines put your variables inside a list (assuming they are not
already in one, e.g., in a data.frame).
This is especially interesting if you have other operations to perform on
the V1..V85 variables.

Christophe








and it works really fine.
>
> The code you sent me is also fine but how can I implement, that
> missing values are TRUE?
>
> Thanks a lot for your help
>
> Birgit
>
> Am 28.06.2007 um 13:01 schrieb Jim Lemon:
>
> > Birgit Lemcke wrote:
> >> Hello,
> >> (Power Book G4, Mac OS X, R 2.5.0)
> >> I would like to repeat the function range for 85 Vectors (V1-V85).
> >> I tried with this code:
> >> i<-0
> >>  > repeat {
> >> + i<-i+1
> >> + if (i<85) next
> >> + range (Vi, na.rm = TRUE)
> >> + if (i==85) break
> >> + }
> >> I presume that the Vi is wrong, because in this syntax i is not
> >> known  as a variable. But I don´t know how to say that it is a
> >> variable here.
> >> Would be nice if somebody could help me.
> >> Perhaps I´m thinking too complicated and there is an easier way to
> >> do  this.
> > Hi Birgit,
> > This may be what you want:
> >
> > for(i in 1:85)
> >  print(do.call("range",list(as.name(paste("V",i,sep="")
> >
> > Jim
> >
>
> Birgit Lemcke
> Institut für Systematische Botanik
> Zollikerstrasse 107
> CH-8008 Zürich
> Switzerland
> Ph: +41 (0)44 634 8351
> [EMAIL PROTECTED]
>
>
>
>
>
>
> [[alternative HTML version deleted]]
>
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] TukeyHSD

2007-06-28 Thread Christophe Pallier
Are you sure that Tukey's HSD (or any other usual multiple comparison
procedure) applies to within-subject factors?
I do not have my stats books (e.g. Winer or Kirk)  to check.

You can get some output with:

TukeyHSD(aov(RT ~ Hand:Gaze + subj, data = d))

(assuming you want to compare the cells defined by Hand:Gaze),
but I am not sure whether this is meaningful/correct in the context of the
within-subject design.
(Actually, I write this email because I am interested to know whether it is
a legitimate approach or not.)

Christophe

On 6/28/07, Patrick Bedard <[EMAIL PROTECTED]> wrote:
>
> Hello everyone,
>
> So I ran an anova with aov and then I want to run post-hoc comparisons but
> keep receiving this message :
> > no applicable method for "TukeyHSD"
>
> Here is my code:
>
> > d<-read.table("d.txt")
> > d
> >   Obs subj Hand Gaze    RT
> > 1    1   s1    1    1 401.4
> > 2    2   s2    1    1 363.3 ...
>
> > summary(ano <- aov(RT~(Hand*Gaze)+Error(subj/(Hand*Gaze)),data=d ))
>
>
> This seems to work fine, but then I use
> > fm1Tukey=TukeyHSD(fm1,"Hand") ; fm1Tukey
>
> And receive
> > no applicable method for "TukeyHSD"
>
>
> I can't seem to find the error nor anyone ever answering that question
> which
> seems to pop-up here and there on other websites (eg, Nabble)
>
>
> Tnaks in advance for your help,
>
> __
> Patrick Bédard Ph.D.
> Dept. of Neuroscience
> Brown University
>
>
>
> [[alternative HTML version deleted]]
>
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using self-written functions

2007-06-28 Thread Christophe Pallier
One "quick and dirty" way is to put all your functions inside a file, say
/home/me/R/myfuncs.R, and add  the following line at the beginning of your
scripts:

source('/home/me/R/myfuncs.R')

This is dirty because if you need to change the location of this file, your
scripts will cease to work.
Alternatively, you can add this command to your Rprofile (see Appendix B,
"Invoking R", of "An Introduction to R").

Christophe




On 6/28/07, R. Leenders <[EMAIL PROTECTED]> wrote:
>
> Hi, I am pretty new to R, so I apologize for the obvious question.
> I
> have worked with R for a few months now and in the process have written
> several functions that I frequently use in various data analysis
> projects. I tend to give each project a directory of its own and set
> the working directory to that.
> Since there are several tasks that
> need to be accomplished in many of my projects, I frequently want to
> use functions I have written previously. My question is, how do I get
> access to them? The way I do it now is copy the relevant code to the
> script file of the project I am working on at the time and then run it
> so as to make the functions available. But that seems to be
> unnecessarily cumbersome. I used to work a lot with gauss, which had
> the opportunity of putting one's own functions is one directory and
> gauss would then have that directory in its search path always. How can
> I access my own functions in R without having to copy-paste them
> everytime and run them manually so I can call them later? Do I need to
> learn how to write a package and attach the package to make the
> functions available at all times? Is there another way?
>
> thanks, James
>
>
>
>
>
>
> 
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Repeat if

2007-06-28 Thread Christophe Pallier
Your V1 to V85 are probably coming from a data.frame, aren't they?

If yes, and if this data.frame is named 'a', you can use 'sapply(a,range)'

Otherwise, see ?get (get(paste("V","1",sep="")) returns V1)

Christophe


On 6/28/07, Birgit Lemcke <[EMAIL PROTECTED]> wrote:
>
> Hello,
> (Power Book G4, Mac OS X, R 2.5.0)
>
> I would like to repeat the function range for 85 Vectors (V1-V85).
> I tried with this code:
>
> i<-0
> > repeat {
> + i<-i+1
> + if (i<85) next
> + range (Vi, na.rm = TRUE)
> + if (i==85) break
> + }
>
> I presume that the Vi is wrong, because in this syntax i is not known
> as a variable. But I don´t know how to say that it is a variable here.
> Would be nice if somebody could help me.
> Perhaps I´m thinking too complicated and there is an easier way to do
> this.
>
> Thanks in advance
>
> Greetings
>
> Birgit
>
> Birgit Lemcke
> Institut für Systematische Botanik
> Zollikerstrasse 107
> CH-8008 Zürich
> Switzerland
> Ph: +41 (0)44 634 8351
> [EMAIL PROTECTED]
>
>
>
>
>
>
> [[alternative HTML version deleted]]
>
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] compare 2 vectors

2007-06-28 Thread Christophe Pallier
setdiff(b,a) is even simpler.

On 6/28/07, Christophe Pallier <[EMAIL PROTECTED]> wrote:
>
>
>
> On 6/28/07, João Fadista <[EMAIL PROTECTED]> wrote:
> >
> > I would like to take out the values from one vector that are equal to
> > the values in another vector.
> >
> > Example:
> > a <- c(1,2,3,4,5,6,7,8,9)
> > b <- c(3,10,20,5,6)
> > b_noRepeats = c(10,20)
> >
> >
>
>  b[!(b %in% intersect(a,b))]
>
> See ?intersect
>
>
> --
> Christophe Pallier (http://www.pallier.org)




-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] compare 2 vectors

2007-06-28 Thread Christophe Pallier
On 6/28/07, João Fadista <[EMAIL PROTECTED]> wrote:
>
> I would like to take out the values from one vector that are equal to the
> values in another vector.
>
> Example:
> a <- c(1,2,3,4,5,6,7,8,9)
> b <- c(3,10,20,5,6)
> b_noRepeats = c(10,20)
>
>

 b[!(b %in% intersect(a,b))]

See ?intersect


-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Matlab end operator

2007-06-27 Thread Christophe Pallier
Hello Markus,

On 6/27/07, Markus Loecher <[EMAIL PROTECTED]> wrote:
>
> Dear list members,
> I use both R and Matlab and find that each has its own strengths. Matlab
> definitely has the edge when it comes to the interactivity of its graphs.


I also use both. R definitely has the edge when it comes to performing
statistical data analyses :)
(and also when you consider the price...)

In addition I find the little operator end extremely useful in indexing
> arrays. (as in x(1:end,) )


You mean 'x(1:end,1:end)' or 'x(:,:)'  (':' is equivalent to "1:end")

When I go from R to Matlab, I tend to forget to type the ':' ("a[,2]" in R
is "a(:,2)" in Matlab.)

The interest of 'end' is clearer when the starting index is larger than 1 as
in, e.g., 'x(2:end)'

Yet note that in R, you can use negative indexes:

  x[-1]   is the R equivalent of  Matlab's x(2:end)

  x[-(1:(n-1))] is equivalent to x(n:end)


I agree that R syntax may be a bit less "elegant" in this particular
situation (but try to write the equivalent of a[-2,] in Matlab)
Personally, I would stick to "x[n:length(x)]" (or "a[n:nrow(a),]" for a
matrix). Anyway, this kind of code would probably appear inside a loop, and I
would put the numbers of rows or columns in variables if they are needed
more than once.

Best,

-- 
Christophe Pallier

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A really simple data manipulation example

2007-06-27 Thread Christophe Pallier
For merging, selecting and aggregating, R is not too bad:
I believe the following code is more or less equivalent to your Vilno
example, isn't it?


# to input data, replace the following two lines by 'read.table'
labresults <- data.frame(patient.id=gl(10,5), visit.num=gl(5,10),
sodium=rnorm(50))
demo <- data.frame(patient.id=gl(10,1), gender=gl(2,5,labels=c('F','M')))

data <- merge(labresults, demo)
data <- subset(data, visit.num!=2)
with(data,
 aggregate(sodium, list(gender=gender), mean)
 )


If the data sets are very large, then doing merges/selection/aggregation
outside of R can be a good idea.


Christophe

On 6/27/07, Robert Wilkins <[EMAIL PROTECTED]> wrote:
>
> In response to those who asked for a better explanation of what the
> Vilno software does, here's a simple example that gives some idea of
> what it does.
>
> LABRESULTS is a dataset with multiple rows per patient , with lab
> sodium measurements. It has columns: PATIENT_ID, VISIT_NUM, and
> SODIUM.
>
> DEMO is a dataset with one row per patient, with demographic data.
> It has columns: PATIENT_ID, GENDER.
>
> Here's a simple example, the following paragraph of code is a
> data processing function (dpf) :
>
>
> inlist LABRESULTS DEMO ;
> mergeby PATIENT_ID ;
> if (SODIUM == -9) SODIUM = NULL ;
> if (VISIT_NUM != 2) deleterow ;
> select AVERAGE_SODIUM = avg(SODIUM) by GENDER ;
> sendoff(RESULTS_DATASET)  GENDER AVERAGE_SODIUM ;
> turnoff; // just means end-of-paragraph , version 1.0 won't need this.
>
> Can you guess what it does? The lab result rows are merged with the
> demographic rows, just to get the gender information merged in.
> Obviously, they are merged by patient. The code -9 is used to denote
> "missing", so convert that to NULL. I'm about to take a statistic for
> visit 2, so rows with visit 0 or 1 must be deleted. I'm assuming, for
> visit 2, each patient has at most one row. Now, for each sex group,
> take the average sodium level. After the select statement, I have just
> two rows, for male and female, with the average sodium level in the
> AVERAGE_SODIUM column. Now the sendoff statement just stores the
> current data table into a datafile, called RESULTS_DATASET.
>
> So you have a sequence of data tables, each calculation reading in the
> current table , and leaving a new data table for the next calculation.
>
> So you have input datasets, a bunch of intermediate calculations, and
> one or more output datasets. Pretty simple idea.
>
> *
>
> Some caveats:
>
> LABRESULTS and DEMO are binary datasets. The asciitobinary and
> binarytoascii statements are used to convert between binary datasets
> and comma-separated ascii data files. (You can use any delimiter:
> comma, vertical bar , etc). An asciitobinary statement is typically
> just two lines of code.
>
> The dpf begins with the inlist statement , and , for the moment ,
> needs "turnoff ;" as the last line. Version 1.0 won't require the use
> of "turnoff;", but version 0.85 does. It only means this paragraph of
> code ends here ( a program can , of course , contain many paragraphs:
> data processing functions, print statements, asciitobinary statements,
> etc.).
>
> If you've worked with lab data, you know lab data does not look so
> simplistic. I need a simple example.
>
> Vilno has a lot of functionality, many-to-many joins, adding columns,
> firstrow() and lastrow() flags, and so forth. A fair amount of complex
> data manipulations have already been tested with test programs ( in
> the tarball ). Of course a simple example cannot show you that, it's
> just a small taste.
>
> *
>
> If you've never used SPSS or SAS before, you won't care, but this
> programming language falls in the same family as the SPSS and SAS
> programming languages. All three programming languages have a fair
> amount in common, but are quite different from the S programming
> language. The vilno data processing function can replace the SAS
> datastep. (It can also replace PROC TRANSPOSE and much of PROC MEANS,
> except standard deviation calculations still need to be included in
> the select statement).
>
> ********
>
> I hope that helps.
>
> http://code.google.com/p/vilno
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] inter-rater agreement index kappa

2007-06-26 Thread Christophe Pallier
Google 'R-help kappa' ->
https://stat.ethz.ch/pipermail/r-help/2006-April/104605.html



On 6/26/07, Nair, Murlidharan T <[EMAIL PROTECTED]> wrote:
>
> Is there a function that calculates the inter-rater agreement index
> (kappa) in R?
>
> Thanks ../Murli
>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Looking for parallel functionality between Matlab and R

2007-06-26 Thread Christophe Pallier
On 6/26/07, El-ad David Amir <[EMAIL PROTECTED]> wrote:
>
> a) How do I mimic Matlab's 'hold on'? (I want to show several plots
> together, when I type two plots one after the other the second overwrites
> the first)



par(new=T) (see ?par)

Yet, it is often better, after a call to the 'plot' command, to use 'points'
or 'lines' to add new elements to the plot.
Also, if you want to plot several curves in one shot, you can use
'matplot'.
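
A small sketch of both idioms, with made-up data:

x <- seq(0, 2 * pi, length = 100)

## Add to an existing plot with lines()/points():
plot(x, sin(x), type = "l")
lines(x, cos(x), lty = 2)

## Or draw several curves in a single call:
matplot(x, cbind(sin(x), cos(x)), type = "l")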

-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R-excel

2007-06-25 Thread Christophe Pallier
Cf. the "R Data Import/Export" manual in the standard documentation.
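
One common route (only one of several, and the file name is hypothetical):
save the sheet as a CSV file from Excel and read it:

## read.csv2() assumes semicolon separators and decimal commas, the
## usual "CSV" flavour on systems configured with a decimal comma;
## use read.csv() for plain comma-separated files.
mydata <- read.csv2("mysheet.csv")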

Christophe

On 6/25/07, Erika Frigo <[EMAIL PROTECTED]> wrote:
>
>
> Good morning to everybody,
> I have a problem : how can I import excel files in R???
>
> thank you very much
>
>
> Dr.sa. Erika Frigo
> Università degli Studi di Milano
> Facoltà di Medicina Veterinaria
> Dipartimento di Scienze e Tecnologie Veterinarie per la Sicurezza
> Alimentare (VSA)
>
> Via Grasselli, 7
> 20137 Milano
> Tel. 02/50318515
> Fax 02/50318501
> [[alternative HTML version deleted]]
>
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tools For Preparing Data For Analysis

2007-06-22 Thread Christophe Pallier
If I understand correctly (from your Perl script)

1. you count the number of occurrences of each "(echo, muga)" pair in the
first file.

2. you remove from the second file the lines that correspond to these
occurrences.

If this is indeed your aim, here's a solution in R:

cumcount <- function(x) {
 y <- numeric(length(x))
 for (i in 1:length(y)) {
 y[i] = sum(x[1:i] == x[i])
 }
 y
}

both <- read.csv('both_echo.csv')
v <- table(paste(both$echo, "_", both$muga, sep=""))

semi <- read.csv('qual_echo.csv')
s <- paste(semi$echo, "_", semi$muga, sep="")
cs = cumcount(s)
count = v[s]
count[is.na(count)]=0

semi2 <- data.frame(semi, s, cs, count, keep = cs > count)

> semi2
  echo muga quant s cs count  keep
1   10   20 0 10_20  1 0  TRUE
2   10   20 0 10_20  2 0  TRUE
3   10   21 0 10_21  1 1 FALSE
4   10   21 0 10_21  2 1  TRUE
5   10   24 0 10_24  1 0  TRUE
6   10   25 0 10_25  1 2 FALSE
7   10   25 0 10_25  2 2 FALSE
8   10   25 0 10_25  3 2  TRUE


My code is not very readable...
Yet, the 'trick' of using a helper function like 'cumcount' might be
instructive.

Christophe Pallier


On 6/22/07, Kevin E. Thorpe <[EMAIL PROTECTED]> wrote:
>
> I am posting to this thread that has been quiet for some time because I
> remembered the following question.
>
> Christophe Pallier wrote:
> > Hi,
> >
> > Can you provide examples of data formats that are problematic to read
> and
> > clean with R ?
>
> Today I had a data manipulation problem that I don't know how to do in R
> so I solved it with perl.  Since I'm always interested in learning more
> about complex data manipulation in R I am posting my problem in the
> hopes of receiving some hints for doing this in R.
>
> If anyone has nothing better to do than play with other people's data,
> I would be happy to send the row files off-list.
>
> Background:
>
> I have been given data that contains two measurements of left
> ventricular ejection fraction.  One of the methods is echocardiogram
> which sometimes gives a true quantitative value and other times a
> semi-quantitative value.  The desire is to compare echo with the
> other method (MUGA).  In most cases, patients had either quantitative
> or semi-quantitative.  Same patients had both.  The data came
> to me in excel files with, basically, no patient identifiers to link
> the "both" with the semi-quantitative patients (the "both" patients
> were in multiple data sets).
>
> What I wanted to do was extract from the semi-quantitative data file
> those patients with only semi-quantitative.  All I have to link with
> are the semi-quantitative echo and the MUGA and these pairs of values
> are not unique.
>
> To make this more concrete, here are some portions of the raw data.
>
> "Both"
>
> "ID NUM","ECHO","MUGA","Semiquant","Quant"
> "B",12,37,10,12
> "D",13,13,10,13
> "E",13,26,10,15
> "F",13,31,10,13
> "H",15,15,10,15
> "I",15,21,10,15
> "J",15,22,10,15
> "K",17,22,10,17
> "N",17.5,4,10,17.5
> "P",18,25,10,18
> "R",19,25,10,19
>
> Seimi-quantitative
>
> "echo","muga","quant"
> 10,20,0  <-- keep
> 10,20,0  <-- keep
> 10,21,0  <-- remove
> 10,21,0  <-- keep
> 10,24,0  <-- keep
> 10,25,0  <-- remove
> 10,25,0  <-- remove
> 10,25,0  <-- keep
>
> Here is the perl program I wrote for this.
>
> #!/usr/bin/perl
>
> open(BOTH, "quant_qual_echo.csv") || die "Can't open quant_qual_echo.csv";
> # Discard first row;
> $_ = <BOTH>;
> while(<BOTH>) {
> chomp;
> ($id, $e, $m, $sq, $qu) = split(/,/);
> $both{$sq,$m}++;
> }
> close(BOTH);
>
> open(OUT, "> qual_echo_only.csv") || die "Can't open qual_echo_only.csv";
> print OUT "pid,echo,muga,quant\n";
> $pid = 2001;
>
> open(QUAL, "qual_echo.csv") || die "Can't open qual_echo.csv";
> # Discard first row
> $_ = <QUAL>;
> while(<QUAL>) {
> chomp;
> ($echo, $muga, $quant) = split(/,/);
> if ($both{$echo,$muga} > 0) {
> $both{$echo,$muga}--;
> }
> else {
> print OUT "$pid,$echo,$muga,$quant\n";
> $pid++;
> }
> }
> close(QUAL);
> close(OUT);
>
> open(OUT, "> both_echo.csv") || die "Can't open both_echo.csv";
> print OUT "pid

Re: [R] add line to data.frame

2007-06-20 Thread Christophe Pallier
On 6/20/07, Manuele Pesenti <[EMAIL PROTECTED]> wrote:
>
> how can I update a data.frame adding new lines?


rbind
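
For example (a sketch with made-up columns):

## Append a new row to an existing data.frame; the new row must have
## the same column names.
d <- data.frame(x = 1:3, y = c("a", "b", "c"))
d <- rbind(d, data.frame(x = 4, y = "d"))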


> I need to create a second data frame from a first one with only some of
> their
> entrys filtering the value of a specific column... How can I do this?


dtf2 <- dtf1[dtf1$col=='xxx',]

-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Function -return value

2007-06-19 Thread Christophe Pallier
First, to return several values from your function 'parameter', you can use
a list:

parameter <- function (...) {
...
list(alpha=alpha,beta=beta,para=para,parab=parab)
}


Then, you may use:

sapply(split(variable,list(a,b)), parameter)

(tapply also works but returns a matrix of lists)
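
A self-contained sketch of the pattern, with a toy statistic in place of the
gpd fit (all names are made up):

## Toy version: the function returns a named list, and sapply() over
## split() applies it to every a x b group.  The result is a matrix
## with one column per group.
parameter <- function(v) list(alpha = min(v), beta = max(v), n = length(v))

variable <- rnorm(40)
a <- gl(2, 20)
b <- gl(2, 10, 40)
sapply(split(variable, list(a, b)), parameter)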


Christophe Pallier


On 6/19/07, livia <[EMAIL PROTECTED]> wrote:
>
>
> Hi, I am trying to write a function with the following codes and I would
> like
> it to return the values for "alpha
> beta para parab " seperately. Then I would like to use this funstion for
> "variable" with factor "a" and "b". But the result turns out to be a
> matrix
> with element like "Numeric,2" ... I guess they are just the values for
> "parab", and we can not even see the two parameters in parab.
>
>
> parameter <- function(v) {
> v1 <- v[v>mean(v)+0.5*sd(v)]
> v2 <- v[v alpha=min(v1)
> beta=max(v2)
> para <- fitgpd(v1,alpha, method="pwmu")$param
> parab <- fitgpd((-v2), (-beta), method="pwmu")$param
> v1.fit <- qgpd(ppoints(v1, a=0.5), alpha, para[1], para[2])
> v2.fit <- qgpd(ppoints((-v2), a=0.5), (-beta), para[1], para[2])
> alpha
> beta
> para
> parab
>
> }
>
> tapply(variable, list(a, b),parameter)
>
>
> I would be grateful if anyone can give me some advice. Many thanks
> --
> View this message in context:
> http://www.nabble.com/Function--return-value-tf3947134.html#a11197159
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Function -return value

2007-06-19 Thread Christophe Pallier
Change the function 'parameter' so that it returns a list (see my other
reply), then use:

 sapply(split(variable,list(a,b)),parameter)

On 6/19/07, livia <[EMAIL PROTECTED]> wrote:
>
>
> Hi, I am trying to write a function with the following codes and I would
> like
> it to return the values for "alpha
> beta para parab " seperately. Then I would like to use this funstion for
> "variable" with factor "a" and "b". But the result turns out to be a
> matrix
> with element like "Numeric,2" ... I guess they are just the values for
> "parab", and we can not even see the two parameters in parab.
>
>
> parameter <- function(v) {
> v1 <- v[v>mean(v)+0.5*sd(v)]
> v2 <- v[v alpha=min(v1)
> beta=max(v2)
> para <- fitgpd(v1,alpha, method="pwmu")$param
> parab <- fitgpd((-v2), (-beta), method="pwmu")$param
> v1.fit <- qgpd(ppoints(v1, a=0.5), alpha, para[1], para[2])
> v2.fit <- qgpd(ppoints((-v2), a=0.5), (-beta), para[1], para[2])
> alpha
> beta
> para
> parab
>
> }
>
> tapply(variable, list(a, b),parameter)
>
>
> I would be grateful if anyone can give me some advice. Many thanks
> --
> View this message in context:
> http://www.nabble.com/Function--return-value-tf3947134.html#a11197159
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data.frame

2007-06-18 Thread Christophe Pallier
If M is the original matrix,

k <- as.data.frame(M)
names(k) <- paste("Rép",1:ncol(k),sep="")
rownames(k) <- paste("Col",1:nrow(k),sep="") # replace by what you want
k



On 6/18/07, elyakhlifi mustapha <[EMAIL PROTECTED]> wrote:
>
> hello,
> I'm trying to write a function which take a matrix and give a dataframe
> with column names and row names but the problem I meet it's that the column
> number is changing and the vector containing the column names is also
> changing how can I do to write a good progam for the moment I tryied like
> follow:
>
> dm <- ncol(M)
> v <- vector()
> t <- 1
> while (dm > 0) {
> v <- c(v,paste("Rép",t,sep=""))
> t <- t + 1
> dm <- dm - 1
> }
> nv <- noquote(v)
> df <- function (M,x) {
> return(data.frame(nv[1] = M[,1], nv[2] = M[,2],nv[3] = M[,3], row.names =
> var[[1]], check.rows = TRUE, check.names = TRUE))
> }
>
> I know that there are errors but the important is that R doesn't recognize
> nv.
> For more precision the martix M is like follow:
>
> M
>   [,1] [,2] [,3]
> [1,] 6.52   NA 6.59
> [2,] 6.99 6.85 6.38
> [3,] 6.92 6.72 6.99
> [4,] 6.59 5.51 6.45
> [5,] 6.65 7.12 6.99
> [6,] 6.18 5.71 5.78
> [7,] 6.65 6.52 6.72
> [8,] 6.65 6.79 6.12
> [9,] 6.59 6.65 6.32
> [10,] 5.85 6.05 6.38
> [11,] 6.38 6.79 6.65
> [12,] 6.79 6.52 6.72
> [13,] 6.12 6.25 6.38
> [14,] 6.99 6.72 6.38
> [15,] 6.59 6.65 6.99
> [16,] 6.45 6.18 6.59
> [17,] 5.65 6.05 6.52
> [18,] 6.52 6.85 6.65
> [19,] 6.18 6.32 6.32
> [20,] 6.99 6.65 6.72
> [21,] 6.52 6.99 6.32
>
> Can you help me?
> thanks.
>
>
>
>   
> _
> Ne gardez plus qu'une seule adresse mail ! Copiez vos mails vers Yahoo!
> Mail
> [[alternative HTML version deleted]]
>
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data.frame

2007-06-18 Thread Christophe Pallier
On 6/18/07, Robin Hankin <[EMAIL PROTECTED]> wrote:
>
>
> I think the questioner was looking for row() and col(), which (IMO) are
> difficult to find if you don't know of their existence.



Searching for "R matrix number of rows columns" in google returns, in fourth
position, the manual page for 'nrow'.


-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data.frame

2007-06-18 Thread Christophe Pallier
On 6/18/07, elyakhlifi mustapha <[EMAIL PROTECTED]> wrote:
>
> hello,
> are there functions giving the columns number and the rows number of a
> matrix?


Yes, there are.

Are you trying to use R without reading *any* documentation???
The mailing list is not a substitute for the manuals.
See the Posting guide.

Christophe



thanks.
>
>
>
>   
> _
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] BATCH

2007-06-18 Thread Christophe Pallier
?BATCH

By default, the input commands are printed along with the output.
 To suppress this behavior, add 'options(echo = FALSE)' at the
 beginning of 'infile'.


"at the begining of 'infile'" means that 'options(echo = FALSE)' must be
included inside 'infile'( in your case copie.r), as a first line. Not before
'infile' on the command line.
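
In other words (a sketch; 'copie.r' stands for your own script):

## First line of copie.r (inside the script, not on the command line):
options(echo = FALSE)
## ... the rest of the script follows ...
## Then run it with:  R CMD BATCH C:\R\copie.r C:\PHP\sortie.r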

Btw, the help does not mention that you can put several script files on the
command line. I am not sure if it works.

Christophe



On 6/18/07, elyakhlifi mustapha <[EMAIL PROTECTED]> wrote:
>
> Hello,
> I run some programs R from BATCH using  this syntax
>
> R CMD BATCH options(echo = FALSE) C:\R\copie.r C:\PHP\sortie.r
>
> but the options doesn't work do you know why?
> thanks.
>
>
>
>   
> _
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] what about options in BATCH

2007-06-18 Thread Christophe Pallier
?BATCH

Read second paragraph of "Details" section.


On 6/18/07, elyakhlifi mustapha <[EMAIL PROTECTED]> wrote:
>
> hello,
> when I run a calculation in BATCH the screen displays all the code of
> my programs and even the R startup message; how can I avoid displaying
> it?
> thanks.
>
>
>
>   
> _
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to fit y=m*x

2007-06-14 Thread Christophe Pallier
summary(lm(y ~ x - 1))

Use google (R "linear regression without intercept")
Read the posting guide.

There.


On 6/14/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> Hi There,
>
> I have a set of data (xi,yi).I want to fit them with the equation
> y=mx.
>
> note: in the above equation, there is no intercept.
>
> I don't know how to use common software such as R , matlab, sas, or
> spss to do this kind of regression.
>
> Does anyone know how to do this?
>
> I know it is easy to use least square method to do this by
> programming. But I want to find if there exists some common software
> which can do this.
>
> Thank you very much.
>
> Van
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon test on data matrix

2007-06-14 Thread Christophe Pallier
   wilcox.test(data.matrix[i,] ~ data.cl)$p.value

will return the p.value for row 'i'.

With a loop:

   pv = rep(0,42)
   for (i in 1:42) { pv[i]=wilcox.test(data.matrix[i,] ~ data.cl)$p.value }

Is this what you wanted?

Christophe




On 6/14/07, Booman, M <[EMAIL PROTECTED]> wrote:
>
> Dear everyone,
> I am trying to do a Wilcoxon one-sided test on my gene expression data.
> These are the data i have in R:
> data.matrix (matrix, numeric) containing all gene expression data (42
> rows=genes,  42 columns=tumors), no column header or row names
> data.cl (vector, numeric) consisting of 42 0's and 1's to indicate class 0
> or class 1 for each column in data.matrix
>
> I want to do a Wilcoxon one-sided test on the data from class 0 versus the
> data from class 1, for each row (gene) of the data set.
> My first try:
>
> #to make separate matrices for both classes:
> data.matrix.0 <- data.matrix[,data.cl==0]
> data.matrix.1 <- data.matrix[,data.cl==1]
>
> # to run the wilcox.test function for each row:
> rawp <- apply(data.matrix.0, 1, wilcox.test, y=data.matrix.1,
> alternative="less")
>
>
> The result of printing rawp is:
> $`1`
>
> Wilcoxon rank sum test with continuity correction
>
> data:  newX[, i] and data.matrix.1
> W = 7585, p-value = 1
> alternative hypothesis: true location shift is less than 0
>
>
> $`2`
>
> Wilcoxon rank sum test with continuity correction
>
> data:  newX[, i] and data.matrix.1
> W = 6700, p-value = 0.9983
> alternative hypothesis: true location shift is less than 0
>
>
> Etcetera for each row of the data matrix.
> I can get the p value for one row (gene) using:
> rawp.1 <- rawp$'1'$p.value
>
> But how can I get these p-values in one list? I have tried:
> rawp <- NULL
> for (i in 1:42) {
> a <- paste("'", i, "'", sep="")
> rawp <- rbind(rawp, test$a$p.value)
> }
>
> but that doesn't work (no errors but rawp stays NULL)
>
> There must be an easier way to do a wilcoxon analysis on a matrix!
> I'd appreciate your help with this...
>
> Marije
>
>
> De inhoud van dit bericht is vertrouwelijk en alleen bestemd voor de
> geadresseerde(n). Anderen dan de geadresseerde mogen geen gebruik maken van
> dit bericht, het openbaar maken of op enige wijze verspreiden of
> vermenigvuldigen. Het UMCG kan niet aansprakelijk gesteld worden voor een
> incomplete aankomst of vertraging van dit verzonden bericht.
>
> The contents of this message are confidential and only intended for the
> eyes of the addressee(s). Others than the addressee(s) are not allowed to
> use this message, to make it public or to distribute or multiply this
> message in any way. The UMCG cannot be held responsible for incomplete
> reception or delay of this transferred message.
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Read Windows-like .INI files into R data structure?

2007-06-14 Thread Christophe Pallier
Ah! I forgot to mention that it is possible to call awk from R:

a <- system("awk -F'=' '/\\[/{a=$1;next}{print $1,$2,a}' example.ini",
intern=T)
z <- textConnection(a)
read.table(z)


Christophe


On 6/14/07, Christophe Pallier <[EMAIL PROTECTED]> wrote:
>
> On 6/14/07, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
> >
> > Here is yet another solution.  This is the simplest so far.
> > Lines.raw is as before and the output is a 3 column character
> > matrix.
>
> section <- ""
> > f <- function(x) {
> > if (length(x) == 1) section <<- gsub("[\\[\\]]", "", x)
> > if (length(x) <= 1) return()
> > return(c(x, section))
> > }
> > # Lines <- readLines("myfile.ini")
> > Lines <- readLines(textConnection(Lines.raw))
> > do.call("rbind", lapply(strsplit(Lines, "="), f))
>
>
>
> The corresponding awk code fits in one line '/\[/{a=$1;next}{print
> $1,$2,a}'.
>
> With the example.ini:
>
> $ awk -F"=" '/\[/{a=$1;next}{print $1,$2,a}' example.ini
> var1 value1 [Section1]
> var2 value2 [Section1]
> A value3 [Section2]
> B value4 [Section2]
>
> The output can then be imported in R with read.table.
>
> I know, I am not playing by the rules here... :)
> But with programming languages, like with human languages, it pays to be
> bi or tri-lingual (or polyglot, if one can).
>
> I also realise that under Windows, it means using the command line,
> something that  not so many people are comfortable with nowadays.
>
> One reason people insist on using only one language to do everything may
> be due to the awkwardness and limitations of the default scripting language
> under Windows (DOS). Having everything done inside one single R script can
> seem simpler. But a divide-and-conquer approach, with a few small scripts,
> can actually work better in some complex cases.
>
> I tend to do as much as possible in R but for all serious data analysis
> tasks, I use Makefile to "glue" the various stages of processing. Data
> preprocessing stages (performed with R or other tools) create files that are
> then processed with R. One advantage is that preprocessing is performed only
> when raw input data change; but the most important, is that when I come back
> years later, I can follow the exact flow of transformations.
> Again, "make"  (like awk  or R) is not without limits and idiosyncrasies.
> If someone has a simpler solution, I am interested to hear about it.
>
> Christophe
>
>
>
>
>
> --
> Christophe Pallier ( http://www.pallier.org)
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Read Windows-like .INI files into R data structure?

2007-06-14 Thread Christophe Pallier
On 6/14/07, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
>
> Here is yet another solution.  This is the simplest so far.
> Lines.raw is as before and the output is a 3 column character
> matrix.

section <- ""
> f <- function(x) {
> if (length(x) == 1) section <<- gsub("[\\[\\]]", "", x)
> if (length(x) <= 1) return()
> return(c(x, section))
> }
> # Lines <- readLines("myfile.ini")
> Lines <- readLines(textConnection(Lines.raw))
> do.call("rbind", lapply(strsplit(Lines, "="), f))



The corresponding awk code fits in one line '/\[/{a=$1;next}{print
$1,$2,a}'.

With the example.ini:

$ awk -F"=" '/\[/{a=$1;next}{print $1,$2,a}' example.ini
var1 value1 [Section1]
var2 value2 [Section1]
A value3 [Section2]
B value4 [Section2]

The output can then be imported in R with read.table.
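
For instance, the whole thing can be driven from R via a pipe; an untested
sketch, assuming the file is called example.ini as above and that awk is on
the PATH:

ini <- read.table(pipe("awk -F'=' '/\\[/{a=$1;next}{print $1,$2,a}' example.ini"),
                  col.names = c("variable", "value", "section"))
ini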

I know, I am not playing by the rules here... :)
But with programming languages, like with human languages, it pays to be bi
or tri-lingual (or polyglot, if one can).

I also realise that under Windows, it means using the command line,
something that  not so many people are comfortable with nowadays.

One reason people insist on using only one language to do everything may be
due to the awkwardness and limitations of the default scripting language
under Windows (DOS). Having everything done inside one single R script can
seem simpler. But a divide-and-conquer approach, with a few small scripts,
can actually work better in some complex cases.

I tend to do as much as possible in R, but for all serious data analysis
tasks, I use a Makefile to "glue" the various stages of processing. Data
preprocessing stages (performed with R or other tools) create files that are
then processed with R. One advantage is that preprocessing is performed only
when the raw input data change; but more importantly, when I come back
years later, I can follow the exact flow of transformations.
Again, "make" (like awk or R) is not without limits and idiosyncrasies. If
someone has a simpler solution, I am interested to hear about it.

Christophe





-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Read Windows-like .INI files into R data structure?

2007-06-13 Thread Christophe Pallier
> "var1=value1", "A=value3" is almost pure R code.
> Is it possible to use this feature to solve the problem?



Along the same lines: you may write a short script that converts the ini
file into R code that can be sourced.

From your example, you can generate the following R code:

Section1 <- list()
Section1['var1'] <- value1
Section1['var2'] <- value2
Section2 <- list()
Section2['A'] <- value3
Section2['B'] <- value4


with the following awk script (using awk -F'=' -f conv.awk example.ini >
example.R)

### conv.awk
$1 ~ /\[/ { gsub(/[\[\]]/,""); # remove the brackets
   listname = $1;
   print $1 " <- list()";
   next }
{ print listname "['" $1 "'] <- " $2 }

(I know, it looks cryptic... so I am shooting myself in the foot after
claiming that awk scripts are typically quite readable ;-)

-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] export to a dat file that SAS can read

2007-06-13 Thread Christophe Pallier
One solution:

Use 'write.csv' (or write.csv2)  in R to output a csv file.
Then import it in SAS (by default, it imports into the "work" library; then
you need to copy it into another library to have the SAS file).
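
For instance (a sketch, assuming the data frame is called out13 as in your message):

write.csv(out13, file = "out13.csv", row.names = FALSE)

Alternatively, if the 'foreign' package is installed, it can write a .dat file
plus a SAS program that reads it (the file names here are just placeholders):

library(foreign)
write.foreign(out13, datafile = "out13.dat", codefile = "out13.sas", package = "SAS")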

Christophe

On 6/13/07, Rina Miehs <[EMAIL PROTECTED]> wrote:
>
> Hello
>
> i have a data frame in R that some SAS users need to use in their
> programs, and they want it in a dat file, is that possible?
> and which functions to use for that?
>
> my data frame is like this:
>
> > out13[1:100,]
>        farid      niveau1       niveau3           p1           p3 antal1
> 2   10007995  0.0184891394  4.211306e-10 5.106471e-02 2.594580e-02      3
> 9   10076495  0.0140812953  3.858757e-10 1.065804e-01 3.743271e-02      3
> 10  10081892  0.0241760590  7.429612e-10 1.628295e-02 3.021538e-04      6
> 13  10101395  0.0319517576  3.257375e-10 2.365204e-03 6.633232e-02     19
> 16  10104692  0.0114040787  3.661169e-10 1.566721e-01 4.550663e-02      4
> 17  10113592  0.0167586526  4.229634e-10 6.922003e-02 2.543987e-02      2
> 18  10113697  0.0259205504  2.888646e-10 1.096366e-02 9.118995e-02      6
> 22  10121697 -0.0135341273 -5.507914e-10 1.157417e-01 5.501947e-03     16
> 28  10146495  0.0093514076  3.493487e-10 2.041883e-01 5.340801e-02      4
> 29  10150497  0.0091611504  3.455925e-10 2.089893e-01 5.531904e-02      4
> 36  10171895  0.0089116669  2.956742e-10 2.153844e-01 8.614259e-02      4
> 42  10198295  0.0078515166  3.147140e-10 2.437943e-01 7.314111e-02      5
>
>
> Thanks
>
> Rina
>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Awk and Vilno

2007-06-12 Thread Christophe Pallier
On 6/13/07, Robert Wilkins <[EMAIL PROTECTED]> wrote:
>
> The point is : there are lots of data preparation scenarios where
> large numbers of merges need to be done. This is an example where
> Vilno and SAS are easier to use than the competition. I'm sure an Awk
> programmer can come up with something, but the result would be
> awkward.


Agreed.
In the awk+R scenario, it is clear that the merges are often better done
with R.
My strategy is to use awk only to clean/reformat data into a tabular format
and
do most of the "consolidation" (computations/filtering/merges) in R.  I
suggested using awk only to perform manipulations that would be more
complex to do within R (especially multiline records or records with
optional fields). I try to keep the scripts as simple as possible on both
sides.
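
For instance, a typical consolidation step on two cleaned-up tables might
look like this in R (a sketch with made-up file and column names; only
"PatientID" comes from your example):

labs   <- read.table("labs.txt",   header = TRUE)   # e.g. PatientID, LabValue
visits <- read.table("visits.txt", header = TRUE)   # e.g. PatientID, VisitDate
both <- merge(labs, visits, by = "PatientID")              # inner join
all  <- merge(labs, visits, by = "PatientID", all = TRUE)  # keep unmatched rows too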



> Certain apsects of Vilno and SAS are a bit more user-friendly:
> > Each column has a variable name, such as "PatientID".
> > Awk uses $1, $2, $3 , as variable names for columns. Not user-friendly.
>
>


In the first lines of awk scripts, I usually assign column numbers to
variables (e.g. "Code=1, time=3") and then access the fields with "$Code",
"$Time"...
Yet, it is true that it is cumbersome, in awk, to use the labels on the
first line of a file as variable names (my major complaint about awk).

I looked at a few examples of SAS Data step scripts on the Net, and found
that the awk scripts would be very similar (except for merges), but there
may be manipulations which I missed.


> For scanning inconsistently structured ASCII data files, where
> different rows have different column specifications, Awk is a better
> tool.
>
> For data problems that lend themselves to UNIX-style regular
> expressions, Awk, again, is a great tool.



The examples of messy data formats that were described earlier on the list
are good examples where regular expressions will help a lot. In the very
first stage of data inspection, to detect coding "mistakes", awk (sometimes
with the help of other GNU tools such as 'uniq' and 'sort') can be very
efficient.

> The upshot:

> Awk is a hammer.
> Vilno is a screwdriver.

Nice analogy. Using the right tool for the right task is very important.
So awk and vilno seem complementary.
Yet, when R enters into the equation, do you still "need" the three tools?

What we should really compare is the four situations:

R alone
R + awk
R + vilno
R + awk + vilno

and maybe "R + SAS Data step"

and see what scripts are more  elegant (read 'short and understandable')


Best,

Christophe



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data transformation for chi-square test.

2007-06-12 Thread Christophe Pallier
If you want to test whether 'member' has an effect on 'cost' (a continuous
numerical variable), I do not recommend using a chi-square test,
but rather a simple linear regression or a one-way analysis of variance.
Chi-square tests are for categorical variables and, unless you have a good reason,
there is no need to discretize the 'cost' variable.

First, look at the data to see if "member" actually has an effect on "cost"
(I assume your data is in a data frame 'a'):

tapply(a$cost, a$member, summary)
boxplot(a$cost ~ a$member)

Then, try a linear regression

l<- lm(a$cost ~ a$member)
plot(a$cost ~ a$member)
abline(l)
summary(l)

or, if the linear fit is not too good, use an analysis of variance (ANOVA):

l<- aov(a$cost ~ as.factor(a$member))
summary(l)
TukeyHSD(l)


On 6/12/07, Charlie Chi <[EMAIL PROTECTED]> wrote:
>
> Dear all R users
> :
> I am a IT student with few statistical background and new R user for only
> have  two month exprience. I have a data named medcost, import by
> read.table() as follow for example (real dataset has 500 cases), the
> heander id means case id, member means members in a family and cost is the
> family pay for medical cost every 6 months.
>
> idmember   cost
> 1 4  320
> 2 2  150
> 3 3  420
> 4 5  330
> 5 6  540
> 6 2  310
> 7 4  169
> 8 6  647
> 9 3  347
> 10   4  567
>
> I would like to use this dataset with chi-sqare analysis to see if there
> is
> any realationship between family member and medical cost (more members in
> a
> family will rise their medical cost?) I have found the pacage called
> stats,
> but I think need to transform the dataset into a contingency table as I
> read from books. I am not sure if I correct, I think the table should
> looks
> like:
>                       member
> cost        [2]  [3]  [4]  [5]  [6]  Total
> [0,100]       1    0    0    0    0      1
> [100,200]     0    0    1    0    0      1
> [200,300]     0    0    0    0    0      0
> [300,400]     1    1    1    1    0      4
> [400,500]     0    1    0    0    0      1
> [500,600]     0    0    1    0    1      2
> [600,700]     0    0    0    0    1      1
> Total         2    2    3    1    2     10
>
> I did try to use the method in chapter 5.0 of "R Introduction" to create
> freqency table, but it did not work. I am wondering if any one can help me
> with it? Thank you for your help.
>
> Regards
>
> Charlie
> .
>
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tools For Preparing Data For Analysis

2007-06-08 Thread Christophe Pallier
On 6/8/07, Douglas Bates <[EMAIL PROTECTED]> wrote:
>
>
> Other responses in this thread have mentioned 'little language'
> filters like awk, which is fine for those who were raised in the Bell
> Labs tradition of programming ("why type three characters when two
> character names should suffice for anything one wants to do on a
> PDP-11") but the typical field scientist finds this a bit too terse to
> understand and would rather write a filter as a paragraph of code that
> they have a change of reading and understanding a week later.


Hum,


Concerning awk, I think that this comment does not apply: because the
language is simple and somewhat limited, awk scripts are typically quite
clean and readable (of course, it is possible to write horrible code in any
language).

I have introduced awk to dozens of people (mostly scientists in social
sciences, and DOS/Windows users...) over the last 15 years; it is sometimes
the only programming language they know, and they are very happy with what
they can do with it.

The philosophy of using it as a filter (that is, a converter) is also good
because many problems are best solved in 2 or 3 steps (two or three short scripts run
sequentially) rather than in one single step, as people tend to do with
languages that encourage the use of more complex data structures than
associative arrays.

It could be argued that awk is the swiss army knife of simple text
manipulations. All in all, awk+R is very efficient combination for data
manipulation (at least for the cases I have encountered).

It would be a pity if your remark led people to overlook awk, as it would
efficiently solve many of the input parsing problems that are posted on this
list (I am talking here about extracting information from text files, not
data entry).

awk, like R, is not exempt of defects, yet both are tools that one gets
attached to because they increase your productivity a lot.


-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tools For Preparing Data For Analysis

2007-06-08 Thread Christophe Pallier
Hi,

Can you provide examples of data formats that are problematic to read and
clean with R ?

The only problematic cases I have encountered were cases with multiline
and/or varying-length records (optional information). Then, it is sometimes
a good idea to preprocess the data to present it in a tabular format (one
record per line).

For this purpose, I use awk (e.g. http://www.vectorsite.net/tsawk.html),
which is very adept at processing ascii data files  (awk is much simpler to
learn than perl, spss, sas, ...).
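
(For simple cases, the same kind of flattening can also be done within R
itself; here is a minimal untested sketch, assuming records made of
"key value" lines and separated by blank lines:

txt  <- c("id 1", "score 10", "", "id 2", "score 12")   # stands for readLines("myfile.txt")
keep <- txt != ""
recs <- split(txt[keep], cumsum(txt == "")[keep])        # one element per record
do.call(rbind, lapply(recs, function(r) {
  kv <- strsplit(r, " ")
  setNames(sapply(kv, `[`, 2), sapply(kv, `[`, 1))       # named vector: field -> value
}))

For anything more irregular than this, I still find awk more convenient.)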

I have never encountered a data file in ascii format that I could not
reformat with Awk.  With binary formats, it is another story...

But, again, this is my limited experience; I would like to know if there are
situations where using SAS/SPSS is really a better approach.

Christophe Pallier


On 6/8/07, Robert Wilkins <[EMAIL PROTECTED]> wrote:
>
> As noted on the R-project web site itself ( www.r-project.org ->
> Manuals -> R Data Import/Export ), it can be cumbersome to prepare
> messy and dirty data for analysis with the R tool itself. I've also
> seen at least one S programming book (one of the yellow Springer ones)
> that says, more briefly, the same thing.
> The R Data Import/Export page recommends examples using SAS, Perl,
> Python, and Java. It takes a bit of courage to say that ( when you go
> to a corporate software web site, you'll never see a page saying "This
> is the type of problem that our product is not the best at, here's
> what we suggest instead" ). I'd like to provide a few more
> suggestions, especially for volunteers who are willing to evaluate new
> candidates.
>
> SAS is fine if you're not paying for the license out of your own
> pocket. But maybe one reason you're using R is you don't have
> thousands of spare dollars.
> Using Java for data cleaning is an exercise in sado-masochism, Java
> has a learning curve (almost) as difficult as C++.
>
> There are different types of data transformation, and for some data
> preparation problems an all-purpose programming language is a good
> choice ( i.e. Perl , or maybe Python/Ruby ). Perl, for example, has
> excellent regular expression facilities.
>
> However, for some types of complex demanding data preparation
> problems, an all-purpose programming language is a poor choice. For
> example: cleaning up and preparing clinical lab data and adverse event
> data - you could do it in Perl, but it would take way, way too much
> time. A specialized programming language is needed. And since data
> transformation is quite different from data query, SQL is not the
> ideal solution either.
>
> There are only three statistical programming languages that are
> well-known, all dating from the 1970s: SPSS, SAS, and S. SAS is more
> popular than S for data cleaning.
>
> If you're an R user with difficult data preparation problems, frankly
> you are out of luck, because the products I'm about to mention are
> new, unknown, and therefore regarded as immature. And while the
> founders of these products would be very happy if you kicked the
> tires, most people don't like to look at brand new products. Most
> innovators and inventers don't realize this, I've learned it the hard
> way.
>
> But if you are a volunteer who likes to help out by evaluating,
> comparing, and reporting upon new candidates, well you could certainly
> help out R users and the developers of the products by kicking the
> tires of these products. And there is a huge need for such volunteers.
>
> 1. DAP
> This is an open source implementation of SAS.
> The founder: Susan Bassein
> Find it at: directory.fsf.org/math/stats (GNU GPL)
>
> 2. PSPP
> This is an open source implementation of SPSS.
> The relatively early version number might not give a good idea of how
> mature the
> data transformation features are, it reflects the fact that he has
> only started doing the statistical tests.
> The founder: Ben Pfaff, either a grad student or professor at Stanford CS
> dept.
> Also at : directory.fsf.org/math/stats (GNU GPL)
>
> 3. Vilno
> This uses a programming language similar to SPSS and SAS, but quite unlike
> S.
> Essentially, it's a substitute for the SAS datastep, and also
> transposes data and calculates averages and such. (No t-tests or
> regressions in this version). I created this, during the years
> 2001-2006 mainly. It's version 0.85, and has a fairly low bug rate, in
> my opinion. The tarball includes about 100 or so test cases used for
> debugging - for logical calculation errors, but not for extremely high
> volumes of data.
> The maintenance of Vilno has slowed down, because I am currently
> (desparately) looking for employment. But once I've 

Re: [R] test for nested factors

2007-06-04 Thread Christophe Pallier
Here are two functions I wrote, 'is.nested' and 'are.crossed', that check
whether a factor is nested inside another one, or whether both are crossed:

is.nested <- function (factor1,factor2)
  {
# only one positive number per line in the f1 * f2 crosstable
all(apply(table(factor1,factor2)>0,1,sum) == 1)
  }

are.crossed <- function (factor1,factor2)
  { all(table(factor1,factor2) > 0 ) }
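
A quick check on the built-in CO2 data set used in your example (Plant is
nested in Type; Type and Treatment are crossed):

is.nested(CO2$Plant, CO2$Type)          # TRUE
are.crossed(CO2$Type, CO2$Treatment)    # TRUE
is.nested(CO2$Type, CO2$Plant)          # FALSE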

Christophe Pallier
www.pallier.org

On 6/4/07, Tim Bergsma <[EMAIL PROTECTED]> wrote:
>
> Is there a conventional way to test for nested factors?  I.e., if 'a'
> and 'b' are lists of same-length factors, does each level specified by
> 'a' correspond to exactly one level specified by 'b'?
>
> The function below seems to suffice, but I'd be happy to know of a more
> succinct solution, if it already exists.
>
> Thanks,
>
> Tim.
>
> ---
>
> "%nested.in%" <- function(x,f,...){
> #coerce to list
> if(!is.list(x))x<-list(x)
> if(!is.list(f))f<-list(f)
> #collapse to vector
> x <- tapply(x[[1]],x)
> f <- tapply(f[[1]],f)
> #analyse
> return(all(sapply(lapply(split(f,x),unique),length)==1))
> }
>
> CO2$Plant %nested.in% CO2[,c("Type","Treatment")] #TRUE
> CO2$Plant %nested.in% (CO2$uptake < mean(CO2$uptake)) #FALSE
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] basic problem but can't solve it

2007-05-23 Thread Christophe Pallier
If I understood correctly, the initial post asked for a vector of the same
length as the original one. This is why I suggested:

   tapply(Measure,Month,mean)[as.character(Month)]

btw, this is a handy way to compute deviations from the means of subgroups:
x - tapply(x, group, mean)[as.character(group)]
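
With the data from the original question (quoted below):

Measure <- c(2.28, 14.04, 0.60, 0.21, 0.96, 0.75, 1.28)
Month   <- c(5, 5, 5, 6, 6, 6, 6)
tapply(Measure, Month, mean)[as.character(Month)]             # group mean repeated per row
Measure - tapply(Measure, Month, mean)[as.character(Month)]   # deviations from group means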

Christophe Pallier


On 5/22/07, John Kane <[EMAIL PROTECTED]> wrote:
>
> aggregate(Measure, list(Month=Month), mean)
>
> --- Benoit Chemineau <[EMAIL PROTECTED]>
> wrote:
>
> > Hello,
> >I have a basic problem but i can't figure it out
> > with the
> > table underneath. I would like to compute monthly
> > averages.
> >I would like to have the average measure for
> > month #5 for the first
> > three rows (the same number in the first three
> > lines) and the average
> > measure for month #6 for the last four rows ((the
> > same number in the first
> > three lines) in a separate vesctor (let's call it
> > 'result')
> >I tried to use a "while" statement inside a "for"
> > loop but it doesn't
> > seem to work.
> >Can someone please help me with this ?
> >
> >Measure Month
> >2.28 5
> >14.04 5
> >0.60 5
> >0.21 6
> >0.96 6
> >0.75 6
> >1.28 6
> >
> >Thank you !
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained,
> > reproducible code.
> >
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] basic problem but can't solve it

2007-05-22 Thread Christophe Pallier
tapply(Measure,Month,mean)[as.character(Month)]

-- 
Christophe Pallier (http://www.pallier.org)


On 5/22/07, Benoit Chemineau <[EMAIL PROTECTED]> wrote:
>
> Hello,
>I have a basic problem but i can't figure it out with the
> table underneath. I would like to compute monthly averages.
>I would like to have the average measure for month #5 for the first
> three rows (the same number in the first three lines) and the average
> measure for month #6 for the last four rows ((the same number in the first
> three lines) in a separate vesctor (let's call it 'result')
>I tried to use a "while" statement inside a "for" loop but it doesn't
> seem to work.
>Can someone please help me with this ?
>
>Measure Month
>2.28 5
>14.04 5
>0.60 5
>0.21 6
>0.96 6
>0.75 6
>1.28 6
>
>Thank you !
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Optimized File Reading with R

2007-05-15 Thread Christophe Pallier
If you execute the same script several times and the data file does not
change, it may be a good idea to save it as an R object:

if (file.access('mydata.obj', 0) == 0) {   # the saved R object already exists
  load('mydata.obj')
} else {
  a <- read.table("mydata.csv", ...)       # slow read, done only once
  save(a, file = 'mydata.obj')
}

It can speed up things considerably.

Christophe Pallier


On 5/15/07, Liaw, Andy <[EMAIL PROTECTED]> wrote:
>
> If it's a matrix, use scan().  If the columns are not all the same type,
> use the colClasses argument to read.table() to specify their types,
> instead of leaving it to R to guess.  That will speed things up quite a
> lot.
>
> Andy
>
> From: Lorenzo Isella
> >
> > Dear All,
> > Hope I am not bumping into a FAQ, but so far my online search
> > has been fruitless
> > I need to read some data file using R. I am using the (I think)
> > standard command:
> >
> > data_150<-read.table("y_complete06000", header=FALSE)
> >
> > where y_complete06000 is a 6000 by 40 table of numbers.
> > I am puzzled at the fact that R is taking several minutes to
> > read this file.
> > First I thought it may have been due to its shape, but even
> > re-expressing and saving the matrix as a 1D array does not help.
> > It is not a small file, but not even huge (it amounts to about 5Mb of
> > text file).
> > Is there anything I can do to speed up the file reading?
> > Many thanks
> >
> > Lorenzo
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
>
>
>
> --
> Notice:  This e-mail message, together with any attachments,...{{dropped}}
>
> __________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Anova Test

2007-05-15 Thread Christophe Pallier
If your data is inside a data frame named 'a', with the data from each group in a
different column, you can use "stack" and perform the anova with:

l <- aov(values~ind,data=stack(a))
anova(l)
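
With the two rows of data shown in your message, the whole thing would look
like this (a sketch; real data would of course have more rows per group):

a <- data.frame(Test1 = c(20, 40), Test2 = c(25, 20), Test3 = c(15, 15),
                Test4 = c(10, 35), Control = c(17, 45))
s <- stack(a)                      # two columns: 'values' and 'ind' (the group)
l <- aov(values ~ ind, data = s)
anova(l)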

Christophe Pallier



On 5/15/07, CrazyJoe <[EMAIL PROTECTED]> wrote:
>
>
> Thank you Guys.
>
> Let say that from Test1 to control i have multiple data
>
> Tester
> Test1 Test2  Test3  Test4  Control
> 20   25  1510   17
> .   . .   .  .
> .   . .   .  .
> 40   20   1535  45
>
> Is this the method i need to use?
>
> anova(lm(..this is where i am not sure how to put them.
>
> is this something to do with anova(lm(dependent~independent*independent,
> data=name)
>
> if they are all independent, how do i put them together?
>
> thanks.
>
>
> Ben Bolker-2 wrote:
> >
> >
> >
> > CrazyJoe  hotmail.com> writes:
> >
> >>
> >> I am very new to R. I am trying to perform an Anova Test and see if it
> >> differs or not.
> >>
> >> Basically, i have 4 tests and 1 control.
> >>
> >> Tester
> >> Test1 Test2  Test3  Test4  Control
> >> 20   25  1510   17
> >>
> >
> > You can't make any inferences with the data you have here.
> > You need to have multiple observations per treatment!
> > See the examples for ?lm .
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Anova-Test-tf3758829.html#a10625154
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] aov problem

2007-05-15 Thread Christophe Pallier
You probably have an unbalanced design (different numbers of cases in the
different conditions defined by the crossing "fact1*fact2"). One possibility
is to use the 'car' package and its Anova function to get "type III" sums of squares.
This is equivalent to pretending that there is no imbalance.
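
For the model in your message this would look something like this (an
untested sketch; 'var', 'fact1' and 'fact2' are the objects from your call,
and only one response column is analysed at a time):

library(car)
options(contrasts = c("contr.sum", "contr.poly"))  # sum-to-zero coding, sensible with type III
g <- aov(var[, 1] ~ fact1 * fact2)                 # first response column
Anova(g, type = "III")                             # same result whatever the order of fact1, fact2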

Christophe Pallier

On 5/15/07, Anders Malmendal <[EMAIL PROTECTED]> wrote:
>
> I am using R to make two-way ANOVA on a number of variables using
>
> g <- aov(var ~ fact1*fact2)
>
> where var is a matrix containing the variables.
> However the outcome seem to be dependent on the order of fact1 and fact2
> (i.e. fact2*fact1) gives a slightly (factor of 1.5) different result.
> Any ideas why this is?
>
> Thanks for any help
> Anders
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Christophe Pallier (http://www.pallier.org)

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] hi

2007-05-10 Thread Christophe Pallier
Have you loaded the library containing lda?

library(MASS)

Christophe Pallier

On 5/10/07, elyakhlifi mustapha <[EMAIL PROTECTED]> wrote:
>
> hi,
> I have this error
>
> > tr <- sample(1:50, 25)
> > train <- rbind(iris3[tr,,1], iris3[tr,,2], iris3[tr,,3])
> > test <- rbind(iris3[-tr,,1], iris3[-tr,,2], iris3[-tr,,3])
> > cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
> > z <- lda(train, cl)
> Error: could not find function "lda"
>
> I don't understand why R doesn't recognize the lda function,
> can you help me please?
>
>
>
>

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Significance test for discrete data

2007-05-10 Thread Christophe Pallier
The Wilcoxon paired rank sign test assumes symmetry.
(cf. http://www.basic.northwestern.edu/statguidefiles/srank_paired_alts.html
)

Christophe Pallier

On 5/10/07, Alexander Kollmann <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> given situation:
>
> - pre / post test comparison of discrete, paired data [values can be
> 1,2,3,4].
> - the data are neither normal distributed nor symetric
>
> Which test should I use to calculate the p-value? Is the wilcoxon rank sum
> test ok ?
>
>
> Alex
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] aov() and lme() (repeated measures and ANOVA)

2005-10-28 Thread Christophe Pallier
Hello,


> Subject: [R] aov() and lme()
> From: Jan Wiener <[EMAIL PROTECTED]>
> Date: Thu, 27 Oct 2005 13:14:59 +0200
> 
> Sorry for reposting, but even after extensive search I still did not 
> find any answers.
> 
> using: 
> summary(aov(pointErrorAbs~noOfSegments*turnAngle+Error(subj/(noOfSegments+turnAngle)),
>  
> data=anovaAllData ))
> 
> with subj being a random factor and noOfSegments and turnAngle being 
> fixed factors, I get the following results:
> 
> [...] 
> 
> No I trying to fit the same data with lme and using the following call:
> 
> anova(lme(fixed=pointErrorAbs~noOfSegments*turnAngle, random=~1|subj, 
> data=anovaAllData))
> 
> Unfortunately the results are 'really' different from the aov() 
> procedure (I guess I have the call wrong):
> 

Maybe the following post concerning repeated measures and ANOVA can help:

(from http://tolstoy.newcastle.edu.au/~rking/R/help/03b/7663.html)

 > $ anova(lme(DV ~ GROUP*TRIAL,random= ~1|SUB, correlation=corCompSymm() ))
 >
 >(TRIAL and GROUP are fixed factors)
 >
 >Actually, you do not need lme for to run a repeated measure anova.
 >You could use the aov function:
 >
 > $ summary(aov(DV~GROUP*TRIAL+Error(SUB/TRIAL)))
 >
 > ... yields the same results (when data are balanced)


Christophe Pallier
www.pallier.org

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] r under linux: creating high quality bmp's for win users

2005-03-22 Thread Christophe Pallier
Oops, sorry: I spread false information in my previous post.
I used, years ago, wmf2eps to convert metafiles -> eps, and not the 
other way round.

Apologies to people who have searched for eps2wmf !
The only utility that I have sometimes used to convert from eps to a 
vector format (xfig) is pstoedit.

Christophe Pallier
Gabor Grothendieck wrote:
Can you provide a link.  I did a google search and found something
on a Japanese site but it turned out that the writer had made a 
mistake and it linked to wmf2eps, not eps2wmf.

Christophe Pallier  lscp.ehess.fr> writes:
: 
: Hello Christoph!
: 
: In the past, I used an utility called "eps2wmf".
: It only works under Windows though (maybe under Linux with wine?).
: I believe it is available on the CTAN (Tex archives).
: 
: The nice thing is that wmf files are not bitmap and scale well.
: 
: Christophe Pallier
: 
: Christoph Lehmann wrote:
: 
: > Hi
: >
: > I produce graphics with R under linux, but my collaborators often use 
: > windows and cannot import eps pics e.g. in msword
: >
: > what is the standard way to get e.g. bmp's with the same quality as 
: > eps.  going the way: creating eps, convert eps2bmp using 'convert' 
: > doesn't yield good enough bmp's
: >
: > thanks for a short hint
: >
: > cheers
: > christoph
: >
: > __
: > R-help  stat.math.ethz.ch mailing list
: > https://stat.ethz.ch/mailman/listinfo/r-help
: > PLEASE do read the posting guide! 
: > http://www.R-project.org/posting-guide.html
: 
: __
: R-help  stat.math.ethz.ch mailing list
: https://stat.ethz.ch/mailman/listinfo/r-help
: PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
: 
:

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] r under linux: creating high quality bmp's for win users

2005-03-22 Thread Christophe Pallier
Hello Christoph!
In the past, I used an utility called "eps2wmf".
It only works under Windows though (maybe under Linux with wine?).
I believe it is available on the CTAN (Tex archives).
The nice thing is that wmf files are not bitmap and scale well.
Christophe Pallier
Christoph Lehmann wrote:
Hi
I produce graphics with R under linux, but my collaborators often use 
windows and cannot import eps pics e.g. in msword

what is the standard way to get e.g. bmp's with the same quality as 
eps.  going the way: creating eps, convert eps2bmp using 'convert' 
doesn't yield good enough bmp's

thanks for a short hint
cheers
christoph
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] contrast matrix for aov

2005-03-10 Thread Christophe Pallier
Prof Brian Ripley wrote:
On Wed, 9 Mar 2005, Darren Weber wrote:
We have a two-factor, repeated measures design, with

Where does `repeated measures' come into this?  You appear to have 
repeated a 2x2 experiment in each of 8 blocks (subjects).  Such a 
design is usually analysed with fixed effects.  (Perhaps you averaged 
over repeats in the first few lines of your code?)

roi.aov <- aov(roi ~ (Cue*Hemisphere) + 
Error(Subject/(Cue*Hemisphere)), data=roiDataframe)

I think the error model should be Error(Subject).  In what sense are 
`Cue' and `Cue:Hemisphere' random effects nested inside `Subject'?

I do not understand this, and I think I am probably not the only one. 
That is why I would be grateful if you could give a bit more information.

My understanding is that the fixed factors Cue and Hemisphere are 
crossed with the random factor Subject (in other words, Cue and 
Hemisphere are within-subjects factors, and this is probably why Darren 
called it a "repeated measure" design).

In this case, it seems to me, from the various textbooks I read on ANOVA,
that the appropriate MS to test the interaction Cue:Hemisphere is
Subject:Cue:Hemisphere (with 7 degrees of freedom, as there are 8
independent subjects).

If you input Error(Subject/(Cue*Hemisphere)) in the aov formula, then
the test for the interaction indeed uses the Subject:Cue:Hemisphere
source of variation in the denominator. This fits with the output of other
software packages.

If you include only 'Subject', then the test for the interaction has 21
degrees of freedom, and I do not understand what this tests.

I apologize if my terminology is not accurate. But I hope you can
clarify what is wrong with the Error(Subject/(Cue*Hemisphere)) term,
or maybe just point us to the relevant textbooks.

Thanks in advance,
Christophe Pallier



Let me fake some `data':
set.seed(1); roiValues <- rnorm(32)
subjectlabels <- paste("V", 1:8, sep = "")
options(contrasts = c("contr.helmert", "contr.poly"))
roi.aov <- aov(roi ~ Cue*Hemisphere + Error(Subject), data=roiDataframe)
roi.aov

Call:
aov(formula = roi ~ Cue * Hemisphere + Error(Subject), data = 
roiDataframe)

Grand Mean: 0.1165512
Stratum 1: Subject
Terms:
Residuals
Sum of Squares   4.200946
Deg. of Freedom 7
Residual standard error: 0.7746839
Stratum 2: Within
Terms:
                      Cue Hemisphere Cue:Hemisphere Residuals
Sum of Squares   0.216453   0.019712       0.057860 21.896872
Deg. of Freedom         1          1              1        21
Residual standard error: 1.021131
Estimated effects are balanced
Note that all the action is in one stratum, and the SSQs are the same as
aov(roi ~ Subject + Cue * Hemisphere, data = roiDataframe)
(and also the same as for your fit).
print(summary(roi.aov))

It auto-prints, so you don't need print().

I've tried to create a contrast matrix like this:
cm <- contrasts(roiDataframe$Cue)
which gives the main effect contrasts for the Cue factor.  I really 
want to specify the interaction contrasts, so I tried this:


# c( lh_cueL, lh_cueR, rh_cueL, rh_cueR )
# CueRight>CueLeft for the Left Hemisphere.
# CueLeft>CueRight for the Right Hemisphere
cm <- c(-1, 1, 1, -1)
dim(cm) <- c(2,2)

(That is up to sign what Helmert contrasts give you.)
roi.aov <- aov( roi ~ (Cue*Hemisphere) + 
Error(Subject/(Cue*Hemisphere)),
contrasts=cm, data=roiDataframe)
print(summary(roi.aov))


but the results of these two aov commands are identical.  Is it the 
case that the 2x2 design matrix is always going to give the same F 
values for the interaction regardless of the contrast direction?

Yes, as however you code the design (via `contrasts') you are fitting 
the same subspaces.  Not sure what you mean by `contrast direction', 
though.

However, you have not specified `contrasts' correctly:
contrasts: A list of contrasts to be used for some of the factors in
  the formula.
and cm is not a list, and an interaction is not a factor.
OR, is there some way to get a summary output for the contrasts that 
is not available from the print method?

For more than two levels, yes: see `split' under ?summary.aov.
Also, see se.contrast which allows you to find the standard error for
any contrast.

For the fixed-effects model you can use summary.lm:
fit <- aov(roi ~ Subject + Cue * Hemisphere, data = roiDataframe)
summary(fit)
   Df  Sum Sq Mean Sq F value Pr(>F)
Subject 7  4.2009  0.6001  0.5756 0.7677
Cue 1  0.2165  0.2165  0.2076 0.6533
Hemisphere  1  0.0197  0.0197  0.0189 0.8920
Cue:Hemisphere  1  0.0579  0.0579  0.0555 0.8161
Residuals  21 21.8969  1.0427
summary.lm(fit)

Call:
aov(formula = roi ~ Subject + Cue * Hemisphere, data = roiDataframe)
Resid

Re: [R] Need your help in calculating the p-value

2005-02-23 Thread Christophe Pallier

Latha Raja wrote:
Hi,
I am using R to perform wilcox.test and wondering if you know how the p-value in wilcox.test is calculated?
 

You can get the source code of the wilcox.test function:
> methods(wilcox.test)
[1] wilcox.test.default* wilcox.test.formula*
   Non-visible functions are asterisked
> getAnywhere(wilcox.test.default)
...
You will see that it uses 'psignrank' (for the one sample case).
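
For the one-sample case without ties or zeros, the exact two-sided p-value
boils down to something like this (a small sketch with made-up numbers):

x <- c(1.1, -0.3, 0.8, 2.0, -0.4, 1.5)    # hypothetical differences, H0: median = 0
n <- length(x)
V <- sum(rank(abs(x))[x > 0])             # the signed rank statistic reported by wilcox.test
p <- if (V > n * (n + 1) / 4) {
  psignrank(V - 1, n, lower.tail = FALSE)
} else {
  psignrank(V, n)
}
min(2 * p, 1)                             # 0.15625
wilcox.test(x)$p.value                    # agrees (exact computation, no ties)
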
Christophe Pallier
www.pallier.org
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Reproducing SAS GLM in R

2005-02-22 Thread Christophe Pallier
It seems that what you want to do is an ANOVA with the within-subject factors
"facCond" and "facRoi", right?

If this is correct, then you should try:
summary(aov(vecData ~ facCond*facRoi + Error(facSubj/(facCond*facRoi))))
Christophe Pallier
www.pallier.org
Bela Bauer wrote:
Hi,
I'm still trying to figure out that GLM procedure in SAS.
Let's start with the simple example:
PROC GLM;
MODEL col1 col3 col5 col7 col9 col11 col13 col15 col17 col19 col21 
col23 =/nouni;
repeated roi 6, ord 2/nom mean;
TITLE 'ABDERUS lat ACC 300-500';

That's the same setup that I had in my last email. I have three 
factors: facSubj,facCond and facRoi. I had this pretty much figured 
out with three lengthy calls to lm(), but I have to extend my code to 
also work on models with four, five or even six factors, so that 
doesn't seem like a practical method any more. I've tried with the 
following code with glm(),anova() and drop1() (I use sum contrasts to 
reproduce those Type III SS values); I've also tried many other 
things, but this is the only somewhat reasonable result I get with glm.

> options(contrasts=c("contr.sum","contr.poly"))
> test.glm <- glm(vecData ~ (facCond+facSubj+facRoi)^2)
> anova(test.glm,test="F")
Analysis of Deviance Table
Model: gaussian, link: identity
Response: vecData
Terms added sequentially (first to last)
 Df Deviance Resid. Df Resid. Dev   FPr(>F)
NULL   2391429.30
facCond   1 2.21   2381427.09  3.0764   0.08266 .
facSubj  19   685.94   219 741.16 50.2463 < 2.2e-16 ***
facRoi5   258.77   214 482.38 72.0316 < 2.2e-16 ***
facCond:facSubj  19   172.70   195 309.68 12.6510 < 2.2e-16 ***
facCond:facRoi510.37   190 299.31  2.8867   0.01803 *
facSubj:facRoi   95   231.0595  68.26  3.3850 4.266e-09 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> drop1(test.glm,scope=.~.,test="F")
Single term deletions
Model:
vecData ~ (facCond + facSubj + facRoi)^2
Df Deviance AIC F value Pr(F)
68.26  671.33
facCond  170.47  676.97  3.0764   0.08266 .
facSubj 19   754.19 1209.89 50.2463 < 2.2e-16 ***
facRoi   5   327.03 1037.35 72.0316 < 2.2e-16 ***
facCond:facSubj 19   240.96  936.05 12.6510 < 2.2e-16 ***
facCond:facRoi   578.63  695.27  2.8867   0.01803 *
facSubj:facRoi  95   299.31  836.09  3.3850 4.266e-09 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Now, unfortunately this just isn't the output of SAS (roi corresponds 
to facRoi, ord corresponds to facCond)

 Source             DF   Type III SS   Mean Square   F Value   Pr > F

 roi                 5   258.7726806    51.7545361     21.28   <.0001
 Error(roi)         95   231.0511739     2.4321176

                          Adj Pr > F
   Source             G - G     H - F
   roi               <.0001    <.0001
   Error(roi)
    Greenhouse-Geisser Epsilon    0.5367
    Huynh-Feldt Epsilon           0.6333

 Source             DF   Type III SS   Mean Square   F Value   Pr > F

 ord                 1     2.2104107     2.2104107      0.24   0.6276
 Error(ord)         19   172.7047994     9.0897263

 Source             DF   Type III SS   Mean Square   F Value   Pr > F

 roi*ord             5    10.37034918    2.07406984     2.89   0.0180
 Error(roi*ord)     95    68.25732255    0.71849813

                          Adj Pr > F
   Source             G - G     H - F
   roi*ord           0.0663    0.0591
   Error(roi*ord)
    Greenhouse-Geisser Epsilon    0.4116
    Huynh-Feldt Epsilon           0.4623

As you can see, I get a correct p and F values for the facCond:facRoi 
interaction, but everything else doesn't come out right. The drop1() 
call gives me the correct degrees of freedom, but residual degrees of 
freedom seem to be wrong.

Could you give me any hints how I could get to the SAS results? For 
the lm() calls that work (in special cases), refer to my posting from 
last Friday.
I also have a 4-factorial example, and I've been told that people 
around here do 5- or 6-factorial multivariant ANOVAs, too, so I need a 
general solution.

Thanks a lot!
Bela
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] type III sum of squares

2004-08-11 Thread Christophe Pallier
What are the strengths and weakness of 'aov' in 'car' package?
My model looks something like this :
library(car)
aov(lm(fish.length~zone*area,data=my.data))

'aov' is in the package 'stats', not in 'car'. (see ?aov)
One of the interests of 'aov' (compared to 'lm') is that using the 'Error' term in the
formula allows one to analyse designs where sampling occurs at several levels (as in some
so-called 'repeated measures' designs, though this term is not really correct).
You may be thinking of the *Anova* function in the car package (?)
'Anova(model)' allows you to compute so-called 'type II' and 'type III' sums of squares,
sparing you the need to play with the order of terms as you would with 'anova' (notice the lowercase).

Type III sums of squares are useful in factorial designs with unequal numbers of observations.
When the factors are coded with the contrasts contr.sum or contr.helmert, the test for the main effect of a factor weights all subgroups equally
(see John Fox's book: "An R and S-plus Companion to Applied Regression", p. 140).

I discussed this topic in a recent thread on this list ([R] Re: Enduring LME 
confusion… or Psychologists and Mixed-Effects)
The archive of R-help contains several post about this topic.
I hope this helps.
Christophe Pallier
http://www.pallier.org
__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Re: Enduring LME confusion… or Psychologists and Mixed-Effects

2004-08-10 Thread Christophe Pallier
Hello,
Suppose I have a typical psychological experiment that is a 
within-subjects design with multiple crossed variables and a 
continuous response variable. Subjects are considered a random 
effect. So I could model
> aov1 <- aov(resp~fact1*fact2+Error(subj/(fact1*fact2))

However, this only holds for orthogonal designs with equal numbers of 
observation and no missing values. These assumptions are easily 
violated so I seek refuge in fitting a mixed-effects model with the 
nlme library.

I suppose that you have, for each subject, enough observations to 
compute his/her average response for each combination of factor1 and 
factor2, no?
If this is the case, you can perform the analysis with the above formula 
on the data obtained by 'aggregate(resp,list(subj,fact1,fact2),mean)'.
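
In code, that would be something like this (a sketch; 'resp', 'subj', 'fact1'
and 'fact2' are your variables, and they should be factors except for resp):

agg <- aggregate(resp, list(subj = subj, fact1 = fact1, fact2 = fact2), mean)
# 'agg' now has one line per subject x condition; the averaged response is in column 'x'
summary(aov(x ~ fact1 * fact2 + Error(subj / (fact1 * fact2)), data = agg))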

This is an analysis with only *within-subject* factors, and there
*cannot* be a problem of unequal numbers of observations when you have
only within-subject factors (supposing you have at least one
observation for each subject in each condition).

I believe the problem with unequal numbers of observations only occurs
when you have at least two crossed *between-subject* (group) variables.

Let's imagine you have two binary group factors (A and B) yielding four
subgroups of subjects, and, for some reason, you do not have the same number
of observations in each subgroup.
Then there are several ways of defining the main effects of A and B.

In many cases, the most reasonable definition of the main effect of A is
to take the average of A in B1 and in B2 (thus ignoring the number of
observations, or weighting the four subgroups equally).
To test the null hypothesis of no difference in A when all groups are 
equally weighted, one common approach in psychology is to pretend that 
the number of observation is each group is equal to the harmonic mean of 
the number of observations in each subgroups. The sums of square thud 
obtained can be compared with the error sum of square in the standard 
anova to form an F-test.
This is called the "unweighted" approach.

This can easily be done 'by hand' in R, but there is another approach:
You get equivalent statistics to the unweighted anova when you use so-called
'type III' sums of squares (I read this in Howell, 1987,
'Statistical methods in psychology',
and in John Fox's book 'An R and S-plus companion to applied regression', p. 140).

It is possible to get type III sums of squares using John Fox's 'car' library.
library(car)
contrasts(A)=contr.sum
contrasts(B)=contr.sum
Anova(aov(resp~A*B),type='III')

You can compute the equally weighted cell means defining the effect of A 
with, say:

with(aggregate(resp,list(a=a,b=b),mean),tapply(x,a,mean))
I have seen some people advise against using 'type III' sums of squares,
but I do not know their rationale. The important thing, it seems to me,
is to know which null hypothesis is tested in a given test. If indeed the
type III sums of squares test the effect on equally weighted means, they seem
okay to me (when this is indeed the hypothesis I want to test).

Sorry for not answering any of your questions about the use of 'lme' (I
hope others will), but I feel that 'lme' is not needed in the context
of unequal cell frequencies.
(I am happy to be corrected if I am wrong.) It seems to me that 'lme' is
useful when some assumptions of standard anova are violated (e.g. with
repeated measurements when the assumption of sphericity is false), or
when you have several random factors.

Christophe Pallier
http://www.pallier.org
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Aggregate rows to see the number of occurences

2004-06-07 Thread Christophe Pallier
> I have a set of data like the following:
  [,1]  [,2]
[1,]   102 ...
[10,]  195
I'd like to aggregate it in order to obtain the frequency (the number 
of occurences) for each couple of values (e.g.: (10,2) appears twice, 
(7,0) appears once). Something cool would be to have this value in a 
third column...
I've been looking at aggregate() but either I couldn't get the right 
parameters, or this is not the right tool to use...

You can use:
x=paste(a[,1],a[,2],sep=",")
table(x)
then, if you need to have the count for each line from the original table:
 table(x)[x]
Or you could indeed use the 'aggregate' function:
 aggregate(a[,1],list(a[,1],a[,2]),length)
This yields one line per unique value (but that may be what you want...)
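
A tiny worked example (values made up to match the description in the question):

a <- cbind(c(10, 7, 10, 19), c(2, 0, 2, 5))
x <- paste(a[, 1], a[, 2], sep = ",")
table(x)                        # "10,2" appears twice, "7,0" and "19,5" once
cbind(a, count = table(x)[x])   # the count as a third column, one value per row
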
Christophe Pallier
www.pallier.org
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Getting the groupmean for each person

2004-05-10 Thread Christophe Pallier


Liaw, Andy wrote:

Suppose I
define the function:
fun <- function(x, f) {
   m <- tapply(x, f, mean)
   ans <- x - m[match(f, unique(f))]
   names(ans) <- names(x)
   ans
}
 

May I ask what is the purpose of match(f,unique(f)) ?

To remove the group means, I have be using:

x-tapply(x,f,mean)[f]

for a while (and I am now changing to
x - tapply(x,f,mean)[as.character(f)] because of the peculiarities of
indexing named vectors with factors).

The use of tapply(x,f,mean)[match(f,unique(f))] assumes a particular 
order in the result of tapply, no? It seems a bit dangerous to me.
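
A small made-up illustration of what I mean (data where the factor levels are
not in their order of first appearance):

x <- c(1, 2, 10, 20)
f <- factor(c("b", "b", "a", "a"))           # level "a" sorts first but appears last
tapply(x, f, mean)[match(f, unique(f))]      # 15 15 1.5 1.5  -- wrong group means
tapply(x, f, mean)[as.character(f)]          # 1.5 1.5 15 15  -- correct
x - tapply(x, f, mean)[as.character(f)]      # deviations from the group means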

Christophe Pallier

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] I have a question!!!! Help me!!

2004-04-20 Thread Christophe Pallier

We want to solve X's and Y's value!!
When we have two equation: 
For example:

2X+3Y=5
X+Y=2
 

solve(matrix(c(2,1,3,1),2,2),c(5,2)) yields the solution.

see '?solve' (you have to know about matrix algebra to understand this 
function)

Christophe Pallier

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Histogram ploting

2004-04-18 Thread Christophe Pallier
Hello Mateusz,

The 'hist' function works on the raw data.
In your data set example, you have already computed the number of data 
points in each bin.
What you really want is probably a barplot of N.

You could display your data with:

plot(Class, N, type = 'h')

Or

names(N)<-Class
barplot(N)
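
With the values you posted, for instance:

Class <- c(12.5, 17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5)
N     <- c(3, 10, 12, 8, 7, 3, 4, 2)
names(N) <- Class
barplot(N, xlab = "Class (bin midpoint)", ylab = "N")
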
Christophe Pallier

Mateusz Łoskot wrote:

Hi,

It's my first post on this group.
I've just started learning & using R and I like it ;-)
I have I think simple question. I'm trying to plot
a histogram for my data set.
My data set is defined as follows:
ClassN
12.53
17.510
22.512
27.58
32.57
37.53
42.54
47.52
Class means middle of set of my ranges I define.
N column stores number of measurements counted to
particular class.
And now I would like to plot a simple histogram presenting
numbers of measurements in each class.
As I read in manual, hist function takes x (my N) as the first param
but I can not identify how should I pass my
class ranged into hist function.
I believe you can understand my problem ;-)))
Could anyone help me ?
Kind regards

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] multistratum glm?

2004-04-18 Thread Christophe Pallier
Hello,

I routinely use aov and the Error term to perform analyses of
variance of experiments with 'within-subject' factors. I wonder whether 
a notion like 'multistratum models' exists for glm models when 
performing a logit analysis (without being 100%  sure whether this would 
make sense).

I have data of an experiment where the outcome is a categorical variable:

20 individuals listened to 80 synthetic utterances (distributed in 4 
types) and were asked to classify them into four categories. (The variables
in the data.frame are 'subject', 'sentence', 'type', and 'response')

Here is the table of counts table(type,response):

  response
type  a   b  c   d
 a 181 166 42  11
 b  69 170 72  89
 c  90 174 75  61
 d  14 125 53 208
There are several questions of interest, such as, for example:

- are responses distibuted in the same way for the different types?

- are the numbers of 'a' responses for the 'b' and 'c' types 
significantly different?

- is the proportion of 'd' over 'a' responses different for the 'b' and 
'c'  categories?

...  

(I want to make inferences for the population of potential subjects on 
the one hand, and on the population of potential sentences on the other 
hand).

If the responses were continuous, I would just run two one-way anovas: 
one with the factor type over the means by subject*type,
and the other with the factor type over the means by sentences (in 
type). And use t.test to compare between different pairs of types.

Now, as the answers are categorical, I am not sure about the correct 
approach and how to use R to perform such an analysis.

I could treat response as a factor, and use percentages of responses per 
subject in each cell of response*type,
and run an anova on that...
[aov(percentage ~ response*type + Error(subject/(response*type)))]. But it
seems incorrect to me to use the response of the subject as an 
independent variable (though I do not have a forceful argument).

Simple Chi-square tests are not the answer either, as a given subject 
contributed several times (80) to the counts in the table above.

My reading of MASS and of several other books suggests the use of
logit/multinomial models when the response is categorical. But in all 
the examples provided, the units of analysis contribute only one 
measurement. Should I include the subject and sentences factors in the 
formula? But then they would be treated as fixed-factors in the 
analysis, would they not?

Any suggestion is welcome.

Christophe Pallier
www.pallier.org
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] applying data generating function

2004-03-07 Thread Christophe Pallier


Fred J. wrote:

I need to generate a data set based on this equation
X(t) = 3.8x(t-1)(1-x(t-1)) + e(t), where e(t) is a
N(0, 0.001) random variable.
I need, say, 100 values.
How do I do this?

 

I assume X(t) and x(t) are the same (?).

f <- function(x) { 3.8 * x * (1 - x) + rnorm(1, 0, 0.001) }
v <- c()
x <- 0.1  # starting point
for (i in 1:100) { x <- f(x); v <- append(v, x) }
There may be smarter ways...
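
For instance (just a sketch, not part of the original reply): preallocate
the result and draw all the noise terms up front; set.seed() is only
there to make the run reproducible.

set.seed(1)
e <- rnorm(100, mean = 0, sd = 0.001)
x <- numeric(100)
x[1] <- 3.8 * 0.1 * (1 - 0.1) + e[1]   # x(0) = 0.1 as the starting value
for (t in 2:100) x[t] <- 3.8 * x[t - 1] * (1 - x[t - 1]) + e[t]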

Christophe Pallier

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Testing significance in a design with unequal but proportional sample sizes

2004-03-05 Thread Christophe Pallier


Prof Brian Ripley wrote:

On Fri, 5 Mar 2004, pallier wrote:

...

 

Actually, the different types of main effects defined above just 
correspond to different
contrasts on the cell means. So if there is an easy solution to compute 
arbitrary contrasts
on the cell means in a factorial design, this could be an approach to this
question. (Anyone?)
   

There are at least three such ways.  ?contrasts (for the assignment
function contrasts<-)  and ?C, as well as the contrasts= argument to aov 
(the function you were discussing ...).
 

Thanks.
I know about 'contrasts' and I read the section on contrast matrices in
your book (MASS, 3rd edition), as well as in the R online documentation,
but I probably do not understand them well: it still escapes me how to
compute "arbitrary" contrasts, such as, say:

a1b1  a1b2  a2b1  a2b2
   1     1    -1    -1

a1b1  a1b2  a2b1  a2b2
  .5    .5    -1     0

in a model "x ~ a * b" where a and b are two binary factors.

(the contrasts should be on the cell means, ignoring the sample sizes of
the subgroups. I know how to compute the size of the contrasts from the
table of means returned by tapply, but I would also need the associated
MSE).
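
For concreteness, this is the kind of by-hand computation I have in mind
(only a sketch, on a hypothetical data frame 'd' containing x, a and b;
the weights correspond to the first contrast above):

fit   <- aov(x ~ a * b, data = d)
mse   <- deviance(fit) / df.residual(fit)      # residual mean square
means <- tapply(d$x, list(d$a, d$b), mean)     # rows = levels of a, columns = levels of b
ns    <- tapply(d$x, list(d$a, d$b), length)   # cell sizes
w     <- rbind(c(1, 1), c(-1, -1))             # weights for a1b1, a1b2 / a2b1, a2b2
est   <- sum(w * means)
se    <- sqrt(mse * sum(w^2 / ns))
2 * pt(-abs(est / se), df.residual(fit))       # two-sided p-value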

Sorry if the solution is obvious.

Christophe Pallier

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] calling R from a shell script and have it display graphics

2004-02-12 Thread Christophe Pallier
Hello,

I am running R under Linux/x11.

I would like to call R from a shell script and have it display a series 
of graphics.
The graphics should remain visible until the user clicks or presses a key.

I first tried R CMD BATCH, but it does not load the x11 module, making it
impossible to open x11 or png devices.

Then, I tried to call R with a 'here' document:

R --vanilla --quiet --args text.txt <<'EOF'
file=commandArgs()[5]
cat('processing ',file,'\n')
...
x11()
plot(f2,log='xy',type='b',las=1,cex=.5,xlab='rang',ylab='freq')
Sys.sleep(10)
q()
EOF
The problem with this approach is that the script cannot interact with 
the user.
par(ask=T) will fail because it reads input from the script rather than 
from the keyboard.
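
One possible workaround (a sketch only; it assumes the script runs with a
controlling terminal, so the keypress can be read from /dev/tty even
though stdin is the here-document):

pause <- function(msg = "Press <Enter> for the next plot... ") {
  cat(msg)
  con <- file("/dev/tty")
  on.exit(close(con))
  invisible(readLines(con, n = 1))   # read from the terminal, not from stdin
}
x11()
plot(rnorm(100)); pause()
plot(runif(100)); pause()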

While I am writing this, a solution comes to mind: I could save all
the graphics in png format (using R ...).

Is there a simpler and cleaner way to achieve this?

Christophe Pallier
www.pallier.org
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] density plot for very large dataset

2003-12-15 Thread Christophe Pallier
Have you tried the 'sm.density' function from the sm library?
I used it for a dataset which 'only' had 13 points.
I'm new to R and am trying to perform a simple, yet 
problematic task.  I 
have two variables for which I would like to measure the 
correlation and 
plot against each other.  However, I have ~30 million measurements of
each variable.  I can read this into R from file and produce a plot with
plot(x0, x1), but as you would expect, it's not pretty to look at and
produces a postscript file of about 700MB.
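
With data of that size, a binned density plot or a random subsample is
usually more practical than a raw scatterplot (a generic sketch, not part
of the original reply; the toy data below just stand in for x0 and x1):

n  <- 1e6                          # stand-in for the real 30-million-point data
x0 <- rnorm(n); x1 <- x0 + rnorm(n)
cor(x0, x1)                        # the correlation itself is cheap to compute
smoothScatter(x0, x1)              # binned 2D density plot
idx <- sample(n, 1e4)              # or plot a random subsample
plot(x0[idx], x1[idx], pch = ".")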
Christophe Pallier
http://www.pallier.org
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] PROC MIXED vs. lme()

2003-12-10 Thread Christophe Pallier
>I'm trying to learn how to do a repeated measures ANOVA in R using
>lme().
>In SAS, I use the code:
>
>PROC MIXED DATA=[data set below];
>  CLASS sub group trial;
>  MODEL dv = group trial group*trial;
>  REPEATED trial / SUBJECT=sub TYPE=CS;
>run;

>
>In R, I'm trying the code:
>
>results.cs <- lme(DV ~ factor(GROUP)*factor(TRIAL), data=[data set below],
>random= ~factor(TRIAL)|SUB, correlation=corCompSymm() )
>anova(results.cs)
Try

anova(lme(DV ~ GROUP*TRIAL, random = ~1|SUB, correlation = corCompSymm()))

This yields the correct results (after converting the variables into
factors). Note that the 'trial' factor is fixed.

Actually you do not need lme for that. You could use the aov function:

summary(aov(DV~GROUP*TRIAL+Error(SUB/TRIAL)))

(it works well because the data is balanced)
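
Spelled out in full (a sketch; 'dat' stands for a hypothetical data frame
holding DV, GROUP, TRIAL and SUB):

library(nlme)
dat$SUB   <- factor(dat$SUB)
dat$GROUP <- factor(dat$GROUP)
dat$TRIAL <- factor(dat$TRIAL)

anova(lme(DV ~ GROUP * TRIAL, random = ~ 1 | SUB,
          correlation = corCompSymm(), data = dat))

summary(aov(DV ~ GROUP * TRIAL + Error(SUB/TRIAL), data = dat))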

Christophe Pallier
http://www.pallier.org
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] PROC MIXED vs. lme()

2003-12-10 Thread Christophe Pallier
Hi

>I'm trying to learn how to do a repeated measures ANOVA in R using lme().
>In SAS, I use the code:
>
>PROC MIXED DATA=[data set below];
>  CLASS sub group trial;
>  MODEL dv = group trial group*trial;
>  REPEATED trial / SUBJECT=sub TYPE=CS;
>
>In R, I'm trying the code:
>
>results.cs <- lme(DV ~ factor(GROUP)*factor(TRIAL), data=[data set below],
>random= ~factor(TRIAL)|SUB, correlation=corCompSymm() )
>anova(results.cs)
Try

anova(lme(DV ~ GROUP*TRIAL, random = ~1|SUB, correlation = corCompSymm()))

It yields the correct result (after converting the variables into factors).
Trial is a fixed factor, not a random one.

Actually, you do not need lme to run a repeated-measures anova.
You could use the aov function:
summary(aov(DV ~ GROUP*TRIAL + Error(SUB/TRIAL)))

This, again, yields the correct results.
Hope this helps,
Christophe Pallier
http://www.pallier.org
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] aggregate and names of factors

2003-12-08 Thread Christophe Pallier
Hello,

I use the function 'aggregate' a lot.

One small annoyance is that it is necessary to name the factors in the
'by' list to get their names in the resulting data.frame (otherwise they
appear as Group.1, Group.2, etc.). For example, I am forced to write:

aggregate(y, list(f1 = f1, f2 = f2), mean)

instead of

aggregate(y, list(f1, f2), mean)

(For two factors with short names it is not such a big deal, but I
usually have about 8 factors with long names...)

I wrote a modified 'aggregate.data.frame' function (see the code below)
that parses the names of the factors and uses them in the output
data.frame. I can now type aggregate(y, list(f1, f2), mean) and the
resulting data.frame has variables named 'f1' and 'f2'.

However, I have a few questions:

1. Is it a good idea at all? When expressions rather than variables are
  used as factors, this will probably result in a mess. Can one test
  whether an argument within a list is just a variable name or a more
  complex expression? Is there a better way?
2. I would also like to keep the name of the data when it is a
  vector, and not a data.frame. The current version transforms it into 'x'.
  I have not managed to modify this behavior, so I am forced to use
   aggregate(data.frame(y),list(f1,f2),mean)
3. I would love to have yet another version that handles formulas so
  that I could type:
  aggregate(y~f1*f2)

  I have a provisional version (see below), but it does not work very
  well.  I would be grateful for any suggestions. In particular, I
  would love to have a 'subset' parameter, as in the lm
  function.
Here is the small piece of code for the embryo of aggregate.formula:

my.aggregate.formula <- function(formula, FUN = mean)
{
   d <- model.frame(formula)                   # column 1 = response, then the factors
   factor.names <- lapply(names(d)[sapply(d, is.factor)], as.name)
   factor.list <- lapply(factor.names, eval, envir = d)  # look the factors up in the model frame
   names(factor.list) <- factor.names
   aggregate(d[1], factor.list, FUN)
}
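
(For question 3, a hedged alternative sketch, not code I have tested
much: build the model.frame() call the way lm() does, so that 'data' and
'subset' arguments come for free.)

my.aggregate.formula2 <- function(formula, data, subset, FUN = mean)
{
   m <- match.call(expand.dots = FALSE)
   m$FUN <- NULL
   m[[1]] <- as.name("model.frame")     # reuse model.frame's data/subset handling
   mf <- eval(m, parent.frame())
   aggregate(mf[1], mf[-1], FUN)        # column 1 = response, the rest = grouping
}
# e.g. my.aggregate.formula2(y ~ f1 + f2, data = d, subset = f1 != "c")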


Christophe Pallier
http://www.pallier.org
---

Here is the code for aggregate.data.frame that recovers the names of the
factors:

my.aggregate.data.frame <- function (x, by, FUN, ...)
{
    if (!is.data.frame(x)) {
        x <- as.data.frame(x)
    }
    if (!is.list(by))
        stop("`by' must be a list")
    if (is.null(names(by))) {
        #  names(by) <- paste("Group", seq(along = by), sep = ".")
        # modification: recover the names from the unevaluated 'by' argument
        names(by) <- lapply(substitute(by)[-1], deparse)
    }
    else {
        nam <- names(by)
        ind <- which(nchar(nam) == 0)
        if (any(ind)) {
            # same idea for the elements of 'by' that were left unnamed
            names(by)[ind] <- lapply(substitute(by)[c(-1, -(ind))], deparse)
        }
    }
    y <- lapply(x, tapply, by, FUN, ..., simplify = FALSE)
    if (any(sapply(unlist(y, recursive = FALSE), length) > 1))
        stop("`FUN' must always return a scalar")
    z <- y[[1]]
    d <- dim(z)
    w <- NULL
    for (i in seq(along = d)) {
        j <- rep(rep(seq(1:d[i]), prod(d[seq(length = i - 1)]) *
            rep(1, d[i])), prod(d[seq(from = i + 1, length = length(d) - i)]))
        w <- cbind(w, dimnames(z)[[i]][j])
    }
    w <- w[which(!unlist(lapply(z, is.null))), ]
    y <- data.frame(w, lapply(y, unlist, use.names = FALSE))
    names(y) <- c(names(by), names(x))
    y
}
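
A quick check that the name recovery works (toy data, not from a real
analysis):

f1 <- gl(2, 4, labels = c("a", "b"))
f2 <- gl(4, 2)
y  <- rnorm(8)
my.aggregate.data.frame(data.frame(y), list(f1, f2), mean)
# the grouping columns in the result are named 'f1' and 'f2'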
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help