Re: [R] subsetting a data set
Petr,

Thanks, I shall store all this away for reference and have a look at the posting guide. I didn't expect it to be as complicated as it has turned out.

As you will see from my post to Sean, my pressing problem was solved by replacing the "&" with a "|" in the sample code I gave.

Graham
Re: [R] subsetting a data set
Sean,

This seems to be getting there, except that I am going to need a data frame to hold "AllEcol" rather than a column, as GQ1 has 16 variables. So maybe this needs to be turned around into something like

    AllEcol<- GQ1[(GQ1$Status == "Expert) | (GQ1$Status == "Ecol"),]

Except this doesn't work, as I obviously haven't got the syntax right.

However, the direct answer to my question was contained in your answer. If you go back to my original post, I was principally asking why this (below) didn't work, and how I could get around it.

    summary (Max[Status=="Ecol"& Status=="Expert"])

By replacing the "&" with "|", as in your example, I can now combine both levels and produce a summary.

    summary (Max[Status=="Ecol" | Status=="Expert"])

The only comparable example I was able to find used the "&" symbol, which is why I tried it.

Many thanks,

Graham
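A minimal sketch of why "&" selects nothing here while "|" works, assuming a small made-up data frame in place of GQ1 (its values are invented for illustration). It also shows the row-subset form with a closing quote after Expert, which appears to be what the failing line is missing. The `%in%` form is a common shorthand and is not something used in the thread.

```r
# Toy stand-in for GQ1; the values are invented for illustration only.
GQ1 <- data.frame(
  Status = factor(c("Expert", "Ecol", "Stake", "Ecol", "Stake", "Expert")),
  Max    = c(10, 20, 30, 40, 50, 60)
)

# "&" asks for rows where Status is both values at once, which never happens.
both <- GQ1$Max[GQ1$Status == "Ecol" & GQ1$Status == "Expert"]
length(both)

# "|" asks for rows where Status is either value.
either <- GQ1$Max[GQ1$Status == "Ecol" | GQ1$Status == "Expert"]
summary(either)

# %in% is an equivalent way to match several levels.
same <- GQ1$Max[GQ1$Status %in% c("Ecol", "Expert")]

# Whole-row subset as a new data frame; note the closing quote after Expert.
AllEcol <- GQ1[(GQ1$Status == "Expert") | (GQ1$Status == "Ecol"), ]
```

The subset keeps every column of the toy data frame, so the same pattern should carry over to a data frame with more variables.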
Re: [R] subsetting a data set
Hi

On 8 Sep 2006 at 10:33, Graham Smith wrote:

Date sent: Fri, 8 Sep 2006 10:33:49 +0100
From: "Graham Smith" <[EMAIL PROTECTED]>
To: "Petr Pikal" <[EMAIL PROTECTED]>
Copies to: r-help@stat.math.ethz.ch
Subject: Re: [R] subsetting a data set

> Petr,
>
> Thanks again, but the data is GQ1, Max is a variable (column)
>
> So I have used
>
> by(GQ1[,"Max"], list(GQ1$Status), summary)
>
> Which is very good, and is better than the way I did it before by
> summarising for each status level individually, but that still isn't
> combining the data for Status == "Expert" and Status == "Ecol"
>
> So at the moment the status variable has 3 levels Expert, Ecol and
> Stake,

Look at ?factor for how to deal with factors. If your variable is not a factor (see ?str), then turn it into one.

    x<-sample(letters[1:3], 20, replace=TRUE) #character
    x.f<-as.factor(x) #turn to factor
    > x.f
     [1] b c b a c a c a a a a a b c c c b b c b
    Levels: a b c
    > levels(x.f)<-c("x","x","y") #rename levels
    > x.f
     [1] x y x x y x y x x x x x x y y y x x y x
    Levels: x y

> I want to analyse that at two levels: Expert and Ecol combined into a
> new level called "AllEcol" and the existing level "Stake"

So in your case something like

    GQ1$statusComb<-factor(GQ1$Status, labels=c("AllEcol","AllEcol","Stake"))

should do it. Beware of label ordering!!!

BTW, it would have been good if you had provided a usable example, as stated in the posting guide. Many times, while trying to work up an example, I solve the problem myself.

HTH
Petr

> It is this combining the levels that has got me stuck.
>
> Thanks again,
>
> Graham
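A small sketch of the level-merging idea and of the label-ordering caveat, assuming a toy factor in place of GQ1$Status (values invented for illustration). By default the levels of a factor are sorted, so positional labels must follow the order Ecol, Expert, Stake. The named-list form of `levels<-` is offered as an alternative that does not depend on that order; it is not something proposed in the thread.

```r
# Toy factor standing in for GQ1$Status; values invented for illustration.
Status <- factor(c("Expert", "Ecol", "Stake", "Ecol", "Stake", "Expert"))
levels(Status)  # sorted: "Ecol" "Expert" "Stake"

# Positional relabelling: the new names follow the sorted level order.
comb1 <- Status
levels(comb1) <- c("AllEcol", "AllEcol", "Stake")

# Named-list form: each new level lists the old levels it absorbs,
# so the result does not depend on the order of the existing levels.
comb2 <- Status
levels(comb2) <- list(AllEcol = c("Expert", "Ecol"), Stake = "Stake")

# Cross-tabulate old against new levels to check the mapping.
table(Status, comb2)
```

Cross-tabulating the old and new factors, as in the last line, is a quick way to confirm that no level was mapped to the wrong label.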
Re: [R] subsetting a data set
Hi Graham,

Try creating a new column with the two levels that you want...

something along the lines of (warning untested!!!)

    GQ1[(GQ1$Status == "Expert") | (GQ1$Status == "Ecol"),]$newColumn <- "AllEcol"
    GQ1[GQ1$Status == "Stake",]$newColumn <- "Stake"

and then do the

    by(GQ1[,"Max"], list(GQ1$newColumn), summary)

when in doubt... break the problem into smaller chunks... :-)

cheers,
Sean
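Since the suggestion in this message is marked untested, here is a hedged sketch of one way to build the combined column in a single step and then summarise by it. It assumes a toy data frame in place of GQ1 (values invented), and it swaps the row-subset assignment for `ifelse()` with `%in%`, which is a substitution rather than the method proposed in the thread.

```r
# Toy stand-in for GQ1; the values are invented for illustration only.
GQ1 <- data.frame(
  Status = factor(c("Expert", "Ecol", "Stake", "Ecol", "Stake", "Expert")),
  Max    = c(10, 20, 30, 40, 50, 60)
)

# Build the whole column in one step, then convert it to a factor.
GQ1$newColumn <- factor(ifelse(GQ1$Status %in% c("Expert", "Ecol"),
                               "AllEcol", "Stake"))

# Summaries of Max for each of the two combined groups.
res <- by(GQ1[, "Max"], list(GQ1$newColumn), summary)
res
```

Creating the column for all rows at once avoids having to assign into a subset of rows of a column that does not exist yet.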
Re: [R] subsetting a data set
Petr,

Thanks again, but the data is GQ1, Max is a variable (column)

So I have used

    by(GQ1[,"Max"], list(GQ1$Status), summary)

Which is very good, and is better than the way I did it before by summarising for each status level individually, but that still isn't combining the data for Status == "Expert" and Status == "Ecol"

So at the moment the Status variable has 3 levels: Expert, Ecol and Stake.

I want to analyse that at two levels: Expert and Ecol combined into a new level called "AllEcol" and the existing level "Stake"

It is this combining the levels that has got me stuck.

Thanks again,

Graham

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] subsetting a data set
Hi

If you use summary, aggregate probably will not work, and tapply has to be called differently:

    tapply(seq(along=Max[,1]), list(Max$Status), function(i, x) summary(x[i]), x=Max[,one.column])

or you can use by

    by(Max[,1:5], list(Max$Status), summary)

or, if you do not like the output, something like this:

    lll <- lapply(as.list(Max[,your.columns]), function(x) sapply(split(x,Max$Status),summary))
    do.call("rbind",lll)

or

    do.call("data.frame",lll)

HTH
Petr
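A runnable sketch of the split/sapply/lapply variant, assuming a toy data frame named GQ1 with Status as the grouping factor (values invented; the Min column is hypothetical and only there to give a second variable to summarise).

```r
# Toy stand-in for the data; values invented, Min is a hypothetical column.
GQ1 <- data.frame(
  Status = factor(c("Expert", "Ecol", "Stake", "Ecol", "Stake", "Expert")),
  Max    = c(10, 20, 30, 40, 50, 60),
  Min    = c(1, 2, 3, 4, 5, 6)
)

# For each numeric column, split by Status and summarise each piece.
# sapply() binds the per-level summaries into a matrix (statistics x levels).
lll <- lapply(as.list(GQ1[, c("Max", "Min")]),
              function(x) sapply(split(x, GQ1$Status), summary))

lll$Max                # one column per Status level
do.call("rbind", lll)  # stack the per-variable matrices on top of each other
```

The result is a plain numeric matrix per variable, which can be easier to reuse than the printed output of by().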
Re: [R] subsetting a data set
Sorry, I did not notice that in your case Max is not a function but your data. So probably

    by(Max[, your.columns], list(Max$Status), summary)

is what you want.

HTH
Petr
Re: [R] subsetting a data set
Petr,

Thanks, I shall have a look at these options.

Sorry about the confusion with "Max"; in my example, "Max" is the name of the variable that I am summarising. I chose a poor example to cut and paste from R, not thinking about the obvious confusion this would cause.

Thanks again

Graham
Re: [R] subsetting a data set
Hi

I am not sure if your Max is the same as max, so I am not sure exactly what you want from your data. However, you should consult ?tapply, ?by, ?aggregate and maybe also ?"[" together with chapter 2 of the intro manual in the docs directory.

    aggregate(data[, some.columns], list(data$factor1, data$factor2), max)

will give you the maximum for the specified columns, splitting the data according to both factors.

Also, combining summary with max is not common, and I wonder what your output is in this case. I believe that there are six identical numbers. However, R is case sensitive and maybe Max does something different from max. In my case it throws an error.

HTH
Petr

On 8 Sep 2006 at 8:06, Graham Smith wrote:

Date sent: Fri, 8 Sep 2006 08:06:16 +0100
From: "Graham Smith" <[EMAIL PROTECTED]>
To: r-help@stat.math.ethz.ch
Subject: [R] subsetting a data set

> I have a data set called GQ1, which has 20 variables, one of which is a
> factor called Status at three levels "Expert", "Ecol" and "Stake"
>
> I have managed to evaluate some of the data split by status using
> commands like:
>
> summary (Max[Status=="Ecol"])
>
> BUT how do I produce a summary for Ecol and Expert combined? The only
> example I can find suggests I could use
>
> summary (Max[Status=="Ecol"& Status=="Expert"])
>
> but that doesn't work.
>
> Additionally, in the same vein, I cannot work out how to create a
> new data set that would contain all the data for all the variables but
> only for the data where Status = Ecol, or where Status equals Ecol
> and Expert.
>
> I know this is yet again a very simple problem, but I really can't
> find the solution in the help or the books I have.
>
> Many thanks,
>
> Graham
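A minimal sketch of the aggregate() call described in this reply, assuming a toy data frame in place of GQ1 (values invented). The Site column is a hypothetical second grouping factor added only so that there are two factors to split by.

```r
# Toy stand-in for GQ1; values invented, Site is a hypothetical second factor.
GQ1 <- data.frame(
  Status = factor(c("Expert", "Ecol", "Stake", "Ecol", "Stake", "Expert")),
  Site   = factor(c("A", "A", "A", "B", "B", "B")),
  Max    = c(10, 20, 30, 40, 50, 60)
)

# One row per Status/Site combination present in the data,
# holding the largest value of Max within that combination.
agg <- aggregate(GQ1[, "Max", drop = FALSE],
                 list(Status = GQ1$Status, Site = GQ1$Site),
                 max)
agg
```

Naming the elements of the grouping list gives the output columns readable names instead of the default Group.1 and Group.2.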