[R] Beginner question: select cases
Hello all,

I hope I chose the right list, as my question is a beginner question.

I have a data set with three columns, London, Rome and Vienna. The location is indicated by a 1, like this:

    London  Rome  Vienna  q1
    0       0     1       4
    0       1     0       2
    1       0     0       3

I just want to calculate the means of the variable q1. I tried the following script:

    # calculate the mean of all locations
    results <- subset(results, subset == 1)
    mean(results$q1)

    # calculate the mean of London
    results <- subset(results, subset == 1, select = c(London))
    mean(results$q1)

    # calculate the mean of Rome
    results <- subset(results, subset == 1, select = c(Rome))
    mean(results$q1)

    # calculate the mean of Vienna
    results <- subset(results, subset == 1, select = c(Vienna))
    mean(results$q1)

As all results are 1.68 and there is definitely a difference between the three locations, I wonder what is going on. I get confused because Rcmdr asks me to overwrite things and there is no plain filter option.

Any help would be appreciated. Thank you in advance.

Regards
Peter

___CURE - Center for Usability Research Engineering___
Peter Wolkerstorfer
Usability Engineer
Hauffgasse 3-5, 1110 Wien, Austria
[Tel] +43.1.743 54 51.46
[Fax] +43.1.743 54 51.30
[Mail] [EMAIL PROTECTED]
[Web] http://www.cure.at

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Beginner question: select cases
Your problem would be a lot easier if you coded the location in one variable instead of three. Then you could calculate the means with one line of code:

    by(results$q1, results$location, mean)

With your dataset as it is, you could use

    by(results$q1, results$London, mean)
    by(results$q1, results$Rome, mean)
    by(results$q1, results$Vienna, mean)

Each call reports the mean of q1 separately for the rows where the indicator is 0 and the rows where it is 1. See ?by for more information.

And take a good look at your code. You take a subset of results and then assign it to results. This means that you replace the original results data frame with a subset of it. When you take the subset for the next city, you are not subsetting the original dataset but the previous subset!

Cheers,

Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
[EMAIL PROTECTED]
www.inbo.be

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Peter Wolkerstorfer - CURE
Sent: Monday, 25 September 2006 13:51
To: r-help@stat.math.ethz.ch
Subject: [R] Beginner question: select cases
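As a rough illustration of the advice to code the location in a single variable, the sketch below collapses the three indicator columns into one factor and then calls by(). The data values and the names `cities` and `means_by_city` are made up for this example, and it assumes every row has exactly one indicator set to 1.

```r
# Hypothetical data in the layout from the question: one 0/1 column per city.
results <- data.frame(
  London = c(0, 0, 1, 1, 0, 0),
  Rome   = c(0, 1, 0, 0, 1, 0),
  Vienna = c(1, 0, 0, 0, 0, 1),
  q1     = c(4, 2, 3, 5, 4, 2)
)

cities <- c("London", "Rome", "Vienna")

# For each row, max.col() gives the position of the column holding the 1;
# that position is used to look up the city name.
# Assumption: exactly one indicator per row is 1.
which_city <- max.col(as.matrix(results[cities]), ties.method = "first")
results$location <- factor(cities[which_city], levels = cities)

# One call now gives the mean of q1 for every location.
means_by_city <- by(results$q1, results$location, mean)
print(means_by_city)
```

Because the new column is added to results rather than replacing it, the original indicator columns stay available for checking.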
Re: [R] Beginner question: select cases
Peter,

There is a much easier way to do this. First, you should consider organizing your data as follows:

    set.seed(1)  # for replication only

    # Here is a sample data frame
    tmp <- data.frame(city = gl(3, 10, labels = c("London", "Rome", "Vienna")),
                      q1 = rnorm(30))

    # Compute the means
    with(tmp, tapply(q1, city, mean))

        London       Rome     Vienna
     0.1322028  0.2488450 -0.1336732

I hope this helps
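If the group means are needed as a data frame rather than a named vector, aggregate() with a formula is one alternative to tapply(). This is only a sketch using the same simulated data as the message above; the name `city_means` is introduced here for illustration.

```r
set.seed(1)  # same seed as in the message above, for replication only

# Long-format sample data: one row per observation, city stored as a factor.
tmp <- data.frame(city = gl(3, 10, labels = c("London", "Rome", "Vienna")),
                  q1 = rnorm(30))

# aggregate() returns a data frame with one row per city.
city_means <- aggregate(q1 ~ city, data = tmp, FUN = mean)
print(city_means)

# Number of observations behind each mean.
print(table(tmp$city))
```

The values in the q1 column should match what tapply() reports for the same data.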
Re: [R] Beginner question: select cases
I'm new at R also. However, I don't recognize your syntax. I have not seen select used here. Try

    results <- subset(results, London == 1)
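Combining this filter with the earlier point in the thread about not overwriting results, a minimal sketch might store each subset under its own name. The data values and the names `london`, `rome`, `vienna` and `city_means` are hypothetical.

```r
# Hypothetical data in the layout from the question.
results <- data.frame(
  London = c(0, 0, 1, 1, 0, 0),
  Rome   = c(0, 1, 0, 0, 1, 0),
  Vienna = c(1, 0, 0, 0, 0, 1),
  q1     = c(4, 2, 3, 5, 4, 2)
)

# Give each subset its own name so the full data frame stays intact.
london <- subset(results, London == 1)
rome   <- subset(results, Rome == 1)
vienna <- subset(results, Vienna == 1)

city_means <- c(London = mean(london$q1),
                Rome   = mean(rome$q1),
                Vienna = mean(vienna$q1))
print(city_means)

# The same filter without an intermediate data frame: logical indexing.
london_mean <- mean(results$q1[results$London == 1])
```

Since results is never reassigned, every filter starts from all rows, which avoids getting the same mean for each city.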