[R] stacked and dodged bar graph ggplot
I have some census data with race and ethnicity for various towns. I am trying to make a stacked bar graph where all the race data is in one stacked bar and all the ethnicity data is in another. Below is a minimal reproducible sample.

    library("ggplot2")
    Demog <- data.frame(source = c(rep("Davis", 4), rep("Dixon", 4), rep("Winters", 4)),
                        group = c("Asian / Pacific Islander", "Caucasian", "Latinx", "Not Latinx",
                                  "African American", "Native American", "Latinx", "Not Latinx",
                                  "Mixed race", "Other", "Latinx", "Not Latinx"),
                        number = c(14491, 42571, 8172, 57450, 562, 184, 7426, 10952, 332, 1488, 3469, 3155),
                        field = rep(c(rep("race", 2), rep("ethnicity", 2)), 3))
    # factor levels must match the spelling used in 'group', otherwise those rows become NA
    Demog$race <- factor(Demog$group, levels = c("Asian / Pacific Islander", "Caucasian", "African American",
                                                 "Native American", "Mixed race", "Other"))
    Demog$ethn <- factor(Demog$group, levels = c("Latinx", "Not Latinx"))
    Demog$location <- factor(Demog$source, levels = c("Dixon", "Winters", "Davis"))
    Demog.bar1 <- ggplot(data = Demog, aes(x = location, y = number, fill = race)) + theme_bw() +
      geom_bar(stat = "identity", position = "stack") + coord_flip()
    Demog.bar2 <- ggplot(data = Demog, aes(x = location, y = number, fill = ethn)) + theme_bw() +
      geom_bar(stat = "identity", position = "stack") + coord_flip()
    print(Demog.bar1)
    print(Demog.bar2)

Much thanks,
Robert

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
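One way to get the "stacked and dodged" look, sketched under the assumption that a faceted layout is acceptable: as far as I know ggplot2 has no single position adjustment that both stacks and dodges, so this sketch puts `field` (race vs. ethnicity) on the x axis, stacks `group` inside each bar, and gives each town its own panel with facet_grid(). The data are the example data above with "Lantinx" read as "Latinx"; the facet approach is my choice, not something from the post.

```r
library(ggplot2)

# Example data from the post ("Lantinx" read as "Latinx")
Demog <- data.frame(
  source = rep(c("Davis", "Dixon", "Winters"), each = 4),
  group  = c("Asian / Pacific Islander", "Caucasian", "Latinx", "Not Latinx",
             "African American", "Native American", "Latinx", "Not Latinx",
             "Mixed race", "Other", "Latinx", "Not Latinx"),
  number = c(14491, 42571, 8172, 57450, 562, 184, 7426, 10952, 332, 1488, 3469, 3155),
  field  = rep(c("race", "race", "ethnicity", "ethnicity"), 3)
)
Demog$location <- factor(Demog$source, levels = c("Dixon", "Winters", "Davis"))
Demog$field <- factor(Demog$field, levels = c("race", "ethnicity"))

# One stacked bar per field, one panel per town: the two bars side by side
# inside each panel play the role of the "dodge"
p <- ggplot(Demog, aes(x = field, y = number, fill = group)) +
  geom_col(position = "stack") +
  facet_grid(location ~ .) +
  coord_flip() +
  theme_bw()
print(p)
```

An alternative with the same idea is to map x to interaction(field, location) instead of faceting, at the cost of less readable axis labels.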
[R] anova.lme
I would like to know the sum of squares for each term in my model. I used the following call to fit the model:

    fit.courseCross <- lme(fixed = zGrade ~ Rep + ISE + P7APrior + Female + White + HSGPA + MATH + Years +
                             Course + Course*P7APrior,
                           random = ~ 1 | SID,
                           data = Master.complete[Master.complete$Course != "P7A", ])

and called anova() on it and got:

    anova(fit.courseCross)
                    numDF denDF   F-value p-value
    (Intercept)         1 58161 1559.6968  <.0001
    Rep                 1 58161  520.7263  <.0001
    ISE                 1  6266   21.3713  <.0001
    P7APrior            2 58161  358.4827  <.0001
    Female              1  6266   89.2614  <.0001
    White               1  6266  235.9984  <.0001
    HSGPA               1  6266 1156.4116  <.0001
    MATH                1  6266 1036.1354  <.0001
    Years               1 58161  407.6096  <.0001
    Course             12 58161   68.9875  <.0001
    P7APrior:Course    24 58161   10.2464  <.0001

The documentation for anova.lme says: "When only one fitted model object is present, a data frame with the sums of squares, numerator degrees of freedom, denominator degrees of freedom, F-values, and P-values for Wald tests for the terms in the model (when Terms and L are NULL), a combination of model terms (when Terms in not NULL), or linear combinations of the model coefficients (when L is not NULL)."

Noticeably absent are the sums of squares. How do I get them?

Robert
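My understanding is that anova.lme does not partition sums of squares at all: its F-values are Wald statistics built from the fixed-effect estimates and their estimated covariance matrix, which would explain why no sums-of-squares column appears. A sketch using the Orthodont data that ships with nlme (my choice of example, not the data in the post) checks that, for a one-degree-of-freedom term with type = "marginal", the printed F-value equals beta^2 / Var(beta):

```r
library(nlme)

# Random-intercept model on the Orthodont data shipped with nlme
fm <- lme(distance ~ age + Sex, random = ~ 1 | Subject, data = Orthodont)

# Marginal Wald F-tests as printed by anova.lme
a <- anova(fm, type = "marginal")

# For a 1-df term the Wald F is the squared t statistic:
# beta^2 / Var(beta), taken from the fixed-effects covariance matrix
b <- fixef(fm)
V <- vcov(fm)
F_age <- unname(b["age"]^2 / V["age", "age"])

print(a)
print(F_age)
```

Terms with more than one degree of freedom (such as Course here) use the multivariate version of the same quadratic form, so there is still no sum of squares to report.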
[R] CI for nlme predictions
I am running a mixed effects model with random intercepts:

    fit.courseCross <- lme(fixed = zGrade ~ Rep + ISE + P7APrior + Female + White + HSGPA + MATH + Years +
                             Course + Course*P7APrior,
                           random = ~ 1 | SID,
                           data = Master.complete[Master.complete$Course != "P7A", ])

where all variables are factors except for HSGPA, MATH and Years. I noticed that predict.lm() has an option for standard errors, but nlme's predict method does not. I understand that this might be because there is a difference between SEs that are or are not conditioned on the random effects. I have looked at this Stack Overflow question, http://stackoverflow.com/questions/14358811/extract-prediction-band-from-lme-fit, but do not understand what is being done. I would like to show the predicted fit of zGrade vs. Years with a confidence interval, a la ggplot's geom_smooth. The particular intercept does not matter (I don't care what the intercept is, though given a choice I'd prefer grand-mean centered). I would be happy with either conditional or unconditional CIs.

Robert
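A common recipe (the GLMM FAQ describes one like it, for example) is to predict at the population level (level = 0) from the fixed effects and take the standard error from X V X', where X is the fixed-effects design matrix for the new data and V = vcov(fit). This conditions on the estimated variance components, so the band is approximate. The sketch uses nlme's Orthodont data and a one-covariate model (both my assumptions); for a model with factors, X has to be built from the same fixed formula and factor levels as the fit.

```r
library(nlme)

fm <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

# Grid of ages for the population-level (level = 0) curve
newdat <- data.frame(age = seq(8, 14, by = 0.5))

# Fixed-effects design matrix for the new data, same right-hand side as the fit
X <- model.matrix(~ age, data = newdat)

newdat$fit <- drop(X %*% fixef(fm))
se <- sqrt(diag(X %*% vcov(fm) %*% t(X)))

# Approximate 95% interval; ignores uncertainty in the variance components
z <- qnorm(0.975)
newdat$lwr <- newdat$fit - z * se
newdat$upr <- newdat$fit + z * se
head(newdat)
```

With ggplot2 the band could then be drawn with geom_ribbon(aes(ymin = lwr, ymax = upr)) under a geom_line() of fit.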
Re: [R] help with the nested anova formulas
I am modeling grade as a function of membership in various cohorts. There are four cohorts (NONE, ISE07, ISE08, ISE09) and two types of cohorts, coded as ISE = TRUE (ISE0#) or FALSE (NONE). There is clear collinearity, but that is to be expected. Running the following code

    CutOff <- 0
    fit.base <- lme(fixed = zGrade ~ Rep + COHORT/ISE + P7APrior + Female + White + HSGPA + MATH +
                      AP_TOTAL + Years + EOP + Course,
                    random = ~ 1 | SID,
                    data = share[share$GRADE >= CutOff, ])

I get the following error:

    Error in MEEM(object, conLin, control$niterEM) :
      Singularity in backsolve at level 0, block 1

but if I take out the /ISE I get no error; similarly if I take out the COHORT/. I want to test for the effects of the different cohorts within the ISE subset, and between ISE and NONE.

I can send the data (the whole is too large) if you wish.

Robert
[R] help with the nested anova formulas
I am modeling grade as a function of membership in various cohorts. There are four cohorts (NONE, ISE07, ISE08, ISE09) and two types of cohorts, coded as ISE = TRUE (ISE0#) or FALSE (NONE). There is clear collinearity, but that is to be expected. Running the following code

    CutOff <- 0
    fit.base <- lme(fixed = zGrade ~ Rep + COHORT/ISE + P7APrior + Female + White + HSGPA + MATH +
                      AP_TOTAL + Years + EOP + Course,
                    random = ~ 1 | SID,
                    data = share[share$GRADE >= CutOff, ])

I get the following error:

    Error in MEEM(object, conLin, control$niterEM) :
      Singularity in backsolve at level 0, block 1

but if I take out the /ISE I get no error; similarly if I take out the COHORT/. I want to test for the effects of the different cohorts within the ISE subset, and between ISE and NONE.

Robert
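A likely reason for the singularity, sketched on a toy design (my construction, not the real data): COHORT/ISE expands to COHORT + COHORT:ISE, and since ISE is completely determined by COHORT, the extra COHORT:ISE columns are linear combinations of the COHORT columns. The model matrix then has more columns than its rank, which is what the backsolve error complains about. COHORT on its own already carries the ISE vs. NONE information, so those comparisons can be expressed through contrasts on COHORT rather than through a separate ISE term.

```r
# Four cohorts; ISE is TRUE for every cohort except NONE
d <- data.frame(COHORT = factor(rep(c("NONE", "ISE07", "ISE08", "ISE09"), each = 3),
                                levels = c("NONE", "ISE07", "ISE08", "ISE09")))
d$ISE <- d$COHORT != "NONE"

X_nested <- model.matrix(~ COHORT/ISE, data = d)
X_cohort <- model.matrix(~ COHORT, data = d)

# The nested design has extra columns but no extra rank
c(columns = ncol(X_nested), rank = qr(X_nested)$rank)
c(columns = ncol(X_cohort), rank = qr(X_cohort)$rank)
```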
[R] getting p-value for comparing two GAMs from mgcv
I am trying to compare two different GAM fits. I have something like

    Course.bam20 <- bam(zGrade ~ Rep + ISE + White + Female + Years + AP_TOTAL + MATH + HSGPA + EOP +
                          factor(P7APrior, ordered = FALSE) + s(Yfrm7A, k = 20),
                        data = Course, na.action = na.exclude, samfrac = 0.1)
    Course.bam4 <- bam(zGrade ~ Rep + ISE + White + Female + Years + AP_TOTAL + MATH + HSGPA + EOP +
                         factor(P7APrior, ordered = FALSE) + s(Yfrm7A, k = 4),
                       data = Course, na.action = na.exclude, samfrac = 0.1)
    anova(Course.bam20, Course.bam4)

    Model 1: zGrade ~ Rep + ISE + White + Female + Years + AP_TOTAL + MATH + HSGPA +
        EOP + factor(P7APrior, ordered = FALSE) + s(Yfrm7A, k = 20)
    Model 2: zGrade ~ Rep + ISE + White + Female + Years + AP_TOTAL + MATH + HSGPA +
        EOP + factor(P7APrior, ordered = FALSE) + s(Yfrm7A, k = 4)
      Resid. Df Resid. Dev      Df Deviance
    1    4721.7     1907.0
    2    4724.5     1913.5 -2.7919  -6.4986

How can I get a p-value out of the anova?
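As far as I know, anova() on several gam/bam fits only adds a p-value when a test is requested, e.g. test = "F" (or "Chisq"); without it the table stops at the deviance columns. A sketch on simulated data (the data and the choice of method = "ML" are my assumptions; I believe the mgcv documentation treats such comparisons as approximate, and two smooths with different k are not strictly nested):

```r
library(mgcv)

set.seed(1)
n <- 400
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, y = y)

# Smaller basis first, larger basis second, both fitted by ML
m4  <- gam(y ~ s(x, k = 4),  data = dat, method = "ML")
m20 <- gam(y ~ s(x, k = 20), data = dat, method = "ML")

# Approximate F test on the change in deviance
tab <- anova(m4, m20, test = "F")
print(tab)
```

Listing the smaller model first keeps the Df and Deviance columns positive, which makes the table easier to read.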
[R] Centering multi-level unordered factors
I have a question I am not even sure quite how to ask. When R fits models with unordered categorical variables as predictors (RHS of the model), it automatically converts them into one fewer dichotomous variable than there are levels. For example, if I had levels(trait) = c("A", "B", "C"), it would automatically recode to

      NewVar1 NewVar2
    A       0       0
    B       1       0
    C       0       1

What I would like to know is whether there is a way to center these categorical variables, and if so, how. For continuous variables it is simple:

    x <- x - mean(x)

For a single dichotomous variable it is not so hard:

    gender <- gender - sum(gender)/length(gender)

where gender is coded (0, 1) or (-.5, .5), for example. That would give gender coefficients in a model that still reflect the difference between the two genders, but the intercept and the other coefficients would be for someone of average gender. It is that last part that I am unclear on for a multi-level (3 or more) factor: how do you set up the variables so that the *other* coefficients reflect the average across the factor levels? Do I need two or three centered variables? And is there a quick way to get at all those variables if my factor has many levels, e.g. 14?

Robert
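Two options, sketched on simulated data (the data are made up). Sum-to-zero coding (contr.sum) makes the intercept, and the other coefficients, refer to the unweighted average over the factor levels; centring the k - 1 dummy columns, the direct analogue of the gender centring above, makes them refer to the sample-weighted average instead. Either way a k-level factor needs k - 1 columns (two for a three-level factor), and model.matrix() builds them for any number of levels, e.g. 13 for 14 levels.

```r
set.seed(42)
trait <- factor(sample(c("A", "B", "C"), 60, replace = TRUE))
y <- c(A = 1, B = 2, C = 4)[as.character(trait)] + rnorm(60, sd = 0.1)

# Option 1: sum-to-zero ("deviation") coding.
# With only the factor in the model, the intercept is the unweighted mean of the level means.
fit_sum <- lm(y ~ trait, contrasts = list(trait = "contr.sum"))
level_means <- tapply(y, trait, mean)
c(intercept = unname(coef(fit_sum)[1]), mean_of_level_means = mean(level_means))

# Option 2: centre the k - 1 treatment dummies (here 2 columns for 3 levels).
# The intercept becomes the overall (sample-weighted) mean of y, while the
# dummy coefficients keep their meaning as differences from level A.
D <- model.matrix(~ trait)[, -1]
Dc <- scale(D, center = TRUE, scale = FALSE)
fit_c <- lm(y ~ Dc)
c(intercept = unname(coef(fit_c)[1]), grand_mean = mean(y))
```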
[R] trouble with nlme: Error in MEEM() : Singularity in backsolve at level 0, block 1
I am trying to fit my data (attached) with the following model:

    CutOff <- 0
    fit.full <- lme(fixed = zGrade ~ Rep + ISE + Yfrm7A + Ufrm7A + Female + White + HSGPA + MATH + AP_TOTAL +
                      Years + Course + Course*Rep + Course*ISE + Course*Yfrm7A + Course*Ufrm7A +
                      Course*Female + Course*White + Course*HSGPA + Course*MATH + Course*AP_TOTAL +
                      Course*Years,
                    random = ~ 1 | SID,
                    data = Master.complete[Master.complete$GRADE >= CutOff, ])

I get the following error:

    Error in MEEM(object, conLin, control$niterEM) :
      Singularity in backsolve at level 0, block 1

When I take out Course*Yfrm7A + Course*Ufrm7A and just use

    fit.full <- lme(fixed = zGrade ~ Rep + ISE + Yfrm7A + Ufrm7A + Female + White + HSGPA + MATH + AP_TOTAL +
                      Years + Course + Course*Rep + Course*ISE + Course*Female + Course*White +
                      Course*HSGPA + Course*MATH + Course*AP_TOTAL + Course*Years,
                    random = ~ 1 | SID,
                    data = Master.complete[Master.complete$GRADE >= CutOff, ])

I don't get an error. I think this is because Yfrm7A == 0 and Ufrm7A == 0 whenever Course == "P7A". I am not sure how to work around this; any suggestions would be welcome.
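The suspicion in the post can be checked on a small simulated design (made-up data): if Yfrm7A is zero for every P7A row, the CourseP7A:Yfrm7A column of the model matrix is all zeros, so the fixed-effects design is rank-deficient. lm() copes by reporting NA for that coefficient, while lme() stops with the backsolve error. One hedged workaround is to keep only a full-rank set of columns, found with a pivoted QR decomposition.

```r
set.seed(3)
# Toy data: three courses, Yfrm7A identically zero whenever Course == "P7A"
d <- data.frame(Course = factor(rep(c("B2A", "C2A", "P7A"), each = 5)))
d$Yfrm7A <- ifelse(d$Course == "P7A", 0, round(rnorm(nrow(d)), 2))
d$y <- rnorm(nrow(d))

X <- model.matrix(~ Course * Yfrm7A, data = d)
q <- qr(X)
c(columns = ncol(X), rank = q$rank)          # the rank is one short

# Columns that the pivoted QR pushes past the rank are the aliased ones
aliased <- colnames(X)[q$pivot[-seq_len(q$rank)]]
aliased

# lm() flags the same column with an NA coefficient instead of stopping
b <- coef(lm(y ~ Course * Yfrm7A, data = d))
names(b)[is.na(b)]

# A full-rank subset of the design columns that could be used instead
X_ok <- X[, q$pivot[seq_len(q$rank)], drop = FALSE]
```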
[R] re-coding variables
I am running a large mixed model: 65k entries, 11 fixed effects and one random effect. One of the fixed effects is Course, a factor that takes on 14 different values:

    levels(Master.complete$Course)
     [1] "B101"  "B2A"   "B2B"   "B2C"   "C118A" "C118B" "C118C" "C2A"   "C2B"
    [10] "C2C"   "N101"  "P7A"   "P7B"   "P7C"

and another is Yfrm7A, which is continuous:

    summary(Master.complete$Yfrm7A)
        Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
    -5.25000 -0.75000  0.00000 -0.07688  0.50000  6.00000

but all the values are 0 (zero) when Course == "P7A":

    summary(Master.complete$Yfrm7A[Master.complete$Course == "P7A"])
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
          0       0       0       0       0       0

Thus when I run the following mixed model I get no errors:

    fit.full <- lme(fixed = zGrade ~ Rep + ISE + Yfrm7A + Ufrm7A + Female + White + HSGPA + MATH + AP_TOTAL +
                      Years + Course + Course*Rep + Course*Female + Course*White,
                    random = ~ 1 | SID, data = Master.complete, na.action = na.exclude)

but if I add a Course*Yfrm7A term

    fit.full <- lme(fixed = zGrade ~ Rep + ISE + Yfrm7A + Ufrm7A + Female + White + HSGPA + MATH + AP_TOTAL +
                      Years + Course + Course*Rep + Course*Female + Course*White + Course*Yfrm7A,
                    random = ~ 1 | SID, data = Master.complete, na.action = na.exclude)

I get

    Error in MEEM(object, conLin, control$niterEM) :
      Singularity in backsolve at level 0, block 1

I suspect I could solve this problem by ordering the levels of Course so that P7A is the first level, and thus the one the others are compared to, but I am unclear on how to do so.

Robert
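relevel() is the usual way to move P7A to the front. In this simulated example (made-up data), though, the design stays rank-deficient after releveling: with P7A as the reference, the Yfrm7A main-effect column equals the sum of the other courses' interaction columns, because Yfrm7A is zero for every P7A row. So I suspect reordering alone will not make the lme() error go away.

```r
set.seed(3)
d <- data.frame(Course = factor(rep(c("B2A", "C2A", "P7A"), each = 5)))
d$Yfrm7A <- ifelse(d$Course == "P7A", 0, round(rnorm(nrow(d)), 2))

# Make P7A the reference (first) level; the other levels keep their order
d$Course <- relevel(d$Course, ref = "P7A")
levels(d$Course)                          # "P7A" "B2A" "C2A"

# Equivalent, spelling out the whole order:
# factor(d$Course, levels = c("P7A", setdiff(levels(d$Course), "P7A")))

X <- model.matrix(~ Course * Yfrm7A, data = d)
c(columns = ncol(X), rank = qr(X)$rank)   # still one short
```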
[R] ggplot legend formatting
I am having trouble getting my legend to format correctly in ggplot2. A full description and pictures are in the ggplot2 Google group (https://groups.google.com/forum/?hl=en#!topic/ggplot2/LSarpgmSG8k). The short description is that in

    guides(fill = guide_legend(nrow = 3), bycol = TRUE)

changing the call to have byrow = TRUE does not change the plot.

Thanks,
Robert
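As far as I can tell from the guide_legend() documentation, byrow is an argument of guide_legend() itself (and there is no bycol argument), so in guides(fill = guide_legend(nrow = 3), bycol = TRUE) the extra argument goes to guides() rather than to the legend and has no effect on it. A minimal sketch with made-up data; the level order ISE07, CMP07, ISE08, ... is chosen so that filling by row pairs each ISE cohort with its comparison group:

```r
library(ggplot2)

set.seed(5)
# Toy data: six cohort groups, ordered so that row-wise filling puts
# ISE07 and CMP07 on the first legend row, 08 on the second, 09 on the third
grp <- c("ISE07", "CMP07", "ISE08", "CMP08", "ISE09", "CMP09")
toy <- data.frame(Course = rep(c("C2A", "C2B"), each = 6 * 5),
                  COHORT = factor(rep(rep(grp, each = 5), 2), levels = grp),
                  GRADE = runif(60, 0, 4))

p <- ggplot(toy, aes(Course, GRADE)) +
  geom_boxplot(aes(fill = COHORT)) +
  # byrow goes inside guide_legend(), next to nrow
  guides(fill = guide_legend(nrow = 3, byrow = TRUE))
print(p)
```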
Re: [R] ggplot interactions
On Tue, Sep 10, 2013 at 11:33 PM, Robert Lynch robert.b.ly...@gmail.com wrote:

I am sorry to ask what I am sure is a simple question, but I am stuck trying to figure out how different parts of ggplot2 calls interact. I am plotting using the following code:

    ggplot(Chem.comp, aes(Course, GRADE)) +
      geom_boxplot(notch = TRUE, aes(fill = COHORT)) +
      labs(y = "Grade Points in class", x = "Chemistry 2 quarter") +
      ggtitle(expression(atop("Comparison between ISE cohorts and Peers", atop(italic("in Chem 2 classes"),
      ylim(0, 4.) +
      scale_fill_manual(name = "ISE Cohorts \nComparison groups",
                        values = c("blue", "red", "blue3", "red3", "blue4", "red4")) +
      theme(plot.title = element_text(size = 25, face = "bold", colour = "black", vjust = -1)) +
      guides(fill = guide_legend(nrow = 3), byrow = TRUE)

which gives Rplot.jpeg, which has the appropriate title, but the colors and the legend are wrong. If I comment out the ggtitle() and theme(), or just ggtitle(), I get Rplot01.jpeg, which has the right colors but no title and subtitle. Also the legend is out of order: the first row should read ISE07 CMP07, with 08 on the second row and 09 on the third, with a red column and a blue column. Changing byrow = TRUE to bycol = TRUE does not change the plotting of the legend, nor does byrow = FALSE.

I am asking for help with getting the title and subtitle to both show up at the same time as the appropriate colors for the different factor levels, and with getting the legend to render so that it looks sort of like

    ISE07 [red box ] [blue box ] CMP07
    ISE08 [red3 box] [blue3 box] CMP08
    ISE09 [red4 box] [blue4 box] CMP09

The exact colors are not important, but the vertical and horizontal alignment is.

Thanks!
Robert
Re: [R] ggplot interactions
I am sorry to ask what I am sure is a simple question, but I am stuck trying to figure out how different parts of ggplot2 calls interact. I am plotting using the following code:

    ggplot(Chem.comp, aes(Course, GRADE)) +
      geom_boxplot(notch = TRUE, aes(fill = COHORT)) +
      labs(y = "Grade Points in class", x = "Chemistry 2 quarter") +
      ggtitle(expression(atop("Comparison between ISE cohorts and Peers", atop(italic("in Chem 2 classes"),
      ylim(0, 4.) +
      scale_fill_manual(name = "ISE Cohorts \nComparison groups",
                        values = c("blue", "red", "blue3", "red3", "blue4", "red4")) +
      theme(plot.title = element_text(size = 25, face = "bold", colour = "black", vjust = -1)) +
      guides(fill = guide_legend(nrow = 3), byrow = TRUE)

which gives me a plot [available as a JPEG, not attached due to size limits] which has the appropriate title, but the colors and the legend are wrong: the colors cycle through R's standard colors, and the legend is one column. If I comment out ggtitle() and theme(), or just ggtitle(), I get a plot [available, but not attached due to size limits] which has the right colors and a mostly right legend, but no title and subtitle. The legend is out of order: the first row should read ISE07 CMP07, with 08 on the second row and 09 on the third, with a red column and a blue column. Changing byrow = TRUE to bycol = TRUE does not change the plotting of the legend, nor does byrow = FALSE.

I am asking for help with getting the title and subtitle to both show up at the same time as the appropriate colors for the different factor levels, and with getting the legend to render so that it looks sort of like

    ISE07 [red box ] [blue box ] CMP07
    ISE08 [red3 box] [blue3 box] CMP08
    ISE09 [red4 box] [blue4 box] CMP09

The exact colors are not important, but the vertical and horizontal alignment is.

Thanks!
Robert
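One thing that may explain the colour and legend problem (my reading, not confirmed): as the code appears in this post, the brackets opened by ggtitle(expression(atop(... are still open when ylim() starts, so ylim(), scale_fill_manual(), theme() and guides() would end up inside the plot-math expression rather than being added to the plot. The sketch below closes the title call first, uses the common atop() idiom for a two-line title, names the fill values by level, and puts byrow inside guide_legend(). The data are made up. In ggplot2 2.2.0 and later, ggtitle() also has a subtitle argument that avoids atop() entirely.

```r
library(ggplot2)

set.seed(2)
grp <- c("ISE07", "CMP07", "ISE08", "CMP08", "ISE09", "CMP09")
Chem.comp <- data.frame(Course = rep(c("C2A", "C2B", "C2C"), each = 60),
                        COHORT = factor(rep(grp, times = 30), levels = grp),
                        GRADE = round(runif(180, 0, 4), 1))

p <- ggplot(Chem.comp, aes(Course, GRADE)) +
  geom_boxplot(notch = TRUE, aes(fill = COHORT)) +
  labs(y = "Grade Points in class", x = "Chemistry 2 quarter") +
  # atop() stacks the title over an italic subtitle; every bracket is closed here
  ggtitle(expression(atop("Comparison between ISE cohorts and Peers",
                          atop(italic("in Chem 2 classes"), "")))) +
  ylim(0, 4) +
  # naming the values ties each colour to a level regardless of level order
  scale_fill_manual(name = "ISE Cohorts \nComparison groups",
                    values = c(ISE07 = "blue", CMP07 = "red", ISE08 = "blue3",
                               CMP08 = "red3", ISE09 = "blue4", CMP09 = "red4")) +
  theme(plot.title = element_text(size = 25, face = "bold", colour = "black")) +
  guides(fill = guide_legend(nrow = 3, byrow = TRUE))
print(p)
```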
[R] finding both rows that are duplicated in a data frame
I have a data frame that looks like

    id1 <- c(1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7, 8, 9, 9, 10)
    id2 <- c(22, 22, 34, 34, 15, 15, 76, 45, 45, 84, 84, 37, 52, 66, 66, 91)
    GENDER <- sample(c("G-UNK", "G-M", "G-F"), 16, replace = TRUE)
    ETH <- sample(c("E-AF", "E-UNK", "E-VT"), 16, replace = TRUE)
    example <- data.frame(id1, id2, GENDER, ETH)

where there are two IDs and some duplicate entries for IDs that have a different GENDER or ETH (ethnicity). I would like to get a data frame that doesn't have the duplicates, where the row that is kept has whichever GENDER is not G-UNK (unknown) and whichever ETH is not E-UNK. The resulting data frame should have 10 rows with no *-UNK in either of the last two columns (unless both entries were UNK).

Yes, the example data may have some impossible results, but it does capture important aspects:

1) G-UNK is alphabetically last of G-F, G-M, G-UNK.
2) E-UNK is in the middle alphabetically.
3) Sometimes the first entry is the unknown gender, sometimes it is the second (likely to happen with random sampling).
4) Sometimes both entries for one variable, GENDER or ETH, are unknown.
5) There only appear to be two of each row (not 100% sure).

Thanks!
Robert
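A sketch in base R, assuming that within each id1 any non-unknown value should win and that, if two different known values occur (e.g. G-M and G-F), simply taking the first is acceptable. The helper pick_known() is hypothetical, not a base R function, and the code relies neither on alphabetical order nor on there being exactly two rows per id.

```r
set.seed(1)
id1 <- c(1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7, 8, 9, 9, 10)
id2 <- c(22, 22, 34, 34, 15, 15, 76, 45, 45, 84, 84, 37, 52, 66, 66, 91)
GENDER <- sample(c("G-UNK", "G-M", "G-F"), 16, replace = TRUE)
ETH <- sample(c("E-AF", "E-UNK", "E-VT"), 16, replace = TRUE)
example <- data.frame(id1, id2, GENDER, ETH, stringsAsFactors = FALSE)

# Hypothetical helper: first value that is not the "unknown" code,
# or the unknown code itself when every value in the group is unknown
pick_known <- function(x, unknown) {
  known <- x[x != unknown]
  if (length(known) > 0) known[1] else unknown
}

# Collapse each id1 group (any number of rows) into a single row
result <- do.call(rbind, lapply(split(example, example$id1), function(d) {
  data.frame(id1 = d$id1[1], id2 = d$id2[1],
             GENDER = pick_known(d$GENDER, "G-UNK"),
             ETH = pick_known(d$ETH, "E-UNK"),
             stringsAsFactors = FALSE)
}))
result <- result[order(result$id1), ]
rownames(result) <- NULL
result
```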
[R] string processing (regular expressions)
I have a variable that is a course number:

    nCourse <- as.factor(c("002A", "002B", "002C", "007A", "007B", "007C", "101", "118A", "118B", "118C"))

and I would like to get rid of the leading zeros, giving the set (2A, 2B, 2C, 7A, 7B, 7C, 101, 118A, 118B, 118C), to paste() together with the department: B, P, C (bio, phys, chem, etc.). I am stuck trying to figure out regular expressions; they are new to me.

Thank you very much
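sub() with the pattern "^0+" (one or more zeros at the very start of the string) is one way to do this. A sketch, where the department letter "C" is only an example; note that a code made only of zeros, such as "000", would become an empty string.

```r
nCourse <- as.factor(c("002A", "002B", "002C", "007A", "007B", "007C",
                       "101", "118A", "118B", "118C"))

# "^0+" matches a run of zeros anchored at the start of the string;
# sub() replaces that match (if there is one) with ""
shortCourse <- sub("^0+", "", as.character(nCourse))
shortCourse

# Prefix a department letter, e.g. "C" for chemistry
paste0("C", shortCourse)
```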
Re: [R] Legend formatting (ggplot2)
I am having trouble getting my legend to format the way I want it to. I suspect it is something simple. The code I have is

    library(ggplot2)
    ggplot(Chem.comp, aes(Course, GRADE.)) +
      geom_boxplot(notch = TRUE, aes(fill = COHORT)) +
      labs(title = "Comparison between ISE cohorts and Peers in the Same Chem 2 class",
           y = "Grade Points in class", x = "Chemistry 2 quarter") +
      ylim(0, 4.) +
      guides(colour = guide_legend(nrow = 3)) +
      scale_fill_manual(name = "ISE Cohorts \nComparison groups",
                        values = c("blue", "red", "blue3", "red3", "blue4", "red4"))

which plots as attached. I would like to have the legend as two columns, one blue (ISE07, ISE08, ISE09) and one red (Comparison 07, Comparison 08, Comparison 09). The guides(colour = guide_legend(nrow = 3)) is what I found on Stack Overflow (http://stackoverflow.com/questions/12323416/arranging-ggplot2-legend-items-in-a-grid), and I am not quite sure how to parse the ggplot2 documentation page for myself, but it looks the same: http://docs.ggplot2.org/current/guide_legend.html

Ideally I'd like the legend to have a column of text labels (ISE...), a column of blue boxes, a column of red boxes and a column of text labels (Comp...), but that is mostly just a bonus.

Thanks,
Robert

Attachment: 1ColLedgend.pdf (Adobe PDF document)
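In the code above the boxes are coloured through the fill aesthetic, while the guide is set for colour, which is not mapped; if I read guides() correctly, the nrow setting therefore never reaches the fill legend. A sketch with made-up data: ordering the levels ISE cohorts first and requesting nrow = 3 on the fill guide, with the default column-wise filling, should give one column of ISE cohorts and one of comparison groups.

```r
library(ggplot2)

set.seed(4)
# Level order chosen so that column-wise filling puts the ISE cohorts in the
# first legend column and the comparison groups in the second
lv <- c("ISE07", "ISE08", "ISE09", "Comparison 07", "Comparison 08", "Comparison 09")
Chem.comp <- data.frame(Course = rep(c("C2A", "C2B"), each = 60),
                        COHORT = factor(rep(lv, times = 20), levels = lv),
                        GRADE. = round(runif(120, 0, 4), 1))

p <- ggplot(Chem.comp, aes(Course, GRADE.)) +
  geom_boxplot(aes(fill = COHORT)) +
  scale_fill_manual(name = "ISE Cohorts \nComparison groups",
                    values = setNames(c("blue", "blue3", "blue4", "red", "red3", "red4"), lv)) +
  # the guide is requested for the aesthetic that is actually mapped: fill
  guides(fill = guide_legend(nrow = 3))
print(p)
```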
[R] the inverse of assign()
I am looking for a way to extract the name of a variable that has been passed into a function, for example

    foo <- function(x) {
      write.csv(x, file = paste(NAME(x), "csv", sep = "."))
    }

Is there a function NAME that would let the call foo(bar) write the file bar.csv, and foo(stuff) write the file stuff.csv?

Robert
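The usual R idiom for this is deparse(substitute(x)), which recovers the expression the caller wrote for x. A sketch; writing into tempdir() is only there to keep the example self-contained. For a call like foo(bar[1:2, ]) the file name would be the whole expression, bar[1:2, ].csv.

```r
foo <- function(x) {
  # substitute(x) is the unevaluated argument (e.g. the symbol bar);
  # deparse() turns it into the string "bar"
  name <- deparse(substitute(x))
  file <- file.path(tempdir(), paste(name, "csv", sep = "."))
  write.csv(x, file = file, row.names = FALSE)
  invisible(file)
}

bar <- data.frame(a = 1:3, b = c("x", "y", "z"))
out <- foo(bar)
basename(out)   # "bar.csv"
```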
Re: [R] ave function
I tried

    lapply(split(Clean, list(Clean$TERM, Clean$INST_NUM)), function(x) shapiro.test(x$GRADE))

and I got

    Error in shapiro.test(x$GRADE.) : sample size must be between 3 and 5000

I also tried

    with(Clean, aggregate(GRADE, list(TERM, INST_NUM), FUN = shapiro.test))

and got

       Group.1 Group.2         x
    1   201001  689809 0.9546164
    2   201201  689809 0.9521624
    3   201301  689809 0.9106206
    4   200701  994474 0.8862705
    5   200710  994474 0.9176743
    6   201203 1105752 0.9382688
    ...
    72  201001 1759272 0.9291295
    73  201101 1759272 0.9347072
    74  201110 1897809 0.9395375
    Warning message:
    In format.data.frame(x, digits = digits, na.encode = FALSE) :
      corrupt data frame: columns will be truncated or padded with NAs

I am not sure how to interpret the output of the second.

Thanks!

On Tue, Aug 13, 2013 at 11:01 AM, arun smartpink...@yahoo.com wrote:

Hi,
You could try:

    lapply(split(Clean, list(Clean$TERM, Clean$INST_NUM)), function(x) shapiro.test(x$GRADE))

A.K.

----- Original Message -----
From: Robert Lynch robert.b.ly...@gmail.com
To: r-help@r-project.org
Sent: Tuesday, August 13, 2013 1:46 PM
Subject: [R] ave function
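My reading of the aggregate() output (not verified against the real data): shapiro.test() returns a whole htest list per group, which aggregate() has to squeeze into a single column, and that likely explains both the lone number per row and the corrupt data frame warning. Reducing FUN to one number avoids this, and the formula interface only returns TERM x INST_NUM combinations that actually occur. A sketch with the sample data from the original post:

```r
# Sample data from the original post
Clean <- data.frame(GRADE. = c(1, 2, 3, 1.5, 1.75, 2, 0.5, 2, 3.5, 3.5, 3.75, 4),
                    TERM = factor(c(9, 9, 9, 8, 8, 8, 9, 9, 9, 8, 8, 8)),
                    INST_NUM = factor(c(1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1)))

# One row per TERM x INST_NUM combination present in the data,
# with FUN returning a single number (the Shapiro-Wilk p-value)
pvals <- aggregate(GRADE. ~ TERM + INST_NUM, data = Clean,
                   FUN = function(g) shapiro.test(g)$p.value)
pvals
```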
[R] ave function
I've written the following function:

    CoursePrep <- function(Source, SaveName) {
      Clean <- Source  # work on the data frame that was passed in
      Clean$TERM <- as.factor(Clean$TERM)
      Clean$INST_NUM <- as.factor(Clean$INST_NUM)
      Clean$zGrade <- with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN = scale))
      write.csv(Clean, paste(SaveName, "csv", sep = "."), row.names = FALSE)
      return(Clean)
    }

which is all well and good, but I want to throw a shapiro.test() in before I normalize. That is, I don't really understand how I did (I got help) what I wanted to in

    Clean$zGrade <- with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN = scale))

That code, for the whole of Clean, finds all sets of GRADE. values that have the same INST_NUM and TERM, computes a mean, subtracts off the mean and divides by the standard deviation. I would like to call shapiro.test() on each one of those sets of grades, to see if it is normal *before* I assume it is. I know the naive

    with(Clean, shapiro.test(list(TERM, INST_NUM)))

doesn't work, and

    with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN = function(x) shapiro.test(x)))

returns

    Error in shapiro.test(x) : sample size must be between 3 and 5000

and I have checked that the sets selected are all of length between 3 and 5000, using the following on my full data:

    ClassSize <- with(Clean, ave(GRADE., list(TERM, INST_NUM), FUN = function(x) length(x)))
    summary(ClassSize)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       22.0   198.0   241.0   244.4   279.0   466.0

Here is some sample data:

    GRADE TERM INST_NUM
     1.00    9        1
     2.00    9        1
     3.00    9        1
     1.50    8        2
     1.75    8        2
     2.00    8        2
     0.50    9        2
     2.00    9        2
     3.50    9        2
     3.50    8        1
     3.75    8        1
     4.00    8        1

and hopefully the code would test the following sets of grades: (1, 2, 3), (1.5, 1.75, 2), (0.5, 2, 3.5), (3.5, 3.75, 4).

Thanks,
Robert
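A likely cause of the sample-size error, if I read ave()'s source correctly: ave() groups with interaction() over every combination of TERM and INST_NUM levels, including pairs that never occur, and calls FUN on each group. length() happily returns 0 for an empty group, which is why ClassSize looks fine, but shapiro.test() stops. A plain split() has the same issue unless drop = TRUE is given. A sketch with the sample data, adding an unused TERM level (10, made up) to mimic the empty combinations:

```r
Clean <- data.frame(GRADE. = c(1, 2, 3, 1.5, 1.75, 2, 0.5, 2, 3.5, 3.5, 3.75, 4),
                    # level 10 never occurs, like term/instructor pairs that never happened
                    TERM = factor(c(9, 9, 9, 8, 8, 8, 9, 9, 9, 8, 8, 8), levels = c(8, 9, 10)),
                    INST_NUM = factor(c(1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1)))

# Without drop = TRUE every level combination appears, some of them empty
all_cells <- split(Clean$GRADE., list(Clean$TERM, Clean$INST_NUM))
lengths(all_cells)

# drop = TRUE keeps only the combinations that occur
groups <- split(Clean$GRADE., list(Clean$TERM, Clean$INST_NUM), drop = TRUE)
tests <- lapply(groups, shapiro.test)
sapply(tests, function(t) t$p.value)
```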
[R] help with apply (lapply or sapply, not sure)
I am reading in a bunch of files and then processing them all in the same way. I am sure there is a better way than to copy and paste the code for each file. Here is what I've done so far:

    InputFiles <- as.character(list.files("~/ISLE/RWork/DataWarehouseMining/byCourse/"))  # path to the course data files
    for (i in InputFiles) {
      # print(head(read.csv(paste("~/ISLE/RWork/DataWarehouseMining/byCourse/", i, sep=
      print(paste("Reading file ~/ISLE/RWork/DataWarehouseMining/byCourse/", i, sep = ","))
      assign(i, read.csv(paste("~/ISLE/RWork/DataWarehouseMining/byCourse/", i, sep = "")))
    }  # note: the last file is NOT a course file but the student information
    Master <- StudentInfoForRobertWUnitAt7A_2.csv  # this is the last file
    CourseFiles <- InputFiles[-c(15, 16)]  # ignore the student info ...7A.csv and ...7A_2.csv

    # for each file I do the following
    # Bis 101
    summary(BigInstBIS101.csv)
    B101 <- BigInstBIS101.csv[-c(3, 4, 8)]
    summary(B101)
    B101$WH_ID <- as.factor(B101$WH_ID)
    B101$SID <- as.factor(B101$SID)
    B101$TERM <- as.factor(B101$TERM)
    B101$CRN <- as.factor(B101$CRN)
    B101$CRN_TRM <- as.factor(B101$CRN_TRM)
    B101$INST_NUM <- as.factor(B101$INST_NUM)
    B101$zGrade <- with(B101, ave(GRADE., list(TERM, INST_NUM), FUN = scale))
    write.csv(B101, "B101.csv", row.names = FALSE)

    # Bis 2A
    B2A <- BigInstBIS2A.csv[-c(3, 4, 8)]
    summary(B2A)
    B2A$WH_ID <- as.factor(B2A$WH_ID)
    B2A$SID <- as.factor(B2A$SID)
    B2A$TERM <- as.factor(B2A$TERM)
    B2A$CRN <- as.factor(B2A$CRN)
    B2A$CRN_TRM <- as.factor(B2A$CRN_TRM)
    B2A$INST_NUM <- as.factor(B2A$INST_NUM)
    B2A$zGrade <- with(B2A, ave(GRADE., list(TERM, INST_NUM), FUN = scale))
    write.csv(B2A, "B2A.csv", row.names = FALSE)

And so on for another 12 courses. However, I am changing what I am doing as part of reading in the files and don't want to replace the code in 14 different places. Suggestions?

Thanks!
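One way to avoid the 14 copies, sketched under assumptions: put the repeated block into a function and lapply() it over the file names, keeping the results in a named list instead of separate assign()ed variables. prep_course() and process_dir() are hypothetical helpers, the dropped columns c(3, 4, 8) are taken from the post, and the demo files are made up. The two student-information files could be excluded through the pattern argument of list.files() or by subsetting the file list.

```r
# Hypothetical per-course cleaning step: the repeated block from the post as a function
prep_course <- function(df, drop_cols = c(3, 4, 8)) {
  df <- df[-drop_cols]
  id_cols <- intersect(c("WH_ID", "SID", "TERM", "CRN", "CRN_TRM", "INST_NUM"), names(df))
  df[id_cols] <- lapply(df[id_cols], as.factor)
  df$zGrade <- with(df, ave(GRADE., list(TERM, INST_NUM), FUN = scale))
  df
}

# Read every .csv in in_dir, clean it, write it to out_dir, and return the
# cleaned data frames as a named list
process_dir <- function(in_dir, out_dir, drop_cols = c(3, 4, 8)) {
  files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
  out <- lapply(files, function(f) {
    d <- prep_course(read.csv(f), drop_cols)
    write.csv(d, file.path(out_dir, basename(f)), row.names = FALSE)
    d
  })
  names(out) <- sub("\\.csv$", "", basename(files))
  out
}

# Self-contained demo: two made-up course files in a temporary folder
in_dir <- file.path(tempdir(), "byCourse")
out_dir <- file.path(tempdir(), "cleaned")
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
toy <- data.frame(WH_ID = 1:6, SID = 101:106, X1 = 0, X2 = 0,
                  TERM = c(1, 1, 1, 2, 2, 2), CRN = 7, CRN_TRM = 7, X3 = 0,
                  INST_NUM = 5, GRADE. = c(2, 3, 4, 1, 2, 3))
write.csv(toy, file.path(in_dir, "B101.csv"), row.names = FALSE)
write.csv(toy, file.path(in_dir, "B2A.csv"), row.names = FALSE)

courses <- process_dir(in_dir, out_dir)
courses$B101
```

A change to the processing then only has to be made once, inside prep_course().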
[R] weighted average
I am trying to compute a GPA from class grades (which have been normalized). I have, for example, the following matrix:

    Master =
    SID    B2A   B2B   B2C   C2A   C2B   C2C C118A C118B C118C
    001   0.01  0.50 -0.40  1.20 -1.80  0.30 -0.30  0.40  0.50
    002   0.01  0.50 -0.40  0.50 -0.40  1.20 -1.80  0.30 -0.30
    003   0.04  0.05  0.50 -0.40 -0.50  0.40 -1.20  1.80  0.30
    etc.

where each column has zero mean and a standard deviation of 1. I want to calculate a weighted average for each row (student ID) that takes into account that B2A, C118A, C118B and C118C are all 4-unit classes, and the rest (B2B, B2C, C2A, C2B, C2C) are 5-unit classes. I have tried

    Units <- c(4, 5, 5, 5, 5, 5, 4, 4, 4)
    Master$zGPA <- weighted.mean(Master[, 2:10], Units)

but that gets me one number and not a vector.
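weighted.mean() works on one vector at a time, so it needs to be applied row by row. A sketch with the three example rows; the na.rm = TRUE choice, so that a course a student did not take is skipped together with its units, is my assumption.

```r
Master <- data.frame(
  SID   = c("001", "002", "003"),
  B2A   = c(0.01, 0.01, 0.04),
  B2B   = c(0.5, 0.5, 0.05),
  B2C   = c(-0.4, -0.4, 0.5),
  C2A   = c(1.2, 0.5, -0.4),
  C2B   = c(-1.8, -0.4, -0.5),
  C2C   = c(0.3, 1.2, 0.4),
  C118A = c(-0.3, -1.8, -1.2),
  C118B = c(0.4, 0.3, 1.8),
  C118C = c(0.5, -0.3, 0.3)
)
Units <- c(4, 5, 5, 5, 5, 5, 4, 4, 4)   # same column order as Master[, 2:10]

# apply(..., 1, ...) calls weighted.mean() once per row (student);
# na.rm = TRUE drops a missing grade together with its weight
Master$zGPA <- apply(as.matrix(Master[, 2:10]), 1, weighted.mean, w = Units, na.rm = TRUE)
Master$zGPA
```

Without missing grades this is the same as the matrix product as.matrix(Master[, 2:10]) %*% Units / sum(Units).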
[R] interpreting GLM results and plotting fits
I am trying to interpret the output of glm() and I am not sure how it is treating the factor GENDER, with levels G-M and G-F. Below is the output of summary(GPA.lm):

    Call:
    glm(formula = zGPA ~ Units.Bfr.7A * GENDER, data = Master1)

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -1.1432  -0.3285  -0.1061   0.2283   1.8286

    Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
    (Intercept)            -2.513e-01  2.238e-02 -11.230  < 2e-16 ***
    Units.Bfr.7A            2.297e-05  2.851e-04   0.081    0.936
    GENDERG-M               3.183e-01  4.536e-02   7.018 2.56e-12 ***
    Units.Bfr.7A:GENDERG-M -3.073e-03  5.975e-04  -5.142 2.82e-07 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for gaussian family taken to be 0.2432662)

        Null deviance: 1204.2  on 4875  degrees of freedom
    Residual deviance: 1185.2  on 4872  degrees of freedom
      (106 observations deleted due to missingness)
    AIC: 6950.8

    Number of Fisher Scoring iterations: 2

Second, I would like to draw two lines with confidence intervals on the scatter plot, one for G-M and the other for G-F. I think I am doing this with

    stat_smooth(aes(group = GENDER), method = glm, fullrange = TRUE)

but again I am not sure quite what is being output.
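As I understand R's default treatment coding: G-F comes first alphabetically, so it is the reference level; (Intercept) and Units.Bfr.7A describe the G-F line, and GENDERG-M and Units.Bfr.7A:GENDERG-M are the changes in intercept and slope for G-M. A sketch on simulated data (made up) showing that the two lines implied by the interaction model match separate per-gender fits, which is what a stat_smooth() grouped by GENDER fits. The confidence bands can still differ, since the per-group fits estimate their own residual variance while the interaction model pools it.

```r
set.seed(7)
n <- 200
# Simulated stand-in for Master1
Master1 <- data.frame(Units.Bfr.7A = runif(n, 0, 150),
                      GENDER = factor(sample(c("G-F", "G-M"), n, replace = TRUE)))
Master1$zGPA <- with(Master1, -0.25 + 0.3 * (GENDER == "G-M") -
                       0.003 * Units.Bfr.7A * (GENDER == "G-M") + rnorm(n, sd = 0.5))

fit <- glm(zGPA ~ Units.Bfr.7A * GENDER, data = Master1)
b <- coef(fit)

# G-F is the reference level: its line is intercept + slope;
# the G-M coefficients are offsets from that line
line_F <- c(b[["(Intercept)"]], b[["Units.Bfr.7A"]])
line_M <- c(b[["(Intercept)"]] + b[["GENDERG-M"]],
            b[["Units.Bfr.7A"]] + b[["Units.Bfr.7A:GENDERG-M"]])

# Separate per-gender fits give the same two lines
sep_F <- coef(lm(zGPA ~ Units.Bfr.7A, data = subset(Master1, GENDER == "G-F")))
sep_M <- coef(lm(zGPA ~ Units.Bfr.7A, data = subset(Master1, GENDER == "G-M")))
rbind(line_F, sep_F, line_M, sep_M)
```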
[R] data selection
I have two different data frames (actually a set of data frames, one for each class, and one master frame into which I want to pull some data from each of the frames in the set). The set holds all students that have taken a course, so the set of data frames is B101, B2A, B2B, B2C, etc., and each one has lots of data, e.g.

    B101
    SID zGRADE
    444   -0.2
    458    0.0
    587    0.2
    etc.

and

    Master
    SID
    587
    etc.

I would like to make a field in Master for each data frame, e.g. Master$B101, Master$B2A, and populate it with the zGRADEs from each of the data frames. The SIDs in Master are a wholly contained subset of the ones in each of the other data frames.
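match() is one way to line up the SIDs. A sketch with toy frames (the B2A values and the extra SID 999 are made up) that loops over a named list of course frames rather than separately named variables:

```r
B101 <- data.frame(SID = c(444, 458, 587), zGRADE = c(-0.2, 0, 0.2))
B2A  <- data.frame(SID = c(587, 444, 999), zGRADE = c(1.1, -0.5, 0.3))
Master <- data.frame(SID = c(587, 444))

courses <- list(B101 = B101, B2A = B2A)
for (nm in names(courses)) {
  d <- courses[[nm]]
  # match() gives, for each Master SID, the row of d with that SID
  # (the first such row if an SID repeats; NA if the SID is absent)
  Master[[nm]] <- d$zGRADE[match(Master$SID, d$SID)]
}
Master
```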
[R] Multiple selection and normalization
Hi--

I am trying to normalize course grades for each instance of a course, e.g. Stats 1, Fall 2009, J. Smith. I have a frame for all instances of a course (e.g. Stats 1 in the last 5 years) that looks like

    SIDN TERM GRADE INST

where SIDN is a student ID number, TERM is a factor that gives the quarter and year a course was offered, GRADE is a 0-4.3 grade and INST is the instructor, again as a factor. Course offerings are determined by TERM and INST; that is, one instructor assigned grades to all the students they were responsible for that term. Multiple instructors may have taught in the same term. For every course offering I would like to normalize the GRADE:

    Z <- (GRADE - mean)/SD

where the mean and SD are over a single course offering.

Thanks!
RBL
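A minimal sketch of the per-offering z-score using base R's ave(), assuming TERM and INST together identify an offering as described; the toy data are made up. An offering with a single student would get NA, because its standard deviation is undefined.

```r
grades <- data.frame(
  SIDN  = 1:8,
  TERM  = factor(c("F09", "F09", "F09", "F09", "W10", "W10", "W10", "W10")),
  INST  = factor(c("Smith", "Smith", "Jones", "Jones", "Smith", "Smith", "Smith", "Smith")),
  GRADE = c(3, 4, 2, 2.5, 1, 2, 3, 4)
)

# ave() splits GRADE by every TERM x INST combination (a course offering),
# applies FUN within each group and returns the results in the original row order
grades$Z <- with(grades, ave(GRADE, TERM, INST,
                             FUN = function(g) (g - mean(g)) / sd(g)))
grades
```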