Hi: See inline, please.
On Thu, Jan 20, 2011 at 4:22 PM, Nathan Miller <natemille...@gmail.com> wrote:

> Hello,
>
> I am trying to generate a set of boxplots using ggplot with the following
> data with 4 columns (Day, Site, VO2, Cruise)
>
> AllCorbulaMR
>     Day Site          VO2 Cruise
> 1     1    1 148.43632670      1
> 2     1    1  61.73864969      1
> 3     1    1  92.64536096      1
> 4     1    1  73.35434957      1
> 5     1    1  69.85568810      1
> 6     1    1  98.71116866      1
> 7     1    1  67.57880107      1
> 8     1    1  80.57959160      1
> 9     1    1  53.38577137      1
> 10    1    2  81.08429594      1
> 11    1    2  79.73019687      1
> 12    1    2  67.93991806      1
> 13    1    2  50.69929558      1
> 14    1    2  42.02457680      1
> 15    1    2  64.10049924      1
> 16    1    2  80.02264095      1
> 17    1    2  67.14828804      1
> 18    1    2  93.33363743      1
> 19    1    3  53.86021985      1
> 20    1    3  50.53366868      1
> 21    1    3  52.12437086      1
> 22    1    3  43.44618922      1
> 23    1    3  64.64322840      1
> 24    1    3  55.03761768      1
> 25    1    3  67.79501374      1
> 26    1    4  12.70806068      1
> 27    1    4 114.56401960      1
> 28    1    4  34.34450695      1
> 29    1    4  76.70849935      1
> 30    1    4  68.99752863      1
> 31    1    4  71.23080332      1
> 32    1    1   0.08222308      2
> 33    1    1           NA      2
> 34    1    1   0.03743258      2
> 35    1    1   0.04496363      2
> 36    1    1   0.07184903      2
> 37    1    1   0.05637676      2
> 38    1    1   0.05163886      2
> 39    1    1   0.03022606      2
> 40    1    1   0.04150667      2
> 41    2    1   0.04982530      2
> 42    2    1   0.05248479      2
> 43    2    1   0.03839707      2
> 44    2    1   0.04283591      2
> 45    2    1   0.03285247      2
> 46    2    1   0.03965853      2
> 47    2    1           NA      2
> 48    2    1   0.03637822      2
> 49    2    1   0.03686663      2
> 50    1    2   0.04086229      2
> 51    1    2           NA      2
> 52    1    2   0.01891389      2
> 53    1    2   0.03365864      2
> 54    1    2   0.04179611      2
> 55    1    2   0.04675111      2
> 56    1    2   0.04734616      2
> 57    1    2   0.04046907      2
> 58    1    2   0.03395499      2
> 59    2    2   0.02104620      2
> 60    2    2           NA      2
> 61    2    2           NA      2
> 62    2    2           NA      2
> 63    2    2   0.01882796      2
> 64    2    2           NA      2
> 65    2    2           NA      2
> 66    2    2   0.02328894      2
> 67    2    2   0.02635327      2
> 68    1    3   0.06030056      2
> 69    1    3   0.04728888      2
> 70    1    3   0.04307900      2
> 71    1    3   0.05144241      2
> 72    1    3   0.03223973      2
> 73    1    3   0.05145292      2
> 74    1    3   0.02718536      2
> 75    1    3   0.02830348      2
> 76    1    3   0.05859836      2
> 77    2    3   0.04521778      2
> 78    2    3   0.03242385      2
> 79    2    3   0.03412688      2
> 80    2    3   0.04407171      2
> 81    2    3   0.04517834      2
> 82    2    3           NA      2
> 83    2    3   0.02745407      2
> 84    2    3   0.03118602      2
> 85    2    3   0.04420074      2
> 86    1    4   0.05352334      2
> 87    1    4   0.05378120      2
> 88    1    4   0.04394838      2
> 89    1    4   0.02597939      2
> 90    1    4   0.05476946      2
> 91    1    4   0.04371743      2
> 92    2    4   0.04022729      2
> 93    2    4   0.04078509      2
> 94    2    4   0.04911994      2
> 95    2    4   0.04068468      2
> 96    2    4           NA      2
> 97    2    4           NA      2
> 98    1    1   0.08892223      3
> 99    1    1   0.08617873      3
> 100   1    1   0.06108950      3
> 101   1    1   0.19047922      3
> 102   1    1   0.09865930      3
> 103   1    1   0.12103549      3
> 104   1    1   0.06788404      3
> 105   1    1   0.12629497      3
> 106   1    1   0.10947173      3
> 107   1    1   0.11381467      3
> 108   2    1   0.07809781      3
> 109   2    1   0.04397586      3
> 110   2    1   0.06317635      3
> 111   2    1   0.02020365      3
> 112   2    1   0.09525985      3
> 113   2    1   0.04732347      3
> 114   2    1   0.03043341      3
> 115   2    1   0.04419395      3
> 116   2    1           NA      3
> 117   2    1           NA      3
> 118   1    2   0.03380003      3
> 119   1    2   0.02600926      3
> 120   1    2   0.03980552      3
> 121   1    2   0.03659985      3
> 122   1    2   0.04867881      3
> 123   1    2   0.03694679      3
> 124   1    2   0.03372825      3
> 125   1    2   0.03644750      3
> 126   1    2   0.01497611      3
> 127   1    2   0.02697976      3
> 128   2    2   0.03136923      3
> 129   2    2   0.03602215      3
> 130   2    2   0.04000660      3
> 131   2    2   0.03673098      3
> 132   2    2   0.03090854      3
> 133   2    2   0.04877643      3
> 134   2    2   0.02468537      3
> 135   2    2           NA      3
> 136   2    2           NA      3
> 137   2    2           NA      3
> 138   1    3   0.04809866      3
> 139   1    3   0.02380070      3
> 140   1    3   0.03672271      3
> 141   1    3   0.03232115      3
> 142   1    3   0.02950701      3
> 143   1    3   0.05068163      3
> 144   1    3   0.03004234      3
> 145   1    3   0.03090461      3
> 146   1    3   0.04539888      3
> 147   1    3   0.03261571      3
> 148   2    3   0.02069708      3
> 149   2    3   0.02117658      3
> 150   2    3   0.03244907      3
> 151   2    3   0.03404048      3
> 152   2    3   0.04597381      3
> 153   2    3   0.04034278      3
> 154   2    3   0.02167128      3
> 155   2    3   0.02245179      3
> 156   2    3           NA      3
> 157   2    3           NA      3
> 158   1    4   0.03935839      3
> 159   1    4   0.02932294      3
> 160   1    4   0.04928409      3
> 161   1    4   0.05075344      3
> 162   1    4   0.04353663      3
> 163   1    4   0.03376173      3
> 164   1    4   0.03640901      3
> 165   1    4   0.03616992      3
> 166   1    4   0.04999144      3
> 167   1    4           NA      3
> 168   2    4   0.02864131      3
> 169   2    4   0.02269317      3
> 170   2    4   0.04231203      3
> 171   2    4   0.03117968      3
> 172   2    4   0.05936813      3
> 173   2    4   0.04453866      3
> 174   2    4   0.04596834      3
> 175   2    4   0.03316604      3
> 176   2    4   0.03557714      3
> 177   2    4   0.03546922      3
> 178   1    1  25.85293219      4
> 179   1    1 114.35190440      4
> 180   1    1  41.24654852      4
> 181   1    1  30.41020474      4
> 182   1    1  36.48809050      4
> 183   1    1  54.41756532      4
> 184   1    1  19.80251528      4
> 185   1    1  69.26706620      4
> 186   1    1  24.58416429      4
> 187   1    1  34.07847862      4
> 188   2    1  21.89169319      4
> 189   2    1  59.63530232      4
> 190   2    1  18.52010709      4
> 191   2    1  41.19883744      4
> 192   2    1  48.19844111      4
> 193   2    1  15.39495989      4
> 194   2    1  29.50623201      4
> 195   2    1  51.30763254      4
> 196   2    1   7.47325378      4
> 197   2    1  24.31193857      4
> 198   1    2  19.29834204      4
> 199   1    2  36.58998998      4
> 200   1    2  28.68983063      4
> 201   1    2  67.06563511      4
> 202   1    2  30.00310234      4
> 203   1    2  28.28411410      4
> 204   1    2  34.81315669      4
> 205   1    2  45.49758389      4
> 206   1    2  30.96530199      4
> 207   1    2  35.12478034      4
> 208   2    2  20.10730199      4
> 209   2    2  20.51722925      4
> 210   2    2   0.74851863      4
> 211   2    2  16.93243539      4
> 212   2    2   8.88325120      4
> 213   2    2  49.09739859      4
> 214   2    2   6.19244047      4
> 215   2    2  27.52476529      4
> 216   2    2  14.38173305      4
> 217   2    2   5.80964158      4
> 218   1    3  29.04103986      4
> 219   1    3   0.92325019      4
> 220   1    3  29.13337158      4
> 221   1    3  34.74658994      4
> 222   1    3  49.14354727      4
> 223   1    3  22.07507554      4
> 224   1    3  34.33727406      4
> 225   1    3  36.87618286      4
> 226   1    3  31.18096886      4
> 227   1    3  36.55777870      4
> 228   2    3  25.58876028      4
> 229   2    3   0.98494566      4
> 230   2    3  24.97630908      4
> 231   2    3  11.95195819      4
> 232   2    3  21.70297861      4
> 233   2    3  13.65235649      4
> 234   2    3  10.50488380      4
> 235   2    3   2.87966024      4
> 236   2    3   2.73826233      4
> 237   2    3   0.11720351      4
> 238   1    4   4.55652677      4
> 239   1    4   8.70352253      4
> 240   1    4  41.11933137      4
> 241   1    4  28.85120068      4
> 242   1    4  32.32787505      4
> 243   1    4  23.07206283      4
> 244   1    4  21.14382198      4
> 245   1    4  13.81868580      4
> 246   1    4  58.54591722      4
> 247   1    4  43.80195253      4
> 248   2    4  46.78384255      4
> 249   2    4   0.61816630      4
> 250   2    4  13.55381496      4
> 251   2    4   4.48444064      4
> 252   2    4  14.19295226      4
> 253   2    4  11.94361059      4
> 254   2    4  35.62088407      4
> 255   2    4  16.11595544      4
> 256   2    4  34.80256181      4
> 257   2    4  19.66686930      4
>
> I would like to make a boxplot with Site on the x-axis and VO2 on the
> y-axis and with the fill colour specified by the Cruise. I have the
> following code
>
> p = ggplot(AllCorbulaMR, aes(factor(Site), VO2))
>
> p + geom_boxplot(aes(fill = factor(Cruise))) +
>   scale_fill_manual('Cruise',
>     values = colours()[c(375, 577, 573, 439)],
>     breaks = c(1, 2, 3, 4),
>     labels = c('Cruise 1', 'Cruise 2', 'Cruise 3', 'Cruise 4')) +
>   xlab('Sampling Site') +
>   ylab(expression("Metabolic Rate "(mu*moles*~O[2]*~mg^-1*~hr^-1))) +
>   opts(title = "Corbula Cruises") +
>   ylim(c(0, 0.2))
>
> The issue I have at the moment is that only Cruises 2 and 3 are plotted
> correctly. The other two are not shown or the boxes are plotted as single
> horizontal lines. I also get a warning that 129 rows containing missing
> data were removed (probably explaining the missing boxplots). I have a few
> missing data points (specified NA), but not 129. Can anyone see an error in
> the code I have that would explain why some rows of data are being
> disregarded?

(1) Cruises 1 and 4 have much larger VO2 values than Cruises 2 and 3. Your
VO2 data range from about 0.01 to 150, which suggests they might be a prime
candidate for a log transformation.

(2) You've told ggplot() to restrict the y-range to 0 through 0.2; that's
why 129 rows of data are removed. The count covers every observation that
falls outside that range as well as the NAs. You can see all the
observations whose VO2 value is less than 0.2, but everything larger than
that is suppressed because of your choice of y-range. The 'complete' plot of
the data (after some reorganization of your code with spacing) produces
flatlines for Cruises 2 and 3, since the full range of VO2 values is from 0
to 150. (I'm pretty sure you did that and then restricted the y-range later
because of this.)

I reran your code with metabolic rate plotted on the log_10 scale and got
what looked like an acceptable plot to me, although I'd look carefully at
the data on the upper end of the scale.
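
As a rough check on where the 129 comes from, the sketch below counts what a 0 to 0.2 y-range would discard. The vector `vo2` is a made-up handful of values on the two scales seen in your data, not the data itself, and `n_removed` is just an illustrative name. Counting the data as posted in the same way appears to give 110 values above 0.2 (all 31 from Cruise 1 and 79 of the 80 from Cruise 4) plus 19 NAs, which adds up to 129.

```r
# Hypothetical toy values for illustration only (not the posted data):
# a few on the 10-150 scale, a few on the 0.01-0.2 scale, and one NA.
vo2 <- c(148.4, 61.7, 92.6, 0.082, NA, 0.037, 0.045, 25.9, 114.4)

# Setting y limits of c(0, 0.2) on the scale treats out-of-range values
# as missing, so they are dropped together with the genuine NAs.
outside <- !is.na(vo2) & (vo2 < 0 | vo2 > 0.2)
n_removed <- sum(outside) + sum(is.na(vo2))
n_removed  # 6 here: 5 out-of-range values plus 1 NA

# On the log10 scale both groups fit within a span of a few units.
range(log10(vo2), na.rm = TRUE)
```

If the aim were only to zoom in, coord_cartesian(ylim = c(0, 0.2)) should leave the data intact, though the boxes for Cruises 1 and 4 would still sit far above the visible panel.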
Here's the modified code (with spacing, since it affects the position of
labels):

p <- ggplot(AllCorbulaMR, aes(factor(Site), VO2))
p + geom_boxplot(aes(fill = factor(Cruise))) +
    scale_fill_manual('Cruise',
        values = colours()[c(375, 577, 573, 439)],
        breaks = c(1, 2, 3, 4),
        labels = c('Cruise 1', 'Cruise 2', 'Cruise 3', 'Cruise 4')) +
    xlab('Sampling Site') +
    # log10 y-axis in place of ylim(c(0, 0.2))
    scale_y_continuous(trans = 'log10') +
    ylab(expression("Metabolic Rate "(mu*moles*~O[2]*~mg^-1*~hr^-1))) +
    ggtitle("Corbula Cruises")

You can adjust the labeling/scaling of the y-axis using some of the options
available in scale_y_continuous() if desired. There are quite a few
transformation options; you may well find a better one than what I chose
here.

There is also a list devoted to ggplot2, to which you can subscribe at
http://had.co.nz/ggplot2, where questions like this might be better placed.
The on-line help system for ggplot2, with myriad examples, is found towards
the bottom of that page.

HTH,
Dennis

> Thank you,
>
> Nate
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.