Re: [R] Determining maximum hourly slope per day
Thanks for your example Peter. It does seem like one means of getting the job done, and if I'm unable to figure out an approach with zoo::rollapply I may follow your example. Hopefully someone else can chime in.

Thanks, Nate

---
Nathan A. Miller
Romberg Tiburon Center
San Francisco State University
3150 Paradise Dr
Tiburon, CA 94920
www.nate-miller.org

On Wed, Mar 13, 2013 at 4:34 PM, Peter Ehlers ehl...@ucalgary.ca wrote:

On 2013-03-12 17:10, Nathan Miller wrote:

Hello, I have a challenge! I have a large dataset with three columns: date, temp, location. date is in the format %m/%d/%y %H:%M, with a temp recorded every 10 minutes. These are surface temperatures and so fluctuate during the day, heating up and then cooling down, so the data is a series of peaks and troughs. I would like to develop a function that would go through a dataset consisting of many sequential dates and determine, for each day, the maximum hourly slope of temp~date for each site (the fastest hourly rate of heating). The output would be the date, the maximum hourly slope for that date, and the location. It would also be great if I could extract when during the day the maximum hourly slope occurred.

I have been playing around with the lubridate package to identify each hour of the day, using something like this to create a separate column grouping the data into hours:

library(lubridate)
data$date2 <- floor_date(data$date, "hour")

I was then imagining something like this, though this code doesn't work as written:

ddply(data, .(location, date2), function(d) max(rollapply(slope(d$temp ~ d$date, data = d))))

Essentially what I'm imagining is calculating the slope (though I'd have to write a quick slope function) of the date/temp relationship, using rollapply to apply this function across the dataset, and determining the maximum slope, grouped by location and hour (using date2). Hmm... and per day! This seems complicated. Can others think of a simpler, more elegant means of extracting this type of data? I struggled to put together a working example with a set of data, but if this doesn't make sense let me know and I'll see what I can do.

Thanks, Nate

First, let's ignore location; if you can do it for one location, you can surely do it for others. Second, let's ignore date; if you can do it for one date, you can surely do it for others. That leaves us with the question of what you want to do for one given date. If you want the maximum slope for any 60-minute interval on that date (which I take your question to mean), then rollapply should do the job. But I'm not very familiar with zoo, so here's a crude approach:

d <- data.frame(time = 1:72, temp = rnorm(72))
slope <- rep(NA, 72)
for (i in 6:72) {
  slope[i] <- coef(lm(temp ~ time, data = d, subset = (i-5):i))[2]
}
maxslope <- max(slope, na.rm = TRUE)
idx <- which.max(slope)

Obviously, this can be extended to cover more than a 24-hour period. Now, let's wait for Gabor to show us the trivial way with zoo::rollapply.

Peter Ehlers
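[For completeness, a minimal sketch of the zoo::rollapply route the thread anticipates, on simulated data; column names follow the original post, and the width of 7 assumes seven 10-minute samples span an hour:

library(zoo)
library(plyr)

set.seed(1)
d <- data.frame(date = seq(as.POSIXct("2013-03-01 00:00"), by = "10 min",
                           length.out = 432))
d$temp <- 15 + 5 * sin(2 * pi * as.numeric(d$date) / 86400) + rnorm(432, sd = 0.2)
d$location <- "A"
d$day <- as.Date(d$date)

# slope over a 7-point (60-minute) window, converted from per-10-min to per-hour
hourly_slope <- function(x) coef(lm(x ~ seq_along(x)))[2] * 6

res <- ddply(d, .(location, day), function(g) {
  s <- rollapply(g$temp, width = 7, FUN = hourly_slope, align = "right", fill = NA)
  i <- which.max(s)                       # which.max skips the leading NAs
  data.frame(max_slope = s[i], at = g$date[i])
})
res

This returns, per location and day, the steepest hourly warming rate and the time at which the window ends.]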
[R] Determining maximum hourly slope per day
Hello, I have a challenge! I have a large dataset with three columns: date, temp, location. date is in the format %m/%d/%y %H:%M, with a temp recorded every 10 minutes. These are surface temperatures and so fluctuate during the day, heating up and then cooling down, so the data is a series of peaks and troughs. I would like to develop a function that would go through a dataset consisting of many sequential dates and determine, for each day, the maximum hourly slope of temp~date for each site (the fastest hourly rate of heating). The output would be the date, the maximum hourly slope for that date, and the location. It would also be great if I could extract when during the day the maximum hourly slope occurred.

I have been playing around with the lubridate package to identify each hour of the day, using something like this to create a separate column grouping the data into hours:

library(lubridate)
data$date2 <- floor_date(data$date, "hour")

I was then imagining something like this, though this code doesn't work as written:

ddply(data, .(location, date2), function(d) max(rollapply(slope(d$temp ~ d$date, data = d))))

Essentially what I'm imagining is calculating the slope (though I'd have to write a quick slope function) of the date/temp relationship, using rollapply to apply this function across the dataset, and determining the maximum slope, grouped by location and hour (using date2). Hmm... and per day! This seems complicated. Can others think of a simpler, more elegant means of extracting this type of data? I struggled to put together a working example with a set of data, but if this doesn't make sense let me know and I'll see what I can do.

Thanks, Nate
[R] Adding 95% contours around scatterplot points with ggplot2
Hi all, I have been looking for a means of adding a contour around some points in a scatterplot, as a way of representing the center of density of the data. I'm imagining something like a 95% confidence estimate drawn around the data.

So far I have found some code for drawing polygons around the data. These look nice, but in some cases the polygons are strongly influenced by outlying points. Does anyone have a thought on how to draw a contour which is more along the lines of a 95% confidence space? I have provided a working example below to illustrate the drawing of the polygons. As I said, I would rather have three ovals/95% contours drawn around the points, by level, to capture the different density distributions without the visualization being heavily influenced by outliers.

I have looked into the code provided here by Hadley https://groups.google.com/forum/?fromgroups=#!topic/ggplot2/85q4SQ9q3V8 using the mvtnorm package and the dmvnorm function, but haven't been able to get it to work for my data example. The calculated densities are always zero at this step of Hadley's code:

dgrid$dens <- dmvnorm(as.matrix(dgrid), ex_mu, ex_sigma)

I appreciate any assistance.

Thanks, Nate

x <- c(seq(0.15, 0.4, length.out=30), seq(0.2, 0.6, length.out=30),
       seq(0.4, 0.6, length.out=30))
y <- c(0.55, x[1:29] + 0.2*rnorm(29, 0.4, 0.3), x[31:60]*rnorm(30, 0.3, 0.1),
       x[61:90]*rnorm(30, 0.4, 0.25))
data <- data.frame(level=c(rep(1, 30), rep(2, 30), rep(3, 30)), x=x, y=y)

find_hull <- function(data) data[chull(data$x, data$y), ]
hulls <- ddply(data, .(level), find_hull)

fig1 <- ggplot(data=data, aes(x, y, colour=factor(level), fill=level)) + geom_point()
fig1 <- fig1 + geom_polygon(data=hulls, alpha=.2)
fig1
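[Versions of ggplot2 released after this thread ship a built-in for exactly this; a minimal sketch using the post's example data:

library(ggplot2)
# stat_ellipse() draws a multivariate-t (default) or normal data ellipse
# per group; level = 0.95 gives roughly the 95% region asked about,
# and is far less sensitive to single outliers than a convex hull
ggplot(data, aes(x, y, colour = factor(level))) +
  geom_point() +
  stat_ellipse(level = 0.95)
]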
Re: [R] Adding 95% contours around scatterplot points with ggplot2
Thanks Ista, I have played a bit with stat_density2d as well. It doesn't completely capture what I am looking for, and it ends up being quite busy at the same time. I'm looking for a way of helping those looking at the figure to see the broad patterns of where in the x/y space the data from different groups are distributed. The 95% CI type idea is so that I don't end up arbitrarily drawing circles around each set of points. I appreciate your direction though.

Nate

On Mon, Jan 28, 2013 at 10:50 AM, Ista Zahn istaz...@gmail.com wrote:

Hi Nathan, This only fits some of your criteria, but have you looked at ?stat_density2d?

Best, Ista
Re: [R] Adding 95% contours around scatterplot points with ggplot2
Hi Ista, Thanks. That does look pretty nice, and I hadn't realized that was possible. Do you know how to extract information regarding those curves? I'd like to be able to report something about what portion of the data they encompass, or really any other feature about them, in a figure legend. I'll look into stat_density2d and see if I can determine how they are set.

Thanks for your help, Nate

On Mon, Jan 28, 2013 at 12:37 PM, Ista Zahn istaz...@gmail.com wrote:

Hi Nate, You can make it less busy using the bins argument. This is not documented, except in the examples to stat_contour, but try

ggplot(data=data, aes(x, y, colour=factor(level), fill=level)) +
  geom_point() +
  stat_density2d(bins=2)

HTH, Ista
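[The "what portion of the data do the curves encompass" question goes unanswered in the thread; a hedged sketch of one way to compute it outside ggplot, using MASS::kde2d on one group of the example data:

library(MASS)

d1 <- subset(data, level == 1)
dens <- kde2d(d1$x, d1$y, n = 100)

# normalise the grid to probability mass, then find the density
# threshold whose superlevel set contains ~95% of the mass
p <- dens$z / sum(dens$z)
ord <- sort(p, decreasing = TRUE)
thresh <- ord[max(which(cumsum(ord) <= 0.95))]

# contour at that threshold, rescaled back to the kde2d density scale;
# these coordinates can be drawn with geom_path() and reported as an
# (approximately) 95% highest-density region
cl <- contourLines(dens$x, dens$y, dens$z, levels = thresh * sum(dens$z))
]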
Re: [R] More efficient use of reshape?
Thanks John, I appreciate your help. With the help of Dennis Murphy, code along the lines of this

climData_melt <- melt(clim.data, id = 'date', measure = c("GISS", "HAD", "NOAA", "RSS", "UAH"))

actually gets the data into the correct form very simply, and from there it's easy to plot with faceting in ggplot2. Thanks for helping clear up the reshape and reshape2 issues. I try to make sure I have those types of things figured out before posting to the list; I didn't mean to confuse the issue by incorrectly referring to the packages/functions or their locations.

Thanks, Nate

On Fri, Dec 14, 2012 at 7:38 AM, John Kane jrkrid...@inbox.com wrote:

I think David was pointing out that reshape() is not a reshape2 function. It is in the stats package. I am not sure exactly what you are doing, but perhaps something along the lines of

library(reshape2)
mm <- melt(clim.data, id = Cs(yr_frac, yr_mn, AMO, NINO34, SSTA))

is a start? I also don't think that the more recent versions of ggplot2 automatically load reshape2, so it may be that you are working with a relatively old installation of ggplot and reshape?

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C               LC_TIME=en_CA.UTF-8
 [4] LC_COLLATE=en_CA.UTF-8     LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] lubridate_1.2.0    directlabels_2.9   RColorBrewer_1.0-5 gridExtra_0.9.1    stringr_0.6.2
[6] scales_0.2.3       plyr_1.8           reshape2_1.2.1     ggplot2_0.9.3

loaded via a namespace (and not attached):
[1] colorspace_1.2-0 dichromat_1.2-4  digest_0.6.0     gtable_0.1.2     labeling_0.1
[6] MASS_7.3-22      munsell_0.4      proto_0.3-9.2    tools_2.15.2

John Kane
Kingston ON Canada

-----Original Message-----
From: natemille...@gmail.com
Sent: Thu, 13 Dec 2012 09:58:34 -0800
To: dwinsem...@comcast.net
Subject: Re: [R] More efficient use of reshape?

Sorry David, In my attempt to simplify the example and include only the code I felt was necessary, I left out the loading of ggplot2, which imports reshape2 and which was actually used in the code I provided. Sorry for the mistake and my misunderstanding of where the reshape function was coming from. Should have checked that more carefully.

Thanks, Nate

On Thu, Dec 13, 2012 at 9:48 AM, David Winsemius dwinsem...@comcast.net wrote:

On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote:

Hi all, I have played a bit with the reshape package and function along with melt and cast, but I feel I still don't have a good handle on how to use them efficiently. Below I have included an application of reshape that is rather clunky, and I'm hoping someone can offer advice on how to use reshape (or melt/cast) more efficiently.

You do realize that the 'reshape' function is _not_ in the reshape package, right? And also that the reshape package has been superseded by the reshape2 package?

-- David.
#For this example I am using climate change data available on-line
file <- "http://processtrends.com/Files/RClimate_consol_temp_anom_latest.csv"
clim.data <- read.csv(file, header=TRUE)

library(lubridate)
library(reshape)

#I've been playing with the lubridate package a bit to work with dates, but as the climate dataset only uses year and month I have
#added a day to each entry in the yr_mn column and then used dym from lubridate to generate the POSIXlt formatted dates in
#a new column clim.data$date
clim.data$yr_mn <- paste("01", clim.data$yr_mn, sep="")
clim.data$date <- dym(clim.data$yr_mn)

#Now to the reshape. The dataframe is in a wide format. The columns GISS, HAD, NOAA, RSS, and UAH are all different sources
#from which the global temperature anomaly has been calculated since 1880 (actually only 1978 for RSS and UAH). What I would like to
#do is plot the temperature anomaly vs date and use ggplot to facet by the different data source (GISS, HAD, etc.). Thus I need the
#data in long format with a date column, a temperature anomaly column, and a data source column. The code below works, but it's
#really very clunky and I'm sure I am not using these tools as efficiently as I can.

#The varying=list(3:7) specifies the columns in the dataframe that correspond to the sources (GISS, etc.), though then in the resulting
#reshaped dataframe the sources are numbered 1-5, so I have to reassign their names. In addition, the original dataframe has
#additional data columns I do not want, and so after reshaping I create another dataframe with just the columns I need.
Re: [R] More efficient use of reshape?
Sorry David, In my attempt to simplify the example and include only the code I felt was necessary, I left out the loading of ggplot2, which imports reshape2 and which was actually used in the code I provided. Sorry for the mistake and my misunderstanding of where the reshape function was coming from. Should have checked that more carefully.

Thanks, Nate

On Thu, Dec 13, 2012 at 9:48 AM, David Winsemius dwinsem...@comcast.net wrote:

On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote:

Hi all, I have played a bit with the reshape package and function along with melt and cast, but I feel I still don't have a good handle on how to use them efficiently. Below I have included an application of reshape that is rather clunky, and I'm hoping someone can offer advice on how to use reshape (or melt/cast) more efficiently.

You do realize that the 'reshape' function is _not_ in the reshape package, right? And also that the reshape package has been superseded by the reshape2 package?

-- David.

#For this example I am using climate change data available on-line
file <- "http://processtrends.com/Files/RClimate_consol_temp_anom_latest.csv"
clim.data <- read.csv(file, header=TRUE)

library(lubridate)
library(reshape)

#I've been playing with the lubridate package a bit to work with dates, but as the climate dataset only uses year and month I have
#added a day to each entry in the yr_mn column and then used dym from lubridate to generate the POSIXlt formatted dates in
#a new column clim.data$date
clim.data$yr_mn <- paste("01", clim.data$yr_mn, sep="")
clim.data$date <- dym(clim.data$yr_mn)

#Now to the reshape. The dataframe is in a wide format. The columns GISS, HAD, NOAA, RSS, and UAH are all different sources
#from which the global temperature anomaly has been calculated since 1880 (actually only 1978 for RSS and UAH). What I would like to
#do is plot the temperature anomaly vs date and use ggplot to facet by the different data source (GISS, HAD, etc.). Thus I need the
#data in long format with a date column, a temperature anomaly column, and a data source column. The code below works, but it's
#really very clunky and I'm sure I am not using these tools as efficiently as I can.

#The varying=list(3:7) specifies the columns in the dataframe that correspond to the sources (GISS, etc.), though then in the resulting
#reshaped dataframe the sources are numbered 1-5, so I have to reassign their names. In addition, the original dataframe has
#additional data columns I do not want, and so after reshaping I create another dataframe with just the columns I need, and
#then I have to rename them so that I can keep track of what everything is. Whew! Not the most elegant of code.

d <- reshape(clim.data, varying=list(3:7), idvar="date", v.names="anomaly", direction="long")
d$time <- ifelse(d$time==1, "GISS", d$time)
d$time <- ifelse(d$time==2, "HAD",  d$time)
d$time <- ifelse(d$time==3, "NOAA", d$time)
d$time <- ifelse(d$time==4, "RSS",  d$time)
d$time <- ifelse(d$time==5, "UAH",  d$time)
new.data <- data.frame(d$date, d$time, d$anomaly)
names(new.data) <- c("date", "source", "anomaly")

I realize this is a mess, though it works. I think with just some help on how better to work this example I'll probably get over the learning hump and actually figure out how to use these data manipulation functions more cleanly. Any advice or assistance would be appreciated.
Thanks, Nate

-- David Winsemius, MD, Alameda, CA, USA
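[Putting the thread's resolution together, a minimal end-to-end sketch with reshape2; the CSV URL above may no longer resolve, and column names follow the post:

library(reshape2)
library(ggplot2)

# one row per (date, source) pair, anomaly in a single value column
clim_long <- melt(clim.data, id.vars = "date",
                  measure.vars = c("GISS", "HAD", "NOAA", "RSS", "UAH"),
                  variable.name = "source", value.name = "anomaly")

ggplot(clim_long, aes(date, anomaly)) +
  geom_line() +
  facet_wrap(~ source)
]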
[R] Violin plot of categorical/binned data
Hi, I'm trying to create a plot showing the density distribution of some shipping data. I like the look of violin plots, but my data is not continuous but rather binned, and I want to make sure its binned nature (not smooth) is apparent in the final plot.

For example, I have the number of individuals per vessel, but rather than the actual number of individuals I have data in the format of: 7 values of zero, 11 values between 1-10, 6 values between 10-100, 13 values between 100-1000, etc. To plot these data I generated a new dataset with the first 7 values being 0 (representing the 7 values of 0), the next 11 values being 5.5 (representing the 11 values between 1-10), etc. Sample data below.

I can make a violin plot (code below) using a log y-axis, which looks alright (though I do still have to deal with the zeros), but in its default form it hides the fact that these are binned data, which seems a bit misleading. Is it possible to make a violin plot that looks a bit more angular (more corners, less smoothing), or in some way shows the distribution but also clearly shows the true nature of these data? I've tried playing with the bandwidth adjustment and the kernel but haven't been able to get a figure that seems to work. Anyone have some thoughts on this?

Thanks, Nate

library(ggplot2)
library(scales)

p <- ggplot(data2, aes(vessel, values))
p + geom_violin() +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)))

data2 <- read.table(textConnection("
vessel values
rec 0.0e+00
rec 0.0e+00
rec 0.0e+00
rec 0.0e+00
rec 0.0e+00
rec 0.0e+00
rec 0.0e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+04
rec 5.5e+04
rec 5.5e+04
rec 5.5e+05
rec 5.5e+05"), header=TRUE)
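[One way to get the angular, honest-to-the-bins shape (a sketch, not from the thread): skip kernel smoothing entirely and draw mirrored rectangles from the bin counts. Bin edges are recovered from the post's midpoint coding, where 5.5 x 10^k stands for the bin 10^k to 10^(k+1); zeros still need separate handling, as the post notes:

library(ggplot2)
library(plyr)

cnt <- ddply(subset(data2, values > 0), .(vessel, values),
             summarise, n = length(values))
cnt$w    <- cnt$n / max(cnt$n) * 0.4       # half-width of each bar
cnt$xpos <- as.numeric(factor(cnt$vessel))
cnt$ylo  <- 10^floor(log10(cnt$values))    # lower bin edge
cnt$yhi  <- 10^ceiling(log10(cnt$values))  # upper bin edge

ggplot(cnt) +
  geom_rect(aes(xmin = xpos - w, xmax = xpos + w,
                ymin = ylo, ymax = yhi),
            fill = "grey70", colour = "grey30") +
  scale_y_log10() +
  scale_x_continuous(breaks = unique(cnt$xpos),
                     labels = levels(factor(cnt$vessel)))
]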
[R] drop1, 2-way Unbalanced ANOVA
Hi all, I've spent quite a lot of time searching through the help lists and reading about how best to perform a 2-way ANOVA with unbalanced data. I realize this has been covered a great deal, so I was trying to avoid adding yet another entry to the long list considering the use of different SS, etc. Unfortunately, I have come to the point where I feel I have to wade in and see if someone can help me out. Hopefully I'll phrase this properly, and hopefully it will end up requiring only a simple response.

I have an experiment where I have measured a response variable (such as water content) following exposure to two treatments (oxygen content and medium). Oxygen content has three levels (5, 20, 35) and medium has two levels (Air, Water). I am interested in whether water content is different under the two treatments and whether the effect of oxygen content depends upon the medium in which the experiment was conducted (Air or Water). Unfortunately, the design is unbalanced, as some experimental subjects had to be removed from the experiment. I realize that if I just use aov() to perform a two-way ANOVA, the order in which the terms (oxygen content and medium) are entered will give different results because of the sequential SS.

What I have done in the past is use drop1() in conjunction with aov():

drop1(aov(WaterContent ~ Oxygen * Medium, data = data), test = "F")

to see if the interaction term is significant (F, p-value) and if its inclusion improves model fit (AIC). If from this I determine that the interaction term can be removed, the model can be rerun without it and I am able to test for main effects and get F and p-values that I can report in a manuscript. However, if the interaction term is significant and its inclusion is warranted, drop1() only provides me with SS, F, and p-values for the interaction term. Now this is fine, because I do not wish to interpret the main effects in the presence of a significant interaction, but reviewers will request an ANOVA table where I will be asked to report SS, F, and p-values for the other terms. I don't have those, because drop1() only provides them for the highest-order term in the model. How best should I calculate the values that I know I will be asked to provide in a manuscript?

I don't wish to come across as a scientist who is simply a slave to the F and p-values with little regard for the data, the hypotheses, and the actual statistical interpretation. I am interested in doing this right. But I also know that, practically, in the current state of our field, while I focus on statistics that address my hypotheses of interest and can choose not to discuss the main effects in isolation when an interaction exists, I will be asked to provide the ANOVA table with all the degrees of freedom, SS, F-values, p-values, etc. for the entire model, not just the highest-order term.

Can anyone provide advice here? Should I just use the car package and Type III SS with an appropriate contrast and not use the drop1() function, even though I'm really not interested in using the Type III SS and I kind of like drop1()? I am not opposed to Type II SS, but clearly if the interaction is important then Type II SS, which do not consider interactions, are not appropriate. Hopefully this is somewhat clear and doesn't simply sound like a rehashing of the same old ANOVA and SS story. Maybe I should be doing something completely different. I greatly appreciate constructive comments.
Thanks, Nate
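[For the reviewer-table problem specifically, a minimal sketch of the car route the post itself mentions; sum-to-zero contrasts are required for Type III tests to be meaningful, and the data frame name dat is a stand-in:

library(car)
# sum-to-zero contrasts, needed for sensible Type III sums of squares
options(contrasts = c("contr.sum", "contr.poly"))

fit <- lm(WaterContent ~ Oxygen * Medium, data = dat)
Anova(fit, type = 3)   # SS, F, and p for every term, interaction included
]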
[R] ggplot stat_summary (mean_cl_boot)
Hello, This is a pretty simple question, but after spending quite a bit of time looking at Hmisc and using Google, I can't find the answer.

If I use stat_summary(fun.data = mean_cl_boot) in ggplot to generate 95% confidence intervals, how many bootstrap iterations are performed by default? Can this be changed? I would at least like to be able to report the number of bootstrap iterations used to generate the CIs. I haven't been able to find mean_cl_boot as a function itself, or something resembling it, in the Hmisc documentation, but it seems as though Hmisc is wrapped up in stat_summary() and is called to compute mean_cl_boot.

Many thanks for clearing this up for me, Nate
Re: [R] ggplot stat_summary (mean_cl_boot)
Sorry, I didn't realize I was being so obscure. Within ggplot it is possible to use stat_summary() to generate confidence intervals about a mean. One method for generating these CIs assumes normality; the other uses bootstrapping. I am using the second method, which requires code like this:

stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.1, colour = "red")

I've added some extra flourishes to make them look like error bars, alter the width, and specify colour. I would like some details regarding how this bootstrapped CI is calculated. If I type ?mean_cl_boot at the R command line, I get a minimal help file for wrap_hmisc {ggplot2}, described as "wrap up a selection of Hmisc to make it easy to use with stat_summary". I did not mean to suggest that ggplot2 calls Hmisc when I run stat_summary(), but simply that stat_summary() appears to be based upon a selection of Hmisc, hence I went looking in Hmisc to try to find details regarding stat_summary(). I was unsuccessful in this attempt.

I don't believe a great deal of debugging is necessary. I am simply looking for details regarding how mean_cl_boot works (such as the default number of resamplings). If you don't have that information, there is no need to respond.

Thanks for any assistance, Nate

On Wed, Nov 9, 2011 at 1:10 PM, David Winsemius dwinsem...@comcast.net wrote:

On Nov 9, 2011, at 2:59 PM, Nathan Miller wrote:

You seem really, really confused (and you offer very little in the way of context to support debugging efforts). You are referring to ggplot functions. As far as I know there is no connection between the Hmisc and ggplot (or ggplot2) packages. All things change, I know, but Frank just completed switching over to Lattice a couple of years ago.

-- David Winsemius, MD, West Hartford, CT
Re: [R] ggplot stat_summary (mean_cl_boot)
Ok, I got it:

smean.cl.boot(x, conf.int=.95, B=1000, na.rm=TRUE, reps=FALSE)

Looks like it's 1000. Cool.

Thanks for the help, Nate
Re: [R] ggplot stat_summary (mean_cl_boot)
For all those who are interested: to adjust the number of reps in the stat_summary() mean_cl_boot function, simply set B to the number of bootstrap resamples. I set B to 2000 resamplings below.

stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.1, colour = "red", B = 2000)

If you run mean_cl_boot within stat_summary() and ggplot, setting reps=TRUE does not appear to return a vector of the resampled means as an attribute that I could locate anywhere. However, you can run the smean.cl.boot code outside of ggplot:

x <- smean.cl.boot(OsmData$Mean, B=2000, reps=TRUE)
attr(x, "reps")

Thus, outside of ggplot you can use reps=TRUE to check that the resampling is proceeding as you expect, before adding it to the ggplot code. I did some checks setting B=1 and B=5, as well as large numbers, both inside and outside of the ggplot code, to assure myself that my adjustments to B within stat_summary() were actually doing what I thought.

Finally, despite the fact that the Hmisc function is called smean.cl.boot, as David points out, within ggplot and stat_summary you must use mean_cl_boot, without the s before mean. Within ggplot, mean_cl_boot is the correct notation and it does work.

I really like ggplot, but I can agree that it isn't always clear how to get from point A to point B. My hope in writing this out is that someone else might start their own exploration of these issues a little further down the road than I found myself when I started looking into this.

Thanks, Nate

On Wed, Nov 9, 2011 at 1:46 PM, David Winsemius dwinsem...@comcast.net wrote:

On Nov 9, 2011, at 4:35 PM, Nathan Miller wrote:

> I did not mean to suggest that ggplot2 calls Hmisc when I run stat_summary(),

Actually it does.

> I am simply looking for details regarding how mean_cl_boot works.

It doesn't. That is not the right name.

> If you don't have information regarding how it works (such as the default number of resamplings) there is no need to respond.

Hadley's help files in ggplot2 are terse (or the links to outside resources crash my R sessions) to the point of being too frustrating for me to consider using that package, so I don't know if optional parameters can be passed to the Hmisc functions. If they are, then you should set reps=TRUE and then see what happens to the number of reps from the returned object ... if the wrap_hmisc function does happen to catch it.
> x <- rnorm(100)
> smean.cl.boot(x)
       Mean       Lower       Upper
 -0.0211511  -0.2013623   0.1469728
> smean.cl.boot(x, reps=TRUE)
       Mean       Lower       Upper
-0.03465361 -0.21233213  0.15178655
attr(,"reps")
  [1]  0.0283330508 -0.1250784237  0.0744640779  0.1310826601 -0.1373094536
  [6]  0.0629291714  0.0145916070 -0.0860141221  0.0549134451  0.0732892908
[snipped pages of intervening output]
[991]  0.1029922424  0.0613358597 -0.0645577851 -0.1664905503 -0.1249615180
[996] -0.0751783377 -0.0043747455 -0.1155948060 -0.0750075659  0.1244430930

I don't see where the number of reps is returned, but the B setting defaults to 1000.

-- david.
[R] Calculating difference between values in data frame based on separate column
Hi all, Say I have a data frame something like the one below, with different sample vials measured before (B) and after (A) some process, and a value recorded at each measurement point:

vial measure value
1    B       26
1    A       12
2    B       45
2    A       30
3    B       32
3    A       27
4    B       34
4    A       6

Is there an easy means by which I can subtract the after (A) measurement from the before (B) measurement for each vial in this data frame, so I am left with a data frame that contains the vial number and the difference between B and A? I've played around with writing a function and applying it with ddply, but haven't stumbled on a technique that works, though I feel there should be some simple means of doing this that I'm just not seeing.

Thanks for your help, Nate
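[A minimal sketch of the ddply route asked about, assuming exactly one B and one A row per vial as shown; the data frame name dat is a stand-in:

library(plyr)

# per vial, subtract the after value from the before value
ddply(dat, .(vial), summarise,
      diff = value[measure == "B"] - value[measure == "A"])
]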
Re: [R] Applying function with separate dataframe (calibration file) supplying some inputs
Thanks Josh, I appreciate your comments. Merge is quite easy and I guess I should just go that route. I don't have so many data points, so speed isn't really much of an issue; I was just curious whether there was anything more elegant.

Yes, the function is not clear or ideal for troubleshooting. I adapted it from an Excel sheet in an effort to speed my data analysis, and in the Excel file the final calculation was simply a long string of calculations. I appreciate your attempt to make it clearer and to chunk it into fewer, easier-to-read pieces. I plan to go back and do that as well. Thanks again.

Nate

On Wed, Oct 19, 2011 at 10:33 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:

Hi Nathan,

I honestly do not think that anything else will be much better than merging the two datasets. If the datasets are not merged, you essentially have to apply your optode function to each vial, store the results, then combine them all together. This is inefficient. Merging the datasets may be inefficient in a way, but once it is done, your function can be applied to the entire dataset in one step. If you have big data and the merge is slow, take a look at the data.table package. I have recently (not that it was bad before, I just never really knew how much it could do) been quite impressed with it.

As a whole other note, your optode function was quite difficult to read, and I highly doubt you can confidently look at the code and ensure there is not a typo, missed operator, etc. somewhere in that block of formula code. I attempted to clean it up some, though perhaps not with 100% success.

###
optode2 <- function(cal0, T0, cal100, T100, phase, temp) {
  dr <- pi/180
  f1 <- 0.801
  deltaPsiK <- (-0.08)
  deltaKsvK <- 0.000383
  m <- 22.9
  tan0T100 <- tan(((cal0 + deltaPsiK * (T100 - T0))) * dr)
  tan0Tm <- tan((cal0 + (deltaPsiK * (temp - T0))) * dr)
  tan100T100 <- tan(cal100 * dr)
  tanmTm <- tan(phase * dr)
  A <- tan100T100 / tan0T100 / m * 100^2
  B <- tan100T100 / tan0T100 * 100 + tan100T100 / tan0T100 / m * 100 - f1 / m * 100 - 100 + f1 * 100
  C <- tan100T100 / tan0T100 - 1
  KsvT100 <- (-B + (sqrt(B^2 - 4 * A * C))) / (2 * A)
  KsvTm <- KsvT100 + (deltaKsvK * (temp - T100))
  a <- tanmTm / tan0Tm / m * KsvTm^2
  b <- tanmTm / tan0Tm * KsvTm + tanmTm / tan0Tm / m * KsvTm - f1 / m * KsvTm - KsvTm + f1 * KsvTm
  c <- tanmTm / tan0Tm - 1
  tot <- tanmTm / tan0T100
  big <- tot * KsvTm + tanmTm / tan0T100 / m * KsvTm - f1 / m * KsvTm - KsvTm + f1 * KsvTm
  saturation <- (-big + sqrt(big^2 - 4 * (tot / m * KsvTm^2) * (tot - 1))) / (2 * (tot / m * KsvTm^2))
  return(saturation)
}

## Read in your example data
d1 <- read.table(textConnection("
vial cal0 T0 cal100 T100
1 61   18 28   18
2 60.8 18 27.1 18
3 60.2 18 28.3 18
4 59.8 18 27.2 18"), header = TRUE, stringsAsFactors = FALSE)

d2 <- read.table(textConnection("
vial phase temp time
1 31   17.5 10
1 31.5 17.4 20
1 32.8 17.5 30
2 29.0 17.5 10
2 29.7 17.5 20
2 30.9 17.5 30
3 27.1 17.4 10
3 27.6 17.4 20
3 28.1 17.5 30
4 31.0 17.6 10
4 33.3 17.6 20
4 35.6 17.6 30"), header = TRUE, stringsAsFactors = FALSE)
closeAllConnections()

dat <- merge(d1, d2, by = "vial")

## optode wrapper
f <- function(d) optode2(d$cal0, d$T0, d$cal100, d$T100, d$phase, d$temp)
dat$oxygen <- f(dat)
dat
###

Cheers, Josh

On Wed, Oct 19, 2011 at 8:38 PM, Nathan Miller natemille...@gmail.com wrote:

Hello, I am not entirely sure the subject line captures what I am trying to do, but hopefully this description of the problem will help folks see my challenge and hopefully they can offer constructive assistance.
I have an experimental setup where I measure the decrease in oxygen in small vials as an organism, such as an oyster, consumes the oxygen. Each vial is calibrated before the experiment, and these calibrations are used to convert the raw data after the experiment into oxygen values. I end up with two dataframes. One has the calibration data and, for example, could look like this:

vial cal0 T0 cal100 T100
1 61   18 28   18
2 60.8 18 27.1 18
3 60.2 18 28.3 18
4 59.8 18 27.2 18

The second is a data file which could look like this:

vial phase temp time
1 31   17.5 10
1 31.5 17.4 20
1 32.8 17.5 30
2 29.0 17.5 10
2 29.7 17.5 20
2 30.9 17.5 30
3 27.1 17.4 10
3 27.6 17.4 20
3 28.1 17.5 30
4 31.0 17.6 10
4 33.3 17.6 20
4 35.6 17.6 30
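[If merging still feels inelegant, a base-R sketch of the cross-referencing the question asks about: look up each measurement's calibration row by position with match(), using the d1/d2 names from Josh's reply above:

# one calibration row per vial in d1; match() finds, for every row of d2,
# the position of its vial in d1, so the calibration columns line up
idx <- match(d2$vial, d1$vial)
d2$oxygen <- optode2(d1$cal0[idx], d1$T0[idx],
                     d1$cal100[idx], d1$T100[idx],
                     d2$phase, d2$temp)
]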
[R] Applying function with separate dataframe (calibration file) supplying some inputs
Hello,

I am not entirely sure the subject line captures what I am trying to do, but hopefully this description of the problem will help folks see my challenge, and hopefully they can offer constructive assistance.

I have an experimental setup where I measure the decrease in oxygen in small vials as an organism, such as an oyster, consumes the oxygen. Each vial is calibrated before the experiment, and these calibrations are used to convert the raw data after the experiment into oxygen values. I end up with two dataframes. One has the calibration data and, for example, could look like this:

vial cal0 T0 cal100 T100
1 61   18 28   18
2 60.8 18 27.1 18
3 60.2 18 28.3 18
4 59.8 18 27.2 18

The second is a data file which could look like this:

vial phase temp time
1 31   17.5 10
1 31.5 17.4 20
1 32.8 17.5 30
2 29.0 17.5 10
2 29.7 17.5 20
2 30.9 17.5 30
3 27.1 17.4 10
3 27.6 17.4 20
3 28.1 17.5 30
4 31.0 17.6 10
4 33.3 17.6 20
4 35.6 17.6 30

I have a complicated function (included at the bottom) that uses the calibration values and the raw data to calculate actual oxygen levels. It works great, but as it is currently written it requires that each calibration be entered individually. I would rather apply the function based upon the vial number (applying the calibration for vial 1 to all vial 1 data, the calibration for vial 2 to all vial 2 data). I have managed to do this by combining the two dataframes into one that looks like this:

vial phase temp time cal0 T0 cal100 T100
1 31   17.5 10 61   18 28   18
1 31.5 17.4 20 61   18 28   18
1 32.8 17.5 30 61   18 28   18
1 33.6 17.5 40 61   18 28   18
2 29.0 17.5 10 60.8 18 27.1 18
2 29.7 17.5 20 60.8 18 27.1 18
2 30.9 17.5 30 60.8 18 27.1 18
3 27.1 17.4 10 60.2 18 28.3 18
3 27.6 17.4 20 60.2 18 28.3 18
3 28.1 17.5 30 60.2 18 28.3 18
4 31.0 17.6 10 59.8 18 27.2 18
4 33.3 17.6 20 59.8 18 27.2 18
4 35.6 17.6 30 59.8 18 27.2 18

I have then used ddply to apply my function grouped by vial:

oxygen <- ddply(data, .(vial), function(d) optode(d$cal0, d$T0, d$cal100, d$T100, d$phase, d$temp))

This works, but I do not like having to put the calibrations into the same dataframe as the data. Can anyone show me an example of how I could have a function reference a dataframe (like the calibration data) based upon a grouping variable (like vial) and use that data as partial inputs to the function when it is applied to another dataframe (the actual data)? I don't necessarily need an example using my sample data or the function I have included here (as it is rather unwieldy). A simplified example of how to achieve this type of cross-referencing would suffice, and I can apply the principle to my current problem.

Thanks so much, Nate

optode <- function(cal0, T0, cal100, T100, phase, temp)
{
f1=0.801
deltaPsiK=-0.08
deltaKsvK=0.000383
m=22.9
tan0T100=tan(((cal0+deltaPsiK*(T100-T0)))*pi/180)
tan0Tm=tan((cal0+(deltaPsiK*(temp-T0)))*pi/180)
tan100T100=tan(cal100*pi/180)
tanmTm=tan(phase*pi/180)
A=tan100T100/tan0T100*1/m*100^2
B=tan100T100/tan0T100*100+tan100T100/tan0T100*1/m*100-f1*1/m*100-100+f1*100
C=tan100T100/tan0T100-1
KsvT100=(-B+(sqrt(B^2-4*A*C)))/(2*A)
KsvTm=KsvT100+(deltaKsvK*(temp-T100))
a=tanmTm/tan0Tm*1/m*KsvTm^2
b=tanmTm/tan0Tm*KsvTm+tanmTm/tan0Tm*1/m*KsvTm-f1*1/m*KsvTm-KsvTm+f1*KsvTm
c=tanmTm/tan0Tm-1
Re: [R] ggplot2-Issue placing error bars behind data points
Jeff Newmiller and Dennis,

As always, very helpful. I appreciate the comments on how best to organize the code as well. Simplifies things greatly.

Thanks, Nate

On Wed, Sep 7, 2011 at 3:33 PM, Dennis Murphy djmu...@gmail.com wrote:

Hi:

For your test data, try this:

# Result of dput(NerveSurv)
NerveSurv <- structure(list(Time = c(0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 0,
0.25, 0.5, 0.75, 1, 1.25, 1.5, 0, 0.25, 0.5, 0.75, 1, 1.25, 1.5),
SAP = c(1, 1.04, 1.04, 1.06, 1.04, 1.22, 1.01, 1, 1.01, 1.01,
1.06, 1.01, 0.977, 0.959, 1, 1.01, 1.06, 0.921, 0.951, 0.904,
0.911), Temp = c(25L, 25L, 25L, 25L, 25L, 25L, 25L, 15L, 15L,
15L, 15L, 15L, 15L, 15L, 8L, 8L, 8L, 8L, 8L, 8L, 8L), SAPSE = c(0,
0.0412, 0.0935, 0.0818, 0.131, 0.144, 0.0712, 0, 0.0156, 0.0337,
0.0481, 0.0168, 0.00486, 0.0155, 0, 0.00462, 0.0491, 0.0329,
0.0304, 0.0471, 0.0722)), .Names = c("Time", "SAP", "Temp", "SAPSE"),
class = "data.frame", row.names = c(NA, -21L))

limits <- aes(ymax = NerveSurv$SAP + NerveSurv$SAPSE,
              ymin = NerveSurv$SAP - NerveSurv$SAPSE)

library('ggplot2')
p <- ggplot(data = NerveSurv, aes(x = Time, y = SAP))
p + geom_errorbar(limits, width = 0.2) +
  geom_point(aes(fill = factor(Temp)), colour = colors()[173],
             shape = 21, size = 4) +
  xlab("Time (min)") + xlim(c(0, 2)) +
  ylab("Normalized Spontaneous Action Potential Rate") + ylim(c(0, 1.6)) +
  scale_fill_manual("Acclimation\nTemperature", breaks = c(8, 18, 25),
                    labels = c("25 ºC", "18 ºC", "8 ºC"),
                    values = c(colours()[c(173, 253, 218)]))

Notice the following in the above code:

(1) As Jeff Newmiller pointed out, put geom_errorbar() before geom_point().

(2) If you are going to be using the same value of a plot aesthetic, you *set* it outside aes() rather than *map* it inside aes(). See the revised code for geom_point(). The idea is that if a variable is assigned to a plot aesthetic (e.g., color, fill, shape), then it needs to be mapped inside aes(); if an aesthetic is set to a specific value, it is set outside aes().

(3) Your test data had an effective x-range of 0 - 1.5 rather than 0 - 50, so I shrunk xlim() for readability.

(4) You can use \n inside of a character string as a carriage return. See the legend title for scale_fill_manual().

(5) Style note: you want the layer addition operator + to be at the end of a line, not at the beginning. Copy and paste this code verbatim to see what I mean:

p + geom_errorbar(limits, width = 0.2)
  + geom_point(aes(fill = factor(Temp)), colour = colors()[173],
               shape = 21, size = 4)

This happens because the first line is syntactically complete.

HTH, Dennis
[R] ggplot2-Issue placing error bars behind data points
Hi all,

This seems like a basic problem, but no amount of playing with the code has solved it. I have a time-series data set like that shown below (only longer) and am seeking to plot the data with filled, circular points and error bars. I would like the error bars to be behind the points, otherwise they tend to obscure the points (especially when I have a lot of points in the actual data set). Despite providing a fill colour for the points and using shapes that take fill colours, the error bars are always placed on top of the points.

Can anyone see the error I am making? I simply want to move the error bars so they are behind the data points. Thanks for your help. I assume it's simple... or else it's a bug.

Nate

Time SAP      Temp SAPSE
0.00 1.00     25   0.00
0.25 1.04     25   0.041200
0.50 1.04     25   0.093500
0.75 1.06     25   0.081800
1.00 1.04     25   0.131000
1.25 1.22     25   0.144000
1.50 1.01     25   0.071200
0.00 1.00     15   0.00
0.25 1.01     15   0.015600
0.50 1.01     15   0.033700
0.75 1.06     15   0.048100
1.00 1.01     15   0.016800
1.25 0.977000 15   0.004860
1.50 0.959000 15   0.015500
0.00 1.00     8    0.00
0.25 1.01     8    0.004620
0.50 1.06     8    0.049100
0.75 0.921000 8    0.032900
1.00 0.951000 8    0.030400
1.25 0.904000 8    0.047100
1.50 0.911000 8    0.072200

limits <- aes(ymax = NerveSurv$SAP + NerveSurv$SAPSE,
              ymin = NerveSurv$SAP - NerveSurv$SAPSE)

p <- ggplot(data = NerveSurv, aes(x = Time, y = SAP))
p + geom_point(aes(colour = factor(Temp), shape = factor(Temp),
                   fill = factor(Temp)), size = 4) +
  geom_errorbar(limits, width = 0.2) +
  xlab("Time (min)") + xlim(c(0, 50)) +
  ylab("Normalized Spontaneous Action Potential Rate") + ylim(c(0, 1.6)) +
  scale_shape_manual("Acclimation Temperature", breaks = c(8, 18, 25),
                     labels = c("25 ºC", "18 ºC", "8 ºC"), values = c(21, 21, 21)) +
  scale_fill_manual("Acclimation Temperature", breaks = c(8, 18, 25),
                    labels = c("25 ºC", "18 ºC", "8 ºC"),
                    values = c(colours()[c(173, 253, 218)])) +
  scale_colour_manual("Acclimation Temperature", breaks = c(8, 18, 25),
                      labels = c("25 ºC", "18 ºC", "8 ºC"),
                      values = c(colours()[c(173, 173, 173)]))
Re: [R] Linear Regression with 2 grouping variables
Thanks Dennis! Worked perfectly. I keep forgetting that plyr can split data based on multiple subsetting variables.

Thanks so much, Nate

On Mon, Aug 22, 2011 at 10:12 PM, Dennis Murphy djmu...@gmail.com wrote:

Hi:

You're kind of on the right track, but there is no conditioning formula in lm(); it's not lattice :) This is relatively easy to do with the plyr package, though:

library('plyr')

# Generate a list of models - the subsetting variables (Site, Vial) are
# used to generate the data splits and the function is run on a generic
# data split d, assumed to be a data frame.
mlist <- dlply(feed1, .(Site, Vial), function(d) lm(lnRFU ~ Time, data = d))

# To get the set of model coefficients, take the list mlist as the input
# data (argument 1) and then for each generic component 'm', extract its
# coefficients:
ldply(mlist, function(m) coef(m))

For your test data, only Vial varies - the example code below reflects that:

> mlist <- dlply(feed1, .(Vial), function(d) lm(lnRFU ~ Time, data = d))
> length(mlist)  # three component model objects
[1] 3
> ldply(mlist, function(m) coef(m))
  Vial (Intercept)         Time
1    1    10.75440 -0.001508621
2    2    10.83171 -0.005095100
3    3    10.81087 -0.004897600

This idea can be generalized: if you want to pull out similar pieces of output from each model, run ldply() on the list of models and create a utility function that outputs, for a generic model object m, what you want to have returned. Common choices include R^2 values, tables of coefficients (as lists instead of data frames), or residuals and predicted values. The game is to write the function so that it takes a [list] model object (here, m) as input and a data frame as output. You can also extract output from summary(m) in a similar way, using m as the input object.

HTH, Dennis
I'm trying to imagine a means of accomplishing the same type of thing that facet_grid does when it allows you to plot the data as a function of two grouping variables. Thanks for your time. I greatly appreciate it.

Nate Miller
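For readers landing on this thread: a small utility in the spirit of Dennis's generalization paragraph might look like the sketch below. This is my illustration, not code from the thread; it assumes the feed1 data frame described above, and the helper name fit_stats is made up.

library(plyr)

# Fit one lm() per Site/Vial combination, as in Dennis's reply.
mlist <- dlply(feed1, .(Site, Vial), function(d) lm(lnRFU ~ Time, data = d))

# For each fitted model m, return a one-row data frame with the slope
# on Time and the R^2; ldply() stitches the rows back together and
# keeps the Site and Vial identifiers from the split.
fit_stats <- ldply(mlist, function(m) {
  data.frame(slope     = coef(m)[["Time"]],
             r.squared = summary(m)$r.squared)
})
fit_stats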
[R] Linear Regression with 2 grouping variables
Hi all,

I have a data set that looks a bit like this.

feed1
     RFU Site Vial Time     lnRFU
1      8    1    1   10 10.702075
2  47521    1    1   20 10.768927
3  42905    1    1   30 10.66674
4  46867    1    1   40 10.755069
5  42995    1    1   50 10.668839
6  43074    1    1   60 10.670675
7  41195    1    1   70 10.626072
8  47090    1    2   10 10.759816
9  48100    1    2   20 10.781037
10 43215    1    2   30 10.673943
11 39656    1    2   40 10.587998
12 38799    1    2   50 10.566150
13 38424    1    2   60 10.556438
14 35240    1    2   70 10.469937
15 46427    1    3   10 10.745636
16 46418    1    3   20 10.745443
17 42095    1    3   30 10.647684
..

There are 5 columns of data, three levels of Site, 10 Vials per site, and measurements were taken at 10 min intervals from 10-70. I am primarily interested in the relationship between Time and lnRFU, to calculate the rate at which lnRFU declines over time. I have a nice plot using ggplot2 code that looks like this:

p <- ggplot(data=feed1, aes(x=Time, y=lnRFU))
p + geom_point(size=4) + facet_grid(Site~Vial) + geom_smooth(method=lm)

The graph is useful to visualize the changes over time, grouped by both Site and Vial, but I also need the slopes of the linear regressions for each Vial within a Site. This is where I run into a problem. I want to run a linear regression of lnRFU as a function of Time, grouped by both Site and Vial. It's easy to visualize this comparison in ggplot using facet_grid(), but I'm not sure how to do a similar comparison/analysis within lm(). I imagine something like

fit <- lm(lnRFU ~ Time | Vial * Site, data=feed1)

in which I group by both Vial and Site, but obviously this code doesn't work. Does anyone have an idea for how to do a linear regression with two grouping variables? Do I have to go back and combine Vial and Site into a single grouping variable, or can I leave the dataframe the way it is? I'm trying to imagine a means of accomplishing the same type of thing that facet_grid does when it allows you to plot the data as a function of two grouping variables. Thanks for your time. I greatly appreciate it.

Nate Miller
[R] ggplot boxplot
Hello,

I am trying to generate a set of boxplots using ggplot from the following data, which has 4 columns (Day, Site, VO2, Cruise):

AllCorbulaMR
    Day Site          VO2 Cruise
1     1    1 148.43632670      1
2     1    1  61.73864969      1
3     1    1  92.64536096      1
4     1    1  73.35434957      1
5     1    1  69.85568810      1
6     1    1  98.71116866      1
7     1    1  67.57880107      1
8     1    1  80.57959160      1
9     1    1  53.38577137      1
10    1    2  81.08429594      1
11    1    2  79.73019687      1
12    1    2  67.93991806      1
13    1    2  50.69929558      1
14    1    2  42.02457680      1
15    1    2  64.10049924      1
16    1    2  80.02264095      1
17    1    2  67.14828804      1
18    1    2  93.33363743      1
19    1    3  53.86021985      1
20    1    3  50.53366868      1
21    1    3  52.12437086      1
22    1    3  43.44618922      1
23    1    3  64.64322840      1
24    1    3  55.03761768      1
25    1    3  67.79501374      1
26    1    4  12.70806068      1
27    1    4 114.56401960      1
28    1    4  34.34450695      1
29    1    4  76.70849935      1
30    1    4  68.99752863      1
31    1    4  71.23080332      1
32    1    1   0.08222308      2
33    1    1           NA      2
34    1    1   0.03743258      2
35    1    1   0.04496363      2
36    1    1   0.07184903      2
37    1    1   0.05637676      2
38    1    1   0.05163886      2
39    1    1   0.03022606      2
40    1    1   0.04150667      2
41    2    1   0.04982530      2
42    2    1   0.05248479      2
43    2    1   0.03839707      2
44    2    1   0.04283591      2
45    2    1   0.03285247      2
46    2    1   0.03965853      2
47    2    1           NA      2
48    2    1   0.03637822      2
49    2    1   0.03686663      2
50    1    2   0.04086229      2
51    1    2           NA      2
52    1    2   0.01891389      2
53    1    2   0.03365864      2
54    1    2   0.04179611      2
55    1    2   0.04675111      2
56    1    2   0.04734616      2
57    1    2   0.04046907      2
58    1    2   0.03395499      2
59    2    2   0.02104620      2
60    2    2           NA      2
61    2    2           NA      2
62    2    2           NA      2
63    2    2   0.01882796      2
64    2    2           NA      2
65    2    2           NA      2
66    2    2   0.02328894      2
67    2    2   0.02635327      2
68    1    3   0.06030056      2
69    1    3   0.0472         2
70    1    3   0.04307900      2
71    1    3   0.05144241      2
72    1    3   0.03223973      2
73    1    3   0.05145292      2
74    1    3   0.02718536      2
75    1    3   0.02830348      2
76    1    3   0.05859836      2
77    2    3   0.04521778      2
78    2    3   0.03242385      2
79    2    3   0.03412688      2
80    2    3   0.04407171      2
81    2    3   0.04517834      2
82    2    3           NA      2
83    2    3   0.02745407      2
84    2    3   0.03118602      2
85    2    3   0.04420074      2
86    1    4   0.05352334      2
87    1    4   0.05378120      2
88    1    4   0.04394838      2
89    1    4   0.02597939      2
90    1    4   0.05476946      2
91    1    4   0.04371743      2
92    2    4   0.04022729      2
93    2    4   0.04078509      2
94    2    4   0.04911994      2
95    2    4   0.04068468      2
96    2    4           NA      2
97    2    4           NA      2
98    1    1   0.08892223      3
99    1    1   0.08617873      3
100   1    1   0.06108950      3
101   1    1   0.19047922      3
102   1    1   0.09865930      3
103   1    1   0.12103549      3
104   1    1   0.06788404      3
105   1    1   0.12629497      3
106   1    1   0.10947173      3
107   1    1   0.11381467      3
108   2    1   0.07809781      3
109   2    1   0.04397586      3
110   2    1   0.06317635      3
111   2    1   0.02020365      3
112   2    1   0.09525985      3
113   2    1   0.04732347      3
114   2    1   0.03043341      3
115   2    1   0.04419395      3
116   2    1           NA      3
117   2    1           NA      3
118   1    2   0.03380003      3
119   1    2   0.02600926      3
120   1    2   0.03980552      3
121   1    2   0.03659985      3
122   1    2   0.04867881      3
123   1    2   0.03694679      3
124   1    2   0.03372825      3
125   1    2   0.03644750      3
126   1    2   0.01497611      3
127   1    2   0.02697976      3
128   2    2   0.03136923      3
129   2    2   0.03602215      3
130   2    2   0.04000660      3
131   2    2   0.03673098      3
132   2    2   0.03090854      3
133   2    2   0.04877643      3
134   2    2   0.02468537      3
135   2    2           NA      3
136   2    2           NA      3
137   2    2           NA      3
138   1    3   0.04809866      3
139   1    3   0.02380070      3
140   1    3   0.03672271      3
141   1    3   0.03232115      3
142   1    3   0.02950701      3
143   1    3   0.05068163      3
144   1    3   0.03004234      3
145   1    3   0.03090461      3
146   1    3   0.04539888      3
147
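The message above is cut off before the actual question, so what follows is only a guess at the intent. A minimal boxplot sketch for data of this shape, assuming the data frame is named AllCorbulaMR as printed and that one panel per Cruise is wanted (the free y scales are my addition, since VO2 on cruise 1 is orders of magnitude larger than on the later cruises):

library(ggplot2)

# One box per Site, filled by Day, one facet per Cruise.
# NA values of VO2 are dropped by geom_boxplot() with a warning.
p <- ggplot(AllCorbulaMR, aes(x = factor(Site), y = VO2, fill = factor(Day)))
p + geom_boxplot() +
  facet_wrap(~ Cruise, scales = "free_y") +
  xlab("Site") +
  labs(fill = "Day")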
[R] Error in combined for() and if() code
Hello,

I am trying to filter a data set like the one below so that the peaks in the Phase value are more obvious and can be identified by a peak-finding function, following the useful advice of Carl Witthoft. I have written the following:

for(i in length(data$Phase)){
  newphase = if(abs(data$Phase[i+1] - data$Phase[i]) > 6){
    data$Phase[i+1]
  } else {
    data$Phase[i]
  }
}

I get the following error, which I have not seen before, when I paste the code into R:

Error in if (abs(data$Phase[i + 1] - data$Phase[i]) > 6) { :
  missing value where TRUE/FALSE needed

I don't have much experience with such loops, as I have tried to avoid using them in the past. Can anyone identify the error(s) in the code I have written, or suggest a simpler means of writing such a filter?

Thank you,
Nate

data=
    Time Phase
1  0.000 15.18
2  0.017 13.42
3  0.034 11.40
4  0.051 18.31
5  0.068 25.23
6  0.085 33.92
7  0.102 42.86
8  0.119 42.87
9  0.136 42.88
10 0.153 42.88
11 0.170 42.87
12 0.186 42.88
13 0.203 42.88
14 0.220 42.78
15 0.237 33.50
16 0.254 24.81
17 0.271 17.20
18 0.288 10.39
19 0.305 13.97
20 0.322 16.48
21 0.339 14.75
22 0.356 20.80
23 0.373 25.79
24 0.390 31.25
25 0.407 39.89
26 0.423 40.04
27 0.440 40.05
28 0.457 40.05
29 0.474 40.05
30 0.491 40.05
31 0.508 40.06
32 0.525 40.07
33 0.542 32.23
34 0.559 23.90
35 0.576 17.86
36 0.592 11.63
37 0.609 12.78
38 0.626 13.12
39 0.643 10.93
40 0.660 10.63
41 0.677 10.82
42 0.694 11.84
43 0.711 20.44
44 0.728 27.33
45 0.745 34.22
46 0.762 41.55
47 0.779 41.55
48 0.796 41.55
49 0.813 41.53
50 0.830 41.53
51 0.847 41.52
52 0.864 41.52
53 0.880 41.53
54 0.897 41.53
55 0.914 33.07
56 0.931 25.12
57 0.948 19.25
58 0.965 11.30
59 0.982 12.48
60 0.999 13.85
61 1.016 13.62
62 1.033 12.62
63 1.050 19.39
64 1.067 25.48
65 1.084 31.06
66 1.101 39.49
67 1.118 39.48
68 1.135 39.46
69 1.152 39.45
70 1.169 39.43
71 1.185 39.42
72 1.202 39.42
73 1.219 39.41
74 1.236 39.41
75 1.253 37.39
76 1.270 29.03
77 1.287 20.61
78 1.304 14.07
79 1.321  9.12
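Two things in the loop above are worth flagging, since together they explain the error. First, for(i in length(data$Phase)) loops over the single value 79 rather than the sequence 1:length(data$Phase). Second, at the last index data$Phase[i + 1] is NA, and if() cannot test an NA, which is exactly the "missing value where TRUE/FALSE needed" message. A sketch of a corrected version (my reconstruction, not code from the thread; the jump threshold of 6 is taken from the post):

# Loop version: stop one short of the end so Phase[i + 1] always exists.
n <- length(data$Phase)
newphase <- data$Phase
for (i in 1:(n - 1)) {
  if (abs(data$Phase[i + 1] - data$Phase[i]) > 6) {
    newphase[i] <- data$Phase[i + 1]
  }
}

# Equivalent vectorized form, avoiding the loop entirely.
jump <- abs(diff(data$Phase)) > 6
newphase2 <- ifelse(c(jump, FALSE), c(data$Phase[-1], NA), data$Phase)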
Re: [R] Error in combined for() and if() code
Hi all,

I haven't solved the problem of filtering the data, but I have managed to find all the peaks in the data, despite their relatively flat nature, using peaks() in the IDPmisc package. It works really well for my data, and the ability to set a lower threshold for the peaks to report is convenient as well. Maybe I'll come back to the data-filtering problem later.

Thanks for your help and comments,
Nate

On Tue, Dec 28, 2010 at 10:49 AM, David Winsemius dwinsem...@comcast.net wrote:

On Dec 28, 2010, at 1:08 PM, Nathan Miller wrote:

Hello, I am trying to filter a data set like below so that the peaks in the Phase value are more obvious and can be identified by a peak-finding function, following the useful advice of Carl Witthoft. I have written the following:

for(i in length(data$Phase)){
  newphase = if(abs(data$Phase[i+1] - data$Phase[i]) > 6){
    data$Phase[i+1]
  } else {
    data$Phase[i]
  }
}

I get the following error, which I have not seen before, when I paste the code into R:

Error in if (abs(data$Phase[i + 1] - data$Phase[i]) > 6) { :
  missing value where TRUE/FALSE needed

I don't have much experience with such loops, as I have tried to avoid using them in the past. Can anyone identify the error(s) in the code I have written, or a simpler means of writing such a filter?

Sometimes it's more informative to look at the data first. Here's a plot of the data with the first and second differences underneath:

plot(data, ylim=c(-5, max(data$Phase)))
lines(data$Time[-1], diff(data$Phase))
lines(data$Time[-(1:2)], diff(diff(data$Phase)), col="red")

Your data had rather flat-topped maxima. These maxima are defined by the interval between the times when the first differences are zero (OR go from positive to negative) AND the second differences are negative (OR zero). There is a package on CRAN:

http://cran.r-project.org/web/packages/msProcess/index.html

that purports to do peak finding. I would think the local maxima in your data might need some filtering, and presumably the mass-spec people have need of that too.

Thank you,
Nate
David Winsemius, MD
West Hartford, CT
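A sketch of the rule David describes, locating flat tops where the first differences are near zero and the second differences are non-positive. This is my reconstruction under those definitions; the 0.2 flatness tolerance is an assumption chosen by eye, not a value from the thread:

d1 <- diff(data$Phase)    # first differences
d2 <- diff(d1)            # second differences

# Interior points k = 2..(n-1): flat to the right and not curving upward.
cand <- which(abs(d1[-1]) < 0.2 & d2 <= 0) + 1
data[cand, ]              # rows sitting on (or entering) a flat top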
[R] Finding flat-topped peaks in simple data set
Hello,

Thank you to all those great folks that have helped me in the past (especially Dennis Murphy). I have a new challenge. I often generate time-series data sets that look like the one below, with a variable (Phase) which has a series of flat-topped peaks (sample data below with 5 peaks). I would like to calculate the Phase value for each peak. It would be great to calculate the mean value for the peak (possibly using rollmean()), but given the low variability within the broad peak, even just picking one value from the peak would be sufficient.

I have tried using peaks() from the simecol package,

peaks(data$Time, data$phase, model=max)

but because of the flat nature of the peaks, and the fact that peaks() looks for values with lower neighbors, I am unable to find all the peaks (especially the one between time 0.762-0.897). For all my data files the maximum Phase values will fall between 27 and 62, so adding a function which removes the values below 27 is reasonable, and is a technique I have used to eliminate some of the noise between the peaks.

data=
    Time Phase
1  0.000 15.18
2  0.017 13.42
3  0.034 11.40
4  0.051 18.31
5  0.068 25.23
6  0.085 33.92
7  0.102 42.86
8  0.119 42.87
9  0.136 42.88
10 0.153 42.88
11 0.170 42.87
12 0.186 42.88
13 0.203 42.88
14 0.220 42.78
15 0.237 33.50
16 0.254 24.81
17 0.271 17.20
18 0.288 10.39
19 0.305 13.97
20 0.322 16.48
21 0.339 14.75
22 0.356 20.80
23 0.373 25.79
24 0.390 31.25
25 0.407 39.89
26 0.423 40.04
27 0.440 40.05
28 0.457 40.05
29 0.474 40.05
30 0.491 40.05
31 0.508 40.06
32 0.525 40.07
33 0.542 32.23
34 0.559 23.90
35 0.576 17.86
36 0.592 11.63
37 0.609 12.78
38 0.626 13.12
39 0.643 10.93
40 0.660 10.63
41 0.677 10.82
42 0.694 11.84
43 0.711 20.44
44 0.728 27.33
45 0.745 34.22
46 0.762 41.55
47 0.779 41.55
48 0.796 41.55
49 0.813 41.53
50 0.830 41.53
51 0.847 41.52
52 0.864 41.52
53 0.880 41.53
54 0.897 41.53
55 0.914 33.07
56 0.931 25.12
57 0.948 19.25
58 0.965 11.30
59 0.982 12.48
60 0.999 13.85
61 1.016 13.62
62 1.033 12.62
63 1.050 19.39
64 1.067 25.48
65 1.084 31.06
66 1.101 39.49
67 1.118 39.48
68 1.135 39.46
69 1.152 39.45
70 1.169 39.43
71 1.185 39.42
72 1.202 39.42
73 1.219 39.41
74 1.236 39.41
75 1.253 37.39
76 1.270 29.03
77 1.287 20.61
78 1.304 14.07
79 1.321  9.12

I have also tried using the peak-finding code from Prof. Ripley, but it doesn't seem able to find all the peaks either, even if I widen the span.

peaks <- function(series, span=3)
{
  z <- embed(series, span)
  s <- span %/% 2
  v <- max.col(z) == 1 + s
  result <- c(rep(FALSE, s), v)
  result <- result[1:(length(result) - s)]
  result
}

Can anyone offer some advice on how to find a series of peaks in a data file when the peaks are flat-topped and actually less peaked?

Thanks so much!
Nate
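Since every plateau in these traces sits above 27 while the troughs sit well below it, one simple alternative is to threshold the series and take one summary value per run of above-threshold points. A minimal sketch under that assumption (base R only; the cutoff of 27 comes from the post):

above <- data$Phase > 27        # plateaus are the runs of TRUE
r     <- rle(above)
end   <- cumsum(r$lengths)
start <- end - r$lengths + 1

# Keep only the TRUE runs: one row per flat-topped peak.
runs <- data.frame(start = start[r$values], end = end[r$values])
runs$mean_phase <- mapply(function(s, e) mean(data$Phase[s:e]),
                          runs$start, runs$end)
runs$mid_time <- data$Time[floor((runs$start + runs$end) / 2)]
runs

Note that the shoulders of each ramp (values just above 27) fall into the runs, so trimming a point or two from each end of a run, or raising the cutoff, would sharpen the means.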
[R] ggplot missing fill colours in boxplot legend
Hello,

I am trying to create a series of boxplots using ggplot with the following data: three columns, Day (1 or 2), Site (1-4), and VO2 (some values missing for some Sites or Days).

CorbulaMR3
   Day Site       VO2
1    1    1  88.92223
2    1    1  86.17873
3    1    1  61.08950
4    1    1 190.47922
5    1    1  98.65930
6    1    1 121.03549
7    1    1  67.88404
8    1    1 126.29497
9    1    1 109.47173
10   1    1 113.81467
11   2    1  78.09781
12   2    1  43.97586
13   2    1  63.17635
14   2    1  20.20365
15   2    1  95.25985
16   2    1  47.32347
17   2    1  30.43341
18   2    1  44.19395
19   2    1        NA
20   2    1        NA
21   1    2  33.80003
22   1    2  26.00926
23   1    2  39.80552
24   1    2  36.59985
25   1    2  48.67881
26   1    2  36.94679
27   1    2  33.72825
28   1    2  36.44749
29   1    2  14.97611
30   1    2  26.97976
31   2    2  31.36923
32   2    2  36.02215
33   2    2  40.00660
34   2    2  36.73098
35   2    2  30.90854
36   2    2  48.77643
37   2    2  24.68537
38   2    2        NA
39   2    2        NA
40   2    2        NA
41   1    3  48.09866
42   1    3  23.80070
43   1    3  36.72271
44   1    3  32.32115
45   1    3  29.50701
46   1    3  50.68163
47   1    3  30.04234
48   1    3  30.90461
49   1    3  45.39888
50   1    3  32.61571
51   2    3  20.69708
52   2    3  21.17658
53   2    3  32.44907
54   2    3  34.04048
55   2    3  45.97381
56   2    3  40.34278
57   2    3  21.67128
58   2    3  22.45179
59   2    3        NA
60   2    3        NA
61   1    4  39.35840
62   1    4  29.32294
63   1    4  49.28409
64   1    4  50.75344
65   1    4  43.53663
66   1    4  33.76173
67   1    4  36.40901
68   1    4  36.16992
69   1    4  49.99144
70   1    4        NA
71   2    4  28.64131
72   2    4  22.69317
73   2    4  42.31203
74   2    4  31.17968
75   2    4  59.36813
76   2    4  44.53866
77   2    4  45.96834
78   2    4  33.16604
79   2    4  35.57714
80   2    4  35.46922

I have the following code, which works well except that in the legend the fill colour for Day 1 and Day 2 is left blank. Instead, the little boxes next to the labels (Day 1, Day 2) are both unfilled (white). The boxplots are filled properly, but without the fill colour in the legend it's difficult to decipher the plot.

p = ggplot(CorbulaMR3, aes(factor(Site), VO2))
p + geom_boxplot(aes(fill=factor(Day))) +
  scale_fill_manual('Day', breaks = c('Day 1', 'Day 2'),
                    values = c('grey', 'black'),
                    labels = c('Day 1', 'Day 2')) +
  xlab('Sampling Site') + ylab('Metabolic Rate') + opts(title="Sampling")

I've played around, and if I don't use scale_fill_manual() and instead simply plot using p + geom_boxplot(aes(fill=factor(Day))), the legend boxes are filled, but obviously the plot/legend is not labeled in the manner I would like, and I don't much care for the default colours. Any thoughts on why, when I use scale_fill_manual(), the fill colours are not shown in the legend? I'm pretty sure it's something simple. I'm working on a Mac with R 2.12.0.

Thank you,
Nate
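The usual culprit here, and my best guess at the fix: in scale_fill_manual(), breaks must name the actual levels of the fill variable, which are "1" and "2" after factor(Day), while labels supplies the display text. Asking for breaks 'Day 1' and 'Day 2', which do not exist as levels, leaves the legend keys without a matching fill. A hedged sketch of the corrected call:

p <- ggplot(CorbulaMR3, aes(factor(Site), VO2))
p + geom_boxplot(aes(fill = factor(Day))) +
  scale_fill_manual("Day",
                    values = c("1" = "grey", "2" = "black"),
                    labels = c("Day 1", "Day 2")) +
  xlab("Sampling Site") + ylab("Metabolic Rate")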
[R] Wait for user input with readline()
Hello,

I am trying to write a script that includes a prompt for user input using readline(). I am running into the problem that when I run readline() as a single line the prompt works perfectly, but when I try to run a block of code which includes the readline() call, the script doesn't wait for the user input. I have seen this question posted before when I did a search, but I didn't find a suitable answer. Is there a means of ensuring that the script does not proceed until a value has been entered to readline()? Can I put readline() in a function that will wait for input? Are there other options for getting user input that require the script to wait for a response?

Thanks for your help,
Nate
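One widely used workaround, sketched here with a made-up function name: readline() returns an empty string immediately when R is not running interactively (for example under Rscript, or when a script is piped in), so a small wrapper that falls back to reading a line from stdin usually behaves as wanted:

ask <- function(prompt_text) {
  if (interactive()) {
    readline(prompt_text)          # pauses as expected in an interactive session
  } else {
    cat(prompt_text)               # print the prompt ourselves
    readLines(con = "stdin", n = 1)
  }
}

threshold <- as.numeric(ask("Enter a threshold value: "))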
Re: [R] Batch Processing Files
Dennis and all,

Thank you for the help as I try to get this method for importing and batch-processing files organized. I currently have this set up to import data from two files in my working directory. Var1 specifies data from file 1 and file 2.

filenames = list.files()
library(plyr)
import.list = adply(filenames, 1, read.csv)
import.list
   Var1 Time O2_conc Chla_conc
1     1    0   273.5       300
2     1   10   268.1       240
3     1   20   262.8       210
4     1   30   257.4       175
5     1   40   252.0       155
6     1   50   246.6       138
7     1   60   241.3       129
8     1   70   235.9       121
9     1   80   230.5       117
10    2    0   270.0       300
11    2   10   269.0       240
12    2   20   257.0       210
13    2   30   259.0       175
14    2   40   230.0       155
15    2   50   220.0       138
16    2   60   221.0       129
17    2   70   450.0       121
18    2   80   250.0       117

For my calculation I am only interested in column 3, so I used split() to pull out this column based upon the value of Var1.

data = split(import.list[,3], import.list$Var1)
data
$`1`
[1] 273.5 268.1 262.8 257.4 252.0 246.6 241.3 235.9 230.5
$`2`
[1] 270 269 257 259 230 220 221 450 250

Finally, I want to find the minimum value in column 3 (for each file) and the three previous data points, and then calculate the mean of these four values.

a = which.min(data$`1`)
b = which.min(data$`1`) - 3
c = c(a:b)
d = mean(data$`1`[c])
d
[1] 238.575

So far this works, but this last set of calculations (the min, mean) is not very automated and is probably a bit cumbersome. Rather than having to replace a = which.min(data$`1`) with a = which.min(data$`2`) to do this calculation on the second file, I would like to write a loop, or use some R package with looping abilities, to cycle through this (I have more than 2 files; this is just a test set). I'm imagining pseudocode something like this:

for (i in 1:20) {
  a[i] = which.min(data[[i]])
  b[i] = which.min(data[[i]]) - 3
  c[i] = c(a[i]:b[i])
  etc ...

...though this particular code is most certainly too cumbersome. Any thoughts on how to make a loop so that I can cycle through the files and run these calculations?

Thank you,
Nate

On Mon, Nov 15, 2010 at 9:12 PM, Dennis Murphy djmu...@gmail.com wrote:

Hi:

See inline.

On Mon, Nov 15, 2010 at 4:26 PM, Nate Miller natemille...@gmail.com wrote:

Hi All!

I have some experience with R, but less experience writing scripts using R, and have run into a challenge that I hope someone can help me with.

I have multiple .csv files of data, each with the same 3 columns of data but potentially of varying lengths (some data files are from short measurements, others from longer ones). One file, for example, might look like this...

Time, O2_conc, Chla_conc
0, 270, 300
10, 260, 280
20, 245, 268
30, 233, 238
40, 222, 212
50, 215, 201
60, 208, 193
70, 206, 191
80, 207, 189
90, 206, 186
100, 206, 183
110, 207, 178
120, 205, 174
130, 240, 171
140, 270, 155

I am looking for an efficient means of batch (or sequentially) processing these files so that I can

1. import each data file
2. find the minimum value recorded in column 2 and the previous 5 data points

Don't know what you mean by the 'previous 5 data points' ... are you referring to a rolling minimum?

3. and average these 10 values to get a mean, minimum value.

If the surmise above is correct, you should get 11 rolling mins for a vector of length 15.
Here's an example using the rollapply() function from the zoo package:

library(zoo)
x <- rpois(15, 10)
x
 [1] 17 12 17  9  8 10  7 11 15 11 11 15  5  9 12
rollapply(zoo(x), 5, FUN = min)
 3  4  5  6  7  8  9 10 11 12 13
 8  8  7  7  7  7  7 11  5  5  5
mean(rollapply(zoo(x), 5, FUN = min))
[1] 7

Currently I have imported the data files using the following

filenames = list.files()
library(plyr)
import.list = adply(filenames, 1, read.csv)

This seems to be a reasonable approach. Does the result keep a column for the file names?

and I know how to write code to calculate the minimum value and the 5 preceding values in a single column, in a single file. I think the problem I am running into is scaling this code up so that I can import multiple files and calculate the mean, minimum value for the 2nd column in each of them.

As long as you have an indicator for each file, this should be pretty straightforward. Write a function that produces the summaries for one data frame and then do something like

ddply(slurpedFiles, .(dsIndicator), myfunction)

to map the function to all of them. The data.table package is an alternative, where you should be able to do similar things using the data set names as a key. data.table 'thinks' more like SQL, but it can be very efficient. You should be able to do what you're asking for with either package.

HTH,
Dennis

Can anyone offer some advice on how to batch processes
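To close the loop on the pseudocode earlier in the thread, here is one way to let plyr do the per-file iteration instead of indexing list elements by name. This is a sketch that assumes the Var1 file indicator and O2_conc column shown above; min_window_mean is a made-up helper name:

library(plyr)

# Mean of the minimum O2_conc and the three points preceding it,
# computed separately for each input file (identified by Var1).
min_window_mean <- function(d) {
  i   <- which.min(d$O2_conc)
  idx <- max(1, i - 3):i    # guard against a minimum in the first three rows
  data.frame(min_mean = mean(d$O2_conc[idx]))
}

filenames   <- list.files(pattern = "\\.csv$")
import.list <- adply(filenames, 1, read.csv)
ddply(import.list, .(Var1), min_window_mean)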