Re: [R] Determining maximum hourly slope per day

2013-03-14 Thread Nathan Miller
Thanks for your example Peter.

It does seem like one means of getting the job done and if I'm unable to
figure out a means with zoo::rollapply I may follow your example.

Hopefully someone else can chime in.

Thanks,
Nate

---
Nathan A. Miller

Romberg Tiburon Center
San Francisco State University
3150 Paradise Dr
Tiburon, CA 94920

www.nate-miller.org


On Wed, Mar 13, 2013 at 4:34 PM, Peter Ehlers ehl...@ucalgary.ca wrote:

 On 2013-03-12 17:10, Nathan Miller wrote:

 Hello,

 I have a challenge!

 I have a large dataset with three columns, date,temp, location.
 date is in the format %m/%d/%y %H:%M, with a temp recorded every 10
 minutes. These are surface temperatures and so they fluctuate during
 the day, heating up and then cooling down, so the data is a series of peaks
 and troughs. I would like to develop a function that would go through a
 dataset consisting of many sequential dates and determine for each day the
 maximum hourly slope of temp~date for each site (the fastest hourly rate
 of
 heating). The output would be the date, the maximum hourly slope for that
 date, and the location. It would also be great if I could extract when
 during the day the maximum hourly slope occurred.

 I have been playing around with using the package lubridate to identify
 each hour of the day using something like this to create a separate column
 grouping the data into hours

 library(lubridate)
 data$date2 <- floor_date(data$date, "hour")

 I was then imagining something like this though this code doesn't work as
 written.

 ddply(data, .(location, date2), function(d)
 max(rollapply(slope(d$temp~d$date, data=d)))

 Essentially what I'm imagining is calculating the slope (though I'd have
 to
 write a quick slope function) of the date/temp relationship, use rollapply
 to apply this function across the dataset, and determine the maximum
 slope,
 grouped by location and hour (using date2). Hmm... and per day!

 This seems complicated. Can others think of a simpler, more elegant means
 of extracting this type of data? I struggled to put together a working
 example with a set of data, but if this doesn't make sense let me know and
 I'll see what I can do.


 Thanks,
 Nate


 First, let's ignore location; if you can do it for one location,
 you can surely do it for others.

 Second, let's ignore date; if you can do it for one date, you
 can surely do it for others.

 That leaves us with the question of what you want to do for one
 given date. If you want the maximum slope for any 60-minute interval
 on that date (which I take your question to mean), then rollapply
 should do the job. But I'm not very familiar with zoo, so here's a
 crude approach:

   d <- data.frame(time = 1:72, temp = rnorm(72))
   slope <- rep(NA, 72)
   for(i in 6:72) {
     slope[i] <- coef(lm(temp ~ time, data = d, subset = (i-5):i))[2]
   }
   maxslope <- max(slope, na.rm = TRUE)
   idx <- which.max(slope)

 Obviously, this can be extended to cover more than a 24-hour period.

 Now, let's wait for Gabor to show us the trivial way with zoo::rollapply.

 Peter Ehlers
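
 For reference, a rough sketch of that rollapply version (untested, and an
 assumption about the final syntax rather than the thread's actual answer):
 with readings every 10 minutes, a 6-observation window spans one hour, and
 the slope can be fit inside each window.

   library(zoo)
   z <- zoo(d$temp, order.by = d$time)
   hourly_slope <- rollapply(z, width = 6,
                             FUN = function(w) coef(lm(w ~ seq_along(w)))[2],
                             align = "right")
   max(hourly_slope)        # steepest slope (per 10-minute step; x6 for per hour)
   which.max(hourly_slope)  # position of the window where it occurs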





[R] Determining maximum hourly slope per day

2013-03-12 Thread Nathan Miller
Hello,

I have a challenge!

I have a large dataset with three columns, date,temp, location.
date is in the format %m/%d/%y %H:%M, with a temp recorded every 10
minutes. These are surface temperatures and so they fluctuate during
the day, heating up and then cooling down, so the data is a series of peaks
and troughs. I would like to develop a function that would go through a
dataset consisting of many sequential dates and determine for each day the
maximum hourly slope of temp~date for each site (the fastest hourly rate of
heating). The output would be the date, the maximum hourly slope for that
date, and the location. It would also be great if I could extract when
during the day the maximum hourly slope occurred.

I have been playing around with using the package lubridate to identify
each hour of the day using something like this to create a separate column
grouping the data into hours

library(lubridate)
data$date2 <- floor_date(data$date, "hour")

I was then imagining something like this though this code doesn't work as
written.

ddply(data, .(location, date2), function(d)
max(rollapply(slope(d$temp~d$date, data=d)))

Essentially what I'm imagining is calculating the slope (though I'd have to
write a quick slope function) of the date/temp relationship, use rollapply
to apply this function across the dataset, and determine the maximum slope,
grouped by location and hour (using date2). Hmm... and per day!

This seems complicated. Can others think of a simpler, more elegant means
of extracting this type of data? I struggled to put together a working
example with a set of data, but if this doesn't make sense let me know and
I'll see what I can do.


Thanks,
Nate



[R] Adding 95% contours around scatterplot points with ggplot2

2013-01-28 Thread Nathan Miller
Hi all,

I have been looking for a means of adding a contour around some points in a
scatterplot as a way of representing the center of density of the
data. I'm imagining something like a 95% confidence estimate drawn around
the data.

So far I have found some code for drawing polygons around the data. These
look nice, but in some cases the polygons are strongly influenced by
outlying points. Does anyone have a thought on how to draw a contour which
is more along the lines of a 95% confidence space?

I have provided a working example below to illustrate the drawing of the
polygons. As I said I would rather have three ovals/95% contours drawn
around the points by level to capture the different density distributions
without the visualization being heavily influenced by outliers.

I have looked into the code provided here from Hadley
https://groups.google.com/forum/?fromgroups=#!topic/ggplot2/85q4SQ9q3V8
using the mvtnorm package and the dmvnorm function, but haven't been able
to get it to work for my data example. The calculated densities are always
zero (at this step of Hadley's code: dgrid$dens <-
dmvnorm(as.matrix(dgrid), ex_mu, ex_sigma)   )

I appreciate any assistance.

Thanks,
Nate

library(plyr)
library(ggplot2)

x <- c(seq(0.15, 0.4, length.out=30), seq(0.2, 0.6, length.out=30),
       seq(0.4, 0.6, length.out=30))
y <- c(0.55, x[1:29] + 0.2*rnorm(29, 0.4, 0.3), x[31:60]*rnorm(30, 0.3, 0.1), x[61:90]*rnorm(30, 0.4, 0.25))
data <- data.frame(level=c(rep(1, 30), rep(2, 30), rep(3, 30)), x=x, y=y)

find_hull <- function(data) data[chull(data$x, data$y), ]
hulls <- ddply(data, .(level), find_hull)

fig1 <- ggplot(data=data, aes(x, y, colour=factor(level), fill=level)) +
  geom_point()
fig1 <- fig1 + geom_polygon(data=hulls, alpha=.2)
fig1
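
A related option in more recent ggplot2 releases is stat_ellipse(), which
draws a normal-theory data ellipse per group (level = 0.95 by default) and
is much less sensitive to outliers than the convex hulls above. A minimal
sketch, assuming the 'data' object built above and a reasonably current
ggplot2:

library(ggplot2)
ggplot(data, aes(x, y, colour = factor(level))) +
  geom_point() +
  stat_ellipse(level = 0.95)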



Re: [R] Adding 95% contours around scatterplot points with ggplot2

2013-01-28 Thread Nathan Miller
Thanks Ista,

I have played a bit with stat_density2d as well. It doesn't completely
capture what I am looking for and ends up being quite busy at the same
time. I'm looking for a way of helping those looking at the figure to see
the broad patterns of where in the x/y space the data from different groups
are distributed. Using the 95% CI type idea is so that I don't end up
arbitrarily drawing circles around each set of points. I appreciate your
direction though.

Nate


On Mon, Jan 28, 2013 at 10:50 AM, Ista Zahn istaz...@gmail.com wrote:

 Hi Nathan,

 This only fits some of your criteria, but have you looked at
 ?stat_density2d?

 Best,
 Ista

 On Mon, Jan 28, 2013 at 12:53 PM, Nathan Miller natemille...@gmail.com
 wrote:
  Hi all,
 
  I have been looking for a means of adding a contour around some points in a
  scatterplot as a way of representing the center of density of the
  data. I'm imagining something like a 95% confidence estimate drawn around
  the data.
 
  So far I have found some code for drawing polygons around the data. These
  look nice, but in some cases the polygons are strongly influenced by
  outlying points. Does anyone have a thought on how to draw a contour
 which
  is more along the lines of a 95% confidence space?
 
  I have provided a working example below to illustrate the drawing of the
  polygons. As I said I would rather have three ovals/95% contours drawn
  around the points by level to capture the different density
 distributions
  without the visualization being heavily influenced by outliers.
 
  I have looked into the code provided here from Hadley
  https://groups.google.com/forum/?fromgroups=#!topic/ggplot2/85q4SQ9q3V8
  using the mvtnorm package and the dmvnorm function, but haven't been able
  to get it to work for my data example. The calculated densities are always
  zero (at this step of Hadley's code: dgrid$dens <-
  dmvnorm(as.matrix(dgrid), ex_mu, ex_sigma)   )
 
  I appreciate any assistance.
 
  Thanks,
  Nate
 
   x <- c(seq(0.15,0.4,length.out=30), seq(0.2,0.6,length.out=30),
   seq(0.4,0.6,length.out=30))
  
   y <- c(0.55, x[1:29]+0.2*rnorm(29,0.4,0.3), x[31:60]*rnorm(30,0.3,0.1), x[61:90]*rnorm(30,0.4,0.25))
   data <- data.frame(level=c(rep(1, 30), rep(2,30), rep(3,30)), x=x, y=y)
  
  
   find_hull <- function(data) data[chull(data$x, data$y), ]
   hulls <- ddply(data, .(level), find_hull)
  
   fig1 <- ggplot(data=data, aes(x, y, colour=factor(level),
   fill=level)) + geom_point()
   fig1 <- fig1 + geom_polygon(data=hulls, alpha=.2)
   fig1
 




Re: [R] Adding 95% contours around scatterplot points with ggplot2

2013-01-28 Thread Nathan Miller
Hi Ista,

Thanks. That does look pretty nice and I hadn't realized that was possible.
Do you know how to extract information regarding those curves? I'd like to
be able to report something about what portion of the data they encompass
or really any other feature about them in a figure legend. I'll look into
stat_density2d and see if I can determine how they are set.

Thanks for your help,

Nate


On Mon, Jan 28, 2013 at 12:37 PM, Ista Zahn istaz...@gmail.com wrote:

 Hi Nate,

 You can make it less busy using the bins argument. This is not
 documented, except in the examples to stat_contour, but try

 ggplot(data=data, aes(x, y, colour=(factor(level)), fill=level))+
 geom_point()+
 stat_density2d(bins=2)

 HTH,
 Ista

 On Mon, Jan 28, 2013 at 2:43 PM, Nathan Miller natemille...@gmail.com
 wrote:
  Thanks Ista,
 
  I have played a bit with stat_density2d as well. It doesn't completely
  capture what I am looking for and ends up being quite busy at the same
 time.
   I'm looking for a way of helping those looking at the figure to see the
  broad patterns of where in the x/y space the data from different groups
 are
  distributed. Using the 95% CI type idea is so that I don't end up
  arbitrarily drawing circles around each set of points. I appreciate your
  direction though.
 
  Nate
 
 
  On Mon, Jan 28, 2013 at 10:50 AM, Ista Zahn istaz...@gmail.com wrote:
 
  Hi Nathan,
 
  This only fits some of your criteria, but have you looked at
  ?stat_density2d?
 
  Best,
  Ista
 
  On Mon, Jan 28, 2013 at 12:53 PM, Nathan Miller natemille...@gmail.com
 
  wrote:
   Hi all,
  
    I have been looking for a means of adding a contour around some points in a
    scatterplot as a way of representing the center of density of the
   data. I'm imagining something like a 95% confidence estimate drawn
   around
   the data.
  
   So far I have found some code for drawing polygons around the data.
   These
   look nice, but in some cases the polygons are strongly influenced by
   outlying points. Does anyone have a thought on how to draw a contour
   which
   is more along the lines of a 95% confidence space?
  
   I have provided a working example below to illustrate the drawing of
 the
   polygons. As I said I would rather have three ovals/95% contours
 drawn
   around the points by level to capture the different density
   distributions
   without the visualization being heavily influenced by outliers.
  
   I have looked into the code provided here from Hadley
  
 https://groups.google.com/forum/?fromgroups=#!topic/ggplot2/85q4SQ9q3V8
   using the mvtnorm package and the dmvnorm function, but haven't been
   able
    to get it to work for my data example. The calculated densities are
  always
    zero (at this step of Hadley's code: dgrid$dens <-
   dmvnorm(as.matrix(dgrid), ex_mu, ex_sigma)   )
  
   I appreciate any assistance.
  
   Thanks,
   Nate
  
    x <- c(seq(0.15,0.4,length.out=30), seq(0.2,0.6,length.out=30),
    seq(0.4,0.6,length.out=30))
   
   
    y <- c(0.55, x[1:29]+0.2*rnorm(29,0.4,0.3), x[31:60]*rnorm(30,0.3,0.1), x[61:90]*rnorm(30,0.4,0.25))
    data <- data.frame(level=c(rep(1, 30), rep(2,30), rep(3,30)), x=x, y=y)
   
   
    find_hull <- function(data) data[chull(data$x, data$y), ]
    hulls <- ddply(data, .(level), find_hull)
   
    fig1 <- ggplot(data=data, aes(x, y, colour=factor(level),
    fill=level)) + geom_point()
    fig1 <- fig1 + geom_polygon(data=hulls, alpha=.2)
   fig1
  
 
 




Re: [R] More efficient use of reshape?

2012-12-14 Thread Nathan Miller
Thanks John,

I appreciate your help.

With the help of Dennis Murphy, code along the lines of this

climData_melt <- melt(clim.data, id = 'date', measure = c("GISS", "HAD",
  "NOAA", "RSS", "UAH"))

actually gets the data into the correct form very simply and from there it's
easy to plot with faceting in ggplot2.
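
For completeness, a sketch of the full melt-then-facet route (assuming
clim.data has a 'date' column plus the five source columns; not tested
against the real file here):

library(reshape2)
library(ggplot2)

climData_melt <- melt(clim.data, id.vars = "date",
                      measure.vars = c("GISS", "HAD", "NOAA", "RSS", "UAH"),
                      variable.name = "source", value.name = "anomaly")

ggplot(climData_melt, aes(date, anomaly)) +
  geom_line() +
  facet_wrap(~ source)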

Thanks for helping clear up the reshape and reshape2 issues. I try to make
sure I have those types of things figured out before posting to the list.
Didn't mean to confuse the issue by incorrectly referring to the
packages/functions or their locations.

Thanks,
Nate



On Fri, Dec 14, 2012 at 7:38 AM, John Kane jrkrid...@inbox.com wrote:

 I think David was pointing out that reshape() is not a reshape2 function.
  It is in the stats package.

 I am not sure exactly what you are doing but perhaps something along the
 lines of

 library(reshape2)
 mm <- melt(clim.data, id = Cs(yr_frac, yr_mn, AMO, NINO34,
 SSTA))

 is a start?

 I also don't think that the more recent versions of ggplot2 automatically
 load reshape2 so it may be that you are working with a relatively old
 installation of ggplot and reshape?

 sessionInfo()
 R version 2.15.2 (2012-10-26)
 Platform: i686-pc-linux-gnu (32-bit)

 locale:
  [1] LC_CTYPE=en_CA.UTF-8   LC_NUMERIC=C
 LC_TIME=en_CA.UTF-8
  [4] LC_COLLATE=en_CA.UTF-8 LC_MONETARY=en_CA.UTF-8
  LC_MESSAGES=en_CA.UTF-8
  [7] LC_PAPER=C LC_NAME=C  LC_ADDRESS=C
 [10] LC_TELEPHONE=C LC_MEASUREMENT=en_CA.UTF-8
 LC_IDENTIFICATION=C

 attached base packages:
 [1] grid  stats graphics  grDevices utils datasets  methods
 base

 other attached packages:
 [1] lubridate_1.2.0directlabels_2.9   RColorBrewer_1.0-5
 gridExtra_0.9.1stringr_0.6.2
 [6] scales_0.2.3   plyr_1.8   reshape2_1.2.1 ggplot2_0.9.3

 loaded via a namespace (and not attached):
 [1] colorspace_1.2-0 dichromat_1.2-4  digest_0.6.0 gtable_0.1.2
 labeling_0.1
 [6] MASS_7.3-22  munsell_0.4  proto_0.3-9.2tools_2.15.2






 John Kane
 Kingston ON Canada


  -Original Message-
  From: natemille...@gmail.com
  Sent: Thu, 13 Dec 2012 09:58:34 -0800
  To: dwinsem...@comcast.net
  Subject: Re: [R] More efficient use of reshape?
 
  Sorry David,
 
   In my attempt to simplify the example and just include the code I felt was
   necessary I left out the loading of ggplot2, which then imports reshape2,
   and which was actually used in the code I provided. Sorry for the mistake
  and my misunderstanding of where the reshape function was coming from.
  Should have checked that more carefully.
 
  Thanks,
  Nate
 
 
  On Thu, Dec 13, 2012 at 9:48 AM, David Winsemius
  dwinsem...@comcast.netwrote:
 
 
  On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote:
 
   Hi all,
 
  I have played a bit with the reshape package and function along with
  melt and cast, but I feel I still don't have a good handle on how
  to
   use them efficiently. Below I have included an application of reshape
  that
  is rather clunky and I'm hoping someone can offer advice on how to use
  reshape (or melt/cast) more efficiently.
 
 
  You do realize that the 'reshape' function is _not_ in the reshape
  package, right? And also that the reshape package has been superseded by
  the reshape2 package?
 
  --
  David.
 
 
  #For this example I am using climate change data available on-line
 
   file <- "http://processtrends.com/Files/RClimate_consol_temp_anom_latest.csv"
   clim.data <- read.csv(file, header=TRUE)
 
  library(lubridate)
  library(reshape)
 
  #I've been playing with the lubridate package a bit to work with dates,
  but
  as the climate dataset only uses year and month I have
  #added a day to each entry in the yr_mn column and then used dym
  from
  lubridate to generate the POSIXlt formatted dates in
  #a new column clim.data$date
 
   clim.data$yr_mn <- paste("01", clim.data$yr_mn, sep="")
   clim.data$date <- dym(clim.data$yr_mn)
 
  #Now to the reshape. The dataframe is in a wide format. The columns
  GISS,
  HAD, NOAA, RSS, and UAH are all different sources
  #from which the global temperature anomaly has been calculated since
  1880
  (actually only 1978 for RSS and UAH). What I would like to
  #do is plot the temperature anomaly vs date and use ggplot to facet by
  the
  different data source (GISS, HAD, etc.). Thus I need the
  #data in long format with a date column, a temperature anomaly column,
  and
  a data source column. The code below works, but its
  #really very clunky and I'm sure I am not using these tools as
  efficiently
  as I can.
 
  #The varying=list(3:7) specifies the columns in the dataframe that
  corresponded to the sources (GISS, etc.), though then in the resulting
  #reshaped dataframe the sources are numbered 1-5, so I have to
  reassigned
  their names. In addition, the original dataframe has
  #additional data columns I do not want and so after reshaping I

Re: [R] More efficient use of reshape?

2012-12-13 Thread Nathan Miller
Sorry David,

In my attempt to simplify the example and just include the code I felt was
necessary I left out the loading of ggplot2, which then imports reshape2,
and which was actually used in the code I provided. Sorry for the mistake
and my misunderstanding of where the reshape function was coming from.
Should have checked that more carefully.

Thanks,
Nate


On Thu, Dec 13, 2012 at 9:48 AM, David Winsemius dwinsem...@comcast.net wrote:


 On Dec 13, 2012, at 9:16 AM, Nathan Miller wrote:

  Hi all,

 I have played a bit with the reshape package and function along with
 melt and cast, but I feel I still don't have a good handle on how to
 use them efficiently. Below I have included an application of reshape
 that
 is rather clunky and I'm hoping someone can offer advice on how to use
 reshape (or melt/cast) more efficiently.


 You do realize that the 'reshape' function is _not_ in the reshape
 package, right? And also that the reshape package has been superseded by
 the reshape2 package?

 --
 David.


 #For this example I am using climate change data available on-line

 file <- "http://processtrends.com/Files/RClimate_consol_temp_anom_latest.csv"
 clim.data <- read.csv(file, header=TRUE)

 library(lubridate)
 library(reshape)

 #I've been playing with the lubridate package a bit to work with dates,
 but
 as the climate dataset only uses year and month I have
 #added a day to each entry in the yr_mn column and then used dym
 from
 lubridate to generate the POSIXlt formatted dates in
 #a new column clim.data$date

 clim.data$yr_mn <- paste("01", clim.data$yr_mn, sep="")
 clim.data$date <- dym(clim.data$yr_mn)

 #Now to the reshape. The dataframe is in a wide format. The columns GISS,
 HAD, NOAA, RSS, and UAH are all different sources
 #from which the global temperature anomaly has been calculated since 1880
 (actually only 1978 for RSS and UAH). What I would like to
 #do is plot the temperature anomaly vs date and use ggplot to facet by the
 different data source (GISS, HAD, etc.). Thus I need the
 #data in long format with a date column, a temperature anomaly column, and
 a data source column. The code below works, but its
 #really very clunky and I'm sure I am not using these tools as efficiently
 as I can.

 #The varying=list(3:7) specifies the columns in the dataframe that
 corresponded to the sources (GISS, etc.), though then in the resulting
 #reshaped dataframe the sources are numbered 1-5, so I have to reassign
 their names. In addition, the original dataframe has
 #additional data columns I do not want and so after reshaping I create
 another! dataframe with just the columns I need, and
 #then I have to rename them so that I can keep track of what everything
 is.
 Whew! Not the most elegant of code.

 d <- reshape(clim.data, varying=list(3:7), idvar="date",
              v.names="anomaly", direction="long")

 d$time <- ifelse(d$time==1, "GISS", d$time)
 d$time <- ifelse(d$time==2, "HAD",  d$time)
 d$time <- ifelse(d$time==3, "NOAA", d$time)
 d$time <- ifelse(d$time==4, "RSS",  d$time)
 d$time <- ifelse(d$time==5, "UAH",  d$time)

 new.data <- data.frame(d$date, d$time, d$anomaly)
 names(new.data) <- c("date", "source", "anomaly")

 I realize this is a mess, though it works. I think with just some help on
 how better to work this example I'll probably get over the learning hump
 and actually figure out how to use these data manipulation functions more
 cleanly.

 Any advice or assistance would be appreciated.
 Thanks,
 Nate



 David Winsemius, MD
 Alameda, CA, USA





[R] Violin plot of categorical/binned data

2012-11-03 Thread Nathan Miller
Hi,

I'm trying to create a plot showing the density distribution of some
shipping data. I like the look of violin plots, but my data is not
continuous but rather binned and I want to make sure its binned nature (not
smooth) is apparent in the final plot. So for example, I have the number of
individuals per vessel, but rather than having the actual number of
individuals I have data in the format of: 7 values of zero, 11 values
between 1-10, 6 values between 10-100, 13 values between 100-1000, etc. To
plot this data I generated a new dataset with the first 7 values being 0,
representing the 7 values of 0, the next 11 values being 5.5, representing
the 11 values between 1-10, etc. Sample data below.

I can make a violin plot (code below) using a log y-axis, which looks
alright (though I do have to deal with the zeros still), but in its default
format it hides the fact that these are binned data, which seems a bit
misleading. Is it possible to make a violin plot that looks a bit more
angular (more corners, less smoothing) or in some way shows the
distribution, but also clearly shows the true nature of these data? I've
tried playing with the bandwidth adjustment and the kernel but haven't been
able to get a figure that seems to work.

Anyone have some thoughts on this?

Thanks,
Nate

library(ggplot2)
library(scales)

p <- ggplot(data2, aes(vessel, values))
p + geom_violin() +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)))

data2 <- read.table(textConnection("
   vessel  values
 rec 0.0e+00
 rec 0.0e+00
 rec 0.0e+00
 rec 0.0e+00
 rec 0.0e+00
 rec 0.0e+00
 rec 0.0e+00
 rec 5.5e+00
 rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+04
rec 5.5e+04
rec 5.5e+04
rec 5.5e+05
rec 5.5e+05", header=TRUE)
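
One alternative sketch that keeps the binned nature visible (it uses the
data2 object above; the zeros are plotted at 0.5 just so they survive the
log axis, and I have not polished it beyond this toy example) is to show
one point per bin, sized by the number of observations in that bin:

library(plyr)
cnt <- ddply(data2, .(vessel, values), summarise, n = length(values))
cnt$values[cnt$values == 0] <- 0.5

ggplot(cnt, aes(x = vessel, y = values, size = n)) +
  geom_point() +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)))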



[R] drop1, 2-way Unbalanced ANOVA

2012-07-23 Thread Nathan Miller
Hi all,

I've spent quite a lot of time searching through the help lists and reading
about how best to run perform a 2-way ANOVA with unbalanced data. I realize
this has been covered a great deal so I was trying to avoid adding yet
another entry to the long list considering the use of different SS, etc.
Unfortunately, I have come to the point where I feel I have to wade in and
see if someone can help me out. Hopefully I'll phrase this properly and it
will end up requiring only a simple response.

I have an experiment where I have measured a response variable (such as
water content) following exposure to two treatments (oxygen content and
medium). Oxygen content has three levels (5, 20, 35) and medium has two
levels (Air, Water). I am interested if water content is different under
the two treatments and whether the effect of oxygen content depends upon
the medium in which the experiment was conducted (Air or Water).

Unfortunately, the design is unbalanced as some experimental subjects had
to be removed from the experiment.

I realize that if I just use aov() to perform a two-way ANOVA the order in
which the terms (oxygen content and medium) are entered will give
different results because of the sequential SS.

What I have done in the past is utilize drop1() in conjunction with aov()

drop1(aov(WaterContent ~ Oxygen*Medium, data), test="F")

to see if the interaction term was significant (F, p-value) and if its
inclusion improved model fit (AIC). If from this I determine that the
interaction term can be removed and the model can be rerun without it, I am
able to test for main-effects and get F and p-values that I can report in a
manuscript.

However, if the interaction term is significant and its inclusion is
warranted, drop1() only provides me with SS, F, and p-value for the
interaction term. Now this is fine, because I do not wish to interpret the
main-effects with a significant interaction, but in a manuscript reviewers
will request an ANOVA table where I will be asked to report SS, F and
p-values for the other terms. I don't have those because I used drop1()
which only provides these for the highest order term in the model.

How best should I calculate the values that I know I will be asked to
provide in a manuscript?

I don't wish to come across as a scientist who is simply a slave to the F
and p-values with little regard for the data, the hypotheses, and the
actual statistical interpretation. I am interested in doing this right,
but I also know that practically in the current status of our field, while
I focus on doing statistics that address my hypotheses of interest and can
choose to not discuss the main effects in isolation when an interaction
exists, I will be asked to provide the ANOVA table with all the degrees
of freedom, SS, F-values, p-values etc...for the entire model, not just the
highest order term.

Can anyone provide advice here? Should I just use the car package and Type
III SS with an appropriate contrast and not use the drop1() function, even
though I'm really not interested in using the Type III SS and I kinda like
the drop1()? I am not opposed to Type II SS, but clearly if the interaction
is important then using Type II SS, which do not consider interactions, are
not appropriate.
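
For concreteness, the car route I am referring to would look something like
this (a sketch only; it assumes Oxygen and Medium are factors, that the car
package is installed, and that the data frame is called 'data'):

library(car)
# sum-to-zero contrasts are needed for sensible Type III tests
options(contrasts = c("contr.sum", "contr.poly"))
fit <- lm(WaterContent ~ Oxygen * Medium, data = data)
Anova(fit, type = "III")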

Hopefully this is somewhat clear and doesn't simply sound like a rehashing
of the same old ANOVA and SS story. Maybe I should be doing something
completely different

I greatly appreciate constructive comments.

Thanks,

Nate



[R] ggplot stat_summary (mean_cl_boot)

2011-11-09 Thread Nathan Miller
Hello,

This is a pretty simple question, but after spending quite a bit of time
looking at Hmisc and using Google, I can't find the answer.

If I use stat_summary(fun.data="mean_cl_boot") in ggplot to generate 95%
confidence intervals, how many bootstrap iterations are performed by
default? Can this be changed? I would at least like to be able to report
the number of bootstrap iterations used to generate the CIs.

I haven't been able to find "mean_cl_boot" as a function itself or
something resembling it in the Hmisc documentation, but it seems as though
Hmisc is wrapped up in stat_summary() and is called to compute
mean_cl_boot.

Many thanks for clearing this up for me,

Nate



Re: [R] ggplot stat_summary (mean_cl_boot)

2011-11-09 Thread Nathan Miller
Sorry, I didn't realize I was being so obscure.

Within ggplot it is possible to use stat_summary() to generate confidence
intervals about a mean. One method for generating these CI assumes
normality. The other uses bootstrapping to generate the CI. I am using the
second method which requires code like this

stat_summary(fun.data="mean_cl_boot", geom="errorbar", width=0.1, colour =
"red")

I've added some extra flourishes to make them look like errorbars, alter
the width and specify color.

I would like some details regarding how this bootstrapped CI is calculated.
If I type ?mean_cl_boot at the R command line I get a minimal help file
for wrap_hmisc {ggplot2}, which is described as "wrap up a selection of
Hmisc to make it easy to use with stat_summary".

I did not mean to suggest that ggplot2 calls Hmisc when I run
stat_summary(), but simply that it appears that stat_summary() seems to
have been based upon a selection of Hmisc, hence I went looking in Hmisc to
try to find details regarding stat_summary(). I was unsuccessful in this
attempt.

I don't believe a great deal of debugging is necessary. I am simply looking
for details regarding how mean_cl_boot works. If you don't have
information regarding how it works (such as the default number of
resamplings) there is no need to respond.

Thanks for any assistance,
Nate



On Wed, Nov 9, 2011 at 1:10 PM, David Winsemius dwinsem...@comcast.net wrote:


 On Nov 9, 2011, at 2:59 PM, Nathan Miller wrote:

  Hello,

 This is a pretty simple question, but after spending quite a bit of time
 looking at Hmisc and using Google, I can't find the answer.

 If I use stat_summary(fun.data="mean_cl_boot") in ggplot to generate
 95%
 confidence intervals, how many bootstrap iterations are performed by
 default? Can this be changed? I would at least like to be able to report
 the number of bootstrap iterations used to generate the CIs.

 I haven't been able to find mean_cl_boot as a function itself or
 something resembling it in the Hmisc documentation, but it seems as
 though
 Hmisc is wrapped up in stat_summary() and is called to compute
 mean_cl_boot.


 You seem really, really confused (and you offer very little in the way of
 context to support debugging efforts). You are referring to ggplot
 functions. As far as I know there is no connection between the Hmisc and
 ggplot (or ggplot2) packages. All things change, I know, but Frank just
 completed switching over to Lattice a couple of years ago.


 --
 David Winsemius, MD
 West Hartford, CT





Re: [R] ggplot stat_summary (mean_cl_boot)

2011-11-09 Thread Nathan Miller
Ok, I got it.

smean.cl.boot(x, conf.int=.95, B=1000, na.rm=TRUE, reps=FALSE)

Looks like its 1000.

Cool.
Thanks for the help,
Nate



On Wed, Nov 9, 2011 at 1:35 PM, Nathan Miller natemille...@gmail.com wrote:

 Sorry, I didn't realize I was being so obscure.

 Within ggplot it is possible to use stat_summary() to generate confidence
 intervals about a mean. One method for generating these CI assumes
 normality. The other uses bootstrapping to generate the CI. I am using the
 second method which requires code like this

  stat_summary(fun.data="mean_cl_boot", geom="errorbar", width=0.1, colour =
  "red")

 I've added some extra flourishes to make them look like errorbars, alter
 the width and specify color.

 I would like some details regarding how this bootstrapped CI is
 calculated. If I type ?mean_cl_boot at the R command line I get a minimal
  help file for wrap_hmisc {ggplot2}, which is described as "wrap up a
  selection of Hmisc to make it easy to use with stat_summary".

 I did not mean to suggest that ggplot2 calls Hmisc when I run
 stat_summary(), but simply that it appears that stat_summary() seems to
 have been based upon a selection of Hmisc, hence I went looking in Hmisc to
 try to find details regarding stat_summary(). I was unsuccessful in this
 attempt.

 I don't believe a great deal of debugging is necessary. I am simply
 looking for details regarding how mean_cl_boot works. If you don't have
 information regarding how it works (such as the default number of
 resamplings) there is no need to respond.

 Thanks for any assistance,
 Nate




 On Wed, Nov 9, 2011 at 1:10 PM, David Winsemius dwinsem...@comcast.net wrote:


 On Nov 9, 2011, at 2:59 PM, Nathan Miller wrote:

  Hello,

 This is a pretty simple question, but after spending quite a bit of time
 looking at Hmisc and using Google, I can't find the answer.

  If I use stat_summary(fun.data="mean_cl_boot") in ggplot to generate
  95%
  confidence intervals, how many bootstrap iterations are performed by
  default? Can this be changed? I would at least like to be able to report
  the number of bootstrap iterations used to generate the CIs.

 I haven't been able to find mean_cl_boot as a function itself or
  something resembling it in the Hmisc documentation, but it seems as
 though
 Hmisc is wrapped up in stat_summary() and is called to compute
 mean_cl_boot.


 You seem really, really confused (and you offer very little in the way of
 context to support debugging efforts). You are referring to ggplot
 functions. As far as I know there is no connection between the Hmisc and
  ggplot (or ggplot2) packages. All things change, I know, but Frank just
 completed switching over to Lattice a couple of years ago.


 --
 David Winsemius, MD
 West Hartford, CT






Re: [R] ggplot stat_summary (mean_cl_boot)

2011-11-09 Thread Nathan Miller
For all those that are interested.

To adjust the number of reps in the stat_summary() "mean_cl_boot" function,
simply set B to the number of bootstrap resamples. I set B to 2000
resamples below.

stat_summary(fun.data="mean_cl_boot", geom="errorbar", width=0.1, colour =
"red", B=2000)

If you run "mean_cl_boot" within stat_summary() and ggplot, setting reps=T
does not appear to return a vector of the resampled means as an attribute
that I could locate anywhere.

However, you can run smean.cl.boot code outside of ggplot.

x <- smean.cl.boot(OsmData$Mean, B=2000, reps=TRUE)
attr(x, "reps")

Thus, outside of ggplot you can use reps=T to check the resampling is
proceeding as you expect, before adding it to the ggplot code. I did some
checks setting B=1 and B=5  as well as large numbers both inside and
outside of the ggplot code to assure myself that my adjustments to B within
stat_summary() within ggplot were actually doing what I thought.

Finally, despite the fact that the Hmisc function is called
"smean.cl.boot", as David points out, within ggplot and stat_summary you
must use "mean_cl_boot", without the "s" before "mean".

Within ggplot, "mean_cl_boot" is the correct notation and it does work.

I really like ggplot, but can agree that it isn't always clear how to get
from point A to point B. My hope in writing this out is that someone else
might start their own exploration of these issues a little further down the
road than I found myself when I started looking into this.
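
(A note for newer ggplot2 versions: recent releases pass extra arguments
through fun.args rather than through ..., so the equivalent call would be
something along these lines, assuming the rest of the plot is unchanged:

stat_summary(fun.data = "mean_cl_boot", geom = "errorbar", width = 0.1,
             colour = "red", fun.args = list(B = 2000))
)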

Thanks,
Nate

On Wed, Nov 9, 2011 at 1:46 PM, David Winsemius dwinsem...@comcast.net wrote:


 On Nov 9, 2011, at 4:35 PM, Nathan Miller wrote:

  Sorry, I didn't realize I was being so obscure.

 Within ggplot it is possible to use stat_summary() to generate confidence
 intervals about a mean. One method for generating these CI assumes
 normality. The other uses bootstrapping to generate the CI. I am using the
 second method which requires code like this

 stat_summary(fun.data="mean_cl_boot", geom="errorbar", width=0.1, colour
 = "red")

 I've added some extra flourishes to make them look like errorbars, alter
 the width and specify color.

 I would like some details regarding how this bootstrapped CI is
 calculated. If I type ?mean_cl_boot at the R command line I get a minimal
 help file for wrap_hmisc {ggplot2}, which is described as "wrap up a
 selection of Hmisc to make it easy to use with stat_summary".

 I did not mean to suggest that ggplot2 calls Hmisc when I run
 stat_summary(),


 Actually it does.


  but simply that it appears that stat_summary() seems to have been based
 upon a selection of Hmisc, hence I went looking in Hmisc to try to find
 details regarding stat_summary(). I was unsuccessful in this attempt.

 I don't believe a great deal of debugging is necessary. I am simply
 looking for details regarding how mean_cl_boot works.


 It doesn't. That is not the right name.


  If you don't have information regarding how it works (such as the default
 number of resamplings) there is no need to respond.


 Hadley's help files in ggplot2 are terse (or the links to outside
 resources crash my R sessions)  to the point of being too frustrating for
 me to consider using that package, so I don't know if optional parameters
 can be passed to the Hmisc functions. If they are,  then you should set
 reps=TRUE and then see what happens to the number of reps from the returned
 object ... if the wrap_hmisc function does happen to catch it.

  x <- rnorm(100)
  smean.cl.boot(x)
  Mean  Lower  Upper
 -0.0211511 -0.2013623  0.1469728

  smean.cl.boot(x, reps=TRUE)
   Mean   Lower   Upper
 -0.03465361 -0.21233213  0.15178655
 attr(,"reps")
   [1]  0.0283330508 -0.1250784237  0.0744640779  0.1310826601 -0.1373094536
   [6]  0.0629291714  0.0145916070 -0.0860141221  0.0549134451  0.0732892908
 snipped pages of intervening output.
  [991]  0.1029922424  0.0613358597 -0.0645577851 -0.1664905503
 -0.1249615180
  [996] -0.0751783377 -0.0043747455 -0.1155948060 -0.0750075659
  0.1244430930

 I don't see where the number of reps is returned, but the B setting
 defaults to 1000.

 --
 david.


 Thanks for any assistance,
 Nate



 On Wed, Nov 9, 2011 at 1:10 PM, David Winsemius dwinsem...@comcast.net
 wrote:

 On Nov 9, 2011, at 2:59 PM, Nathan Miller wrote:

 Hello,

 This is a pretty simple question, but after spending quite a bit of time
 looking at Hmisc and using Google, I can't find the answer.

 If I use stat_summary(fun.data=mean_**cl_boot) in ggplot to generate
 95%
 confidence intervals, how many bootstrap iterations are preformed by
 default? Can this be changed? I would at least like to be able to report
 the number of boot strap interations used to generate the CIs.

 I haven't been able to find mean_cl_boot as a function itself or
 something ressembling it in the Hmisc documentation, but it seems as
 though
 Hmisc is wrapped up in stat_summary() and is called to compute
 mean_cl_boot.

 You seem really, really confused (and you offer very little

[R] Calculating difference between values in data frame based on separate column

2011-10-21 Thread Nathan Miller
Hi all,

Say I have a data frame something like the one below with different sample
vials, measured before (B) and after (A) some process, with a value recorded
at each measurement point

vial  measure  value
1     B        26
1     A        12
2     B        45
2     A        30
3     B        32
3     A        27
4     B        34
4     A        6

Is there an easy means by which I can subtract the after (A) measurements
from the before (B) measurement for each vial in this data frame so I am
left with a data frame that contains the vial number and difference between
A and B? I've played around with writing a function and applying it with
ddply, but haven't stumbled on a technique that works, though I feel there
should be some simple means of doing this that I'm just not seeing.
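
For reference, one plyr-based sketch that should do this (assuming the data
frame is called 'df' and that each vial has exactly one "B" and one "A"
row):

library(plyr)
diffs <- ddply(df, .(vial), summarise,
               diff = value[measure == "B"] - value[measure == "A"])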

Thanks for your help,
Nate



Re: [R] Applying function with separate dataframe (calibration file) supplying some inputs

2011-10-20 Thread Nathan Miller
Thanks Josh,

I appreciate your comments. Merge is quite easy and I guess I should just go
that route. I don't have so many data points so the speed isn't really much
of an issue. I was just curious whether there was anything more elegant.

Yes, the function is not clear or ideal for troubleshooting. I adapted this
from an excel sheet in an effort to speed my data analysis and in the excel
file the final calculation was simply a long string of calculations. I
appreciate your attempt to make it clearer and to chunk it into fewer,
easier-to-read pieces. I plan to go back and do that as well.


Thanks again.

Nate


On Wed, Oct 19, 2011 at 10:33 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:

 Hi Nathan,

 I honestly do not think that anything else will be much better than
 merging the two datasets.  If the datasets are not merged, you
 essentially have to apply your optode function to each vial, store the
  results, then combine them all together.  This is inefficient.
  Merging the datasets may be inefficient in a way, but once it is
 done, your function can be applied to the entire dataset in one step.
 If you have big data and the merge is slow, take a look at the
 data.table package.  I have recently (not that it was bad before, I
 just never really knew how much it could do) been quite impressed with
 it.  As a whole other note, your optode function was quite difficult
 to read, and I highly doubt you can confidently look at the code and
 ensure there is not a typo, missed operator, etc. somewhere in that
 block of formula code.  I attempted to clean it up some, though
 perhaps not with 100% success.

 ###
  optode2 <- function(cal0, T0, cal100, T100, phase, temp) {
   dr <- pi/180
   f1 <- 0.801
   deltaPsiK <- (-0.08)
   deltaKsvK <- 0.000383
   m <- 22.9
   tan0T100 <- tan(((cal0 + deltaPsiK * (T100 - T0))) * dr)
   tan0Tm <- tan((cal0 + (deltaPsiK * (temp - T0))) * dr)
   tan100T100 <- tan(cal100 * dr)
   tanmTm <- tan(phase * dr)
   A <- tan100T100 / tan0T100 / m * 100^2

   B <- tan100T100 / tan0T100 * 100 + tan100T100 / tan0T100 / m * 100 -
     f1 / m * 100 - 100 + f1 * 100
   C <- tan100T100 / tan0T100 - 1
   KsvT100 <- (-B + (sqrt(B^2 - 4 * A * C))) / (2 * A)
   KsvTm <- KsvT100 + (deltaKsvK * (temp - T100))
   a <- tanmTm / tan0Tm / m * KsvTm^2
   b <- tanmTm / tan0Tm * KsvTm + tanmTm / tan0Tm / m * KsvTm - f1 / m *
     KsvTm - KsvTm + f1 * KsvTm
   c <- tanmTm / tan0Tm - 1
   tot <- tanmTm / tan0T100
   big <- tot * KsvTm + tanmTm / tan0T100 / m * KsvTm - f1 / m * KsvTm -
     KsvTm + f1 * KsvTm

   saturation <- (-big + (sqrt((big)^2 - 4 * (tanmTm / tan0T100 / m *
     KsvTm^2) * (tot - 1)))) / (2 * (tot / m * KsvTm^2))
   return(saturation)
  }

 ## Read in your example data
  d1 <- read.table(textConnection("
  vial  cal0  T0  cal100  T100
  1     61.0  18  28.0    18
  2     60.8  18  27.1    18
  3     60.2  18  28.3    18
  4     59.8  18  27.2    18"), header = TRUE, stringsAsFactors = FALSE)
  d2 <- read.table(textConnection("
  vial  phase  temp  time
  1     31.0   17.5  10
  1     31.5   17.4  20
  1     32.8   17.5  30
  2     29.0   17.5  10
  2     29.7   17.5  20
  2     30.9   17.5  30
  3     27.1   17.4  10
  3     27.6   17.4  20
  3     28.1   17.5  30
  4     31.0   17.6  10
  4     33.3   17.6  20
  4     35.6   17.6  30"), header = TRUE, stringsAsFactors = FALSE)
  closeAllConnections()

  dat <- merge(d1, d2, by = "vial")
  ## optode wrapper
  f <- function(d) optode2(d$cal0, d$T0, d$cal100, d$T100, d$phase, d$temp)

  dat$oxygen <- f(dat)

 dat
 ###


 Cheers,

 Josh

 On Wed, Oct 19, 2011 at 8:38 PM, Nathan Miller natemille...@gmail.com
 wrote:
  Hello,
 
  I am not entirely sure the subject line captures what I am trying to do,
 but
  hopefully this description of the problem will help folks to see my
  challenge and hopefully offer constructive assistance.
 
  I have an experimental setup where I measure the decrease in oxygen in
 small
  vials as an organism, such as an oyster, consumes the oxygen. Each vial
 is
  calibrated before the experiment and these calibrations are used to
 convert
  the raw data after the experiment into oxygen values. I end up with two
  dataframes. One has the calibration data and for example could look like
  this
 
   vial  cal0  T0  cal100  T100
   1     61.0  18  28.0    18
   2     60.8  18  27.1    18
   3     60.2  18  28.3    18
   4     59.8  18  27.2    18
 
  The second is a data file which could look like this
 
 
   vial  phase  temp  time
   1     31.0   17.5  10
   1     31.5   17.4  20
   1     32.8   17.5  30
   2     29.0   17.5  10
   2     29.7   17.5  20
   2     30.9   17.5  30
   3     27.1   17.4  10
   3     27.6   17.4  20
   3     28.1   17.5  30
  4

[R] Applying function with separate dataframe (calibration file) supplying some inputs

2011-10-19 Thread Nathan Miller
Hello,

I am not entirely sure the subject line captures what I am trying to do, but
hopefully this description of the problem will help folks to see my
challenge and offer constructive assistance.

I have an experimental setup where I measure the decrease in oxygen in small
vials as an organism, such as an oyster, consumes the oxygen. Each vial is
calibrated before the experiment and these calibrations are used to convert
the raw data after the experiment into oxygen values. I end up with two
dataframes. One has the calibration data and for example could look like
this

vial  cal0  T0  cal100  T100
1     61.0  18  28.0    18
2     60.8  18  27.1    18
3     60.2  18  28.3    18
4     59.8  18  27.2    18

The second is a data file which could look like this


vial  phase  temp  time
1     31.0   17.5  10
1     31.5   17.4  20
1     32.8   17.5  30
2     29.0   17.5  10
2     29.7   17.5  20
2     30.9   17.5  30
3     27.1   17.4  10
3     27.6   17.4  20
3     28.1   17.5  30
4     31.0   17.6  10
4     33.3   17.6  20
4     35.6   17.6  30

I have a complicated function (included at the bottom) that uses the
calibration values and the raw data to calculate actual oxygen levels. It
works great, but as it's currently written it requires that each calibration
be entered individually. I would rather apply the function based upon the
vial number (applying the calibration for vial 1 to all vial 1 data,
calibration for vial 2 to all vial 2 data).

I have managed to do this by combining the two dataframes into one that
looks like this

data
vial  phase  temp  time  cal0  T0  cal100  T100
1     31.0   17.5  10    61.0  18  28.0    18
1     31.5   17.4  20    61.0  18  28.0    18
1     32.8   17.5  30    61.0  18  28.0    18
1     33.6   17.5  40    61.0  18  28.0    18
2     29.0   17.5  10    60.8  18  27.1    18
2     29.7   17.5  20    60.8  18  27.1    18
2     30.9   17.5  30    60.8  18  27.1    18
3     27.1   17.4  10    60.2  18  28.3    18
3     27.6   17.4  20    60.2  18  28.3    18
3     28.1   17.5  30    60.2  18  28.3    18
4     31.0   17.6  10    59.8  18  27.2    18
4     33.3   17.6  20    59.8  18  27.2    18
4     35.6   17.6  30    59.8  18  27.2    18

I have then used ddply to apply my function grouped by vial

oxygen <- ddply(data, .(vial), function(d) optode(d$cal0, d$T0, d$cal100,
d$T100, d$phase, d$temp))

This works, but I do not like having to put the calibrations into the same
dataframe as the data. Can anyone show me an example of how I could have a
function reference a dataframe (like the calibration data) based upon a
grouping variable (like vial) and use that data as partial inputs to the
function when it is applied to another dataframe (the actual data)? I don't
necessarily need an example using my sample data or the function I have
included here (as it is rather unwieldy). A simplified example of how to
achieve this type of cross referencing would suffice and I can apply the
principle to my current problem.
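
For reference, one way to do that kind of cross-referencing without merging
is a match()-based lookup (a sketch only; 'cal' stands for the calibration
data frame, 'dat' for the raw data frame, and optode() is assumed to be
vectorized as below):

idx <- match(dat$vial, cal$vial)
dat$oxygen <- optode(cal$cal0[idx], cal$T0[idx], cal$cal100[idx],
                     cal$T100[idx], dat$phase, dat$temp)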

Thanks so much,
Nate


optode <- function(cal0, T0, cal100, T100, phase, temp) {

f1=0.801
deltaPsiK=-0.08
deltaKsvK=0.000383
m=22.9
tan0T100=tan(((cal0+deltaPsiK*(T100-T0)))*pi/180)
tan0Tm=tan((cal0+(deltaPsiK*(temp-T0)))*pi/180)
tan100T100=tan(cal100*pi/180)
tanmTm=tan(phase*pi/180)
A=tan100T100/tan0T100*1/m*100^2

B=tan100T100/tan0T100*100+tan100T100/tan0T100*1/m*100-f1*1/m*100-100+f1*100
C=tan100T100/tan0T100-1
KsvT100=(-B+(sqrt(B^2-4*A*C)))/(2*A)
KsvTm=KsvT100+(deltaKsvK*(temp-T100))
a=tanmTm/tan0Tm*1/m*KsvTm^2

b=tanmTm/tan0Tm*KsvTm+tanmTm/tan0Tm*1/m*KsvTm-f1*1/m*KsvTm-KsvTm+f1*KsvTm
c=tanmTm/tan0Tm-1


Re: [R] ggplot2-Issue placing error bars behind data points

2011-09-08 Thread Nathan Miller
Jeff Newmiller and Dennis,

As always, very helpful. I appreciate the comments on how best to organize
the code as well. Simplifies things greatly.

Thanks,
Nate

On Wed, Sep 7, 2011 at 3:33 PM, Dennis Murphy djmu...@gmail.com wrote:

 Hi:

 For your test data, try this:

 # Result of dput(NerveSurv)
  NerveSurv <- structure(list(Time = c(0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 0,
 0.25, 0.5, 0.75, 1, 1.25, 1.5, 0, 0.25, 0.5, 0.75, 1, 1.25, 1.5
 ), SAP = c(1, 1.04, 1.04, 1.06, 1.04, 1.22, 1.01, 1, 1.01, 1.01,
 1.06, 1.01, 0.977, 0.959, 1, 1.01, 1.06, 0.921, 0.951, 0.904,
 0.911), Temp = c(25L, 25L, 25L, 25L, 25L, 25L, 25L, 15L, 15L,
 15L, 15L, 15L, 15L, 15L, 8L, 8L, 8L, 8L, 8L, 8L, 8L), SAPSE = c(0,
 0.0412, 0.0935, 0.0818, 0.131, 0.144, 0.0712, 0, 0.0156, 0.0337,
 0.0481, 0.0168, 0.00486, 0.0155, 0, 0.00462, 0.0491, 0.0329,
  0.0304, 0.0471, 0.0722)), .Names = c("Time", "SAP", "Temp", "SAPSE"
  ), class = "data.frame", row.names = c(NA, -21L))

  limits <- aes(ymax = NerveSurv$SAP + NerveSurv$SAPSE,
 ymin = NerveSurv$SAP - NerveSurv$SAPSE)

 # library('ggplot2')
  p <- ggplot(data=NerveSurv, aes(x=Time, y=SAP))
  p + geom_errorbar(limits, width=0.2) +
    geom_point(aes(fill=factor(Temp)), colour = colors()[173],
               shape = 21, size=4) +
    xlab("Time (min)") +
    xlim(c(0, 2)) +
    ylab("Normalized Spontaneous Action Potential Rate") +
    ylim(c(0,1.6)) +
    scale_fill_manual("Acclimation\nTemperature", breaks=c(8,18,25),
                      labels=c("25 ºC", "18 ºC", "8 ºC"),
                      values=c(colours()[c(173,253,218)]))

 Notice the following in the above code:

 (1) As Jeff Newmiller pointed out, put geom_errorbar() before geom_point().
 (2) If you are going to be using the same value of a plot aesthetic, you
 *set* it outside aes() rather than *map* it inside aes().
 See the revised code for geom_point(). The idea is that if a
 variable is assigned
 to a plot aesthetic (e.g., color, fill, shape), then it needs to
 be mapped inside
 aes(); if an aesthetic is set to a specific value, it is set outside
 aes().
 (3) Your test data had an effective x-range of 0 - 1.5 rather than 0 - 50,
 so
 I shrunk xlim() for readability.
 (4) You can use \n inside of a character string as a carriage return. See
 the legend title for scale_fill_manual().
 (5) Style note: you want the layer addition operator + to be at the end of
 a
 line, not at the beginning. Copy and paste this code verbatim to
 see what I mean:
 p + geom_errorbar(limits,width=0.2)
 +  geom_point(aes(fill=factor(Temp)), colour = colors()[173],
 shape = 21, size=4)

 This happens because the first line is syntactically complete.

 HTH,
 Dennis

 On Wed, Sep 7, 2011 at 2:48 PM, Nathan Miller natemille...@gmail.com
 wrote:
  Hi all,
 
  This seems like a basic problem, but no amount of playing with the code
 has
  solved it. I have a time-series data set like that shown below (only
 longer)
  and am seeking to plot the data with filled, circular points and error
 bars.
  I would like the error bars to be behind the points otherwise they tend
 to
  obscure the points (especially when I have a lot of points in the actual
  data set). Despite providing a fill colour for the points and using
 shapes
  that utilize fill colours, the error bars are always placed on top of the
  points. Can anyone see the error I am making? I simply want to move the
  error bars so they are behind the data points.
 
  Thanks for your help. I assume its simple...or else its a bug.
 
  Nate
 
 
   Time  SAP       Temp  SAPSE
   0.00  1.000000  25    0.000000
   0.25  1.040000  25    0.041200
   0.50  1.040000  25    0.093500
   0.75  1.060000  25    0.081800
   1.00  1.040000  25    0.131000
   1.25  1.220000  25    0.144000
   1.50  1.010000  25    0.071200
   0.00  1.000000  15    0.000000
   0.25  1.010000  15    0.015600
   0.50  1.010000  15    0.033700
   0.75  1.060000  15    0.048100
   1.00  1.010000  15    0.016800
   1.25  0.977000  15    0.004860
   1.50  0.959000  15    0.015500
   0.00  1.000000   8    0.000000
   0.25  1.010000   8    0.004620
   0.50  1.060000   8    0.049100
   0.75  0.921000   8    0.032900
   1.00  0.951000   8    0.030400
   1.25  0.904000   8    0.047100
   1.50  0.911000   8    0.072200
 
 
  limits <- aes(ymax=NerveSurv$SAP+NerveSurv$SAPSE, ymin=NerveSurv$SAP-NerveSurv$SAPSE)
 
   p <- ggplot(data=NerveSurv, aes(x=Time, y=SAP))
 
   p+geom_point(aes(colour=factor(Temp), shape=factor(Temp),
   fill=factor(Temp)), size=4)
   +geom_errorbar(limits, width=0.2)
   +xlab("Time (min)")
   +xlim(c(0,50))
   +ylab("Normalized Spontaneous Action Potential Rate")
   +ylim(c(0,1.6))
   +scale_shape_manual("Acclimation Temperature", breaks=c(8,18,25),
   labels=c("25 ºC", "18 ºC", "8 ºC"), values=c(21,21,21))
   +scale_fill_manual("Acclimation Temperature", breaks=c(8,18,25),
   labels=c("25 ºC", "18 ºC", "8 ºC"), values=c(colours()[c(173,253,218)]))
   +scale_colour_manual("Acclimation Temperature", breaks=c(8,18,25),
   labels=c("25 ºC", "18 ºC", "8 ºC"), values=c(colours()[c(173,173,173)]))

[R] ggplot2-Issue placing error bars behind data points

2011-09-07 Thread Nathan Miller
Hi all,

This seems like a basic problem, but no amount of playing with the code has
solved it. I have a time-series data set like that shown below (only longer)
and am seeking to plot the data with filled, circular points and error bars.
I would like the error bars to be behind the points otherwise they tend to
obscure the points (especially when I have a lot of points in the actual
data set). Despite providing a fill colour for the points and using shapes
that utilize fill colours, the error bars are always placed on top of the
points. Can anyone see the error I am making? I simply want to move the
error bars so they are behind the data points.

Thanks for your help. I assume it's simple...or else it's a bug.

Nate


 Time  SAP       Temp  SAPSE
 0.00  1.00      25    0.00
 0.25  1.04      25    0.041200
 0.50  1.04      25    0.093500
 0.75  1.06      25    0.081800
 1.00  1.04      25    0.131000
 1.25  1.22      25    0.144000
 1.50  1.01      25    0.071200
 0.00  1.00      15    0.00
 0.25  1.01      15    0.015600
 0.50  1.01      15    0.033700
 0.75  1.06      15    0.048100
 1.00  1.01      15    0.016800
 1.25  0.977000  15    0.004860
 1.50  0.959000  15    0.015500
 0.00  1.00      8     0.00
 0.25  1.01      8     0.004620
 0.50  1.06      8     0.049100
 0.75  0.921000  8     0.032900
 1.00  0.951000  8     0.030400
 1.25  0.904000  8     0.047100
 1.50  0.911000  8     0.072200

limits<-aes(ymax=NerveSurv$SAP+NerveSurv$SAPSE,ymin=NerveSurv$SAP-NerveSurv$SAPSE)

p<-ggplot(data=NerveSurv,aes(x=Time,y=SAP))

p+geom_point(aes(colour=factor(Temp), shape=factor(Temp),
fill=factor(Temp)), size=4)
+geom_errorbar(limits,width=0.2)
+xlab("Time (min)")
+xlim(c(0,50))
+ylab("Normalized Spontaneous Action Potential Rate")
+ylim(c(0,1.6))
+scale_shape_manual("Acclimation Temperature",breaks=c(8,18,25),
labels=c("25 ºC", "18 ºC", "8 ºC"),values=c(21,21,21))
+scale_fill_manual("Acclimation Temperature",breaks=c(8,18,25),
labels=c("25 ºC", "18 ºC", "8 ºC"),values=c(colours()[c(173,253,218)]))
+scale_colour_manual("Acclimation Temperature",breaks=c(8,18,25),
labels=c("25 ºC", "18 ºC", "8 ºC"), values=c(colours()[c(173,173,173)]))

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Linear Regression with 2 grouping variables

2011-08-23 Thread Nathan Miller
Thanks Dennis! Worked perfectly. I keep forgetting that plyr can split data
based on multiple subsetting variables.

Thanks so much,
Nate

On Mon, Aug 22, 2011 at 10:12 PM, Dennis Murphy djmu...@gmail.com wrote:

 Hi:

 You're kind of on the right track, but there is no conditioning
 formula in lm(); it's not lattice :)  This is relatively easy to do
 with the plyr package, though:

 library('plyr')
 # Generate a list of models - the subsetting variables (Site, Vial) are
 # used to generate the data splits and the function is run on a generic
 # data split d, assumed to be a data frame.
  mlist <- dlply(feed1, .(Site, Vial), function(d) lm(lnRFU ~ Time, data =
  d))

 # To get the set of model coefficients, take the list mlist as the input
 # data (argument 1) and then for each generic component 'm', extract its
 # coefficients:
 ldply(mlist, function(m) coef(m))

 For your test data, only Vial varies - the example code below reflects
 that:

  mlist <- dlply(feed1, .(Vial), function(d) lm(lnRFU ~ Time, data = d))
  length(mlist)  # three component model objects
 [1] 3
  ldply(mlist, function(m) coef(m))
    Vial (Intercept)         Time
  1    1    10.75440 -0.001508621
  2    2    10.83171 -0.005095100
  3    3    10.81087 -0.004897600

 This idea can be generalized: if you want to pull out similar pieces
 of output from each model, run ldply() on the list of models and
 create a utility function that outputs, for a generic model object m,
 what you want to have returned. Common choices include R^2 values,
 tables of coefficients (as lists instead of data frames) or residuals
 and predicted values. The game is to write the function so that it
 takes a [list] model object (here, m) as input and a data frame as
 output. You can also extract output from summary(m) in a similar way,
 using m as the input object.
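 As a hedged sketch of that pattern (the helper name is made up; it assumes
 the mlist built above with dlply()): return the slope and R^2 for each
 fitted model.

   get_fit_stats <- function(m) {
     data.frame(slope = coef(m)[["Time"]],
                r.squared = summary(m)$r.squared)
   }
   ldply(mlist, get_fit_stats)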

 HTH,
 Dennis

 On Mon, Aug 22, 2011 at 6:15 PM, Nathan Miller natemille...@gmail.com
 wrote:
  Hi all,
 
  I have a data set that looks a bit like this.
 
  feed1
   RFU Site Vial Time   lnRFU
  1   811   10  10.702075
  2   4752111   20  10.768927
  3   4290511   30  10.66674
  4   4686711   40  10.755069
  5   4299511   50  10.668839
  6   4307411   60  10.670675
  7   4119511   70  10.626072
  8   4709012   10  10.759816
  9   4810012   20  10.781037
  10  4321512   30  10.673943
  11  3965612   40  10.587998
  12  3879912   50  10.566150
  13  3842412   60 10.556438
  14 35240 1 2 70  10.469937
  15  4642713   10  10.745636
  16 46418 1 3 20  10.745443
  17  4209513   30  10.647684
  ..
  There are 5 columns of data, three levels of Site, 10 Vials per site,
  and measurements were taken at 10 min intervals from 10-70.. I am
 primarily
  interested in the relationship between Time and lnRFU to calculate
 the
  rate at which lnRFU declines over time. I have a nice plot using a
 ggplot2
  code that looks like this
 
  p<-ggplot(data=feed1,aes(x=Time,y=lnRFU))
  p+geom_point(size=4)+facet_grid(Site~Vial)+geom_smooth(method=lm)
 
  The graph is useful to visualize the changes over time and grouped by
 both
  Site and Vial, but I also need the slopes of the linear regressions for
 each
  Vial, within a Site. This is where I run into a problem. I want to run a
  linear regression of lnRFU as a function of Time grouped by both Site and
  Vial. Its easy to visualize this comparison in ggplot using facet_grid(),
  but I'm not sure how to do a similar comparison/analysis within lm()
 
  I imagine something like
 
  fit<-lm(lnRFU~Time | Vial * Site, data=feed1)
 
   in which I group by both Vial and Site, but obviously this code doesn't
  work. Does anyone have an idea for how to do a linear regression with two
  grouping variables? Do I have to go back and combine Vial and Site into a
  single grouping variable or can I leave the dataframe the way it is? I'm
  trying to imagine a means of accomplishing the same type of thing that
  facet_grid does when it allows you to plot the data as a function of two
  grouping variables.
 
  Thanks for you time. I greatly appreciate it.
 
  Nate Miller
 
 [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Linear Regression with 2 grouping variables

2011-08-22 Thread Nathan Miller
Hi all,

I have a data set that looks a bit like this.

feed1
  RFU Site Vial Time   lnRFU
1   811   10  10.702075
2   4752111   20  10.768927
3   4290511   30  10.66674
4   4686711   40  10.755069
5   4299511   50  10.668839
6   4307411   60  10.670675
7   4119511   70  10.626072
8   4709012   10  10.759816
9   4810012   20  10.781037
10  4321512   30  10.673943
11  3965612   40  10.587998
12  3879912   50  10.566150
13  3842412   60 10.556438
14  3524012   70  10.469937
15  4642713   10  10.745636
16  4641813   20  10.745443
17  4209513   30  10.647684
..
There are 5 columns of data, three levels of Site, 10 Vials per site,
and measurements were taken at 10 min intervals from 10-70.. I am primarily
interested in the relationship between Time and lnRFU to calculate the
rate at which lnRFU declines over time. I have a nice plot using a ggplot2
code that looks like this

p<-ggplot(data=feed1,aes(x=Time,y=lnRFU))
p+geom_point(size=4)+facet_grid(Site~Vial)+geom_smooth(method=lm)

The graph is useful to visualize the changes over time and grouped by both
Site and Vial, but I also need the slopes of the linear regressions for each
Vial, within a Site. This is where I run into a problem. I want to run a
linear regression of lnRFU as a function of Time grouped by both Site and
Vial. It's easy to visualize this comparison in ggplot using facet_grid(),
but I'm not sure how to do a similar comparison/analysis within lm()

I imagine something like

fit<-lm(lnRFU~Time | Vial * Site, data=feed1)

 in which I group by both Vial and Site, but obviously this code doesn't
work. Does anyone have an idea for how to do a linear regression with two
grouping variables? Do I have to go back and combine Vial and Site into a
single grouping variable or can I leave the dataframe the way it is? I'm
trying to imagine a means of accomplishing the same type of thing that
facet_grid does when it allows you to plot the data as a function of two
grouping variables.
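
One hedged sketch of the single-grouping-variable route mentioned above,
using only base R (it assumes every Site/Vial combination has data):

feed1$grp <- interaction(feed1$Site, feed1$Vial)
fits <- lapply(split(feed1, feed1$grp, drop = TRUE),
               function(d) lm(lnRFU ~ Time, data = d))
sapply(fits, function(m) coef(m)[["Time"]])   # slope for each Site x Vial combination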

Thanks for your time. I greatly appreciate it.

Nate Miller

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ggplot boxplot

2011-01-20 Thread Nathan Miller
Hello,

I am trying to generate a set of boxplots using ggplot with the following
data with 4 columns (Day, Site, VO2, Cruise)

 AllCorbulaMR
Day Site  VO2 Cruise
1 11 148.43632670  1
2 11  61.73864969  1
3 11  92.64536096  1
4 11  73.35434957  1
5 11  69.85568810  1
6 11  98.71116866  1
7 11  67.57880107  1
8 11  80.57959160  1
9 11  53.38577137  1
1012  81.08429594  1
1112  79.73019687  1
1212  67.93991806  1
1312  50.69929558  1
1412  42.02457680  1
1512  64.10049924  1
1612  80.02264095  1
1712  67.14828804  1
1812  93.33363743  1
1913  53.86021985  1
2013  50.53366868  1
2113  52.12437086  1
2213  43.44618922  1
2313  64.64322840  1
2413  55.03761768  1
2513  67.79501374  1
2614  12.70806068  1
2714 114.56401960  1
2814  34.34450695  1
2914  76.70849935  1
3014  68.99752863  1
3114  71.23080332  1
3211   0.08222308  2
3311   NA  2
3411   0.03743258  2
3511   0.04496363  2
3611   0.07184903  2
3711   0.05637676  2
3811   0.05163886  2
3911   0.03022606  2
4011   0.04150667  2
4121   0.04982530  2
4221   0.05248479  2
4321   0.03839707  2
4421   0.04283591  2
4521   0.03285247  2
4621   0.03965853  2
4721   NA  2
4821   0.03637822  2
4921   0.03686663  2
5012   0.04086229  2
5112   NA  2
5212   0.01891389  2
5312   0.03365864  2
5412   0.04179611  2
5512   0.04675111  2
5612   0.04734616  2
5712   0.04046907  2
5812   0.03395499  2
5922   0.02104620  2
6022   NA  2
6122   NA  2
6222   NA  2
6322   0.01882796  2
6422   NA  2
6522   NA  2
6622   0.02328894  2
6722   0.02635327  2
6813   0.06030056  2
6913   0.0472  2
7013   0.04307900  2
7113   0.05144241  2
7213   0.03223973  2
7313   0.05145292  2
7413   0.02718536  2
7513   0.02830348  2
7613   0.05859836  2
7723   0.04521778  2
7823   0.03242385  2
7923   0.03412688  2
8023   0.04407171  2
8123   0.04517834  2
8223   NA  2
8323   0.02745407  2
8423   0.03118602  2
8523   0.04420074  2
8614   0.05352334  2
8714   0.05378120  2
8814   0.04394838  2
8914   0.02597939  2
9014   0.05476946  2
9114   0.04371743  2
9224   0.04022729  2
9324   0.04078509  2
9424   0.04911994  2
9524   0.04068468  2
9624   NA  2
9724   NA  2
9811   0.08892223  3
9911   0.08617873  3
100   11   0.06108950  3
101   11   0.19047922  3
102   11   0.09865930  3
103   11   0.12103549  3
104   11   0.06788404  3
105   11   0.12629497  3
106   11   0.10947173  3
107   11   0.11381467  3
108   21   0.07809781  3
109   21   0.04397586  3
110   21   0.06317635  3
111   21   0.02020365  3
112   21   0.09525985  3
113   21   0.04732347  3
114   21   0.03043341  3
115   21   0.04419395  3
116   21   NA  3
117   21   NA  3
118   12   0.03380003  3
119   12   0.02600926  3
120   12   0.03980552  3
121   12   0.03659985  3
122   12   0.04867881  3
123   12   0.03694679  3
124   12   0.03372825  3
125   12   0.03644750  3
126   12   0.01497611  3
127   12   0.02697976  3
128   22   0.03136923  3
129   22   0.03602215  3
130   22   0.04000660  3
131   22   0.03673098  3
132   22   0.03090854  3
133   22   0.04877643  3
134   22   0.02468537  3
135   22   NA  3
136   22   NA  3
137   22   NA  3
138   13   0.04809866  3
139   13   0.02380070  3
140   13   0.03672271  3
141   13   0.03232115  3
142   13   0.02950701  3
143   13   0.05068163  3
144   13   0.03004234  3
145   13   0.03090461  3
146   13   0.04539888  3
147   

[R] Error in combined for() and if() code

2010-12-28 Thread Nathan Miller
Hello,

I am trying to filter a data set like below so that the peaks in the Phase
value are more obvious and can be identified by a peak finding function
following the useful advice of Carl Witthoft. I have written the following

for(i in length(data$Phase)){
newphase=if(abs(data$Phase[i+1]-data$Phase[i])>6){
data$Phase[i+1]
}else{data$Phase[i]
}
}

I get the following error which I have not seen before when I paste the code
into R

Error in if (abs(data$Phase[i + 1] - data$Phase[i]) > 6) { :
  missing value where TRUE/FALSE needed

I don't have much experience with such loops as I have tried to avoid using
them in the past. Can anyone identify the error(s) in the code I have
written or a simpler means of writing such a filter?
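
One hedged sketch of a fix (not the poster's final code): the condition is NA
because for(i in length(data$Phase)) makes i just the last row index, so
data$Phase[i+1] does not exist. Looping over all but the last row and storing
the result in a vector avoids both problems:

newphase <- data$Phase
for (i in seq_len(nrow(data) - 1)) {
  if (abs(data$Phase[i + 1] - data$Phase[i]) > 6) {
    newphase[i] <- data$Phase[i + 1]
  }                                  # else: keep the original value
}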

Thank you,
Nate


data=
Time Phase
1  0.000 15.18
2  0.017 13.42
3  0.034 11.40
4  0.051 18.31
5  0.068 25.23
6  0.085 33.92
7  0.102 42.86
8  0.119 42.87
9  0.136 42.88
10 0.153 42.88
11 0.170 42.87
12 0.186 42.88
13 0.203 42.88
14 0.220 42.78
15 0.237 33.50
16 0.254 24.81
17 0.271 17.20
18 0.288 10.39
19 0.305 13.97
20 0.322 16.48
21 0.339 14.75
22 0.356 20.80
23 0.373 25.79
24 0.390 31.25
25 0.407 39.89
26 0.423 40.04
27 0.440 40.05
28 0.457 40.05
29 0.474 40.05
30 0.491 40.05
31 0.508 40.06
32 0.525 40.07
33 0.542 32.23
34 0.559 23.90
35 0.576 17.86
36 0.592 11.63
37 0.609 12.78
38 0.626 13.12
39 0.643 10.93
40 0.660 10.63
41 0.677 10.82
42 0.694 11.84
43 0.711 20.44
44 0.728 27.33
45 0.745 34.22
46 0.762 41.55
47 0.779 41.55
48 0.796 41.55
49 0.813 41.53
50 0.830 41.53
51 0.847 41.52
52 0.864 41.52
53 0.880 41.53
54 0.897 41.53
55 0.914 33.07
56 0.931 25.12
57 0.948 19.25
58 0.965 11.30
59 0.982 12.48
60 0.999 13.85
61 1.016 13.62
62 1.033 12.62
63 1.050 19.39
64 1.067 25.48
65 1.084 31.06
66 1.101 39.49
67 1.118 39.48
68 1.135 39.46
69 1.152 39.45
70 1.169 39.43
71 1.185 39.42
72 1.202 39.42
73 1.219 39.41
74 1.236 39.41
75 1.253 37.39
76 1.270 29.03
77 1.287 20.61
78 1.304 14.07
79 1.321  9.12

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in combined for() and if() code

2010-12-28 Thread Nathan Miller
Hi all,

I haven't solved the problem of filtering the data, but I have managed to
find all the peaks in the data despite their relatively flat nature using
peaks() in the IDPmisc package. It works really well for my data and the
ability to set a lower threshold for peaks to report is convenient as well.

Maybe I'll came back to the data filtering problem later.

Thanks for your help and comments,
Nate

On Tue, Dec 28, 2010 at 10:49 AM, David Winsemius dwinsem...@comcast.net wrote:


 On Dec 28, 2010, at 1:08 PM, Nathan Miller wrote:

  Hello,

 I am trying to filter a data set like below so that the peaks in the Phase
 value are more obvious and can be identified by a peak finding function
 following the useful advise of Carl Witthoft. I have written the following

 for(i in length(data$Phase)){
 newphase=if(abs(data$Phase[i+1]-data$Phase[i])>6){
 data$Phase[i+1]
 }else{data$Phase[i]
 }
 }

 I get the following error which I have not seen before when I paste the
 code
 into R

 Error in if (abs(data$Phase[i + 1] - data$Phase[i]) > 6) { :
  missing value where TRUE/FALSE needed

 I don't have much experience with such loops as I have tried to avoid
 using
 them in the past. Can anyone identify the error(s) in the code I have
 written or a simpler means of writing such a filter?


 Sometimes it's more informative to look at the data first. Here's a plot of
 the data with the first and second differences underneath

  plot(data, ylim=c(-5, max(data$Phase)) )
 lines(data$Time[-1], diff(data$Phase) )
 lines(data$Time[-(1:2)], diff(diff(data$Phase)), col="red")

 Your data had rather flat-topped maxima. These maxima are defined by the
  interval between the times when the first differences are zero (OR go from
 positive to negative)  AND the second differences are negative (OR zero).
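
 A rough numeric sketch of that rule (the tolerance is illustrative, not from
 the thread): keep points whose first difference is near zero and whose second
 difference is not positive; each plateau then shows up as a run of indices.

   d1 <- diff(data$Phase)                       # first differences
   d2 <- diff(data$Phase, differences = 2)      # second differences
   cand <- which(abs(d1[-1]) < 0.1 & d2 <= 0) + 1
   data[cand, ]                                 # several rows per flat top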

 There is a package on CRAN:

 http://cran.r-project.org/web/packages/msProcess/index.html

   that purports to do peak finding. I would think the local maxima in
 your data might need some filtering and presumably the mass-spec people have
 need of that too.





 Thank you,
 Nate


 data=
   Time Phase
 1  0.000 15.18
 2  0.017 13.42
 3  0.034 11.40
 4  0.051 18.31
 5  0.068 25.23
 6  0.085 33.92
 7  0.102 42.86
 8  0.119 42.87
 9  0.136 42.88
 10 0.153 42.88
 11 0.170 42.87
 12 0.186 42.88
 13 0.203 42.88
 14 0.220 42.78
 15 0.237 33.50
 16 0.254 24.81
 17 0.271 17.20
 18 0.288 10.39
 19 0.305 13.97
 20 0.322 16.48
 21 0.339 14.75
 22 0.356 20.80
 23 0.373 25.79
 24 0.390 31.25
 25 0.407 39.89
 26 0.423 40.04
 27 0.440 40.05
 28 0.457 40.05
 29 0.474 40.05
 30 0.491 40.05
 31 0.508 40.06
 32 0.525 40.07
 33 0.542 32.23
 34 0.559 23.90
 35 0.576 17.86
 36 0.592 11.63
 37 0.609 12.78
 38 0.626 13.12
 39 0.643 10.93
 40 0.660 10.63
 41 0.677 10.82
 42 0.694 11.84
 43 0.711 20.44
 44 0.728 27.33
 45 0.745 34.22
 46 0.762 41.55
 47 0.779 41.55
 48 0.796 41.55
 49 0.813 41.53
 50 0.830 41.53
 51 0.847 41.52
 52 0.864 41.52
 53 0.880 41.53
 54 0.897 41.53
 55 0.914 33.07
 56 0.931 25.12
 57 0.948 19.25
 58 0.965 11.30
 59 0.982 12.48
 60 0.999 13.85
 61 1.016 13.62
 62 1.033 12.62
 63 1.050 19.39
 64 1.067 25.48
 65 1.084 31.06
 66 1.101 39.49
 67 1.118 39.48
 68 1.135 39.46
 69 1.152 39.45
 70 1.169 39.43
 71 1.185 39.42
 72 1.202 39.42
 73 1.219 39.41
 74 1.236 39.41
 75 1.253 37.39
 76 1.270 29.03
 77 1.287 20.61
 78 1.304 14.07
 79 1.321  9.12

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 David Winsemius, MD
 West Hartford, CT



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Finding flat-topped peaks in simple data set

2010-12-23 Thread Nathan Miller
Hello,
Thank you to all those great folks that have helped me in the past
(especially Dennis Murphy).

I have a new challenge. I often generate time-series data sets that look
like the one below, with a variable (Phase) which has a series of
flat-topped peaks (sample data below with 5 peaks). I would like to
calculate the phase value for each peak. It would be great to calculate the
mean value for the peak (possibly using rollmean()), but given the low
variability within the broad peak, even just picking one value from the peak
would be sufficient.
I have tried using peaks() from the library(simecol),

peaks(data$Time,data$phase, model=max),

but because of the flat nature of the peaks and the fact that peaks() looks
for values with lower neighbors I am unable to find all the peaks
(especially the one between time 0.762-0.897). For all my data files the
maximum phase values will fall between 27 and 62 so adding a function which
removes the values below 27 is reasonable and a technique I have used to
eliminate some of the noise between the peaks.
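
A hedged sketch of that thresholding idea (the 27 cut-off comes from the note
above; the rest is illustrative): treat values above 27 as being on a peak,
find runs of such values with rle(), and summarise each run.

on.peak <- data$Phase > 27
r <- rle(on.peak)
run.end   <- cumsum(r$lengths)
run.start <- run.end - r$lengths + 1
flat <- data.frame(start = run.start[r$values], end = run.end[r$values])
flat$mean.phase <- mapply(function(s, e) mean(data$Phase[s:e]),
                          flat$start, flat$end)
flat   # one row per flat-topped peak (shoulder points above 27 are included)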


data=
Time Phase
1  0.000 15.18
2  0.017 13.42
3  0.034 11.40
4  0.051 18.31
5  0.068 25.23
6  0.085 33.92
7  0.102 42.86
8  0.119 42.87
9  0.136 42.88
10 0.153 42.88
11 0.170 42.87
12 0.186 42.88
13 0.203 42.88
14 0.220 42.78
15 0.237 33.50
16 0.254 24.81
17 0.271 17.20
18 0.288 10.39
19 0.305 13.97
20 0.322 16.48
21 0.339 14.75
22 0.356 20.80
23 0.373 25.79
24 0.390 31.25
25 0.407 39.89
26 0.423 40.04
27 0.440 40.05
28 0.457 40.05
29 0.474 40.05
30 0.491 40.05
31 0.508 40.06
32 0.525 40.07
33 0.542 32.23
34 0.559 23.90
35 0.576 17.86
36 0.592 11.63
37 0.609 12.78
38 0.626 13.12
39 0.643 10.93
40 0.660 10.63
41 0.677 10.82
42 0.694 11.84
43 0.711 20.44
44 0.728 27.33
45 0.745 34.22
46 0.762 41.55
47 0.779 41.55
48 0.796 41.55
49 0.813 41.53
50 0.830 41.53
51 0.847 41.52
52 0.864 41.52
53 0.880 41.53
54 0.897 41.53
55 0.914 33.07
56 0.931 25.12
57 0.948 19.25
58 0.965 11.30
59 0.982 12.48
60 0.999 13.85
61 1.016 13.62
62 1.033 12.62
63 1.050 19.39
64 1.067 25.48
65 1.084 31.06
66 1.101 39.49
67 1.118 39.48
68 1.135 39.46
69 1.152 39.45
70 1.169 39.43
71 1.185 39.42
72 1.202 39.42
73 1.219 39.41
74 1.236 39.41
75 1.253 37.39
76 1.270 29.03
77 1.287 20.61
78 1.304 14.07
79 1.321  9.12


I have also tried using the peak finding code from Prof. Ripley, but it
doesn't seem able to find all the peaks either even if I widen the span.

peaks<-function(series,span=3)
{
z <- embed(series, span)
s <- span%/%2
v <- max.col(z) == 1 + s
result <- c(rep(FALSE,s),v)
result <- result[1:(length(result)-s)]
result
}

Can anyone offer some advice on how to find a series of peaks in a data file
when the peaks are flat-topped and actually less peaked?

Thanks so much!
Nate

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ggplot missing fill colours in boxplot legend

2010-12-17 Thread Nathan Miller
Hello,

I am trying to create a series of boxplots with the following data, three
columns, Day (1 or 2), Site (1-4), and VO2 (some values missing for
some Sites or Days)

 CorbulaMR3
   Day Site   VO2
111  88.92223
211  86.17873
311  61.08950
411 190.47922
511  98.65930
611 121.03549
711  67.88404
811 126.29497
911 109.47173
10   11 113.81467
11   21  78.09781
12   21  43.97586
13   21  63.17635
14   21  20.20365
15   21  95.25985
16   21  47.32347
17   21  30.43341
18   21  44.19395
19   21NA
20   21NA
21   12  33.80003
22   12  26.00926
23   12  39.80552
24   12  36.59985
25   12  48.67881
26   12  36.94679
27   12  33.72825
28   12  36.44749
29   12  14.97611
30   12  26.97976
31   22  31.36923
32   22  36.02215
33   22  40.00660
34   22  36.73098
35   22  30.90854
36   22  48.77643
37   22  24.68537
38   22NA
39   22NA
40   22NA
41   13  48.09866
42   13  23.80070
43   13  36.72271
44   13  32.32115
45   13  29.50701
46   13  50.68163
47   13  30.04234
48   13  30.90461
49   13  45.39888
50   13  32.61571
51   23  20.69708
52   23  21.17658
53   23  32.44907
54   23  34.04048
55   23  45.97381
56   23  40.34278
57   23  21.67128
58   23  22.45179
59   23NA
60   23NA
61   14  39.35840
62   14  29.32294
63   14  49.28409
64   14  50.75344
65   14  43.53663
66   14  33.76173
67   14  36.40901
68   14  36.16992
69   14  49.99144
70   14NA
71   24  28.64131
72   24  22.69317
73   24  42.31203
74   24  31.17968
75   24  59.36813
76   24  44.53866
77   24  45.96834
78   24  33.16604
79   24  35.57714
80   24  35.46922

I have the following code which works well except that in the legend the
fill colour for Day 1 and Day 2 is left blank. Instead the little boxes next
to the labels (Day 1, Day 2) are both unfilled (white). The boxplots are
filled properly, but without the fill colour in the legend it's difficult to
decipher the plot.

p=ggplot(CorbulaMR3,aes(factor(Site),VO2))
p+geom_boxplot(aes(fill=factor(Day)))+scale_fill_manual('Day', breaks =
c('Day 1', 'Day 2'),
+ values = c('grey','black'),
+ labels = c('Day 1', 'Day 2'))+xlab('Sampling Site')+ylab('Metabolic
Rate')+opts(title="Sampling")

I've played around and if I don't use the scale_fill_manual() and instead
simply plot using

p+geom_boxplot(aes(fill=factor(Day)))

the legend boxes are filled, but obviously plot/legend is not labeled in the
manner I would like and I don't much care for the default colours.

Any thoughts on why when I use the scale_fill_manual() the fill colours are
not shown in the legend? I'm pretty sure it's something simple.
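
One hedged guess at the cause (sketch only, not verified against the data):
factor(Day) has levels "1" and "2", so breaks of 'Day 1'/'Day 2' match nothing
and the legend keys are drawn empty. Keeping breaks on the actual levels and
using labels only for the display text may be what is wanted:

p <- ggplot(CorbulaMR3, aes(factor(Site), VO2))
p + geom_boxplot(aes(fill = factor(Day))) +
  scale_fill_manual(name = 'Day',
                    values = c('grey', 'black'),
                    breaks = c('1', '2'),
                    labels = c('Day 1', 'Day 2')) +
  xlab('Sampling Site') + ylab('Metabolic Rate')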

I'm working on a Mac with R 2.12.0

Thank you,

Nate

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Wait for user input with readline()

2010-11-22 Thread Nathan Miller
Hello,

I am trying to write a script that includes a prompt for user input using
readline(). I am running into the problem that when I run readline() as a
single line the prompt works perfectly, but when I try to run a block of
code which includes the readline function, the script doesn't wait for the
user input. I have seen this question posted before when I did a search, but
I didn't find a suitable answer. Is there a means of ensuring that the
script does not proceed until a value has been entered to readline()? Can I
put readline in a function that will wait for input?

Are there other options for getting user input that require that the
script wait for user input?
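
A hedged sketch of one workaround (the function name is made up): readline()
only prompts in an interactive session, and when a whole block is pasted the
next pasted line is consumed as the answer; source()-ing the script instead of
pasting it avoids that, and reading from stdin covers Rscript/batch use.

ask <- function(prompt = "Enter a value: ") {
  if (interactive()) {
    readline(prompt)
  } else {
    cat(prompt)
    readLines(con = "stdin", n = 1)    # non-interactive (e.g. Rscript) case
  }
}
x <- ask("Temperature treatment: ")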

Thanks for your help,

Nate

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Batch Processing Files

2010-11-16 Thread Nathan Miller
Dennis and all,


Thank you for the help as I try to get this method for importing and batch
processing files organized. I currently have this set-up to import data from
two files in my working directory. Var1 specifies data from file 1 and
file 2.



filenames=list.files()

library(plyr)

import.list=adply(filenames, 1, read.csv)

import.list



   Var1 Time O2_conc Chla_conc
1     1    0   273.5       300
2     1   10   268.1       240
3     1   20   262.8       210
4     1   30   257.4       175
5     1   40   252.0       155
6     1   50   246.6       138
7     1   60   241.3       129
8     1   70   235.9       121
9     1   80   230.5       117
10    2    0   270.0       300
11    2   10   269.0       240
12    2   20   257.0       210
13    2   30   259.0       175
14    2   40   230.0       155
15    2   50   220.0       138
16    2   60   221.0       129
17    2   70   450.0       121
18    2   80   250.0       117



For my calculation I am only interested in column 3, so I used split() to
pull out this column based upon the value of Var1.





split(import.list[,3],import.list$Var1)

$`1`

[1] 273.5 268.1 262.8 257.4 252.0 246.6 241.3 235.9 230.5



$`2`

[1] 270 269 257 259 230 220 221 450 250



Finally I want to find the minimum value in column 3 (for each file) and the
three previous data points, and then calculate the mean of these four
values.



 a=which.min(data$"1")
 b=(which.min(data$"1")-3)
 c=c(a:b)
 d=mean(data$"1"[c])
 d
[1] 238.575



So far this works, but this last set of calculations (the min, mean) are not
very automated and are probably a bit cumbersome.

Rather than having to replace a=which.min(data$"1") with a=which.min(data$"2")
to do this calculation on the second file I would like to write a loop or
use some R package with looping abilities to cycle through this (I have more
than 2 files, this is just a test set).



I'm imagining pseudocode something like this



For (i in 1:20) {

a[i]=which.min(data$"[i]")

b[i]=which.min(data$"[i]")-3

c[i]=c(a[i]:b[i])

etc…

 ...though this particular code is most certainly too cumbersome.



Any thoughts on how to make a loop so that I can cycle through the files and
run these calculations?
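
One hedged way to automate that last step (a sketch built on the import.list
and Var1 column shown above): for each file, take the minimum of column 3 and
the three points before it, then average those values.

min.means <- sapply(split(import.list[, 3], import.list$Var1),
                    function(x) {
                      i <- which.min(x)
                      mean(x[max(1, i - 3):i])
                    })
min.means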



Thank you,

Nate


On Mon, Nov 15, 2010 at 9:12 PM, Dennis Murphy djmu...@gmail.com wrote:

 Hi:

 See inline.

  On Mon, Nov 15, 2010 at 4:26 PM, Nate Miller natemille...@gmail.com wrote:

 Hi All!

 I have some experience with R, but less experience writing scripts using
 R and have run into a challenge that I hope someone can help me with.

  I have multiple .csv files of data each with the same 3 columns of
 data, but potentially of varying lengths (some data files are from short
 measurements, others from longer ones). One file for example might look
 like this...

 Time, O2_conc, Chla_conc

 0,270,300

 10, 260, 280

 20, 245, 268

 30, 233, 238

 40, 222, 212

 50, 215, 201

 60, 208, 193

 70, 206, 191

 80, 207,189

 90, 206, 186

 100, 206, 183

 110, 207, 178

 120, 205, 174

 130, 240, 171

 140, 270, 155

 I am looking for an efficient means of batch (or sequentially)
 processing these files so that I can
 1. import each data file

 2. find the minimum value recorded in column 2 and the previous 5 data
 points


 Don't know what you mean by the 'previous 5 data points' ...are you
 referring to a rolling minimum?


 3. and average these 10 values to get a mean, minimum value.


 If the surmise above is correct, you should get 11 rolling mins for a
 vector of length 15. Here's an example using the rollapply() function from
 the zoo package:

 library(zoo)
  x <- rpois(15, 10)
  x
  [1] 17 12 17  9  8 10  7 11 15 11 11 15  5  9 12
  rollapply(zoo(x), 5, FUN = min)
  3  4  5  6  7  8  9 10 11 12 13
  8  8  7  7  7  7  7 11  5  5  5
  mean(rollapply(zoo(x), 5, FUN = min))
 [1] 7


 Currently I have imported the data files using the following

 filenames=list.files()

 library(plyr)

 import.list=adply(filenames, 1, read.csv)


 This seems to be a reasonable approach. Does the result keep a column for
 the file names?


 and I know how to write a code to calculate the minimum value and the 5
 preceding values in a single column, in a single file. I think the
 problem I am running into is scaling this code up so that I can import
 multiple files and calculating mean, minimum value for the 2^nd column
 in each of them.


 As long as you have an indicator for each file, this should be pretty
 straightforward. Write a function that produces the summaries for one data
 frame and then do something like

 ddply(slurpedFiles, .(dsIndicator), myfunction)

 to map the function to all of them.

 The data.table package is an alternative, where you should be able to do
 similar things using the data set names as a key. data.table 'thinks' more
 like an SQL, but it can be very efficient.

 You should be able to do what you're asking for with either package.

 HTH,
 Dennis


 Can anyone offer some advice on how to batch processes