Note that ddply is a heavyweight solution, and as your data gets larger
you may find that using it for small tasks like this hurts performance.
Also, "df" is an existing R function (the density of the F distribution,
in the stats package) that you might actually want to use someday, and
redefining it this way confuses anyone reading your code.
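To make the naming point concrete, here is a small sketch showing what "df" already is in a fresh session. The name stormdata is made up for illustration and is not from the original post.

```r
# In a fresh session, df is a function, not a data frame
is.function(stats::df)

# Density of the F distribution at x = 1 with 3 and 10 degrees of freedom
stats::df(1, df1 = 3, df2 = 10)

# A descriptive name (hypothetical) avoids any ambiguity with that function
stormdata <- data.frame(storm = c("s1", "s2"), Q = c(0, 1.812))
```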
existingdf <- read.csv( text=
"storm,Q_time,Q
s1,2008-08-07 21:15:00,0.000
s1,2008-08-07 21:16:00,3.020
s1,2008-08-07 21:17:00,6.041
s1,2008-08-07 21:18:00,9.061
s1,2008-08-07 21:19:00,12.082
s1,2008-08-07 21:20:00,15.102
s1,2008-08-07 21:21:00,18.123
s1,2008-08-07 21:22:00,11.143
s1,2008-08-07 21:23:00,0.000
s2,2010-10-05 21:00:00,0.000
s2,2010-10-05 21:01:00,1.812
s2,2010-10-05 21:02:00,3.625
s2,2010-10-05 21:03:00,5.437
s2,2010-10-05 21:04:00,7.249
s2,2010-10-05 21:05:00,9.061
s2,2010-10-05 21:06:00,0.874
s2,2010-10-05 21:07:00,0.000
", as.is=TRUE )
library(plyr)

# plyr solution
newdf <- ddply( existingdf
              , "storm"
              , function( DF ) {
                  transform( DF
                           , duration=seq.int( length.out=nrow( DF ) ) )
                }
              )

# base R solution
newdf2 <- transform( existingdf
                   , duration=ave( rep( 1, nrow(existingdf) )
                                 , storm
                                 , FUN=cumsum ) )
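
If the duration is meant to be elapsed time starting from zero, as the wording of the question suggests, the same ave() grouping can be applied to the timestamps. This is only a sketch under assumptions: that Q_time parses as POSIXct, that UTC is an acceptable time zone, and that minutes are the wanted unit. The names stormdf and elapsed_min and the shortened data are made up for illustration.

```r
# Shortened stand-in for the storm data (hypothetical subset)
stormdf <- read.csv(text =
"storm,Q_time,Q
s1,2008-08-07 21:15:00,0.000
s1,2008-08-07 21:16:00,3.020
s1,2008-08-07 21:17:00,6.041
s2,2010-10-05 21:00:00,0.000
s2,2010-10-05 21:01:00,1.812
", as.is = TRUE)

# Parse the timestamps; the UTC time zone is an assumption
stormdf$Q_time <- as.POSIXct(stormdf$Q_time, tz = "UTC")

# Minutes elapsed since the first observation of each storm
stormdf$elapsed_min <- ave(as.numeric(stormdf$Q_time), stormdf$storm,
                           FUN = function(t) (t - min(t)) / 60)

# Row counter within each storm, an alternative to the cumsum approach
stormdf$duration <- ave(seq_along(stormdf$storm), stormdf$storm,
                        FUN = seq_along)

print(stormdf)
```

Unlike a row counter, the elapsed-time column stays correct if the sampling interval is irregular or rows are missing.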
On Wed, 16 Apr 2014, Steve E. wrote:
Dear R Community,
I am having some trouble with a task that I hope you might be able to help
with. I have a dataset that includes the time and corresponding stream
discharge from numerous storms (example of structure with simplified data
below). I would like to produce a field that details the duration of each
storm, where each storm is a subset of the data and the duration runs from
zero to end for each unique storm. I have been trying to accomplish this
with ddply but to no avail as I am unable to provide ddply (e.g., below)
with the length of the storm (i.e., subset of data). Thank you in advance,
any help would be appreciated.
existing df:
storm,Q_time,Q
s1,2008-08-07 21:15:00,0.000
s1,2008-08-07 21:16:00,3.020
s1,2008-08-07 21:17:00,6.041
s1,2008-08-07 21:18:00,9.061
s1,2008-08-07 21:19:00,12.082
s1,2008-08-07 21:20:00,15.102
s1,2008-08-07 21:21:00,18.123
s1,2008-08-07 21:22:00,11.143
s1,2008-08-07 21:23:00,0.000
s2,2010-10-05 21:00:00,0.000
s2,2010-10-05 21:01:00,1.812
s2,2010-10-05 21:02:00,3.625
s2,2010-10-05 21:03:00,5.437
s2,2010-10-05 21:04:00,7.249
s2,2010-10-05 21:05:00,9.061
s2,2010-10-05 21:06:00,0.874
s2,2010-10-05 21:07:00,0.000
desired df:
storm,Q_time,Q,duration
s1,2008-08-07 21:15:00,0.000,1
s1,2008-08-07 21:16:00,3.020,2
s1,2008-08-07 21:17:00,6.041,3
s1,2008-08-07 21:18:00,9.061,4
s1,2008-08-07 21:19:00,12.082,5
s1,2008-08-07 21:20:00,15.102,6
s1,2008-08-07 21:21:00,18.123,7
s1,2008-08-07 21:22:00,11.143,8
s1,2008-08-07 21:23:00,0.000,9
s2,2010-10-05 21:00:00,0.000,1
s2,2010-10-05 21:01:00,1.812,2
s2,2010-10-05 21:02:00,3.625,3
s2,2010-10-05 21:03:00,5.437,4
s2,2010-10-05 21:04:00,7.249,5
s2,2010-10-05 21:05:00,9.061,6
s2,2010-10-05 21:06:00,0.874,7
s2,2010-10-05 21:07:00,0.000,8
I have been trying variations of the following statement, but I cannot seem
to get the length of the subset correct as I receive an error of the type
'Error: arguments imply differing number of rows: 2401, 0'.
newdf <- ddply(df, "storm", transform, FUN = function(x)
{duration=seq(from=1, by=1, length.out=nrow(x))})
I would really like to get a handle on ddply in this instance as it will be
quite helpful for many other similar calculations that I need to do with
this dataset.
Thanks again,
Stevan
--
View this message in context:
http://r.789695.n4.nabble.com/help-incorporating-data-subset-lengths-in-function-with-ddply-tp4688926.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnew...@dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k