Regardless of whether you use the lower-level split function, or the 
higher-level aggregate function, or the tidyverse group_by function, the key is 
learning how to create the column that is the same for all records 
corresponding to the time interval of interest.

If you convert the sampdate to POSIXct, the tz IS important, because most of us 
use local timezones that respect daylight saving time, and a naive conversion 
of standard time will run into trouble if R is assuming daylight saving time 
applies. The lubridate package gets around this by always assuming UTC and 
giving you a function to "fix" the timezone after the conversion. I prefer to 
always be specific about timezones, at least by using something like

    Sys.setenv( TZ = "Etc/GMT+8" )

which is a fixed UTC-8 offset (Pacific Standard Time all year) and does not 
observe daylight saving time.
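
As a rough illustration of why this matters, the sketch below reads the same 
two wall-clock strings in a DST-observing zone and in the fixed-offset zone. 
The timestamps are made up for the example, and "America/Los_Angeles" is only 
an assumption about what a typical local zone on the US west coast looks like.

```r
# Two made-up wall-clock readings that straddle the 2016 spring-forward
# change in US Pacific time (clocks jumped from 02:00 to 03:00 on March 13).
stamps <- c("2016-03-13 01:30", "2016-03-13 03:30")
fmt <- "%Y-%m-%d %H:%M"

# Interpreted in a DST-observing zone, the readings are one hour apart.
dst <- as.POSIXct(stamps, tz = "America/Los_Angeles", format = fmt)
gap_dst <- as.numeric(difftime(dst[2], dst[1], units = "hours"))

# Interpreted in a fixed-offset zone (UTC-8 all year; the sign in the
# "Etc/GMT+8" name is reversed relative to the usual notation), they are
# two hours apart, as they would be for a logger kept on standard time.
std <- as.POSIXct(stamps, tz = "Etc/GMT+8", format = fmt)
gap_std <- as.numeric(difftime(std[2], std[1], units = "hours"))

print(c(dst = gap_dst, std = gap_std))
```

Passing tz explicitly to as.POSIXct, as here, avoids depending on whatever TZ 
the session happens to have.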

Regarding using character data for identifying the month: in order to have 
clean plots of the data I prefer to use the trunc function, which keeps the 
month as a date-time value. It returns a POSIXlt, so I convert it to POSIXct:

    discharge$sampmonthbegin <- as.POSIXct( trunc( discharge$sampdate, units = "months" ) )

Then any of various ways can be used to aggregate the records by that column.
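
A minimal sketch of one such way, using base R aggregate. The data frame 
below is made up so the example runs on its own (the cfs values and 
timestamps are illustrative, not the real discharge data), and the output 
column names cfs_mean and cfs_sd are my own choice.

```r
# Made-up discharge readings spanning two months (illustrative values only)
discharge <- data.frame(
  sampdate = as.POSIXct(c("2016-03-03 12:00", "2016-03-03 12:10",
                          "2016-03-20 08:00", "2016-04-01 09:00",
                          "2016-04-15 06:30"),
                        tz = "Etc/GMT+8", format = "%Y-%m-%d %H:%M"),
  cfs = c(149000, 150000, 151000, 156000, 154000)
)

# Month-start column: trunc() returns POSIXlt, so convert back to POSIXct
discharge$sampmonthbegin <- as.POSIXct(trunc(discharge$sampdate, units = "months"))

# One row per month; both results come back sorted by sampmonthbegin
means <- aggregate(cfs ~ sampmonthbegin, data = discharge, FUN = mean)
sds   <- aggregate(cfs ~ sampmonthbegin, data = discharge, FUN = sd)

monthly <- data.frame(sampmonthbegin = means$sampmonthbegin,
                      cfs_mean = means$cfs,
                      cfs_sd = sds$cfs)
print(monthly)
```

The same sampmonthbegin column could equally be handed to split or to 
group_by; the grouping column is what matters, not the function that 
consumes it.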

On September 2, 2021 12:10:15 PM PDT, Andrew Simmons <akwsi...@gmail.com> wrote:
>You could use 'split' to create a list of data frames, and then apply a
>function to each to get the means and sds.
>
>
>cols <- "cfs"  # add more as necessary
>S <- split(discharge[cols], format(discharge$sampdate, format = "%Y-%m"))
>means <- do.call("rbind", lapply(S, colMeans, na.rm = TRUE))
>sds   <- do.call("rbind", lapply(S, function(xx) sapply(xx, sd, na.rm = TRUE)))
>
>On Thu, Sep 2, 2021 at 3:01 PM Rich Shepard <rshep...@appl-ecosys.com>
>wrote:
>
>> On Thu, 2 Sep 2021, Rich Shepard wrote:
>>
>> > If I correctly understand the output of as.POSIXlt each date and time
>> > element is separate, so input such as 2016-03-03 12:00 would now be 2016
>> > 03 03 12 00 (I've not read how the elements are separated). (The TZ is not
>> > important because all data are either PST or PDT.)
>>
>> Using this script:
>> discharge <- read.csv('../data/water/discharge.dat', header = TRUE,
>>                       sep = ',', stringsAsFactors = FALSE)
>> discharge$sampdate <- as.POSIXlt(discharge$sampdate, tz = "",
>>                                   format = '%Y-%m-%d %H:%M',
>>                                   optional = 'logical')
>> discharge$cfs <- as.numeric(discharge$cfs, length = 6)
>>
>> I get this result:
>> > head(discharge)
>>               sampdate    cfs
>> 1 2016-03-03 12:00:00 149000
>> 2 2016-03-03 12:10:00 150000
>> 3 2016-03-03 12:20:00 151000
>> 4 2016-03-03 12:30:00 156000
>> 5 2016-03-03 12:40:00 154000
>> 6 2016-03-03 12:50:00 150000
>>
>> I'm completely open to suggestions on using this output to calculate
>> monthly means and sds.
>>
>> If dplyr::summarize() will do so please show me how to modify this command:
>> disc_monthly <- ( discharge
>>          %>% group_by(sampdate)
>>          %>% summarize(exp_value = mean(cfs, na.rm = TRUE)) )
>> because it produces daily means, not monthly means.
>>
>> TIA,
>>
>> Rich
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

-- 
Sent from my phone. Please excuse my brevity.
