Sorry, i'll try and put more flesh on the bones. please note, i changed the data in the example, as fiddling has raised another question that's best illustrated with a slightly different data set.
first of all, when i do as you suggest, i obtain the following error: > PxMat <- aggregate(mm[,-1] ~ mm[,1], data=mm, sum) Error in aggregate.formula(mm[, -1] ~ mm[, 1], data = mm, sum) : 'names' attribute [3] must be the same length as the vector [1] my data.frame is an xts, and it looks like this: px_ym1 vol_ym1 2012-06-01 09:30:00 97.90 9 2012-06-01 09:30:00 97.90 60 2012-06-01 09:30:00 97.90 71 2012-06-01 09:30:00 97.90 5 2012-06-01 09:30:00 97.90 3 2012-06-01 09:30:00 97.90 21 2012-06-01 09:31:00 97.90 5 2012-06-01 09:31:00 97.89 192 2012-06-01 09:31:00 97.89 65 2012-06-01 09:31:00 97.89 73 2012-06-01 09:31:00 97.89 1 2012-06-01 09:31:00 97.89 1 2012-06-01 09:31:00 97.89 39 2012-06-01 09:31:00 97.90 15 2012-06-01 09:31:00 97.90 1 2012-06-01 09:31:00 97.89 1 2012-06-01 09:31:00 97.90 18 2012-06-01 09:31:00 97.89 1 2012-06-01 09:32:00 97.89 33 2012-06-01 09:34:00 97.89 1 2012-06-01 09:34:00 97.89 1 dput(mn) returns: > dput(mn) structure(c(97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.89, 97.89, 97.89, 97.89, 97.89, 97.89, 97.9, 97.9, 97.89, 97.9, 97.89, 97.89, 97.89, 97.89, 9, 60, 71, 5, 3, 21, 5, 192, 65, 73, 1, 1, 39, 15, 1, 1, 18, 1, 33, 1, 1), .indexCLASS = c("POSIXct", "POSIXt"), .indexTZ = "GMT", class = c("xts", "zoo"), index = structure(c(1338543000, 1338543000, 1338543000, 1338543000, 1338543000, 1338543000, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543120, 1338543240, 1338543240), tzone = "GMT", tclass = c("POSIXct", "POSIXt")), .Dim = c(21L, 2L), .Dimnames = list(NULL, c("px_ym1", "vol_ym1"))) as you can see, the xts data.frame xts data.frame that contains dates, prices and volumes. There is much more data over a long time period, and i'm interested in various sub-setting and then aggregate operations. I would like to split the data by time period and aggregate the data, such that i obtain a table which reports the volume traded at each price, for each of the time-period splits that i have chosen. I have employed the following approach: PxMat <- aggregate(.~px_ym1, data=mn, sum) which yields: px_ym1 vol_ym1 1 97.89 408 2 97.90 208 and for subsets, i use the following grouping: >PxMat30 <- aggregate(.~px_ym1, data=mn[.indexmin(mn) == '30'], sum) Which yields: px_ym1 vol_ym1 1 97.9 169 and > PxMat31 <- aggregate(.~px_ym1, data=mn[.indexmin(mn) == '31'], sum) which yields: px_ym1 vol_ym1 1 97.89 373 2 97.90 39 and so on and so forth for each minute. when i try and sub-set using general notation, as follows: PxMat <- aggregate(.~mn[,1], data=mn, sum) this yields a different form of output: px_ym1 px_ym1 vol_ym1 1 97.90 1076.79 408 2 97.89 979.00 208 the problem is that i now have the sum of the px_ym1 data (the sum of mn[,1]) hopefully things are now clearer - sorry to have wasted your time up until now. assuming that i have now made my situation clear, i am hope you can help with four specific questions. 1/ My data-sets are HUGE, so speed is an issue - is this the fastest way to sub-set and aggregate an xts? 2/ is there a way to do this for multiple splits? say a table for each minute, day, week, or month? the return would potentially be a list with a table for each day / minute etc showing volume traded at each price -- but it doesn't have to be a list ... i am writing a function with loops that would generate a table that reports volume traded at each price for each case of a specified time split (say for four tables, one for each minute in the example data, returned as a list). my solution is slow, it seems like something that someone would have done better already. is this the case? 3/ is there a way to do the sub-setting with templated variables? i would like to obtain the table i get with the named aggregate functions (reproduced above) with multiple data frames, as the column names will differ from time to time. i cannot figure out how to stop the command from summing the mn[,1] column when i stop using variable names. 4/ on a related note, is it possible to apply different functions to different columns of data? It would be nice, for example, if the table returned from an aggregate command could be made to be: px_ym1 count vol_ym1 1 97.90 11 408 2 97.89 10 208 where we have the price traded, the number of trades (a count of px_ym1 / mn[,1], and the sum of vol_ym1 (mn[,2]). thanks and best regards matt johnson On 13 June 2012 15:06, David Winsemius <dwinsem...@comcast.net> wrote: > > > On Jun 12, 2012, at 11:32 PM, Matthew Johnson wrote: > >> Dear R-help, >> >> I have an xts data set that i have subset by date. >> >> now it contains a date-time-stamp, and two columns (price and volume >> traded): my objective is to create tables of volume traded at a price - and >> i've been successfully using aggregate to do so in interactive use. >> >> say the data looks as follows: >> >> px_ym1 vol_ym1 >> 2012-06-01 09:37:00 97.91 437 >> 2012-06-01 09:37:00 97.91 64 >> 2012-06-01 09:37:00 97.91 1 >> 2012-06-01 09:37:00 97.91 5 >> 2012-06-01 09:37:00 97.91 5 >> 2012-06-01 09:37:00 97.92 174 >> 2012-06-01 09:37:00 97.92 64 >> 2012-06-01 09:37:00 97.92 125 >> 2012-06-01 09:37:00 97.92 124 >> 2012-06-01 09:37:00 97.92 64 >> 2012-06-01 09:37:00 97.92 109 >> 2012-06-01 09:37:00 97.92 64 >> 2012-06-01 09:37:00 97.92 19 >> 2012-06-01 09:37:00 97.92 45 >> 2012-06-01 09:37:00 97.92 75 >> 2012-06-01 09:37:00 97.92 3 >> 2012-06-01 09:37:00 97.92 47 >> 2012-06-01 09:37:00 97.91 26 >> 2012-06-01 09:37:00 97.92 4 >> 2012-06-01 09:37:00 97.92 1 >> >> the the following gives me what i'm looking for: >> >>> adf <- aggregate(.~px_ym1, data=mm, sum) >> >> >> which is this table: >> >> px_ym1 vol_ym1 >> 1 97.91 538 >> 2 97.92 918 >> >> however now i'm trying to code it to run automatically, and use of the >> templated version: >> >>> adf <- aggregate(.~mm[,1], data=mm, sum) > > > Did you try: > > adf <- aggregate(mm[,-1] ~ mm[,1], data=mm, sum) > adf > > I would have used names: > > adf <- aggregate(vol_ym1 ~ px_ym1, data=mm, sum) > adf > > > >> yields the following - which contains what i'd like, but is has also summed >> across the price column (not ideal). >> >> px_ym1 px_ym1 vol_ym1 >> 1 97.91 587.46 538 >> 2 97.92 1370.88 918 >> >> how do i code this so that i can enter an xts data-frame with arbitrary >> names and still obtain the table with only the information i desire? > > > That is too far to the vague side of the vague-specific continuum. > > >> >> on a related point, is there a way to combine the two steps? > > > Er, which two steps would that be? > > >> the function >> i've written splits by date and then returns a list containing data-frames >> that report the volume traded at each price on each date >> >> - am i re-creating the wheel here? is there canned function that does this? >> >> thanks + best regards >> >> matt johnson > > > > David Winsemius, MD > West Hartford, CT > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.