thank you for your patience. i assure you i will get better with the appropriate etiquette - and hopefully eventually contribute.
On 13 June 2012 16:18, David Winsemius <dwinsem...@comcast.net> wrote: > > On Jun 13, 2012, at 10:09 AM, Matthew Johnson wrote: > >> my sessioninfo was as follows: >> >>> sessionInfo() >> >> R version 2.14.1 (2011-12-22) >> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >> >> locale: >> [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods >> [7] base >> >> other attached packages: >> [1] xts_0.8-2 zoo_1.7-6 >> >> loaded via a namespace (and not attached): >> [1] grid_2.14.1 lattice_0.20-0 >> >> >> i have now updated to R 2.15, and my session info is: >> >>> sessionInfo() >> >> R version 2.15.0 (2012-03-30) >> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >> >> locale: >> [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods >> [7] base >> >> other attached packages: >> [1] xts_0.8-6 zoo_1.7-7 >> >> loaded via a namespace (and not attached): >> [1] grid_2.15.0 lattice_0.20-6 tools_2.15.0 >> >> For the XTS object mn your suggestion still fails with an error: >> >>> adf <- aggregate(mn[,-1]~mn[,1], data=mn, sum); adf >> >> Error in aggregate.formula(mn[, -1] ~ mn[, 1], data = mn, sum) : >> 'names' attribute [3] must be the same length as the vector [1] >> >> however when i convert to a zoo with >> >>> mnz <- as.zoo(mn) >> >> >> I get some errors, but it works > > > Those are warnings, ... not errors. > > >> >>> adf <- aggregate(mnz[,-1]~mnz[,1], data=mnz, sum); adf >> >> Warning messages: >> 1: In zoo(rval, index(x)[i]) : >> some methods for “zoo” objects do not work if the index entries in >> ‘order.by’ are not unique >> 2: In zoo(rval, index(x)[i]) : >> some methods for “zoo” objects do not work if the index entries in >> ‘order.by’ are not unique >> 3: In zoo(rval[i], index(x)[i]) : >> some methods for “zoo” objects do not work if the index entries in >> ‘order.by’ are not unique >> 4: In zoo(rval[i], index(x)[i]) : >> some methods for “zoo” objects do not work if the index entries in >> ‘order.by’ are not unique >> 5: In zoo(xc[ind], ix[ind]) : >> some methods for “zoo” objects do not work if the index entries in >> ‘order.by’ are not unique >> 6: In zoo(xc[ind], ix[ind]) : >> some methods for “zoo” objects do not work if the index entries in >> ‘order.by’ are not unique >> mnz[, 1] mnz[, -1] >> 1 97.90 408 >> 2 97.89 208 >> >> So is this a bug in XTS? > > > It does look that way to me. The correct way to report this is to contact > the package maintainer (copied on this message) , (although I did notice > that Joshua Ulrich already looked at this posting in SO and he is on the xts > development team). You should have put in this at the beginning of your code > : > > library(xts) > > -- > David. > >> >> thanks for your patience >> >> mj >> >> On 13 June 2012 15:53, David Winsemius <dwinsem...@comcast.net> wrote: >>> >>> >>> On Jun 13, 2012, at 9:38 AM, Matthew Johnson wrote: >>> >>>> Sorry, i'll try and put more flesh on the bones. >>>> >>>> please note, i changed the data in the example, as fiddling has raised >>>> another question that's best illustrated with a slightly different >>>> data set. >>>> >>>> first of all, when i do as you suggest, i obtain the following error: >>>> >>>>> PxMat <- aggregate(mm[,-1] ~ mm[,1], data=mm, sum) >>>> >>>> >>>> >>>> Error in aggregate.formula(mm[, -1] ~ mm[, 1], data = mm, sum) : >>>> 'names' attribute [3] must be the same length as the vector [1] >>> >>> >>> >>> Very strange. When I just did it with the structure you (cross-) posted >>> on >>> SO I got: >>> >>>> adf <- aggregate(mm[,-1]~mm[,1], data=mm, sum); adf >>> >>> snipped warning messages >>> mm[, 1] mm[, -1] >>> >>> 1 97.91 538 >>> 2 97.92 918 >>> >>> I had earlier tested it with a zoo object I had constructed and did it >>> again >>> with the structure below. >>> >>> mm[, 1] mm[, -1] >>> >>> 1 97.91 538 >>> 2 97.92 918 >>> >>> I'm using zoo_1.7-6 and R version 2.14.2 on a Mac. I do not remember you >>> posting the requested information about your versions. >>> >>> -- >>> David. >>> >>> >>>> >>>> my data.frame is an xts, and it looks like this: >>>> >>>> px_ym1 vol_ym1 >>>> 2012-06-01 09:30:00 97.90 9 >>>> 2012-06-01 09:30:00 97.90 60 >>>> 2012-06-01 09:30:00 97.90 71 >>>> 2012-06-01 09:30:00 97.90 5 >>>> 2012-06-01 09:30:00 97.90 3 >>>> 2012-06-01 09:30:00 97.90 21 >>>> 2012-06-01 09:31:00 97.90 5 >>>> 2012-06-01 09:31:00 97.89 192 >>>> 2012-06-01 09:31:00 97.89 65 >>>> 2012-06-01 09:31:00 97.89 73 >>>> 2012-06-01 09:31:00 97.89 1 >>>> 2012-06-01 09:31:00 97.89 1 >>>> 2012-06-01 09:31:00 97.89 39 >>>> 2012-06-01 09:31:00 97.90 15 >>>> 2012-06-01 09:31:00 97.90 1 >>>> 2012-06-01 09:31:00 97.89 1 >>>> 2012-06-01 09:31:00 97.90 18 >>>> 2012-06-01 09:31:00 97.89 1 >>>> 2012-06-01 09:32:00 97.89 33 >>>> 2012-06-01 09:34:00 97.89 1 >>>> 2012-06-01 09:34:00 97.89 1 >>>> >>>> dput(mn) returns: >>>> >>>>> dput(mn) >>>> >>>> >>>> structure(c(97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.9, 97.89, >>>> 97.89, 97.89, 97.89, 97.89, 97.89, 97.9, 97.9, 97.89, 97.9, 97.89, >>>> 97.89, 97.89, 97.89, 9, 60, 71, 5, 3, 21, 5, 192, 65, 73, 1, >>>> 1, 39, 15, 1, 1, 18, 1, 33, 1, 1), .indexCLASS = c("POSIXct", >>>> "POSIXt"), .indexTZ = "GMT", class = c("xts", "zoo"), index = >>>> structure(c(1338543000, >>>> 1338543000, 1338543000, 1338543000, 1338543000, 1338543000, 1338543060, >>>> 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, >>>> 1338543060, 1338543060, 1338543060, 1338543060, 1338543060, 1338543120, >>>> 1338543240, 1338543240), tzone = "GMT", tclass = c("POSIXct", >>>> "POSIXt")), .Dim = c(21L, 2L), .Dimnames = list(NULL, c("px_ym1", >>>> "vol_ym1"))) >>>> >>>> as you can see, the xts data.frame xts data.frame that contains dates, >>>> prices and volumes. There is much more data over a long time period, >>>> and i'm interested in various sub-setting and then aggregate >>>> operations. >>>> >>>> I would like to split the data by time period and aggregate the data, >>>> such that i obtain a table which reports the volume traded at each >>>> price, for each of the time-period splits that i have chosen. >>>> >>>> I have employed the following approach: >>>> >>>> PxMat <- aggregate(.~px_ym1, data=mn, sum) >>>> >>>> >>>> which yields: >>>> >>>> px_ym1 vol_ym1 >>>> 1 97.89 408 >>>> 2 97.90 208 >>>> >>>> and for subsets, i use the following grouping: >>>> >>>>> PxMat30 <- aggregate(.~px_ym1, data=mn[.indexmin(mn) == '30'], sum) >>>> >>>> >>>> >>>> Which yields: >>>> >>>> px_ym1 vol_ym1 >>>> 1 97.9 169 >>>> >>>> and >>>> >>>>> PxMat31 <- aggregate(.~px_ym1, data=mn[.indexmin(mn) == '31'], sum) >>>> >>>> >>>> >>>> which yields: >>>> >>>> px_ym1 vol_ym1 >>>> 1 97.89 373 >>>> 2 97.90 39 >>>> >>>> and so on and so forth for each minute. >>>> >>>> when i try and sub-set using general notation, as follows: >>>> >>>> PxMat <- aggregate(.~mn[,1], data=mn, sum) >>>> >>>> this yields a different form of output: >>>> >>>> px_ym1 px_ym1 vol_ym1 >>>> 1 97.90 1076.79 408 >>>> 2 97.89 979.00 208 >>>> >>>> the problem is that i now have the sum of the px_ym1 data (the sum of >>>> mn[,1]) >>>> >>>> hopefully things are now clearer - sorry to have wasted your time up >>>> until now. >>>> >>>> assuming that i have now made my situation clear, i am hope you can >>>> help with four specific questions. >>>> >>>> 1/ My data-sets are HUGE, so speed is an issue - is this the fastest >>>> way to sub-set and aggregate an xts? >>>> >>>> 2/ is there a way to do this for multiple splits? say a table for each >>>> minute, day, week, or month? the return would potentially be a list >>>> with a table for each day / minute etc showing volume traded at each >>>> price -- but it doesn't have to be a list ... >>>> >>>> i am writing a function with loops that would generate a table that >>>> reports volume traded at each price for each case of a specified time >>>> split (say for four tables, one for each minute in the example data, >>>> returned as a list). my solution is slow, it seems like something that >>>> someone would have done better already. is this the case? >>>> >>>> 3/ is there a way to do the sub-setting with templated variables? i >>>> would like to obtain the table i get with the named aggregate >>>> functions (reproduced above) with multiple data frames, as the column >>>> names will differ from time to time. i cannot figure out how to stop >>>> the command from summing the mn[,1] column when i stop using variable >>>> names. >>>> >>>> 4/ on a related note, is it possible to apply different functions to >>>> different columns of data? It would be nice, for example, if the table >>>> returned from an aggregate command could be made to be: >>>> >>>> px_ym1 count vol_ym1 >>>> 1 97.90 11 408 >>>> 2 97.89 10 208 >>>> >>>> where we have the price traded, the number of trades (a count of >>>> px_ym1 / mn[,1], and the sum of vol_ym1 (mn[,2]). >>>> >>>> thanks and best regards >>>> >>>> matt johnson >>>> >>>> On 13 June 2012 15:06, David Winsemius <dwinsem...@comcast.net> wrote: >>>>> >>>>> >>>>> >>>>> >>>>> On Jun 12, 2012, at 11:32 PM, Matthew Johnson wrote: >>>>> >>>>>> Dear R-help, >>>>>> >>>>>> I have an xts data set that i have subset by date. >>>>>> >>>>>> now it contains a date-time-stamp, and two columns (price and volume >>>>>> traded): my objective is to create tables of volume traded at a price >>>>>> - >>>>>> and >>>>>> i've been successfully using aggregate to do so in interactive use. >>>>>> >>>>>> say the data looks as follows: >>>>>> >>>>>> px_ym1 vol_ym1 >>>>>> 2012-06-01 09:37:00 97.91 437 >>>>>> 2012-06-01 09:37:00 97.91 64 >>>>>> 2012-06-01 09:37:00 97.91 1 >>>>>> 2012-06-01 09:37:00 97.91 5 >>>>>> 2012-06-01 09:37:00 97.91 5 >>>>>> 2012-06-01 09:37:00 97.92 174 >>>>>> 2012-06-01 09:37:00 97.92 64 >>>>>> 2012-06-01 09:37:00 97.92 125 >>>>>> 2012-06-01 09:37:00 97.92 124 >>>>>> 2012-06-01 09:37:00 97.92 64 >>>>>> 2012-06-01 09:37:00 97.92 109 >>>>>> 2012-06-01 09:37:00 97.92 64 >>>>>> 2012-06-01 09:37:00 97.92 19 >>>>>> 2012-06-01 09:37:00 97.92 45 >>>>>> 2012-06-01 09:37:00 97.92 75 >>>>>> 2012-06-01 09:37:00 97.92 3 >>>>>> 2012-06-01 09:37:00 97.92 47 >>>>>> 2012-06-01 09:37:00 97.91 26 >>>>>> 2012-06-01 09:37:00 97.92 4 >>>>>> 2012-06-01 09:37:00 97.92 1 >>>>>> >>>>>> the the following gives me what i'm looking for: >>>>>> >>>>>>> adf <- aggregate(.~px_ym1, data=mm, sum) >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> which is this table: >>>>>> >>>>>> px_ym1 vol_ym1 >>>>>> 1 97.91 538 >>>>>> 2 97.92 918 >>>>>> >>>>>> however now i'm trying to code it to run automatically, and use of the >>>>>> templated version: >>>>>> >>>>>>> adf <- aggregate(.~mm[,1], data=mm, sum) >>>>> >>>>> >>>>> >>>>> >>>>> Did you try: >>>>> >>>>> adf <- aggregate(mm[,-1] ~ mm[,1], data=mm, sum) >>>>> adf >>>>> >>>>> I would have used names: >>>>> >>>>> adf <- aggregate(vol_ym1 ~ px_ym1, data=mm, sum) >>>>> adf >>>>> >>>>> >>>>> >>>>>> yields the following - which contains what i'd like, but is has also >>>>>> summed >>>>>> across the price column (not ideal). >>>>>> >>>>>> px_ym1 px_ym1 vol_ym1 >>>>>> 1 97.91 587.46 538 >>>>>> 2 97.92 1370.88 918 >>>>>> >>>>>> how do i code this so that i can enter an xts data-frame with >>>>>> arbitrary >>>>>> names and still obtain the table with only the information i desire? >>>>> >>>>> >>>>> >>>>> >>>>> That is too far to the vague side of the vague-specific continuum. >>>>> >>>>> >>>>>> >>>>>> on a related point, is there a way to combine the two steps? >>>>> >>>>> >>>>> >>>>> >>>>> Er, which two steps would that be? >>>>> >>>>> >>>>>> the function >>>>>> i've written splits by date and then returns a list containing >>>>>> data-frames >>>>>> that report the volume traded at each price on each date >>>>>> >>>>>> - am i re-creating the wheel here? is there canned function that does >>>>>> this? >>>>>> >>>>>> thanks + best regards >>>>>> >>>>>> matt johnson >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> David Winsemius, MD >>>>> West Hartford, CT >>>>> >>> >>> David Winsemius, MD >>> West Hartford, CT >>> > > David Winsemius, MD > West Hartford, CT > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.