Jeff,

Clearly you (and others) have put a lot of work into xts -- and I'm the beneficiary. So I'll stop complaining.
Thanks for the class (both code and explanation).

*-- Russ*

On Sun, May 8, 2011 at 8:23 PM, Jeff Ryan <jeff.a.r...@gmail.com> wrote:

> Hi Russ,
>
> We're of course getting into some incredibly fine-level detail on how all
> of this works. I'll try and explain issues as I recall them over the
> development of xts and cbind.xts
>
> xts started as an extension of zoo. zoo is an extension of 'ts' (greatly
> simplified comparison of course, but stay with me)
>
> Achim and Gabor have put tremendous effort into the design of zoo - with a
> primary focus on keeping it consistent with base R behavior. That is, try
> not to introduce unnecessary changes to the interface an R user is
> accustomed to. The logic being that this makes for a more consistent
> interface as well as an easier learning curve and hence greater/faster
> adoption rate.
>
> 'xts' extends this, though with a bit more flexibility in terms of
> consistency. Why? Simply put - some things about R annoyed me coming from a
> time-series background. Number one was the fact that lag() is backwards.
> Backwards from expectation, nearly all literature, and all standard
> definitions. So xts breaks with lag(, n=1) behavior. This is obviously
> confusing to some - but was the gamble I was willing to take - consistency
> (with R) be damned! ;-)
>
> So, now back to cbind. cbind and merge in zoo-land (and xts by extension)
> are the same. This isn't the case for other classes that use these - but
> that is 'allowable' and 'expected' under a class dispatch system. The docs
> for ?cbind state:
>
>     For cbind (rbind) the column (row) names are taken from the
>     colnames (rownames) of the arguments if these are matrix-like.
>     Otherwise from the names of the arguments or where those are not
>     supplied and deparse.level > 0, by deparsing the expressions
>     given, for deparse.level = 1 only if that gives a sensible name
>     (a symbol, see is.symbol).
>
> Based on that, I'd argue that xts does it "right".
> Of course I'll also point out that this is incorrect thinking as well -
> since this is a description for the generic - and not for xts. But again in
> a highly configurable object/class system, where you start to make a
> distinction of right and wrong is itself up for debate.
>
> At the other end of the argument spectrum is _why not_. That is, why can't
> cbind.xts handle the names to replace the colnames of objects passed in.
> Here is where I'll point out that I am really just going by memory.
>
> Three major items are involved in cbind. One is that dispatch is quite
> unlike nearly every other dispatch in R. This is a fact - nothing to do
> with xts.
>
> * cbind isn't a generic (it's an .Internal call)
> * it uses ...
> * cbind can be called in numerous ways (I'll list only the common ones -
>   but with R you can do even crazier things)
>
>     do.call(cbind,
>     do.call(cbind.xts,
>     cbind,
>     cbind.xts,
>     merge,
>     merge.xts,
>     do.call(merge,
>     do.call(merge.xts
>
> The rules of dispatch on cbind are really at a level that R-help has no
> business discussing. The second part is where things actually get tricky
> though. They all behave differently with respect to how args are handled -
> when eval'd, etc.
>
> I'm sure you have read how R strains itself on 'big data'. This is true
> and false. Improper use (or just naive use) can cause object copies in
> places you really don't want. Much of xts at this point is implemented in
> custom C code. The gain here is that you can make it eas(ier) to avoid
> copies until you need them by writing in C. Obvious, but needs to be said.
>
> To figure out what the columns have - and if names are attached to the
> objects in the pairlist (the "..." in this context) - you have to be very
> careful. Touch anything in the wrong place or wrong time and you lose a
> figurative arm and leg to memory copies.
> So, in 99.9999% of cases - where you aren't naming (which would be an
> extra feature above and beyond c(olumn) binding [the reason for cbind]) -
> you run a very real risk of getting nailed for copies you don't want. On
> 10MM obs that is almost manageable. On 100's of millions or billions - it
> is kill -9 time.
>
> To compound the issue - recall all of those different dispatch methods.
> Yep - they all behave just a bit differently. How? Honestly - I don't
> know or care. I simply know you can't easily make the behavior consistent
> amongst those calls. I have tried. And tried.
>
> End of day, and a very long R-help email, xts is different than base R. It
> is even different than its 'parent' zoo behavior. But in exchange for this
> difference (and bit of learning/adjustment) you get a class that is faster
> than anything else.
>
> Period.
>
> > x <- .xts(1:1e7, 1:1e7) # our time series object
> > m <- coredata(x) # a matrix
>
> > str(x)
> An xts object from 1969-12-31 18:00:01 to 1970-04-26 12:46:40 containing:
>   Data: int [1:10000000, 1] 1 2 3 4 5 6 7 8 9 10 ...
>   Indexed by objects of class: [POSIXt,POSIXct] TZ: America/Chicago
>   xts Attributes:
>  NULL
>
> > str(m)
>  int [1:10000000, 1] 1 2 3 4 5 6 7 8 9 10 ...
>
> > system.time(x[,1]) # get the first column
>    user  system elapsed
>   0.017   0.000   0.017
> > system.time(m[,1]) # ditto
>    user  system elapsed
>   0.152   0.000   0.153
>
> Yep, nearly 10x faster than a matrix op - AND you still have the time
> index. To get there you need to sometimes make sacrifices. xts does, though
> I like to think they are well thought out and consistent*
>
> *enough ;-)
>
> Best,
> Jeff
>
>
> On Sun, May 8, 2011 at 8:57 PM, Joshua Ulrich <josh.m.ulr...@gmail.com> wrote:
>
>> Russ,
>>
>> On May 8, 2011 6:29 PM, "Russ Abbott" <russ.abb...@gmail.com> wrote:
>> >
>> > Hi Jeff,
>> >
>> > The xts class has some very nice features, and you have done a valuable
>> > service in developing it.
>> >
>> > My primary frustration is how difficult it seems to be to find out what
>> > went wrong when my code doesn't work. I've been writing quite
>> > sophisticated code for a fairly long time. It's not that I'm new to
>> > software development.
>> >
>> > The column name rule is a good example. I'm willing to live with the
>> > rule that column names are not changed for efficiency's sake. What's
>> > difficult for me is that I never saw that rule anywhere before. Of
>> > course, I'm not an R expert. I've been using it for only a couple of
>> > months. But still, I would have expected to run into a rule like that.
>> >
>> > Worse, since the rule is in conflict with the explicit intent of
>> > cbind--one can name columns when using cbind; in fact the examples
>> > illustrate how to do it--it would really be nice if cbind would issue a
>> > warning when one attempts to rename a column in violation of that rule.
>> > Instead, cbind is silent, giving no hint about what went wrong.
>> >
>> Naming columns is not the explicit intent of cbind. The explicit
>> intent is to combine objects by columns. Please don't overstate the
>> case.
>>
>> While the examples for the generic show naming columns, neither
>> ?cbind.zoo nor ?cbind.xts has such examples. That's a hint.
>>
>> > It's those sorts of things that have caused me much frustration. And
>> > it's these sorts of things that seem pervasive in R. One never knows
>> > what one is dealing with. Did something not work because there is a
>> > special case rule that I haven't heard of? Did it not work because a
>> > special convenience was programmed into a function in a way that
>> > conflicted with normal use? Since these sorts of things seem to come up
>> > so often, I find myself feeling that there is no good way to track down
>> > problems, which leads to a sense of helplessness and confusion. That's
>> > not what one wants in a programming language.
>> >
>> If that's not what one wants, one can always write their own
>> programming language.
>>
>> Seriously, it seems like you want to rant more than understand what's
>> going on. You have the R and xts help pages and the source code. The
>> "Note" section of help(cbind) tells you that the method dispatch is
>> different. It even tells you what R source file to look at to see how
>> dispatching is done. Compare the relevant source files from
>> base::cbind and xts::cbind.xts, look at the "R Language Definition"
>> manual to see how method dispatch is normally done.
>>
>> But you've been writing quite sophisticated code for a fairly long
>> time, so I'm not telling you anything you don't know... you just don't
>> think you should have to do the legwork.
>>
>> > -- Russ
>> >
>> >
>>
>> --
>> Joshua Ulrich | FOSS Trading: www.fosstrading.com
>>
>>
>>
>> > On Sun, May 8, 2011 at 2:42 PM, Jeff Ryan <jeff.a.r...@gmail.com> wrote:
>> >
>> > > Hi Russ,
>> > >
>> > > Colnames don't get rewritten if they already exist. The reason is due
>> > > to performance and how cbind is written at the R level.
>> > >
>> > > It isn't perfect per se, but the complexity and variety of dispatch
>> > > that can take place for cbind in R, as it isn't a generic, is quite
>> > > challenging to get to behave as one may hope. After years of trying
>> > > I'd say it is nearly impossible to do what you want without causing
>> > > horrible memory issues on non trivial objects they are use in
>> > > production systems **using** xts on objects with billions of rows.
>> > > Your simple case, which has a simple workaround, would force everyone
>> > > in the other 99.999% of cases to pay a recurring cost that isn't
>> > > tolerable.
>> > >
>> > > If this is frustrating to you, you should stop using the class.
>> > >
>> > > Jeff
>> > >
>> > > Jeffrey Ryan | Founder | jeffrey.r...@lemnica.com
>> > >
>> > > www.lemnica.com
>> > >
>> > > On May 8, 2011, at 2:07 PM, Russ Abbott <russ.abb...@gmail.com> wrote:
>> > >
>> > > I'm having troubles with the names of columns.
>> > >
>> > > quantmod deals with stock quotes. I've created an array of the first 5
>> > > closing prices from Jan 2007. (Is there a problem that the name is the
>> > > same as the variable name? There shouldn't be.)
>> > >
>> > > > close
>> > >              close
>> > > 2007-01-03 1416.60
>> > > 2007-01-04 1418.34
>> > > 2007-01-05 1409.71
>> > > 2007-01-08 1412.84
>> > > 2007-01-09 1412.11
>> > >
>> > > When I try to create a more complex array by adding columns, the names
>> > > get fouled up. Here's a simple example.
>> > >
>> > > > cbind(changed.close = close+1, zero = 0, close)
>> > >              close zero close.1
>> > > 2007-01-03 1417.60    0 1416.60
>> > > 2007-01-04 1419.34    0 1418.34
>> > > 2007-01-05 1410.71    0 1409.71
>> > > 2007-01-08 1413.84    0 1412.84
>> > > 2007-01-09 1413.11    0 1412.11
>> > >
>> > > The first column should be called "changed.close", but it's called
>> > > "close". The second column has the right name. The third column should
>> > > be called "close" but it's called "close.1". Why is that? Am I missing
>> > > something?
>> > >
>> > > If I change the order of the columns and let close have its original
>> > > name, there is still a problem.
>> > >
>> > > > cbind(close, zero = 0, changed.close = close+1)
>> > >              close zero close.1
>> > > 2007-01-03 1416.60    0 1417.60
>> > > 2007-01-04 1418.34    0 1419.34
>> > > 2007-01-05 1409.71    0 1410.71
>> > > 2007-01-08 1412.84    0 1413.84
>> > > 2007-01-09 1412.11    0 1413.11
>> > >
>> > > Now the names on the first two columns are ok, but the third column is
>> > > still wrong.
>> > > Again, why is that? Apparently it's not letting me assign a name to a
>> > > column that comes from something that already has a name. Is that the
>> > > way it should be?
>> > >
>> > > I don't get that same problem on a simpler example.
>> > >
>> > > > IX <- cbind(I=0, X=(1:3))
>> > > > IX
>> > >      I X
>> > > [1,] 0 1
>> > > [2,] 0 2
>> > > [3,] 0 3
>> > > > cbind(Y = 1, Z = IX[, "I"], W = IX[, "X"])
>> > >      Y Z W
>> > > [1,] 1 0 1
>> > > [2,] 1 0 2
>> > > [3,] 1 0 3
>> > >
>> > > Is this a peculiarity of xts objects?
>> > >
>> > > Thanks.
>> > >
>> > > *-- Russ*
>> > >
>> > > P.S. Once again I feel frustrated because it's taken me far more time
>> > > than it deserves to track down and characterize this problem. I can fix
>> > > it by using the names function. But I shouldn't have to do that.
>> > >
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>
> --
> Jeffrey Ryan
> jeffrey.r...@lemnica.com
>
> www.lemnica.com
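
For reference, the workaround mentioned in the P.S. of the original question (set the names after binding rather than through the cbind arguments) might look roughly like the sketch below. It assumes the xts package is installed; the `close` series is a hypothetical stand-in built from the five values shown in the thread, and `zero` is constructed as its own xts object purely for illustration.

```r
library(xts)

# Hypothetical stand-in for the `close` series from the original post.
dates <- as.Date(c("2007-01-03", "2007-01-04", "2007-01-05",
                   "2007-01-08", "2007-01-09"))
close <- xts(c(1416.60, 1418.34, 1409.71, 1412.84, 1412.11), order.by = dates)
colnames(close) <- "close"

# A column of zeros on the same index (an assumption for this sketch;
# the thread passes the scalar 0 directly).
zero <- xts(rep(0, nrow(close)), order.by = index(close))

# Bind without relying on argument names ...
combined <- cbind(close + 1, zero, close)

# ... then assign the column names explicitly.
colnames(combined) <- c("changed.close", "zero", "close")
```

Assigning `colnames()` once after the bind does not depend on how cbind.xts treats argument names, which is the behavior under discussion above.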