Jeff,

Clearly you (and others) have put a lot of work into xts -- and I'm the
beneficiary. So I'll stop complaining.

Thanks for the class (both code and explanation).

*-- Russ *



On Sun, May 8, 2011 at 8:23 PM, Jeff Ryan <jeff.a.r...@gmail.com> wrote:

> Hi Russ,
>
> We're of course getting into some incredibly fine-level detail on how all
> of this works.  I'll try and explain issues as I recall them over the
> development of xts and cbind.xts.
>
> xts started as an extension of zoo.  zoo is an extension of 'ts' (greatly
> simplified comparison of course, but stay with me)
>
> Achim and Gabor have put tremendous effort into the design of zoo - with a
> primary focus on keeping it consistent with base R behavior.  That is, try
> not to introduce unnecessary changes to the interface an R user is
> accustomed to.  The logic being that this makes for a more consistent
> interface as well as an easier learning curve and hence greater/faster
> adoption rate.
>
> 'xts' extends this, though with a bit more flexibility in terms of
> consistency.  Why? Simply put - some things about R annoyed me coming from a
> time-series background.  Number one was the fact that lag() is backwards.
>  Backwards from expectation, nearly all literature, and all standard
> definitions.  So xts breaks with base R's lag(x, k=1) behavior.  This is obviously
> confusing to some - but was the gamble I was willing to take - consistency
> (with R) be damned! ;-)
>
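A minimal sketch of the lag() difference described above, assuming the xts and zoo packages are installed. The series `z` and `x` and the dates are made up for illustration; they are not from the original thread.

```r
library(xts)  # attaching xts also attaches zoo

dates <- as.Date("2011-05-01") + 0:4
z <- zoo(1:5, dates)   # zoo series, follows the base R 'ts' convention
x <- xts(1:5, dates)   # xts series with the same data and index

# zoo / base R convention: lag(z, 1) pulls the *next* observation back to
# each date, so the first date carries the value 2 and the last date is
# dropped (na.pad defaults to FALSE in lag.zoo).
lag_zoo <- lag(z, 1)

# xts convention: lag(x, 1) pushes each observation forward to the
# following date, so the first date becomes NA and the second carries 1.
lag_xts <- lag(x, 1)

print(lag_zoo)
print(lag_xts)
```

The point is only that the same call shifts in opposite directions depending on the class, which is easy to miss when converting between zoo and xts.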
> So, now back to cbind.  cbind and merge in zoo-land (and xts by extension)
> are the same. This isn't the case for other classes that use these - but
> that is 'allowable' and 'expected' under a class dispatch system.  The docs
> for ?cbind state:
>
> For ‘cbind’ (‘rbind’) the column (row) names are taken from the
>      ‘colnames’ (‘rownames’) of the arguments if these are matrix-like.
>      Otherwise from the names of the arguments or where those are not
>      supplied and ‘deparse.level > 0’, by deparsing the expressions
>      given, for ‘deparse.level = 1’ only if that gives a sensible name
>      (a ‘symbol’, see ‘is.symbol’).
>
> Based on that, I'd argue that xts does it "right". Of course I'll also
> point out that this is incorrect thinking as well - since this is a
> description for the generic - and not for xts.  But again in a highly
> configurable object/class system, where you start to make a distinction of
> right and wrong is itself up for debate.
>
> At the other end of the argument spectrum is _why not_.  That is, why can't
> cbind.xts use the argument names to replace the colnames of the objects
> passed in?  Here is where I'll point out that I am really just going by memory.
>
> Three major items are involved in cbind.  One is that dispatch is quite
> unlike nearly every other dispatch in R.  This is a fact - nothing to do
> with xts.
>
> *  cbind isn't a generic (it's an .Internal call)
> *  it uses ...
> *  cbind can be called in numerous ways (I'll list only the common ones -
> but with R you can do even crazier things)
>
>    do.call(cbind,
>    do.call(cbind.xts,
>    cbind,
>    cbind.xts,
>    merge,
>    merge.xts,
>    do.call(merge,
>    do.call(merge.xts
>
> The rules of dispatch on cbind are really at a level that R-help has no
> business discussing.  The second part is where things actually get tricky
> though.  They all behave differently with respect to how args are handled -
> when eval'd, etc.
>
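For the simple case where every object already has column names, the call forms listed above can be compared with a sketch like the following. The objects `a` and `b` are hypothetical, and this only checks the resulting data and names; it says nothing about how arguments are evaluated or copied internally, which is the hard part described above.

```r
library(xts)

dates <- as.Date("2011-05-01") + 0:2
a <- xts(1:3, dates); colnames(a) <- "a"
b <- xts(4:6, dates); colnames(b) <- "b"

# Three of the call forms from the list above
by_cbind   <- cbind(a, b)
by_merge   <- merge(a, b)
by_do_call <- do.call(merge, list(a, b))

# Compare the underlying matrices (values and column names)
print(all.equal(coredata(by_cbind), coredata(by_merge)))
print(all.equal(coredata(by_merge), coredata(by_do_call)))
print(colnames(by_cbind))
```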
> I'm sure you have read how R strains itself on 'big data'.  This is true
> and false.  Improper use (or just naive use) can cause object copies in
> places you really don't want.  Much of xts at this point is implemented in
> custom C code. The gain here is that you can make it eas(ier) to avoid
> copies until you need them by writing in C.  Obvious, but needs to be said.
>
> To figure out what the columns have - and if names are attached to the
> objects in the pairlist (the "..." in this context) - you have to be very
> careful.  Touch anything in the wrong place or wrong time and you lose a
> figurative arm and leg to memory copies.  So, in 99.9999% of cases - where
> you aren't naming (which would be an extra feature above and beyond c(olumn)
> binding [the reason for cbind]) - you run a very real risk of getting nailed
> for copies you don't want.  On 10MM obs that is almost manageable. On 100's
> of millions or billions - it is kill -9 time.
>
> To compound the issue - recall all of those different dispatch methods.
>  Yep - they all behave just a bit differently.  How?  Honestly - I don't
> know or care.  I simply know you can't easily make the behavior consistent
> amongst those calls.  I have tried. And tried.
>
> End of day, and a very long R-help email, xts is different than base R.  It
> is even different than its 'parent' zoo behavior.  But in exchange for this
> difference (and bit of learning/adjustment) you get a class that is faster
> than anything else.
>
> Period.
>
> > x <- .xts(1:1e7, 1:1e7)  # our time series object
> > m <- coredata(x)  # a matrix
>
> > str(x)
> An ‘xts’ object from 1969-12-31 18:00:01 to 1970-04-26 12:46:40 containing:
>   Data: int [1:10000000, 1] 1 2 3 4 5 6 7 8 9 10 ...
>   Indexed by objects of class: [POSIXt,POSIXct] TZ: America/Chicago
>   xts Attributes:
>  NULL
>
> > str(m)
>  int [1:10000000, 1] 1 2 3 4 5 6 7 8 9 10 ...
>
> > system.time(x[,1])  # get the first column
>    user  system elapsed
>   0.017   0.000   0.017
> > system.time(m[,1])  # ditto
>    user  system elapsed
>   0.152   0.000   0.153
>
> Yep, nearly 10x faster than a matrix op - AND you still have the time
> index. To get there you need to sometimes make sacrifices.  xts does, though
> I like to think they are well thought out and consistent*
>
> *enough ;-)
>
> Best,
> Jeff
>
>
> On Sun, May 8, 2011 at 8:57 PM, Joshua Ulrich <josh.m.ulr...@gmail.com> wrote:
>
>> Russ,
>>
>> On May 8, 2011 6:29 PM, "Russ Abbott" <russ.abb...@gmail.com> wrote:
>> >
>> > Hi Jeff,
>> >
>> > The xts class has some very nice features, and you have done a valuable
>> > service in developing it.
>> >
>> > My primary frustration is how difficult it seems to be to find out what
>> > went wrong when my code doesn't work.  I've been writing quite
>> > sophisticated code for a fairly long time. It's not that I'm new to
>> > software development.
>> >
>> > The column name rule is a good example.  I'm willing to live with the
>> > rule that column names are not changed for efficiency's sake.  What's
>> > difficult for me is that I never saw that rule anywhere before.  Of
>> > course, I'm not an R expert. I've been using it for only a couple of
>> > months. But still, I would have expected to run into a rule like that.
>> >
>> > Worse, since the rule is in conflict with the explicit intent of
>> > cbind--one can name columns when using cbind; in fact the examples
>> > illustrate how to do it--it would really be nice if cbind would issue a
>> > warning when one attempts to rename a column in violation of that rule.
>> > Instead, cbind is silent, giving no hint about what went wrong.
>> >
>> Naming columns is not the explicit intent of cbind.  The explicit
>> intent is to combine objects by columns.  Please don't overstate the
>> case.
>>
>> While the examples for the generic show naming columns, neither
>> ?cbind.zoo nor ?cbind.xts has such examples.  That's a hint.
>>
>> > It's those sorts of things that have caused me much frustration. And
>> > it's these sorts of things that seem pervasive in R.  One never knows
>> > what one is dealing with. Did something not work because there is a
>> > special case rule that I haven't heard of? Did it not work because a
>> > special convenience was programmed into a function in a way that
>> > conflicted with normal use?  Since these sorts of things seem to come up
>> > so often, I find myself feeling that there is no good way to track down
>> > problems, which leads to a sense of helplessness and confusion. That's
>> > not what one wants in a programming language.
>> >
>> If that's not what one wants, one can always write their own
>> programming language.
>>
>> Seriously, it seems like you want to rant more than understand what's
>> going on.  You have the R and xts help pages and the source code.  The
>> "Note" section of help(cbind) tells you that the method dispatch is
>> different.  It even tells you what R source file to look at to see how
>> dispatching is done.  Compare the relevant source files from
>> base::cbind and xts::cbind.xts, look at the "R Language Definition"
>> manual to see how method dispatch is normally done.
>>
>> But you've been writing quite sophisticated code for a fairly long
>> time, so I'm not telling you anything you don't know... you just don't
>> think you should have to do the legwork.
>>
>> > -- Russ
>> >
>> >
>>
>> --
>> Joshua Ulrich  |  FOSS Trading: www.fosstrading.com
>>
>>
>>
>> > On Sun, May 8, 2011 at 2:42 PM, Jeff Ryan <jeff.a.r...@gmail.com> wrote:
>> >
>> > > Hi Russ,
>> > >
>> > > Colnames don't get rewritten if they already exist. The reason is
>> > > performance and how cbind is written at the R level.
>> > >
>> > > It isn't perfect per se, but the complexity and variety of dispatch
>> > > that can take place for cbind in R, as it isn't a generic, is quite
>> > > challenging to get to behave as one may hope.  After years of trying
>> > > I'd say it is nearly impossible to do what you want without causing
>> > > horrible memory issues on non-trivial objects that are in use in
>> > > production systems **using** xts on objects with billions of rows.
>> > > Your simple case, which has a simple workaround, would force everyone
>> > > in the other 99.999% of cases to pay a recurring cost that isn't
>> > > tolerable.
>> > >
>> > > If this is frustrating to you, you should stop using the class.
>> > >
>> > > Jeff
>> > >
>> > > Jeffrey Ryan    |    Founder    |     <jeffrey.r...@lemnica.com>
>> > > jeffrey.r...@lemnica.com
>> > >
>> > > www.lemnica.com
>> > >
>> > > On May 8, 2011, at 2:07 PM, Russ Abbott <russ.abb...@gmail.com> wrote:
>> > >
>> > > I'm having trouble with the names of columns.
>> > >
>> > > quantmod deals with stock quotes.  I've created an array of the first 5
>> > > closing prices from Jan 2007. (Is there a problem that the name is the
>> > > same as the variable name? There shouldn't be.)
>> > >
>> > > > close
>> > >              close
>> > > 2007-01-03 1416.60
>> > > 2007-01-04 1418.34
>> > > 2007-01-05 1409.71
>> > > 2007-01-08 1412.84
>> > > 2007-01-09 1412.11
>> > >
>> > >
>> > > When I try to create a more complex array by adding columns, the names
>> > > get fouled up.  Here's a simple example.
>> > >
>> > > > cbind(changed.close = close+1, zero = 0, close)
>> > >              close zero close.1
>> > > 2007-01-03 1417.60    0 1416.60
>> > > 2007-01-04 1419.34    0 1418.34
>> > > 2007-01-05 1410.71    0 1409.71
>> > > 2007-01-08 1413.84    0 1412.84
>> > > 2007-01-09 1413.11    0 1412.11
>> > >
>> > >
>> > > The first column should be called "changed.close", but it's called
>> > > "close". The second column has the right name. The third column should
>> > > be called "close" but it's called "close.1". Why is that? Am I missing
>> > > something?
>> > >
>> > > If I change the order of the columns and let close have its original
>> > > name, there is still a problem.
>> > >
>> > > > cbind(close, zero = 0, changed.close = close+1)
>> > >              close zero close.1
>> > > 2007-01-03 1416.60    0 1417.60
>> > > 2007-01-04 1418.34    0 1419.34
>> > > 2007-01-05 1409.71    0 1410.71
>> > > 2007-01-08 1412.84    0 1413.84
>> > > 2007-01-09 1412.11    0 1413.11
>> > >
>> > >
>> > > Now the names on the first two columns are ok, but the third column is
>> > > still wrong. Again, why is that?  Apparently it's not letting me
>> > > assign a name to a column that comes from something that already has a
>> > > name.  Is that the way it should be?
>> > >
>> > > I don't get that same problem on a simpler example.
>> > >
>> > >
>> > > > IX <- cbind(I=0, X=(1:3))
>> > > > IX
>> > >      I X
>> > > [1,] 0 1
>> > > [2,] 0 2
>> > > [3,] 0 3
>> > >
>> > > > cbind(Y = 1, Z = IX[, "I"], W = IX[, "X"])
>> > >      Y Z W
>> > > [1,] 1 0 1
>> > > [2,] 1 0 2
>> > > [3,] 1 0 3
>> > >
>> > >
>> > > Is this a peculiarity to xts objects?
>> > >
>> > > Thanks.
>> > >
>> > > *-- Russ *
>> > > P.S. Once again I feel frustrated because it's taken me far more time
>> > > than it deserves to track down and characterize this problem. I can fix
>> > > it by using the names function. But I shouldn't have to do that.
>> > >
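The workaround mentioned in the P.S. (assigning the names after binding) might look roughly like this. It is a sketch, assuming xts is installed; it rebuilds only the first three closing prices shown in the message, and `combined` is a hypothetical variable name. It does not assert what names cbind itself produces, since that may differ between xts versions.

```r
library(xts)

# Rebuild a small 'close' series from the first three rows shown above
dates <- as.Date(c("2007-01-03", "2007-01-04", "2007-01-05"))
close <- xts(c(1416.60, 1418.34, 1409.71), dates)
colnames(close) <- "close"

# Bind first, then set the column names explicitly instead of relying on
# the argument names passed to cbind
combined <- cbind(changed.close = close + 1, close)
colnames(combined) <- c("changed.close", "close")

print(combined)
```

Setting the names in a separate step keeps the result independent of how cbind treats argument names for objects that already carry column names.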
>> > >
>> >
>> >        [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Jeffrey Ryan
>
> jeffrey.r...@lemnica.com
>
> www.lemnica.com
>

