Re: [R] How numerical data is stored inside ts time series objects

2015-04-22 Thread Paul
William Dunlap  tibco.com> writes:
> I think we can call this a bug in stl().

Using what I learned from the responses to this thread, I looked at
the code for stl.  As they say in Microsoft, "this is expected
behaviour" according to the code.  And it doesn't look like an
inadvertent coding oversight.
---
Martin Maechler  lynne.stat.math.ethz.ch> writes:
>> Paul   gmail.com> Interesting that a 2D matrix
>> of size Nx1 is treated as a different animal from a length N
>> vector.  It's a departure from math convention, and from what I'm
>> accustomed to in Matlab.
>
> The vector space  |R^n  is not at all the same space as the space
> |R^{n x 1} even though of course there's a trivial mapping between
> the objects (and the metrics) of the two.  A vector *is NOT* a
> matrix -- but in some matrix calculus notations there is a
> convention to *treat* n-vectors as  (n x 1) matrices.
>
> Good linear algebra teaching does distinguish vectors from
> one-column or one-row matrices -- I'm sure still the case in all
> good math departments around the globe -- but maybe not in math
> teaching to engineers and others who only need applied math.  Yes,
> linear algebra teaching will also make a point that in the usual
> matrix product notations, it is convenient and useful to treat
> vectors as if they were 1-column matrices.

The distinction in math is new to me, with academic training in
engineering, even at the post grad level.  I haven't seen the
distinction in the math for Comp. Sci., either, and that's in the meat
grinder of Canada.  Admittedly, it's not quite as geeky as some meat
grinders in other countries.  And admittedly, I only took C.S. courses
that were geared to applications.  So I had always considered such a
distinction to be a practicality in the coding implementation of
vector/matrix classes, e.g., in C, a vector being a single pointer to
a number, while a 2D array is a pointer to a vector and hence a
different type.

>> That R's vector seems more akin to a list, where the notion of
>> orientation doesn't apply.
>
> Sorry, but again:  not at all in the sense 'list's are used in R.

No need to apologize.  To clarify, being new to R, I was referring to
the general use of the term "list".  Specifically, I was referring to
an ordered collection without orientation, so it is consistent with
what you say above about distinguishing between length N vectors vs.
2D matrices of size Nx1 or 1xN.

> Fortunately, well thought out languages such as S, R, Julia, Python,
> all do make a good distinction between vectors and matrices i.e. 1D
> and 2D arrays.  If Matlab still does not do that, it's just another
> sign that Matlab users should flee and start using julia or R or
> python.

Matlab pretty well only deals with 2D arrays, some of which have size
Nx1 or 1xN.  I haven't seen an example of a 1-D data structure that
doesn't have an orientation, implied or otherwise.  Though of course,
if someone proves me wrong, then I stand corrected (and smarter
because of it).

>  {and well yes, we could start bitchering about S' and hence R's
>  distinction between a 1D array and a vector ... which I think has
>  been a clear design error... but that's not the topic here}

Big fan of python's readability, though I've only dabbled.  And
I won't start bitchering about R & S cuz I'm a newcomer and it's all
an eye popping wonderland.
---
David R Forrest  vims.edu> writes:
> The details of how str() represents your x and y variables is within
> the utils::stl.default() function.  You can hunt this down and see

I'm assuming that you meant utils::str.default() above.  The rest of
your post makes sense if I make that assumption.

I snipped the majority of your response because I'm not responding to
anything specific.  However, it was an extremely educational post.
Thank you for that.

> Also, Matlab sometimes needs a squeeze() to drop degenerate
> dimensions, and R's drop() is similar, and is less-black-magic
> looking than the [[1]] code:
>
> > str(drop(x))
>  Time-Series [1:36] from 1 to 3.92: 464 675 703 887 1139 1077 1318
>  1260 1120 963 ...
> > str(drop(y))
>  Time-Series [1:36] from 1 to 3.92: 464 675 703 887 1139 1077 1318
>  1260 1120 963 ...
>
> stl(drop(x),s.window='per')
> stl(drop(y),s.window='per')
>
> Maybe str.default() should do Time-Series interpretation of is.ts()
> objects for matrices as well as vectors.

I'm assuming that you mean stl(), since str() already works on both?
Maybe it's the version I have, however, but I find that the R code for
stl() doesn't have a section for is.ts().  Instead, it seems to
run through a series of checks for pathological input, with the check
for matrix data consisting of is.matrix(na.action(as.ts(x))), where x
is the time series.  Somehow, the fact that the na.action(time series
argument) returns a matrix implies that the time series data is a
matrix rather than a vector.  In attempting t

Re: [R] How numerical data is stored inside ts time series objects

2015-04-22 Thread David R Forrest

> On Apr 21, 2015, at 9:39 PM, Paul  wrote:
...
> I rummaged around the help files for str, summary, dput, args.  This
> seems like a more complicated language than Matlab, VBA, or even C++'s
> STL of old (which was pretty thoroughly documented).  A function like
> str() returns an object description, and I'm guessing the conventions
> with which the object is described depends a lot on the person who
> wrote the handling code for the class.  The description for the
> variable y seems particularly elaborate.
> 
> Would I be right in assuming that the notation is ad-hoc and not
> documented?  For example, the two invocations str(x) and str(y) show a
> Time-Series and a ts.  And there are many lines of output for str(y)
> that is heavy in punctuation.
> 

The details of how str() represents your x and y variables is within the 
utils::stl.default() function.  You can hunt this down and see the code with:

  methods(class=class(x))  # Find the class-specific handlers -- no str()
  methods(str) # Find the methods for the generic
  getAnywhere(str.default)   # or getFromNamespace('str.default','utils')
  

Within the utils::str.default code, this 'Time-Series' specific code only 
triggers if the object doesn't match a long list of other items (for example: 
is.function(), is.list(), is.vector(object) || (is.array(object) && 
is.atomic(object)) ...)   

else if (stats::is.ts(object)) {
    tsp.a <- stats::tsp(object)
    str1 <- paste0(" Time-Series ", le.str, " from ",
                   format(tsp.a[1L]), " to ", format(tsp.a[2L]),
                   ":")
    std.attr <- c("tsp", "class")
}

This handling is not dependent on who wrote the ts class, but on who wrote the 
str.default function.  

A more explicit way to look at the difference without the str() summarization is 
with dput(x) and dput(y):

> dput(x)
structure(c(464L, 675L, 703L, 887L, 1139L, 1077L, 1318L, 1260L, 
1120L, 963L, 996L, 960L, 530L, 883L, 894L, 1045L, 1199L, 1287L, 
1565L, 1577L, 1076L, 918L, 1008L, 1063L, 544L, 635L, 804L, 980L, 
1018L, 1064L, 1404L, 1286L, 1104L, 999L, 996L, 1015L), .Tsp = c(1, 
3.916667, 12), class = "ts")
> dput(y)
structure(c(464L, 675L, 703L, 887L, 1139L, 1077L, 1318L, 1260L, 
1120L, 963L, 996L, 960L, 530L, 883L, 894L, 1045L, 1199L, 1287L, 
1565L, 1577L, 1076L, 918L, 1008L, 1063L, 544L, 635L, 804L, 980L, 
1018L, 1064L, 1404L, 1286L, 1104L, 999L, 996L, 1015L), .Dim = c(36L, 
1L), .Dimnames = list(NULL, "V1"), .Tsp = c(1, 3.916667, 
12), class = "ts")
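
The same difference can be read off the attributes directly, without parsing
the dput() text.  A minimal sketch, assuming the 36 wine values are typed in
rather than read from ~/tmp/wine.txt, and assuming a data-frame column named
V1 as read.table() would produce (the name `sales` is only for illustration):

```r
# The 36 monthly wine sales values from the original post
sales <- c(464, 675, 703, 887, 1139, 1077, 1318, 1260, 1120, 963, 996, 960,
           530, 883, 894, 1045, 1199, 1287, 1565, 1577, 1076, 918, 1008, 1063,
           544, 635, 804, 980, 1018, 1064, 1404, 1286, 1104, 999, 996, 1015)
wine <- data.frame(V1 = as.integer(sales))

x <- ts(wine[[1]], frequency = 12)  # built from a plain vector
y <- ts(wine, frequency = 12)       # built from a one-column data frame

names(attributes(x))  # no "dim" entry
names(attributes(y))  # includes "dim" and "dimnames"
dim(x)                # NULL
dim(y)                # 36 1
is.matrix(x)          # FALSE
is.matrix(y)          # TRUE
```

The printed series look the same because print() lays both out by month; only
the dim and dimnames attributes differ.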


Also, Matlab sometimes needs a squeeze() to drop degenerate dimensions, and R's 
drop() is similar, and is less-black-magic looking than the [[1]] code:


> str(drop(x))
 Time-Series [1:36] from 1 to 3.92: 464 675 703 887 1139 1077 1318 1260 1120 
963 ...
> str(drop(y))
 Time-Series [1:36] from 1 to 3.92: 464 675 703 887 1139 1077 1318 1260 1120 
963 ...

stl(drop(x),s.window='per')
stl(drop(y),s.window='per') 
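
A small self-contained check of what drop() does to a one-column ts, as a
sketch only: the data in `v` is a made-up stand-in, not the wine series.

```r
# Hypothetical stand-in data: three years of monthly values
v <- sin(1:36) + (1:36) / 10
x <- ts(v, frequency = 12)                    # no dim attribute
y <- ts(matrix(v, ncol = 1), frequency = 12)  # carries dim = c(36, 1)

yd <- drop(y)               # removes the extent-1 column dimension
dim(yd)                     # NULL
is.ts(yd)                   # TRUE: the "tsp" attribute and class survive
all.equal(tsp(yd), tsp(x))  # TRUE

fit <- stl(yd, s.window = "periodic")  # accepted once the dim is gone
class(fit)                             # "stl"
```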

Maybe str.default() should do Time-Series interpretation of is.ts() objects for 
matrices as well as vectors.

Dave

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How numerical data is stored inside ts time series objects

2015-04-22 Thread Martin Maechler
> Paul  
> on Wed, 22 Apr 2015 01:39:16 + writes:

> William Dunlap  tibco.com> writes:
>> Use the str() function to see the internal structure of most
>> objects.  In your case it would show something like:
>> 
>> > Data <- data.frame(theData=round(sin(1:38),1))
>> > x <- ts(Data[[1]], frequency=12) # or Data[,1]
>> > y <- ts(Data, frequency=12)
>> > str(x)
>> Time-Series [1:38] from 1 to 4.08: 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5
>> ...
>> > str(y)
>> ts [1:38, 1] 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
>> - attr(*, "dimnames")=List of 2
>> ..$ : NULL
>> ..$ : chr "theData"
>> - attr(*, "tsp")= num [1:3] 1 4.08 12
>> 
>> 'x' contains a vector of data and 'y' contains a 1-column matrix of
>> data.  stl(x,"per") and stl(y, "per") give similar results as you
>> got.
>> 
>> Evidently, stl() does not know that 1-column matrices can be treated
>> much the same as vectors and gives an error message.  Thus you must
>> extract the one column into a vector: stl(y[,1], "per").

> Thanks, William.

> Interesting that a 2D matrix of size Nx1 is treated as a different
> animal from a length N vector.  It's a departure from math convention,
> and from what I'm accustomed to in Matlab.  

Ha -- Not at all!
The above is exactly the misconception I have been fighting --
mostly in vain -- for years.

Matlab's convention of treating a vector as an  N x 1 matrix is
a BIG confusion to much of math teaching :

The vector space  |R^n  is not at all the same space as the space  |R^{n x 1}
even though of course there's a trivial mapping between the
objects (and the metrics) of the two.
A vector *is NOT* a matrix -- but in some matrix calculus
notations there is a convention to *treat* n-vectors as  (n x 1) matrices.

Good linear algebra teaching does distinguish vectors from
one-column or one-row matrices -- I'm sure still the case in all
good math departments around the globe -- but maybe not in math
teaching to engineers and others who only need applied math.
Yes, linear algebra teaching will also make a point that in
the usual matrix product notations, it is convenient and useful to treat
vectors as if they were 1-column matrices.
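
In R terms, the distinction and the product convention can be seen in a few
lines.  This is only an illustrative sketch; the names v, m and A are
arbitrary:

```r
v <- c(1, 2, 3)            # a plain vector: no dim attribute
m <- matrix(v, ncol = 1)   # a 3 x 1 matrix holding the same numbers

dim(v)           # NULL
dim(m)           # 3 1
is.matrix(v)     # FALSE
identical(v, m)  # FALSE: same elements, different objects

# In a matrix product the vector is treated as a column or a row as needed
A <- diag(3)
dim(A %*% v)     # 3 1
dim(v %*% A)     # 1 3
```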

> That R's vector seems
> more akin to a list, where the notion of orientation doesn't apply.

Sorry, but again:  not at all in the sense 'list's are used in R.

Fortunately, well thought out languages such as S, R, Julia, Python,
all do make a good distinction between vectors and matrices
i.e. 1D and 2D arrays.  If Matlab still does not do that, it's
just another sign that Matlab users should flee and start using julia
or R or python.

  {and well yes, we could start bitchering about S' and hence R's distinction
   between a 1D array and a vector ... which I think has been a
   clear design error... but that's not the topic here}
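
For readers who have not met that last distinction, a short sketch (names
arbitrary) of how a 1D array differs from a vector in R:

```r
v <- 1:3                  # vector: no dim attribute
a <- array(1:3, dim = 3)  # 1D array: dim attribute of length 1

is.null(dim(v))  # TRUE
dim(a)           # 3
is.array(v)      # FALSE
is.array(a)      # TRUE
is.matrix(a)     # FALSE: a matrix needs a dim of length 2
identical(v, a)  # FALSE, although the elements are the same
all(v == a)      # TRUE
```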



Re: [R] How numerical data is stored inside ts time series objects

2015-04-21 Thread William Dunlap
> Interesting that a 2D matrix of size Nx1 is treated as a different
> animal from a length N vector.

I think we can call this a bug in stl().

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Apr 21, 2015 at 6:39 PM, Paul  wrote:

> William Dunlap  tibco.com> writes:
> > Use the str() function to see the internal structure of most
> > objects.  In your case it would show something like:
> >
> > > Data <- data.frame(theData=round(sin(1:38),1))
> > > x <- ts(Data[[1]], frequency=12) # or Data[,1]
> > > y <- ts(Data, frequency=12)
> > > str(x)
> >  Time-Series [1:38] from 1 to 4.08: 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5
> > ...
> > > str(y)
> >  ts [1:38, 1] 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
> >  - attr(*, "dimnames")=List of 2
> >   ..$ : NULL
> >   ..$ : chr "theData"
> >  - attr(*, "tsp")= num [1:3] 1 4.08 12
> >
> > 'x' contains a vector of data and 'y' contains a 1-column matrix of
> > data.  stl(x,"per") and stl(y, "per") give similar results as you
> > got.
> >
> > Evidently, stl() does not know that 1-column matrices can be treated
> > much the same as vectors and gives an error message.  Thus you must
> > extract the one column into a vector: stl(y[,1], "per").
>
> Thanks, William.
>
> Interesting that a 2D matrix of size Nx1 is treated as a different
> animal from a length N vector.  It's a departure from math convention,
> and from what I'm accustomed to in Matlab.  That R's vector seems
> more akin to a list, where the notion of orientation doesn't apply.
>
> I rummaged around the help files for str, summary, dput, args.  This
> seems like a more complicated language than Matlab, VBA, or even C++'s
> STL of old (which was pretty thoroughly documented).  A function like
> str() returns an object description, and I'm guessing the conventions
> with which the object is described depends a lot on the person who
> wrote the handling code for the class.  The description for the
> variable y seems particularly elaborate.
>
> Would I be right in assuming that the notation is ad-hoc and not
> documented?  For example, the two invocations str(x) and str(y) show a
> Time-Series and a ts.  And there are many lines of output for str(y)
> that is heavy in punctuation.
>



Re: [R] How numerical data is stored inside ts time series objects

2015-04-21 Thread Paul
William Dunlap  tibco.com> writes:
> Use the str() function to see the internal structure of most
> objects.  In your case it would show something like:
>
> > Data <- data.frame(theData=round(sin(1:38),1))
> > x <- ts(Data[[1]], frequency=12) # or Data[,1]
> > y <- ts(Data, frequency=12)
> > str(x)
>  Time-Series [1:38] from 1 to 4.08: 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5
> ...
> > str(y)
>  ts [1:38, 1] 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : NULL
>   ..$ : chr "theData"
>  - attr(*, "tsp")= num [1:3] 1 4.08 12
>
> 'x' contains a vector of data and 'y' contains a 1-column matrix of
> data.  stl(x,"per") and stl(y, "per") give similar results as you
> got.
>
> Evidently, stl() does not know that 1-column matrices can be treated
> much the same as vectors and gives an error message.  Thus you must
> extract the one column into a vector: stl(y[,1], "per").

Thanks, William.

Interesting that a 2D matrix of size Nx1 is treated as a different
animal from a length N vector.  It's a departure from math convention,
and from what I'm accustomed to in Matlab.  That R's vector seems
more akin to a list, where the notion of orientation doesn't apply.

I rummaged around the help files for str, summary, dput, args.  This
seems like a more complicated language than Matlab, VBA, or even C++'s
STL of old (which was pretty thoroughly documented).  A function like
str() returns an object description, and I'm guessing the conventions
with which the object is described depend a lot on the person who
wrote the handling code for the class.  The description for the
variable y seems particularly elaborate.

Would I be right in assuming that the notation is ad-hoc and not
documented?  For example, the two invocations str(x) and str(y) show a
Time-Series and a ts.  And there are many lines of output for str(y)
that are heavy in punctuation.



Re: [R] How numerical data is stored inside ts time series objects

2015-04-20 Thread William Dunlap
Use the str() function to see the internal structure of most objects.  In
your case it would show something like:

> Data <- data.frame(theData=round(sin(1:38),1))
> x <- ts(Data[[1]], frequency=12) # or Data[,1]
> y <- ts(Data, frequency=12)
> str(x)
 Time-Series [1:38] from 1 to 4.08: 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5
...
> str(y)
 ts [1:38, 1] 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr "theData"
 - attr(*, "tsp")= num [1:3] 1 4.08 12

'x' contains a vector of data and 'y' contains a 1-column matrix of data.
stl(x,"per") and stl(y, "per") give similar results as you got.

Evidently, stl() does not know that 1-column matrices can be treated much
the same as vectors and gives an error message.  Thus you must extract
the one column into a vector: stl(y[,1], "per").
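
Put together as a runnable sketch, using the same made-up Data as above.  The
is.matrix() line mirrors the kind of test stl() applies to its input in the R
versions discussed in this thread (an assumption about other versions), and
the tryCatch() is only there so the script keeps going past the error:

```r
Data <- data.frame(theData = round(sin(1:38), 1))
y <- ts(Data, frequency = 12)    # one-column matrix inside a ts

# A series that is still a matrix after NA handling is what stl() objects to
is.matrix(na.fail(as.ts(y)))     # TRUE

# Capture the error message instead of stopping the script
res <- tryCatch(stl(y, "per"), error = function(e) conditionMessage(e))
res

# Extracting the column gives a ts without a dim attribute, which stl() accepts
y1 <- y[, 1]
dim(y1)                          # NULL
fit <- stl(y1, "per")
class(fit)                       # "stl"
```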




Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Apr 20, 2015 at 4:04 PM, Paul  wrote:

> I'm getting familiar with the stl function in the stats package by
> trying it on an example from Brockwell & Davis's 2002 "Introduction to
> Time Series and Forecasting".  Specifically, I'm using a subset of their
> red wine sales data.  It's a detour from the stl material at
> http://www.stat.pitt.edu/stoffer/tsa3/R_toot.htm (at some point, I
> have to stop simply following and try to make it work with new data).
>
> I need a minimum of 36 wine sales data points in the series, since stl
> otherwise complains about the data being less than 2 cycles.  The data
> is in ~/tmp/wine.txt:
>
> 464
> 675
> 703
> 887
> 1139
> 1077
> 1318
> 1260
> 1120
> 963
> 996
> 960
> 530
> 883
> 894
> 1045
> 1199
> 1287
> 1565
> 1577
> 1076
> 918
> 1008
> 1063
> 544
> 635
> 804
> 980
> 1018
> 1064
> 1404
> 1286
> 1104
> 999
> 996
> 1015
>
> My sourced test code is buried in a repeat loop so that I can use a
> break command to circumvent the final error-causing statement that I'm
> trying to figure out:
>
> repeat{
>
> # Clear variables (from stackexchange)
> rm( list=setdiff( ls( all.names=TRUE ), lsf.str(all.names=TRUE ) ) )
> ls()
>
> head( wine <- read.table("~/tmp/wine.txt") )
> ( x <- ts(wine[[1]],frequency=12) )
> ( y <- ts(wine,frequency=12) )
> ( a=stl(x,"per") )
> #break
> ( b=stl(y,"per") )
> }
>
> The final statement causes the error 'Error in stl(y, "per") : only
> univariate series are allowed'.  I found an explanation at
> http://stackoverflow.com/questions/10492155/time-series-and-stl-in-r-error-only-univariate-series-are-allowed.
> That's how I came up with the assignment to x using wine[[1]].  I
> found an explanation of the need for double square brackets at
> http://www.r-tutor.com/r-introduction/list/named-list-members.
>
> My problem is that it's not very clear what is happening inside the ts
> structures x and y.  If I simply print them, they look 100% identical:
>
> | > x
> |    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
> | 1  464  675  703  887 1139 1077 1318 1260 1120  963  996  960
> | 2  530  883  894 1045 1199 1287 1565 1577 1076  918 1008 1063
> | 3  544  635  804  980 1018 1064 1404 1286 1104  999  996 1015
> | > y
> |    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
> | 1  464  675  703  887 1139 1077 1318 1260 1120  963  996  960
> | 2  530  883  894 1045 1199 1287 1565 1577 1076  918 1008 1063
> | 3  544  635  804  980 1018 1064 1404 1286 1104  999  996 1015
>
> Whatever their differences, it's not causing R to misinterpret the
> data; that is, they each look like a single series of numerical data.
>
> Can anyone illuminate the difference in the data inside the ts data
> structures?  The potential incompatibility with stl is just one
> symptom.  Right now, the "solution" is black magic to me, and I would
> like to get a clearer picture so that I know when else (and how) to
> watch out for this.
>
> I've posted this to the R Help mailing list
> http://news.gmane.org/gmane.comp.lang.r.general and to stackoverflow
> at
> http://stackoverflow.com/questions/29759928/how-numerical-data-is-stored-inside-ts-time-series-objects.
>
