Re: [R] How numerical data is stored inside ts time series objects

2015-04-22 Thread David R Forrest

 On Apr 21, 2015, at 9:39 PM, Paul paul.domas...@gmail.com wrote:
...
 I rummaged around the help files for str, summary, dput, args.  This
 seems like a more complicated language than Matlab, VBA, or even C++'s
 STL of old (which was pretty thoroughly documented).  A function like
 str() returns an object description, and I'm guessing the conventions
 with which the object is described depends a lot on the person who
 wrote the handling code for the class.  The description for the
 variable y seems particularly elaborate.
 
 Would I be right in assuming that the notation is ad-hoc and not
 documented?  For example, the two invocations str(x) and str(y) show a
 Time-Series and a ts.  And there are many lines of output for str(y)
 that is heavy in punctuation.
 

The details of how str() represents your x and y variables is within the 
utils::stl.default() function.  You can hunt this down and see the code with:

  methods(class=class(x))  # Find the class-specific handlers -- no str()
  methods(str) # Find the methods for the generic
  getAnywhere(str.default)   # or getFromNamespace('str.default','utils')
  

Within the utils::str.default code, this 'Time-Series' specific code only 
triggers if the object doesn't match a long list of other items (for example: 
is.function(), is.list(), is.vector(object) || (is.array(object) && 
is.atomic(object)) ...)   

else if (stats::is.ts(object)) {
    tsp.a <- stats::tsp(object)
    str1 <- paste0(" Time-Series ", le.str, " from ", 
        format(tsp.a[1L]), " to ", format(tsp.a[2L]), 
        ":")
    std.attr <- c("tsp", "class")
}

This handling is not dependent on who wrote the ts class, but on who wrote the 
str.default function.  
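
As a rough illustration of why x and y take different branches, the sketch below applies the tests named above to both objects. The data follows William Dunlap's example elsewhere in this thread; the helper name checks() is hypothetical, and the exact header that str() prints may differ between R versions.

```r
# Sketch: which of the tests used by str.default() do x and y satisfy?
Data <- data.frame(theData = round(sin(1:38), 1))
x <- ts(Data[[1]], frequency = 12)  # ts built from a plain vector
y <- ts(Data, frequency = 12)       # ts built from a 1-column data frame

# Hypothetical helper: collect the relevant predicates for one object
checks <- function(object) {
  c(is.vector = is.vector(object),   # FALSE once tsp/class attributes exist
    is.array  = is.array(object),    # TRUE only when a dim attribute exists
    is.atomic = is.atomic(object),
    is.ts     = stats::is.ts(object))
}

checks_x <- checks(x)
checks_y <- checks(y)
print(rbind(x = checks_x, y = checks_y))
```

Under these assumptions x is neither a plain vector nor an array, so it can reach the is.ts() branch, while y already satisfies the is.array() && is.atomic() test.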

A more explicit way to look at the difference without the str() summarization is 
with dput(x) and dput(y):

> dput(x)
structure(c(464L, 675L, 703L, 887L, 1139L, 1077L, 1318L, 1260L, 
1120L, 963L, 996L, 960L, 530L, 883L, 894L, 1045L, 1199L, 1287L, 
1565L, 1577L, 1076L, 918L, 1008L, 1063L, 544L, 635L, 804L, 980L, 
1018L, 1064L, 1404L, 1286L, 1104L, 999L, 996L, 1015L), .Tsp = c(1, 
3.916667, 12), class = "ts")
> dput(y)
structure(c(464L, 675L, 703L, 887L, 1139L, 1077L, 1318L, 1260L, 
1120L, 963L, 996L, 960L, 530L, 883L, 894L, 1045L, 1199L, 1287L, 
1565L, 1577L, 1076L, 918L, 1008L, 1063L, 544L, 635L, 804L, 980L, 
1018L, 1064L, 1404L, 1286L, 1104L, 999L, 996L, 1015L), .Dim = c(36L, 
1L), .Dimnames = list(NULL, "V1"), .Tsp = c(1, 3.916667, 
12), class = "ts")
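
The same comparison can be made programmatically. The sketch below rebuilds the two objects from the wine values shown above (using ts() on a vector and on a 1-column matrix, which is an assumption about how they were created) and lists which attributes y has that x lacks.

```r
# Sketch: compare the attributes of the vector-based and matrix-based series
wine <- c(464L, 675L, 703L, 887L, 1139L, 1077L, 1318L, 1260L, 1120L, 963L,
          996L, 960L, 530L, 883L, 894L, 1045L, 1199L, 1287L, 1565L, 1577L,
          1076L, 918L, 1008L, 1063L, 544L, 635L, 804L, 980L, 1018L, 1064L,
          1404L, 1286L, 1104L, 999L, 996L, 1015L)
x <- ts(wine, frequency = 12)
y <- ts(matrix(wine, ncol = 1, dimnames = list(NULL, "V1")), frequency = 12)

attrs_x <- names(attributes(x))
attrs_y <- names(attributes(y))
extra_in_y <- setdiff(attrs_y, attrs_x)                  # attributes only y carries
same_data  <- identical(as.vector(x), as.vector(y))      # underlying numbers agree

print(extra_in_y)
print(dim(y))
print(same_data)
```

The numbers are the same in both objects; the only difference is the dim and dimnames attributes that make y a matrix.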


Also, Matlab sometimes needs a squeeze() to drop degenerate dimensions, and R's 
drop() is similar, and is less-black-magic looking than the [[1]] code:


> str(drop(x))
 Time-Series [1:36] from 1 to 3.92: 464 675 703 887 1139 1077 1318 1260 1120 963 ...
> str(drop(y))
 Time-Series [1:36] from 1 to 3.92: 464 675 703 887 1139 1077 1318 1260 1120 963 ...

stl(drop(x),s.window='per')
stl(drop(y),s.window='per') 
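
A small self-contained sketch of the drop() approach, using synthetic data rather than the wine series (the variable names y1 and fit are illustrative only):

```r
# Sketch: drop() removes the degenerate dimension but keeps the ts attributes
y <- ts(matrix(round(sin(1:38), 1), ncol = 1), frequency = 12)
print(dim(y))                 # a 38 x 1 matrix underneath

y1 <- drop(y)                 # dim and dimnames removed, tsp and class kept
print(is.null(dim(y1)))
print(is.ts(y1))

# The dropped series is univariate, so stl() accepts it
fit <- stl(y1, s.window = "periodic")
print(class(fit))
```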

Maybe str.default() should do Time-Series interpretation of is.ts() objects for 
matrices as well as vectors.

Dave

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How numerical data is stored inside ts time series objects

2015-04-22 Thread Martin Maechler
 Paul  paul.domas...@gmail.com
 on Wed, 22 Apr 2015 01:39:16 + writes:

 William Dunlap wdunlap at tibco.com writes:
 Use the str() function to see the internal structure of most
 objects.  In your case it would show something like:
 
  > Data <- data.frame(theData=round(sin(1:38),1))
  > x <- ts(Data[[1]], frequency=12) # or Data[,1]
  > y <- ts(Data, frequency=12)
  > str(x)
   Time-Series [1:38] from 1 to 4.08: 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
  > str(y)
   ts [1:38, 1] 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
   - attr(*, "dimnames")=List of 2
    ..$ : NULL
    ..$ : chr "theData"
   - attr(*, "tsp")= num [1:3] 1 4.08 12
 
 'x' contains a vector of data and 'y' contains a 1-column matrix of
 data.  stl(x,"per") and stl(y, "per") give similar results as you
 got.
 
 Evidently, stl() does not know that 1-column matrices can be treated
 much the same as vectors and gives an error message.  Thus you must
 extract the one column into a vector: stl(y[,1], "per").

 Thanks, William.

 Interesting that a 2D matrix of size Nx1 is treated as a different
 animal from a length N vector.  It's a departure from math convention,
 and from what I'm accustomed to in Matlab.  

Ha -- Not at all!
The above is exactly the misconception I have been fighting --
mostly in vain -- for years.

Matlab's convention of treating a vector as an  N x 1 matrix is
a BIG confusion to much of math teaching :

The vector space  |R^n  is not at all the same space as the space  |R^{n x 1}
even though of course there's a trivial mapping between the
objects (and the metrics) of the two.
A vector *is NOT* a matrix -- but in some matrix calculus
notations there is a convention to *treat* n-vectors as  (n x 1) matrices.

Good linear algebra teaching does distinguish vectors from
one-column or one-row matrices -- I'm sure still the case in all
good math departments around the globe -- but maybe not in math
teaching to engineers and others who only need applied math.
Yes, linear algebra teaching will also make a point that in
the usual matrix product notations, it is convenient and useful to treat
vectors as if they were 1-column matrices.
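
In R terms, the distinction and the "treat as a 1-column matrix" convention can be seen in a few lines. This is only an illustrative sketch; the variable names are arbitrary.

```r
# Sketch: a length-3 vector versus a 3 x 1 matrix holding the same numbers
v <- c(1, 2, 3)
m <- matrix(v, ncol = 1)

print(is.null(dim(v)))   # a vector has no dim attribute
print(dim(m))            # the matrix is 3 x 1
print(is.matrix(v))
print(is.matrix(m))
print(identical(v, m))   # same numbers, different objects

# In matrix products, %*% promotes a bare vector to a row or column as needed
inner    <- v %*% v      # inner product, returned as a 1 x 1 matrix
outer_vm <- v %*% t(v)   # v acts as a column here, giving a 3 x 3 matrix
print(inner)
print(dim(outer_vm))
```

So the vector carries no orientation of its own; the matrix product supplies one only when the notation calls for it.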

 That R's vector seems
 more akin to a list, where the notion of orientation doesn't apply.

Sorry, but again:  not at all in the sense 'list's are used in R.

Fortunately, well thought out languages such as S, R, Julia, Python,
all do make a good distinction between vectors and matrices
i.e. 1D and 2D arrays.  If Matlab still does not do that, it's
just another sign that Matlab users should flee and start using julia
or R or python.

  {and well yes, we could start bitchering about S' and hence R's distinction
   between a 1D array and a vector ... which I think has been a
   clear design error... but that's not the topic here}



Re: [R] How numerical data is stored inside ts time series objects

2015-04-22 Thread Paul
William Dunlap wdunlap at tibco.com writes:
 I think we can call this a bug in stl().

Using what I learned from the responses to this thread, I looked at
the code for stl.  As they say in Microsoft, this is expected
behaviour according to the code.  And it doesn't look like an
inadvertent coding oversight.
---
Martin Maechler maechler at lynne.stat.math.ethz.ch writes:
 Paul  Paul.Domaskis at gmail.com Interesting that a 2D matrix
 of size Nx1 is treated as a different animal from a length N
 vector.  It's a departure from math convention, and from what I'm
 accustomed to in Matlab.

 The vector space  |R^n  is not at all the same space as the space
 |R^{n x 1} even though of course there's a trivial mapping between
 the objects (and the metrics) of the two.  A vector *is NOT* a
 matrix -- but in some matrix calculus notations there is a
 convention to *treat* n-vectors as  (n x 1) matrices.

 Good linear algebra teaching does distinguish vectors from
 one-column or one-row matrices -- I'm sure still the case in all
 good math departments around the globe -- but maybe not in math
 teaching to engineers and others who only need applied math.  Yes,
 linear algebra teaching will also make a point that in the usual
 matrix product notations, it is convenient and useful to treat
 vectors as if they were 1-column matrices.

The distinction in math is new to me, with academic training in
engineering, even at the post grad level.  I haven't seen the
distinction in the math for Comp. Sci., either, and that's in the meat
grinder of Canada.  Admittedly, it's not quite as geeky as some meat
grinders in other countries.  And admittedly, I only took C.S. courses
that were geared to applications.  So I had always considered such a
distinction to be a practicality in the coding implementation of
vector/matrix classes, e.g., in C, a vector being a single pointer to
a number, while a 2D array is a pointer to a vector and hence a
different type.

 That R's vector seems more akin to a list, where the notion of
 orientation doesn't apply.

 Sorry, but again:  not at all in the sense 'list's are used in R.

No need to apologize.  To clarify, being new to R, I was referring to
the general use of the term list.  Specifically, I was referring to
an ordered collection without orientation, so it is consistent with
what you say above about distinguishing between length N vectors vs.
2D matrices of size Nx1 or 1xN.

 Fortunately, well thought out languages such as S, R, Julia, Python,
 all do make a good distinction between vectors and matrices i.e. 1D
 and 2D arrays.  If Matlab still does not do that, it's just another
 sign that Matlab users should flee and start using julia or R or
 python.

Matlab pretty well only deals with 2D arrays, some of which have size
Nx1 or 1xN.  I haven't seen an example of a 1-D data structure that
doesn't have an orientation, implied or otherwise.  Though of course,
if someone proves me wrong, then I stand corrected (and smarter
because of it).

  {and well yes, we could start bitchering about S' and hence R's
  distinction between a 1D array and a vector ... which I think has
  been a clear design error... but that's not the topic here}

Big fan of python's readability, though I've only dabbled.  And
I won't start bitchering about R & S cuz I'm a newcomer and it's all
an eye popping wonderland.
---
David R Forrest drf at vims.edu writes:
 The details of how str() represents your x and y variables is within
 the utils::stl.default() function.  You can hunt this down and see

I'm assuming that you meant utils::str.default() above.  The rest of
your post makes sense if I make that assumption.

I snipped the majority of your response because I'm not responding to
anything specific.  However, it was an extremely educational post.
Thank you for that.

 Also, Matlab sometimes needs a squeeze() to drop degenerate
 dimensions, and R's drop() is similar, and is less-black-magic
 looking than the [[1]] code:

  str(drop(x))
  Time-Series [1:36] from 1 to 3.92: 464 675 703 887 1139 1077 1318
  1260 1120 963 ...
  str(drop(y))
  Time-Series [1:36] from 1 to 3.92: 464 675 703 887 1139 1077 1318
  1260 1120 963 ...

 stl(drop(x),s.window='per')
 stl(drop(y),s.window='per')

 Maybe str.default() should do Time-Series interpretation of is.ts()
 objects for matrices as well as vectors.

I'm assuming that you mean stl(), since str() already works on both?
Maybe it's the version I have, however, but I find that the R code for
stl() doesn't have a section for is.ts().  Instead, it seems to
run through a series of checks for pathological input, with the check
for matrix data consisting of is.matrix(na.action(as.ts(x))), where x
is the time series.  Somehow, the fact that the na.action(time series
argument) returns a matrix implies that the time series data is a
matrix rather than a vector.  In attempting to get insight, I found

Re: [R] How numerical data is stored inside ts time series objects

2015-04-21 Thread William Dunlap
 Interesting that a 2D matrix of size Nx1 is treated as a different
 animal from a length N vector.

I think we can call this a bug in stl().

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Apr 21, 2015 at 6:39 PM, Paul paul.domas...@gmail.com wrote:

 William Dunlap wdunlap at tibco.com writes:
  Use the str() function to see the internal structure of most
  objects.  In your case it would show something like:
 
   > Data <- data.frame(theData=round(sin(1:38),1))
   > x <- ts(Data[[1]], frequency=12) # or Data[,1]
   > y <- ts(Data, frequency=12)
   > str(x)
    Time-Series [1:38] from 1 to 4.08: 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
   > str(y)
    ts [1:38, 1] 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
    - attr(*, "dimnames")=List of 2
     ..$ : NULL
     ..$ : chr "theData"
    - attr(*, "tsp")= num [1:3] 1 4.08 12
 
  'x' contains a vector of data and 'y' contains a 1-column matrix of
  data.  stl(x,"per") and stl(y, "per") give similar results as you
  got.
 
  Evidently, stl() does not know that 1-column matrices can be treated
  much the same as vectors and gives an error message.  Thus you must
  extract the one column into a vector: stl(y[,1], "per").

 Thanks, William.

 Interesting that a 2D matrix of size Nx1 is treated as a different
 animal from a length N vector.  It's a departure from math convention,
 and from what I'm accustomed to in Matlab.  that R's vector seems
 more akin to a list, where the notion of orientation doesn't apply.

 I rummaged around the help files for str, summary, dput, args.  This
 seems like a more complicated language than Matlab, VBA, or even C++'s
 STL of old (which was pretty thoroughly documented).  A function like
 str() returns an object description, and I'm guessing the conventions
 with which the object is described depends a lot on the person who
 wrote the handling code for the class.  The description for the
 variable y seems particularly elaborate.

 Would I be right in assuming that the notation is ad-hoc and not
 documented?  For example, the two invocations str(x) and str(y) show a
 Time-Series and a ts.  And there are many lines of output for str(y)
 that is heavy in punctuation.



Re: [R] How numerical data is stored inside ts time series objects

2015-04-21 Thread Paul
William Dunlap wdunlap at tibco.com writes:
 Use the str() function to see the internal structure of most
 objects.  In your case it would show something like:

  > Data <- data.frame(theData=round(sin(1:38),1))
  > x <- ts(Data[[1]], frequency=12) # or Data[,1]
  > y <- ts(Data, frequency=12)
  > str(x)
   Time-Series [1:38] from 1 to 4.08: 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
  > str(y)
   ts [1:38, 1] 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
   - attr(*, "dimnames")=List of 2
    ..$ : NULL
    ..$ : chr "theData"
   - attr(*, "tsp")= num [1:3] 1 4.08 12

 'x' contains a vector of data and 'y' contains a 1-column matrix of
 data.  stl(x,"per") and stl(y, "per") give similar results as you
 got.

 Evidently, stl() does not know that 1-column matrices can be treated
 much the same as vectors and gives an error message.  Thus you must
 extract the one column into a vector: stl(y[,1], "per").

Thanks, William.

Interesting that a 2D matrix of size Nx1 is treated as a different
animal from a length N vector.  It's a departure from math convention,
and from what I'm accustomed to in Matlab.  that R's vector seems
more akin to a list, where the notion of orientation doesn't apply.

I rummaged around the help files for str, summary, dput, args.  This
seems like a more complicated language than Matlab, VBA, or even C++'s
STL of old (which was pretty thoroughly documented).  A function like
str() returns an object description, and I'm guessing the conventions
with which the object is described depends a lot on the person who
wrote the handling code for the class.  The description for the
variable y seems particularly elaborate.

Would I be right in assuming that the notation is ad-hoc and not
documented?  For example, the two invocations str(x) and str(y) show a
Time-Series and a ts.  And there are many lines of output for str(y)
that is heavy in punctuation.



Re: [R] How numerical data is stored inside ts time series objects

2015-04-20 Thread William Dunlap
Use the str() function to see the internal structure of most objects.  In
your case it would show something like:

> Data <- data.frame(theData=round(sin(1:38),1))
> x <- ts(Data[[1]], frequency=12) # or Data[,1]
> y <- ts(Data, frequency=12)
> str(x)
 Time-Series [1:38] from 1 to 4.08: 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
> str(y)
 ts [1:38, 1] 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr "theData"
 - attr(*, "tsp")= num [1:3] 1 4.08 12

'x' contains a vector of data and 'y' contains a 1-column matrix of data.
stl(x,"per") and stl(y, "per") give similar results as you got.

Evidently, stl() does not know that 1-column matrices can be treated much
the same as vectors and gives an error message.  Thus you must extract
the one column into a vector: stl(y[,1], "per").
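
A runnable sketch of the failure and the workaround, under the assumption that the installed stl() still rejects matrix input as described in this thread (the names msg, fit_x and fit_y are illustrative):

```r
# Sketch: stl() on the 1-column matrix series fails; on the extracted column it works
Data <- data.frame(theData = round(sin(1:38), 1))
x <- ts(Data[[1]], frequency = 12)
y <- ts(Data, frequency = 12)

# Capture the error message instead of stopping the script
msg <- tryCatch({
  stl(y, "periodic")
  NA_character_                      # reached only if stl() did not fail
}, error = function(e) conditionMessage(e))
print(msg)

# Extracting the single column yields a univariate ts that stl() accepts
fit_x <- stl(x, "periodic")
fit_y <- stl(y[, 1], "periodic")
print(all.equal(as.numeric(fit_x$time.series), as.numeric(fit_y$time.series)))
```

"periodic" is spelled out here; the abbreviation "per" used above is accepted through partial matching.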




Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Apr 20, 2015 at 4:04 PM, Paul paul.domas...@gmail.com wrote:

 I'm getting familiar with the stl function in the stats package by
 trying it on an example from Brockwell & Davis's 2002 Introduction to
 Time Series and Forecasting.  Specifically, I'm using a subset of their
 red wine sales data.  It's a detour from the stl material at
 http://www.stat.pitt.edu/stoffer/tsa3/R_toot.htm (at some point, I
 have to stop simply following and try to make it work with new data).

 I need a minimum of 36 wine sales data points in the series, since stl
 otherwise complains about the data being less than 2 cycles.  The data
 is in ~/tmp/wine.txt:

 464
 675
 703
 887
 1139
 1077
 1318
 1260
 1120
 963
 996
 960
 530
 883
 894
 1045
 1199
 1287
 1565
 1577
 1076
 918
 1008
 1063
 544
 635
 804
 980
 1018
 1064
 1404
 1286
 1104
 999
 996
 1015

 My sourced test code is buried in a repeat loop so that I can use a
 break command to circumvent the final error-causing statement that I'm
 trying to figure out:

 repeat{

 # Clear variables (from stackexchange)
 rm( list=setdiff( ls( all.names=TRUE ), lsf.str(all.names=TRUE ) ) )
 ls()

 head( wine <- read.table("~/tmp/wine.txt") )
 ( x <- ts(wine[[1]],frequency=12) )
 ( y <- ts(wine,frequency=12) )
 ( a=stl(x,"per") )
 #break
 ( b=stl(y,"per") )
 }

 The final statement causes the error 'Error in stl(y, "per") : only
 univariate series are allowed'.  I found an explanation at
 http://stackoverflow.com/questions/10492155/time-series-and-stl-in-r-error-only-univariate-series-are-allowed.
 That's how I came up with the assignment to x using wine[[1]].  I
 found an explanation of the need for double square brackets at
 http://www.r-tutor.com/r-introduction/list/named-list-members.

 My problem is that it's not very clear what is happening inside the ts
 structures x and y.  If I simply print them, they look 100% identical:

 | > x
 |    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
 | 1  464  675  703  887 1139 1077 1318 1260 1120  963  996  960
 | 2  530  883  894 1045 1199 1287 1565 1577 1076  918 1008 1063
 | 3  544  635  804  980 1018 1064 1404 1286 1104  999  996 1015
 | > y
 |    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
 | 1  464  675  703  887 1139 1077 1318 1260 1120  963  996  960
 | 2  530  883  894 1045 1199 1287 1565 1577 1076  918 1008 1063
 | 3  544  635  804  980 1018 1064 1404 1286 1104  999  996 1015

 Whatever their differences, it's not causing R to misinterpret the
 data; that is, they each look like a single series of numerical data.

 Can anyone illuminate the difference in the data inside the ts data
 structures?  The potential incompatibility with stl is just one
 symptom.  Right now, the solution is black magic to me, and I would
 like to get a clearer picture so that I know when else (and how) to
 watch out for this.

 I've posted this to the R Help mailing list
 http://news.gmane.org/gmane.comp.lang.r.general and to stackoverflow
 at
 http://stackoverflow.com/questions/29759928/how-numerical-data-is-stored-inside-ts-time-series-objects.
