Re: [R] How numerical data is stored inside ts time series objects
William Dunlap tibco.com> writes:
> I think we can call this a bug in stl().

Using what I learned from the responses to this thread, I looked at the
code for stl(). As they say at Microsoft, "this is expected behaviour"
according to the code. And it doesn't look like an inadvertent coding
oversight.

---

Martin Maechler lynne.stat.math.ethz.ch> writes:
>> Paul gmail.com> Interesting that a 2D matrix
>> of size Nx1 is treated as a different animal from a length N
>> vector. It's a departure from math convention, and from what I'm
>> accustomed to in Matlab.
>
> The vector space |R^n is not at all the same space as the space
> |R^{n x 1} even though of course there's a trivial mapping between
> the objects (and the metrics) of the two. A vector *is NOT* a
> matrix -- but in some matrix calculus notations there is a
> convention to *treat* n-vectors as (n x 1) matrices.
>
> Good linear algebra teaching does distinguish vectors from
> one-column or one-row matrices -- I'm sure still the case in all
> good math departments around the globe -- but maybe not in math
> teaching to engineers and others who only need applied math. Yes,
> linear algebra teaching will also make a point that in the usual
> matrix product notations, it is convenient and useful to treat
> vectors as if they were 1-column matrices.

The distinction in math is new to me, with academic training in
engineering, even at the post grad level. I haven't seen the
distinction in the math for Comp. Sci., either, and that's in the meat
grinder of Canada. Admittedly, it's not quite as geeky as some meat
grinders in other countries. And admittedly, I only took C.S. courses
that were geared to applications. So I had always considered such a
distinction to be a practicality in the coding implementation of
vector/matrix classes, e.g., in C, a vector being a single pointer to a
number, while a 2D array is a pointer to a vector and hence a different
type.
>> R's vector seems more akin to a list, where the notion of
>> orientation doesn't apply.
>
> Sorry, but again: not at all in the sense 'list's are used in R.

No need to apologize. To clarify, being new to R, I was referring to
the general use of the term "list". Specifically, I was referring to an
ordered collection without orientation, so it is consistent with what
you say above about distinguishing between length N vectors vs. 2D
matrices of size Nx1 or 1xN.

> Fortunately, well thought out languages such as S, R, Julia, Python,
> all do make a good distinction between vectors and matrices i.e. 1D
> and 2D arrays. If Matlab still does not do that, it's just another
> sign that Matlab users should flee and start using julia or R or
> python.

Matlab pretty well only deals with 2D arrays, some of which have size
Nx1 or 1xN. I haven't seen an example of a 1-D data structure that
doesn't have an orientation, implied or otherwise. Though of course, if
someone proves me wrong, then I stand corrected (and smarter because of
it).

> {and well yes, we could start bitchering about S' and hence R's
> distinction between a 1D array and a vector ... which I think has
> been a clear design error... but that's not the topic here}

Big fan of python's readability, though I've only dabbled. And I won't
start bitchering about R & S cuz I'm a newcomer and it's all an eye
popping wonderland.

---

David R Forrest vims.edu> writes:
> The details of how str() represents your x and y variables is within
> the utils::stl.default() function. You can hunt this down and see

I'm assuming that you meant utils:::str.default() above. The rest of
your post makes sense if I make that assumption.

I snipped the majority of your response because I'm not responding to
anything specific. However, it was an extremely educational post. Thank
you for that.
> Also, Matlab sometimes needs a squeeze() to drop degenerate
> dimensions, and R's drop() is similar, and is less-black-magic
> looking than the [[1]] code:
>
>   > str(drop(x))
>    Time-Series [1:36] from 1 to 3.92: 464 675 703 887 1139 1077 1318
>    1260 1120 963 ...
>   > str(drop(y))
>    Time-Series [1:36] from 1 to 3.92: 464 675 703 887 1139 1077 1318
>    1260 1120 963 ...
>
>   stl(drop(x),s.window='per')
>   stl(drop(y),s.window='per')
>
> Maybe str.default() should do Time-Series interpretation of is.ts()
> objects for matrices as well as vectors.

I'm assuming that you mean stl(), since str() already works on both?
Maybe it's the version I have, but I find that the R code for stl()
doesn't have a section for is.ts(). Instead, it seems to run through a
series of checks for pathological input, with the check for matrix data
consisting of is.matrix(na.action(as.ts(x))), where x is the time
series. Somehow, the fact that na.action() applied to the time series
argument returns a matrix implies that the time series data is a matrix
rather than a vector. In attempting t
Re: [R] How numerical data is stored inside ts time series objects
> On Apr 21, 2015, at 9:39 PM, Paul wrote:
...
> I rummaged around the help files for str, summary, dput, args. This
> seems like a more complicated language than Matlab, VBA, or even C++'s
> STL of old (which was pretty thoroughly documented). A function like
> str() returns an object description, and I'm guessing the conventions
> with which the object is described depends a lot on the person who
> wrote the handling code for the class. The description for the
> variable y seems particularly elaborate.
>
> Would I be right in assuming that the notation is ad-hoc and not
> documented? For example, the two invocations str(x) and str(y) show a
> Time-Series and a ts. And there are many lines of output for str(y)
> that is heavy in punctuation.
>

The details of how str() represents your x and y variables is within
the utils::stl.default() function. You can hunt this down and see the
code with:

  methods(class=class(x))   # Find the class-specific handlers -- no str()
  methods(str)              # Find the methods for the generic
  getAnywhere(str.default)  # or getFromNamespace('str.default','utils')

Within the utils::str.default code, this 'Time-Series' specific code
only triggers if the object doesn't match a long list of other items
(for example: is.function(), is.list(), is.vector(object) ||
(is.array(object) && is.atomic(object)) ...)

  else if (stats::is.ts(object)) {
      tsp.a <- stats::tsp(object)
      str1 <- paste0(" Time-Series ", le.str, " from ",
                     format(tsp.a[1L]), " to ", format(tsp.a[2L]), ":")
      std.attr <- c("tsp", "class")
  }

This handling is not dependent on who wrote the ts class, but on who
wrote the str.default function.
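The branch order described above can be probed directly by evaluating the same predicates on the two objects. This is only a sketch: it reuses the Data, x and y objects from William Dunlap's example elsewhere in this thread, it assumes the branch order quoted above from str.default, and the text that str() actually prints may differ between R versions.

```r
# Objects from William Dunlap's example in this thread
Data <- data.frame(theData = round(sin(1:38), 1))
x <- ts(Data[[1]], frequency = 12)   # data stored as a plain vector
y <- ts(Data, frequency = 12)        # data stored as a 38 x 1 matrix

# x carries tsp and class attributes, so is.vector() is FALSE, and it has
# no dim attribute, so is.array() is FALSE: it can reach the is.ts() branch
x_checks <- c(vector = is.vector(x),
              array  = is.array(x) && is.atomic(x),
              ts     = is.ts(x))

# y has a dim attribute, so the earlier array branch already matches
y_checks <- c(vector = is.vector(y),
              array  = is.array(y) && is.atomic(y),
              ts     = is.ts(y))

print(x_checks)
print(y_checks)
```

Under these assumptions, both objects pass is.ts(), but only y also passes the array test that is checked earlier, which would explain why the two descriptions start differently.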
A more explicit way to look at the difference without the str()
summarization is with dput(x) and dput(y):

  > dput(x)
  structure(c(464L, 675L, 703L, 887L, 1139L, 1077L, 1318L, 1260L,
  1120L, 963L, 996L, 960L, 530L, 883L, 894L, 1045L, 1199L, 1287L,
  1565L, 1577L, 1076L, 918L, 1008L, 1063L, 544L, 635L, 804L, 980L,
  1018L, 1064L, 1404L, 1286L, 1104L, 999L, 996L, 1015L),
  .Tsp = c(1, 3.916667, 12), class = "ts")
  > dput(y)
  structure(c(464L, 675L, 703L, 887L, 1139L, 1077L, 1318L, 1260L,
  1120L, 963L, 996L, 960L, 530L, 883L, 894L, 1045L, 1199L, 1287L,
  1565L, 1577L, 1076L, 918L, 1008L, 1063L, 544L, 635L, 804L, 980L,
  1018L, 1064L, 1404L, 1286L, 1104L, 999L, 996L, 1015L),
  .Dim = c(36L, 1L), .Dimnames = list(NULL, "V1"),
  .Tsp = c(1, 3.916667, 12), class = "ts")

Also, Matlab sometimes needs a squeeze() to drop degenerate dimensions,
and R's drop() is similar, and is less-black-magic looking than the
[[1]] code:

  > str(drop(x))
   Time-Series [1:36] from 1 to 3.92: 464 675 703 887 1139 1077 1318 1260 1120 963 ...
  > str(drop(y))
   Time-Series [1:36] from 1 to 3.92: 464 675 703 887 1139 1077 1318 1260 1120 963 ...

  stl(drop(x),s.window='per')
  stl(drop(y),s.window='per')

Maybe str.default() should do Time-Series interpretation of is.ts()
objects for matrices as well as vectors.

Dave

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] How numerical data is stored inside ts time series objects
> Paul > on Wed, 22 Apr 2015 01:39:16 + writes:

> William Dunlap tibco.com> writes:
>> Use the str() function to see the internal structure of most
>> objects. In your case it would show something like:
>>
>> > Data <- data.frame(theData=round(sin(1:38),1))
>> > x <- ts(Data[[1]], frequency=12) # or Data[,1]
>> > y <- ts(Data, frequency=12)
>> > str(x)
>>  Time-Series [1:38] from 1 to 4.08: 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
>> > str(y)
>>  ts [1:38, 1] 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
>>  - attr(*, "dimnames")=List of 2
>>   ..$ : NULL
>>   ..$ : chr "theData"
>>  - attr(*, "tsp")= num [1:3] 1 4.08 12
>>
>> 'x' contains a vector of data and 'y' contains a 1-column matrix of
>> data. stl(x,"per") and stl(y, "per") give similar results as you
>> got.
>>
>> Evidently, stl() does not know that 1-column matrices can be treated
>> much the same as vectors and gives an error message. Thus you must
>> extract the one column into a vector: stl(y[,1], "per").

> Thanks, William.

> Interesting that a 2D matrix of size Nx1 is treated as a different
> animal from a length N vector. It's a departure from math convention,
> and from what I'm accustomed to in Matlab.

Ha -- Not at all!
The above is exactly the misconception I have been fighting -- mostly
in vain -- for years.

Matlab's convention of treating a vector as an N x 1 matrix is a BIG
confusion to much of math teaching:

The vector space |R^n is not at all the same space as the space
|R^{n x 1} even though of course there's a trivial mapping between the
objects (and the metrics) of the two. A vector *is NOT* a matrix -- but
in some matrix calculus notations there is a convention to *treat*
n-vectors as (n x 1) matrices.

Good linear algebra teaching does distinguish vectors from one-column
or one-row matrices -- I'm sure still the case in all good math
departments around the globe -- but maybe not in math teaching to
engineers and others who only need applied math.
Yes, linear algebra teaching will also make a point that in the usual
matrix product notations, it is convenient and useful to treat vectors
as if they were 1-column matrices.

> That R's vector seems
> more akin to a list, where the notion of orientation doesn't apply.

Sorry, but again: not at all in the sense 'list's are used in R.

Fortunately, well thought out languages such as S, R, Julia, Python,
all do make a good distinction between vectors and matrices i.e. 1D and
2D arrays. If Matlab still does not do that, it's just another sign
that Matlab users should flee and start using julia or R or python.

{and well yes, we could start bitchering about S' and hence R's
distinction between a 1D array and a vector ... which I think has been
a clear design error... but that's not the topic here}
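The distinctions discussed above can be checked in a few lines of base R. This is a minimal sketch with made-up values (v, m and a are illustrative names, not objects from the thread); it shows a plain vector, a one-column matrix and a 1-d array side by side.

```r
v <- c(1, 2, 3)             # plain vector: no dim attribute
m <- matrix(v, ncol = 1)    # 3 x 1 matrix: dim attribute of length 2
a <- array(v, dim = 3)      # 1-d array: dim attribute of length 1

dim(v)         # NULL
dim(m)         # 3 1
dim(a)         # 3

is.matrix(v)   # FALSE
is.matrix(m)   # TRUE
is.matrix(a)   # FALSE: a 1-d array is not a matrix
is.vector(a)   # FALSE: the dim attribute disqualifies it

# Matrix notation treats a vector as a column: t() returns a 1 x 3 matrix
dim(t(v))      # 1 3

# The trivial mapping back to a plain vector
identical(drop(m), v)        # TRUE
identical(as.vector(a), v)   # TRUE
```

The last two lines are the "trivial mapping" in practice: the numbers are the same, and only the dim attribute separates the three objects.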
Re: [R] How numerical data is stored inside ts time series objects
> Interesting that a 2D matrix of size Nx1 is treated as a different
> animal from a length N vector.

I think we can call this a bug in stl().

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Apr 21, 2015 at 6:39 PM, Paul wrote:

> William Dunlap tibco.com> writes:
> > Use the str() function to see the internal structure of most
> > objects. In your case it would show something like:
> >
> > > Data <- data.frame(theData=round(sin(1:38),1))
> > > x <- ts(Data[[1]], frequency=12) # or Data[,1]
> > > y <- ts(Data, frequency=12)
> > > str(x)
> >  Time-Series [1:38] from 1 to 4.08: 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
> > > str(y)
> >  ts [1:38, 1] 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
> >  - attr(*, "dimnames")=List of 2
> >   ..$ : NULL
> >   ..$ : chr "theData"
> >  - attr(*, "tsp")= num [1:3] 1 4.08 12
> >
> > 'x' contains a vector of data and 'y' contains a 1-column matrix of
> > data. stl(x,"per") and stl(y, "per") give similar results as you
> > got.
> >
> > Evidently, stl() does not know that 1-column matrices can be treated
> > much the same as vectors and gives an error message. Thus you must
> > extract the one column into a vector: stl(y[,1], "per").
>
> Thanks, William.
>
> Interesting that a 2D matrix of size Nx1 is treated as a different
> animal from a length N vector. It's a departure from math convention,
> and from what I'm accustomed to in Matlab. R's vector seems
> more akin to a list, where the notion of orientation doesn't apply.
>
> I rummaged around the help files for str, summary, dput, args. This
> seems like a more complicated language than Matlab, VBA, or even C++'s
> STL of old (which was pretty thoroughly documented). A function like
> str() returns an object description, and I'm guessing the conventions
> with which the object is described depends a lot on the person who
> wrote the handling code for the class. The description for the
> variable y seems particularly elaborate.
>
> Would I be right in assuming that the notation is ad-hoc and not
> documented? For example, the two invocations str(x) and str(y) show a
> Time-Series and a ts. And there are many lines of output for str(y)
> that is heavy in punctuation.
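The behaviour under discussion can be reproduced, and worked around, with a small wrapper. This is a sketch only: stl_univariate is a hypothetical helper that is not part of R, and the example assumes, as reported in this thread, that stl() rejects any input for which is.matrix() is TRUE. The outcome of the direct call is captured with tryCatch() rather than assumed, since other R versions may behave differently.

```r
# Objects from William Dunlap's example in this thread
Data <- data.frame(theData = round(sin(1:38), 1))
y <- ts(Data, frequency = 12)     # 38 x 1 matrix inside a ts object

is.matrix(y)          # TRUE: this is what the univariate check reacts to
is.matrix(drop(y))    # FALSE
is.matrix(y[, 1])     # FALSE

# Record what happens with the matrix form, whatever this R version does
matrix_result <- tryCatch(stl(y, "per"),
                          error = function(e) conditionMessage(e))

# Hypothetical helper (not part of R): drop a degenerate single column,
# then hand the series to stl()
stl_univariate <- function(series, ...) {
  if (is.matrix(series) && ncol(series) == 1L) series <- drop(series)
  stl(series, ...)
}

fit <- stl_univariate(y, s.window = "per")
class(fit)   # "stl"
```

The wrapper leaves genuinely multivariate input untouched, so stl() can still reject it; only the one-column case is converted.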
Re: [R] How numerical data is stored inside ts time series objects
William Dunlap tibco.com> writes:
> Use the str() function to see the internal structure of most
> objects. In your case it would show something like:
>
> > Data <- data.frame(theData=round(sin(1:38),1))
> > x <- ts(Data[[1]], frequency=12) # or Data[,1]
> > y <- ts(Data, frequency=12)
> > str(x)
>  Time-Series [1:38] from 1 to 4.08: 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
> > str(y)
>  ts [1:38, 1] 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : NULL
>   ..$ : chr "theData"
>  - attr(*, "tsp")= num [1:3] 1 4.08 12
>
> 'x' contains a vector of data and 'y' contains a 1-column matrix of
> data. stl(x,"per") and stl(y, "per") give similar results as you
> got.
>
> Evidently, stl() does not know that 1-column matrices can be treated
> much the same as vectors and gives an error message. Thus you must
> extract the one column into a vector: stl(y[,1], "per").

Thanks, William.

Interesting that a 2D matrix of size Nx1 is treated as a different
animal from a length N vector. It's a departure from math convention,
and from what I'm accustomed to in Matlab. R's vector seems more akin
to a list, where the notion of orientation doesn't apply.

I rummaged around the help files for str, summary, dput, args. This
seems like a more complicated language than Matlab, VBA, or even C++'s
STL of old (which was pretty thoroughly documented). A function like
str() returns an object description, and I'm guessing the conventions
with which the object is described depend a lot on the person who
wrote the handling code for the class. The description for the
variable y seems particularly elaborate.

Would I be right in assuming that the notation is ad-hoc and not
documented? For example, the two invocations str(x) and str(y) show a
Time-Series and a ts. And there are many lines of output for str(y)
that are heavy in punctuation.
Re: [R] How numerical data is stored inside ts time series objects
Use the str() function to see the internal structure of most objects.
In your case it would show something like:

  > Data <- data.frame(theData=round(sin(1:38),1))
  > x <- ts(Data[[1]], frequency=12) # or Data[,1]
  > y <- ts(Data, frequency=12)
  > str(x)
   Time-Series [1:38] from 1 to 4.08: 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
  > str(y)
   ts [1:38, 1] 0.8 0.9 0.1 -0.8 -1 -0.3 0.7 1 0.4 -0.5 ...
   - attr(*, "dimnames")=List of 2
    ..$ : NULL
    ..$ : chr "theData"
   - attr(*, "tsp")= num [1:3] 1 4.08 12

'x' contains a vector of data and 'y' contains a 1-column matrix of
data. stl(x,"per") and stl(y, "per") give similar results as you got.

Evidently, stl() does not know that 1-column matrices can be treated
much the same as vectors and gives an error message. Thus you must
extract the one column into a vector: stl(y[,1], "per").

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Apr 20, 2015 at 4:04 PM, Paul wrote:

> I'm getting familiar with the stl function in the stats package by
> trying it on an example from Brockwell & Davis's 2002 "Introduction to
> Time Series and Forecasting". Specifically, I'm using a subset of
> their red wine sales data. It's a detour from the stl material at
> http://www.stat.pitt.edu/stoffer/tsa3/R_toot.htm (at some point, I
> have to stop simply following and try to make it work with new data).
>
> I need a minimum of 36 wine sales data points in the series, since stl
> otherwise complains about the data being less than 2 cycles.
> The data is in ~/tmp/wine.txt:
>
> 464
> 675
> 703
> 887
> 1139
> 1077
> 1318
> 1260
> 1120
> 963
> 996
> 960
> 530
> 883
> 894
> 1045
> 1199
> 1287
> 1565
> 1577
> 1076
> 918
> 1008
> 1063
> 544
> 635
> 804
> 980
> 1018
> 1064
> 1404
> 1286
> 1104
> 999
> 996
> 1015
>
> My sourced test code is buried in a repeat loop so that I can use a
> break command to circumvent the final error-causing statement that I'm
> trying to figure out:
>
> repeat{
>
>   # Clear variables (from stackexchange)
>   rm( list=setdiff( ls( all.names=TRUE ), lsf.str(all.names=TRUE ) ) )
>   ls()
>
>   head( wine <- read.table("~/tmp/wine.txt") )
>   ( x <- ts(wine[[1]],frequency=12) )
>   ( y <- ts(wine,frequency=12) )
>   ( a=stl(x,"per") )
>   #break
>   ( b=stl(y,"per") )
> }
>
> The final statement causes the error 'Error in stl(y, "per") : only
> univariate series are allowed'. I found an explanation at
> http://stackoverflow.com/questions/10492155/time-series-and-stl-in-r-error-only-univariate-series-are-allowed.
> That's how I came up with the assignment to x using wine[[1]]. I
> found an explanation of the need for double square brackets at
> http://www.r-tutor.com/r-introduction/list/named-list-members.
>
> My problem is that it's not very clear what is happening inside the ts
> structures x and y. If I simply print them, they look 100% identical:
>
> | > x
> |    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
> | 1  464  675  703  887 1139 1077 1318 1260 1120  963  996  960
> | 2  530  883  894 1045 1199 1287 1565 1577 1076  918 1008 1063
> | 3  544  635  804  980 1018 1064 1404 1286 1104  999  996 1015
> | > y
> |    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
> | 1  464  675  703  887 1139 1077 1318 1260 1120  963  996  960
> | 2  530  883  894 1045 1199 1287 1565 1577 1076  918 1008 1063
> | 3  544  635  804  980 1018 1064 1404 1286 1104  999  996 1015
>
> Whatever their differences, it's not causing R to misinterpret the
> data; that is, they each look like a single series of numerical data.
>
> Can anyone illuminate the difference in the data inside the ts data
> structures? The potential incompatibility with stl is just one
> symptom. Right now, the "solution" is black magic to me, and I would
> like to get a clearer picture so that I know when else (and how) to
> watch out for this.
>
> I've posted this to the R Help mailing list
> http://news.gmane.org/gmane.comp.lang.r.general and to stackoverflow
> at
> http://stackoverflow.com/questions/29759928/how-numerical-data-is-stored-inside-ts-time-series-objects.
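The difference asked about here can be made visible by comparing the attributes of the two objects rather than their printed form. The following is a sketch under one assumption: the wine values are typed into a data frame with a column named V1 (the default name read.table would assign) instead of being read from ~/tmp/wine.txt, so that the example is self-contained.

```r
# Wine sales values from the original post, held in a one-column data frame
wine <- data.frame(V1 = c(
  464L, 675L, 703L, 887L, 1139L, 1077L, 1318L, 1260L, 1120L, 963L, 996L, 960L,
  530L, 883L, 894L, 1045L, 1199L, 1287L, 1565L, 1577L, 1076L, 918L, 1008L, 1063L,
  544L, 635L, 804L, 980L, 1018L, 1064L, 1404L, 1286L, 1104L, 999L, 996L, 1015L))

x <- ts(wine[[1]], frequency = 12)   # built from the column (a vector)
y <- ts(wine, frequency = 12)        # built from the whole data frame

# Same numbers and same time axis
identical(as.vector(x), as.vector(y))   # TRUE
identical(tsp(x), tsp(y))               # TRUE

# The difference is in the attributes attached to the data
names(attributes(x))
names(attributes(y))
extra <- setdiff(names(attributes(y)), names(attributes(x)))
extra

dim(x)   # NULL
dim(y)   # 36 1
```

The attributes listed in extra are the ones that make y a one-column matrix, which is the form that the univariate check in stl() objects to.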