Re: [Rd] Wrong length of POSIXt vectors (PR#10507)
Jeffrey J. Hallman wrote:
> Duncan Murdoch <[EMAIL PROTECTED]> writes:
>
>> One reason I don't want to work on this is because the appropriate
>> action depends on what "length(x)" is intended to mean. Currently for
>> POSIXlt objects, it gives the physical length of the underlying basic
>> type (the list). This is the same behaviour as we have for matrices,
>> data frames and every other object without a specific length method, so
>> it's not outrageous.
>>
>> The proposed change is to have it return the logical length of the
>> object, which also seems quite reasonable. I don't think matrices and
>> data frames have a "logical length", so there would be no contradiction
>> in those examples. The thing that worries me is that there are probably
>> objects in packages where both logical length and physical length make
>> sense but are different. I don't have any expectation that length(x) on
>> those currently is consistent in which type of value it returns.
>>
>> If we were to decide that "length(x)" *always* meant logical length,
>> then we would have a problem: matrices and data frames don't have a
>> logical length, so we shouldn't be getting an answer there. Changing
>> length(x) for those is not acceptable.
>>
>> On the other hand, if we decide that "length(x)" *always* means physical
>> length, we don't need to do anything to the POSIXlt or matrices or data
>> frames, but there may well be other kinds of objects out there that
>> violate this rule.
>>
>> We could leave the meaning of length(x) ambiguous. If you want to know
>> what it does for a POSIXlt object, you need to read the documentation or
>> look at the source code. As a policy, this isn't particularly
>> appealing, but I could probably live with it if someone else did the
>> research and showed that current usage is ambiguous.
>>
>
> Physical length and logical length are, as you say, two different things. So
> why not two functions? Keep length() for physical length, as it is now, and
> maybe Length() for logical length. The latter could be defined as
>
> Length <- function(x, ...) UseMethod("Length")
>
> Length.default <- function(x, ...) length(x)
>
> and then add methods for classes that want something else.
>

A very reasonable suggestion, but I'd also put this in the "next time we
design a language" category.

The current system in R seems workable to me, if one knows that vector-like
classes that have an S3 list-based implementation need to have methods
defined for 'c', 'length', '[', etc., and that if these methods aren't
defined, then you'll be operating on the underlying list structure. Where
these methods are defined, one can get at the underlying structure by
unclassing first, and that's OK.

However, classes that have some of these methods defined but not others seem
to me to be needlessly confusing -- it's not as if there is any great benefit
in length() always returning the length of the underlying list for POSIXlt --
if there were a length() method, one could get at the underlying length using
length(unclass(x)). It just seems like a design oversight that makes using
such classes unnecessarily difficult and error-prone.

Hence my proposal (in a new thread) for coding & documentation guidelines
that would:

(1) suggest consistency is a good thing
(2) suggest compliance or deviation should be documented
(3) define what consistency is (and here it's not so important to get
    absolutely the right set of consistency definitions as it is to get a
    reasonable set that people agree on.)

-- Tony Plate

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
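To make the point about list-based S3 classes concrete, here is a minimal sketch. The class "pairvec" and its constructor are hypothetical and do not come from the thread; they only illustrate how defining 'length', '[' and 'c' together gives a consistent logical length, while length(unclass(x)) still exposes the physical length of the underlying list.

```r
# Hypothetical list-based S3 class "pairvec" (an assumption for illustration):
# each logical element is an (a, b) pair, stored as two parallel vectors.
pairvec <- function(a, b) {
  stopifnot(length(a) == length(b))
  structure(list(a = a, b = b), class = "pairvec")
}

# Logical length: number of pairs, not the number of list components.
length.pairvec <- function(x) length(unclass(x)$a)

# Subsetting works on every component in parallel.
`[.pairvec` <- function(x, i) {
  u <- unclass(x)
  pairvec(u$a[i], u$b[i])
}

# Concatenation joins the components of all arguments.
c.pairvec <- function(...) {
  parts <- lapply(list(...), unclass)
  pairvec(unlist(lapply(parts, `[[`, "a")),
          unlist(lapply(parts, `[[`, "b")))
}

x <- pairvec(1:3, c("p", "q", "r"))
length(x)             # 3: logical length via the method
length(unclass(x))    # 2: physical length of the underlying list
length(c(x, x[2:3]))  # 5: 'c', '[' and 'length' agree with each other
```

With all three methods in place, the underlying structure remains reachable through unclass(), which is the trade-off described in the message above.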
Re: [Rd] Wrong length of POSIXt vectors (PR#10507)
Duncan Murdoch <[EMAIL PROTECTED]> writes:

> One reason I don't want to work on this is because the appropriate
> action depends on what "length(x)" is intended to mean. Currently for
> POSIXlt objects, it gives the physical length of the underlying basic
> type (the list). This is the same behaviour as we have for matrices,
> data frames and every other object without a specific length method, so
> it's not outrageous.
>
> The proposed change is to have it return the logical length of the
> object, which also seems quite reasonable. I don't think matrices and
> data frames have a "logical length", so there would be no contradiction
> in those examples. The thing that worries me is that there are probably
> objects in packages where both logical length and physical length make
> sense but are different. I don't have any expectation that length(x) on
> those currently is consistent in which type of value it returns.
>
> If we were to decide that "length(x)" *always* meant logical length,
> then we would have a problem: matrices and data frames don't have a
> logical length, so we shouldn't be getting an answer there. Changing
> length(x) for those is not acceptable.
>
> On the other hand, if we decide that "length(x)" *always* means physical
> length, we don't need to do anything to the POSIXlt or matrices or data
> frames, but there may well be other kinds of objects out there that
> violate this rule.
>
> We could leave the meaning of length(x) ambiguous. If you want to know
> what it does for a POSIXlt object, you need to read the documentation or
> look at the source code. As a policy, this isn't particularly
> appealing, but I could probably live with it if someone else did the
> research and showed that current usage is ambiguous.

Physical length and logical length are, as you say, two different things. So
why not two functions? Keep length() for physical length, as it is now, and
maybe Length() for logical length. The latter could be defined as

    Length <- function(x, ...) UseMethod("Length")

    Length.default <- function(x, ...) length(x)

and then add methods for classes that want something else.

-- Jeff
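As a rough sketch of how the two-function idea might look in use: the generic and default below are as given in the message, while the Length.POSIXlt method is an assumption added for illustration (the thread does not define it). It takes the logical length to be the longest component of the underlying list.

```r
# Generic and default, as proposed in the message above.
Length <- function(x, ...) UseMethod("Length")
Length.default <- function(x, ...) length(x)

# Assumed method (not from the thread): the logical length of a POSIXlt
# object is taken as the length of its longest underlying component.
Length.POSIXlt <- function(x, ...) max(lengths(unclass(x)))

tv <- as.POSIXlt(c("2005-06-13 12:01:50",
                   "2005-06-13 12:10:00",
                   "2005-06-13 12:11:55"), tz = "UTC")

Length(tv)          # 3 time points
length(unclass(tv)) # number of list components; depends on the R version
Length(1:5)         # 5: falls through to the default, i.e. length()
```

Using unclass() inside the method keeps the sketch independent of whatever length() itself returns for POSIXlt in a given R version.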
Re: [Rd] Wrong length of POSIXt vectors (PR#10507)
Duncan Murdoch wrote: > On 15/12/2007 5:17 PM, Martin Maechler wrote: >>> "TP" == Tony Plate <[EMAIL PROTECTED]> >>> on Fri, 14 Dec 2007 13:58:30 -0700 writes: >> TP> Duncan Murdoch wrote: >> >> On 12/13/2007 1:59 PM, Tony Plate wrote: >> >>> Duncan Murdoch wrote: >> On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote: >> > Full_Name: Petr Simecek >> > Version: 2.5.1, 2.6.1 >> > OS: Windows XP >> > Submission from: (NULL) (195.113.231.2) >> > >> > >> > Several times I have experienced that a length of a POSIXt vector >> > has not been >> > computed right. >> > >> > Example: >> > >> > tv<-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31 >> > ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L >> > ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L), >> > mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L), >> mon >> > = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L, >> > 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = >> > c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, >> 163L, >> > 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L, >> > 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", >> "min", >> > "hour", "mday", "mon", "year", "wday", "yday", "isdst" >> > ), class = c("POSIXt", "POSIXlt")) >> > >> > print(tv) >> > # print 11 time points (right) >> > >> > length(tv) >> > # returns 9 (wrong) >> >> tv is a list of length 9. The answer is right, your expectation is >> wrong. >> > I have tried that on several computers with/without switching to >> > English >> > locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched a >> > help pages but I >> > cannot imagine how that could be OK. >> >> See this in ?POSIXt: >> >> Class '"POSIXlt"' is a named list of vectors... 
>> >> You could define your own length measurement as >> >> length.POSIXlt <- function(x) length(x$sec) >> >> and you'll get the answer you expect, but be aware that length.XXX >> methods are quite rare, and you may surprise some of your users. >> >> >>> >> >>> On the other hand, isn't the fact that length() currently always >> >>> returns 9 for POSIXlt objects likely to be a surprise to many users >> >>> of POSIXlt? >> >>> >> >>> The back of "The New S Language" says "Easy-to-use facilities allow >> >>> you to organize, store and retrieve all sorts of data. ... S >> >>> functions and data organization make applications easy to write." >> >>> >> >>> Now, POSIXlt has methods for c() and vector subsetting "[" (and many >> >>> other vector-manipulation methods - see methods(class="POSIXlt")). >> >>> Hence, from the point of view of intending to supply "easy-to-use >> >>> facilities ... [for] all sorts of data", isn't it a little >> >>> incongruous that length() is not also provided -- as 3 functions >> (any >> >>> others?) comprise a core set of vector-manipulation functions? >> >>> >> >>> Would it make sense to have an informal prescription (e.g., in >> >>> R-exts) that a class that implements a vector-like object and >> >>> provides at least of one of functions 'c', '[' and 'length' should >> >>> provide all three? 
It would also be easy to describe a test-suite >> >>> that should be included in the 'test' directory of a package >> >>> implementing such a class, that had some tests of the basic >> >>> vector-manipulation functionality, such as: >> >>> >> >>> > # at this point, x0, x1, x3, & x10 should exist, as vectors of the >> >>> > # class being tested, of length 0, 1, 3, and 10, and they should >> >>> > # contain no duplicate elements >> >>> > length(x0) >> >>> [1] 1 >> >>> > length(c(x0, x1)) >> >>> [1] 2 >> >>> > length(c(x1,x10)) >> >>> [1] 11 >> >>> > all(x3 == x3[seq(len=length(x3))]) >> >>> [1] TRUE >> >>> > all(x3 == c(x3[1], x3[2], x3[3])) >> >>> [1] TRUE >> >>> > length(c(x3[2], x10[5:7])) >> >>> [1] 4 >> >>> > >> >>> >> >>> It would also be possible to describe a larger set of vector >> >>> manipulation functions that should be implemented together, >> including >> >>> e.g., 'rep', 'unique', 'duplicated', '==', 'sort', '[<-', 'is.na', >> >>> head, tail ... (many of which are provided for POSIXlt). >> >>> >> >>> Or is there some good reason that length() cannot be provided (while >> >>> 'c' and '[' can) for some vector-like classes such as
Re: [Rd] Wrong length of POSIXt vectors (PR#10507)
On 15/12/2007 5:17 PM, Martin Maechler wrote: >> "TP" == Tony Plate <[EMAIL PROTECTED]> >> on Fri, 14 Dec 2007 13:58:30 -0700 writes: > > TP> Duncan Murdoch wrote: > >> On 12/13/2007 1:59 PM, Tony Plate wrote: > >>> Duncan Murdoch wrote: > On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote: > > Full_Name: Petr Simecek > > Version: 2.5.1, 2.6.1 > > OS: Windows XP > > Submission from: (NULL) (195.113.231.2) > > > > > > Several times I have experienced that a length of a POSIXt vector > > has not been > > computed right. > > > > Example: > > > > tv<-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31 > > ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L > > ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L), > > mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L), mon > > = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L, > > 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = > > c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L, > > 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L, > > 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", "min", > > "hour", "mday", "mon", "year", "wday", "yday", "isdst" > > ), class = c("POSIXt", "POSIXlt")) > > > > print(tv) > > # print 11 time points (right) > > > > length(tv) > > # returns 9 (wrong) > > tv is a list of length 9. The answer is right, your expectation is > wrong. > > I have tried that on several computers with/without switching to > > English > > locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched a > > help pages but I > > cannot imagine how that could be OK. > > See this in ?POSIXt: > > Class '"POSIXlt"' is a named list of vectors... > > You could define your own length measurement as > > length.POSIXlt <- function(x) length(x$sec) > > and you'll get the answer you expect, but be aware that length.XXX > methods are quite rare, and you may surprise some of your users. 
> > >>> > >>> On the other hand, isn't the fact that length() currently always > >>> returns 9 for POSIXlt objects likely to be a surprise to many users > >>> of POSIXlt? > >>> > >>> The back of "The New S Language" says "Easy-to-use facilities allow > >>> you to organize, store and retrieve all sorts of data. ... S > >>> functions and data organization make applications easy to write." > >>> > >>> Now, POSIXlt has methods for c() and vector subsetting "[" (and many > >>> other vector-manipulation methods - see methods(class="POSIXlt")). > >>> Hence, from the point of view of intending to supply "easy-to-use > >>> facilities ... [for] all sorts of data", isn't it a little > >>> incongruous that length() is not also provided -- as 3 functions (any > >>> others?) comprise a core set of vector-manipulation functions? > >>> > >>> Would it make sense to have an informal prescription (e.g., in > >>> R-exts) that a class that implements a vector-like object and > >>> provides at least of one of functions 'c', '[' and 'length' should > >>> provide all three? 
It would also be easy to describe a test-suite > >>> that should be included in the 'test' directory of a package > >>> implementing such a class, that had some tests of the basic > >>> vector-manipulation functionality, such as: > >>> > >>> > # at this point, x0, x1, x3, & x10 should exist, as vectors of the > >>> > # class being tested, of length 0, 1, 3, and 10, and they should > >>> > # contain no duplicate elements > >>> > length(x0) > >>> [1] 1 > >>> > length(c(x0, x1)) > >>> [1] 2 > >>> > length(c(x1,x10)) > >>> [1] 11 > >>> > all(x3 == x3[seq(len=length(x3))]) > >>> [1] TRUE > >>> > all(x3 == c(x3[1], x3[2], x3[3])) > >>> [1] TRUE > >>> > length(c(x3[2], x10[5:7])) > >>> [1] 4 > >>> > > >>> > >>> It would also be possible to describe a larger set of vector > >>> manipulation functions that should be implemented together, including > >>> e.g., 'rep', 'unique', 'duplicated', '==', 'sort', '[<-', 'is.na', > >>> head, tail ... (many of which are provided for POSIXlt). > >>> > >>> Or is there some good reason that length() cannot be provided (while > >>> 'c' and '[' can) for some vector-like classes such as "POSIXlt"? > >> > >> What you say sounds good in general, but the devil is in the details. > >> Changing the meaning of length(x)
Re: [Rd] Wrong length of POSIXt vectors (PR#10507)
If it were simply deprecated and then changed, then everyone using it would get a warning during the period of deprecation, so it would not be so bad. Given that its current behavior is not very useful, I suspect it's not widely used anyway. I haven't followed the whole discussion, so sorry if these points have already been made.

On Dec 15, 2007 5:17 PM, Martin Maechler <[EMAIL PROTECTED]> wrote: > > "TP" == Tony Plate <[EMAIL PROTECTED]> > > on Fri, 14 Dec 2007 13:58:30 -0700 writes: > > >TP> Duncan Murdoch wrote: >>> On 12/13/2007 1:59 PM, Tony Plate wrote: >>>> Duncan Murdoch wrote: > On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote: >> Full_Name: Petr Simecek >> Version: 2.5.1, 2.6.1 >> OS: Windows XP >> Submission from: (NULL) (195.113.231.2) >> >> >> Several times I have experienced that a length of a POSIXt vector >> has not been >> computed right. >> >> Example: >> >> tv<-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31 >> ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L >> ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L), >> mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L), mon >> = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L, >> 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = >> c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L, >> 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L, >> 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", "min", >> "hour", "mday", "mon", "year", "wday", "yday", "isdst" >> ), class = c("POSIXt", "POSIXlt")) >> >> print(tv) >> # print 11 time points (right) >> >> length(tv) >> # returns 9 (wrong) > > tv is a list of length 9. The answer is right, your expectation is > wrong. >> I have tried that on several computers with/without switching to >> English >> locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched a >> help pages but I >> cannot imagine how that could be OK. 
> > See this in ?POSIXt: > > Class '"POSIXlt"' is a named list of vectors... > > You could define your own length measurement as > > length.POSIXlt <- function(x) length(x$sec) > > and you'll get the answer you expect, but be aware that length.XXX > methods are quite rare, and you may surprise some of your users. > >>>> >>>> On the other hand, isn't the fact that length() currently always >>>> returns 9 for POSIXlt objects likely to be a surprise to many users >>>> of POSIXlt? >>>> >>>> The back of "The New S Language" says "Easy-to-use facilities allow >>>> you to organize, store and retrieve all sorts of data. ... S >>>> functions and data organization make applications easy to write." >>>> >>>> Now, POSIXlt has methods for c() and vector subsetting "[" (and many >>>> other vector-manipulation methods - see methods(class="POSIXlt")). >>>> Hence, from the point of view of intending to supply "easy-to-use >>>> facilities ... [for] all sorts of data", isn't it a little >>>> incongruous that length() is not also provided -- as 3 functions (any >>>> others?) comprise a core set of vector-manipulation functions? >>>> >>>> Would it make sense to have an informal prescription (e.g., in >>>> R-exts) that a class that implements a vector-like object and >>>> provides at least of one of functions 'c', '[' and 'length' should >>>> provide all three? 
It would also be easy to describe a test-suite >>>> that should be included in the 'test' directory of a package >>>> implementing such a class, that had some tests of the basic >>>> vector-manipulation functionality, such as: >>>> >>>> > # at this point, x0, x1, x3, & x10 should exist, as vectors of the >>>> > # class being tested, of length 0, 1, 3, and 10, and they should >>>> > # contain no duplicate elements >>>> > length(x0) >>>> [1] 1 >>>> > length(c(x0, x1)) >>>> [1] 2 >>>> > length(c(x1,x10)) >>>> [1] 11 >>>> > all(x3 == x3[seq(len=length(x3))]) >>>> [1] TRUE >>>> > all(x3 == c(x3[1], x3[2], x3[3])) >>>> [1] TRUE >>>> > length(c(x3[2], x10[5:7])) >>>> [1] 4 >>>> > >>>> >>>> It would also be possible to describe a larger set of vector >>>> manipulation functions that should be implemented together, including >>>> e.g., 'rep', 'unique', 'duplicated', '==', 'sort', '[<-', 'is.na', >>>> head, tail ... (many of which are provided for POSIXlt). >>>> >>>> Or is there some good reason that length() cannot be provided (while >
Re: [Rd] Wrong length of POSIXt vectors (PR#10507)
> "TP" == Tony Plate <[EMAIL PROTECTED]> > on Fri, 14 Dec 2007 13:58:30 -0700 writes: TP> Duncan Murdoch wrote: >> On 12/13/2007 1:59 PM, Tony Plate wrote: >>> Duncan Murdoch wrote: On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote: > Full_Name: Petr Simecek > Version: 2.5.1, 2.6.1 > OS: Windows XP > Submission from: (NULL) (195.113.231.2) > > > Several times I have experienced that a length of a POSIXt vector > has not been > computed right. > > Example: > > tv<-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31 > ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L > ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L), > mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L), mon > = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L, > 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = > c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L, > 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L, > 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", "min", > "hour", "mday", "mon", "year", "wday", "yday", "isdst" > ), class = c("POSIXt", "POSIXlt")) > > print(tv) > # print 11 time points (right) > > length(tv) > # returns 9 (wrong) tv is a list of length 9. The answer is right, your expectation is wrong. > I have tried that on several computers with/without switching to > English > locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched a > help pages but I > cannot imagine how that could be OK. See this in ?POSIXt: Class '"POSIXlt"' is a named list of vectors... You could define your own length measurement as length.POSIXlt <- function(x) length(x$sec) and you'll get the answer you expect, but be aware that length.XXX methods are quite rare, and you may surprise some of your users. >>> >>> On the other hand, isn't the fact that length() currently always >>> returns 9 for POSIXlt objects likely to be a surprise to many users >>> of POSIXlt? 
>>> >>> The back of "The New S Language" says "Easy-to-use facilities allow >>> you to organize, store and retrieve all sorts of data. ... S >>> functions and data organization make applications easy to write." >>> >>> Now, POSIXlt has methods for c() and vector subsetting "[" (and many >>> other vector-manipulation methods - see methods(class="POSIXlt")). >>> Hence, from the point of view of intending to supply "easy-to-use >>> facilities ... [for] all sorts of data", isn't it a little >>> incongruous that length() is not also provided -- as 3 functions (any >>> others?) comprise a core set of vector-manipulation functions? >>> >>> Would it make sense to have an informal prescription (e.g., in >>> R-exts) that a class that implements a vector-like object and >>> provides at least of one of functions 'c', '[' and 'length' should >>> provide all three? It would also be easy to describe a test-suite >>> that should be included in the 'test' directory of a package >>> implementing such a class, that had some tests of the basic >>> vector-manipulation functionality, such as: >>> >>> > # at this point, x0, x1, x3, & x10 should exist, as vectors of the >>> > # class being tested, of length 0, 1, 3, and 10, and they should >>> > # contain no duplicate elements >>> > length(x0) >>> [1] 1 >>> > length(c(x0, x1)) >>> [1] 2 >>> > length(c(x1,x10)) >>> [1] 11 >>> > all(x3 == x3[seq(len=length(x3))]) >>> [1] TRUE >>> > all(x3 == c(x3[1], x3[2], x3[3])) >>> [1] TRUE >>> > length(c(x3[2], x10[5:7])) >>> [1] 4 >>> > >>> >>> It would also be possible to describe a larger set of vector >>> manipulation functions that should be implemented together, including >>> e.g., 'rep', 'unique', 'duplicated', '==', 'sort', '[<-', 'is.na', >>> head, tail ... (many of which are provided for POSIXlt). >>> >>> Or is there some good reason that length() cannot be provided (while >>> 'c' and '[' can) for some vector-like classes such as "POSIXlt"? 
>> >> What you say sounds good in general, but the devil is in the details. >> Changing the meaning of length(x) for some objects has fairly >> widespread effects. Are they all positive? I don't know. >> >> Adding a prescription like the one you suggest would be good if it's >> easy to implement, but bad if it's already widely violated. How many
Re: [Rd] Wrong length of POSIXt vectors (PR#10507)
Duncan Murdoch wrote: > On 12/13/2007 1:59 PM, Tony Plate wrote: >> Duncan Murdoch wrote: >>> On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote: Full_Name: Petr Simecek Version: 2.5.1, 2.6.1 OS: Windows XP Submission from: (NULL) (195.113.231.2) Several times I have experienced that a length of a POSIXt vector has not been computed right. Example: tv<-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31 ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L), mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L), mon = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L, 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L, 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst" ), class = c("POSIXt", "POSIXlt")) print(tv) # print 11 time points (right) length(tv) # returns 9 (wrong) >>> >>> tv is a list of length 9. The answer is right, your expectation is >>> wrong. I have tried that on several computers with/without switching to English locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched a help pages but I cannot imagine how that could be OK. >>> >>> See this in ?POSIXt: >>> >>> Class '"POSIXlt"' is a named list of vectors... >>> >>> You could define your own length measurement as >>> >>> length.POSIXlt <- function(x) length(x$sec) >>> >>> and you'll get the answer you expect, but be aware that length.XXX >>> methods are quite rare, and you may surprise some of your users. >>> >> >> On the other hand, isn't the fact that length() currently always >> returns 9 for POSIXlt objects likely to be a surprise to many users >> of POSIXlt? 
>> >> The back of "The New S Language" says "Easy-to-use facilities allow >> you to organize, store and retrieve all sorts of data. ... S >> functions and data organization make applications easy to write." >> >> Now, POSIXlt has methods for c() and vector subsetting "[" (and many >> other vector-manipulation methods - see methods(class="POSIXlt")). >> Hence, from the point of view of intending to supply "easy-to-use >> facilities ... [for] all sorts of data", isn't it a little >> incongruous that length() is not also provided -- as 3 functions (any >> others?) comprise a core set of vector-manipulation functions? >> >> Would it make sense to have an informal prescription (e.g., in >> R-exts) that a class that implements a vector-like object and >> provides at least of one of functions 'c', '[' and 'length' should >> provide all three? It would also be easy to describe a test-suite >> that should be included in the 'test' directory of a package >> implementing such a class, that had some tests of the basic >> vector-manipulation functionality, such as: >> >> > # at this point, x0, x1, x3, & x10 should exist, as vectors of the >> > # class being tested, of length 0, 1, 3, and 10, and they should >> > # contain no duplicate elements >> > length(x0) >> [1] 1 >> > length(c(x0, x1)) >> [1] 2 >> > length(c(x1,x10)) >> [1] 11 >> > all(x3 == x3[seq(len=length(x3))]) >> [1] TRUE >> > all(x3 == c(x3[1], x3[2], x3[3])) >> [1] TRUE >> > length(c(x3[2], x10[5:7])) >> [1] 4 >> > >> >> It would also be possible to describe a larger set of vector >> manipulation functions that should be implemented together, including >> e.g., 'rep', 'unique', 'duplicated', '==', 'sort', '[<-', 'is.na', >> head, tail ... (many of which are provided for POSIXlt). >> >> Or is there some good reason that length() cannot be provided (while >> 'c' and '[' can) for some vector-like classes such as "POSIXlt"? > > What you say sounds good in general, but the devil is in the details. 
> Changing the meaning of length(x) for some objects has fairly > widespread effects. Are they all positive? I don't know. > > Adding a prescription like the one you suggest would be good if it's > easy to implement, but bad if it's already widely violated. How many > base or CRAN or Bioconductor packages violate it currently? Do the > ones that provide all 3 methods do so in a consistent way, i.e. does > "length(x)" mean the same thing in all of them? I'm not sure doing something like this would be so bad even if it is already widely violated. R has evolved significantly over time, and many rough edges have been cleaned up, sometimes in ways that were not backward compatible. This is a great thing & my thanks go to the people working on R. If some base or CRAN or Bioconductor packages currently don't implement vector operations consistently, wouldn't it be good to know that? Wouldn't it be us
Re: [Rd] Wrong length of POSIXt vectors (PR#10507)
On 12/13/2007 1:59 PM, Tony Plate wrote: > Duncan Murdoch wrote: >> On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote: >>> Full_Name: Petr Simecek >>> Version: 2.5.1, 2.6.1 >>> OS: Windows XP >>> Submission from: (NULL) (195.113.231.2) >>> >>> >>> Several times I have experienced that a length of a POSIXt vector has not >>> been >>> computed right. >>> >>> Example: >>> >>> tv<-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31 >>> ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L >>> ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, >>> 12L), mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, >>> 13L), mon = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L, >>> 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = c(1L, >>> 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L, >>> 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L, >>> 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", >>> "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst" >>> ), class = c("POSIXt", "POSIXlt")) >>> >>> print(tv) >>> # print 11 time points (right) >>> >>> length(tv) >>> # returns 9 (wrong) >> >> tv is a list of length 9. The answer is right, your expectation is wrong. >>> I have tried that on several computers with/without switching to English >>> locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched a help pages >>> but I >>> cannot imagine how that could be OK. >> >> See this in ?POSIXt: >> >> Class '"POSIXlt"' is a named list of vectors... >> >> You could define your own length measurement as >> >> length.POSIXlt <- function(x) length(x$sec) >> >> and you'll get the answer you expect, but be aware that length.XXX >> methods are quite rare, and you may surprise some of your users. >> > > On the other hand, isn't the fact that length() currently always returns 9 > for POSIXlt objects likely to be a surprise to many users of POSIXlt? 
>
> The back of "The New S Language" says "Easy-to-use facilities allow you to
> organize, store and retrieve all sorts of data. ... S functions and data
> organization make applications easy to write."
>
> Now, POSIXlt has methods for c() and vector subsetting "[" (and many other
> vector-manipulation methods - see methods(class="POSIXlt")). Hence, from
> the point of view of intending to supply "easy-to-use facilities ... [for]
> all sorts of data", isn't it a little incongruous that length() is not also
> provided -- as these 3 functions (any others?) comprise a core set of
> vector-manipulation functions?
>
> Would it make sense to have an informal prescription (e.g., in R-exts) that
> a class that implements a vector-like object and provides at least one of
> the functions 'c', '[' and 'length' should provide all three? It would also
> be easy to describe a test-suite that should be included in the 'test'
> directory of a package implementing such a class, that had some tests of
> the basic vector-manipulation functionality, such as:
>
> > # at this point, x0, x1, x3, & x10 should exist, as vectors of the
> > # class being tested, of length 0, 1, 3, and 10, and they should
> > # contain no duplicate elements
> > length(x0)
> [1] 0
> > length(c(x0, x1))
> [1] 1
> > length(c(x1,x10))
> [1] 11
> > all(x3 == x3[seq(len=length(x3))])
> [1] TRUE
> > all(x3 == c(x3[1], x3[2], x3[3]))
> [1] TRUE
> > length(c(x3[2], x10[5:7]))
> [1] 4
> >
>
> It would also be possible to describe a larger set of vector manipulation
> functions that should be implemented together, including e.g., 'rep',
> 'unique', 'duplicated', '==', 'sort', '[<-', 'is.na', head, tail ... (many
> of which are provided for POSIXlt).
>
> Or is there some good reason that length() cannot be provided (while 'c'
> and '[' can) for some vector-like classes such as "POSIXlt"?

What you say sounds good in general, but the devil is in the details.
Changing the meaning of length(x) for some objects has fairly widespread
effects.  Are they all positive?  I don't know.

Adding a prescription like the one you suggest would be good if it's
easy to implement, but bad if it's already widely violated.  How many
base or CRAN or Bioconductor packages violate it currently?  Do the ones
that provide all 3 methods do so in a consistent way, i.e. does
"length(x)" mean the same thing in all of them?

I agree that the current state is less than perfect, but making it
better would really be a lot of work.  I suspect there are better ways
to spend my time, so I'm not going to volunteer to do it.  I'm not even
going to invite someone else to do it, or offer to review your work if
you volunteer.  I think this falls into the class of "next time we write
a language, let's handle this better" problems.

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
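A first step towards answering "how many packages violate it currently?" might be to check, class by class, which of the three methods exist. The following is only a rough sketch: the helper names core_methods and is_partial are made up for illustration, it looks at S3 methods only, and what it reports depends on the R version and the packages loaded.

```r
# Hypothetical helper: report which of the core vector methods
# ('c', '[', 'length') have an S3 method available for a class.
core_methods <- function(cls, generics = c("c", "[", "length")) {
  vapply(
    generics,
    function(g) !is.null(getS3method(g, cls, optional = TRUE)),
    logical(1)
  )
}

# TRUE when a class defines some, but not all, of the core methods.
is_partial <- function(cls) {
  found <- core_methods(cls)
  any(found) && !all(found)
}

core_methods("POSIXlt")   # the answer depends on the R version in use
is_partial("POSIXlt")
```

This says nothing about whether the methods that do exist agree on what "length" means; that part would still need reading the code or the documentation.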
Re: [Rd] Wrong length of POSIXt vectors (PR#10507)
Duncan Murdoch wrote:
> On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote:
>> Full_Name: Petr Simecek
>> Version: 2.5.1, 2.6.1
>> OS: Windows XP
>> Submission from: (NULL) (195.113.231.2)
>>
>>
>> Several times I have experienced that a length of a POSIXt vector has not
>> been computed right.
>>
>> Example:
>>
>> tv<-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
>> ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
>> ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L,
>> 12L), mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L,
>> 13L), mon = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L,
>> 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = c(1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L,
>> 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec",
>> "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
>> ), class = c("POSIXt", "POSIXlt"))
>>
>> print(tv)
>> # print 11 time points (right)
>>
>> length(tv)
>> # returns 9 (wrong)
>
> tv is a list of length 9.  The answer is right, your expectation is wrong.
>
>> I have tried that on several computers with/without switching to English
>> locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched a help pages
>> but I cannot imagine how that could be OK.
>
> See this in ?POSIXt:
>
>    Class '"POSIXlt"' is a named list of vectors...
>
> You could define your own length measurement as
>
> length.POSIXlt <- function(x) length(x$sec)
>
> and you'll get the answer you expect, but be aware that length.XXX
> methods are quite rare, and you may surprise some of your users.
>

On the other hand, isn't the fact that length() currently always returns 9
for POSIXlt objects likely to be a surprise to many users of POSIXlt?

The back of "The New S Language" says "Easy-to-use facilities allow you to
organize, store and retrieve all sorts of data. ...
S functions and data organization make applications easy to write."

Now, POSIXlt has methods for c() and vector subsetting "[" (and many other
vector-manipulation methods - see methods(class="POSIXlt")).  Hence, from
the point of view of intending to supply "easy-to-use facilities ... [for]
all sorts of data", isn't it a little incongruous that length() is not also
provided -- as these 3 functions (any others?) comprise a core set of
vector-manipulation functions?

Would it make sense to have an informal prescription (e.g., in R-exts) that
a class that implements a vector-like object and provides at least one
of the functions 'c', '[' and 'length' should provide all three?  It would
also be easy to describe a test-suite that should be included in the 'tests'
directory of a package implementing such a class, that had some tests of
the basic vector-manipulation functionality, such as:

> # at this point, x0, x1, x3, & x10 should exist, as vectors of the
> # class being tested, of length 0, 1, 3, and 10, and they should
> # contain no duplicate elements
> length(x0)
[1] 0
> length(c(x0, x1))
[1] 1
> length(c(x1,x10))
[1] 11
> all(x3 == x3[seq(len=length(x3))])
[1] TRUE
> all(x3 == c(x3[1], x3[2], x3[3]))
[1] TRUE
> length(c(x3[2], x10[5:7]))
[1] 4
>

It would also be possible to describe a larger set of vector manipulation
functions that should be implemented together, including e.g., 'rep',
'unique', 'duplicated', '==', 'sort', '[<-', 'is.na', head, tail ... (many
of which are provided for POSIXlt).

Or is there some good reason that length() cannot be provided (while 'c'
and '[' can) for some vector-like classes such as "POSIXlt"?

-- Tony Plate

> Duncan Murdoch
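The checks listed above could be wrapped up so that a package's tests can run them against any class. This is a sketch only: the function name check_vector_like is invented here, and "Date" is used as the example simply because it is a convenient class to construct, not because the thread discusses it.

```r
# Hypothetical checker for the basic vector-manipulation tests.
# x0, x1, x3 and x10 must be vectors of the class under test, of lengths
# 0, 1, 3 and 10, with no duplicate elements.
check_vector_like <- function(x0, x1, x3, x10) {
  stopifnot(
    length(x0) == 0,
    length(c(x0, x1)) == 1,
    length(c(x1, x10)) == 11,
    all(x3 == x3[seq_len(length(x3))]),
    all(x3 == c(x3[1], x3[2], x3[3])),
    length(c(x3[2], x10[5:7])) == 4
  )
  invisible(TRUE)
}

# Example run with "Date" vectors (an assumption made for illustration).
d <- as.Date("2007-12-11")
check_vector_like(
  x0  = as.Date(character(0)),
  x1  = d,
  x3  = d + 1:3,
  x10 = d + 11:20
)
```

A list-based class with no length method would be expected to stop at the first check, since length() would then count the list components rather than the elements.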
Re: [Rd] Wrong length of POSIXt vectors (PR#10507)
It is right: it is a list of length 9.  You even constructed it as such a
list!

On Tue, 11 Dec 2007, [EMAIL PROTECTED] wrote:

> Full_Name: Petr Simecek
> Version: 2.5.1, 2.6.1
> OS: Windows XP
> Submission from: (NULL) (195.113.231.2)
>
>
> Several times I have experienced that a length of a POSIXt vector has not been
> computed right.
>
> Example:
>
> tv<-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
> ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
> ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L,
> 12L), mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L,
> 13L), mon = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L,
> 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = c(1L,
> 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L,
> 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L,
> 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec",
> "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
> ), class = c("POSIXt", "POSIXlt"))
>
> print(tv)
> # print 11 time points (right)
>
> length(tv)
> # returns 9 (wrong)
>
> I have tried that on several computers with/without switching to English
> locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched a help
> pages but I cannot imagine how that could be OK.
>

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Re: [Rd] Wrong length of POSIXt vectors (PR#10507)
On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote:
> Full_Name: Petr Simecek
> Version: 2.5.1, 2.6.1
> OS: Windows XP
> Submission from: (NULL) (195.113.231.2)
>
>
> Several times I have experienced that a length of a POSIXt vector has not been
> computed right.
>
> Example:
>
> tv<-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
> ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
> ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L,
> 12L), mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L,
> 13L), mon = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L,
> 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = c(1L,
> 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L,
> 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L,
> 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec",
> "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
> ), class = c("POSIXt", "POSIXlt"))
>
> print(tv)
> # print 11 time points (right)
>
> length(tv)
> # returns 9 (wrong)

tv is a list of length 9.  The answer is right, your expectation is wrong.

>
> I have tried that on several computers with/without switching to English
> locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched a help pages
> but I cannot imagine how that could be OK.

See this in ?POSIXt:

   Class '"POSIXlt"' is a named list of vectors...

You could define your own length measurement as

length.POSIXlt <- function(x) length(x$sec)

and you'll get the answer you expect, but be aware that length.XXX
methods are quite rare, and you may surprise some of your users.

Duncan Murdoch
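The effect of adding such a length method can be seen on a small stand-in class. This is a sketch under assumptions: the class "hm" and its constructor are invented for illustration and are not part of R; the class merely mimics the list-of-parallel-vectors layout that ?POSIXt describes.

```r
# Hypothetical toy class "hm": a list of parallel vectors, laid out the
# way POSIXlt is described in ?POSIXt.
hm <- function(hour, min) {
  stopifnot(length(hour) == length(min))
  structure(list(hour = hour, min = min), class = "hm")
}

x <- hm(hour = c(12L, 12L, 13L), min = c(1L, 10L, 5L))

length(x)            # 2: no method yet, so the list components are counted

# A length method that reports the number of time points instead.
length.hm <- function(x) length(unclass(x)$hour)

length(x)            # 3: now dispatches to length.hm
length(unclass(x))   # 2: the underlying list is still reachable
```

Code written against the first behaviour (for example, code that loops over the components with seq_len(length(x))) would change meaning once the method exists, which is the kind of surprise the warning above refers to.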
Re: [Rd] Wrong length of POSIXt vectors (PR#10507)
[EMAIL PROTECTED] wrote:
> Full_Name: Petr Simecek
> Version: 2.5.1, 2.6.1
> OS: Windows XP
> Submission from: (NULL) (195.113.231.2)
>
>
> Several times I have experienced that a length of a POSIXt vector has not been
> computed right.
>
> Example:
>
> tv<-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
> ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
> ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L,
> 12L), mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L,
> 13L), mon = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L,
> 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = c(1L,
> 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L,
> 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L,
> 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec",
> "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
> ), class = c("POSIXt", "POSIXlt"))
>
> print(tv)
> # print 11 time points (right)
>
> length(tv)
> # returns 9 (wrong)
>
> I have tried that on several computers with/without switching to English
> locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched a help pages
> but I cannot imagine how that could be OK.
>
>

Given the way you define it, you should be able to imagine it! It's a
list of length 9: sec, min, hour,..., isdst.

--
   O__   Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])                  FAX: (+45) 35327907
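The nine components can be listed directly by stripping the class first. A small sketch, assuming a cut-down object (named tv2 here, with two time points instead of eleven) built by hand in the same way as in the report:

```r
# Two time points, built by hand with the nine components from the report.
# The class vector is copied from the report, not from a particular R release.
tv2 <- structure(
  list(sec = c(50, 0), min = c(1L, 10L), hour = c(12L, 12L),
       mday = c(13L, 13L), mon = c(5L, 5L), year = c(105L, 105L),
       wday = c(1L, 1L), yday = c(163L, 163L), isdst = c(1L, 1L)),
  class = c("POSIXt", "POSIXlt")
)

fields <- unclass(tv2)   # strip the class, leaving the plain named list
names(fields)            # "sec" "min" "hour" ... "isdst"
length(fields)           # 9: one entry per component
lengths(fields)          # 2 for every component: one entry per time point
```

Only unclass() and plain list operations are used here, so the result does not depend on which POSIXlt methods a given R version provides.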
[Rd] Wrong length of POSIXt vectors (PR#10507)
Full_Name: Petr Simecek
Version: 2.5.1, 2.6.1
OS: Windows XP
Submission from: (NULL) (195.113.231.2)


Several times I have found that the length of a POSIXt vector has not been
computed correctly.

Example:

tv<-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L,
12L), mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L,
13L), mon = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L,
105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L,
163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec",
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXt", "POSIXlt"))

print(tv)
# prints 11 time points (right)

length(tv)
# returns 9 (wrong)

I have tried this on several computers, with and without switching to an
English locale, i.e. Sys.setlocale("LC_TIME", "en"). I have searched the help
pages but I cannot imagine how that could be OK.