date:20071215

Re: [Rd] Improvement of SignRank functions

2007-12-15 Thread Martin Maechler

Hi Ivo,

 IU == Ivo Ugrina [EMAIL PROTECTED]
 on Fri, 14 Dec 2007 23:03:37 +0100 writes:

IU I took some time and liberty and tried to improve
IU existing implementation of SignRank functions
IU in R. (dsignrank, ...)

IU As I have seen they've been based on csignrank.
IU So I modified csignrank and, I believe,
IU improved calculation time and memory efficiency.

do you have evidence for your belief?
i.e. a set of  system.time(.) calls where you see the
difference?

IU The idea is basically the same. I use the same recursion
IU as original author 

 (Kurt Hornik)

IU used with one slight modification.
IU I am generating Wilcoxon SignRank density from the
IU beginning (for n=2,3,...) with the help of recursion formula.

IU What is changed?
IU There is no need for SINGRANK_MAX in src/nmath/nmath.h anymore.

IU Only functions:
IU static void w_free()
IU void signrank_free()
IU static void w_init_maybe(int n)
IU static double csignrank(int k, int n)
IU in src/nmath/signrank.c have been altered.
IU There was no change to dsignrank, psignrank, ...

IU I've tried to make as little changes as possible.

IU So, to compile with new functions only src/nmath/signrank.c
IU needs to be changed with the one given in attachment.

IU I hope it really is an improvement.

Me too :-) {Waiting for your test results}

The code does look slightly simpler, I agree.

BTW: If you had a smart idea to *not* use a static 'w' and still
 be memory efficient,
 that could lead to make that code thread-safe, but I am
 not at all sure this is possible without using
 thread-library C code.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Improvement of SignRank functions

2007-12-15 Thread Ivo Ugrina

Martin Maechler wrote:
 do you have evidence for your belief?
 i.e. a set of  system.time(.) calls where you see the
 difference?


system.time(dsignrank(17511, 400))
user  system elapsed
   1.010   0.120   1.145
system.time(dsignrank((0:17511), 400))
user  system elapsed
1.250.131.40
system.time(dsignrank((0:17511), 500))
user  system elapsed
   2.040   0.220   2.296
system.time(psignrank((0:17511), 600))
user  system elapsed
  20.670   0.580  21.403
system.time(qsignrank(0.56, 300))
user  system elapsed
   0.700   0.050   0.753
==
system.time(dsignrank(17511, 400))
user  system elapsed
   0.070   0.000   0.078
system.time(dsignrank((0:17511), 400))
user  system elapsed
   0.100   0.000   0.104
system.time(dsignrank((0:17511), 500))
user  system elapsed
   0.160   0.000   0.164
system.time(psignrank((0:17511), 600))
user  system elapsed
  16.330   0.370  16.729
system.time(qsignrank(0.56, 300))
user  system elapsed
   0.020   0.010   0.029



system.time(dsignrank((0:2), 600))
user  system elapsed
   3.470   0.280   3.745
RAM: ~130MB
==
system.time(dsignrank((0:2), 600))
user  system elapsed
   0.250   0.010   0.26
RAM: ~1MB



 BTW: If you had a smart idea to *not* use a static 'w' and still
  be memory efficient,
  that could lead to make that code thread-safe, but I am
  not at all sure this is possible without using
  thread-library C code.

I'll look into it.


With respect,
-- 
Ivo Ugrina
ICQ: 47508335 | www.iugrina.com
---
baza matematickih pojmova
http://baza.iugrina.com
---
anime, manga, Japan fanzin
http://yoshi.iugrina.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Improvement of SignRank functions

2007-12-15 Thread Martin Maechler

Hi Ivo,

 IU == Ivo Ugrina [EMAIL PROTECTED]
 on Sat, 15 Dec 2007 14:13:10 +0100 writes:

IU Martin Maechler wrote:
 do you have evidence for your belief?
 i.e. a set of  system.time(.) calls where you see the
 difference?

IU system.time(dsignrank(17511, 400))
IU user  system elapsed
IU 1.010   0.120   1.145
IU system.time(dsignrank((0:17511), 400))
IU user  system elapsed
IU 1.250.131.40
IU system.time(dsignrank((0:17511), 500))
IU user  system elapsed
IU 2.040   0.220   2.296
IU system.time(psignrank((0:17511), 600))
IU user  system elapsed
IU 20.670   0.580  21.403
IU system.time(qsignrank(0.56, 300))
IU user  system elapsed
IU 0.700   0.050   0.753
IU ==
IU system.time(dsignrank(17511, 400))
IU user  system elapsed
IU 0.070   0.000   0.078
IU system.time(dsignrank((0:17511), 400))
IU user  system elapsed
IU 0.100   0.000   0.104
IU system.time(dsignrank((0:17511), 500))
IU user  system elapsed
IU 0.160   0.000   0.164
IU system.time(psignrank((0:17511), 600))
IU user  system elapsed
IU 16.330   0.370  16.729
IU system.time(qsignrank(0.56, 300))
IU user  system elapsed
IU 0.020   0.010   0.029



IU system.time(dsignrank((0:2), 600))
IU user  system elapsed
IU 3.470   0.280   3.745
IU RAM: ~130MB
IU ==
IU system.time(dsignrank((0:2), 600))
IU user  system elapsed
IU 0.250   0.010   0.26
IU RAM: ~1MB

that's quite convincing; thank you!
and I can verify part of it on my computer.

I think I'd just commit your signrank.c
(with a few cosmetic changes) to the sources, right?

*Not* using a static with all the previously computed counts
is probably not possible without a (CPU time) efficiency loss;
and to make this thread-safe one could use a thread-global
array, but how to do that would really depend on the threading
system used, and that's not at all given.

Thank you for your contribution!
Martin



 BTW: If you had a smart idea to *not* use a static 'w' and still
 be memory efficient,
 that could lead to make that code thread-safe, but I am
 not at all sure this is possible without using
 thread-library C code.

IU I'll look into it.


IU With respect,
IU -- 
IU Ivo Ugrina
IU ICQ: 47508335 | www.iugrina.com
IU ---
IU baza matematickih pojmova
IU http://baza.iugrina.com
IU ---
IU anime, manga, Japan fanzin
IU http://yoshi.iugrina.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] S4 class extending data.frame?

2007-12-15 Thread Ben Bolker



  Thanks, Martin.  In the short term (a) seems best.  In the long
run we may try (c), because there are other things that data.frame
doesn't do that we want it to do (i.e., allow arbitrary objects with
[ methods, print methods, and the same length to be bound together,
rather than being restricted to atomic vectors + Date/factor).

  cheers
Ben




Martin Morgan wrote:
 
 Ben, Oleg --
 
 Some solutions, which you've probably already thought of, are (a) move
 the data.frame into its own slot, instead of extending it, (b) manage
 the data.frame attributes yourself, or (c) reinvent the data.frame
 from scratch as a proper S4 class (e.g., extending 'list' with
 validity constraints on element length and homogeneity of element
 content).
 
 (b) places a lot of dependence on understanding the data.frame
 implementation, and is probably too tricky (for me) to get right,(c)
 is probably also tricky, and probably caries significant performance
 overhead (e.g., object duplication during validity checking).
 
 (a) means that you don't get automatic method inheritance. On the plus
 side, you still get the structure. It is trivial to implement methods
 like [, [[, etc to dispatch on your object and act on the appropriate
 slot. And in some sense you now know what methods i.e., those you've
 implemented, are supported on your object.
 
 Oleg, here's my cautionary tale for extending list, where manually
 subsetting the .Data slot mixes up the names (callNextMethod would
 have done the right thing, but was not appropriate). This was quite a
 subtle bug for me, because I hadn't been expecting named lists in my
 object; the problem surfaced when sapply used the (incorrectly subset)
 names attribute of the list. My solution in this case was to make sure
 'names' were removed from lists used to construct objects. As a
 consequence I lose a nice little bit of sapply magic.
 
 setClass('A', 'list')
 [1] A
 setMethod('[', 'A', function(x, i, j, ..., drop=TRUE) {
 + [EMAIL PROTECTED] - [EMAIL PROTECTED]
 + x
 + })
 [1] [
 names(new('A', list(x=1, y=2))[2])
 [1] x
 
 Martin
 
 Oleg Sklyar [EMAIL PROTECTED] writes:
 
 I had the same problem. Generally data.frame's behave like lists, but
 while you can extend list, there are problems extending a data.frame
 class. This comes down to the internal representation of the object I
 guess. Vectors, including list, contain their information in a (hidden)
 slot .Data (see the example below). data.frame's do not seem to follow
 this convention.

 Any idea how to go around?

 The following example is exactly the same as Ben's for a data.frame, but
 using a list. It works fine and one can see that the list structure is
 stored in .Data

 * ~: R
 R version 2.6.1 (2007-11-26) 
 setClass(c3,representation(comment=character),contains=list)
 [1] c3
 l = list(1:3,2:4)
 z3 = new(c3,l,comment=hello)
 z3
 An object of class “c3”
 [[1]]
 [1] 1 2 3

 [[2]]
 [1] 2 3 4

 Slot comment:
 [1] hello

 [EMAIL PROTECTED]
 [[1]]
 [1] 1 2 3

 [[2]]
 [1] 2 3 4

 Regards,
 Oleg

 On Thu, 2007-12-13 at 00:04 -0500, Ben Bolker wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 I would like to build an S4 class that extends
 a data frame, but includes several more slots.
 
 Here's an example using integer as the base
 class instead:
 
 setClass(c1,representation(comment=character),contains=integer)
 z1 = new(c1,55,comment=hello)
 z1
 z1+10
 z1[1]
 [EMAIL PROTECTED]
 
  -- in other words, it behaves exactly as an integer
 for access and operations but happens to have another slot.
 
  If I do this with a data frame instead, it doesn't seem to work
 at all.
 
 setClass(c2,representation(comment=character),contains=data.frame)
 d = data.frame(1:3,2:4)
 z2 = new(c2,d,comment=goodbye)
 z2  ## data all gone!!
 z2[,1]  ## Error ... object is not subsettable
 [EMAIL PROTECTED]  ## still there
 
   I can achieve approximately the same effect by
 adding attributes, but I was hoping for the structure
 of S4 classes ...
 
   Programming with Data and the R Language Definition
 contain 2 references each to data frames, and neither of
 them has allowed me to figure out this behavior.
 
  (While I'm at it: it would be wonderful to have
 a rich data frame that could include as a column
 any object that had an appropriate length and
 [ method ... has anyone done anything in this direction?
 ?data.frame says the allowable types are
  (numeric, logical, factor and character and so on),
  but I'm having trouble sorting out what the limitations
 are ...)
 
   hoping for enlightenment (it would be lovely to be
 shown how to make this work, but a definitive statement
 that it is impossible would be useful too).
 
   cheers
 Ben Bolker
 
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.6 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
 
 iD8DBQFHYL1pc5UpGjwzenMRAqErAJ9jj1KgVVSGIf+DtK7Km/+JBaDu2QCaAkl/
 eMi+WCEWK6FPpVMpUbo+RBQ=
 =huvz
 -END PGP SIGNATURE-

Re: [Rd] Improvement of SignRank functions

2007-12-15 Thread Ivo Ugrina

Hi Martin,

Martin Maechler wrote:
 that's quite convincing; thank you!
 and I can verify part of it on my computer.
:D

 I think I'd just commit your signrank.c
 (with a few cosmetic changes) to the sources, right?
Right!

There is no need for SIGNRANK_MAX in src/nmath/nmath.h anymore.

 Thank you for your contribution!
You're welcome.

With respect,
-- 
Ivo Ugrina
ICQ: 47508335 | www.iugrina.com
---
baza matematickih pojmova
http://baza.iugrina.com
---
anime, manga, Japan fanzin
http://yoshi.iugrina.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] List comprehensions for R

2007-12-15 Thread David C. Norris

Gabor,

Thank you for drawing this previous work to my attention.  I've attached 
below code that extends the list comprehension to include logical 
'guard' expressions, as in

  leap.years - .[ x ~ x - 1900:2100 | (x %% 400 == 0 || x %% 100 != 0 
 x %% 4 == 0) ]
  leap.years
 [1] 1904 1908 1912 1916 1920 1924 1928 1932 1936 1940 1944 1948 1952 
1956 1960
[16] 1964 1968 1972 1976 1980 1984 1988 1992 1996 2000 2004 2008 2012 
2016 2020
[31] 2024 2028 2032 2036 2040 2044 2048 2052 2056 2060 2064 2068 2072 
2076 2080
[46] 2084 2088 2092 2096
 

I wonder, would many (most?) R users be mathematically-trained 
statisticians first, and programmers second, and therefore find a 
mathematical notation like the list comprehension more natural than less 
declarative programming constructs?  I would be genuinely interested in 
your (and others') thoughts on that question, based on your knowledge of 
the R user community.

Regards,
David

Gabor Grothendieck wrote:
 That seems quite nice.

 Note that there has been some related code posted.  See:
 http://tolstoy.newcastle.edu.au/R/help/03b/6406.html
 which discusses some R idioms for list comprehensions.

 Also the gsubfn package has some functionality in this direction.  We
 preface any function with fn$ to allow functions in its arguments
 to be specified as formulas.  Its more R-ish than your code and
 applies to more than just list comprehensions while your code is
 more faithful to list comprehensions.



## Updated to include logical guards in list comprehensions

##
## Define syntax for list/vector/array comprehensions
##

. - structure(NA, class=comprehension)

comprehend - function(expr, vars, seqs, guard, comprehension=list()){
  if(length(vars)==0){  # base case of recursion
if(eval(guard)) comprehension[[length(comprehension)+1]] - eval(expr)
  } else {
for(elt in eval(seqs[[1]])){
  assign(vars[1], elt, inherits=TRUE)
  comprehension - comprehend(expr, vars[-1], seqs[-1], guard, 
comprehension)
}
  }
  comprehension
}

## List comprehensions specified by close approximation to set-builder 
notation:
##
##   { x+y | 0x9, 0yx, x*y30 } --- .[ x+y ~ {x-0:9; y-0:x} | 
x*y30 ]
##
[.comprehension - function(x, f){
  f - substitute(f)
  ## First, we pluck out the optional guard, if it is present:
  if(is.call(f)  is.call(f[[3]])  f[[3]][[1]]=='|'){
guard - f[[3]][[3]]
f[[3]] - f[[3]][[2]]
  } else {
guard - TRUE
  }
  ## To allow omission of braces around a lone comprehension generator,
  ## as in 'expr ~ var - seq' we make allowances for two shapes of f:
  ##
  ## (1)(`-` (`~` expr
  ##   var)
  ##  seq)
  ## and
  ##
  ## (2)(`~` expr
  ## (`{` (`-` var1 seq1)
  ##  (`-` var2 seq2)
  ##  ...
  ##  (`-` varN - seqN)))
  ##
  ## In the former case, we set gens - list(var - seq), unifying the
  ## treatment of both shapes under the latter, more general one.
  syntax.error - Comprehension expects 'expr ~ {x1 - seq1; ... ; xN 
- seqN}'.
  if(!is.call(f) || (f[[1]]!='-'  f[[1]]!='~'))
stop(syntax.error)
  if(is(f,'-')){ # (1)
lhs - f[[2]]
if(!is.call(lhs) || lhs[[1]] != '~')
  stop(syntax.error)
expr - lhs[[2]]
var - as.character(lhs[[3]])
seq - f[[3]]
gens - list(call('-', var, seq))
  } else { # (2)
expr - f[[2]]
gens - as.list(f[[3]])[-1]
if(any(lapply(gens, class) != '-'))
  stop(syntax.error)
  }
  ## Fill list comprehension .LC
  vars - as.character(lapply(gens, function(g) g[[2]]))
  seqs - lapply(gens, function(g) g[[3]])
  .LC - comprehend(expr, vars, seqs, guard)
  ## Provided the result is rectangular, convert it to a vector or array
  ## TODO: Extend to handle .LC structures more than 2-deep.
  ## TODO: Avoid rectangularizing nested comprehensions along guarded 
dimensions?
  if(!length(.LC))
return(.LC)
  dim1 - dim(.LC[[1]])
  if(is.null(dim1)){
lengths - sapply(.LC, length)
if(all(lengths == lengths[1])){ # rectangular
  .LC - unlist(.LC)
  if(lengths[1]  1) # matrix
dim(.LC) - c(lengths[1], length(lengths))
} else { # ragged
  # leave .LC as a list
}
  } else { # elements of .LC have dimension
dim - c(dim1, length(.LC))
.LC - unlist(.LC)
dim(.LC) - dim
  }
  .LC
}

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Wrong length of POSIXt vectors (PR#10507)

2007-12-15 Thread Martin Maechler

 TP == Tony Plate [EMAIL PROTECTED]
 on Fri, 14 Dec 2007 13:58:30 -0700 writes:

TP Duncan Murdoch wrote:
 On 12/13/2007 1:59 PM, Tony Plate wrote:
 Duncan Murdoch wrote:
 On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote:
 Full_Name: Petr Simecek
 Version: 2.5.1, 2.6.1
 OS: Windows XP
 Submission from: (NULL) (195.113.231.2)
 
 
 Several times I have experienced that a length of a POSIXt vector 
 has not been
 computed right.
 
 Example:
 
 tv-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
 ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
 ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L), 
 mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L), mon 
 = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L, 
 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = 
 c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L, 
 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L, 
 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c(sec, min, 
 hour, mday, mon, year, wday, yday, isdst
 ), class = c(POSIXt, POSIXlt))
 
 print(tv)
 # print 11 time points (right)
 
 length(tv)
 # returns 9 (wrong)
 
 tv is a list of length 9.  The answer is right, your expectation is 
 wrong.
 I have tried that on several computers with/without switching to 
 English
 locales, i.e. Sys.setlocale(LC_TIME, en). I have searched a 
 help pages but I
 cannot imagine how that could be OK.
 
 See this in ?POSIXt:
 
 Class 'POSIXlt' is a named list of vectors...
 
 You could define your own length measurement as
 
 length.POSIXlt - function(x) length(x$sec)
 
 and you'll get the answer you expect, but be aware that length.XXX 
 methods are quite rare, and you may surprise some of your users.
 
 
 On the other hand, isn't the fact that length() currently always 
 returns 9 for POSIXlt objects likely to be a surprise to many users 
 of POSIXlt?
 
 The back of The New S Language says Easy-to-use facilities allow 
 you to organize, store and retrieve all sorts of data. ... S 
 functions and data organization make applications easy to write.
 
 Now, POSIXlt has methods for c() and vector subsetting [ (and many 
 other vector-manipulation methods - see methods(class=POSIXlt)).  
 Hence, from the point of view of intending to supply easy-to-use 
 facilities ... [for] all sorts of data, isn't it a little 
 incongruous that length() is not also provided -- as 3 functions (any 
 others?) comprise a core set of vector-manipulation functions?
 
 Would it make sense to have an informal prescription (e.g., in 
 R-exts) that a class that implements a vector-like object and 
 provides at least of one of functions 'c', '[' and 'length' should 
 provide all three?  It would also be easy to describe a test-suite 
 that should be included in the 'test' directory of a package 
 implementing such a class, that had some tests of the basic 
 vector-manipulation functionality, such as:
 
  # at this point, x0, x1, x3,  x10 should exist, as vectors of the
  # class being tested, of length 0, 1, 3, and 10, and they should
  # contain no duplicate elements
  length(x0)
 [1] 1
  length(c(x0, x1))
 [1] 2
  length(c(x1,x10))
 [1] 11
  all(x3 == x3[seq(len=length(x3))])
 [1] TRUE
  all(x3 == c(x3[1], x3[2], x3[3]))
 [1] TRUE
  length(c(x3[2], x10[5:7]))
 [1] 4
 
 
 It would also be possible to describe a larger set of vector 
 manipulation functions that should be implemented together, including 
 e.g., 'rep', 'unique', 'duplicated', '==', 'sort', '[-', 'is.na', 
 head, tail ... (many of which are provided for POSIXlt).
 
 Or is there some good reason that length() cannot be provided (while 
 'c' and '[' can) for some vector-like classes such as POSIXlt?
 
 What you say sounds good in general, but the devil is in the details. 
 Changing the meaning of length(x) for some objects has fairly 
 widespread effects.  Are they all positive?  I don't know.
 
 Adding a prescription like the one you suggest would be good if it's 
 easy to implement, but bad if it's already widely violated.  How many 
 base or CRAN or Bioconductor packages violate it currently?   Do the 
 ones that provide all 3 methods do so in a consistent way, i.e. does 
 length(x) mean the same thing in all of them?
TP I'm not sure doing something like this would be so bad even if it is 
TP already widely violated.  R has evolved significantly over time, and 
TP many rough edges have been cleaned up, sometimes in ways that were not 
TP backward compatible.

Re: [Rd] Wrong length of POSIXt vectors (PR#10507)

2007-12-15 Thread Gabor Grothendieck

If it were simply deprecated and then changed then
everyone using it would get a warning during the period
of deprecation so it would
not be so bad.  Given that its current behavior is
not very useful I suspect its not widely used anyways.
| haven't followed the whole discussion so sorry if these
points have already been made.

On Dec 15, 2007 5:17 PM, Martin Maechler [EMAIL PROTECTED] wrote:
  TP == Tony Plate [EMAIL PROTECTED]
  on Fri, 14 Dec 2007 13:58:30 -0700 writes:


TP Duncan Murdoch wrote:
 On 12/13/2007 1:59 PM, Tony Plate wrote:
 Duncan Murdoch wrote:
 On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote:
 Full_Name: Petr Simecek
 Version: 2.5.1, 2.6.1
 OS: Windows XP
 Submission from: (NULL) (195.113.231.2)


 Several times I have experienced that a length of a POSIXt vector
 has not been
 computed right.

 Example:

 tv-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
 ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
 ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L),
 mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L), mon
 = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L,
 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday =
 c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L,
 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L,
 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c(sec, min,
 hour, mday, mon, year, wday, yday, isdst
 ), class = c(POSIXt, POSIXlt))

 print(tv)
 # print 11 time points (right)

 length(tv)
 # returns 9 (wrong)

 tv is a list of length 9.  The answer is right, your expectation is
 wrong.
 I have tried that on several computers with/without switching to
 English
 locales, i.e. Sys.setlocale(LC_TIME, en). I have searched a
 help pages but I
 cannot imagine how that could be OK.

 See this in ?POSIXt:

 Class 'POSIXlt' is a named list of vectors...

 You could define your own length measurement as

 length.POSIXlt - function(x) length(x$sec)

 and you'll get the answer you expect, but be aware that length.XXX
 methods are quite rare, and you may surprise some of your users.


 On the other hand, isn't the fact that length() currently always
 returns 9 for POSIXlt objects likely to be a surprise to many users
 of POSIXlt?

 The back of The New S Language says Easy-to-use facilities allow
 you to organize, store and retrieve all sorts of data. ... S
 functions and data organization make applications easy to write.

 Now, POSIXlt has methods for c() and vector subsetting [ (and many
 other vector-manipulation methods - see methods(class=POSIXlt)).
 Hence, from the point of view of intending to supply easy-to-use
 facilities ... [for] all sorts of data, isn't it a little
 incongruous that length() is not also provided -- as 3 functions (any
 others?) comprise a core set of vector-manipulation functions?

 Would it make sense to have an informal prescription (e.g., in
 R-exts) that a class that implements a vector-like object and
 provides at least of one of functions 'c', '[' and 'length' should
 provide all three?  It would also be easy to describe a test-suite
 that should be included in the 'test' directory of a package
 implementing such a class, that had some tests of the basic
 vector-manipulation functionality, such as:

  # at this point, x0, x1, x3,  x10 should exist, as vectors of the
  # class being tested, of length 0, 1, 3, and 10, and they should
  # contain no duplicate elements
  length(x0)
 [1] 1
  length(c(x0, x1))
 [1] 2
  length(c(x1,x10))
 [1] 11
  all(x3 == x3[seq(len=length(x3))])
 [1] TRUE
  all(x3 == c(x3[1], x3[2], x3[3]))
 [1] TRUE
  length(c(x3[2], x10[5:7]))
 [1] 4
 

 It would also be possible to describe a larger set of vector
 manipulation functions that should be implemented together, including
 e.g., 'rep', 'unique', 'duplicated', '==', 'sort', '[-', 'is.na',
 head, tail ... (many of which are provided for POSIXlt).

 Or is there some good reason that length() cannot be provided (while
 'c' and '[' can) for some vector-like classes such as POSIXlt?

 What you say sounds good in general, but the devil is in the details.
 Changing the meaning of length(x) for some objects has fairly
 widespread effects.  Are they all positive?  I don't know.

 Adding a prescription like the one you suggest would be good if it's
 easy to implement, but bad if it's already widely violated.  How many
 base or CRAN or Bioconductor packages violate it currently?   Do the
 ones that provide all 3 methods do so in a

Re: [Rd] Improvement of SignRank functions

Re: [Rd] Improvement of SignRank functions

Re: [Rd] Improvement of SignRank functions

Re: [Rd] S4 class extending data.frame?

Re: [Rd] Improvement of SignRank functions

Re: [Rd] List comprehensions for R

Re: [Rd] Wrong length of POSIXt vectors (PR#10507)

Re: [Rd] Wrong length of POSIXt vectors (PR#10507)

8 matches

Site Navigation

Mail list logo

Footer information