Re: [Rd] S4 class extending data.frame?

2007-12-13 Thread Oleg Sklyar
I had the same problem. Generally data.frames behave like lists, but
while you can extend list, there are problems extending the data.frame
class. I guess this comes down to the internal representation of the
object. Vectors, including list, keep their data in a (hidden) slot
.Data (see the example below). data.frames do not seem to follow this
convention.

Any idea how to work around this?

The following example is exactly the same as Ben's for a data.frame, but
using a list. It works fine and one can see that the list structure is
stored in .Data

* ~: R
R version 2.6.1 (2007-11-26) 
> setClass("c3",representation(comment="character"),contains="list")
[1] "c3"
> l = list(1:3,2:4)
> z3 = new("c3",l,comment="hello")
> z3
An object of class “c3”
[[1]]
[1] 1 2 3

[[2]]
[1] 2 3 4

Slot "comment":
[1] "hello"

> z3@.Data
[[1]]
[1] 1 2 3

[[2]]
[1] 2 3 4

Regards,
Oleg

On Thu, 2007-12-13 at 00:04 -0500, Ben Bolker wrote:
> I would like to build an S4 class that extends
> a data frame, but includes several more slots.
> 
> Here's an example using integer as the base
> class instead:
> 
> setClass("c1",representation(comment="character"),contains="integer")
> z1 = new("c1",55,comment="hello")
> z1
> z1+10
> z1[1]
> z1@comment
> 
>  -- in other words, it behaves exactly as an integer
> for access and operations but happens to have another slot.
> 
>  If I do this with a data frame instead, it doesn't seem to work
> at all.
> 
> setClass("c2",representation(comment="character"),contains="data.frame")
> d = data.frame(1:3,2:4)
> z2 = new("c2",d,comment="goodbye")
> z2  ## data all gone!!
> z2[,1]  ## Error ... object is not subsettable
> z2@comment  ## still there
> 
>   I can achieve approximately the same effect by
> adding attributes, but I was hoping for the structure
> of S4 classes ...
> 
>   Programming with Data and the R Language Definition
> contain 2 references each to data frames, and neither of
> them has allowed me to figure out this behavior.
> 
>  (While I'm at it: it would be wonderful to have
> a "rich data frame" that could include as a column
> any object that had an appropriate length and
> [ method ... has anyone done anything in this direction?
> ?data.frame says the allowable types are
>  "(numeric, logical, factor and character and so on)",
>  but I'm having trouble sorting out what the limitations
> are ...)
> 
>   hoping for enlightenment (it would be lovely to be
> shown how to make this work, but a definitive statement
> that it is impossible would be useful too).
> 
>   cheers
> Ben Bolker
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
-- 
Dr Oleg Sklyar * EBI-EMBL, Cambridge CB10 1SD, UK * +44-1223-494466



Re: [Rd] Adding a survival object to a data frame (PR#10510)

2007-12-13 Thread Prof Brian Ripley
Apparently this was Surv from package Design.
So the bug is in contributed package Design, and nothing to do with 
R-bugs.

On Wed, 12 Dec 2007, Peter Dalgaard wrote:

> [EMAIL PROTECTED] wrote:
>> Your example is not reproducible without 'library(survival)'.
>> When I include that, I get
>>
>>
>>> head(D,20)
>>>
>>   stime status  surv
>> 1   176   TRUE  176
>> ...
>>
>> Objects of class "Surv" are from the contributed package survival, and you
>> need that attached to deal with them properly.
>>
>>
>>
> ... but even if you detach it, you do not get the symptoms shown:
>
>> head(D,5)
>   stime status surv.time surv.status
> 1   176   TRUE       176           1
> 2    67   TRUE        67           1
> 3   432   TRUE       432           1
> 4    77   TRUE        77           1
> 5   275   TRUE       275           1
>
>
>> On Wed, 12 Dec 2007, [EMAIL PROTECTED] wrote:
>>
>>
>>> Full_Name: Edward McNeil
>>> Version: 2.6.1
>>> OS: Windows
>>> Submission from: (NULL) (203.170.234.5)
>>>
>>>
>>> I want to show students what the survival object looks like in R.
>>> Reproducible example:
>>>
>>> library(MASS)
>>> data(Aids2)
>>> attach(Aids2)
>>> status <- status=="D"
>>> stime <- death-diag
>>> surv <- Surv(stime, status)
>>> D <- data.frame(stime, status, surv)
>>> head(D,20)
>>>    stime status x..i..
>>> 1    176   TRUE   176
>>> 2     67   TRUE    67
>>> 3    432   TRUE   432
>>> 4     77   TRUE    77
>>> 5    275   TRUE   275
>>> 6    373   TRUE   373
>>> 7    389   TRUE   389
>>> 8   1027   TRUE  1027
>>> 9    492   TRUE   492
>>> 10   434   TRUE   434
>>> 11    16   TRUE    16
>>> 12   308   TRUE   308
>>> 13    92   TRUE    92
>>> 14   265   TRUE   265
>>> 15  1052  FALSE  1052+
>>> 16   132   TRUE   132
>>> 17   527   TRUE   527
>>> 18   581  FALSE   581+
>>> 19   511  FALSE   511+
>>> 20   151   TRUE   151
>>>
>>> detach(Aids2)
>>>
>>> The 'surv' column is strangely labelled 'x..i..'.


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [Rd] S4 class extending data.frame?

2007-12-13 Thread Oleg Sklyar
Thanks for your comments. I cannot recall now exactly when I last
wanted to inherit from a data.frame, but the fact was that I could not
set the data. Now that it has come up again, I think it is indeed
unfortunate that the data.frame structure does not follow the same
principles as other "standard" classes.

Regarding named lists, modifying .Data directly can play a bad joke on
you unless you think through all aspects of the object. I had a similar
situation as well and have been very careful about such things since
(I hit it in C when creating an object with a names attribute). The
thing is: names is an independent attribute, so when working on .Data
directly it is possible to leave it with a different length from names,
and so on. Thanks for pointing this out anyway.
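
To make the point about names concrete, here is a minimal sketch on a
plain named list (not an S4 object; the variable names are made up for
illustration). It treats the bare contents and the names as two separate
pieces, which is roughly the situation when .Data is handled directly,
and shows that both have to be subset with the same index to stay
aligned.

```r
# A plain named list; 'contents' stands in for the bare data part
# and 'nms' for the separately stored names attribute.
lst      <- list(x = 1, y = 2)
contents <- unname(lst)
nms      <- names(lst)

i <- 2

# If only the contents are subset, the names vector is untouched, so
# its first entry would wrongly label what was the second element.
stale_names <- nms[seq_along(contents[i])]

# Subsetting the names with the same index keeps the two in step.
sub <- contents[i]
names(sub) <- nms[i]
```

Here stale_names picks up the name of the first element even though the
second element was selected, while sub carries the matching name.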

Regards,
Oleg


On Thu, 2007-12-13 at 07:01 -0800, Martin Morgan wrote:
> Ben, Oleg --
> 
> Some solutions, which you've probably already thought of, are (a) move
> the data.frame into its own slot, instead of extending it, (b) manage
> the data.frame attributes yourself, or (c) reinvent the data.frame
> from scratch as a proper S4 class (e.g., extending 'list' with
> validity constraints on element length and homogeneity of element
> content).
> 
> (b) places a lot of dependence on understanding the data.frame
> implementation, and is probably too tricky (for me) to get right; (c)
> is probably also tricky, and probably carries significant performance
> overhead (e.g., object duplication during validity checking).
> 
> (a) means that you don't get automatic method inheritance. On the plus
> side, you still get the structure. It is trivial to implement methods
> like [, [[, etc. to dispatch on your object and act on the appropriate
> slot. And in some sense you now know which methods (i.e., those you've
> implemented) are supported on your object.
> 
> Oleg, here's my cautionary tale for extending list, where manually
> subsetting the .Data slot mixes up the names (callNextMethod would
> have done the right thing, but was not appropriate). This was quite a
> subtle bug for me, because I hadn't been expecting named lists in my
> object; the problem surfaced when sapply used the (incorrectly subset)
> names attribute of the list. My solution in this case was to make sure
> 'names' were removed from lists used to construct objects. As a
> consequence I lose a nice little bit of sapply magic.
> 
> > setClass('A', 'list')
> [1] "A"
> > setMethod('[', 'A', function(x, i, j, ..., drop=TRUE) {
> + x@.Data <- x@.Data[i]
> + x
> + })
> [1] "["
> > names(new('A', list(x=1, y=2))[2])
> [1] "x"
> 
> Martin
> 

Re: [Rd] S4 class extending data.frame?

2007-12-13 Thread Martin Morgan
Ben, Oleg --

Some solutions, which you've probably already thought of, are (a) move
the data.frame into its own slot, instead of extending it, (b) manage
the data.frame attributes yourself, or (c) reinvent the data.frame
from scratch as a proper S4 class (e.g., extending 'list' with
validity constraints on element length and homogeneity of element
content).

(b) places a lot of dependence on understanding the data.frame
implementation, and is probably too tricky (for me) to get right; (c)
is probably also tricky, and probably carries significant performance
overhead (e.g., object duplication during validity checking).

(a) means that you don't get automatic method inheritance. On the plus
side, you still get the structure. It is trivial to implement methods
like [, [[, etc. to dispatch on your object and act on the appropriate
slot. And in some sense you now know which methods (i.e., those you've
implemented) are supported on your object.
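
A minimal sketch of what option (a) might look like, under the
assumption that only '[' and dim() need forwarding. The class name
"AnnotatedFrame" and the slot name "data" are made up for illustration;
a real class would probably want more methods ([[, $, names, show, ...).

```r
library(methods)

# Hypothetical class: the data.frame lives in its own slot instead of
# being the parent class.
setClass("AnnotatedFrame",
         representation(data = "data.frame", comment = "character"))

# Forward '[' to the data slot and return an object of the same class,
# so the extra slot survives subsetting. 'drop' is accepted to match
# the generic but ignored: the result always wraps a data.frame.
setMethod("[", "AnnotatedFrame", function(x, i, j, ..., drop = TRUE) {
    rows <- if (missing(i)) seq_len(nrow(x@data)) else i
    cols <- if (missing(j)) seq_len(ncol(x@data)) else j
    x@data <- x@data[rows, cols, drop = FALSE]
    x
})

# Forward dim() so that nrow() and ncol() work on the wrapper too.
setMethod("dim", "AnnotatedFrame", function(x) dim(x@data))

z <- new("AnnotatedFrame",
         data = data.frame(a = 1:3, b = 2:4),
         comment = "goodbye")
sub <- z[2:3, "a"]
```

Here sub is still an AnnotatedFrame, its comment slot is unchanged, and
sub@data holds rows 2 and 3 of column a. Anything not explicitly
forwarded (printing, arithmetic, and so on) is simply not defined for
the wrapper, which is the trade-off described above.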

Oleg, here's my cautionary tale for extending list, where manually
subsetting the .Data slot mixes up the names (callNextMethod would
have done the right thing, but was not appropriate). This was quite a
subtle bug for me, because I hadn't been expecting named lists in my
object; the problem surfaced when sapply used the (incorrectly subset)
names attribute of the list. My solution in this case was to make sure
'names' were removed from lists used to construct objects. As a
consequence I lose a nice little bit of sapply magic.

> setClass('A', 'list')
[1] "A"
> setMethod('[', 'A', function(x, i, j, ..., drop=TRUE) {
+ x@.Data <- x@.Data[i]
+ x
+ })
[1] "["
> names(new('A', list(x=1, y=2))[2])
[1] "x"

Martin

Oleg Sklyar <[EMAIL PROTECTED]> writes:

> I had the same problem. Generally data.frames behave like lists, but
> while you can extend list, there are problems extending the data.frame
> class. I guess this comes down to the internal representation of the
> object. Vectors, including list, keep their data in a (hidden) slot
> .Data (see the example below). data.frames do not seem to follow this
> convention.
>
> Any idea how to work around this?
>
> The following example is exactly the same as Ben's for a data.frame, but
> using a list. It works fine and one can see that the list structure is
> stored in .Data
>
> * ~: R
> R version 2.6.1 (2007-11-26) 
>> setClass("c3",representation(comment="character"),contains="list")
> [1] "c3"
>> l = list(1:3,2:4)
>> z3 = new("c3",l,comment="hello")
>> z3
> An object of class “c3”
> [[1]]
> [1] 1 2 3
>
> [[2]]
> [1] 2 3 4
>
> Slot "comment":
> [1] "hello"
>
>> z3@.Data
> [[1]]
> [1] 1 2 3
>
> [[2]]
> [1] 2 3 4
>
> Regards,
> Oleg
>

[Rd] creating lagged variables

2007-12-13 Thread Antonio, Fabio Di Narzo
Hi all.
I'm looking for robust ways of building lagged variables in a dataset
with multiple individuals.

Consider a dataset with variables like the following:
##
set.seed(123)
d <- data.frame(id = rep(1:2, each=3), time=rep(1:3, 2), value=rnorm(6))
##
>d
  id time       value
1  1    1 -0.56047565
2  1    2 -0.23017749
3  1    3  1.55870831
4  2    1  0.07050839
5  2    2  0.12928774
6  2    3  1.71506499

I want to compute the lagged variable 'value(t-1)', taking subject id
into account.
My current effort produced the following:
##
my_lag <- function(dt, varname, timevarname='time', lag=1) {
    vname <- paste(varname, if(lag>0) '.' else '', lag, sep='')
    timevar <- dt[[timevarname]]
    dt[[vname]] <- dt[[varname]][match(timevar, timevar + lag)]
    dt
}
lag_by <- function(dt, idvarname='id', ...)
  do.call(rbind, by(dt, dt[[idvarname]], my_lag, ...))
##
With the previous data I get:

> lag_by(d, varname='value')
    id time       value     value.1
1.1  1    1 -0.56047565          NA
1.2  1    2 -0.23017749 -0.56047565
1.3  1    3  1.55870831 -0.23017749
2.4  2    1  0.07050839          NA
2.5  2    2  0.12928774  0.07050839
2.6  2    3  1.71506499  0.12928774

So that seems to work. However, I was wondering whether there is a
smarter/cleaner/more robust way to do the job. For instance, with the
above function I get data frame rows re-ordered as a side effect
(though this is of no concern in my current analysis)...
Any suggestions?

All the best,
Fabio.
-- 
Antonio, Fabio Di Narzo
Ph.D. student at
Department of Statistical Sciences
University of Bologna, Italy



Re: [Rd] Wrong length of POSIXt vectors (PR#10507)

2007-12-13 Thread Tony Plate
Duncan Murdoch wrote:
> On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote:
>> Full_Name: Petr Simecek
>> Version: 2.5.1, 2.6.1
>> OS: Windows XP
>> Submission from: (NULL) (195.113.231.2)
>>
>>
>> Several times I have found that the length of a POSIXt vector has not been
>> computed correctly.
>>
>> Example:
>>
>> tv<-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
>> ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
>> ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 
>> 12L), mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 
>> 13L), mon = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L, 
>> 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = c(1L, 
>> 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L, 
>> 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L, 
>> 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", 
>> "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
>> ), class = c("POSIXt", "POSIXlt"))
>>
>> print(tv)
>> # print 11 time points (right)
>>
>> length(tv)
>> # returns 9 (wrong)
> 
> tv is a list of length 9.  The answer is right, your expectation is wrong.
>> I have tried that on several computers with/without switching to English
>> locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched the help pages
>> but I cannot imagine how that could be OK.
> 
> See this in ?POSIXt:
> 
> Class '"POSIXlt"' is a named list of vectors...
> 
> You could define your own length measurement as
> 
> length.POSIXlt <- function(x) length(x$sec)
> 
> and you'll get the answer you expect, but be aware that length.XXX 
> methods are quite rare, and you may surprise some of your users.
> 

On the other hand, isn't the fact that length() currently always returns 9 
for POSIXlt objects likely to be a surprise to many users of POSIXlt?

The back of "The New S Language" says "Easy-to-use facilities allow you to 
organize, store and retrieve all sorts of data. ... S functions and data 
organization make applications easy to write."

Now, POSIXlt has methods for c() and vector subsetting "[" (and many other 
vector-manipulation methods - see methods(class="POSIXlt")).  Hence, from 
the point of view of intending to supply "easy-to-use facilities ... [for] 
all sorts of data", isn't it a little incongruous that length() is not also 
provided -- as these 3 functions (any others?) comprise a core set of 
vector-manipulation functions?

Would it make sense to have an informal prescription (e.g., in R-exts) that 
a class that implements a vector-like object and provides at least one of 
the functions 'c', '[' and 'length' should provide all three?  It would also 
be easy to describe a test suite that should be included in the 'tests' 
directory of a package implementing such a class, with some tests of 
the basic vector-manipulation functionality, such as:

 > # at this point, x0, x1, x3, & x10 should exist, as vectors of the
 > # class being tested, of length 0, 1, 3, and 10, and they should
 > # contain no duplicate elements
 > length(x0)
[1] 0
 > length(c(x0, x1))
[1] 1
 > length(c(x1,x10))
[1] 11
 > all(x3 == x3[seq(len=length(x3))])
[1] TRUE
 > all(x3 == c(x3[1], x3[2], x3[3]))
[1] TRUE
 > length(c(x3[2], x10[5:7]))
[1] 4
 >
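
The checks above could also be packaged as a function, so that the same
test can be run against any vector-like class. The following is only a
sketch: the function name check_vector_basics is made up, and Date is
used purely as a stand-in for a class that already supplies consistent
c, [ and length methods.

```r
# Hypothetical helper: x0, x1, x3 and x10 must be vectors of the class
# under test, of length 0, 1, 3 and 10, with no duplicate elements.
check_vector_basics <- function(x0, x1, x3, x10) {
    stopifnot(
        length(x0) == 0,
        length(c(x0, x1)) == 1,
        length(c(x1, x10)) == 11,
        all(x3 == x3[seq_len(length(x3))]),
        all(x3 == c(x3[1], x3[2], x3[3])),
        length(c(x3[2], x10[5:7])) == 4
    )
    invisible(TRUE)
}

# Example run with Date vectors as the class being tested.
days <- as.Date("2007-12-13") + 0:9
ok <- check_vector_basics(days[0], days[1], days[1:3], days)
```

A class whose length() does not agree with its c() and [ methods would
stop at the first failing condition.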

It would also be possible to describe a larger set of vector manipulation 
functions that should be implemented together, including e.g., 'rep', 
'unique', 'duplicated', '==', 'sort', '[<-', 'is.na', head, tail ... (many 
of which are provided for POSIXlt).

Or is there some good reason that length() cannot be provided (while 'c' 
and '[' can) for some vector-like classes such as "POSIXlt"?

-- Tony Plate

> Duncan Murdoch
> 



Re: [Rd] Wrong length of POSIXt vectors (PR#10507)

2007-12-13 Thread Duncan Murdoch
On 12/13/2007 1:59 PM, Tony Plate wrote:
> On the other hand, isn't the fact that length() currently always returns 9 
> for POSIXlt objects likely to be a surprise to many users of POSIXlt?
> 
> The back of "The New S Language" says "Easy-to-use facilities allow you to 
> organize, store and retrieve all sorts of data. ... S functions and data 
> organization make applications easy to write."
> 
> Now, POSIXlt has methods for c() and vector subsetting "[" (and many other 
> vector-manipulation methods - see methods(class="POSIXlt")).  Hence, from 
> the point of view of intending to supply "easy-to-use facilities ... [for] 
> all sorts of data", isn't it a little incongruous that length() is not also 
> provided -- as these 3 functions (any others?) comprise a core set of 
> vector-manipulation functions?
> 
> Would it make sense to have an informal prescription (e.g., in R-exts) that 
> a class that implements a vector-like object and provides at least one of 
> the functions 'c', '[' and 'length' should provide all three?  It would also 
> be easy to describe a test suite that should be included in the 'tests' 
> directory of a package implementing such a class, with some tests of 
> the basic vector-manipulation functionality, such as:
> 
>  > # at this point, x0, x1, x3, & x10 should exist, as vectors of the
>  > # class being tested, of length 0, 1, 3, and 10, and they should
>  > # contain no duplicate elements
>  > length(x0)
> [1] 0
>  > length(c(x0, x1))
> [1] 1
>  > length(c(x1,x10))
> [1] 11
>  > all(x3 == x3[seq(len=length(x3))])
> [1] TRUE
>  > all(x3 == c(x3[1], x3[2], x3[3]))
> [1] TRUE
>  > length(c(x3[2], x10[5:7]))
> [1] 4
>  >
> 
> It would also be possible to describe a larger set of vector manipulation 
> functions that should be implemented together, including e.g., 'rep', 
> 'unique', 'duplicated', '==', 'sort', '[<-', 'is.na', head, tail ... (many 
> of which are provided for POSIXlt).
> 
> Or is there some good reason that length() cannot be provided (while 'c' 
> and '[' can) for some vector-like classes such as "POSIXlt"?

What you say sounds good in general, but the devil is in the details. 
Changing the meaning of length(x) for some objects has fairly widespread 
effects.  Are they all positive?  I don't know.

Adding a prescription like the one you suggest would be good if it's 
easy to implement, but bad if it's already widely violated.  How many 
base or CRAN or Bioconductor packages violate it currently?   Do the 
ones that provide all 3 methods do so in a consistent way, i.e. does 
"length(x)" mean the same thing in all of them?

I agree that the current state is less than perfect, but making it 
better would really be a lot of work.  I suspect there are better ways 
to spend my time, so I'm not going to volunteer to do it.  I'm not even 
going to invite someone else to do it, or offer to review your work if 
you volunteer.  I think this falls into the class of "next time we write 
a language, let's handle this better" problems.

Duncan Murdoch



Re: [Rd] creating lagged variables

2007-12-13 Thread Gabor Grothendieck
The problem is the representation.

If we transform it into a zoo time series, z, with one
series per column and one time point per row then we
can just merge the series with its lag.

> DF <- data.frame(id = c(1, 1, 1, 2, 2, 2), time = c(1, 2,
+ 3, 1, 2, 3), value = c(-0.56047565, -0.23017749, 1.55870831,
+ 0.07050839, 0.12928774, 1.71506499))
>
> library(zoo)
> z <- do.call(merge, by(DF, DF$id, function(x) zoo(x$value, x$time)))
> merge(z, lag(z, -1))
         1.z        2.z 1.lag(z, -1) 2.lag(z, -1)
1 -0.5604756 0.07050839           NA           NA
2 -0.2301775 0.12928774   -0.5604756   0.07050839
3  1.5587083 1.71506499   -0.2301775   0.12928774
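
If the long format has to be kept, a base-R alternative is to match each
row against the row holding the previous time point for the same id.
This is only a sketch under the assumption that time takes integer-like
values, so that pasting id and time together gives a reliable key; the
names key and prev_key are made up for illustration.

```r
set.seed(123)
d <- data.frame(id = rep(1:2, each = 3), time = rep(1:3, 2), value = rnorm(6))

# Build one key per row from (id, time), and the key of the row that
# would hold the previous time point for the same id.
key      <- paste(d$id, d$time, sep = ":")
prev_key <- paste(d$id, d$time - 1, sep = ":")

# match() returns NA where no previous observation exists, which
# becomes NA in the lagged column. Row order is left unchanged.
d$value.1 <- d$value[match(prev_key, key)]
```

Because nothing is split and re-bound, the rows stay in their original
order and keep their original row names.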


On Dec 13, 2007 1:21 PM, Antonio, Fabio Di Narzo
<[EMAIL PROTECTED]> wrote:
> Hi all.
> I'm looking for robust ways of building lagged variables in a dataset
> with multiple individuals.
>
> Consider a dataset with variables like the following:
> ##
> set.seed(123)
> d <- data.frame(id = rep(1:2, each=3), time=rep(1:3, 2), value=rnorm(6))
> ##
> >d
>   id time       value
> 1  1    1 -0.56047565
> 2  1    2 -0.23017749
> 3  1    3  1.55870831
> 4  2    1  0.07050839
> 5  2    2  0.12928774
> 6  2    3  1.71506499
>
> I want to compute the lagged variable 'value(t-1)', taking subject id
> into account.
> My current effort produced the following:
> ##
> my_lag <- function(dt, varname, timevarname='time', lag=1) {
>     vname <- paste(varname, if(lag>0) '.' else '', lag, sep='')
>     timevar <- dt[[timevarname]]
>     dt[[vname]] <- dt[[varname]][match(timevar, timevar + lag)]
>     dt
> }
> lag_by <- function(dt, idvarname='id', ...)
>  do.call(rbind, by(dt, dt[[idvarname]], my_lag, ...))
> ##
> With the previous data I get:
>
> > lag_by(d, varname='value')
>     id time       value     value.1
> 1.1  1    1 -0.56047565          NA
> 1.2  1    2 -0.23017749 -0.56047565
> 1.3  1    3  1.55870831 -0.23017749
> 2.4  2    1  0.07050839          NA
> 2.5  2    2  0.12928774  0.07050839
> 2.6  2    3  1.71506499  0.12928774
>
> So that seems to work. However, I was wondering whether there is a
> smarter/cleaner/more robust way to do the job. For instance, with the
> above function I get data frame rows re-ordered as a side effect
> (though this is of no concern in my current analysis)...
> Any suggestions?
>
> All the best,
> Fabio.
> --
> Antonio, Fabio Di Narzo
> Ph.D. student at
> Department of Statistical Sciences
> University of Bologna, Italy
>
>



[Rd] End of whiskers of boxplots are repeated on PDF device (PR#10499)

2007-12-13 Thread Michael Toews
I've identified the problem for this issue, which is simple to fix. 
Please see and apply the attached patch. Thanks.

+mt

Index: src/library/graphics/R/boxplot.R
===
--- src/library/graphics/R/boxplot.R(revision 43677)
+++ src/library/graphics/R/boxplot.R(working copy)
@@ -141,8 +141,8 @@
xysegments(rep.int(x, 2), stats[c(1,5)],
   rep.int(x, 2), stats[c(2,4)],
   lty = whisklty[i], lwd = whisklwd[i], col = whiskcol[i])
-   xysegments(rep.int(xP(x, -wid * staplewex), 2), stats[c(1,5)],
-  rep.int(xP(x, +wid * staplewex), 2), stats[c(1,5)],
+   xysegments(rep.int(xP(x, -wid * staplewex[i]), 2), stats[c(1,5)],
+  rep.int(xP(x, +wid * staplewex[i]), 2), stats[c(1,5)],
   lty= staplelty[i], lwd= staplelwd[i], col= staplecol[i])
## finally the box borders
xypolygon(xx, yy, lty= boxlty[i], lwd= boxlwd[i], border= boxcol[i])