Re: [Rd] S4 class extending data.frame?

2007-12-15 Thread Ben Bolker


  Thanks, Martin.  In the short term (a) seems best.  In the long
run we may try (c), because there are other things that data.frame
doesn't do that we want it to do (i.e., allow arbitrary objects with
[ methods, print methods, and the same length to be bound together,
rather than being restricted to atomic vectors + Date/factor).

  cheers
Ben





Re: [Rd] S4 class extending data.frame?

2007-12-13 Thread Oleg Sklyar
Thanks for your comments. I cannot recall exactly when I last wanted to
inherit from a data.frame, but the fact was that I could not set the
data. Now that it has come up again, I do think it is unfortunate that
the data.frame structure does not follow the same principles as other
standard classes.

Regarding named lists: modifying .Data directly can backfire unless one
has thought through every aspect of the object. I ran into a similar
situation myself and have been very careful about such things ever since
(in my case it was in C, when creating an object with a names
attribute). The point is that names is an independent attribute, so when
working with .Data directly it is possible to end up with .Data and
names of different lengths. Thanks for pointing this out anyway.

Regards,
Oleg



Re: [Rd] S4 class extending data.frame?

2007-12-13 Thread Martin Morgan
Ben, Oleg --

Some solutions, which you've probably already thought of, are (a) move
the data.frame into its own slot, instead of extending it, (b) manage
the data.frame attributes yourself, or (c) reinvent the data.frame
from scratch as a proper S4 class (e.g., extending 'list' with
validity constraints on element length and homogeneity of element
content).

(b) depends heavily on understanding the data.frame implementation, and
is probably too tricky (for me) to get right. (c) is probably also
tricky, and probably carries significant performance overhead (e.g.,
object duplication during validity checking).
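
A minimal sketch of what the starting point for (c) might look like, assuming a hypothetical class name "FrameLike" and checking only element length (not homogeneity of element content). It is an illustration of a validity constraint on a class extending 'list', not a worked-out data.frame replacement:

```r
library(methods)

# Hypothetical class: a list whose elements must all have the same length.
setClass("FrameLike", contains = "list",
         validity = function(object) {
             # Length of each element stored in the data part of the object.
             lens <- vapply(object@.Data, length, integer(1))
             if (length(unique(lens)) > 1L)
                 "all elements must have the same length"
             else
                 TRUE
         })

# Passes the validity check: both elements have length 3.
ok <- new("FrameLike", list(1:3, c("a", "b", "c")))

# Fails the validity check; the error message is captured instead of thrown.
bad <- tryCatch(new("FrameLike", list(1:3, 1:2)),
                error = function(e) conditionMessage(e))
```

The validity function runs each time an object is created with arguments, which is where the overhead mentioned above would come from.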

(a) means that you don't get automatic method inheritance. On the plus
side, you still get the structure. It is trivial to implement methods
like [, [[, etc. that dispatch on your object and act on the appropriate
slot. And in some sense you now know which methods (i.e., those you've
implemented) are supported on your object.
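
A minimal sketch of (a), assuming a hypothetical class "AnnotatedFrame" with slots named "data" and "comment" (none of these names come from the thread). The [ and [[ methods simply forward to the data.frame held in the slot:

```r
library(methods)

# Hypothetical class: the data.frame lives in its own slot.
setClass("AnnotatedFrame",
         representation(data = "data.frame", comment = "character"))

# Forward `[` to the data slot. Missing indices select everything, and the
# result is always kept as a data.frame (the drop argument is ignored here).
setMethod("[", "AnnotatedFrame", function(x, i, j, ..., drop = TRUE) {
    if (missing(i)) i <- seq_len(nrow(x@data))
    if (missing(j)) j <- seq_len(ncol(x@data))
    x@data <- x@data[i, j, drop = FALSE]
    x
})

# Forward `[[` to the data slot to extract a single column.
setMethod("[[", "AnnotatedFrame", function(x, i, j, ...) x@data[[i]])

z <- new("AnnotatedFrame",
         data = data.frame(a = 1:3, b = 2:4),
         comment = "hello")

sub <- z[2:3, ]   # still an AnnotatedFrame, with two rows
sub@data
z[["b"]]
z@comment
```

Only the methods defined here work on the object; anything else has to go through z@data explicitly.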

Oleg, here's my cautionary tale for extending list, where manually
subsetting the .Data slot mixes up the names (callNextMethod would
have done the right thing, but was not appropriate). This was quite a
subtle bug for me, because I hadn't been expecting named lists in my
object; the problem surfaced when sapply used the (incorrectly subset)
names attribute of the list. My solution in this case was to make sure
'names' were removed from lists used to construct objects. As a
consequence I lose a nice little bit of sapply magic.

> setClass('A', 'list')
[1] "A"
> setMethod('[', 'A', function(x, i, j, ..., drop=TRUE) {
+     x@.Data <- x@.Data[i]
+     x
+ })
[1] "["
> names(new('A', list(x=1, y=2))[2])
[1] "x"
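
The same kind of mismatch can be reproduced with plain lists, without any S4 machinery. This is a minimal sketch, not a description of what the methods package does internally: it strips the attributes from a named list (standing in for the bare data part), subsets only the data, and then compares reusing the old names with subsetting the names by the same index. All variable names are illustrative.

```r
# A named list; the names are an attribute, stored separately from the elements.
v <- list(x = 1, y = 2)

# Strip all attributes to mimic working on the bare data part.
data_only <- v
attributes(data_only) <- NULL

# Subset only the data: this picks the element that was named "y".
picked <- data_only[2]

# Naive reattachment: reuse the old names, truncated to the new length.
naive <- picked
names(naive) <- names(v)[seq_along(naive)]

# Consistent reattachment: subset the names with the same index as the data.
fixed <- picked
names(fixed) <- names(v)[2]

names(naive)  # "x", although the value is the one that was named "y"
names(fixed)  # "y"
```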

Martin

Oleg Sklyar writes:

 I had the same problem. Generally data.frames behave like lists, but
 while you can extend list, there are problems extending the data.frame
 class. I guess this comes down to the internal representation of the
 object. Vectors, including list, keep their information in a (hidden)
 slot .Data (see the example below). data.frames do not seem to follow
 this convention.

 Any idea how to get around this?

 The following example is exactly the same as Ben's for a data.frame, but
 using a list. It works fine, and one can see that the list structure is
 stored in .Data.

 * ~: R
 R version 2.6.1 (2007-11-26) 
 > setClass("c3",representation(comment="character"),contains="list")
 [1] "c3"
 > l = list(1:3,2:4)
 > z3 = new("c3",l,comment="hello")
 > z3
 An object of class “c3”
 [[1]]
 [1] 1 2 3

 [[2]]
 [1] 2 3 4

 Slot "comment":
 [1] "hello"

 > z3@.Data
 [[1]]
 [1] 1 2 3

 [[2]]
 [1] 2 3 4

 Regards,
 Oleg

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel
 -- 
 Dr Oleg Sklyar * EBI-EMBL, Cambridge CB10 1SD, UK * +44-1223-494466


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.

[Rd] S4 class extending data.frame?

2007-12-12 Thread Ben Bolker

I would like to build an S4 class that extends
a data frame, but includes several more slots.

Here's an example using integer as the base
class instead:

setClass("c1",representation(comment="character"),contains="integer")
z1 = new("c1",55,comment="hello")
z1
z1+10
z1[1]
z1@comment

 -- in other words, it behaves exactly as an integer
for access and operations but happens to have another slot.

 If I do this with a data frame instead, it doesn't seem to work
at all.

setClass("c2",representation(comment="character"),contains="data.frame")
d = data.frame(1:3,2:4)
z2 = new("c2",d,comment="goodbye")
z2  ## data all gone!!
z2[,1]  ## Error ... object is not subsettable
z2@comment  ## still there

  I can achieve approximately the same effect by
adding attributes, but I was hoping for the structure
of S4 classes ...
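
For reference, a minimal sketch of the attribute-based workaround mentioned above. The attribute name "note" is an arbitrary choice for illustration; the object remains an ordinary data frame that happens to carry one extra attribute, with none of the checking that S4 slots would give:

```r
# An ordinary data frame with an extra, informally attached attribute.
d <- data.frame(a = 1:3, b = 2:4)
attr(d, "note") <- "goodbye"   # "note" is a hypothetical attribute name

d[, 1]             # normal data.frame subsetting still works
attr(d, "note")    # the extra information is retrieved by name
is.data.frame(d)   # still a plain data.frame
```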

  Programming with Data and the R Language Definition
contain 2 references each to data frames, and neither of
them has allowed me to figure out this behavior.

 (While I'm at it: it would be wonderful to have
a rich data frame that could include as a column
any object that had an appropriate length and
[ method ... has anyone done anything in this direction?
?data.frame says the allowable types are
 (numeric, logical, factor and character and so on),
 but I'm having trouble sorting out what the limitations
are ...)

  hoping for enlightenment (it would be lovely to be
shown how to make this work, but a definitive statement
that it is impossible would be useful too).

  cheers
Ben Bolker
