Thanks so much for your help.  I am realizing that I may be
over-complicating things for myself.  I have learned a ton about creating
methods, but I feel like I am trying to reinvent the data.frame class.
Basically, I am trying to create a data.frame type object where I can
enforce the header names and column data types.  I am trying to force the
user to setup the following fields:

   - event_number (character)
   - agency (factor)
   - unit_num (factor)
   - alarm (POSIXct)
   - priority (factor)

A user might use the following code:

event_number = c(1:5)
agency = c("CFD", rep("ACFR", 3), "CFD")
unit_num = c("E1", "T10", "E3", "E2", "BC1")
temp =  c("00:52:35", "06:58:18", "13:42:18", "20:59:45", "21:19:00")
alarm = as.POSIXct(strptime(temp, format="%H:%M:%S"))
priority = c("A", "E", "A", "C", "C")
data = data.frame(event_number=event_number, agency=agency,
unit_number=unit_num, alarm=alarm, priority=priority)

I have all sorts of functions that I am trying to incorporate into a package
for analyzing fire department data, but keep having problems with small
deviations in data format causing errors.  In this example, the following
might cause issues in my functions:

   - "event_number" should be of type character
   - "agency", "unit_number", and "priority", should be of type factor
   - "unit_number" should actually have name "unit_num"

Ideally, I would be able to extend either the actual data.frame class or
something similar, so that users are forced to create correctly formatted
data.frames ... something that would create error messages until the user
uses the following code:

data = data.frame(event_number=as.character(event_number),
agency=as.factor(agency), unit_num=as.factor(unit_num), alarm=alarm,
priority=as.factor(priority))

After a user has created a correctly formatted object, the user may need to
manipulate the data prior to applying the analysis functions.  For instance,
a user might just want to analyze data for Engine #1 (unit_num == "E1).
Because of the need to manipulate data, I am trying to maintain all the same
functionality as a data frame ... subset(), head(), [i,j], et cetera.

Just wondering if you think creating a new S4 class is the way to go.  So
far I got the head(), tail(), and subset() methods working for my new S4
class, but the "[" seems like a pretty big undertaking.  Is there something
easier you might recommend?  Would it be possible to extend the data.frame
class to include some data verifications?  If so, do you have some basic
pointers for setting something like that up?

Really appreciate all your help thus far.  Hopefully, one last advice email
will do the trick.  Thanks.
--Markus



On Mon, Feb 8, 2010 at 6:43 PM, Martin Morgan <mtmor...@fhcrc.org> wrote:

> On 02/08/2010 02:54 PM, Markus Weisner wrote:
> > Thanks.  Used getGeneric("[") to figure out the general format for the
> > setMethod, but am having some problem with how to set up the actual
> > function:
> >
> >> getGeneric("[")
> > standardGeneric for "[" defined from package "base"
> >
> > function (x, i, j, ..., drop = TRUE)
> > standardGeneric("[", .Primitive("["))
> > <environment: 0x116513c30>
> > Methods may be defined for arguments: x, i, j, drop
> > Use  showMethods("[")  for currently available ones.
> >
> > Based on this, I set up the following code:
> >
> > setClass("A", representation(a="numeric", b="numeric"))
> > data = new("A", a=1:10, b=1:10)
> > setMethod("[", "A",
> >         function(x, i, j, ..., drop) {
> >             slotnames <- slotNames(x)[j]
> >             new_ = new("A")
> >             for(slot in slotnames) new_d...@slot = x...@slot[i]
> >             new_data
> >         })
> > data[5,c("a")]
>
> probably there are several issues and covering them in an email response
> won't do them justice.
>
> instead of new_d...@slot, use slot(new_data, slot)
>
> "[" dispatches on four arguments, and likely the cases need to be
> handled differently (e.g., data[,"a"] vs. data[,TRUE] vs data[1,]). So
> you'll end up with methods
>
> setMethod("[", c("A", "missing", "character", "ANY"), ...
> setMethod("[", c("A", "missing", "logical", "ANY"), ...
> setMethod("[", c("A", "numeric", "missing", "ANY"), ...
>
> plus others, or you'll write something like
>
> setMethod("[", c("A", "missing", "ANY", "ANY"),
>           function(x, i, j, ..., drop=TRUE)
> {
>   if (is.character(j))
>       j <- match(j, slotNames(x))
>   j <- slotNames(x)[j]
>   ...
> })
>
> ("ANY" is implicit, it's unlikely you'll ever dispatch on 'drop', so a
> signature for "[" often omits teh fourth signature element). You'll aim
> for re-use, so likely the methods are all wrappers around some simple
> function .subset_A(x, i, j, drop) where i, j are the types that'll work.
>
> x...@slot <- value and slot(x, slot) <- value make (at least) one copy of x
> each time they're invoked, so your code above is making multiple copies
> of the data. One strategy is not to define an  'initialize' method and
> gain the benefit of the default method as a kind of copy constructor,
> along the lines of
>
>   initialize(x, a=slot(x, "a")[j], b=slot(x, "b")[j])
>
> if the subset were to be of slots a and b.
>
> You said your objective was to write a kind of enhanced data.frame, so
> maybe a 'cheap' way to get the functionality you're after would be
>
>  setClass("A", representation=representation(data="data.frame"))
>  setMethod("[", c("A", "ANY", "ANY"),
>             function(x, i, j, ..., drop=TRUE)
>   {
>       initialize(x, data=slot(x, "data")[i, j, drop=drop])
>  })
>
> Likely you'd have to write methods for when one or both of i, j were
> 'missing', and probably enforce drop=FALSE (since when drop=TRUE you
> won't get a data.frame, and invalidate your class). You could use
> setValidity to insist that the data.frame met your constraints.
>
> > data = new("A", data=data.frame(a=1:10, b=10:1))
> > data[1:5, 2, drop=FALSE]
> An object of class "A"
> Slot "data":
>   b
> 1 10
> 2  9
> 3  8
> 4  7
> 5  6
>
> There are a number of S4 packages that might provide good (or
> otherwise!) examples, including Matrix, IRanges (which has a DataFrame
> class, capable of holding anything with length() and [ defined) -- maybe
> that's what you'd like to extend?) and Biobase (IRanges and Biobase are
> in Bioconductor).
>
> Martin
>
> >
> > The problem is that I cannot access S4 object slots using @ and a
> character
> > variable.  I also cannot access a slot using the typical brackets since
> that
> > is what I am trying to define here.  Kind of stuck.  Thanks for any
> advice
> > you might have.
> > Best,
> > Markus
> >
> > On Mon, Feb 8, 2010 at 4:54 PM, Martin Morgan <mtmor...@fhcrc.org>
> wrote:
> >
> >> On 02/08/2010 01:22 PM, Markus Weisner wrote:
> >>> Worked like a charm!!  Thank you so much.  I just plugged the following
> >> into
> >>> my code ...
> >>>
> >>> setMethod("$", "CADresponses", function(x, name) slot(x, name))
> >>>
> >>> ... and it worked perfect.  If you don't mind, I have a quick follow up
> >>> question, using your example
> >>>
> >>> setClass("A", representation(a="numeric", b="numeric"))
> >>> setMethod("$", "A", function(x, name) slot(x, name))
> >>> data = new("A", a=1:10, b=1:10)
> >>> data$a[5] #now works thanks to your code
> >>> data$a[5] <- 200 #assignments do not work -- any ideas?
> >>
> >> same idea, but for "$<-"
> >>
> >>> setClass("A", representation(a="numeric"))
> >> [1] "A"
> >>> getGeneric("$<-")
> >> standardGeneric for "$<-" defined from package "base"
> >>
> >> function (x, name, value)
> >> standardGeneric("$<-", .Primitive("$<-"))
> >> <environment: 0x14c33a8>
> >> Methods may be defined for arguments: x, value
> >> Use  showMethods("$<-")  for currently available ones.
> >>> setReplaceMethod("$", "A", function(x, name, value) {
> >> +     slot(x, name) <- value
> >> +     x
> >> + })
> >> [1] "$<-"
> >>> a <- new("A", a=1:10)
> >>> a$a <- 10:1
> >>> a
> >> An object of class "A"
> >> Slot "a":
> >>  [1] 10  9  8  7  6  5  4  3  2  1
> >>
> >>> data[5,c("a")] = 200 #would also like this to work -- any ideas?
> >>>
> >>> Do you have any suggestions for getting assignments and brackets to
> work
> >> as
> >>> they would for data frames?  Thanks so much for your help.
> >>
> >> same approach, but using getGeneric("[") and getGeneric("[<-") to guide
> >> you.
> >>
> >> Martin
> >>
> >>> Best,
> >>> Markus
> >>>
> >>>
> >>>
> >>> On Mon, Feb 8, 2010 at 2:44 PM, Martin Morgan <mtmor...@fhcrc.org>
> >> wrote:
> >>>
> >>>> On 02/07/2010 08:31 PM, Markus Weisner wrote:
> >>>>> I created some S4 objects that are essentially data frame objects.
>  The
> >>>> S4
> >>>>> object definitions were necessary to verify data integrity and force
> a
> >>>>> standardized data format.  I am, however, finding myself redefining
> all
> >>>> the
> >>>>> typical generic functions so that I can still manipulate my S4
> objects
> >> as
> >>>> if
> >>>>> they were data frames ... I have used setMethod to set methods for
> >>>> "subset",
> >>>>> "head", and "tail".  I would like to use setMethod or setGeneric to
> >>>> enable
> >>>>> me to use object$slotname to access obj...@slotname for my S4
> objects.
> >>>>  Any
> >>>>> advice is appreciated.  Thanks.
> >>>>
> >>>> Hi Markus --
> >>>>
> >>>>> setClass("A", representation(a="numeric"))
> >>>> [1] "A"
> >>>>> new("A")$a
> >>>> Error in new("A")$a : $ operator not defined for this S4 class
> >>>>> getGeneric("$")
> >>>> standardGeneric for "$" defined from package "base"
> >>>>
> >>>> function (x, name)
> >>>> standardGeneric("$", .Primitive("$"))
> >>>> <environment: 0xa62028>
> >>>> Methods may be defined for arguments: x
> >>>> Use  showMethods("$")  for currently available ones.
> >>>>> setMethod("$", "A", function(x, name) slot(x, name))
> >>>> [1] "$"
> >>>>> new("A", a=1:10)$a
> >>>>  [1]  1  2  3  4  5  6  7  8  9 10
> >>>>> new("A", a=1:10)$b
> >>>> Error in slot(x, name) : no slot of name "b" for this object of class
> >> "A"
> >>>>
> >>>> does that help?
> >>>>
> >>>> Martin
> >>>>
> >>>>> --Markus
> >>>>>
> >>>>>       [[alternative HTML version deleted]]
> >>>>>
> >>>>> ______________________________________________
> >>>>> R-help@r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>> PLEASE do read the posting guide
> >>>> http://www.R-project.org/posting-guide.html
> >>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>
> >>>>
> >>>> --
> >>>> Martin Morgan
> >>>> Computational Biology / Fred Hutchinson Cancer Research Center
> >>>> 1100 Fairview Ave. N.
> >>>> PO Box 19024 Seattle, WA 98109
> >>>>
> >>>> Location: Arnold Building M1 B861
> >>>> Phone: (206) 667-2793
> >>>>
> >>>
> >>
> >>
> >> --
> >> Martin Morgan
> >> Computational Biology / Fred Hutchinson Cancer Research Center
> >> 1100 Fairview Ave. N.
> >> PO Box 19024 Seattle, WA 98109
> >>
> >> Location: Arnold Building M1 B861
> >> Phone: (206) 667-2793
> >>
> >
>
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to