Re: [R] Subseting a data.frame

William Dunlap Thu, 17 Oct 2013 14:42:20 -0700

seq_along(x), integer(length(x)), is.na(x), or anything that produces an integer
(or numeric or logical) vector the length of x would work.  I use integer() or
numeric() to indicate I'm not using its value: it is just a vector in which to
place the return values of FUN().
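
A small sketch of that point, reusing the example vectors and data frame from
the quoted messages below. The names n_int, n_seq and kept are made up here
for illustration, and this is only one way to write it:

```r
# Vectors and data frame copied from the examples quoted in this thread
char <- c("Two", "Three", "Two", "Two")
fac  <- factor(char, levels = c("One", "Two", "Three"))
dat  <- data.frame(basel_asset_class = c(4, 8, 8, 8, 74, 3, 74),
                   defa_frequency = (1:7) / 8)

# The first argument of ave() is only a container for FUN's results,
# so any integer vector of the right length gives the same counts.
n_int <- ave(integer(length(fac)), fac, FUN = length)
n_seq <- ave(seq_along(fac), fac, FUN = length)
print(n_int)

# Same idea on the data frame: keep classes that have more than one record
count_by_class <- with(dat, ave(integer(nrow(dat)), basel_asset_class,
                                FUN = length))
kept <- dat[count_by_class > 1, ]
print(kept)
```

Using integer() rather than numeric() just keeps the counts as integers;
either placeholder should do.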


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: arun [mailto:smartpink...@yahoo.com]
> Sent: Thursday, October 17, 2013 2:33 PM
> To: R help
> Cc: William Dunlap; Bert Gunter
> Subject: Re: [R] Subseting a data.frame
> 
> Hi Bill,
> 
> #seq_along() worked in the cases you showed.
> 
>  ave(seq_along(fac),fac,FUN=length)
> #[1] 3 1 3 3
>   ave(seq_along(num), num, FUN=length)
> #[1] 3 1 3 3
>   ave(seq_along(char), char, FUN=length)
> #[1] 3 1 3 3
> 
> 
> 
> I thought there might be some speed advantage, but the timings were similar.
> set.seed(195)
>  num1 <- sample(1e3,1e7,replace=TRUE)
>  system.time(res1 <- ave(integer(length(num1)),num1,FUN=length))
>   # user  system elapsed
>   #4.148   0.228   4.382
> system.time(res2 <- ave(seq_along(num1),num1,FUN=length))
> #   user  system elapsed
>  # 3.944   0.228   4.181
> system.time(res3 <- ave(num1,num1,FUN=length))
> #   user  system elapsed
>  # 3.740   0.264   4.012
> identical(res1,res2)
> #[1] TRUE
>  identical(res2,res3)
> #[1] TRUE
> 
> 
> A.K.
> 
> 
> 
> 
> On Thursday, October 17, 2013 4:34 PM, William Dunlap <wdun...@tibco.com> 
> wrote:
>   May I ask why:
>     count_by_class <- with(dat, ave(numeric(length(basel_asset_class)),
>                            basel_asset_class, FUN=length))
>   should not be more simply done as:
>     count_by_class <- with(dat, ave(basel_asset_class, basel_asset_class,
>                            FUN=length))
> 
> The way I did it would work if basel_asset_class were non-numeric.
> In ave(x, group, FUN=FUN), FUN's return value should be the same type as x (or
> you can get some odd type conversions).  E.g.,
> 
>    > num <- c(2,3,2,2) ;  char <- c("Two","Three","Two","Two")
>    > ave(num, num, FUN=length) # good
>    [1] 3 1 3 3
>    > ave(char, char, FUN=length) # bad
>    [1] "3" "1" "3" "3"
>    > fac <- factor(char, levels=c("One","Two","Three"))
>    > ave(fac, fac, FUN=length)
>    [1] <NA> <NA> <NA> <NA>
>    Levels: One Two Three
>    Warning messages:
>    1: In `[<-.factor`(`*tmp*`, i, value = 0L) :
>      invalid factor level, NA generated
>    2: In `[<-.factor`(`*tmp*`, i, value = 3L) :
>      invalid factor level, NA generated
>    3: In `[<-.factor`(`*tmp*`, i, value = 1L) :
>      invalid factor level, NA generated
> but x=integer(length(group)) works in all cases:
>    > ave(integer(length(fac)), fac, FUN=length)
>    [1] 3 1 3 3
>    > ave(integer(length(char)), char, FUN=length)
>       [1] 3 1 3 3
> 
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
> 
> From: Bert Gunter [mailto:gunter.ber...@gene.com]
> Sent: Thursday, October 17, 2013 1:06 PM
> To: William Dunlap
> Cc: Katherine Gobin; r-help@r-project.org
> Subject: Re: [R] Subseting a data.frame
> 
> May I ask why:
> 
> count_by_class <- with(dat, ave(numeric(length(basel_asset_class)),
> basel_asset_class, FUN=length))
> should not be more simply done as:
> 
> count_by_class <- with(dat, ave(basel_asset_class, basel_asset_class, 
> FUN=length))
> 
> ?
> -- Bert
> 
> On Thu, Oct 17, 2013 at 12:36 PM, William Dunlap
> <wdun...@tibco.com> wrote:
> > What I need is to select only those records for which there are more than 
> > two default
> > frequencies (defa_frequency),
> 
> Here is one way.  There are many others:
>    > dat <- data.frame( # slightly less trivial example
>         basel_asset_class=c(4,8,8,8,74,3,74),
>         defa_frequency=(1:7)/8)
>    > count_by_class <- with(dat, ave(numeric(length(basel_asset_class)),
> basel_asset_class, FUN=length))
>    > cbind(dat, count_by_class) # see what we just computed
>      basel_asset_class defa_frequency count_by_class
>    1                 4          0.125              1
>    2                 8          0.250              3
>    3                 8          0.375              3
>    4                 8          0.500              3
>    5                74          0.625              2
>    6                 3          0.750              1
>    7                74          0.875              2
>    > dat[count_by_class>1, ] # I think this is what you are asking for
>      basel_asset_class defa_frequency
>    2                 8          0.250
>    3                 8          0.375
>    4                 8          0.500
>    5                74          0.625
>    7                74          0.875
> 
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
> 
> 
> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> > On Behalf Of Katherine Gobin
> > Sent: Thursday, October 17, 2013 11:05 AM
> > To: Bert Gunter
> > Cc: r-help@r-project.org
> > Subject: Re: [R] Subseting a data.frame
> >
> > Correction (2nd paragraph, first three lines).
> >
> > Please read the following lines
> >
> > What I need is to select only those records for which there are more than
> > two default frequencies (defa_frequency). Thus, there is only one default
> > frequency = 0.150 w.r.t. basel_asset_class = 4 whereas there are default
> > frequencies w.r.t. basel asset class 4,
> >
> >
> > as
> >
> > What I need is to select only those records for which there are more than
> > two default frequencies (defa_frequency). Thus, there is only one default
> > frequency = 0.150 w.r.t. basel_asset_class = 4 whereas there are THREE
> > default frequencies w.r.t. basel asset class 8,
> >
> >
> >
> > I apologize for the inconvenience.
> >
> > Regards
> >
> > Katherine
> >
> >
> >
> >
> >
> >
> >
> >
> > On , Katherine Gobin <katherine_go...@yahoo.com> wrote:
> >
> > I am sorry, perhaps I was not able to put the question properly. I am not
> > looking for the subset of the data.frame where the basel_asset_class is > 2.
> > I do agree that would have been a basic requirement. Let me try to put the
> > question again.
> >
> > I have a data frame as
> >
> > mydat = data.frame(basel_asset_class = c(4, 8, 8, 8),
> >                    defa_frequency = c(0.15, 0.07, 0.03, 0.001))
> >
> > # Please note I have changed the basel_asset_class to 4 from 2, to avoid 
> > confusion.
> >
> > > mydat
> >   basel_asset_class defa_frequency
> > 1                 4          0.150
> > 2                 8          0.070
> > 3                 8          0.030
> > 4                 8          0.001
> >
> >
> >
> > This is just a representative example. In reality, I may have any number of
> > basel asset classes. 4, 8 etc. are IDs and can be anything, thus I can't hard
> > code it as subset(mydat, mydat$basel_asset_class > 2).
> >
> >
> > What I need is to select only those records for which there are more than
> > two default frequencies (defa_frequency). Thus, there is only one default
> > frequency = 0.150 w.r.t. basel_asset_class = 4 whereas there are default
> > frequencies w.r.t. basel asset class 4, similarly there could be another
> > basel asset class having say 5 default frequencies. Thus, I need to take
> > the subset of the data.frame s.t. the no. of corresponding defa_frequencies
> > is greater than 2.
> >
> > The idea is that we try to fit an exponential curve Y = A exp( BX ) for each
> > of the basel asset classes, and to estimate the values of A and B one
> > mathematically needs to have at least two values of X.
> >
> > I hope I have been able to express my requirement. It's not that I need the
> > subset of mydat s.t. basel asset class is > 2 (now 4 in the revised example),
> > but the subset s.t. the no. of default frequencies is greater than or equal
> > to 2. This 2 is not the same as basel asset class 2.
> >
> > Kindly guide
> >
> > With warm regards
> >
> > Katherine Gobin
> >
> >
> >
> >
> > On Thursday, 17 October 2013 9:33 PM, Bert Gunter
> > <gunter.ber...@gene.com> wrote:
> >
> > "Kindly guide" ...
> >
> > This is a very basic question, so the kindest guide I can give is to read
> > An Introduction to R (ships with R) or an R web tutorial of your choice so
> > that you can learn how R works instead of posting to this list.
> >
> > Cheers,
> > Bert
> >
> >
> >
> >
> > On Wed, Oct 16, 2013 at 11:55 PM, Katherine Gobin
> > <katherine_go...@yahoo.com> wrote:
> >
> > Dear Forum,
> > >
> > >I have a data frame as
> > >
> > >mydat = data.frame(basel_asset_class = c(2, 8, 8, 8),
> > >                   defa_frequency = c(0.15, 0.07, 0.03, 0.001))
> > >
> > >> mydat
> > >  basel_asset_class defa_frequency
> > >1                 2          0.150
> > >2                 8          0.070
> > >3                 8          0.030
> > >4                 8          0.001
> > >
> > >
> > >I need to get the subset of this data.frame where the no. of records for the
> > >given basel_asset_class is > 2, i.e. I need to obtain the subset of the above
> > >data.frame as (since there is only 1 record against basel_asset_class = 2, I
> > >want to filter it out)
> > >
> > >> mydat_a
> > >  basel_asset_class defa_frequency
> > >1                 8          0.070
> > >2                 8          0.030
> > >3                 8          0.001
> > >
> > >Kindly guide
> > >
> > >Katherine
> > >
> > >
> > >______________________________________________
> > >R-help@r-project.org mailing list
> > >https://stat.ethz.ch/mailman/listinfo/r-help
> > >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> >
> >
> > --
> >
> > Bert Gunter
> > Genentech Nonclinical Biostatistics
> >
> > (650) 467-7374
> 
> 
> 
> 
> --
> 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 
> (650) 467-7374
> 
> 
> 

