seq_along(x), integer(length(x)), is.na(x), or anything that produces an integer (or numeric or logical) vector the length of x would work. I use integer() or numeric() to indicate I'm not using its value: it is just a vector in which to place the return values of FUN().
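A minimal sketch of the point above, reusing the char and fac vectors from the examples quoted below. The data frame dat with columns grp and value, and the names counts_int, counts_seq, counts_chr and keep, are made up for illustration and are not from the poster's data:

```r
# Grouping vectors with the same values as the examples in this thread
char <- c("Two", "Three", "Two", "Two")
fac  <- factor(char, levels = c("One", "Two", "Three"))

# The first argument is only a container for FUN's return values,
# so any integer vector of the right length gives the same counts.
counts_int <- ave(integer(length(fac)), fac, FUN = length)   # 3 1 3 3
counts_seq <- ave(seq_along(fac), fac, FUN = length)
counts_chr <- ave(integer(length(char)), char, FUN = length)

# Hypothetical data frame: keep rows whose group has at least two members
dat  <- data.frame(grp = char, value = c(0.1, 0.2, 0.3, 0.4))
keep <- ave(integer(nrow(dat)), dat$grp, FUN = length) >= 2
dat[keep, ]
```

Because the placeholder is already an integer vector, the counts come back as integers whether the grouping variable is numeric, character, or a factor.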
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: arun [mailto:smartpink...@yahoo.com]
> Sent: Thursday, October 17, 2013 2:33 PM
> To: R help
> Cc: William Dunlap; Bert Gunter
> Subject: Re: [R] Subseting a data.frame
>
> Hi Bill,
>
> #seq_along() worked in the cases you showed.
>
> ave(seq_along(fac),fac,FUN=length)
> #[1] 3 1 3 3
> ave(seq_along(num), num, FUN=length)
> #[1] 3 1 3 3
> ave(seq_along(char), char, FUN=length)
> #[1] 3 1 3 3
>
> I thought there might be some advantage in speed, but they were similar in speed.
>
> set.seed(195)
> num1 <- sample(1e3,1e7,replace=TRUE)
> system.time(res1 <- ave(integer(length(num1)),num1,FUN=length))
> #   user  system elapsed
> #  4.148   0.228   4.382
> system.time(res2 <- ave(seq_along(num1),num1,FUN=length))
> #   user  system elapsed
> #  3.944   0.228   4.181
> system.time(res3 <- ave(num1,num1,FUN=length))
> #   user  system elapsed
> #  3.740   0.264   4.012
> identical(res1,res2)
> #[1] TRUE
> identical(res2,res3)
> #[1] TRUE
>
> A.K.
>
> On Thursday, October 17, 2013 4:34 PM, William Dunlap <wdun...@tibco.com> wrote:
>
> May I ask why:
>   count_by_class <- with(dat, ave(numeric(length(basel_asset_class)), basel_asset_class, FUN=length))
> should not be more simply done as:
>   count_by_class <- with(dat, ave(basel_asset_class, basel_asset_class, FUN=length))
>
> The way I did it would work if basel_asset_class were non-numeric.
> In ave(x, group, FUN=FUN), FUN's return value should be the same type as x (or you can get some odd type conversions).
> E.g.,
>
>    > num <- c(2,3,2,2) ; char <- c("Two","Three","Two","Two")
>    > ave(num, num, FUN=length) # good
>    [1] 3 1 3 3
>    > ave(char, char, FUN=length) # bad
>    [1] "3" "1" "3" "3"
>    > fac <- factor(char, levels=c("One","Two","Three"))
>    > ave(fac, fac, FUN=length)
>    [1] <NA> <NA> <NA> <NA>
>    Levels: One Two Three
>    Warning messages:
>    1: In `[<-.factor`(`*tmp*`, i, value = 0L) :
>      invalid factor level, NA generated
>    2: In `[<-.factor`(`*tmp*`, i, value = 3L) :
>      invalid factor level, NA generated
>    3: In `[<-.factor`(`*tmp*`, i, value = 1L) :
>      invalid factor level, NA generated
>
> but x=integer(length(group)) works in all cases:
>
>    > ave(integer(length(fac)), fac, FUN=length)
>    [1] 3 1 3 3
>    > ave(integer(length(char)), char, FUN=length)
>    [1] 3 1 3 3
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> From: Bert Gunter [mailto:gunter.ber...@gene.com]
> Sent: Thursday, October 17, 2013 1:06 PM
> To: William Dunlap
> Cc: Katherine Gobin; r-help@r-project.org
> Subject: Re: [R] Subseting a data.frame
>
> May I ask why:
>
>   count_by_class <- with(dat, ave(numeric(length(basel_asset_class)), basel_asset_class, FUN=length))
>
> should not be more simply done as:
>
>   count_by_class <- with(dat, ave(basel_asset_class, basel_asset_class, FUN=length))
>
> ?
>
> -- Bert
>
> On Thu, Oct 17, 2013 at 12:36 PM, William Dunlap <wdun...@tibco.com> wrote:
>
> > What I need is to select only those records for which there are more than two default frequencies (defa_frequency),
>
> Here is one way.
> There are many others:
>
>    > dat <- data.frame( # slightly less trivial example
>          basel_asset_class=c(4,8,8,8,74,3,74),
>          defa_frequency=(1:7)/8)
>    > count_by_class <- with(dat, ave(numeric(length(basel_asset_class)), basel_asset_class, FUN=length))
>    > cbind(dat, count_by_class) # see what we just computed
>      basel_asset_class defa_frequency count_by_class
>    1                 4          0.125              1
>    2                 8          0.250              3
>    3                 8          0.375              3
>    4                 8          0.500              3
>    5                74          0.625              2
>    6                 3          0.750              1
>    7                74          0.875              2
>    > dat[count_by_class>1, ] # I think this is what you are asking for
>      basel_asset_class defa_frequency
>    2                 8          0.250
>    3                 8          0.375
>    4                 8          0.500
>    5                74          0.625
>    7                74          0.875
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Katherine Gobin
> > Sent: Thursday, October 17, 2013 11:05 AM
> > To: Bert Gunter
> > Cc: r-help@r-project.org
> > Subject: Re: [R] Subseting a data.frame
> >
> > Correction. (2nd para first three lines)
> >
> > Please read the following line
> >
> > What I need is to select only those records for which there are more than two default frequencies (defa_frequency), Thus, there is only one default frequency = 0.150 w.r.t basel_asset_class = 4 whereas there are default frequencies w.r.t. basel asset class 4,
> >
> > as
> >
> > What I need is to select only those records for which there are more than two default frequencies (defa_frequency), Thus, there is only one default frequency = 0.150 w.r.t basel_asset_class = 4 whereas there are THREE default frequencies w.r.t. basel asset class 8,
> >
> > I apologize for the inconvenience.
> >
> > Regards
> >
> > Katherine
> >
> > On , Katherine Gobin <katherine_go...@yahoo.com> wrote:
> >
> > I am sorry, perhaps I was not able to put the question properly. I am not looking for the subset of the data.frame where the basel_asset_class is > 2. I do agree that would have been a basic requirement. Let me try to put the question again.
> >
> > I have a data frame as
> >
> > mydat = data.frame(basel_asset_class = c(4, 8, 8 ,8), defa_frequency = c(0.15, 0.07, 0.03, 0.001))
> >
> > # Please note I have changed the basel_asset_class to 4 from 2, to avoid confusion.
> >
> > > mydat
> >   basel_asset_class defa_frequency
> > 1                 4          0.150
> > 2                 8          0.070
> > 3                 8          0.030
> > 4                 8          0.001
> >
> > This is just a representative example. In reality, I may have a number of basel asset classes. 4, 8 etc. are the IDs and can be anything, thus I can't hard code it as subset(mydat, mydat$basel_asset_class > 2).
> >
> > What I need is to select only those records for which there are more than two default frequencies (defa_frequency), Thus, there is only one default frequency = 0.150 w.r.t basel_asset_class = 4 whereas there are default frequencies w.r.t. basel asset class 4, similarly there could be another basel asset class having say 5 default frequencies. Thus, I need to take subset of the data.frame s.t. the no of corresponding defa_frequencies is greater than 2.
> >
> > The idea is we try to fit exponential curve Y = A exp( BX ) for each of the basel asset classes and to estimate values of A and B, mathematically one needs to have at least two values of X.
> >
> > I hope I may be able to express my requirement. It's not that I need the subset of mydat s.t. basel asset class is > 2 (now 4 in revised example), but subset s.t. no of default frequencies is greater than or equal to 2.
> > This 2 is not same as basel asset class 2.
> >
> > Kindly guide
> >
> > With warm regards
> >
> > Katherine Gobin
> >
> > On Thursday, 17 October 2013 9:33 PM, Bert Gunter <gunter.ber...@gene.com> wrote:
> >
> > "Kindly guide" ...
> >
> > This is a very basic question, so the kindest guide I can give is to read an Introduction to R (ships with R) or an R web tutorial of your choice so that you can learn how R works instead of posting to this list.
> >
> > Cheers,
> > Bert
> >
> > On Wed, Oct 16, 2013 at 11:55 PM, Katherine Gobin <katherine_go...@yahoo.com> wrote:
> >
> > >Dear Forum,
> > >
> > >I have a data frame as
> > >
> > >mydat = data.frame(basel_asset_class = c(2, 8, 8 ,8), defa_frequency = c(0.15, 0.07, 0.03, 0.001))
> > >
> > >> mydat
> > >  basel_asset_class defa_frequency
> > >1                 2          0.150
> > >2                 8          0.070
> > >3                 8          0.030
> > >4                 8          0.001
> > >
> > >I need to get the subset of this data.frame where no of records for the given basel_asset_class is > 2, i.e. I need to obtain subset of above data.frame as (since there is only 1 record, against basel_asset_class = 2, I want to filter it)
> > >
> > >> mydat_a
> > >  basel_asset_class defa_frequency
> > >1                 8          0.070
> > >2                 8          0.030
> > >3                 8          0.001
> > >
> > >Kindly guide
> > >
> > >Katherine
> > >
> > >______________________________________________
> > >R-help@r-project.org mailing list
> > >https://stat.ethz.ch/mailman/listinfo/r-help
> > >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >and provide commented, minimal, self-contained, reproducible code.
> >
> > --
> > Bert Gunter
> > Genentech Nonclinical Biostatistics
> > (650) 467-7374