Re: [R] Subseting a data.frame
Dear sir,

Thanks a lot for your guidance. I have benefited immensely from this discussion. Thanks again.

Regards,
Katherine

On Friday, 18 October 2013 2:50 AM, Bert Gunter wrote:
> Thanks, Bill. But ?ave specifically says:
>
> ave(x, ..., FUN = mean)
>
> Arguments:
> x    A numeric.
>
> So it should not be expected to work properly if the argument is not
> (coercible to) numeric. Nevertheless, defensive programming is always wise.
>
> Cheers,
> Bert
Re: [R] Subseting a data.frame
> -----Original Message-----
> > mydat
>   basel_asset_class defa_frequency
> 1                 2          0.150
> 2                 8          0.070
> 3                 8          0.030
> 4                 8          0.001
>
> I need to get the subset of this data.frame where no of records for the
> given basel_asset_class is > 2,

Maybe something like

subset(mydat, ave(1:nrow(mydat), basel_asset_class, FUN=length) > 2)

?

S Ellison

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
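A minimal, self-contained sketch of the suggestion above, run on the data frame from the original question. It assumes the grouping column is spelled basel_asset_class, as in mydat, and uses seq_len(nrow(mydat)) purely as a placeholder vector; mydat_a is the name the original poster used for the desired result.

```r
mydat <- data.frame(basel_asset_class = c(2, 8, 8, 8),
                    defa_frequency    = c(0.15, 0.07, 0.03, 0.001))

# ave() returns, for every row, the number of rows that share its class;
# subset() then keeps the rows whose class occurs more than twice.
mydat_a <- subset(mydat,
                  ave(seq_len(nrow(mydat)), basel_asset_class, FUN = length) > 2)
mydat_a
```

Because the count is computed per row, the class IDs themselves never need to be hard coded.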
Re: [R] Subseting a data.frame
> -----Original Message-----
> ... the kindest guide I can give is to
> read an Introduction to R (ships with R) or a R web tutorial of your choice

No quibble with the advice, but it prompted me to look again at the R Intro.

Interestingly, the Intro doesn't mention subset() at all; the subsetting operations referred to there are all based on indexing (mostly because that section is intended to be about indexing, of course). Subsetting using subset() is perhaps the most natural way of subsetting data frames; perhaps a line or two and an example could usefully be included in the 'Working with data frames' section of the R Intro?

S Ellison
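To make the comparison concrete, here is a small sketch of the two styles side by side on the thread's example data. The 0.1 threshold is an arbitrary value chosen for illustration, not something from the thread.

```r
mydat <- data.frame(basel_asset_class = c(4, 8, 8, 8),
                    defa_frequency    = c(0.15, 0.07, 0.03, 0.001))

# Indexing: the data frame name is repeated inside the brackets.
by_index <- mydat[mydat$defa_frequency < 0.1, ]

# subset(): column names are looked up inside the data frame.
by_subset <- subset(mydat, defa_frequency < 0.1)

all.equal(by_index, by_subset)
```

One difference worth knowing: when the condition evaluates to NA for a row, subset() drops that row, whereas logical indexing returns a row of NAs.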
Re: [R] Subseting a data.frame
seq_along(x), integer(length(x)), is.na(x), or anything that produces an integer (or numeric or logical) vector the length of x would work. I use integer() or numeric() to indicate I'm not using its value: it is just a vector in which to place the return values of FUN().

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: arun
> Sent: Thursday, October 17, 2013 2:33 PM
> To: R help
> Cc: William Dunlap; Bert Gunter
> Subject: Re: [R] Subseting a data.frame
>
> Hi Bill,
>
> #seq_along() worked in the cases you showed.
>
> I thought there might be some advantages in speed, but they were similar in
> speed.
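A short sketch of that point: three different placeholder vectors, all the same length as the grouping vector, give the same per-group counts. The variable names counts_int, counts_seq and counts_lgl are made up for this illustration.

```r
char <- c("Two", "Three", "Two", "Two")

# The values of the first argument are never used by FUN = length;
# the vector only provides slots for the per-group results.
counts_int <- ave(integer(length(char)), char, FUN = length)
counts_seq <- ave(seq_along(char),       char, FUN = length)
counts_lgl <- ave(is.na(char),           char, FUN = length)

rbind(counts_int, counts_seq, counts_lgl)
```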
Re: [R] Subseting a data.frame
Hi Bill,

#seq_along() worked in the cases you showed.
ave(seq_along(fac),fac,FUN=length)
#[1] 3 1 3 3
ave(seq_along(num), num, FUN=length)
#[1] 3 1 3 3
ave(seq_along(char), char, FUN=length)
#[1] 3 1 3 3

I thought there might be some advantages in speed, but they were similar in speed.

set.seed(195)
num1 <- sample(1e3,1e7,replace=TRUE)
system.time(res1 <- ave(integer(length(num1)),num1,FUN=length))
#   user  system elapsed
#  4.148   0.228   4.382
system.time(res2 <- ave(seq_along(num1),num1,FUN=length))
#   user  system elapsed
#  3.944   0.228   4.181
system.time(res3 <- ave(num1,num1,FUN=length))
#   user  system elapsed
#  3.740   0.264   4.012
identical(res1,res2)
#[1] TRUE
identical(res2,res3)
#[1] TRUE

A.K.

On Thursday, October 17, 2013 4:34 PM, William Dunlap wrote:
> The way I did it would work if basel_asset_class were non-numeric.
> In ave(x, group, FUN=FUN), FUN's return value should be the same type as x
> (or you can get some odd type conversions).
Re: [R] Subseting a data.frame
Thanks, Bill. But ?ave specifically says:

ave(x, ..., FUN = mean)

Arguments:
x    A numeric.

So it should not be expected to work properly if the argument is not (coercible to) numeric. Nevertheless, defensive programming is always wise.

Cheers,
Bert

On Thu, Oct 17, 2013 at 1:34 PM, William Dunlap wrote:
> The way I did it would work if basel_asset_class were non-numeric.
> In ave(x, group, FUN=FUN), FUN's return value should be the same type as x
> (or you can get some odd type conversions).
>
> [...] x=integer(length(group)) works in all cases
Re: [R] Subseting a data.frame
On Thursday, October 17, 2013 1:06 PM, Bert Gunter wrote:
> May I ask why:
>    count_by_class <- with(dat, ave(numeric(length(basel_asset_class)), basel_asset_class, FUN=length))
> should not be more simply done as:
>    count_by_class <- with(dat, ave(basel_asset_class, basel_asset_class, FUN=length))

The way I did it would work if basel_asset_class were non-numeric. In ave(x, group, FUN=FUN), FUN's return value should be the same type as x (or you can get some odd type conversions). E.g.,

   > num <- c(2,3,2,2) ; char <- c("Two","Three","Two","Two")
   > ave(num, num, FUN=length) # good
   [1] 3 1 3 3
   > ave(char, char, FUN=length) # bad
   [1] "3" "1" "3" "3"
   > fac <- factor(char, levels=c("One","Two","Three"))
   > ave(fac, fac, FUN=length)
   [1] <NA> <NA> <NA> <NA>
   Levels: One Two Three
   Warning messages:
   1: In `[<-.factor`(`*tmp*`, i, value = 0L) :
     invalid factor level, NA generated
   2: In `[<-.factor`(`*tmp*`, i, value = 3L) :
     invalid factor level, NA generated
   3: In `[<-.factor`(`*tmp*`, i, value = 1L) :
     invalid factor level, NA generated

but x=integer(length(group)) works in all cases:

   > ave(integer(length(fac)), fac, FUN=length)
   [1] 3 1 3 3
   > ave(integer(length(char)), char, FUN=length)
   [1] 3 1 3 3

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
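As a sketch of why the "odd type conversions" matter for the filtering discussed in this thread: if the counts come back as character strings, a later comparison such as count > 2 is done on strings, not numbers. The ten-record class below is a made-up example, and cnt_chr and cnt_int are illustrative names.

```r
grp <- rep("a", 10)   # hypothetical class with ten records

# x is character, so the count for each row is stored as the string "10".
cnt_chr <- ave(grp, grp, FUN = length)

# An integer placeholder keeps the counts as integers.
cnt_int <- ave(integer(length(grp)), grp, FUN = length)

any(cnt_chr > 2)   # compares the strings "10" and "2"
all(cnt_int > 2)   # compares the numbers 10 and 2
```

With the character counts, a class holding ten records would be filtered out by count > 2, because "10" sorts before "2" as text.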
Re: [R] Subseting a data.frame
May I ask why:

count_by_class <- with(dat, ave(numeric(length(basel_asset_class)), basel_asset_class, FUN=length))

should not be more simply done as:

count_by_class <- with(dat, ave(basel_asset_class, basel_asset_class, FUN=length))

?

-- Bert

On Thu, Oct 17, 2013 at 12:36 PM, William Dunlap wrote:
> Here is one way. There are many others:
>    > dat <- data.frame( # slightly less trivial example
>         basel_asset_class=c(4,8,8,8,74,3,74),
>         defa_frequency=(1:7)/8)
>    > count_by_class <- with(dat, ave(numeric(length(basel_asset_class)),
>         basel_asset_class, FUN=length))
Re: [R] Subseting a data.frame
> What I need is to select only those records for which there are more than two
> default frequencies (defa_frequency),

Here is one way. There are many others:

   > dat <- data.frame( # slightly less trivial example
        basel_asset_class=c(4,8,8,8,74,3,74),
        defa_frequency=(1:7)/8)
   > count_by_class <- with(dat, ave(numeric(length(basel_asset_class)), basel_asset_class, FUN=length))
   > cbind(dat, count_by_class) # see what we just computed
     basel_asset_class defa_frequency count_by_class
   1                 4          0.125              1
   2                 8          0.250              3
   3                 8          0.375              3
   4                 8          0.500              3
   5                74          0.625              2
   6                 3          0.750              1
   7                74          0.875              2
   > dat[count_by_class>1, ] # I think this is what you are asking for
     basel_asset_class defa_frequency
   2                 8          0.250
   3                 8          0.375
   4                 8          0.500
   5                74          0.625
   7                74          0.875

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org On Behalf Of Katherine Gobin
> Sent: Thursday, October 17, 2013 11:05 AM
> To: Bert Gunter
> Cc: r-help@r-project.org
> Subject: Re: [R] Subseting a data.frame
>
> Correction. (2nd para first three lines)
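As one of the "many others", here is a sketch that counts with table() instead of ave(). This is an alternative swapped in for illustration, not the method used above; it assumes the class IDs convert cleanly to character names, which holds for whole-number IDs like these.

```r
dat <- data.frame(basel_asset_class = c(4, 8, 8, 8, 74, 3, 74),
                  defa_frequency    = (1:7) / 8)

# Count the rows per class once, then look up each row's class by name.
n_per_class    <- table(dat$basel_asset_class)
count_by_class <- as.vector(n_per_class[as.character(dat$basel_asset_class)])

kept <- dat[count_by_class > 1, ]
kept
```

The lookup keeps the counts in the original row order, so the logical index lines up with dat even though the classes are not sorted.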
Re: [R] Subseting a data.frame
You may try:

mydat[with(mydat,ave(seq_along(basel_asset_class),basel_asset_class,FUN=length)>2),]
#  basel_asset_class defa_frequency
#2                 8          0.070
#3                 8          0.030
#4                 8          0.001

#or
library(plyr)
mydat[ddply(mydat,.(basel_asset_class),mutate,L=length(defa_frequency))[,3]>2,] #assuming it is sorted.

A.K.

On Thursday, October 17, 2013 1:59 PM, Katherine Gobin wrote:
> Thus, I need to take subset of the data.frame s.t. the no of corresponding
> defa_frequencies is greater than 2.
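A base-R sketch of the grouped approach that does not depend on the rows being sorted: split the data frame by class, keep the pieces that are large enough, and bind them back together. The names by_class, big_enough and result are made up for this example.

```r
mydat <- data.frame(basel_asset_class = c(4, 8, 8, 8),
                    defa_frequency    = c(0.15, 0.07, 0.03, 0.001))

# One data frame per class.
by_class <- split(mydat, mydat$basel_asset_class)

# Keep only the classes with more than two records.
big_enough <- Filter(function(d) nrow(d) > 2, by_class)

# Recombine; rows come back grouped by class, not in the original order.
result <- do.call(rbind, big_enough)
result
```

The intermediate list big_enough is also a convenient starting point when something has to be computed separately for each remaining class.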
Re: [R] Subseting a data.frame
Correction (2nd paragraph, first three lines). Please read the following lines

  What I need is to select only those records for which there are more than two default frequencies (defa_frequency). Thus, there is only one default frequency = 0.150 w.r.t. basel_asset_class = 4 whereas there are default frequencies w.r.t. basel asset class 4,

as

  What I need is to select only those records for which there are more than two default frequencies (defa_frequency). Thus, there is only one default frequency = 0.150 w.r.t. basel_asset_class = 4 whereas there are THREE default frequencies w.r.t. basel asset class 8,

I apologize for the inconvenience.

Regards
Katherine
Re: [R] Subseting a data.frame
I am sorry, perhaps I was not able to put the question properly. I am not looking for the subset of the data.frame where the basel_asset_class is > 2. I do agree that would have been a basic requirement. Let me try to put the question again.

I have a data frame as

mydat = data.frame(basel_asset_class = c(4, 8, 8, 8), defa_frequency = c(0.15, 0.07, 0.03, 0.001))

# Please note I have changed the basel_asset_class to 4 from 2, to avoid confusion.

> mydat
  basel_asset_class defa_frequency
1                 4          0.150
2                 8          0.070
3                 8          0.030
4                 8          0.001

This is just a representative example. In reality, I may have a number of basel asset classes. 4, 8 etc. are IDs and can be anything, thus I can't hard-code it as subset(mydat, mydat$basel_asset_class > 2).

What I need is to select only those records for which there are more than two default frequencies (defa_frequency). Thus, there is only one default frequency = 0.150 w.r.t. basel_asset_class = 4 whereas there are default frequencies w.r.t. basel asset class 4; similarly there could be another basel asset class having, say, 5 default frequencies. Thus, I need to take the subset of the data.frame such that the number of corresponding defa_frequencies is greater than 2.

The idea is that we try to fit an exponential curve Y = A exp(BX) for each of the basel asset classes, and to estimate the values of A and B one mathematically needs at least two values of X. I hope I have been able to express my requirement. It's not that I need the subset of mydat such that the basel asset class is > 2 (now 4 in the revised example), but the subset such that the number of default frequencies is greater than or equal to 2. This 2 is not the same as basel asset class 2.

Kindly guide

With warm regards
Katherine Gobin

On Thursday, 17 October 2013 9:33 PM, Bert Gunter wrote:
> "Kindly guide" ...
> This is a very basic question, so the kindest guide I can give is to read
> an Introduction to R (ships with R) or an R web tutorial of your choice so
> that you can learn how R works instead of posting to this list.
> Cheers, Bert
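For the curve fit mentioned above, taking logs of Y = A exp(BX) gives log(Y) = log(A) + B*X, a straight line in X, which is why each class needs at least two records (with distinct X values) before A and B can be estimated. The following is a rough sketch only. It assumes Y is defa_frequency and that X lives in a column called x; that column and its values are invented here for illustration, since the posted data has no X. It also fits on the log scale with lm(), which is one possible choice and not necessarily what the poster intends.

```r
# Example data from the post, plus a HYPOTHETICAL predictor column `x`
# (not in the original data; values are made up for illustration).
dat <- data.frame(basel_asset_class = c(4, 8, 8, 8),
                  x                 = c(1, 1, 2, 3),
                  defa_frequency    = c(0.15, 0.07, 0.03, 0.001))

# Keep only classes with at least two records, the minimum for a line fit.
n_per_class <- ave(integer(nrow(dat)), dat$basel_asset_class, FUN = length)
fit_dat <- dat[n_per_class >= 2, ]

# Fit log(Y) = log(A) + B * X separately for each remaining class.
fits <- lapply(split(fit_dat, fit_dat$basel_asset_class), function(d) {
  m <- lm(log(defa_frequency) ~ x, data = d)
  c(A = exp(coef(m)[["(Intercept)"]]), B = coef(m)[["x"]])
})
```

Here fits is a list with one element per retained class (only class 8 for this data), each holding the estimated A and B. The threshold of 2 follows the "at least two values" wording; raise it if more points per class are wanted.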
Re: [R] Subseting a data.frame
"Kindly guide" ...

This is a very basic question, so the kindest guide I can give is to read an Introduction to R (ships with R) or an R web tutorial of your choice so that you can learn how R works instead of posting to this list.

Cheers,
Bert

On Wed, Oct 16, 2013 at 11:55 PM, Katherine Gobin wrote:
> I need to get the subset of this data.frame where no of records for the
> given basel_asset_class is > 2
>
> Kindly guide
>
> Katherine

--
Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374
Re: [R] Subseting a data.frame
Katherine,

There are multiple ways to do this and I highly recommend you look into a basic R manual or search the forums. One quick example would be:

mysub <- subset(mydat, basel_asset_class > 2)

Cheers,
Charles

On Thu, Oct 17, 2013 at 1:55 AM, Katherine Gobin wrote:
> Dear Forum,
>
> I have a data frame as
>
> mydat = data.frame(basel_asset_class = c(2, 8, 8, 8), defa_frequency = c(0.15, 0.07, 0.03, 0.001))
>
> > mydat
>   basel_asset_class defa_frequency
> 1                 2          0.150
> 2                 8          0.070
> 3                 8          0.030
> 4                 8          0.001
>
> I need to get the subset of this data.frame where no of records for the
> given basel_asset_class is > 2, i.e. I need to obtain subset of above
> data.frame as (since there is only 1 record, against basel_asset_class = 2,
> I want to filter it)
>
> > mydat_a
>   basel_asset_class defa_frequency
> 1                 8          0.070
> 2                 8          0.030
> 3                 8          0.001
>
> Kindly guide
>
> Katherine

--
Charles Determan
Integrated Biosciences PhD Candidate
University of Minnesota
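One thing worth noting about subset(mydat, basel_asset_class > 2): the comparison is made on the class ID itself, not on the number of records in each class. On the data as first posted the two happen to give the same rows, which may be why it looks right. The sketch below contrasts the two on the first data set and on the revised one from later in the thread; the variable names are illustrative only.

```r
original <- data.frame(basel_asset_class = c(2, 8, 8, 8),
                       defa_frequency = c(0.15, 0.07, 0.03, 0.001))
revised  <- data.frame(basel_asset_class = c(4, 8, 8, 8),
                       defa_frequency = c(0.15, 0.07, 0.03, 0.001))

# Filtering on the VALUE of the ID: drops class 2 only because 2 is not > 2.
by_value_original <- subset(original, basel_asset_class > 2)

# Same call on the revised data keeps every row, since 4 and 8 are both > 2.
by_value_revised <- subset(revised, basel_asset_class > 2)

# Filtering on the COUNT of records per class works whatever the IDs are.
group_size <- ave(integer(nrow(revised)), revised$basel_asset_class, FUN = length)
by_count_revised <- revised[group_size > 2, ]
```

by_value_original and by_count_revised each have the three class-8 rows, while by_value_revised still has all four rows.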