Re: [R] How to count rows with a condition
Hi,

I tried the code with an unsorted ac_name column and it worked, so I couldn't identify the exact problem. If you can provide a subset of your dataset using dput() (see ?dput), that would be much more helpful.

    library(plyr)  # count() below is assumed to be plyr's count(), which returns columns x and freq
    set.seed(1)
    dat1 <- data.frame(ac_name=sample(c("HouseA","HouseB","HouseC","HouseD","HouseE","HouseF","HouseG","HouseI","HouseJ"),50,replace=TRUE),val=rnorm(50,15))
    dat2 <- within(dat1,{ac_name <- as.character(ac_name)})
    dat2 <- dat2[order(dat2[,1]),]
    dat3 <- dat2[dat2[,1]%in%count(dat2[,1])$x[count(dat2[,1])[2]>5],]   #data excluded
    dat4 <- dat2[!dat2[,1]%in%count(dat2[,1])$x[count(dat2[,1])[2]>5],]  #data included

A.K.

----- Original Message -----
From: fxen3k <f.seha...@gmail.com>
To: r-help@r-project.org
Cc:
Sent: Wednesday, October 17, 2012 9:57 AM
Subject: Re: [R] How to count rows with a condition

> Thanks for the first reply. Unfortunately, my list of different ac_names
> is pretty long (about 1,000 different names). Is there a way to sort them,
> count the quantity of each name and exclude the rows whose name exceeds a
> particular limit?

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
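For readers unfamiliar with the dput() request above, here is a minimal sketch of how a small subset might be shared. The data frame is a made-up stand-in (the poster's real `data` never appears in the thread), and the temporary file is only there to show that the text form can be turned back into the same object.

```r
# Hypothetical stand-in for the poster's data; column names follow the thread.
set.seed(1)
data <- data.frame(ac_name = sample(c("HouseA", "HouseB", "HouseC"), 20, replace = TRUE),
                   val = round(rnorm(20, 15), 2),
                   stringsAsFactors = FALSE)

# Print the first rows as R code that can be pasted into an e-mail.
sub <- head(data, 10)
dput(sub)

# Anyone receiving that text can rebuild the object; here we round-trip
# through a temporary file with dput()/dget() to illustrate.
tmp <- tempfile(fileext = ".R")
dput(sub, file = tmp)
rebuilt <- dget(tmp)
all.equal(sub, rebuilt)
```

Posting the dput() output rather than a printed table preserves column types (character vs. factor), which turns out to matter later in this thread.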
[R] How to count rows with a condition
Hi,

I have a dataset called data. There is one column called ac_name. Some names in this column appear very often, some less.

What I want is to filter this dataset with the following condition: exclude the names which appear more than five times. (Example: House A appears 8 times ==> exclude it; House B appears 5 times ==> include it, etc.)

In the end, I want to have the old data dataset without the rows matching the above condition, and another list with all the names which have been excluded.

I think for one of the professionals amongst you this is pretty easy to solve. ;-)

Thanks dudes!

Cheerio,
Felix

--
View this message in context: http://r.789695.n4.nabble.com/How-to-count-rows-with-a-condition-tp4646454.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] How to count rows with a condition
Thanks for the first reply.

Unfortunately, my list of different ac_names is pretty long (about 1,000 different names). Is there a way to sort them, count the quantity of each name and exclude the rows whose name exceeds a particular limit?

--
View this message in context: http://r.789695.n4.nabble.com/How-to-count-rows-with-a-condition-tp4646454p4646465.html
Sent from the R help mailing list archive at Nabble.com.
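The "sort, count, exclude above a limit" request can be sketched in base R with table(), which scales to any number of distinct names. This is only an illustration under stated assumptions: the toy data and the variable names `limit`, `counts`, `kept` and `excluded` are made up here and do not come from the thread.

```r
# Toy data standing in for the real 'data' (not shown in the thread).
data <- data.frame(ac_name = rep(c("House A", "House B", "House C"), c(8, 5, 2)),
                   val = 1:15,
                   stringsAsFactors = FALSE)

limit <- 5  # names occurring more often than this are excluded

# Frequency of each name, most frequent first.
counts <- sort(table(data$ac_name), decreasing = TRUE)

# Names whose count exceeds the limit.
excluded_names <- names(counts)[counts > limit]

# Split the rows into the kept and the excluded part.
kept     <- data[!data$ac_name %in% excluded_names, , drop = FALSE]
excluded <- data[ data$ac_name %in% excluded_names, , drop = FALSE]
```

With this toy input only "House A" (8 rows) is excluded, while "House B" (exactly 5 rows) stays, matching the example in the original question.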
Re: [R] How to count rows with a condition
One way is:

    ac_name_count <- ave(integer(nrow(data)), data[["ac_name"]], FUN=length)
    data[ac_name_count <= 5, , drop=FALSE]  # rows whose ac_name entry is rare
    data[ac_name_count > 5, , drop=FALSE]   # rows whose ac_name entry is common

Use

    ac_name_seqno <- ave(integer(nrow(data)), data[["ac_name"]], FUN=seq_along)

to assign a within-group sequence number so you can pick out the first or last n items in a group for the big groups.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of fxen3k
Sent: Wednesday, October 17, 2012 5:45 AM
To: r-help@r-project.org
Subject: [R] How to count rows with a condition
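A runnable sketch of the two ave() calls above, on a small made-up data frame (three names with 3, 8 and 11 rows). The names `grp_count`, `grp_seqno`, `rare`, `capped` and `excluded_names` are illustrative only, and capping big groups at their first five rows is just one possible use of the sequence number.

```r
# Toy data: "Amos" has 3 rows, "Boris" 8, "Charlotte" 11.
data <- data.frame(ac_name = rep(c("Amos", "Boris", "Charlotte"), c(3, 8, 11)),
                   n = 101:122,
                   stringsAsFactors = FALSE)

# Per-row group size: every row carries the count of its own ac_name.
grp_count <- ave(integer(nrow(data)), data[["ac_name"]], FUN = length)

# Per-row position within its group: 1, 2, 3, ... restarting for each name.
grp_seqno <- ave(integer(nrow(data)), data[["ac_name"]], FUN = seq_along)

# Rows whose name occurs at most five times.
rare <- data[grp_count <= 5, , drop = FALSE]

# Alternative: keep at most the first five rows of every group.
capped <- data[grp_seqno <= 5, , drop = FALSE]

# The list of names that exceed the limit.
excluded_names <- unique(data$ac_name[grp_count > 5])
```

Because the first argument is an integer vector, both results are integer and compare numerically with 5.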
Re: [R] How to count rows with a condition
On Oct 17, 2012, at 5:44 AM, fxen3k wrote:

> What I want is to filter this dataset with the following condition:
> Exclude the names, which appear more than five times. (example: House A
> appears 8 times ==> exclude it; House B appears 5 times ==> include it
> etc.)
> [...]

    data[ ave(data$ac_name, data$ac_name, length) <= 5, ]  # all with 5 or fewer entries

--
David Winsemius, MD
Alameda, CA, USA
Re: [R] How to count rows with a condition
Hi David,

I tried your function:

    set.seed(1)
    dat1 <- data.frame(ac_name=rep(c("HouseA","HouseB","HouseC","HouseD","HouseE"),times=c(8,5,4,6,3)),val=rnorm(26,15))
    dat2 <- within(dat1,{ac_name <- as.character(ac_name)})
    dat2 <- dat2[order(dat2[,1]),]
    dat2[ave(dat2$ac_name,dat2$ac_name,length)<=5,]
    #Error in unique.default(x) : unique() applies only to vectors

    #With FUN added
    head(dat2[ave(dat2$ac_name,dat2$ac_name,FUN=length)<=5,])
    #   ac_name      val
    #9   HouseB 15.57578
    #10  HouseB 14.69461
    #11  HouseB 16.51178
    #12  HouseB 15.38984
    #13  HouseB 14.37876
    #14  HouseC 12.78530

A.K.

----- Original Message -----
From: David Winsemius <dwinsem...@comcast.net>
To: fxen3k <f.seha...@gmail.com>
Cc: r-help@r-project.org
Sent: Wednesday, October 17, 2012 4:25 PM
Subject: Re: [R] How to count rows with a condition
Re: [R] How to count rows with a condition
    data[ ave(data$ac_name, data$ac_name, length) <= 5, ]

fails for two reasons:

a) you need to label the FUN argument, FUN=length, since there is a ... in the middle of ave's argument list to catch all the grouping arguments

b) the type of the first argument to ave needs to be compatible with the type of the return value of FUN(). If ac_name is a factor you get NA's and warnings; if it is character, the <= 5 starts using character order instead of numerical order, leading to incorrect results because "11" < "5":

    > data <- data.frame(ac_name=rep(c("Amos","Boris","Charlotte"),c(3,8,11)), n=101:122, stringsAsFactors=FALSE)
    > data[ ave(data$ac_name, data$ac_name, FUN=length) <= 5, ]
         ac_name   n
    1       Amos 101
    2       Amos 102
    3       Amos 103
    12 Charlotte 112
    13 Charlotte 113
    ... [ rows elided ] ...
    22 Charlotte 122
    > data <- data.frame(ac_name=rep(c("Amos","Boris","Charlotte"),c(3,8,11)), n=101:122, stringsAsFactors=TRUE)
    > data[ ave(data$ac_name, data$ac_name, FUN=length) <= 5, ]
          ac_name  n
    NA       <NA> NA
    NA.1     <NA> NA
    NA.2     <NA> NA
    ... [rows elided] ...
    NA.21    <NA> NA
    Warning messages:
    1: In `[<-.factor`(`*tmp*`, i, value = 3L) :
      invalid factor level, NAs generated
    2: In `[<-.factor`(`*tmp*`, i, value = 8L) :
      invalid factor level, NAs generated
    3: In `[<-.factor`(`*tmp*`, i, value = 11L) :
      invalid factor level, NAs generated
    4: In Ops.factor(ave(data$ac_name, data$ac_name, FUN = length), 5) :
      <= not meaningful for factors

That is why I made the first argument integer:

    > data[ ave(integer(nrow(data)), data$ac_name, FUN=length) <= 5, ]
      ac_name   n
    1    Amos 101
    2    Amos 102
    3    Amos 103

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
Sent: Wednesday, October 17, 2012 1:25 PM
To: fxen3k
Cc: r-help@r-project.org
Subject: Re: [R] How to count rows with a condition
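The character-ordering pitfall in point (b) can be seen in isolation with a few lines. This is a small sketch on the same toy names used in the message; the variable names `chr_counts` and `int_counts` are made up for the illustration.

```r
# Strings compare character by character, so "11" sorts before "5".
"11" <= "5"   # TRUE
11L <= 5L     # FALSE

data <- data.frame(ac_name = rep(c("Amos", "Boris", "Charlotte"), c(3, 8, 11)),
                   n = 101:122,
                   stringsAsFactors = FALSE)

# With a character first argument, ave() returns the counts as strings.
chr_counts <- ave(data$ac_name, data$ac_name, FUN = length)

# With an integer first argument, the counts stay integer.
int_counts <- ave(integer(nrow(data)), data$ac_name, FUN = length)

# Comparing strings with 5 coerces 5 to "5", so "Charlotte" (11 rows) slips through.
unique(data$ac_name[chr_counts <= 5])

# The integer version compares numerically and keeps only "Amos".
unique(data$ac_name[int_counts <= 5])
```

This is also why a test where every group has fewer than ten rows can appear to work with a character column: single-digit strings happen to sort in numerical order.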