Re: [R] counting duplicate items that occur in multiple groups
Thanks, everyone!

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] counting duplicate items that occur in multiple groups
Oops, I sent this to Tom earlier today and forgot to copy to the list:

VendorID <- rep(paste0("V", 1:10), each = 5)
AcctID <- paste0("A", sample(1:5, 50, TRUE))
Data <- data.frame(VendorID, AcctID)
table(Data)
# get multiple vendors for each account
dupAcctID <- colSums(table(Data) > 0)
Data$dupAcct <- NA
# fill in the new column
for(i in seq_along(dupAcctID))
 Data$dupAcct[Data$AcctID == names(dupAcctID[i])] <- dupAcctID[i]

Jim

On Wed, Nov 18, 2020 at 8:20 AM Tom Woolman wrote:

> Hi everyone. I have a dataframe that is a collection of Vendor IDs
> plus a bank account number for each vendor. I'm trying to find a way
> to count the number of duplicate bank accounts that occur in more than
> one unique Vendor_ID, and then assign the count value for each row in
> the dataframe in a new variable.
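The loop in Jim's reply can probably be replaced by a single lookup, because dupAcctID is a named vector and can be indexed by account name. The following is only a sketch of that idea; the set.seed() call is an assumption added here so the result is reproducible, and it is not part of the original post.

```r
set.seed(42)  # assumption: seed added only to make this sketch reproducible
VendorID <- rep(paste0("V", 1:10), each = 5)
AcctID <- paste0("A", sample(1:5, 50, TRUE))
Data <- data.frame(VendorID, AcctID)

# number of distinct vendors that use each account (named by account)
dupAcctID <- colSums(table(Data) > 0)

# look up each row's account by name instead of looping over accounts;
# as.character() guards against AcctID being a factor in older R versions
Data$dupAcct <- unname(dupAcctID[as.character(Data$AcctID)])
head(Data)
```

Indexing by name rather than by position is what makes this safe when the accounts are not already in sorted order.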
Re: [R] counting duplicate items that occur in multiple groups
On Wed, Nov 18, 2020 at 5:40 AM Bert Gunter wrote:
>
> z <- with(Data2, tapply(Vendor,Account, I))
> n <- vapply(z,length,1)
> data.frame (Vendor = unlist(z),
>    Account = rep(names(z),n),
>    NumVen = rep(n,n)
> )
>
> ## which gives:
>
>     Vendor Account NumVen
> A1      V1      A1      1
> A21     V2      A2      3
> A22     V3      A2      3
> A23     V1      A2      3
> A3      V4      A3      1
> A4      V2      A4      1
>
> Of course this also works for Data1
>
> Bill may be able to come up with a slicker version, however.

Perhaps

transform(Data2, nshare = as.vector(table(Account)[Account]))

(or dplyr::mutate() instead of transform(), if you prefer.)

-Deepayan
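One caveat with the table(Account) lookup is worth a sketch: it counts rows per account, which matches the number of vendors only when each (Vendor, Account) pair appears once. The extra row in Dup below is a hypothetical addition made up for illustration; it is not in the thread's data.

```r
Data2 <- data.frame(Vendor  = c("V1", "V2", "V3", "V1", "V4", "V2"),
                    Account = c("A1", "A2", "A2", "A2", "A3", "A4"))

# hypothetical: the pair (V2, A2) is recorded a second time
Dup <- rbind(Data2, data.frame(Vendor = "V2", Account = "A2"))

# counts rows per account, so A2 comes out as 4 rather than 3
by_rows <- transform(Dup,
                     nshare = as.vector(table(Account)[as.character(Account)]))

# counts distinct vendors per account: tabulate the unique pairs first
pairs <- unique(Dup[c("Vendor", "Account")])
n_vendors <- table(pairs$Account)
by_vendors <- transform(Dup,
                        nshare = as.vector(n_vendors[as.character(Account)]))
by_vendors
```

If the data are known to have one row per vendor and account, the one-line transform() above is enough.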
Re: [R] counting duplicate items that occur in multiple groups
z <- with(Data2, tapply(Vendor,Account, I))
n <- vapply(z,length,1)
data.frame (Vendor = unlist(z),
   Account = rep(names(z),n),
   NumVen = rep(n,n)
)

## which gives:

    Vendor Account NumVen
A1      V1      A1      1
A21     V2      A2      3
A22     V3      A2      3
A23     V1      A2      3
A3      V4      A3      1
A4      V2      A4      1

Of course this also works for Data1

Bill may be able to come up with a slicker version, however.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Nov 17, 2020 at 3:34 PM Tom Woolman wrote:

> Yes, good catch. Thanks
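The tapply() approach rebuilds the data frame grouped by account, so the rows come back in a different order. If the original row order matters, one possible alternative is ave(), sketched here as an option and not as what Bert proposed. It runs over row indices because ave() returns a vector of the same type as its first argument, and Vendor is character.

```r
Data2 <- data.frame(Vendor  = c("V1", "V2", "V3", "V1", "V4", "V2"),
                    Account = c("A1", "A2", "A2", "A2", "A3", "A4"))

# for each Account group, count the distinct vendors among its rows;
# the single count is recycled to every row of the group
Data2$NumVen <- ave(seq_along(Data2$Vendor), Data2$Account,
                    FUN = function(i) length(unique(Data2$Vendor[i])))
Data2
```

The result keeps the six rows in their input order, with NumVen equal to 3 on the A2 rows and 1 elsewhere.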
Re: [R] counting duplicate items that occur in multiple groups
Many problems can be solved with some thought by using the right tools, such as the ones from the tidyverse.

Without giving a specific answer, you might want to think about using the group_by() functionality in a pipeline that lumps together all rows that have the same value in one or more columns. Then, in something like a mutate() or summarize(), you can use special functions like n() that return how many rows exist within each grouping.

There are many more such verbs and features that let you build up a result, often by removing the grouping along the way and perhaps adding some other form of grouping, including the new rowwise(), which lets you work across columns one row at a time.

I think the point is to think of steps that each lead to a result that can be used in the next step. For some problems, you can also think outside the pipelines: create multiple intermediate data.frames with parts of what you will need and then combine them with joins, or whatever it takes to get a result efficiently, or use brute force. Sometimes (as when making graphs) you might want to convert data between the forms that are often called long and wide.

Yes, plenty can be done in base R or using other packages. But a good set of tools might be part of what you need to investigate. Of course, others can chime in suggesting that there are negatives to dplyr and other aspects of the tidyverse, and they would be right too.

-----Original Message-----
From: R-help On Behalf Of Tom Woolman
Sent: Tuesday, November 17, 2020 6:30 PM
To: Bill Dunlap
Cc: r-help@r-project.org
Subject: Re: [R] counting duplicate items that occur in multiple groups

Hi Bill. Sorry to be so obtuse with the example data, I was trying (too hard) not to share any actual values so I just created randomized values for my example; of course I should have specified that the random values would not provide the expected problem pattern. I should have just used simple dummy codes as Bill Dunlap did.
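As a rough illustration of the "intermediate data.frames combined with joins" idea, here is a base R sketch using Bill Dunlap's Data2. The names pairs, counts and result are made up for this example. With dplyr, the grouped version would presumably be group_by(Account) followed by mutate(NumVen = n_distinct(Vendor)).

```r
Data2 <- data.frame(Vendor  = c("V1", "V2", "V3", "V1", "V4", "V2"),
                    Account = c("A1", "A2", "A2", "A2", "A3", "A4"))

# step 1: one row per distinct (Vendor, Account) pair
pairs <- unique(Data2[c("Vendor", "Account")])

# step 2: intermediate data frame with the number of vendors per account
counts <- as.data.frame(table(Account = pairs$Account),
                        responseName = "NumVen",
                        stringsAsFactors = FALSE)

# step 3: join the counts back onto the original rows
# (merge() sorts by the key, so the row order may differ from Data2)
result <- merge(Data2, counts, by = "Account")
result
```

Each step leaves behind an object that can be inspected on its own, which is the main attraction of working this way.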
Re: [R] counting duplicate items that occur in multiple groups
Why 0's in the data frame? Shouldn't that be 1 (vendor with that account)?

Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Nov 17, 2020 at 3:29 PM Tom Woolman wrote:

> So per Bill's example data for Data1, the expected (hoped for) output
> should be:
>
>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> 1     V1      A1                             0
> 2     V2      A2                             3
> 3     V3      A2                             3
> 4     V4      A2                             3
>
> Where the new calculated variable is Num_Vendors_Sharing_Bank_Acct.
> The value is 3 for V2, V3 and V4 because they each share bank account
> A2.
Re: [R] counting duplicate items that occur in multiple groups
Yes, good catch. Thanks

Quoting Bert Gunter:

> Why 0's in the data frame? Shouldn't that be 1 (vendor with that
> account)?
Re: [R] counting duplicate items that occur in multiple groups
Hi Bill. Sorry to be so obtuse with the example data, I was trying (too hard) not to share any actual values so I just created randomized values for my example; of course I should have specified that the random values would not provide the expected problem pattern. I should have just used simple dummy codes as Bill Dunlap did.

So per Bill's example data for Data1, the expected (hoped for) output should be:

  Vendor Account Num_Vendors_Sharing_Bank_Acct
1     V1      A1                             0
2     V2      A2                             3
3     V3      A2                             3
4     V4      A2                             3

Where the new calculated variable is Num_Vendors_Sharing_Bank_Acct. The value is 3 for V2, V3 and V4 because they each share bank account A2.

Likewise, in the Data2 frame, the same logic applies:

  Vendor Account Num_Vendors_Sharing_Bank_Acct
1     V1      A1                             0
2     V2      A2                             3
3     V3      A2                             3
4     V1      A2                             3
5     V4      A3                             0
6     V2      A4                             0

Thanks!

Quoting Bill Dunlap:

> What should the result be for
>    Data1 <- data.frame(Vendor=c("V1","V2","V3","V4"),
>                        Account=c("A1","A2","A2","A2"))
> ?
>
> Must each vendor have only one account? If not, what should the result be
> for
>    Data2 <- data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"),
>                        Account=c("A1","A2","A2","A2","A3","A4"))
> ?
>
> -Bill
Re: [R] counting duplicate items that occur in multiple groups
What should the result be for
   Data1 <- data.frame(Vendor=c("V1","V2","V3","V4"),
                       Account=c("A1","A2","A2","A2"))
?

Must each vendor have only one account? If not, what should the result be for
   Data2 <- data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"),
                       Account=c("A1","A2","A2","A2","A3","A4"))
?

-Bill

On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman wrote:

> Hi everyone. I have a dataframe that is a collection of Vendor IDs
> plus a bank account number for each vendor. I'm trying to find a way
> to count the number of duplicate bank accounts that occur in more than
> one unique Vendor_ID, and then assign the count value for each row in
> the dataframe in a new variable.
Re: [R] counting duplicate items that occur in multiple groups
Inline.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman wrote:

> Hi everyone. I have a dataframe that is a collection of Vendor IDs
> plus a bank account number for each vendor.

I interpret this as: "all vendors are unique and each vendor has a single bank account." Is that correct?

> I'm trying to find a way
> to count the number of duplicate bank accounts that occur in more than
> one unique Vendor_ID,

The following makes no sense to me, as each row is a unique vendor and has only one bank account.

> and then assign the count value for each row in
> the dataframe in a new variable.
>
> I can do a count of bank accounts that occur within the same vendor
> using dplyr and group_by and count, but I can't figure out a way to
> count duplicates among multiple Vendor_IDs.

I interpret this to mean that you want to count vendor ID's by account. With only one account per vendor this is trivial; e.g.

set.seed(22)
d1 <- data.frame(id = sample(1:30), account = sample(1:20,30, replace = TRUE))
table(d1$account)
## gives
 1  2  3  6  7  8  9 10 11 13 15 16 17 18 19 20
 3  1  2  1  1  1  1  1  4  3  1  2  1  3  2  3

Note that AFAICS your example is useless, as it gives the same number of different account numbers as ID's, so no duplication can occur.

As my interpretations are likely incorrect and this is not what you mean nor want, either clarify your meaning and provide a useful **minimal** example; or wait for a reply from someone with a better understanding than I.

Cheers,
Bert

> Dataframe example code:
>
> #Create a sample data frame:
>
> set.seed(1)
>
> Data <- data.frame(Vendor_ID = sample(1:1), Bank_Account_ID =
> sample(1:1))
>
> Thanks in advance for any help.
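The table() call in Bert's example gives one count per account. To put that count on every row, as the original question asks, the table can be indexed by account. The sketch below assumes the same d1 as in the example. The new column name n_ids is made up here, and no output is shown because sample() results depend on the R version.

```r
set.seed(22)
d1 <- data.frame(id = sample(1:30),
                 account = sample(1:20, 30, replace = TRUE))
counts <- table(d1$account)

# Index the table by *name*. account is an integer, so counts[d1$account]
# would index by position, which is wrong whenever some account numbers
# between 1 and 20 never occur in the data.
d1$n_ids <- as.vector(counts[as.character(d1$account)])
head(d1)
```

Converting to character first is the design choice that matters here; with character or factor account codes the issue does not arise in the same way.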
[R] counting duplicate items that occur in multiple groups
Hi everyone. I have a dataframe that is a collection of Vendor IDs plus a bank account number for each vendor. I'm trying to find a way to count the number of duplicate bank accounts that occur in more than one unique Vendor_ID, and then assign the count value for each row in the dataframe in a new variable.

I can do a count of bank accounts that occur within the same vendor using dplyr and group_by and count, but I can't figure out a way to count duplicates among multiple Vendor_IDs.

Dataframe example code:

#Create a sample data frame:
set.seed(1)
Data <- data.frame(Vendor_ID = sample(1:1), Bank_Account_ID = sample(1:1))

Thanks in advance for any help.
Re: [R] counting unique values (summary stats)
You have several problems. As David W pointed out, there is no replace= argument in length(), which is where your code passes it (unique() has no such argument either). The first step in debugging your code should be to read the manual page for any function returning an error. You also left out the comma at the end of the line containing replace=TRUE. Finally, the code for counting the missing values is more complicated than it needs to be.

This code will only work if myData is a data frame that contains only columns with numeric data.

options(digits=4)
myData <- USArrests
summary.stats <- data.frame(mean=sapply(myData, mean, na.rm=TRUE),
    sd=sapply(myData, sd, na.rm=TRUE),
    min=sapply(myData, min, na.rm=TRUE),
    max=sapply(myData, max, na.rm=TRUE),
    median=sapply(myData, median, na.rm=TRUE),
    length=sapply(myData, length),
    unique=sapply(myData, function (x) length(unique(x))),
    miss.val=sapply(myData, function(y) sum(is.na(y))))
summary.stats
#             mean     sd  min   max median length unique miss.val
# Murder     7.788  4.356  0.8  17.4   7.25     50     43        0
# Assault  170.760 83.338 45.0 337.0 159.00     50     45        0
# UrbanPop  65.540 14.475 32.0  91.0  66.00     50     36        0
# Rape      21.232  9.366  7.3  46.0  20.10     50     48        0

David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77843-4352

-----Original Message-----
From: R-help On Behalf Of David Winsemius
Sent: Thursday, March 21, 2019 5:55 PM
To: reichm...@sbcglobal.net; 'r-help mailing list'
Subject: Re: [R] counting unique values (summary stats)

On 3/21/19 3:31 PM, reichm...@sbcglobal.net wrote:
> r-help
>
> I have the following little scrip to create a df of summary stats. I'm
> having problems obtaining the # of unique values
>
> unique=sapply(myData, function (x)
>     length(unique(x), replace = TRUE))

I just looked up the usage on `length` and do not see any possibility of using a "replace" parameter. It's also unclear what sort of data object `myData` might be. (And you might consider using column names other than the names of R functions.)

--
David.

> Can I do that, or am I using the wrong R function?
>
> summary.stats <- data.frame(mean=sapply(myData, mean, na.rm=TRUE),
>     sd=sapply(myData, sd, na.rm=TRUE),
>     min=sapply(myData, min, na.rm=TRUE),
>     max=sapply(myData, max, na.rm=TRUE),
>     median=sapply(myData, median, na.rm=TRUE),
>     length=sapply(myData, length),
>     unique=sapply(myData, function (x)
>         length(unique(x), replace = TRUE))
>     miss.val=sapply(myData, function(y)
>         sum(length(which(is.na(y))
>
> Jeff Reichman

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
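The reply above notes that the summary code only works when every column is numeric. One way around that restriction might be to select the numeric columns first. The sketch below is only an illustration: num_summary is a hypothetical helper, the built-in iris data stands in for myData, and the column names are chosen to avoid the names of R functions, as suggested in the thread.

```r
# Hypothetical helper: summarise only the numeric columns of a data frame.
num_summary <- function(df) {
  # Drop factor/character columns before computing the statistics.
  num <- df[vapply(df, is.numeric, logical(1))]
  data.frame(
    avg      = sapply(num, mean, na.rm = TRUE),
    std.dev  = sapply(num, sd, na.rm = TRUE),
    n        = sapply(num, length),
    n.unique = sapply(num, function(x) length(unique(x[!is.na(x)]))),
    n.miss   = sapply(num, function(x) sum(is.na(x)))
  )
}

# iris has four numeric columns and one factor (Species), which is skipped.
stats <- num_summary(iris)
stats
```

Note that n.unique here excludes NA, whereas length(unique(x)) counts NA as one more value; which of the two is wanted depends on the analysis.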
Re: [R] counting unique values (summary stats)
On 3/21/19 3:31 PM, reichm...@sbcglobal.net wrote:
> r-help
>
> I have the following little scrip to create a df of summary stats. I'm
> having problems obtaining the # of unique values
>
> unique=sapply(myData, function (x)
>     length(unique(x), replace = TRUE))

I just looked up the usage on `length` and do not see any possibility of using a "replace" parameter. It's also unclear what sort of data object `myData` might be. (And you might consider using column names other than the names of R functions.)

--
David.

> Can I do that, or am I using the wrong R function?
>
> summary.stats <- data.frame(mean=sapply(myData, mean, na.rm=TRUE),
>     sd=sapply(myData, sd, na.rm=TRUE),
>     min=sapply(myData, min, na.rm=TRUE),
>     max=sapply(myData, max, na.rm=TRUE),
>     median=sapply(myData, median, na.rm=TRUE),
>     length=sapply(myData, length),
>     unique=sapply(myData, function (x)
>         length(unique(x), replace = TRUE))
>     miss.val=sapply(myData, function(y)
>         sum(length(which(is.na(y))
>
> Jeff Reichman

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
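To make the point about length() concrete, here is a small sketch with a made-up vector x (not the poster's data). It captures the error raised by the extra argument and shows two ways of counting distinct values, depending on whether NA should count.

```r
x <- c(1, 2, 2, 3, NA)

# length() accepts a single argument, so passing replace= raises an error;
# tryCatch() captures the message instead of stopping the script.
msg <- tryCatch(length(unique(x), replace = TRUE),
                error = function(e) conditionMessage(e))

n_unique     <- length(unique(x))              # NA counts as a distinct value
n_unique_obs <- length(unique(x[!is.na(x)]))   # NA excluded

msg
c(n_unique, n_unique_obs)
```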
[R] counting unique values (summary stats)
r-help

I have the following little script to create a df of summary stats. I'm having problems obtaining the number of unique values:

unique=sapply(myData, function (x) length(unique(x), replace = TRUE))

Can I do that, or am I using the wrong R function?

summary.stats <- data.frame(mean=sapply(myData, mean, na.rm=TRUE),
    sd=sapply(myData, sd, na.rm=TRUE),
    min=sapply(myData, min, na.rm=TRUE),
    max=sapply(myData, max, na.rm=TRUE),
    median=sapply(myData, median, na.rm=TRUE),
    length=sapply(myData, length),
    unique=sapply(myData, function (x) length(unique(x), replace = TRUE))
    miss.val=sapply(myData, function(y) sum(length(which(is.na(y))

Jeff Reichman

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Counting number of sentences by qdap package
Hi all,

I have a data frame with a variable Description containing the text of speeches, and I would like to count the number of sentences in each speech.

> str(data)
'data.frame': 255 obs. of 3 variables:
 $ Group      : Factor w/ 255 levels "AlzheimerGroup1","AlzheimerGroup10",..: 1 112 179 190 201 212 223 234 245 2 ...
 $ Gender     : int 1 1 0 0 0 0 0 1 0 0 ...
 $ Description: Factor w/ 255 levels "A boy's on the uh falling off the stool picking up cookies . The girl's reaching up for it . The girl the lady "| __truncated__,..: 63 69 38 134 111 242 196 85 84 233 ...

I want to use the qdap package. Does anyone know how I should do this?

Thanks for any help!
Elahe

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
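As a rough starting point that does not use qdap at all, sentences could be approximated in base R by counting runs of terminal punctuation. This is only a sketch under stated assumptions: count_sentences is a hypothetical helper, the two sample strings are made up in the style of the str() output above, and a sentence is assumed to end with ".", "?" or "!" (so abbreviations or decimal numbers would be miscounted).

```r
# Hypothetical helper: count runs of ., ? or ! in each element.
count_sentences <- function(x) {
  x <- as.character(x)              # Description is stored as a factor
  m <- gregexpr("[.?!]+", x)
  # gregexpr() returns -1 when nothing matches, so count positions > 0 only.
  vapply(m, function(pos) sum(pos > 0), integer(1))
}

speeches <- factor(c("A boy is falling off the stool . The girl is reaching up for it .",
                     "no terminator here"))
n_sent <- count_sentences(speeches)
n_sent
```

With the real data this would be called as count_sentences(data$Description).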
Re: [R] Counting with multiple criteria using data table
To be fair, the OP did provide brief snippets of data.table usage below the data dump indicating some level of effort, but posted it all in HTML (what you see we do not see), did not make the example reproducible (dput is great, and library calls really clear things up [1][2][3]), and this looks suspiciously like homework (not on topic here, see the Posting Guide).

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
[2] http://adv-r.had.co.nz/Reproducibility.html
[3] https://cran.r-project.org/web/packages/reprex/index.html

--
Sent from my phone. Please excuse my brevity.

On June 21, 2017 4:16:36 PM PDT, Bert Gunter wrote:
>Have you gone through any R tutorials? If not, why not? If so, maybe
>you need to spend some more time with them.
>
>It looks like you want us to do your work for you. We don't do this.
>See (and follow) the posting guide below for what we might do (we're
>volunteers, so no guarantees).
>
>Cheers,
>Bert
>
>Bert Gunter
>
>"The trouble with having an open mind is that people keep coming along
>and sticking things into it."
>-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>On Wed, Jun 21, 2017 at 2:50 PM, Ek Esawi wrote:
>> I have a data.table which is shown below. I want to count combinations of
>> columns on i and count on j with by. A few examples are given below the
>> table.
>>
>> I want to:
>>
>> all months to show on the output including those that they have zero value
>>
>> I want the three statements combined in on if possible so the output will
>> be one data table; that is the outputs are next to each other as manually
>> illustrated on the last part (desired output).
>>
>> Thanks--EK
>>
>>> Test
>>      Color Grade Value  Month Day
>>  1: yellow     A    20    May   1
>>  2:  green     B    25   June   2
>>  3:  green     A    10  April   3
>>  4:  black     A    17 August   3
>>  5:    red     C     5    May   5
>>  6: orange     D     0   June  13
>>  7: orange     E    12  April   5
>>  8: orange     F    11 August   8
>>  9: orange     F    99  April  23
>> 10: orange     F    70    May   7
>> 11:  black     A    77   June  11
>> 12:  green     B    87 August  33
>> 13:  black     A    79  April   9
>> 14:  green     A    68    May  14
>> 15:  black     C    90   June  31
>> 16:  green     D    79 August  11
>> 17:  black     E   101  April  17
>> 18:    red     F    90   June  21
>> 19:    red     F   112 August  13
>> 20:    red     F   101  April  20
>>
>>> Test[Color=="green"&Grade=="A", .N, by=Month]
>>    Month N
>> 1: April 1
>> 2:   May 1
>>
>>> Test[Color=="orange"&Grade=="F", .N, by=Month]
>>     Month N
>> 1: August 1
>> 2:  April 1
>> 3:    May 1
>>
>>> Test[Color=="orange"&Grade=="F", .N, by=Month]
>>     Month N
>> 1: August 1
>> 2:  April 1
>> 3:    May 1
>>
>>> Test[Color=="red"&Grade=="F", .N, by=Month]
>>     Month N
>> 1:   June 1
>> 2: August 1
>> 3:  April 1
>>
>> Desired output
>>
>>         N1 N2 N3
>> April    1  1  1
>> May      1  1  1
>> June     0  0  0
>> August   0  1  1
>>
>> [[alternative HTML version deleted]]
__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Counting with multiple criteria using data table
> On Jun 21, 2017, at 2:50 PM, Ek Esawi wrote:
>
> I have a data.table which is shown below. I want to count combinations of
> columns on i and count on j with by. A few examples are given below the
> table.
>
> I want to:
>
> all months to show on the output including those that they have zero value
>
> I want the three statements combined in on if possible so the output will
> be one data table; that is the outputs are next to each other as manually
> illustrated on the last part (desired output).
>
> Thanks--EK
>
>> Test
>      Color Grade Value  Month Day
>  1: yellow     A    20    May   1
>  2:  green     B    25   June   2
>  3:  green     A    10  April   3
>  4:  black     A    17 August   3
>  5:    red     C     5    May   5
>  6: orange     D     0   June  13
>  7: orange     E    12  April   5
>  8: orange     F    11 August   8
>  9: orange     F    99  April  23
> 10: orange     F    70    May   7
> 11:  black     A    77   June  11
> 12:  green     B    87 August  33
> 13:  black     A    79  April   9
> 14:  green     A    68    May  14
> 15:  black     C    90   June  31
> 16:  green     D    79 August  11
> 17:  black     E   101  April  17
> 18:    red     F    90   June  21
> 19:    red     F   112 August  13
> 20:    red     F   101  April  20

You should have offered the output of:

dput(Test)

>> Test[Color=="green"&Grade=="A", .N, by=Month]
>    Month N
> 1: April 1
> 2:   May 1
>
>> Test[Color=="orange"&Grade=="F", .N, by=Month]
>     Month N
> 1: August 1
> 2:  April 1
> 3:    May 1
>
>> Test[Color=="orange"&Grade=="F", .N, by=Month]
>     Month N
> 1: August 1
> 2:  April 1
> 3:    May 1
>
>> Test[Color=="red"&Grade=="F", .N, by=Month]
>     Month N
> 1:   June 1
> 2: August 1
> 3:  April 1
>
> Desired output
>
>         N1 N2 N3
> April    1  1  1
> May      1  1  1
> June     0  0  0
> August   0  1  1

I count 4 data.tables and a total of 11 items so why only 3 columns and 9 items? Were you tabulating colors by month?

Test[ (Color=="green"&Grade=="A") |
      (Color=="red"&Grade=="F") |
      (Color=="orange"&Grade=="F")|
      (Color=="orange"&Grade=="F")|
      (Color=="red"&Grade=="F") ,
      table(Month, Color)]
        Color
Month    green orange red
  April      1      1   1
  August     0      1   1
  June       0      0   1
  May        1      1   0

> [[alternative HTML version deleted]]

Rhelp is plain-text. Do read the Posting Guide:

Another possibility:

Test[(Color=="green"&Grade=="A") |   # These are the conditions separated by logical OR's
                                     # in the first argument to `[data.table`
     (Color=="red"&Grade=="F") |
     (Color=="orange"&Grade=="F")|
     (Color=="orange"&Grade=="F")|
     (Color=="red"&Grade=="F") ,
     table(Month, Grade)]
        Grade
Month    A F
  April  1 2
  August 0 2
  June   0 1
  May    1 1

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
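For the two stated requirements (every month shown, zeros included, and the three counts side by side), one possible sketch in base R follows. It is not the data.table idiom used in the thread: Test is rebuilt here as a plain data.frame from the posted rows (Value and Day are left out because the counts do not use them), the month order is an assumption taken from the desired-output table, and count_by_month is a hypothetical helper. Making Month a factor with fixed levels is what keeps the zero-count months in the result.

```r
# The three relevant columns of the posted table, retyped as a data.frame.
Test <- data.frame(
  Color = c("yellow", "green", "green", "black", "red", "orange", "orange",
            "orange", "orange", "orange", "black", "green", "black", "green",
            "black", "green", "black", "red", "red", "red"),
  Grade = c("A", "B", "A", "A", "C", "D", "E", "F", "F", "F",
            "A", "B", "A", "A", "C", "D", "E", "F", "F", "F"),
  Month = c("May", "June", "April", "August", "May", "June", "April",
            "August", "April", "May", "June", "August", "April", "May",
            "June", "August", "April", "June", "August", "April"),
  stringsAsFactors = FALSE
)

# Fixed factor levels make table() report months with no matching rows as 0.
month_levels <- c("April", "May", "June", "August")
Test$Month <- factor(Test$Month, levels = month_levels)

# Hypothetical helper: rows per month for one Color/Grade combination.
count_by_month <- function(color, grade) {
  keep <- Test$Color == color & Test$Grade == grade
  as.vector(table(Test$Month[keep]))
}

result <- data.frame(
  N1 = count_by_month("green", "A"),
  N2 = count_by_month("orange", "F"),
  N3 = count_by_month("red", "F"),
  row.names = month_levels
)
result
```

The counts come straight from the posted rows, so they agree with the table(Month, Color) output above rather than with the hand-typed "Desired output" table.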
Re: [R] Counting with multiple criteria using data table
Have you gone through any R tutorials? If not, why not? If so, maybe you need to spend some more time with them.

It looks like you want us to do your work for you. We don't do this. See (and follow) the posting guide below for what we might do (we're volunteers, so no guarantees).

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Wed, Jun 21, 2017 at 2:50 PM, Ek Esawi wrote:
> I have a data.table which is shown below. I want to count combinations of
> columns on i and count on j with by. A few examples are given below the
> table.
>
> I want to:
>
> all months to show on the output including those that they have zero value
>
> I want the three statements combined in on if possible so the output will
> be one data table; that is the outputs are next to each other as manually
> illustrated on the last part (desired output).
>
> Thanks--EK
>
>> Test
>      Color Grade Value  Month Day
>  1: yellow     A    20    May   1
>  2:  green     B    25   June   2
>  3:  green     A    10  April   3
>  4:  black     A    17 August   3
>  5:    red     C     5    May   5
>  6: orange     D     0   June  13
>  7: orange     E    12  April   5
>  8: orange     F    11 August   8
>  9: orange     F    99  April  23
> 10: orange     F    70    May   7
> 11:  black     A    77   June  11
> 12:  green     B    87 August  33
> 13:  black     A    79  April   9
> 14:  green     A    68    May  14
> 15:  black     C    90   June  31
> 16:  green     D    79 August  11
> 17:  black     E   101  April  17
> 18:    red     F    90   June  21
> 19:    red     F   112 August  13
> 20:    red     F   101  April  20
>
>> Test[Color=="green"&Grade=="A", .N, by=Month]
>    Month N
> 1: April 1
> 2:   May 1
>
>> Test[Color=="orange"&Grade=="F", .N, by=Month]
>     Month N
> 1: August 1
> 2:  April 1
> 3:    May 1
>
>> Test[Color=="orange"&Grade=="F", .N, by=Month]
>     Month N
> 1: August 1
> 2:  April 1
> 3:    May 1
>
>> Test[Color=="red"&Grade=="F", .N, by=Month]
>     Month N
> 1:   June 1
> 2: August 1
> 3:  April 1
>
> Desired output
>
>         N1 N2 N3
> April    1  1  1
> May      1  1  1
> June     0  0  0
> August   0  1  1
>
> [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Counting with multiple criteria using data table
I have a data.table, shown below. I want to count combinations of column values, filtering in i and counting in j with by. A few examples are given below the table.

I want:

all months to show in the output, including those with a count of zero

the three statements combined into one, if possible, so that the output is a single data table; that is, the outputs sit next to each other, as illustrated by hand in the last part (desired output).

Thanks--EK

> Test
     Color Grade Value  Month Day
 1: yellow     A    20    May   1
 2:  green     B    25   June   2
 3:  green     A    10  April   3
 4:  black     A    17 August   3
 5:    red     C     5    May   5
 6: orange     D     0   June  13
 7: orange     E    12  April   5
 8: orange     F    11 August   8
 9: orange     F    99  April  23
10: orange     F    70    May   7
11:  black     A    77   June  11
12:  green     B    87 August  33
13:  black     A    79  April   9
14:  green     A    68    May  14
15:  black     C    90   June  31
16:  green     D    79 August  11
17:  black     E   101  April  17
18:    red     F    90   June  21
19:    red     F   112 August  13
20:    red     F   101  April  20

> Test[Color=="green"&Grade=="A", .N, by=Month]
   Month N
1: April 1
2:   May 1

> Test[Color=="orange"&Grade=="F", .N, by=Month]
    Month N
1: August 1
2:  April 1
3:    May 1

> Test[Color=="orange"&Grade=="F", .N, by=Month]
    Month N
1: August 1
2:  April 1
3:    May 1

> Test[Color=="red"&Grade=="F", .N, by=Month]
    Month N
1:   June 1
2: August 1
3:  April 1

Desired output

        N1 N2 N3
April    1  1  1
May      1  1  1
June     0  0  0
August   0  1  1

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Counting enumerated items in each element of a character vector
Let's be a bit careful. You'll probably need a regular expression. But maybe a regex can't work in principle, so one can't just gloss over the details.

You said: "blah blah blah" can contain ANY text. If this is true, "blah blah blah" could contain the delimiters. If that is the case, a regex is not powerful enough in principle and you need a context-sensitive parser. So let's have a list of valid demarcations. From what you write I can guess that ...

text2 <- c( "blah 1) blah blah blah 1",
            "blah 10. blah blah blah 1",
            "blah 1) 1) blah blah blah 1",
            "blah 1. 10) blah blah blah 1",
            "blah 1) 1. blah blah blah 1",
            "blah 10. 10. blah blah blah 1" )

... captures the variation. But that's just my guess from staring at your examples. I can't be sure - that's your task to contribute.

On text2, the regular expression ...

"(\d+(\)|\.)\s*){1,2}"

... gives the expected result of

# [1] 1 1 1 1 1 1

... and ...

# [1] 5 5 5 5

... on your text1. In code:

library(stringr)
str_count(text1, "(\\d+(\\)|\\.)\\s*){1,2}")

> On Apr 26, 2017, at 10:13 AM, Dan Abner wrote:
>
> Hi all,
>
> I am looking for a streamlined way of counting the number of enumerated items
> are each element of a character vector. For example:
>
>
> text1<-c("blah blah blah.
> blah blah blah
> 1) blah blah blah 1
> 2) blah blah blah
> 10) blah 10 blah blah
> blah blah blah
> 1) blah blah blah
> 2) blah blah blah 2
> blah blah blah.","blah blah blah.
> blah blah blah
> 1. blah blah blah 1
> 2. blah blah blah
> 10.blah 10 blah blah
> blah blah blah
> 1. blah blah blah 1
> 2. blah blah blah
> blah blah blah.","blah blah blah. blah blah blah 1 1)blah blah blah 1. 2)
> blah blah blah 10) blah 10 blah blah blah blah blah 1) blah blah blah 1. 2)
> blah blah blah. blah blah blah."
> ,"blah blah blah. blah blah blah 1 1.blah blah blah 1. 2. blah blah blah. 10.
> blah 10 blah blah. blah blah blah 1. blah blah blah 1. 2. blah blah blah.
> blah blah blah.")
>
> text1
>
> ===
>
> I would like the result to be c(5,5,5,5). Notice that sometimes there are
> leading hard returns, other times not. Sometimes are there separate lists and
> the same numbers are used in the enumerated items multiple times within each
> character string. Sometimes the leading numbers for the enumerated items
> exceed single digits. Notice that the delimiter may be ) or a period (.). If
> the delimiter is a period and there are hard returns (example 2), then I
> expect that will be easy enough to differentiate sentences ending with a
> number from enumerated items. However, I imagine it would be much more
> difficult to differentiate the two for example 4.
>
> Any suggestions are appreciated.
>
> Best,
>
> Dan
>
> On Wed, Apr 26, 2017 at 8:35 AM, Boris Steipe
> wrote:
> What's the expected output for this sample?
>
> How do _you_ define what should be counted?
>
>
> > On Apr 26, 2017, at 8:33 AM, Dan Abner wrote:
> >
> > Hi all,
> >
> > I was not clearly enough in my example code. Please see below where "blah
> > blah blah" can be ANY text or numbers: No predictable pattern at all to
> > what may or may not be written in place of "blah blah blah".
> >
> > text1<-c("blah blah blah.
> > blah blah blah
> > 1) blah blah blah 1
> > 2) blah blah blah
> > 10) blah 10 blah blah
> > blah blah blah
> > 1) blah blah blah
> > 2) blah blah blah 2
> > blah blah blah.","blah blah blah.
> > blah blah blah
> > 1. blah blah blah 1
> > 2. blah blah blah
> > 10.blah 10 blah blah
> > blah blah blah
> > 1. blah blah blah 1
> > 2. blah blah blah
> > blah blah blah.","blah blah blah. blah blah blah 1 1)blah blah blah 1. 2)
> > blah
> > blah blah 10) blah 10 blah blah blah blah blah 1) blah blah blah 1. 2) blah
> > blah blah. blah blah blah."
> > ,"blah blah blah. blah blah blah 1 1.blah blah blah 1. 2. blah blah blah.
> > 10. blah 10 blah blah. blah blah blah 1. blah blah blah 1. 2. blah blah
> > blah. blah blah blah.")
> >
> > text1
> >
> > Thank you in advance for your suggestions and/or guidance.
> > > > Best, > > > > Dan > > > > > > On Wed, Apr 26, 2017 at 12:52 AM, Michael Hannon >> wrote: > > > >> Thanks, Ista. I thought there might be a "tidy" way to do this, but I > >> hadn't use stringr. > >> > >> -- Mike > >> > >> > >> On Tue, Apr 25, 2017 at 8:47 PM, Ista Zahn wrote: > >>> stringr::str_count (and stringi::stri_count that it wraps) interpret > >>> the pattern argument as a regular expression by default. > >>> > >>> Best, > >>> Ista > >>> > >>> On Tue, Apr 25, 2017 at 11:40 PM, Michael Hannon > >>> wrote: > I like Boris's "Hadley" solution. For the record, I've appended a > version that uses regular expressions, the only benefit of which is > that it could be generalized to find more-complicated patterns. > > -- Mike > > counts <- sapply(text1, function(next_string) { > loc_example <- length(gregexpr("Example", next_string)[[1]]) > loc_example > }, USE.NAM
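For readers without stringr, the count on text2 from the reply above can be sketched in base R as well. This is only an illustration: text2 is the list of demarcations guessed in that reply, and count_enumerators is a hypothetical helper name, not something from the thread. The {1,2} quantifier is what folds a sentence-ending number directly followed by an enumerator (as in "1. 2)") into a single match.

```r
# text2 as given in the reply: one enumerated item per string.
text2 <- c("blah 1) blah blah blah 1",
           "blah 10. blah blah blah 1",
           "blah 1) 1) blah blah blah 1",
           "blah 1. 10) blah blah blah 1",
           "blah 1) 1. blah blah blah 1",
           "blah 10. 10. blah blah blah 1")

# Hypothetical helper: number of enumerator matches in each element.
count_enumerators <- function(x, pattern = "(\\d+(\\)|\\.)\\s*){1,2}") {
  matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  lengths(matches)   # an element with no match yields character(0), i.e. 0
}

counts <- count_enumerators(text2)
counts
```

As the reply cautions, this only works if the free text never contains something that looks like a delimiter, for example three consecutive numbered fragments.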
Re: [R] Counting enumerated items in each element of a character vector
What's the expected output for this sample?

How do _you_ define what should be counted?

> On Apr 26, 2017, at 8:33 AM, Dan Abner wrote:
>
> Hi all,
>
> I was not clearly enough in my example code. Please see below where "blah
> blah blah" can be ANY text or numbers: No predictable pattern at all to
> what may or may not be written in place of "blah blah blah".
>
> text1<-c("blah blah blah.
> blah blah blah
> 1) blah blah blah 1
> 2) blah blah blah
> 10) blah 10 blah blah
> blah blah blah
> 1) blah blah blah
> 2) blah blah blah 2
> blah blah blah.","blah blah blah.
> blah blah blah
> 1. blah blah blah 1
> 2. blah blah blah
> 10.blah 10 blah blah
> blah blah blah
> 1. blah blah blah 1
> 2. blah blah blah
> blah blah blah.","blah blah blah. blah blah blah 1 1)blah blah blah 1. 2) blah
> blah blah 10) blah 10 blah blah blah blah blah 1) blah blah blah 1. 2) blah
> blah blah. blah blah blah."
> ,"blah blah blah. blah blah blah 1 1.blah blah blah 1. 2. blah blah blah.
> 10. blah 10 blah blah. blah blah blah 1. blah blah blah 1. 2. blah blah
> blah. blah blah blah.")
>
> text1
>
> Thank you in advance for your suggestions and/or guidance.
>
> Best,
>
> Dan
>
>
> On Wed, Apr 26, 2017 at 12:52 AM, Michael Hannon
> wrote:
>
>> Thanks, Ista. I thought there might be a "tidy" way to do this, but I
>> hadn't use stringr.
>>
>> -- Mike
>>
>>
>> On Tue, Apr 25, 2017 at 8:47 PM, Ista Zahn wrote:
>>> stringr::str_count (and stringi::stri_count that it wraps) interpret
>>> the pattern argument as a regular expression by default.
>>>
>>> Best,
>>> Ista
>>>
>>> On Tue, Apr 25, 2017 at 11:40 PM, Michael Hannon
>>> wrote:

I like Boris's "Hadley" solution. For the record, I've appended a version that uses regular expressions, the only benefit of which is that it could be generalized to find more-complicated patterns.
-- Mike counts <- sapply(text1, function(next_string) { loc_example <- length(gregexpr("Example", next_string)[[1]]) loc_example }, USE.NAMES=FALSE) > counts [1] 5 5 5 5 > On Tue, Apr 25, 2017 at 5:33 PM, Boris Steipe >> wrote: > I should add: there's a str_count() function in the stringr package. > > library(stringr) > str_count(text1, "Example") > # [1] 5 5 5 5 > > I guess that would be the neater solution. > > B. > > > >> On Apr 25, 2017, at 8:23 PM, Boris Steipe >> wrote: >> >> How about: >> >> unlist(lapply(strsplit(text1, "Example"), function(x) { length(x) - 1 >> } )) >> >> >> Splitting your string on the five "Examples" in each gives six >> elements. length(x) - 1 is the number of >> matches. You can use any regex instead of "example" if you need to >> tweak what you are looking for. >> >> >> B. >> >> >> >> >>> On Apr 25, 2017, at 8:14 PM, Dan Abner >> wrote: >>> >>> Hi all, >>> >>> I am looking for a streamlined way of counting the number of >> enumerated >>> items are each element of a character vector. For example: >>> >>> >>> text1<-c("This is an example. >>> List 1 >>> 1) Example 1 >>> 2) Example 2 >>> 10) Example 10 >>> List 2 >>> 1) Example 1 >>> 2) Example 2 >>> These have been examples.","This is another example. >>> List 1 >>> 1. Example 1 >>> 2. Example 2 >>> 10. Example 10 >>> List 2 >>> 1. Example 1 >>> 2. Example 2 >>> These have been examples.","This is a third example. List 1 1) >> Example 1. >>> 2) Example 2. 10) Example 10. List 2 1) Example 1. 2) Example 2. >> These have >>> been examples." >>> ,"This is a fourth example. List 1 1. Example 1. 2. Example 2. 10. >> Example >>> 10. List 2 Example 1. 2. Example 2. These have been examples.") >>> >>> text1 >>> >>> === >>> >>> I would like the result to be c(5,5,5,5). Notice that sometimes >> there are >>> leading hard returns, other times not. Sometimes are there separate >> lists >>> and the same numbers are used in the enumerated items multiple times >> within >>> each character string. 
Sometimes the leading numbers for the >> enumerated >>> items exceed single digits. Notice that the delimiter may be ) or a >> period >>> (.). If the delimiter is a period and there are hard returns >> (example 2), >>> then I expect that will be easy enough to differentiate sentences >> ending >>> with a number from enumerated items. However, I imagine it would be >> much >>> more difficult to differentiate the two for example 4. >>> >>> Any suggestions are appreciated. >>> >>> Best, >>> >>> Dan >>> >>> [[alternative HTML version deleted]] >>> >>> __ >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/
Re: [R] Counting enumerated items in each element of a character vector
Hi all,

I was not clear enough in my example code. Please see below, where "blah blah blah" can be ANY text or numbers: there is no predictable pattern at all to what may or may not be written in place of "blah blah blah".

text1<-c("blah blah blah.
blah blah blah
1) blah blah blah 1
2) blah blah blah
10) blah 10 blah blah
blah blah blah
1) blah blah blah
2) blah blah blah 2
blah blah blah.","blah blah blah.
blah blah blah
1. blah blah blah 1
2. blah blah blah
10.blah 10 blah blah
blah blah blah
1. blah blah blah 1
2. blah blah blah
blah blah blah.","blah blah blah. blah blah blah 1 1)blah blah blah 1. 2) blah blah blah 10) blah 10 blah blah blah blah blah 1) blah blah blah 1. 2) blah blah blah. blah blah blah."
,"blah blah blah. blah blah blah 1 1.blah blah blah 1. 2. blah blah blah. 10. blah 10 blah blah. blah blah blah 1. blah blah blah 1. 2. blah blah blah. blah blah blah.")

text1

Thank you in advance for your suggestions and/or guidance.

Best,

Dan

On Wed, Apr 26, 2017 at 12:52 AM, Michael Hannon wrote:
> Thanks, Ista. I thought there might be a "tidy" way to do this, but I
> hadn't use stringr.
>
> -- Mike
>
>
> On Tue, Apr 25, 2017 at 8:47 PM, Ista Zahn wrote:
> > stringr::str_count (and stringi::stri_count that it wraps) interpret
> > the pattern argument as a regular expression by default.
> >
> > Best,
> > Ista
> >
> > On Tue, Apr 25, 2017 at 11:40 PM, Michael Hannon
> > wrote:
> >> I like Boris's "Hadley" solution. For the record, I've appended a
> >> version that uses regular expressions, the only benefit of which is
> >> that it could be generalized to find more-complicated patterns.
> >>
> >> -- Mike
> >>
> >> counts <- sapply(text1, function(next_string) {
> >>     loc_example <- length(gregexpr("Example", next_string)[[1]])
> >>     loc_example
> >> }, USE.NAMES=FALSE)
> >>
> >>> counts
> >> [1] 5 5 5 5
> >>>
> >>
> >> On Tue, Apr 25, 2017 at 5:33 PM, Boris Steipe
> wrote:
> >>> I should add: there's a str_count() function in the stringr package.
> >>> > >>> library(stringr) > >>> str_count(text1, "Example") > >>> # [1] 5 5 5 5 > >>> > >>> I guess that would be the neater solution. > >>> > >>> B. > >>> > >>> > >>> > On Apr 25, 2017, at 8:23 PM, Boris Steipe > wrote: > > How about: > > unlist(lapply(strsplit(text1, "Example"), function(x) { length(x) - 1 > } )) > > > Splitting your string on the five "Examples" in each gives six > elements. length(x) - 1 is the number of > matches. You can use any regex instead of "example" if you need to > tweak what you are looking for. > > > B. > > > > > > On Apr 25, 2017, at 8:14 PM, Dan Abner > wrote: > > > > Hi all, > > > > I am looking for a streamlined way of counting the number of > enumerated > > items are each element of a character vector. For example: > > > > > > text1<-c("This is an example. > > List 1 > > 1) Example 1 > > 2) Example 2 > > 10) Example 10 > > List 2 > > 1) Example 1 > > 2) Example 2 > > These have been examples.","This is another example. > > List 1 > > 1. Example 1 > > 2. Example 2 > > 10. Example 10 > > List 2 > > 1. Example 1 > > 2. Example 2 > > These have been examples.","This is a third example. List 1 1) > Example 1. > > 2) Example 2. 10) Example 10. List 2 1) Example 1. 2) Example 2. > These have > > been examples." > > ,"This is a fourth example. List 1 1. Example 1. 2. Example 2. 10. > Example > > 10. List 2 Example 1. 2. Example 2. These have been examples.") > > > > text1 > > > > === > > > > I would like the result to be c(5,5,5,5). Notice that sometimes > there are > > leading hard returns, other times not. Sometimes are there separate > lists > > and the same numbers are used in the enumerated items multiple times > within > > each character string. Sometimes the leading numbers for the > enumerated > > items exceed single digits. Notice that the delimiter may be ) or a > period > > (.). 
If the delimiter is a period and there are hard returns > (example 2), > > then I expect that will be easy enough to differentiate sentences > ending > > with a number from enumerated items. However, I imagine it would be > much > > more difficult to differentiate the two for example 4. > > > > Any suggestions are appreciated. > > > > Best, > > > > Dan > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/ > posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > __ > R-help@r-project.org mailing li
Re: [R] Counting enumerated items in each element of a character vector
Thanks, Ista. I thought there might be a "tidy" way to do this, but I hadn't used stringr.

-- Mike

On Tue, Apr 25, 2017 at 8:47 PM, Ista Zahn wrote:
> stringr::str_count (and stringi::stri_count that it wraps) interpret
> the pattern argument as a regular expression by default.
>
> Best,
> Ista
>
> On Tue, Apr 25, 2017 at 11:40 PM, Michael Hannon
> wrote:
>> I like Boris's "Hadley" solution. For the record, I've appended a
>> version that uses regular expressions, the only benefit of which is
>> that it could be generalized to find more-complicated patterns.
>>
>> -- Mike
>>
>> counts <- sapply(text1, function(next_string) {
>>     loc_example <- length(gregexpr("Example", next_string)[[1]])
>>     loc_example
>> }, USE.NAMES=FALSE)
>>
>>> counts
>> [1] 5 5 5 5
>>>
>>
>> On Tue, Apr 25, 2017 at 5:33 PM, Boris Steipe
>> wrote:
>>> I should add: there's a str_count() function in the stringr package.
>>>
>>> library(stringr)
>>> str_count(text1, "Example")
>>> # [1] 5 5 5 5
>>>
>>> I guess that would be the neater solution.
>>>
>>> B.
>>>
>>>
>>>> On Apr 25, 2017, at 8:23 PM, Boris Steipe wrote:
>>>>
>>>> How about:
>>>>
>>>> unlist(lapply(strsplit(text1, "Example"), function(x) { length(x) - 1 } ))
>>>>
>>>> Splitting your string on the five "Examples" in each gives six
>>>> elements. length(x) - 1 is the number of matches. You can use any
>>>> regex instead of "example" if you need to tweak what you are
>>>> looking for.
>>>>
>>>> B.
>>>>
>>>>> On Apr 25, 2017, at 8:14 PM, Dan Abner wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I am looking for a streamlined way of counting the number of enumerated
>>>>> items are each element of a character vector. For example:
>>>>>
>>>>>
>>>>> text1<-c("This is an example.
>>>>> List 1
>>>>> 1) Example 1
>>>>> 2) Example 2
>>>>> 10) Example 10
>>>>> List 2
>>>>> 1) Example 1
>>>>> 2) Example 2
>>>>> These have been examples.","This is another example.
>>>>> List 1
>>>>> 1. Example 1
>>>>> 2. Example 2
>>>>> 10. Example 10
>>>>> List 2
>>>>> 1. Example 1
>>>>> 2. Example 2
>>>>> These have been examples.","This is a third example. List 1 1) Example 1.
>>>>> 2) Example 2. 10) Example 10. List 2 1) Example 1. 2) Example 2. These
>>>>> have
>>>>> been examples."
>>>>> ,"This is a fourth example. List 1 1. Example 1. 2. Example 2. 10. Example
>>>>> 10. List 2 Example 1. 2. Example 2. These have been examples.")
>>>>>
>>>>> text1
>>>>>
>>>>> ===
>>>>>
>>>>> I would like the result to be c(5,5,5,5). Notice that sometimes there are
>>>>> leading hard returns, other times not. Sometimes are there separate lists
>>>>> and the same numbers are used in the enumerated items multiple times
>>>>> within
>>>>> each character string. Sometimes the leading numbers for the enumerated
>>>>> items exceed single digits. Notice that the delimiter may be ) or a period
>>>>> (.). If the delimiter is a period and there are hard returns (example 2),
>>>>> then I expect that will be easy enough to differentiate sentences ending
>>>>> with a number from enumerated items. However, I imagine it would be much
>>>>> more difficult to differentiate the two for example 4.
>>>>>
>>>>> Any suggestions are appreciated.
>>>>>
>>>>> Best,
>>>>>
>>>>> Dan
>>>>>
>>>>> [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Counting enumerated items in each element of a character vector
stringr::str_count (and stringi::stri_count that it wraps) interpret the pattern argument as a regular expression by default.

Best,
Ista
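A small illustration of why the regular-expression default matters. This is only a sketch: it uses base R's gregexpr(), which has the same default, so that it runs without extra packages. The helper name count_hits and the test string x are made up for this example and do not come from the thread. In stringr, the literal interpretation is requested by wrapping the pattern in fixed().

```r
# Count non-overlapping matches of `pattern` in each element of `x`.
# `count_hits` is a hypothetical helper written for this illustration.
count_hits <- function(x, pattern, fixed = FALSE) {
  m <- gregexpr(pattern, x, fixed = fixed)
  # regmatches() returns character(0) when nothing matched, so lengths() gives 0
  lengths(regmatches(x, m))
}

x <- "1. Example 10. 11 items"   # made-up test string

regex_count   <- count_hits(x, "1.")                # "." matches any character
literal_count <- count_hits(x, "1.", fixed = TRUE)  # "." is a literal period
```

As a regular expression, "1." also matches "10" and "11", so the two counts differ. For a plain word such as "Example" the distinction makes no difference.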
Re: [R] Counting enumerated items in each element of a character vector
I like Boris's "Hadley" solution. For the record, I've appended a version that uses regular expressions, the only benefit of which is that it could be generalized to find more-complicated patterns.

-- Mike

    counts <- sapply(text1, function(next_string) {
        loc_example <- length(gregexpr("Example", next_string)[[1]])
        loc_example
    }, USE.NAMES=FALSE)

    > counts
    [1] 5 5 5 5
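One caveat with the length() approach, offered as a hedged aside rather than a correction of the posted result: gregexpr() reports "no match" as a single -1, so the length of its result is 1 even when the pattern never occurs. The strings below are made up for the illustration. Counting the positive match positions avoids the problem.

```r
no_hit <- "nothing to see here"        # made-up string without "Example"
two_hits <- "Example 1 and Example 2"  # made-up string with two matches

# Approach from the message above: length of the match-position vector
length_based_none <- length(gregexpr("Example", no_hit)[[1]])
length_based_two  <- length(gregexpr("Example", two_hits)[[1]])

# Alternative: count only real (positive) match positions
position_based_none <- sum(gregexpr("Example", no_hit)[[1]] > 0)
position_based_two  <- sum(gregexpr("Example", two_hits)[[1]] > 0)
```

Both approaches agree whenever there is at least one match, which is why the posted code returns 5 5 5 5 for text1.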
Re: [R] Counting enumerated items in each element of a character vector
I should add: there's a str_count() function in the stringr package.

    library(stringr)
    str_count(text1, "Example")
    # [1] 5 5 5 5

I guess that would be the neater solution.

B.
Re: [R] Counting enumerated items in each element of a character vector
How about:

    unlist(lapply(strsplit(text1, "Example"), function(x) { length(x) - 1 } ))

Splitting your string on the five "Examples" in each gives six elements. length(x) - 1 is the number of matches. You can use any regex instead of "Example" if you need to tweak what you are looking for.

B.
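A hedged note on the strsplit() approach: it relies on every match being followed by more text. strsplit() drops the empty piece after a match at the very end of a string, so length(x) - 1 undercounts in that case. That does not happen with text1, where each string ends in "These have been examples.". The string s below is made up to show the edge case, with a match-based count for comparison.

```r
s <- "1) Example 1 2) Example"   # made-up string that ends with the pattern

# Split-based count: the trailing empty piece is dropped by strsplit()
split_based <- length(strsplit(s, "Example")[[1]]) - 1

# Match-based count: extract the matches and count them
match_based <- lengths(regmatches(s, gregexpr("Example", s)))
```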
[R] Counting enumerated items in each element of a character vector
Hi all,

I am looking for a streamlined way of counting the number of enumerated items in each element of a character vector. For example:

text1<-c("This is an example.
List 1
1) Example 1
2) Example 2
10) Example 10
List 2
1) Example 1
2) Example 2
These have been examples.","This is another example.
List 1
1. Example 1
2. Example 2
10. Example 10
List 2
1. Example 1
2. Example 2
These have been examples.","This is a third example. List 1 1) Example 1. 2) Example 2. 10) Example 10. List 2 1) Example 1. 2) Example 2. These have been examples."
,"This is a fourth example. List 1 1. Example 1. 2. Example 2. 10. Example 10. List 2 Example 1. 2. Example 2. These have been examples.")

text1

===

I would like the result to be c(5,5,5,5). Notice that sometimes there are leading hard returns, other times not. Sometimes there are separate lists and the same numbers are used in the enumerated items multiple times within each character string. Sometimes the leading numbers for the enumerated items exceed single digits. Notice that the delimiter may be ) or a period (.). If the delimiter is a period and there are hard returns (example 2), then I expect that will be easy enough to differentiate sentences ending with a number from enumerated items. However, I imagine it would be much more difficult to differentiate the two for example 4.

Any suggestions are appreciated.

Best,

Dan
Re: [R] Counting number of rain
On 03/10/15 04:42, David Winsemius wrote:

> Quite right, Duncan. I failed to include the even though it was
> staring me in the face. My wife says I have an extreme case of
> "refrigerator blindness" which now seems to be spreading to other
> areas of my cognitive activities. Sorry, Rolf.

Quite alright. The syndrome is *very* familiar to me! :-)

cheers,

Rolf

-- 
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
Re: [R] Counting number of rain
On Oct 2, 2015, at 2:33 AM, Duncan Murdoch wrote:

> The zoo package replaces as.Date.numeric() with a function that assumes
> an origin of "1970-01-01". There may be other packages that also make a
> replacement like this. David appears to have one of them attached, and
> you don't.

Quite right, Duncan. I failed to include the even though it was staring me in the face. My wife says I have an extreme case of "refrigerator blindness" which now seems to be spreading to other areas of my cognitive activities. Sorry, Rolf.

-- 
David.

David Winsemius
Alameda, CA, USA
Re: [R] Counting number of rain
On 01/10/2015 11:29 PM, Rolf Turner wrote:

> When I try that (copying and pasting your code so that there's no chance
> of fumble-fingering) I get:
>
>> Error in as.Date.numeric(1:7) : 'origin' must be supplied
>
> Why do these things always happen to *me*???

The zoo package replaces as.Date.numeric() with a function that assumes an origin of "1970-01-01". There may be other packages that also make a replacement like this. David appears to have one of them attached, and you don't.

Duncan Murdoch
Re: [R] Counting number of rain
On Oct 1, 2015, at 8:29 PM, Rolf Turner wrote:

> When I try that (copying and pasting your code so that there's no chance of
> fumble-fingering) I get:
>
>> Error in as.Date.numeric(1:7) : 'origin' must be supplied
>
> Why do these things always happen to *me*???

Or why am I so lucky as to avoid the need for an origin when the help page says the call is:

    ## S3 method for class 'numeric'
    as.Date(x, origin, ...)    # noting no default in the formals

The code I see supplies a default origin when it is missing:

    > as.Date.numeric
    function (x, origin, ...)
    {
        if (missing(origin))
            origin <- "1970-01-01"
        if (identical(origin, "0000-00-00"))
            origin <- as.Date("0000-01-01", ...) - 1
        as.Date(origin, ...) + x
    }

-- 
David Winsemius
Alameda, CA, USA
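For readers hitting the same error, a minimal sketch that does not depend on which packages happen to be attached: pass the origin explicitly. The variable names are illustrative only. The weekday strings come from the locale, so the sketch also derives locale-independent weekday numbers.

```r
# Seven consecutive dates, with the origin given explicitly so the call
# behaves the same whether or not a package has replaced as.Date.numeric().
days <- as.Date(1:7, origin = "1970-01-01") + 2   # 1970-01-04 to 1970-01-10

# Full weekday names; the strings follow the current locale (LC_TIME)
day_names <- format(days, format = "%A")

# Locale-independent check: POSIXlt weekday numbers, where 0 is Sunday
day_numbers <- as.POSIXlt(days)$wday
```

1970-01-01 was a Thursday, which is why adding 2 to days 1 through 7 starts the sequence on a Sunday.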
Re: [R] Counting number of rain
On 02/10/15 15:47, David Winsemius wrote:

> On Oct 1, 2015, at 6:22 PM, Rolf Turner wrote:
>
>> P.S. I have been unable to find a corresponding vector of the names
>> of the days of the week, although I have a very vague recollection
>> of the existence of such a vector. Does it exist, and if so what
>> is it called?
>
> It could be called up by strptime because it is mapped to a character
> vector by the internationalization database:
>
>> format( as.Date(1:7)+2, format="%A")
> [1] "Sunday"    "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"
> [7] "Saturday"

When I try that (copying and pasting your code so that there's no chance of fumble-fingering) I get:

    Error in as.Date.numeric(1:7) : 'origin' must be supplied

Why do these things always happen to *me*???

cheers,

Rolf
Re: [R] Counting number of rain
On Oct 1, 2015, at 6:22 PM, Rolf Turner wrote:

> P.S. I have been unable to find a corresponding vector of the names of the
> days of the week, although I have a very vague recollection of the existence
> of such a vector. Does it exist, and if so what is it called?

It could be called up by strptime because it is mapped to a character vector by the internationalization database:

    > format( as.Date(1:7)+2, format="%A")
    [1] "Sunday"    "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"
    [7] "Saturday"

-- 
David Winsemius
Alameda, CA, USA
Re: [R] Counting number of rain
On 02/10/15 10:54, peter dalgaard wrote:

> Umm,
>
> ---
> Help files with alias or concept or title matching ‘month’ using fuzzy
> matching:
>
> base::Constants         Built-in Constants
>   Aliases: month.abb, month.name
> ---

Hmm. When I did ??month I got a completely different display. It contained *absolutely no* mention of month.abb. That *seems* to be because I have help_type set to "html". When I re-set help_type to "text", I get a display like unto the one that you obtained (and it does indeed lead one to month.abb).

It seems to me ver' strange that one gets a different collection of information under help_type="text" than one does under help_type="html". If I were me, I would classify this as a bug.

> Also, entering "month" gives the completions
>
>> month
> month.abb      monthplot      months.Date    month.name     months
> months.POSIXt

Yes, I eventually managed to come up with this trick as well. But that is not really relevant to the phenomenon that "??" or help.search() don't work effectively, or at least not consistently (the effectiveness appearing to depend --- for some bizarre reason --- on the value of help_type).

cheers,

Rolf

P.S. I have been unable to find a corresponding vector of the names of the days of the week, although I have a very vague recollection of the existence of such a vector. Does it exist, and if so what is it called? Or is my recollection an illusion brought on by advancing senility?

R.
Re: [R] Counting number of rain
> On 01 Oct 2015, at 23:04 , Rolf Turner wrote:
>
> Strangely ??month or help.search("month") yield no trace of it. Pages and
> pages of (useless!) output but no sign of "month.abb" (nor of "month.name"
> which gives the unabbreviated month names).
>
> Can anyone explain to me why "??" and help.search() are of no help here?

Umm,

---
Help files with alias or concept or title matching ‘month’ using fuzzy
matching:

base::Constants         Built-in Constants
  Aliases: month.abb, month.name
---

Also, entering "month" gives the completions

    > month
    month.abb      monthplot      months.Date    month.name     months
    months.POSIXt

-pd

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
Re: [R] Counting number of rain
On 02/10/15 03:45, David L Carlson wrote:

> If you want the month names:
>
> > mnt <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
> + "July", "Aug", "Sep", "Oct", "Nov", "Dec")
> > dimnames(tbl)$Month <- mnt

Unnecessary typing; there is a built-in data set "month.abb" (in the
"base" package) that is identical to your "mnt".

Difficult (nearly impossible!) to find, but, if you can't quite remember the
name! I *knew* I'd seen it, so I persisted and eventually tracked it down.

Strangely ??month or help.search("month") yield no trace of it. Pages and
pages of (useless!) output but no sign of "month.abb" (nor of "month.name"
which gives the unabbreviated month names).

Can anyone explain to me why "??" and help.search() are of no help here?

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
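For anyone following along, here is a small sketch comparing the hand-typed vector from this thread with the built-in constants month.abb and month.name. The variable name "differs" is only for illustration. As far as I can tell the two vectors are not quite identical, because the hand-typed one spells the seventh month "July" while month.abb uses "Jul".

```r
# Hand-typed vector as posted in the thread (note the spelling "July").
mnt <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "July", "Aug", "Sep", "Oct", "Nov", "Dec")

# Built-in constants from the base package.
print(month.abb)   # three-letter abbreviations
print(month.name)  # full month names

# Positions where the hand-typed vector and month.abb disagree
# ("differs" is an illustrative name, not from the thread).
differs <- which(mnt != month.abb)
print(mnt[differs])        # spelling in the hand-typed vector
print(month.abb[differs])  # spelling in the built-in constant
```

If three-letter labels are wanted throughout, assigning month.abb directly avoids the mismatch.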
Re: [R] Counting number of rain
You should always reply to the list since other posters may have other suggestions. Assuming your data frame is called rain: > str(rain) 'data.frame': 2192 obs. of 4 variables: $ Year : int 1960 1960 1960 1960 1960 1960 1960 1960 1960 1960 ... $ Month : int 1 1 1 1 1 1 1 1 1 1 ... $ Day : int 1 2 3 4 5 6 7 8 9 10 ... $ Amount: num 0.3 0 0 0 0 2.7 7.1 14 12.6 11.1 ... > tbl <- xtabs(~Year+Month, rain, subset=Amount > 0.01) > tbl Month Year1 2 3 4 5 6 7 8 9 10 11 12 1960 24 15 2 12 19 22 18 24 22 20 30 29 1961 26 9 10 18 18 11 18 14 24 28 30 31 1962 22 14 19 2 18 19 27 26 26 29 15 28 1963 27 17 15 4 9 23 16 24 19 28 30 22 1964 15 25 9 13 19 14 23 20 24 30 25 27 1965 13 21 12 10 21 24 22 21 28 23 28 31 If you want the month names: > mnt <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", + "July", "Aug", "Sep", "Oct", "Nov", "Dec") > dimnames(tbl)$Month <- mnt > tbl Month Year Jan Feb Mar Apr May Jun July Aug Sep Oct Nov Dec 1960 24 15 2 12 19 22 18 24 22 20 30 29 1961 26 9 10 18 18 11 18 14 24 28 30 31 1962 22 14 19 2 18 19 27 26 26 29 15 28 1963 27 17 15 4 9 23 16 24 19 28 30 22 1964 15 25 9 13 19 14 23 20 24 30 25 27 1965 13 21 12 10 21 24 22 21 28 23 28 31 - David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 From: smart hendsome [mailto:putra_autum...@yahoo.com] Sent: Wednesday, September 30, 2015 9:24 PM To: David L Carlson Subject: Re: [R] Counting number of rain Hi David, Thanks for your reply, this is my data using dput; structure(list(Year = c(1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L
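The xtabs() approach above can be tried without the full data set. What follows is only a sketch on synthetic rainfall values (the dates, the random amounts and the seed are my assumptions, not the poster's data). It also stores Month as a factor with levels 1:12, an addition of mine, so that a month with no rainy days would still get a column and the twelve labels in month.abb would always fit.

```r
# Synthetic daily rainfall for two years (illustrative values only,
# not the data posted in this thread).
set.seed(42)
dates <- seq(as.Date("1960-01-01"), as.Date("1961-12-31"), by = "day")
rain <- data.frame(
  Year   = as.integer(format(dates, "%Y")),
  # A factor with all 12 levels keeps empty months in the table.
  Month  = factor(as.integer(format(dates, "%m")), levels = 1:12),
  Day    = as.integer(format(dates, "%d")),
  Amount = round(pmax(rnorm(length(dates), mean = 1, sd = 3), 0), 1)
)

# Count the days with more than 0.01 units of rain, by year and month.
tbl <- xtabs(~ Year + Month, data = rain, subset = Amount > 0.01)

# Label the columns with the built-in abbreviated month names.
dimnames(tbl)$Month <- month.abb
print(tbl)
```

With Month left as a plain integer, as in the original str() output, the table only has columns for months that actually occur after subsetting, so the factor is a safeguard rather than a requirement.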
Re: [R] Counting occurrences of a set of values
# example data
df <- data.frame( V1= 1, V2= c( 2, 3, 2, 1), V3= c( 1, 2, 1, 1))
# sort the rows so that identical rows become adjacent
dfO <- df[ do.call( order, df), ]
# flag rows that repeat an earlier row
dfOD <- duplicated( dfO)
# TRUE on the last row of each run of identical rows
dfODTrigger <- ! c( dfOD[-1], FALSE)
# run lengths = differences between the positions of those last rows
dfOCounts <- diff( c( 0, which( dfODTrigger)))
cbind( dfO[ dfODTrigger, ], dfOCounts)

  V1 V2 V3 dfOCounts
4  1  1  1         1
3  1  2  1         2
2  1  3  2         1

Regards

On Thu, Sep 10, 2015 at 01:11:24PM +, Thomas Chesney wrote:
> Can anyone suggest a way of counting how frequently sets of values occur in
> a data frame? Like table() only with sets.
>
> So for a dataset:
>
> V1, V2, V3
> 1, 2, 1
> 1, 3, 2
> 1, 2, 1
> 1, 1, 1
>
> The output would be something like:
>
> 1,2,1: 2
> 1,3,2: 1
> 1,1,1: 1
>
> Thank you,
>
> Thomas Chesney
>
> This message and any attachment are intended solely for the addressee
> and may contain confidential information. If you have received this
> message in error, please send it back to me, and immediately delete it.
>
> Please do not use, copy or disclose the information contained in this
> message or in any attachment. Any views or opinions expressed by the
> author of this email do not necessarily reflect the views of the
> University of Nottingham.
>
> This message has been checked for viruses but the contents of an
> attachment may still contain software viruses which could damage your
> computer system, you are advised to perform your own checks. Email
> communications with the University of Nottingham may be monitored as
> permitted by UK legislation.
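The sort-then-count idea in this reply can also be written with rle(), which reports run lengths directly. This is a sketch of an alternative, not the poster's code; the names "sorted", "key", "runs" and "result" are mine.

```r
# Same example data as in the thread.
df <- data.frame(V1 = 1, V2 = c(2, 3, 2, 1), V3 = c(1, 2, 1, 1))

# Sort the rows so that identical rows sit next to each other.
sorted <- df[do.call(order, df), ]

# One string key per row, e.g. "1,2,1".
key <- do.call(paste, c(sorted, sep = ","))

# rle() gives the length of each run of identical keys.
runs <- rle(key)

# Keep the last row of each run and attach its run length.
result <- cbind(sorted[cumsum(runs$lengths), ], Count = runs$lengths)
print(result)
```

On this example the result has the same rows and counts as the duplicated()/diff() version above.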
Re: [R] Counting occurrences of a set of values
Have a look at the dplyr package

library(dplyr)
n <- 1000
# tibble() replaces the deprecated data_frame()
tibble(
  V1 = sample(0:1, n, replace = TRUE),
  V2 = sample(0:1, n, replace = TRUE),
  V3 = sample(0:1, n, replace = TRUE)
) %>%
  group_by(V1, V2, V3) %>%
  mutate(
    Freq = n()  # number of rows sharing this combination of V1, V2, V3
  )

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

2015-09-10 15:11 GMT+02:00 Thomas Chesney :

> Can anyone suggest a way of counting how frequently sets of values occur
> in a data frame? Like table() only with sets.
>
> So for a dataset:
>
> V1, V2, V3
> 1, 2, 1
> 1, 3, 2
> 1, 2, 1
> 1, 1, 1
>
> The output would be something like:
>
> 1,2,1: 2
> 1,3,2: 1
> 1,1,1: 1
>
> Thank you,
>
> Thomas Chesney
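For readers without dplyr, a base R sketch of the same "attach the group size to every row" idea is shown here. It uses ave() with length as the summary function; the data frame is the small example from the original question rather than the random data above, and none of this is the poster's own code.

```r
# Small example from the original question.
df <- data.frame(V1 = c(1, 1, 1, 1),
                 V2 = c(2, 3, 2, 1),
                 V3 = c(1, 2, 1, 1))

# For each row, count how many rows share the same (V1, V2, V3) combination.
# ave() applies FUN within each group and returns a vector aligned with the rows.
df$Freq <- ave(seq_len(nrow(df)), df$V1, df$V2, df$V3, FUN = length)
print(df)

# One row per distinct combination, with its frequency.
print(unique(df))
```

Like mutate(Freq = n()), this keeps every original row; unique() then collapses the result to one line per combination.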
Re: [R] Counting occurrences of a set of values
Dear Thomas,

How about this?

> table(apply(Data, 1, paste, collapse=","))

1,1,1 1,2,1 1,3,2
    1     2     1

I hope this helps,
John

> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Thomas
> Chesney
> Sent: September 10, 2015 9:11 AM
> To: r-help@r-project.org
> Subject: [R] Counting occurrences of a set of values
>
> Can anyone suggest a way of counting how frequently sets of values occur in a
> data frame? Like table() only with sets.
>
> So for a dataset:
>
> V1, V2, V3
> 1, 2, 1
> 1, 3, 2
> 1, 2, 1
> 1, 1, 1
>
> The output would be something like:
>
> 1,2,1: 2
> 1,3,2: 1
> 1,1,1: 1
>
> Thank you,
>
> Thomas Chesney
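A variant of the paste() idea, sketched here under the assumption that Data holds the four example rows from the question: do.call(paste, ...) builds the row keys column-wise in one vectorised call instead of looping over rows with apply(). The names "key" and "counts" are illustrative.

```r
# Example rows from the original question (assumed contents of Data).
Data <- data.frame(V1 = c(1, 1, 1, 1),
                   V2 = c(2, 3, 2, 1),
                   V3 = c(1, 2, 1, 1))

# Paste the columns together element-wise: one key such as "1,2,1" per row.
key <- do.call(paste, c(Data, sep = ","))

# Tabulate the keys.
counts <- table(key)
print(counts)
```

Both versions treat each row as an ordered tuple, so 1,2,1 and 1,1,2 are counted separately.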
Re: [R] Counting occurrences of a set of values
On 10/09/2015 9:11 AM, Thomas Chesney wrote:
> Can anyone suggest a way of counting how frequently sets of values occur in
> a data frame? Like table() only with sets.

Do you want 1,2,1 to be the same as 1,1,2, or different? What about 1,2,2? For sets, those are all the same, but for most purposes, they aren't.

If you really want to keep the ordering, then table() does the counting you want, it just returns it in an ugly format.

Duncan Murdoch

> So for a dataset:
>
> V1, V2, V3
> 1, 2, 1
> 1, 3, 2
> 1, 2, 1
> 1, 1, 1
>
> The output would be something like:
>
> 1,2,1: 2
> 1,3,2: 1
> 1,1,1: 1
>
> Thank you,
>
> Thomas Chesney
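Both points in this reply can be illustrated with a short base R sketch (my own example code, using the four rows from the question). The first part flattens the multi-way table() result into a long data frame and drops the empty cells; the second sorts the values within each row first, so that 1,2,1 and 1,1,2 would fall into the same group.

```r
df <- data.frame(V1 = c(1, 1, 1, 1),
                 V2 = c(2, 3, 2, 1),
                 V3 = c(1, 2, 1, 1))

# Order matters: table() counts every (V1, V2, V3) combination;
# as.data.frame() turns the array into one row per cell.
freq <- as.data.frame(table(df))
freq <- freq[freq$Freq > 0, ]   # keep only combinations that occur
print(freq)

# Order ignored: sort within each row before building the key.
unordered_key <- apply(df, 1, function(r) paste(sort(r), collapse = ","))
unordered_counts <- table(unordered_key)
print(unordered_counts)
```

Note that sorting within a row still keeps repeated values, so this treats rows as multisets; a strict set interpretation would also need unique() inside the function.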
[R] Counting occurrences of a set of values
Can anyone suggest a way of counting how frequently sets of values occur in a data frame? Like table() only with sets.

So for a dataset:

V1, V2, V3
1, 2, 1
1, 3, 2
1, 2, 1
1, 1, 1

The output would be something like:

1,2,1: 2
1,3,2: 1
1,1,1: 1

Thank you,

Thomas Chesney
Re: [R] Counting number of rain
Assuming your data is already in R, please send it in dput() format. See ?dput or http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and http://adv-r.had.co.nz/Reproducibility.html for more details.

John Kane
Kingston ON Canada

> -----Original Message-----
> From: r-help@r-project.org
> Sent: Tue, 8 Sep 2015 06:58:58 + (UTC)
> To: r-help@r-project.org
> Subject: [R] Counting number of rain
>
> Hello R-users,
> I want to ask how to count the number of daily rain data. My data as
> below:
> Year Month Day Amount
> 1901 1 1 0
> 1901 1 2 3
> 1901 1 3 0
> 1901 1 4 0.5
> [...]
Re: [R] Counting number of rain
Try the following:

## step 1: write raw data to an array
junk<-scan('clipboard') # read the numbers (not the 'Year' etc. labels) from the clipboard into R as a vector
junk<-t(array(junk,dim=c(4,length(junk)/4))) # convert the vector into a 2-d array with 4 columns (year, month, day, amount)

## step 2: create a dataframe to store and display the results
nyr<-length(unique(junk[,1]))
ans<-data.frame(array(dim=c(nyr,12))) # a dataframe for storing the results
names(ans)<-c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec')
yrs<-sort(unique(junk[,1]))
row.names(ans)<-yrs

## step 3: calculate
for (yi in 1:nyr){ # loop through the years...
  for (mi in 1:12){ # ...and the months
    # count the rainy days: subset the junk array to the rows that match
    # the given year and month, then sum the days with amount > 0.01
    ans[yi,mi]<-sum(junk[junk[,1]==yrs[yi] & junk[,2]==mi,4]>0.01)
  }
}

Does that help?

- Dan
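The double loop above can probably be replaced by a single tapply() call. The sketch below uses a tiny made-up data frame (the six rows are hypothetical, chosen only so the result is easy to check by eye) and the same 0.01 threshold; "wet_days" is an illustrative name.

```r
# Hypothetical miniature data set with the same four columns.
rain <- data.frame(
  Year   = c(1901, 1901, 1901, 1901, 1902, 1902),
  Month  = c(1, 1, 2, 2, 1, 1),
  Day    = c(1, 2, 1, 2, 1, 2),
  Amount = c(0, 3, 0.3, 0, 4.1, 2.5)
)

# For each Year x Month cell, sum the logical "was it a rainy day?" flags.
wet_days <- with(rain,
                 tapply(Amount > 0.01, list(Year = Year, Month = Month), sum))
print(wet_days)
```

Cells with no observations at all (here 1902, month 2) come back as NA rather than 0, which distinguishes "no data" from "no rain".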
[R] Counting number of rain
Hello R-users, I want to ask how to count the number of daily rain data. My data as below: Year Month Day Amount 1901 1 1 0 1901 1 2 3 1901 1 3 0 1901 1 4 0.5 1901 1 5 0 1901 1 6 0 1901 1 7 0.3 1901 1 8 0 1901 1 9 0 1901 1 10 0 1901 1 11 0.5 1901 1 12 1.8 1901 1 13 0 1901 1 14 0 1901 1 15 2.5 1901 1 16 0 1901 1 17 0 1901 1 18 0 1901 1 19 0 1901 1 20 0 1901 1 21 0 1901 1 22 0 1901 1 23 0 1901 1 24 0 1901 1 25 0 1901 1 26 16.5 1901 1 27 0.3 1901 1 28 0 1901 1 29 0 1901 1 30 0 1901 1 31 0 1901 2 1 0 1901 2 2 0 1901 2 3 0 1901 2 4 0 1901 2 5 0 1901 2 6 0 1901 2 7 0 1901 2 8 0.3 1901 2 9 0 1901 2 10 0 1901 2 11 0 1901 2 12 1 1901 2 13 0.3 1901 2 14 0 1901 2 15 0 1901 2 16 0 1901 2 17 0 1901 2 18 0 1901 2 19 0 1901 2 20 0 1901 2 21 0 1901 2 22 0 1901 2 23 0.3 1901 2 24 0 1901 2 25 0 1901 2 26 0.3 1901 2 27 0 1901 2 28 0 1901 3 1 0 1901 3 2 0.8 1901 3 3 2.3 1901 3 4 0 1901 3 5 0 1901 3 6 0 1901 3 7 0 1901 3 8 0 1901 3 9 0 1901 3 10 2 1901 3 11 0 1901 3 12 0 1901 3 13 0 1901 3 14 0 1901 3 15 0 1901 3 16 0 1901 3 17 0 1901 3 18 0 1901 3 19 0 1901 3 20 0 1901 3 21 0 1901 3 22 1.5 1901 3 23 1.3 1901 3 24 0 1901 3 25 0 1901 3 26 0 1901 3 27 0 1901 3 28 0.3 1901 3 29 0.3 1901 3 30 4.6 1901 3 31 0 1901 4 1 0 1901 4 2 4.6 1901 4 3 30.7 1901 4 4 0 1901 4 5 0 1901 4 6 0 1901 4 7 0 1901 4 8 0 1901 4 9 0 1901 4 10 0 1901 4 11 0 1901 4 12 0 1901 4 13 0 1901 4 14 0 1901 4 15 0.3 1901 4 16 1.3 1901 4 17 0 1901 4 18 0 1901 4 19 0.3 1901 4 20 1 1901 4 21 9.4 1901 4 22 0.5 1901 4 23 0.3 1901 4 24 0 1901 4 25 0 1901 4 26 0 1901 4 27 0 1901 4 28 0 1901 4 29 0 1901 4 30 0 1901 5 1 0 1901 5 2 0 1901 5 3 0 1901 5 4 0 1901 5 5 0 1901 5 6 0 1901 5 7 0 1901 5 8 0.5 1901 5 9 2.3 1901 5 10 0.3 1901 5 11 0 1901 5 12 0 1901 5 13 0 1901 5 14 0 1901 5 15 0 1901 5 16 0 1901 5 17 0 1901 5 18 0 1901 5 19 0 1901 5 20 0 1901 5 21 0.5 1901 5 22 0 1901 5 23 0 1901 5 24 0 1901 5 25 0 1901 5 26 4.8 1901 5 27 10.9 1901 5 28 3.6 1901 5 29 0 1901 5 30 0 1901 5 31 5.1 1901 6 1 0.5 1901 6 2 0 1901 6 3 2 1901 6 4 0 
1901 6 5 10.2 1901 6 6 33.3 1901 6 7 0.3 1901 6 8 0 1901 6 9 0 1901 6 10 0.5 1901 6 11 0.5 1901 6 12 0.3 1901 6 13 2.8 1901 6 14 5.6 1901 6 15 0.3 1901 6 16 6.6 1901 6 17 14.2 1901 6 18 4.8 1901 6 19 8.4 1901 6 20 1.8 1901 6 21 1.8 1901 6 22 0.3 1901 6 23 8.6 1901 6 24 0 1901 6 25 0 1901 6 26 0 1901 6 27 0 1901 6 28 0 1901 6 29 0 1901 6 30 0 1901 7 1 0 1901 7 2 0 1901 7 3 0 1901 7 4 0 1901 7 5 1 1901 7 6 0.5 1901 7 7 0.3 1901 7 8 0.3 1901 7 9 6.1 1901 7 10 0.3 1901 7 11 1.5 1901 7 12 0 1901 7 13 1.5 1901 7 14 0.3 1901 7 15 3.3 1901 7 16 2.3 1901 7 17 0.5 1901 7 18 0 1901 7 19 0 1901 7 20 0 1901 7 21 1.8 1901 7 22 0 1901 7 23 1 1901 7 24 0.3 1901 7 25 0.3 1901 7 26 1.3 1901 7 27 17 1901 7 28 6.6 1901 7 29 6.1 1901 7 30 0.5 1901 7 31 0.3 1901 8 1 0 1901 8 2 0 1901 8 3 0 1901 8 4 0 1901 8 5 0 1901 8 6 3.3 1901 8 7 4.1 1901 8 8 0.3 1901 8 9 0 1901 8 10 0 1901 8 11 0 1901 8 12 0 1901 8 13 0 1901 8 14 0 1901 8 15 0 1901 8 16 0 1901 8 17 0.5 1901 8 18 0 1901 8 19 0 1901 8 20 0 1901 8 21 0 1901 8 22 0 1901 8 23 0.3 1901 8 24 1 1901 8 25 0 1901 8 26 0 1901 8 27 10.2 1901 8 28 1.5 1901 8 29 0.5 1901 8 30 1.3 1901 8 31 0 1901 9 1 0 1901 9 2 3 1901 9 3 1 1901 9 4 0.5 1901 9 5 0.3 1901 9 6 0 1901 9 7 0 1901 9 8 2.3 1901 9 9 0.3 1901 9 10 0 1901 9 11 0 1901 9 12 0 1901 9 13 0 1901 9 14 0 1901 9 15 0 1901 9 16 0 1901 9 17 0 1901 9 18 1.8 1901 9 19 8.1 1901 9 20 0.3 1901 9 21 5.8 1901 9 22 4.1 1901 9 23 0.3 1901 9 24 1.8 1901 9 25 0 1901 9 26 0 1901 9 27 0 1901 9 28 0 1901 9 29 1.8 1901 9 30 0.8 1901 10 1 0 1901 10 2 0 1901 10 3 0 1901 10 4 0 1901 10 5 0.3 1901 10 6 0 1901 10 7 0 1901 10 8 0 1901 10 9 0 1901 10 10 0 1901 10 11 0.3 1901 10 12 3.8 1901 10 13 0.4 1901 10 14 9 1901 10 15 2 1901 10 16 1 1901 10 17 0 1901 10 18 0 1901 10 19 0 1901 10 20 0.3 1901 10 21 0 1901 10 22 0 1901 10 23 0 1901 10 24 0 1901 10 25 0 1901 10 26 0 1901 10 27 14.5 1901 10 28 6.4 1901 10 29 0.8 1901 10 30 0 1901 10 31 0 1901 11 1 0 1901 11 2 0 1901 11 3 0 1901 11 4 0 1901 11 5 0 1901 11 6 0 1901 11 7 0 
1901 11 8 0 1901 11 9 0 1901 11 10 0 1901 11 11 0 1901 11 12 5.1 1901 11 13 0.3 1901 11 14 5.8 1901 11 15 0 1901 11 16 0 1901 11 17 1 1901 11 18 0.5 1901 11 19 0 1901 11 20 0 1901 11 21 0 1901 11 22 0 1901 11 23 0 1901 11 24 0 1901 11 25 0.3 1901 11 26 0 1901 11 27 0 1901 11 28 0 1901 11 29 0 1901 11 30 3.3 1901 12 1 0 1901 12 2 0 1901 12 3 0 1901 12 4 0 1901 12 5 0 1901 12 6 0 1901 12 7 0 1901 12 8 0 1901 12 9 0 1901 12 10 0 1901 12 11 0 1901 12 12 0 1901 12 13 0 1901 12 14 0 1901 12 15 0 1901 12 16 0 1901 12 17 0 1901 12 18 0 1901 12 19 0 1901 12 20 0 1901 12 21 6.1 1901 12 22 5.6 1901 12 23 0 1901 12 24 0 1901 12 25 0 1901 12 26 0 1901 12 27 0 1901 12 28 0 1901 12 29 0 1901 12 30 0 1901 12 31 9.9 1902 1 1 0 1902 1 2 0 1902 1 3 0 1902 1 4 4.1 1902 1 5 0 1902 1 6 0 1902 1 7 0 1902 1 8 0 1902 1 9 2.5 1902 1 10 0 1902 1 11 0 1902 1 12 0 1902 1 13 0.3 1902 1 14 0 1902 1 15 0 1902 1 16 0 1902 1 17 0 1902 1 18 0 19
Re: [R] counting similar strings in data.frame
OK. I do not have a canned solution for you, but

temp <- apply(test, 1, table)

gives you the number of occurrences in each row. From this it should be possible to extract the name and count information:

> lapply(temp, function(x) x[x>1])
[[1]]
four
   2

[[2]]
one two
  2   2

[[3]]
three
    2

[[4]]
four
   3

Here you have numbers and strings and you need to combine them. However, I am not sure how. If you want to use them for some further computation, maybe a list structure is as good as a data.frame with many NAs.

Cheers
Petr

> -----Original Message-----
> From: Knut Krueger [mailto:r...@knut-krueger.de]
> Sent: Friday, June 26, 2015 12:50 PM
> To: PIKAL Petr; r-h...@stat.math.ethz.ch
> Subject: Re: [R] counting similar strings in data.frame
>
> On 26.06.2015 at 10:38, PIKAL Petr wrote:
> > Hi
> >
> > I am a little bit lost in your logic. Why is triple in your fourth line
> > "one"? I expected it to be "four".
> >
> > Petr
> Sorry yes you are right ...
>
> type mismatch
> Knut

This e-mail and any documents attached to it may be confidential and are intended only for its intended recipients. If you received this e-mail by mistake, please immediately inform its sender. Delete the contents of this e-mail with all attachments and its copies from your system. If you are not the intended recipient of this e-mail, you are not authorized to use, disseminate, copy or disclose this e-mail in any manner. The sender of this e-mail shall not be liable for any possible damage caused by modifications of the e-mail or by delay with transfer of the email.

In case that this e-mail forms part of business dealings:
- the sender reserves the right to end negotiations about entering into a contract in any time, for any reason, and without stating any reasoning.
- if the e-mail contains an offer, the recipient is entitled to immediately accept such offer; The sender of this e-mail (offer) excludes any acceptance of the offer on the part of the recipient containing any amendment or variation.
- the sender insists on that the respective contract is concluded only upon an express mutual agreement on all its aspects.
- the sender of this e-mail informs that he/she is not authorized to enter into any contracts on behalf of the company except for cases in which he/she is expressly authorized to do so in writing, and such authorization or power of attorney is submitted to the recipient or the person represented by the recipient, or the existence of such authorization is known to the recipient of the person represented by the recipient.
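One possible way to go from the per-row tables to the requested count data.frame is sketched below. It is only my reading of the column definitions in this thread (double1 and double2 are the values occurring exactly twice, in order of first appearance; triple is a value occurring exactly three times), and the helper name summarise_row is hypothetical.

```r
test <- data.frame(
  first  = c("seven", "two", "five", "four"),
  second = c("three", "one", "three", "one"),
  third  = c("four", "two", "three", "four"),
  fourth = c("four", "one", "one", "four"),
  stringsAsFactors = FALSE
)

# Summarise one row: values seen exactly twice and exactly three times.
# (summarise_row is an illustrative helper, not from the thread.)
summarise_row <- function(x) {
  # Fix the level order to the order of first appearance in the row.
  tab <- table(factor(x, levels = unique(x)))
  n <- as.vector(tab)
  doubles <- names(tab)[n == 2]
  triples <- names(tab)[n == 3]
  # Indexing past the end of a vector yields NA, which fills the gaps.
  c(double1 = doubles[1], double2 = doubles[2], triple = triples[1])
}

# apply() returns one column per row of 'test', so transpose the result.
count <- as.data.frame(t(apply(test, 1, summarise_row)),
                       stringsAsFactors = FALSE)
print(count)
```

On the example data this gives real NA values where the corrected count data.frame in the thread has the string "NA"; a value occurring four times in a row would not be reported in any column.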
Re: [R] counting similar strings in data.frame
On 26.06.2015 at 10:38, PIKAL Petr wrote:
> Hi
>
> I am a little bit lost in your logic. Why is triple in your fourth line
> "one"? I expected it to be "four".
>
> Petr

Sorry yes you are right ...

type mismatch

Knut
Re: [R] counting similar strings in data.frame
Sorry last count was wrong ...

test =data.frame("first"=c("seven","two","five","four"),
                 "second"=c("three","one","three","one"),
                 "third"=c("four","two","three","four"),
                 "fourth"=c("four","one","one","four"))

count =data.frame("double1"=c("four","two","three","NA"),
                  "double2"=c("NA","one","NA","NA"),
                  "triple"=c("NA","NA","NA","four"))
Re: [R] counting similar strings in data.frame
Hi

I am a little bit lost in your logic. Why is triple in your fourth line "one"? I expected it to be "four".

Petr
[R] counting similar strings in data.frame
Dear Members,

is there a better solution than loops for counting how often each string occurs within a row, so as to get the count data.frame below?

test = data.frame("first"=c("seven","two","five","four"),
                  "second"=c("three","one","three","one"),
                  "third"=c("four","two","three","four"),
                  "fourth"=c("four","one","one","four"))

count = data.frame("double1"=c("four","two","three","NA"),
                   "double2"=c("NA","one","NA","NA"),
                   "triple"=c("NA","NA","NA","one"))

double1: first double occurrence in row (NA if triple available)
double2: second double occurrence in row (NA if triple available or if there is only one double)
triple: triple occurrence in row (NA if a double available)

Kind regards
Knut
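One loop-free way to build such a table is sketched below. It is only an illustration, not a solution posted in the thread: the helper name count_row is made up, it targets the corrected expected result from the follow-up message (triple = "four" in the fourth row), it returns real NA values rather than the string "NA", and it assumes that a value occurring four times in a row should be reported in none of the columns.

```r
test <- data.frame(first  = c("seven", "two", "five", "four"),
                   second = c("three", "one", "three", "one"),
                   third  = c("four", "two", "three", "four"),
                   fourth = c("four", "one", "one", "four"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: classify the repeated values of one row.
count_row <- function(r) {
  vals <- unique(r)               # distinct values, in order of first appearance
  n <- tabulate(match(r, vals))   # number of occurrences of each distinct value
  doubles <- vals[n == 2]
  triples <- vals[n == 3]
  # Indexing past the end of a vector yields NA, which fills the unused slots.
  c(double1 = doubles[1], double2 = doubles[2], triple = triples[1])
}

# apply() passes each row as a character vector; t() puts rows back as rows.
count <- as.data.frame(t(apply(test, 1, count_row)), stringsAsFactors = FALSE)
count
```

unique() plus match() is used instead of table() because table() sorts its names alphabetically, which would put "one" before "two" in the second row and so lose the "first double" ordering.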
Re: [R] Counting consecutive events in R
I normally use rle() for these problems, see ?rle. For instance,

k <- rbinom(999, 1, .5)
series <- function(run) {
  r <- rle(run)
  # positions, within the rle() output, of runs of 1s with length >= 5
  which(r$lengths >= 5 & r$values == 1)
}
series(k)

returns the indices (into the rle() result) of the runs of consecutive 1s that have length 5 or longer.
Re: [R] Counting consecutive events in R
Assuming I understand the problem correctly, you want to check for runs of at least length five where both Score and Type_Desc assume particular values. You don't care where they are or what other data are associated, you just want to know if at least one such run exists in your data frame.

Here's a function that does that:

checkruns <- function(testdata) {
  # 1 where Score > 0 and Type_Desc is 1 (rows with NA Type_Desc never match)
  test1 <- ifelse(testdata$Score > 0 & testdata$Type_Desc == 1 & !is.na(testdata$Type_Desc), 1, 0)
  # 1 where Score > 0 and Type_Desc is 0
  test0 <- ifelse(testdata$Score > 0 & testdata$Type_Desc == 0 & !is.na(testdata$Type_Desc), 1, 0)
  test1.rle <- rle(test1)
  test0.rle <- rle(test0)
  # report a label if any run of matching rows has length 5 or more
  if(any(test1.rle$lengths >= 5 & test1.rle$values == 1)) cat("Type_high\n")
  if(any(test0.rle$lengths >= 5 & test0.rle$values == 1)) cat("Type_low\n")
  invisible()
}

Sarah
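If the location of a run does matter after all, the rle() result can be mapped back to positions in the original vector. The following is a small sketch on a made-up 0/1 vector (not the poster's data); the names k, long, and long_runs are purely illustrative.

```r
# Made-up indicator vector: a run of five 1s starting at position 2
# and a run of seven 1s starting at position 9.
k <- c(0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0)

r <- rle(k)
ends   <- cumsum(r$lengths)        # last position of every run
starts <- ends - r$lengths + 1     # first position of every run

# Runs of 1s that are at least 5 long, as indices into the rle() result.
long <- which(r$lengths >= 5 & r$values == 1)

long_runs <- data.frame(start = starts[long], length = r$lengths[long])
long_runs
```

The cumsum() of the run lengths gives the end of each run, so subtracting the length and adding one recovers where it starts.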
[R] Counting consecutive events in R
Hi, I have the following dataframe structure(list(Type = c("QRS", "QRS", "QRS", "QRS", "QRS", "QRS", "QRS", "QRS", "QRS", "QRS", "QRS", "QRS", "RR", "RR", "RR", "PP", "PP", "PP", "PP", "PP", "PP", "PP", "PP", "PP", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc"), Time_Point_Start = c("2015-04-01 14:57:15.0.0312", "2015-04-01 14:57:15.0.7839", "2015-04-01 14:57:16.0.5343", "2015-04-01 14:57:17.0.2573", "2015-04-01 14:57:18.0.0234", "2015-04-01 14:57:18.0.7722", "2015-04-01 14:57:19.0.5265", "2015-04-01 14:57:24.0.0195", "2015-04-01 14:57:24.0.7839", "2015-04-01 14:57:25.0.5343", "2015-04-01 14:57:26.0.2768", "2015-04-01 14:57:27.0.0273", "2015-04-01 14:58:03.0.0702", "2015-04-01 14:58:03.0.8190", "2015-04-01 14:58:04.0.5694", "2015-04-01 14:57:58.0.4134", "2015-04-01 14:57:59.0.1637", "2015-04-01 14:57:59.0.9126", "2015-04-01 14:58:00.0.6630", "2015-04-01 14:58:01.0.4134", "2015-04-01 14:58:02.0.1637", "2015-04-01 14:58:02.0.9126", "2015-04-01 14:58:03.0.6630", "2015-04-01 14:58:04.0.4134", "2015-04-01 14:57:07.0.4212", "2015-04-01 14:57:08.0.1715", "2015-04-01 14:57:08.0.9204", "2015-04-01 14:57:09.0.6864", "2015-04-01 14:57:10.0.4368", "2015-04-01 14:57:11.0.1871", "2015-04-01 14:57:11.0.9360", "2015-04-01 14:57:12.0.6591", "2015-04-01 14:57:13.0.4251", "2015-04-01 14:57:14.0.1754", "2015-04-01 14:57:14.0.9243", "2015-04-01 14:57:15.0.6903", "2015-04-01 14:57:16.0.4407", "2015-04-01 14:57:17.0.1676", "2015-04-01 14:57:17.0.9321"), Time_Point_End = c("2015-04-01 14:57:15.0.0858", "2015-04-01 14:57:15.0.8346", "2015-04-01 14:57:16.0.6006", "2015-04-01 14:57:17.0.0351", "2015-04-01 14:57:18.0.1403", "2015-04-01 14:57:18.0.8385", "2015-04-01 14:57:19.0.5889", "2015-04-01 14:57:24.0.0858", "2015-04-01 14:57:24.0.8346", "2015-04-01 14:57:25.0.5772", "2015-04-01 14:57:26.0.3939", "2015-04-01 14:57:27.0.0936", "2015-04-01 14:58:03.0.8190", "2015-04-01 14:58:04.0.5694", "2015-04-01 14:58:05.0.3197", "2015-04-01 
14:57:59.0.1637", "2015-04-01 14:57:59.0.9126", "2015-04-01 14:58:00.0.6630", "2015-04-01 14:58:01.0.4134", "2015-04-01 14:58:02.0.1637", "2015-04-01 14:58:02.0.9126", "2015-04-01 14:58:03.0.6630", "2015-04-01 14:58:04.0.4134", "2015-04-01 14:58:05.0.1793", "2015-04-01 14:57:07.0.8775", "2015-04-01 14:57:08.0.6435", "2015-04-01 14:57:09.0.3705", "2015-04-01 14:57:10.0.1209", "2015-04-01 14:57:10.0.8697", "2015-04-01 14:57:11.0.6201", "2015-04-01 14:57:12.0.3861", "2015-04-01 14:57:13.0.1364", "2015-04-01 14:57:13.0.8853", "2015-04-01 14:57:14.0.6513", "2015-04-01 14:57:15.0.4017", "2015-04-01 14:57:16.0.1248", "2015-04-01 14:57:16.0.9165", "2015-04-01 14:57:17.0.6162", "2015-04-01 14:57:18.0.3900"), Value = c(0.0546, 0.0507, 0.0663, 0.0936, 0.117, 0.0663, 0.0624, 0.0663, 0.0507, 0.0429, 0.117, 0.0663, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7644, 0.033103481, 0.034056449, 0.032367699, 0.031000613, 0.031405867, 0.031241866, 0.032367699, 0.034337907, 0.033125921, 0.034337907, 0.034337907, 0.031241866, 0.034337907, 0.032367699, 0.032930616), Score = c(0L, 0L, 0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), Type_Desc = c(NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Pat_id = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L)), .Names = c("Type", "Time_Point_Start", "Time_Point_End", "Value", "Score", "Type_Desc", "Pat_id"), class = "data.frame", row.names = c(NA, -39L)) For each unique value in column 'Type' , I want to check for consecutive 5 rows (if any) of 'Score' > 0. 
Now, if there are five consecutive rows with Score > 0 and 'Type_Desc' = 0, then we print "Type_low"; else if 'Type_Desc' = 1, we print "Type_high". The search should end once 5 consecutive rows have been found.

So, for this data frame we will have two statements as follows:

1. PP_high (reason: 5 consecutive rows of Score > 0 and 'Type_Desc' = 1)
2. QTc_low (reason: 5 consecutive rows of Score > 0 and 'Type_Desc' = 0)

How can this problem be tackled in R?

Thanks,
Abhinaba
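A per-Type version of the rle() idea might look like the sketch below. It is an illustration under assumptions, not code from the thread: the data frame is rebuilt from only the Type, Score, and Type_Desc columns of the posted dput() output, and the helper names has_run and label_type are invented here.

```r
# Type, Score and Type_Desc as in the posted data (other columns omitted).
dat <- data.frame(
  Type = rep(c("QRS", "RR", "PP", "QTc"), times = c(12, 3, 9, 15)),
  Score = c(0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 3, 0,
            0, 0, 0,
            0, 0, 2, 2, 2, 2, 2, 0, 0,
            rep(3, 15)),
  Type_Desc = c(NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, 1, NA,
                NA, NA, NA,
                NA, NA, 1, 1, 1, 1, 1, NA, NA,
                rep(0, 15)),
  stringsAsFactors = FALSE
)

# TRUE if the logical vector contains a run of at least n TRUEs.
has_run <- function(flag, n = 5) {
  r <- rle(flag)
  any(r$lengths >= n & r$values)
}

# Suffixes ("high" and/or "low") earned by the rows of one Type.
label_type <- function(d) {
  pos <- d$Score > 0 & !is.na(d$Type_Desc)   # FALSE wherever Type_Desc is NA
  out <- character(0)
  if (has_run(pos & d$Type_Desc == 1)) out <- c(out, "high")
  if (has_run(pos & d$Type_Desc == 0)) out <- c(out, "low")
  out
}

# Split by Type, keeping the order in which the types first appear.
res <- lapply(split(dat, factor(dat$Type, levels = unique(dat$Type))), label_type)
labels <- unlist(lapply(names(res), function(tp) {
  if (length(res[[tp]]) > 0) paste(tp, res[[tp]], sep = "_")
}))
labels
```

Because any() only asks whether a qualifying run exists, the requirement that the search stop at the first run of five is met in effect, although the whole column is still scanned.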
Re: [R] Counting Words
That's perfect. Many thanks for your much-appreciated help.

On 22/01/2015 19:50, "Chel Hee Lee" wrote:

> > x <- c("hola mundo mundo");
> > table(unlist(strsplit(x, " ")))
>
>  hola mundo
>     1     2
>
> Is this what you are looking for? I hope this helps.
>
> Chel Hee Lee
Re: [R] Counting Words
In addition to the other suggestions, which are fine for your simple example, I would take a trip to the CRAN Task View "Natural Language Processing" and see if there's anything there.

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
Re: [R] Counting Words
table(strsplit("hola mundo mundo", " ")[[1]])
Re: [R] Counting Words
> x <- c("hola mundo mundo");
> table(unlist(strsplit(x, " ")))

 hola mundo
    1     2

Is this what you are looking for? I hope this helps.

Chel Hee Lee
[R] Counting Words
Hi all,

I want to count the different words in a text.

For example, if the text is "hola mundo mundo", the program should count:

hola 1
mundo 2

Does R, or a package on CRAN, have a function for this?
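Real text usually has capital letters, punctuation, and repeated spaces, which splitting on a single space does not handle. The sketch below extends the strsplit()/table() idea from this thread under the assumption that a "word" is any run of alphabetic characters and that case should be ignored; the example string is made up.

```r
text <- "Hola mundo,  mundo!"

# Split on every run of non-letters, then lower-case the pieces.
words <- tolower(unlist(strsplit(text, "[^[:alpha:]]+")))
words <- words[nzchar(words)]   # drop the empty piece a leading separator would leave

counts <- table(words)
counts
```

Whether digits, apostrophes, or hyphens should count as part of a word depends on the text, so the regular expression is the part most likely to need adjusting.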
Re: [R] counting sets of consecutive integers in a vector
Thanks, Peter. Why not cbind your idea for the first column with my idea for the second column and get it done in one line?

v <- c(1,2,5,6,7,8,25,30,31,32,33)
M <- cbind( v[ c(1, which( diff(v) !=1 ) + 1 ) ] , rle( v - 1:length(v) )$lengths )
M
     [,1] [,2]
[1,]    1    2
[2,]    5    4
[3,]   25    1
[4,]   30    4

I find that pretty appealing and I'll probably stick with it. It seems quite fast. Here's an example:

# make fairly long vector
v <- sort(unique(round(10*runif(10
length(v)
[1] 63274

# time the procedure:
ptm <- proc.time() ; M <- cbind( v[ c(1, which( diff(v) !=1 ) + 1 ) ] , rle( v - 1:length(v) )$lengths ) ; proc.time() - ptm
   user  system elapsed
   0.03    0.00    0.03
dim(M)
[1] 23212     2

I probably won't be using vectors any longer than that, and this isn't the kind of thing that I do over and over again, so that speed is excellent.

Mike

On Mon, 5 Jan 2015, Peter Alspach wrote:

> Tena koe Mike
>
> An alternative, which is slightly faster:
>
> diffv <- diff(v)
> starts <- c(1, which(diffv!=1)+1)
> cbind(v[starts], c(diff(starts), length(v)-starts[length(starts)]+1))
>
> Peter Alspach
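The seek()/readBin() use described in the original question of this thread could look roughly like the sketch below. Everything about the file is an assumption made for illustration: a temporary file holding 40 doubles of 8 bytes each, where element i stores i * 10. Note that the n argument of readBin() counts elements of `size` bytes rather than bytes, so a run length can be passed to it directly.

```r
v <- c(1, 2, 5, 6, 7, 8, 25, 30, 31, 32, 33)

# One row per run of consecutive integers: first value and run length.
runs <- cbind(start  = v[c(1, which(diff(v) != 1) + 1)],
              length = rle(v - seq_along(v))$lengths)

# Hypothetical binary file: element i holds the value i * 10.
path <- tempfile(fileext = ".bin")
writeBin(as.double(1:40) * 10, path)

size <- 8L                      # bytes per stored double
con <- file(path, "rb")
out <- unlist(lapply(seq_len(nrow(runs)), function(i) {
  # Jump to the first element of the run, then read the whole run at once.
  seek(con, where = (runs[i, "start"] - 1) * size, origin = "start")
  readBin(con, what = "double", n = runs[i, "length"], size = size)
}))
close(con)
unlink(path)

out
```

With four runs this needs four seek() calls instead of eleven, which is the saving the run matrix is meant to provide.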
Re: [R] counting sets of consecutive integers in a vector
Here is a solution using data.table

> require(data.table)
> x <- data.table(v, diff = cumsum(c(1, diff(v)) != 1))
> x
     v diff
 1:  1    0
 2:  2    0
 3:  5    1
 4:  6    1
 5:  7    1
 6:  8    1
 7: 25    2
 8: 30    3
 9: 31    3
10: 32    3
11: 33    3
> x[, list(value = v[1L], length = .N), keyby = 'diff']
   diff value length
1:    0     1      2
2:    1     5      4
3:    2    25      1
4:    3    30      4
> x[, list(value = v[1L], length = .N), keyby = 'diff'][, -1, with = FALSE]  # get rid of 'diff' column
   value length
1:     1      2
2:     5      4
3:    25      1
4:    30      4

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
Re: [R] counting sets of consecutive integers in a vector
Here is another approach:

> v <- c(1,2,5,6,7,8,25,30,31,32,33)
>
> # split by differences != 1
> t(sapply(split(v, cumsum(c(1, diff(v)) != 1)), function(x){
+     c(value = x[1L], length = length(x))  # output first value and length
+ }))
  value length
0     1      2
1     5      4
2    25      1
3    30      4

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
Re: [R] counting sets of consecutive integers in a vector
Tena koe Mike

An alternative, which is slightly faster:

diffv <- diff(v)
starts <- c(1, which(diffv!=1)+1)
cbind(v[starts], c(diff(starts), length(v)-starts[length(starts)]+1))

Peter Alspach
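Peter's diff()-based idea can be wrapped in a small function. This is only a sketch: the name consecutive_runs and the column labels are illustrative, and it assumes the input is already sorted, unique, and integer-valued, as in Mike's description.

```r
# Hypothetical helper (name and column labels are illustrative, not from the thread).
# Assumes v is sorted, unique, and integer-valued.
consecutive_runs <- function(v) {
  if (length(v) == 0L) {
    return(cbind(start = numeric(0), length = integer(0)))
  }
  # A run starts at position 1 and wherever the gap to the previous value is not 1
  starts <- c(1L, which(diff(v) != 1) + 1L)
  # Run lengths are the distances between successive starts;
  # the last run ends at position length(v)
  lens <- diff(c(starts, length(v) + 1L))
  cbind(start = v[starts], length = lens)
}

v <- c(1, 2, 5, 6, 7, 8, 25, 30, 31, 32, 33)
runs <- consecutive_runs(v)
print(runs)
```

Appending length(v) + 1 to the start positions lets a single diff() produce every run length, including the last one, so no special case is needed for the final run.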
[R] counting sets of consecutive integers in a vector
I have a vector of sorted positive integer values (e.g., positive integers after applying sort() and unique()). For example, this:

c(1,2,5,6,7,8,25,30,31,32,33)

I want to make a matrix from that vector that has two columns: (1) the first value in every run of consecutive integer values, and (2) the corresponding number of consecutive values. For example:

c(1:20) would become this...

1 20

...because there are 20 consecutive integers beginning with 1 and c(1,2,5,6,7,8,25,30,31,32,33) would become

 1 2
 5 4
25 1
30 4

What would be the best way to accomplish this? Here is my first effort:

v <- c(1,2,5,6,7,8,25,30,31,32,33)
L <- rle( v - 1:length(v) )$lengths
n <- length( L )
matrix( c( v[ c( 1, cumsum(L)+1 ) ][1:n], L), nrow=n)

     [,1] [,2]
[1,]    1    2
[2,]    5    4
[3,]   25    1
[4,]   30    4

I suppose that works well enough, but there may be a better way, and besides, I wouldn't want to deny anyone here the opportunity to solve a fun puzzle. ;-)

The use for this is that I will be doing repeated seeks of a binary file to extract data. seek() gives the starting point and readBin(n=X) gives the number of bytes to read. So when there are many consecutive variables to be read, I can multiply the X in n=X by that number instead of doing many different seek() calls. (The data are in a transposed format where I read in every record for some variable as sequential elements.) I'm probably not the first person to deal with this.

Best,

Mike

--
Michael B. Miller, Ph.D.
University of Minnesota
http://scholar.google.com/citations?user=EV_phq4J
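The seek()/readBin() use case described above might look roughly like the following. This is a sketch under stated assumptions, not Mike's actual file layout: it uses a temporary file of 8-byte doubles, treats the wanted values as 1-based record positions, and the names wanted, record_size and out are made up for illustration. Note that in readBin() the n argument counts records, while size gives the bytes per record.

```r
# Sketch: write 40 doubles to a temporary file, then read back only the
# records whose 1-based positions form runs of consecutive integers.
path <- tempfile(fileext = ".bin")
writeBin(as.numeric(101:140), path, size = 8)

wanted <- c(1, 2, 5, 6, 7, 8, 25, 30, 31, 32, 33)  # sorted, unique record positions
starts <- c(1L, which(diff(wanted) != 1) + 1L)     # index of the first element of each run
lens   <- diff(c(starts, length(wanted) + 1L))     # number of records in each run

record_size <- 8L  # bytes per record (an assumption of this sketch)
con <- file(path, open = "rb")
out <- unlist(lapply(seq_along(starts), function(i) {
  # One seek per run: byte offset of the first record in the run
  seek(con, where = (wanted[starts[i]] - 1) * record_size, origin = "start")
  # n is the number of records to read; each record is `size` bytes long
  readBin(con, what = "double", n = lens[i], size = record_size)
}))
close(con)
unlink(path)
print(out)
```

With four runs, the sketch issues four seek() calls instead of eleven.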
Re: [R] Counting within groups / means by groups
In addition to Jeff's recommendation, you need to read a basic introduction to R. Your data frame is probably not what you think it is:

> group<-c("A", "A", "A", "B", "B", "B", "B", "C")
> value<-c(1,3,2,2,2,4,4,1)
> df<-as.data.frame(cbind(group, value))
> str(df)
'data.frame':   8 obs. of  2 variables:
 $ group: Factor w/ 3 levels "A","B","C": 1 1 1 2 2 2 2 3
 $ value: Factor w/ 4 levels "1","2","3","4": 1 3 2 2 2 4 4 1

By using cbind() you combined a character vector and a numeric vector into a matrix, so R converted the numeric values to character strings, since a matrix can hold only a single data type. The cbind() function is generic and which version you get depends on the first argument.

> cbind(group, value)
     group value
[1,] "A"   "1"
[2,] "A"   "3"
[3,] "A"   "2"
[4,] "B"   "2"
[5,] "B"   "2"
[6,] "B"   "4"
[7,] "B"   "4"
[8,] "C"   "1"

Then you used as.data.frame() to convert the character matrix to a data.frame. The default for character variables is to convert those to factors. All you need is

> dfa <- data.frame(group, value)
> str(dfa)
'data.frame':   8 obs. of  2 variables:
 $ group: Factor w/ 3 levels "A","B","C": 1 1 1 2 2 2 2 3
 $ value: num  1 3 2 2 2 4 4 1

I changed df to dfa since df() is the density function for the F distribution. R is not likely to get confused, but you might. Then read the manual page on ave() to see why these work and how to adapt them:

> ave(dfa$value, dfa$group, FUN=length)
[1] 3 3 3 4 4 4 4 1
> ave(dfa$value, dfa$group)
[1] 2 2 2 3 3 3 3 1

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
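To write the ave() results back into the data frame as new columns, something like the following sketch should do. The column names count_group, mean_group and max_group are illustrative; max is included only to show that any summary function can be passed as FUN.

```r
group <- c("A", "A", "A", "B", "B", "B", "B", "C")
value <- c(1, 3, 2, 2, 2, 4, 4, 1)
dfa <- data.frame(group, value)

# ave() returns a vector as long as its input, so the results line up row by row
dfa$count_group <- ave(dfa$value, dfa$group, FUN = length)
dfa$mean_group  <- ave(dfa$value, dfa$group)             # FUN defaults to mean
dfa$max_group   <- ave(dfa$value, dfa$group, FUN = max)  # any other summary works too
print(dfa)
```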
Re: [R] Counting within groups / means by groups
Help file ?ave should apply here.

Please read the Posting Guide mentioned in the footer of every email on this list and on the list manager page for this mailing list. It warns you to read the archives before posting and to post in plain text format rather than HTML format.

---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.
[R] Counting within groups / means by groups
Hi everyone!

I have problems finding a solution to the following two problems:

My sample data frame consists of two variables, "group" and "value":

group<-c("A", "A", "A", "B", "B", "B", "B", "C")
value<-c(1,3,2,2,2,4,4,1)
df<-as.data.frame(cbind(group, value))

Problem 1:

Now I'd like to count the number of group-A cases, group-B cases, etc. and write this number into a new column. It should be like:

count_group<-c(3, 3, 3, 4, 4, 4, 4, 1)

Problem 2:

I'd like to add a new column with the mean values (or any other function) within my groups. E.g.:

Group A: (1+3+2)/3=2
Group B: (2+2+4+4)/4=3
Group C: =1

Now I'd add another column: 2 2 2 3 3 3 3 1

Can anyone help me with how this can best be done?

Thank you!
David
Re: [R] counting the number of rows that satisfy a certain criteria
Hi,
Try:

set.seed(42)
X <- as.data.frame(matrix(sample(0:1, 4*50,replace=TRUE), ncol=4))
table(X[1:2])[4]
#[1] 15
sum(rowSums(X[1:2])==2)
#[1] 15

A.K.
Re: [R] counting the number of rows that satisfy a certain criteria
Thanks!
Re: [R] counting the number of rows that satisfy a certain criteria
Hi Kate,

You could try

sum(X[, 1] == 1 & X[, 2] == 1)

where X is your data set.

HTH,
Jorge.-
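For context on the error in the question: if X is a matrix, X[1] is a single element rather than a column, so X[1] == 1 & X[2] == 1 is a length-one logical that rowSums() cannot treat as an array. Indexing whole columns with X[, 1] avoids that. The sketch below uses made-up data (the seed and the names n_and, n_rsum, n_all4 are arbitrary) to show a few equivalent counts.

```r
set.seed(1)  # arbitrary seed, only to make the sketch repeatable
X <- matrix(sample(0:1, 4 * 50, replace = TRUE), ncol = 4)

n_and  <- sum(X[, 1] == 1 & X[, 2] == 1)    # element-wise AND over two column vectors
n_rsum <- sum(rowSums(X[, 1:2]) == 2)       # valid only because the entries are 0/1
n_all4 <- sum(rowSums(X == 1) == ncol(X))   # rows in which every column equals 1
print(c(n_and = n_and, n_rsum = n_rsum, n_all4 = n_all4))
```

The logical-vector form is the most general, since the conditions on each column need not be the same.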
[R] counting the number of rows that satisfy a certain criteria
I have 4 columns and more than 300K rows of 0s and 1s.

I'm trying to count how many rows satisfy a certain criterion... for instance, how many rows have the first column == 1 as well as the second column == 1.

I've tried using rowSums and colSums but it keeps giving me this type of error:

Error in rowSums(X[1] == 1 & X[2] == 1) :
  'x' must be an array of at least two dimensions

Thanks in advance!
[R] Counting number of days my program runs
Hi all,

I have a package and I want to count the days from its first execution until 30 days afterwards. I hope the question is clear. Please reply if you have anything to share.

Thanks,
ASHIS
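One possible reading of this question is "remember the date of the first run and report how many days have passed since". A minimal sketch under that assumption follows; the function name days_since_first_run, the stamp-file approach, and the file location are all hypothetical, not an established package mechanism.

```r
# Hypothetical helper: record the date of the first call in `stamp_file`
# and report how many whole days have passed since then.
days_since_first_run <- function(stamp_file, today = Sys.Date()) {
  if (!file.exists(stamp_file)) {
    dir.create(dirname(stamp_file), recursive = TRUE, showWarnings = FALSE)
    writeLines(format(today, "%Y-%m-%d"), stamp_file)
  }
  first_run <- as.Date(readLines(stamp_file, n = 1L))
  as.integer(today - first_run)  # Date subtraction yields a difference in days
}

stamp <- file.path(tempdir(), "first_run_example.txt")  # illustrative location
unlink(stamp)  # start clean for the demonstration
d0  <- days_since_first_run(stamp, today = as.Date("2014-01-01"))
d10 <- days_since_first_run(stamp, today = as.Date("2014-01-11"))
within_30 <- d10 <= 30
print(c(d0, d10))
```

In real use the stamp file would need to live somewhere persistent rather than in tempdir(), and the `today` argument exists here only so the behaviour can be demonstrated with fixed dates.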
Re: [R] counting words that are contained in a list
Hi,
Maybe this helps:

vec1 <- c("victory","happiness","medal","war","service","ribbon", "dates")
vec2 <- c("The World War II Victory Medal was first issued as a service ribbon referred to as the Victory Ribbon.",
  "By 1946, a full medal had been established which was referred to as the World War II Victory Medal.",
  "The medal commemorates military service during World War II and is awarded to any member of the United States military, including members of the armed forces of the Government of the Philippine Islands, who served on active duty, or as a reservist, between December 7, 1941 and December 31, 1946",
  "This is awarded for service between 7 December 1941 and 31 December 1946, both dates inclusive")
res <- sort(table(factor(unlist(regmatches(tolower(vec2),gregexpr(paste(vec1,collapse="|"),vec2,ignore.case=TRUE))),levels=vec1)),decreasing=TRUE)
res
#      war     medal   victory   service    ribbon     dates happiness
#        5         4         3         3         2         1         0
res[1:5]

A.K.

Hi guys!

I have a vector with a list of words, e.g. c("victory","happiness"). I have a vector of sentences, e.g. "In WWII the victory was achieved by allied forces". As the word victory is in my list, victory has a frequency of 1 and happiness 0. At the end I would like to get the 5 most frequent words from my list that appear in the sentences. Can you help me?

Uros
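One caveat about the pattern above: a plain alternation matches substrings, so "war" is also found inside "awarded", which accounts for two of the five "war" hits in the output. If whole-word counts are wanted, word boundaries can be added. The sketch below uses two made-up sentences and a hypothetical helper count_words; it assumes the words contain no regular-expression metacharacters.

```r
words <- c("victory", "happiness", "medal", "war", "service", "ribbon", "dates")
sentences <- c("The medal is awarded for service during the war.",
               "Victory in the war brought a victory ribbon.")

# Hypothetical helper: count list words in the sentences, optionally
# requiring whole-word matches via \b word boundaries.
count_words <- function(words, sentences, whole_words = TRUE) {
  alternation <- paste(words, collapse = "|")
  pattern <- if (whole_words) paste0("\\b(", alternation, ")\\b") else alternation
  hits <- regmatches(sentences,
                     gregexpr(pattern, sentences, ignore.case = TRUE, perl = TRUE))
  table(factor(tolower(unlist(hits)), levels = words))
}

substr_counts <- count_words(words, sentences, whole_words = FALSE)
word_counts   <- count_words(words, sentences, whole_words = TRUE)
top5 <- head(sort(word_counts, decreasing = TRUE), 5)
print(top5)
```

Whether substring hits such as "awarded" should count is a matter of what the analysis needs; the boundary version is simply the stricter of the two.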
Re: [R] counting matched elements in two vectors
On 01/23/2014 04:49 PM, Hervé Pagès wrote:
> With the IRanges packages (from Bioconductor):
>
>   > library(IRanges)
>   > countMatches(z, w)

And if you don't want to depend on IRanges for such a simple operation, here is how countMatches() is implemented:

  countMatches <- function(x, table)
  {
      table2 <- match(table, x)
      x2 <- match(x, x)
      tabulate(table2, nbins=length(x))[x2]
  }

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Re: [R] counting matched elements in two vectors
Here's a solution:

# This gives a vector of counts (if z is a data frame, first convert it to a matrix)
res = sapply(as.vector(z), function(x) sum(w==x))

# This copies the dimensions of the variable 'z' to 'res':
dim(res) = dim(z)

Peter
Re: [R] counting matched elements in two vectors
Hi Mintewab,

With the IRanges packages (from Bioconductor):

  > library(IRanges)
  > countMatches(z, w)
   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 3 1 1 0 1 0 0 0 0 0 0 1 3 2 0 0 1 0 0
  [39] 0 0 0 0 0 0 0 0

To install the IRanges package:

  source("http://bioconductor.org/biocLite.R")
  biocLite("IRanges")

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Re: [R] counting matched elements in two vectors
Thank you for the reproducible example, but your description is missing a clear definition of what you want. For example, if your desired output is

result <- c(rep(0,16),2,1,0,3,1,1,0,1,0,0,0,0,0,0,1,3,2,0,0,1,rep(0,10))

then one answer might be

as.vector(table(factor(w,levels=z)))

---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.
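As a cross-check, the base-R suggestions in this thread can be run side by side on the posted data. The sketch below is only a comparison; the variable names counts_table, counts_tabulate and counts_sapply are illustrative, and the tabulate(match()) form is a simplified relative of the countMatches() implementation that assumes z has no repeated values.

```r
z <- -5:40
w <- c(11, 11, 12, 14, 14, 14, 15, 16, 18, 25, 26, 26, 26, 27, 27, 30)

# Count via a factor whose levels are the values of z
counts_table <- as.vector(table(factor(w, levels = z)))
# Count via positions of w in z (assumes z has no duplicates)
counts_tabulate <- tabulate(match(w, z), nbins = length(z))
# Count by explicit comparison, one element of z at a time
counts_sapply <- vapply(z, function(x) sum(w == x), integer(1))

print(counts_table)
```

All three return one count per element of z, in the order of z.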
Re: [R] counting matches in two vectors
Many thanks, Arun. Res 1 is exactly what I wanted.

Mintewab
Re: [R] counting matches in two vectors
Also,

res3 <- table(z1[match(w,z1)])
identical(res3,res1)
#[1] TRUE

A.K.
Re: [R] counting matches in two vectors
Hi,
Maybe this helps:

z1 <- factor(z)
res1 <- table(z1[cut(w,breaks=c(-Inf,z,Inf),labels=F)])
res1
#
# -5 -4 -3 -2 -1  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
#  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2  1  0  3  1  1  0  1  0  0
# 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
#  0  0  0  0  1  3  2  0  0  1  0  0  0  0  0  0  0  0  0  0

#or
res2 <- table(z1[findInterval(w,z)])
identical(res1,res2)
#[1] TRUE

A.K.
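A caveat on the interval-based versions: cut() and findInterval() assign every value of w to some bin of z, so they agree with exact matching only when z is sorted and every element of w actually occurs in z. The sketch below illustrates the difference with made-up inputs; by_interval and by_match are hypothetical helper names.

```r
z <- -5:40
w_clean <- c(11, 11, 12, 14)
w_mixed <- c(11, 11.5, 12, 99)  # 11.5 and 99 are not elements of z

# Interval-based count: each w is assigned to the largest z[i] <= w
by_interval <- function(w, z) tabulate(findInterval(w, z), nbins = length(z))
# Exact-match count: values of w not found in z give NA and are ignored
by_match <- function(w, z) tabulate(match(w, z), nbins = length(z))

print(by_interval(w_mixed, z)[z %in% c(11, 12, 40)])
print(by_match(w_mixed, z)[z %in% c(11, 12, 40)])
```

Here 11.5 is counted under 11 and 99 under 40 by the interval version, while the match version skips both. For the data posted in this thread, where every element of w is in z, the two agree.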
[R] counting matched elements in two vectors
Hi all,

I have the following reproducible example:

z<-c(-5:40)
w<-c(11, 11, 12, 14, 14, 14, 15, 16, 18, 25, 26, 26, 26, 27, 27, 30)
r<-z %in% w

Now r gives me the presence or absence of elements in z that are in w, but I am interested in getting the number of times each element in z appears (or doesn't appear) in w. I want the dimension of my resulting vector to be the same as that of z. How do I do that?

Thanks in advance,
Mintewab
Re: [R] Counting variables repeted in dataframe columns to create a presence-absence table
Hi,

Try:

  data_m <- read.table(text="Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
  1 S5305B_IGR S5305B_IGR S5305B_IGR S5305B_IGR S5305B_IGR
  2 S5305A_IGR S5300A_IGR S5305A_IGR S5300A_IGR S5300A_IGR
  3 S5300A_IGR S5300B_IGR S5300A_IGR S5300B_IGR S5300B_IGR
  4 S5300B_IGR S5299B_IGR S5300B_IGR S5299B_IGR S5299B_IGR
  5 S5299B_IGR S5299A_IGR S5299B_IGR S5829B_IGR S5299A_IGR",sep="",header=TRUE,stringsAsFactors=FALSE)
  data_m$new <- 1
  library(reshape2)
  dM <- melt(data_m,id.vars="new")
  xtabs(new~value+variable,dM)
  #or
  dcast(dM,value~variable,value.var="new",fill=0)

A.K.

On Thursday, November 28, 2013 12:18 PM, Gmail wrote:

> Hi!
>
> I'm new in R and I'm writing you asking for some guidance. I had analyzed a comparative genomic microarray data of 56 Salmonella strains to identify absent genes in each of the serovars, and finally I got a matrix that looks like that:
>
> > data[1:5,1:5]
>   Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
> 1       S5305B_IGR S5305B_IGR  S5305B_IGR  S5305B_IGR      S5305B_IGR
> 2       S5305A_IGR S5300A_IGR  S5305A_IGR  S5300A_IGR      S5300A_IGR
> 3       S5300A_IGR S5300B_IGR  S5300A_IGR  S5300B_IGR      S5300B_IGR
> 4       S5300B_IGR S5299B_IGR  S5300B_IGR  S5299B_IGR      S5299B_IGR
> 5       S5299B_IGR S5299A_IGR  S5299B_IGR  S5829B_IGR      S5299A_IGR
>
> The variables corresponds to those genes identified as absent in each of the serovars. I would like to create a presence-absence matrix of those genes comparing all the serovars at the same time, I assume that should not be complicated but I don't know how to do it.
>
> I would like a matrix similar to the next one:
>
> > data_m[1:5,1:5]
>            Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
> S5305B_IGR                1          1           1           1               1
> S5305A_IGR                1          0           1           0               0
> S5300A_IGR                1          1           1           1               1
>
> Any help would be welcome, and thank you in advance,
>
> Oihane
>
> --
> Oihane Irazoki Sanchez
> PhD Student, Molecular Microbiology
> Genetics and Microbiology Department, Faculty of Biosciences
> Autonomous University of Barcelona
> 08193 Bellaterra (Barcelona), Spain
> Telf: 34 - 935 811 665
> E-mail: oihane.iraz...@uab.cat / o.iraz...@gmail.com
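For readers without reshape2, the same presence-absence table can probably be built in base R as well. The following is only a sketch under that assumption; `genes` and `pa` are illustrative names that do not appear in the thread.

```r
data_m <- read.table(text = "Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
1 S5305B_IGR S5305B_IGR S5305B_IGR S5305B_IGR S5305B_IGR
2 S5305A_IGR S5300A_IGR S5305A_IGR S5300A_IGR S5300A_IGR
3 S5300A_IGR S5300B_IGR S5300A_IGR S5300B_IGR S5300B_IGR
4 S5300B_IGR S5299B_IGR S5300B_IGR S5299B_IGR S5299B_IGR
5 S5299B_IGR S5299A_IGR S5299B_IGR S5829B_IGR S5299A_IGR",
  sep = "", header = TRUE, stringsAsFactors = FALSE)

# All gene identifiers that occur anywhere in the table
genes <- sort(unique(unlist(data_m)))

# One column per serovar: 1 if the gene is listed in that column, else 0
pa <- sapply(data_m, function(col) as.integer(genes %in% col))
rownames(pa) <- genes
pa
```

Unlike the xtabs() call, this variant records only presence (0/1) even if a gene were listed twice in one column, which may or may not be what is wanted.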
[R] Counting variables repeted in dataframe columns to create a presence-absence table
Hi!

I'm new in R and I'm writing you asking for some guidance. I had analyzed a comparative genomic microarray data of 56 Salmonella strains to identify absent genes in each of the serovars, and finally I got a matrix that looks like that:

> data[1:5,1:5]
  Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
1       S5305B_IGR S5305B_IGR  S5305B_IGR  S5305B_IGR      S5305B_IGR
2       S5305A_IGR S5300A_IGR  S5305A_IGR  S5300A_IGR      S5300A_IGR
3       S5300A_IGR S5300B_IGR  S5300A_IGR  S5300B_IGR      S5300B_IGR
4       S5300B_IGR S5299B_IGR  S5300B_IGR  S5299B_IGR      S5299B_IGR
5       S5299B_IGR S5299A_IGR  S5299B_IGR  S5829B_IGR      S5299A_IGR

The variables corresponds to those genes identified as absent in each of the serovars. I would like to create a presence-absence matrix of those genes comparing all the serovars at the same time, I assume that should not be complicated but I don't know how to do it.

I would like a matrix similar to the next one:

> data_m[1:5,1:5]
           Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
S5305B_IGR                1          1           1           1               1
S5305A_IGR                1          0           1           0               0
S5300A_IGR                1          1           1           1               1

Any help would be welcome, and thank you in advance,

Oihane

--
Oihane Irazoki Sanchez
PhD Student, Molecular Microbiology
Genetics and Microbiology Department, Faculty of Biosciences
Autonomous University of Barcelona
08193 Bellaterra (Barcelona), Spain
Telf: 34 - 935 811 665
E-mail: oihane.iraz...@uab.cat / o.iraz...@gmail.com
Re: [R] Counting numbers in R
I got sorted, Thanks all

On Fri, Oct 4, 2013 at 2:03 PM, S Ellison wrote:

> > I have a set of data and I need to find out how many points are below a
> > certain value but R will not calculate this properly for me.
> R will. But you aren't.
>
> > Negative numbers seem to be causing the issue.
> You haven't got any negative numbers in your data set. In fact, you
> haven't got any numbers. It's all character strings. Is there a reason for
> that?
>
> Assuming there is, if you have your data in a data frame 'A' and just want
> the count:
>
>   table(as.numeric(A$Tm_ugL) <= 0.0002)
>
> If you just want a complete vector of TRUE or FALSE
>
>   as.numeric(A$Tm_ugL) <= 0.0002
>
> does that. If you want to add that to your data frame (is it called A?)
> that looks like
>
>   A$Censored <- as.numeric(A$Tm_ugL) <= 0.0002
>
> But you really shouldn't have numbers in character format; read it as
> numeric. Then it's just
>
>   table(A$Tm_ugL <= 0.0002)
>
> and so on. If it's refusing to read as numeric, find out why and fix the
> data.
>
> And some comments on code, while I'm here:
>
> > for (i in one:nrow(A))
> ...
> > if (A[i,two]<=A_LLD)
> Variables called 'one' and 'two' look like a really bad idea. If they are
> equal to 1 and 2, use 1 and 2 (or 1L and 2L if you want to be _sure_ they
> are integer). If not, the names are going to be pretty confusing, no?
>
> > (A_Censored[i,two]<-"TRUE")
> Why use a character string like "TRUE" that R can't interpret as logical
> instead of the logical values TRUE and FALSE?
>
> S Ellison
Re: [R] Counting numbers in R
> I have a set of data and I need to find out how many points are below a
> certain value but R will not calculate this properly for me.

R will. But you aren't.

> Negative numbers seem to be causing the issue.

You haven't got any negative numbers in your data set. In fact, you haven't got any numbers. It's all character strings. Is there a reason for that?

Assuming there is, if you have your data in a data frame 'A' and just want the count:

  table(as.numeric(A$Tm_ugL) <= 0.0002)

If you just want a complete vector of TRUE or FALSE,

  as.numeric(A$Tm_ugL) <= 0.0002

does that. If you want to add that to your data frame (is it called A?) that looks like

  A$Censored <- as.numeric(A$Tm_ugL) <= 0.0002

But you really shouldn't have numbers in character format; read it as numeric. Then it's just

  table(A$Tm_ugL <= 0.0002)

and so on. If it's refusing to read as numeric, find out why and fix the data.

And some comments on code, while I'm here:

> for (i in one:nrow(A))
...
> if (A[i,two]<=A_LLD)

Variables called 'one' and 'two' look like a really bad idea. If they are equal to 1 and 2, use 1 and 2 (or 1L and 2L if you want to be _sure_ they are integer). If not, the names are going to be pretty confusing, no?

> (A_Censored[i,two]<-"TRUE")

Why use a character string like "TRUE" that R can't interpret as logical instead of the logical values TRUE and FALSE?

S Ellison
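To make the "find out why and fix the data" step concrete, here is a small sketch. The data frame below is invented for illustration (only the names A, Tm_ugL and Censored and the 0.0002 threshold come from the thread), and `vals`, `bad` and `n_below` are hypothetical helper names.

```r
# Hypothetical data: concentrations stored as character strings, one of
# which cannot be parsed as a number.
A <- data.frame(Tm_ugL = c("0.0001", "0.0002", "0.0150", "<0.0002", "0.0003"),
                stringsAsFactors = FALSE)

# Convert explicitly; entries that cannot be parsed become NA.
vals <- suppressWarnings(as.numeric(A$Tm_ugL))

# Inspect the offending entries instead of silently dropping them.
bad <- A$Tm_ugL[is.na(vals)]

# Logical flag (TRUE/FALSE, not "TRUE"/"FALSE") and the count of values
# at or below the threshold.
A$Censored <- vals <= 0.0002
n_below <- sum(A$Censored, na.rm = TRUE)
```

Entries such as "<0.0002" are a plausible reason for a column being read as character in the first place, but that is a guess about the original data, not something stated in the thread.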
Re: [R] counting process in survival
It is hard to know exactly what you mean with such a generic question. If you mean "treat survival as a counting process", then the answer is yes. The survival package in S (which is the direct ancestor of the Splus package, which is the direct ancestor of the R package) was the very first to do this. I created the feature in 1984.

Terry Therneau

On 05/31/2013 05:00 AM, r-help-requ...@r-project.org wrote:

> Hi, I have a question: is there a package to do counting process in survival analysis with R?
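As an illustration of what the counting-process form looks like in the survival package, here is a minimal sketch. The data are invented for the example, and the column names (id, tstart, tstop, event, x) are assumptions, not anything from the original question.

```r
library(survival)

# Hypothetical data in counting-process form: one row per (tstart, tstop]
# interval, with event = 1 if the event occurred at tstop.
d <- data.frame(
  id     = c(1, 1, 2, 3, 3, 4, 5, 5, 6),
  tstart = c(0, 5, 0, 0, 4, 0, 0, 3, 0),
  tstop  = c(5, 9, 7, 4, 10, 6, 3, 8, 12),
  event  = c(0, 1, 1, 0, 0, 1, 0, 1, 0),
  x      = c(0, 1, 1, 0, 1, 0, 0, 1, 0)
)

# Surv() with three arguments (start, stop, event) creates a
# counting-process response.
s <- Surv(d$tstart, d$tstop, d$event)
attr(s, "type")

# The same response can be used on the left-hand side of a Cox model.
fit <- coxph(Surv(tstart, tstop, event) ~ x, data = d)
```

With only a handful of rows the fitted coefficient is not meaningful; the point is just the (start, stop, event) layout.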
Re: [R] Counting number of consecutive occurrences per rows
Hi

I slightly modified Jim's code. The first part is a function to split data frame test according to act, juln and day and compute repetitions in each chunk.

  fff <- function(x) {
    fac <- factor((x[, "act"]==0)*1+(x[,"act"] == 200)*2, levels=c(1,0,2))
    int <- interaction(x[,"juln"], x[,"day"], fac)
    res <- cumsum(c(1, abs(diff(as.numeric(int)))))
    res
  }
  test$fac <- fff(test)

The second part evaluates the length of each chunk:

  test$res <- ave(test$fac, test$fac, FUN=length)

The last part computes max (min, sum) of res in each distinct chunk.

  fff2 <- function(x) {
    fac <- factor((x[, "act"]==0)*1+(x[,"act"] == 200)*2, levels=c(1,0,2), labels=c("0", "1-199", "200"))
    fac
  }
  aggregate(test$res, list(test$juln, test$day), max)
  aggregate(test$res, list(test$juln, test$day, fff2(test)), max)

Is it what you want?

Petr

From: zuzana zajkova [mailto:zuzu...@gmail.com]
Sent: Friday, May 03, 2013 7:10 PM
To: PIKAL Petr; jholt...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] Counting number of consecutive occurrences per rows

Hi,

I'm sorry that it took me so much time to respond; yesterday I finally got time to try your suggestions. Thank you for them! I tried both and they give the same results, but in both there are some things I still need to solve. I would appreciate your help. I include a slightly bigger dataframe (test2, at the end of this email), with more differences in the variables, to be able to better explain what I would like to calculate in addition.

Jim's code: I needed to make some changes in assigning the key. Yours worked ok for that small "test" data, but when I tried it on my dataframe, which has around 25000 rows, it didn't work properly.
  test2$key[test2$act == 0] <- 1
  test2$key[test2$act > 0 & test2$act < 200] <- 2
  test2$key[test2$act == 200] <- 3

  # this works ok
  test2$resChange <- cumsum(c(1, abs(diff(test2$key))))
  test2$res <- ave(test2$resChange, test2$resChange, FUN = length)

  # I added a new column by jul date
  test2$resJ <- ave(test2$resChange, test2$resChange, test2$juln, FUN = length)

  # this works fine as well, for dividing between day 0 and day 1
  test2$resJD <- ave(test2$resChange, test2$resChange, test2$juln, test2$day, FUN = length)

  # summary by day
  test2Resume_day <- test2[
      , list(maxres = max(res)
           , minres = min(res)
           , sumres = length(unique(resChange)))
      , keyby = c('day', 'key')]
  # change 'key'
  test2Resume_day$key <- c('0', '1-199', '200')[test2Resume_day$key]
  test2Resume_day
     day   key maxres minres sumres
  1:   0     0      2      2      3
  2:   0 1-199      3      1      9
  3:   0   200      6      1      7
  4:   1     0      1      1      1
  5:   1 1-199     10      1      7
  6:   1   200      6      1      6

  # summary by juln
  test2Resume_jul <- test2[
      , list(maxres = max(res)
           , minres = min(res)
           , sumres = length(unique(resChange)))
      , keyby = c('juln', 'key')]
  # change 'key'
  test2Resume_jul$key <- c('0', '1-199', '200')[test2Resume_jul$key]
  test2Resume_jul
      juln   key maxres minres sumres
  1: 15173     0      2      2      1
  2: 15173 1-199      3      1      7
  3: 15173   200      6      1      6
  4: 15174     0      2      1      3
  5: 15174 1-199     10      1      8
  6: 15174   200      6      1      6

It is ok, but what I would like to get is a summary for juln and for the variable day (0 and 1) as well. Like this:

   juln day   key maxres minres sumres
  15173   0     0
  15173   0 1-199
  15173   0   200
  15173   1     0
  15173   1 1-199
  15173   1   200
  15174   0     0
  15174   0 1-199
  15174   0   200
  15174   1     0
  15174   1 1-199
  15174   1   200
  ...

The other thing is that I would like "sumres" to be calculated as the sum of the occurrence values for each "key". For example, if in the test2 dataframe the res values for key 200 (juln 15173) are 1, 1, 2, 2, 1, 2, the sumres should be 9 (1+1+2+2+1+2), not 6 (which I suppose comes from the number of unique occurrences).
Petr's code: This works fine also; the thing is that for the aggregation I would need the intervals to be like this

  [0, 1)  [1, 199]  (199, 200]

which I don't know is possible... I checked the help for cut, but I found that the intervals can only be closed all on the right or all on the left...

Thank you very much for your time and sharing your knowledge!

Zuzana

## here is the bigger test2 dataframe
> dput(test2)
structure(list(daten = structure(c(15173, 15173, 15173, 15173, 15173, 15173, 15173, 1
Re: [R] Counting number of consecutive occurrences per rows
ot;win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win", "win"), night = structure(c(1310962792, 1310963392, 1310963992, 1310964592, 1310965192, 1310965792, 1310966392, 1310966992, 1310967592, 1310968192, 1310968792, 1310969392, 1310969992, 1310970592, 1310971192, 1310971792, 1310972392, 1310972992, 1310973592, 1310974192, 1310974792, 1310975392, 1311107991, 1311108591, 1311109191, 1311109791, 130391, 130991, 131591, 132191, 132791, 133391, 133991, 134591, 135191, 135791, 136391, 136991, 137591, 138191, 138791, 139391, 139991, 1311034191, 1311034791, 1311035391, 1311035991, 1311036591, 1311037191, 1311037791, 1311038391, 1311038991, 1311039591, 1311040191, 1311040791, 1311041391, 1311041991, 1311042591, 1311043191, 1311043791), class = c("POSIXct", "POSIXt" ), tzone = "GMT"), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), act = c(196, 200, 199, 200, 197, 198, 197, 200, 200, 197, 200, 200, 198, 200, 1, 1, 0, 0, 1, 2, 200, 200, 200, 200, 200, 200, 199, 61, 0, 194, 198, 198, 196, 193, 194, 193, 197, 198, 199, 200, 197, 199, 199, 200, 198, 200, 200, 198, 200, 34, 1, 1, 0, 0, 199, 200, 199, 7, 0, 0)), .Names = c("daten", "juln", "fen", "night", "day", "act"), row.names = 9990:10049, class = "data.frame") On 29 April 2013 14:35, PIKAL Petr wrote: > Hi > > rrr<-rle(as.numeric(cut(test$act, c(0,1,199,200), include.lowest=T))) > test$res <- rep(rrr$lengths, rrr$lengths) > > If you put it in function > > fff<- function(x, limits=c(0,1,199,200)) { > rrr<-rle(as.numeric(cut(x, limits, include.lowest=T))) > res <- rep(rrr$lengths, rrr$lengths) > res > } > > you can use split/lapply approach > > 
test$res2<-unlist(lapply(split(test$act, factor(test$day, levels=c(1,0))), > fff)) > > Beware of correct ordering of days in output. Without correct leveling of > factor 0 precedes 1. > > And for the last part probably aggregate can be the way. > > > aggregate(test$res, list(test$jul, cut(test$act, c(0,1,199,200), > include.lowest=T)), max) > Group.1 Group.2 x > 1 14655 [0,1] 4 > 2 14655 (1,199] 3 > 3 14655 (199,200] 3 > > aggregate(test$res, list(test$jul, cut(test$act, c(0,1,199,200), > include.lowest=T)), min) > Group.1 Group.2 x > 1 14655 [0,1] 4 > 2 14655 (1,199] 1 > 3 14655 (199,200] 2 > > Regards > Petr > > > -Original Message- > > From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- > > project.org] On Behalf Of zuzana zajkova > > Sent: Monday, April 29, 2013 12:45 PM > > To: r-help@r-project.org > > Subject: [R] Counting number of consecutive occurrences per rows > > > > Hi, > > > > I would appreciate if somebody could help me with following > > calculation. > > I have a dataframe, by 10 minutes time, for mostly one year data. This > > is small example: > > > > > dput(test) > > structure(list(jul = structure(c(14655, 14655, 14655, 14655, 14655, > > 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, > > 14655), origin = structure(0, class = "Date")), > > time = structure(c(1266258354, 1266258954, 1266259554, 1266260154, > > 1266260754, 1266261354, 1266261954, 1266262554, 1266263154, > > 1266263754, 1266264354, 1266264954, 1266265554, 1266266154, > > 1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone = > > "GMT"), > > act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0, > > 34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > > 0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day" > > ), class = "data.frame", row.names = c(510L, 512L, 514L, 516L, 518L, > > 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L, > > 540L)) > > > > L
Re: [R] Counting number of consecutive occurrences per rows
Hi

  rrr <- rle(as.numeric(cut(test$act, c(0,1,199,200), include.lowest=T)))
  test$res <- rep(rrr$lengths, rrr$lengths)

If you put it in a function

  fff <- function(x, limits=c(0,1,199,200)) {
    rrr <- rle(as.numeric(cut(x, limits, include.lowest=T)))
    res <- rep(rrr$lengths, rrr$lengths)
    res
  }

you can use the split/lapply approach

  test$res2 <- unlist(lapply(split(test$act, factor(test$day, levels=c(1,0))), fff))

Beware of correct ordering of days in the output. Without correct leveling of the factor, 0 precedes 1.

And for the last part probably aggregate can be the way.

> aggregate(test$res, list(test$jul, cut(test$act, c(0,1,199,200), include.lowest=T)), max)
  Group.1   Group.2 x
1   14655     [0,1] 4
2   14655   (1,199] 3
3   14655 (199,200] 3

> aggregate(test$res, list(test$jul, cut(test$act, c(0,1,199,200), include.lowest=T)), min)
  Group.1   Group.2 x
1   14655     [0,1] 4
2   14655   (1,199] 1
3   14655 (199,200] 2

Regards
Petr

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of zuzana zajkova
> Sent: Monday, April 29, 2013 12:45 PM
> To: r-help@r-project.org
> Subject: [R] Counting number of consecutive occurrences per rows
>
> Hi,
>
> I would appreciate if somebody could help me with following
> calculation.
> I have a dataframe, by 10 minutes time, for mostly one year data.
This > is small example: > > > dput(test) > structure(list(jul = structure(c(14655, 14655, 14655, 14655, 14655, > 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, > 14655), origin = structure(0, class = "Date")), > time = structure(c(1266258354, 1266258954, 1266259554, 1266260154, > 1266260754, 1266261354, 1266261954, 1266262554, 1266263154, > 1266263754, 1266264354, 1266264954, 1266265554, 1266266154, > 1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone = > "GMT"), > act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0, > 34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > 0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day" > ), class = "data.frame", row.names = c(510L, 512L, 514L, 516L, 518L, > 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L, > 540L)) > > Looks like this: > > > test > jultime act day > 510 14655 2010-02-15 18:25:54 130 1 > 512 14655 2010-02-15 18:35:54 23 1 > 514 14655 2010-02-15 18:45:54 45 1 > 516 14655 2010-02-15 18:55:54 200 1 > 518 14655 2010-02-15 19:05:54 200 1 > 520 14655 2010-02-15 19:15:54 200 1 > 522 14655 2010-02-15 19:25:54 199 1 > 524 14655 2010-02-15 19:35:54 150 1 > 526 14655 2010-02-15 19:45:54 0 1 > 528 14655 2010-02-15 19:55:54 0 1 > 530 14655 2010-02-15 20:05:54 0 0 > 532 14655 2010-02-15 20:15:54 0 0 > 534 14655 2010-02-15 20:25:54 34 0 > 536 14655 2010-02-15 20:35:54 200 0 > 538 14655 2010-02-15 20:45:54 200 0 > 540 14655 2010-02-15 20:55:54 145 0 > > > What I would like to calculate is the number of consecutive occurrences > of values 200, 0 and together values from 1 til 199 (in fact the > values that differ from 200 and 0) in column "act". 
> > I would like to get something like this (result$res) > > > result > jultime act day res res2 > 510 14655 2010-02-15 18:25:54 130 1 33 > 512 14655 2010-02-15 18:35:54 23 1 33 > 514 14655 2010-02-15 18:45:54 45 1 33 > 516 14655 2010-02-15 18:55:54 200 1 33 > 518 14655 2010-02-15 19:05:54 200 1 33 > 520 14655 2010-02-15 19:15:54 200 1 33 > 522 14655 2010-02-15 19:25:54 199 1 22 > 524 14655 2010-02-15 19:35:54 150 1 22 > 526 14655 2010-02-15 19:45:54 0 1 42 > 528 14655 2010-02-15 19:55:54 0 1 42 > 530 14655 2010-02-15 20:05:54 0 0 42 > 532 14655 2010-02-15 20:15:54 0 0 42 > 534 14655 2010-02-15 20:25:54 34 0 11 > 536 14655 2010-02-15 20:35:54 200 0 22 > 538 14655 2010-02-15 20:45:54 200 0 22 > 540 14655 2010-02-15 20:55:54 145 0 11 > > And if possible, distinguish among day==1 and day==0 (see the "act" > values of 0 for example), results as in result$res2. > > After it I would like to make a resume table per days (jul): > where maxres is max(result$res) for the "act" value where minres is > min(result$res) for the "act" value where sumres is sum(result$res) for > the "act" value (for example, if the 200 value ocurrs in different > times per day(jul) consecutively 3, 5, 1, 6 and 7 times the sumr
Re: [R] Counting number of consecutive occurrences per rows
try this:

> test <- structure(list(jul = structure(c(14655, 14655, 14655, 14655,
+ 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
+ 14655, 14655, 14655), origin = structure(0, class = "Date")),
+ time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
+ 1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
+ 1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
+ 1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone =
+ "GMT"),
+ act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
+ 34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
+ ), class = "data.frame", row.names = c(510L, 512L, 514L, 516L,
+ 518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
+ 540L))
>
> # add key to separate data
> test$key <- ifelse(test$act == 0
+     , 1L # 0
+     , ifelse(test$act == 200
+         , 3L # 200
+         , 2L # 1-199
+         )
+     )
> # mark changes in sequence
> test$resChange <- cumsum(c(1L, abs(diff(test$key))))
> test$res <- ave(test$resChange, test$resChange, FUN = length)
>
> test$res2 <- ave(test$resChange, test$resChange, test$day, FUN = length)
>
> test
      jul                time act day key resChange res res2
510 14655 2010-02-15 18:25:54 130   1   2         1   3    3
512 14655 2010-02-15 18:35:54  23   1   2         1   3    3
514 14655 2010-02-15 18:45:54  45   1   2         1   3    3
516 14655 2010-02-15 18:55:54 200   1   3         2   3    3
518 14655 2010-02-15 19:05:54 200   1   3         2   3    3
520 14655 2010-02-15 19:15:54 200   1   3         2   3    3
522 14655 2010-02-15 19:25:54 199   1   2         3   2    2
524 14655 2010-02-15 19:35:54 150   1   2         3   2    2
526 14655 2010-02-15 19:45:54   0   1   1         4   4    2
528 14655 2010-02-15 19:55:54   0   1   1         4   4    2
530 14655 2010-02-15 20:05:54   0   0   1         4   4    2
532 14655 2010-02-15 20:15:54   0   0   1         4   4    2
534 14655 2010-02-15 20:25:54  34   0   2         5   1    1
536 14655 2010-02-15 20:35:54 200   0   3         6   2    2
538 14655 2010-02-15 20:45:54 200   0   3         6   2    2
540 14655 2010-02-15 20:55:54 145   0   2         7   1    1
>

On Mon, Apr 29, 2013 at 6:44 AM, zuzana zajkova wrote:
> Hi,
>
> I would appreciate if somebody could help
me with following calculation. > I have a dataframe, by 10 minutes time, for mostly one year data. This is > small example: > > > dput(test) > structure(list(jul = structure(c(14655, 14655, 14655, 14655, > 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, > 14655, 14655, 14655), origin = structure(0, class = "Date")), > time = structure(c(1266258354, 1266258954, 1266259554, 1266260154, > 1266260754, 1266261354, 1266261954, 1266262554, 1266263154, > 1266263754, 1266264354, 1266264954, 1266265554, 1266266154, > 1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone = > "GMT"), > act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0, > 34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > 0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day" > ), class = "data.frame", row.names = c(510L, 512L, 514L, 516L, > 518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L, > 540L)) > > Looks like this: > > > test > jultime act day > 510 14655 2010-02-15 18:25:54 130 1 > 512 14655 2010-02-15 18:35:54 23 1 > 514 14655 2010-02-15 18:45:54 45 1 > 516 14655 2010-02-15 18:55:54 200 1 > 518 14655 2010-02-15 19:05:54 200 1 > 520 14655 2010-02-15 19:15:54 200 1 > 522 14655 2010-02-15 19:25:54 199 1 > 524 14655 2010-02-15 19:35:54 150 1 > 526 14655 2010-02-15 19:45:54 0 1 > 528 14655 2010-02-15 19:55:54 0 1 > 530 14655 2010-02-15 20:05:54 0 0 > 532 14655 2010-02-15 20:15:54 0 0 > 534 14655 2010-02-15 20:25:54 34 0 > 536 14655 2010-02-15 20:35:54 200 0 > 538 14655 2010-02-15 20:45:54 200 0 > 540 14655 2010-02-15 20:55:54 145 0 > > > What I would like to calculate is the number of consecutive occurrences of > values 200, 0 and together values from 1 til 199 (in fact the values that > differ from 200 and 0) in column "act". 
> > I would like to get something like this (result$res) > > > result > jultime act day res res2 > 510 14655 2010-02-15 18:25:54 130 1 33 > 512 14655 2010-02-15 18:35:54 23 1 33 > 514 14655 2010-02-15 18:45:54 45 1 33 > 516 14655 2010-02-15 18:55:54 200 1 33 > 518 14655 2010-02-15 19:05:54 200 1 33 > 520 14655 2010-02-15 19:15:54 200 1 33 > 522 14655 2010-02-15 19:25:54 199 1 22 > 524 14655 2010-02-15 19:35:54 150 1 22 > 526 14655 2010-02-15 19:45:54 0 1 42 > 528 14655 2010-02-15 19:55:54 0 1 42 > 530 14655 2010-
Re: [R] Counting number of consecutive occurrences per rows
Forgot the last part of the question:

> test <- structure(list(jul = structure(c(14655, 14655, 14655, 14655,
+ 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
+ 14655, 14655, 14655), origin = structure(0, class = "Date")),
+ time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
+ 1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
+ 1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
+ 1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone =
+ "GMT"),
+ act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
+ 34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
+ ), class = "data.frame", row.names = c(510L, 512L, 514L, 516L,
+ 518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
+ 540L))
>
> # add key to separate data
> test$key <- ifelse(test$act == 0
+     , 1L # 0
+     , ifelse(test$act == 200
+         , 3L # 200
+         , 2L # 1-199
+         )
+     )
> # mark changes in sequence
> test$resChange <- cumsum(c(1L, abs(diff(test$key))))
> test$res <- ave(test$resChange, test$resChange, FUN = length)
>
> test$res2 <- ave(test$resChange, test$resChange, test$day, FUN = length)
>
> require(data.table) # use this for aggregation
> test <- data.table(test)
> testResume <- test[
+     , list(maxres = max(res)
+         , minres = min(res)
+         , sumres = length(unique(resChange))
+         )
+     , keyby = c('day', 'key')
+     ]
> # change 'key'
> testResume$key <- c('0', '1-199', '200')[testResume$key]
> testResume
   day   key maxres minres sumres
1:   0     0      4      4      1
2:   0 1-199      1      1      2
3:   0   200      2      2      1
4:   1     0      4      4      1
5:   1 1-199      3      2      2
6:   1   200      3      3      1
>

On Mon, Apr 29, 2013 at 6:44 AM, zuzana zajkova wrote:
> Hi,
>
> I would appreciate if somebody could help me with following calculation.
> I have a dataframe, by 10 minutes time, for mostly one year data.
This is > small example: > > > dput(test) > structure(list(jul = structure(c(14655, 14655, 14655, 14655, > 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, > 14655, 14655, 14655), origin = structure(0, class = "Date")), > time = structure(c(1266258354, 1266258954, 1266259554, 1266260154, > 1266260754, 1266261354, 1266261954, 1266262554, 1266263154, > 1266263754, 1266264354, 1266264954, 1266265554, 1266266154, > 1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone = > "GMT"), > act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0, > 34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, > 0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day" > ), class = "data.frame", row.names = c(510L, 512L, 514L, 516L, > 518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L, > 540L)) > > Looks like this: > > > test > jultime act day > 510 14655 2010-02-15 18:25:54 130 1 > 512 14655 2010-02-15 18:35:54 23 1 > 514 14655 2010-02-15 18:45:54 45 1 > 516 14655 2010-02-15 18:55:54 200 1 > 518 14655 2010-02-15 19:05:54 200 1 > 520 14655 2010-02-15 19:15:54 200 1 > 522 14655 2010-02-15 19:25:54 199 1 > 524 14655 2010-02-15 19:35:54 150 1 > 526 14655 2010-02-15 19:45:54 0 1 > 528 14655 2010-02-15 19:55:54 0 1 > 530 14655 2010-02-15 20:05:54 0 0 > 532 14655 2010-02-15 20:15:54 0 0 > 534 14655 2010-02-15 20:25:54 34 0 > 536 14655 2010-02-15 20:35:54 200 0 > 538 14655 2010-02-15 20:45:54 200 0 > 540 14655 2010-02-15 20:55:54 145 0 > > > What I would like to calculate is the number of consecutive occurrences of > values 200, 0 and together values from 1 til 199 (in fact the values that > differ from 200 and 0) in column "act". 
> > I would like to get something like this (result$res) > > > result > jultime act day res res2 > 510 14655 2010-02-15 18:25:54 130 1 33 > 512 14655 2010-02-15 18:35:54 23 1 33 > 514 14655 2010-02-15 18:45:54 45 1 33 > 516 14655 2010-02-15 18:55:54 200 1 33 > 518 14655 2010-02-15 19:05:54 200 1 33 > 520 14655 2010-02-15 19:15:54 200 1 33 > 522 14655 2010-02-15 19:25:54 199 1 22 > 524 14655 2010-02-15 19:35:54 150 1 22 > 526 14655 2010-02-15 19:45:54 0 1 42 > 528 14655 2010-02-15 19:55:54 0 1 42 > 530 14655 2010-02-15 20:05:54 0 0 42 > 532 14655 2010-02-15 20:15:54 0 0 42 > 534 14655 2010-02-15 20:25:54 34 0 11 > 536 14655 2010-02-15 20:35:54 200 0 22 > 538 14655 2010-02-15 20:45:54 200 0 22 > 540 14655 2010-02-15 20:55:54 145 0 11 > > And if possible, distinguish among day==1 and day==0 (see the "act" values > of 0 for example), results as in result$res2. > > After it I would like
[R] Counting number of consecutive occurrences per rows
Hi,

I would appreciate if somebody could help me with the following calculation. I have a dataframe, in 10-minute steps, covering mostly one year of data. This is a small example:

> dput(test)
structure(list(jul = structure(c(14655, 14655, 14655, 14655,
14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
14655, 14655, 14655), origin = structure(0, class = "Date")),
time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone =
"GMT"),
act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
), class = "data.frame", row.names = c(510L, 512L, 514L, 516L,
518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
540L))

Looks like this:

> test
      jul                time act day
510 14655 2010-02-15 18:25:54 130   1
512 14655 2010-02-15 18:35:54  23   1
514 14655 2010-02-15 18:45:54  45   1
516 14655 2010-02-15 18:55:54 200   1
518 14655 2010-02-15 19:05:54 200   1
520 14655 2010-02-15 19:15:54 200   1
522 14655 2010-02-15 19:25:54 199   1
524 14655 2010-02-15 19:35:54 150   1
526 14655 2010-02-15 19:45:54   0   1
528 14655 2010-02-15 19:55:54   0   1
530 14655 2010-02-15 20:05:54   0   0
532 14655 2010-02-15 20:15:54   0   0
534 14655 2010-02-15 20:25:54  34   0
536 14655 2010-02-15 20:35:54 200   0
538 14655 2010-02-15 20:45:54 200   0
540 14655 2010-02-15 20:55:54 145   0

What I would like to calculate is the number of consecutive occurrences of the values 200, 0 and, taken together, the values from 1 to 199 (in fact the values that differ from 200 and 0) in column "act".

I would like to get something like this (result$res):

> result
      jul                time act day res res2
510 14655 2010-02-15 18:25:54 130   1   3    3
512 14655 2010-02-15 18:35:54  23   1   3    3
514 14655 2010-02-15 18:45:54  45   1   3    3
516 14655 2010-02-15 18:55:54 200   1   3    3
518 14655 2010-02-15 19:05:54 200   1   3    3
520 14655 2010-02-15 19:15:54 200   1   3    3
522 14655 2010-02-15 19:25:54 199   1   2    2
524 14655 2010-02-15 19:35:54 150   1   2    2
526 14655 2010-02-15 19:45:54   0   1   4    2
528 14655 2010-02-15 19:55:54   0   1   4    2
530 14655 2010-02-15 20:05:54   0   0   4    2
532 14655 2010-02-15 20:15:54   0   0   4    2
534 14655 2010-02-15 20:25:54  34   0   1    1
536 14655 2010-02-15 20:35:54 200   0   2    2
538 14655 2010-02-15 20:45:54 200   0   2    2
540 14655 2010-02-15 20:55:54 145   0   1    1

And if possible, distinguish between day==1 and day==0 (see the "act" values of 0 for example), with results as in result$res2.

After that I would like to make a summary table per day (jul), where
  maxres is max(result$res) for the "act" value,
  minres is min(result$res) for the "act" value,
  sumres is sum(result$res) for the "act" value (for example, if the 200 value occurs at different times per day (jul) consecutively 3, 5, 1, 6 and 7 times, the sumres would be 3+5+1+6+7 = 22),
something like this (these are made-up numbers):

  jul   act   maxres minres sumres
14655     0        4      1     25
14655   200        3      2     48
14655 1-199        3      1     71
14656     0        8      2     38
14656   200       15      3     60
14656 1-199       11      4     46
...

(theoretically the sum of sumres per day (jul) should be 144)

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

I hope my explanation is sufficient. I appreciate any hint.

Thank you,
Zuzana
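The request above can be sketched in base R with rle(). This is only an illustration using the act, day and jul values of the example; the helper names (`key`, `run_len`, `run_id`, `runs`, `smry`) are hypothetical. It assumes that sumres should add up each run once, as in the 3+5+1+6+7 example, and that a run does not span two values of jul.

```r
act <- c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0, 34, 200, 200, 145)
day <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
jul <- rep(14655, length(act))

# Three classes: exactly 0, exactly 200, everything in between
key <- ifelse(act == 0, "0", ifelse(act == 200, "200", "1-199"))

# Length of the run each element belongs to
run_len <- function(k) {
  r <- rle(k)
  rep(r$lengths, r$lengths)
}

res  <- run_len(key)              # runs of the same class
res2 <- run_len(paste(key, day))  # runs also broken when day changes

# One row per run, then max / min / sum of run lengths per jul and class
run_id <- cumsum(c(TRUE, key[-1] != key[-length(key)]))
first  <- !duplicated(run_id)
runs   <- data.frame(jul = jul[first], key = key[first], len = res[first])
smry   <- aggregate(len ~ jul + key, data = runs,
                    FUN = function(v) c(maxres = max(v), minres = min(v),
                                        sumres = sum(v)))
smry
```

Because each run is counted once, the sumres values for one day add up to the number of rows for that day (144 for a full day of 10-minute records, 16 in this small example).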
Re: [R] Counting various elemnts in a vactor
Hi,

  library(plyr)
  df1 <- count(df)
  rep(df1[,1],df1[,2]*100)
  count(as.character(rep(df1[,1],df1[,2]*100)))
  #  x freq
  #1 A  200
  #2 B  200
  #3 C  200
  #4 D  400
  #5 F  400

A.K.

- Original Message -
From: Katherine Gobin
To: r-help@r-project.org
Cc:
Sent: Tuesday, March 26, 2013 4:12 AM
Subject: [R] Counting various elemnts in a vactor

Dear R forum

I have a vector say as given below

  df = c("F", "C", "F", "B", "D", "A", "D", "D", "A", "F", "D", "F", "B", "C")

I need to find

(1) how many times each element occurs? e.g. in above vector F occurs 4 times, C occurs 2 times etc.

(2) Depending on the number of occurrences, I need to repeat the element 100 times of the occurrences e.g. I need to repeat F 6 * 100 = 600 times, C 2*100 = 200 times.

I can manage the second part i.e. repeating but I am not able to count the number of times the element is appearing in a given vector.

Kindly guide

Katherine
Re: [R] Counting various elements in a vector
Dear Sir,

Thanks a lot for your great help. I couldn't have figured it out.

Thanks again.

Regards
Katherine

--- On Tue, 26/3/13, D. Rizopoulos wrote:

From: D. Rizopoulos
Subject: Re: [R] Counting various elements in a vector
To: "Katherine Gobin"
Cc: "r-help@r-project.org"
Date: Tuesday, 26 March, 2013, 8:23 AM

> try this:
>
>   df <- c("F", "C", "F", "B", "D", "A", "D", "D", "A", "F", "D", "F", "B", "C")
>   tab <- table(df)
>   tab
>   rep(names(tab), 100 * tab)
>
> I hope it helps.
>
> Best,
> Dimitris
Re: [R] Counting various elements in a vector
try this:

  df <- c("F", "C", "F", "B", "D", "A", "D", "D", "A", "F", "D", "F", "B", "C")
  tab <- table(df)
  tab
  rep(names(tab), 100 * tab)

I hope it helps.

Best,
Dimitris

On 3/26/2013 9:12 AM, Katherine Gobin wrote:
> Dear R forum
>
> I have a vector, say as given below:
>
>   df = c("F", "C", "F", "B", "D", "A", "D", "D", "A", "F", "D", "F", "B", "C")
>
> I need to find
>
> (1) How many times each element occurs, e.g. in the above vector F occurs 4
> times, C occurs 2 times, etc.
>
> (2) Depending on the number of occurrences, I need to repeat the element 100
> times the number of occurrences, e.g. I need to repeat F 4 * 100 = 400 times,
> C 2 * 100 = 200 times.
>
> I can manage the second part, i.e. the repeating, but I am not able to count
> the number of times each element appears in a given vector.
>
> Kindly guide
>
> Katherine

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/
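As a quick check of the table()/rep() approach, here is a small base R sketch. The names df and tab follow the post; out is a name introduced here for illustration, and as.vector() is an addition that simply passes rep() a plain integer vector instead of a table object.

```r
df <- c("F", "C", "F", "B", "D", "A", "D", "D", "A", "F", "D", "F", "B", "C")

# (1) Count how often each element occurs; table() sorts the names.
tab <- table(df)
print(tab)

# (2) Repeat each element 100 times per occurrence.
out <- rep(names(tab), times = 100 * as.vector(tab))

print(table(out))    # A, B and C: 200 each; D and F: 400 each
print(length(out))   # 1400, i.e. 100 * length(df)
```

Because every occurrence contributes 100 copies, the length of the result is always 100 times the length of the input vector, which makes a convenient sanity check.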
[R] Counting various elements in a vector
Dear R forum

I have a vector, say as given below:

  df = c("F", "C", "F", "B", "D", "A", "D", "D", "A", "F", "D", "F", "B", "C")

I need to find

(1) How many times each element occurs, e.g. in the above vector F occurs 4 times, C occurs 2 times, etc.

(2) Depending on the number of occurrences, I need to repeat the element 100 times the number of occurrences, e.g. I need to repeat F 4 * 100 = 400 times, C 2 * 100 = 200 times.

I can manage the second part, i.e. the repeating, but I am not able to count the number of times each element appears in a given vector.

Kindly guide

Katherine
Re: [R] Counting confidence intervals
You should look at findInterval. Used with as.numeric it could do what you request, although it has a much wider range of uses.

--
David

Sent from my iPhone

On Mar 20, 2013, at 5:15 PM, Greg Snow <538...@gmail.com> wrote:

> The TeachingDemos package has %<% and %<=% functions that can be chained
> simply, so you could do something like:
>
>   sum( 5:1 %<=% 1:5 %<=% 10:14 )
>
> and other similar approaches.
>
> The idea is that you can do comparisons as:
>
>   lower %<% x %<% upper
>
> instead of
>
>   lower < x & x < upper
>
> On Mon, Mar 18, 2013 at 10:16 AM, S Ellison wrote:
>
>>> I want to count how many times a number say 12 lies in the interval.
>>> Can anyone assist?
>>
>> Has anyone else ever wished there was a moderately general 'inside' or
>> 'within' function in R for this problem?
>>
>> For example, something that behaves more or less like
>>
>> within <- function(x, interval=NULL, closed=c(TRUE, TRUE),
>>                    lower=min(interval), upper=max(interval)) {
>>    #interval must be a length 2 vector
>>    #closed is taken in the order (lower, upper)
>>    #lower and upper may be vectors and will be recycled (by "<" etc)
>>    #if not of length length(x)
>>
>>    low.comp <- if(closed[1]) "<=" else "<"
>>    high.comp <- if(closed[2]) ">=" else ">"
>>
>>    do.call(low.comp, list(lower, x)) & do.call(high.comp, list(upper, x))
>> }
>>
>> #Examples
>> within(1:5, c(2,4))
>>
>> within(1:5, c(2,4), closed=c(FALSE, TRUE))
>>
>> within(1:5, lower=5:1, upper=10:14)
>>
>> S Ellison
>> LGC
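To make the findInterval suggestion concrete, here is a hedged sketch of one way it could be used; it is not necessarily what David had in mind. The interval bounds are made up for illustration, and the names lower, upper, value and contains are assumptions, not taken from the thread.

```r
# Hypothetical confidence-interval bounds, made up for this illustration.
lower <- c(10, 11.5, 12.5, 9)
upper <- c(13, 14,   15,   11)
value <- 12

# findInterval(value, c(lo, hi)) returns 0 below lo, 1 inside and 2 above hi.
# rightmost.closed = TRUE treats the interval as closed at hi as well.
contains <- mapply(
  function(lo, hi) findInterval(value, c(lo, hi), rightmost.closed = TRUE) == 1,
  lower, upper
)
print(sum(contains))  # 2: only [10, 13] and [11.5, 14] contain 12

# The same call also handles many values against a single interval:
single <- findInterval(1:5, c(2, 4), rightmost.closed = TRUE) == 1
print(single)         # FALSE TRUE TRUE TRUE FALSE
```

findInterval expects its second argument to be sorted in non-decreasing order, so each (lo, hi) pair must satisfy lo <= hi.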
Re: [R] Counting confidence intervals
The TeachingDemos package has %<% and %<=% functions that can be chained simply, so you could do something like:

  sum( 5:1 %<=% 1:5 %<=% 10:14 )

and other similar approaches.

The idea is that you can do comparisons as:

  lower %<% x %<% upper

instead of

  lower < x & x < upper

On Mon, Mar 18, 2013 at 10:16 AM, S Ellison wrote:

>> I want to count how many times a number say 12 lies in the interval.
>> Can anyone assist?
>
> Has anyone else ever wished there was a moderately general 'inside' or
> 'within' function in R for this problem?
>
> For example, something that behaves more or less like
>
> within <- function(x, interval=NULL, closed=c(TRUE, TRUE),
>                    lower=min(interval), upper=max(interval)) {
>    #interval must be a length 2 vector
>    #closed is taken in the order (lower, upper)
>    #lower and upper may be vectors and will be recycled (by "<" etc)
>    #if not of length length(x)
>
>    low.comp <- if(closed[1]) "<=" else "<"
>    high.comp <- if(closed[2]) ">=" else ">"
>
>    do.call(low.comp, list(lower, x)) & do.call(high.comp, list(upper, x))
> }
>
> #Examples
> within(1:5, c(2,4))
>
> within(1:5, c(2,4), closed=c(FALSE, TRUE))
>
> within(1:5, lower=5:1, upper=10:14)
>
> S Ellison
> LGC
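For readers without TeachingDemos installed, the chained comparison can be spelled out in base R with an explicit `&`. This is a minimal sketch using the same numbers as the example above; the variable name inside is introduced here for illustration, and both bounds are treated as closed.

```r
lower <- 5:1
x     <- 1:5
upper <- 10:14

# Base R spelling of the closed-interval test lower <= x <= upper,
# evaluated element by element.
inside <- lower <= x & x <= upper
print(inside)       # FALSE FALSE TRUE TRUE TRUE

# Logical values sum as 0/1, so this counts the positions inside the bounds.
print(sum(inside))  # 3
```

The first two positions fail because the lower bounds 5 and 4 exceed the values 1 and 2; every value is below its upper bound.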
Re: [R] Counting confidence intervals
> There _is_ a function ?within.

Drat! Of course there is. I even use it, though not often.

> Maybe your function can be named 'between'

Good thought - thanks

Steve E
Re: [R] Counting confidence intervals
Hello,

There _is_ a function ?within. Maybe your function can be named 'between'.

Rui Barradas

On 18-03-2013 16:16, S Ellison wrote:

>> I want to count how many times a number say 12 lies in the interval.
>> Can anyone assist?
>
> Has anyone else ever wished there was a moderately general 'inside' or
> 'within' function in R for this problem?
>
> For example, something that behaves more or less like
>
> within <- function(x, interval=NULL, closed=c(TRUE, TRUE),
>                    lower=min(interval), upper=max(interval)) {
>    #interval must be a length 2 vector
>    #closed is taken in the order (lower, upper)
>    #lower and upper may be vectors and will be recycled (by "<" etc)
>    #if not of length length(x)
>
>    low.comp <- if(closed[1]) "<=" else "<"
>    high.comp <- if(closed[2]) ">=" else ">"
>
>    do.call(low.comp, list(lower, x)) & do.call(high.comp, list(upper, x))
> }
>
> #Examples
> within(1:5, c(2,4))
>
> within(1:5, c(2,4), closed=c(FALSE, TRUE))
>
> within(1:5, lower=5:1, upper=10:14)
>
> S Ellison
> LGC
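Following the renaming suggestion, the posted function can be defined as between so that base::within() is not masked. The sketch below keeps the body from the post unchanged; the last call uses made-up interval bounds (an assumption for illustration) to count how many intervals contain 12, as in the original question.

```r
# The posted function under the suggested name 'between'.
between <- function(x, interval = NULL, closed = c(TRUE, TRUE),
                    lower = min(interval), upper = max(interval)) {
  # interval must be a length 2 vector
  # closed is taken in the order (lower, upper)
  # lower and upper may be vectors and are recycled by the comparison operators
  low.comp  <- if (closed[1]) "<=" else "<"
  high.comp <- if (closed[2]) ">=" else ">"
  do.call(low.comp, list(lower, x)) & do.call(high.comp, list(upper, x))
}

r1 <- between(1:5, c(2, 4))                          # FALSE TRUE TRUE TRUE FALSE
r2 <- between(1:5, c(2, 4), closed = c(FALSE, TRUE)) # FALSE FALSE TRUE TRUE FALSE
r3 <- between(1:5, lower = 5:1, upper = 10:14)       # FALSE FALSE TRUE TRUE TRUE

# Hypothetical bounds: how many of these intervals contain 12?
n12 <- sum(between(12, lower = c(10, 11.5, 12.5, 9), upper = c(13, 14, 15, 11)))
print(n12)  # 2
```

Because the default values of lower and upper are only evaluated when needed, interval can stay NULL whenever both bounds are passed explicitly.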