
Re: [R] counting duplicate items that occur in multiple groups

2020-11-18 Thread Tom Woolman


Thanks, everyone!




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] counting duplicate items that occur in multiple groups

2020-11-18 Thread Jim Lemon

Oops, I sent this to Tom earlier today and forgot to copy to the list:

VendorID <- rep(paste0("V", 1:10), each = 5)
AcctID <- paste0("A", sample(1:5, 50, TRUE))
Data <- data.frame(VendorID, AcctID)
table(Data)
# count the distinct vendors that use each account
dupAcctID <- colSums(table(Data) > 0)
Data$dupAcct <- NA
# fill in the new column, one account at a time
for (i in seq_along(dupAcctID))
  Data$dupAcct[Data$AcctID == names(dupAcctID)[i]] <- dupAcctID[i]

Jim
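
The per-account loop can also be written as a single lookup in a named vector. This is a minimal sketch under the same VendorID/AcctID layout; the seed and the names vendorsPerAcct and nVendors are illustrative assumptions, not part of the original post.

```r
set.seed(1)  # arbitrary seed, only so the random example is reproducible
Data <- data.frame(VendorID = rep(paste0("V", 1:10), each = 5),
                   AcctID   = paste0("A", sample(1:5, 50, TRUE)))

# number of distinct vendors that use each account (named by account ID)
vendorsPerAcct <- colSums(table(Data$VendorID, Data$AcctID) > 0)

# index the named vector by account ID to get one value per row
Data$nVendors <- unname(vendorsPerAcct[as.character(Data$AcctID)])
head(Data)
```

The as.character() call guards against AcctID being stored as a factor, which is the default in R versions before 4.0.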


Re: [R] counting duplicate items that occur in multiple groups

2020-11-18 Thread Deepayan Sarkar

On Wed, Nov 18, 2020 at 5:40 AM Bert Gunter  wrote:
>
> z <- with(Data2, tapply(Vendor,Account, I))
> n <- vapply(z,length,1)
> data.frame (Vendor = unlist(z),
>             Account = rep(names(z),n),
>             NumVen = rep(n,n)
> )
>
> ## which gives:
>
>     Vendor Account NumVen
> A1      V1      A1      1
> A21     V2      A2      3
> A22     V3      A2      3
> A23     V1      A2      3
> A3      V4      A3      1
> A4      V2      A4      1
>
> Of course this also works for Data1
>
> Bill may be able to come up with a slicker version, however.

Perhaps

transform(Data2, nshare = as.vector(table(Account)[Account]))

(or dplyr::mutate() instead of transform(), if you prefer.)

-Deepayan
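
For reference, a small sketch of that one-liner applied to Bill Dunlap's Data2 from earlier in the thread. The object name res is an assumption made here for illustration.

```r
Data2 <- data.frame(Vendor  = c("V1", "V2", "V3", "V1", "V4", "V2"),
                    Account = c("A1", "A2", "A2", "A2", "A3", "A4"))

# table(Account) counts the rows per account; indexing the table by
# Account repeats each count once for every row of that account
res <- transform(Data2, nshare = as.vector(table(Account)[Account]))
res$nshare
# 1 3 3 3 1 1
```

Note that this counts rows, so a vendor/account pair that appears twice is counted twice; applying it to unique(Data2) first is one way to count distinct vendors instead.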

Re: [R] counting duplicate items that occur in multiple groups

2020-11-17 Thread Bert Gunter

z <- with(Data2, tapply(Vendor,Account, I))
n <- vapply(z,length,1)
data.frame (Vendor = unlist(z),
            Account = rep(names(z),n),
            NumVen = rep(n,n)
)

## which gives:

    Vendor Account NumVen
A1      V1      A1      1
A21     V2      A2      3
A22     V3      A2      3
A23     V1      A2      3
A3      V4      A3      1
A4      V2      A4      1

Of course this also works for Data1

Bill may be able to come up with a slicker version, however.
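
If the same vendor/account pair can appear on more than one row, a count of rows per account overstates the number of vendors. A hedged sketch of one base R alternative that keeps the original row order and counts distinct vendors; Data3 is a made-up example with a repeated pair, not data from the thread.

```r
Data3 <- data.frame(Vendor  = c("V1", "V2", "V2", "V3"),
                    Account = c("A1", "A2", "A2", "A2"))

# ave() applies FUN to the row indices within each Account group and
# returns a vector aligned with the original rows
Data3$NumVen <- ave(seq_len(nrow(Data3)), Data3$Account,
                    FUN = function(i) length(unique(Data3$Vendor[i])))
Data3$NumVen
# 1 2 2 2
```

Row indices are passed to ave() rather than the Vendor column itself because ave() returns a vector of the same type as its first argument, and a character input would turn the counts into strings.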

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



Re: [R] counting duplicate items that occur in multiple groups

2020-11-17 Thread Avi Gross via R-help

Many problems can often be solved with some thought by using the right tools, 
such as the ones from the tidyverse.

Without giving a specific answer, you might want to think about using the 
group_by() functionality in a pipeline to lump together all rows that share 
the same value in one or more columns. Then, in something like mutate() or 
summarize(), you can use special functions like n() that return how many rows 
exist within each group. There are many more such verbs and features that let 
you build up a result, often by removing the grouping along the way and 
perhaps adding some other form of grouping, including the newer rowwise(), 
which lets you work across columns one row at a time.

I think the point is to think of steps that lead to a result that can be used 
in the next step and so on. 
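
As one concrete instance of that group-then-summarise pattern, here is a minimal dplyr sketch on Bill Dunlap's Data2. It assumes the dplyr package is installed; the column name Num_Vendors and the use of n_distinct() rather than n() are choices made for this illustration.

```r
library(dplyr)

Data2 <- data.frame(Vendor  = c("V1", "V2", "V3", "V1", "V4", "V2"),
                    Account = c("A1", "A2", "A2", "A2", "A3", "A4"))

res <- Data2 %>%
  group_by(Account) %>%                       # one group per bank account
  mutate(Num_Vendors = n_distinct(Vendor)) %>% # distinct vendors in the group
  ungroup()                                   # drop the grouping again
res$Num_Vendors
# 1 3 3 3 1 1
```

Using n() in place of n_distinct(Vendor) would count rows per account, which gives the same answer only when no vendor/account pair is repeated.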

And, for some problems, you can  think outside the pipelines and create 
multiple intermediate data.frames with parts of what you will need and then 
combine them with joins or whatever it takes to efficiently get a result, or by 
brute force. Sometimes (as when making graphs) you might want to convert data 
between forms that are often called long versus wide. 

Yes, plenty can be done in base R or using other packages. But a good set of 
tools might be part of what you need to investigate.

Of course, others can chime in suggesting that there are negatives to dplyr and 
other aspects of the tidyverse and they would be right too. 


Re: [R] counting duplicate items that occur in multiple groups

2020-11-17 Thread Bert Gunter

Why 0's in the data frame? Shouldn't that be 1 (vendor with that account)?

Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



Re: [R] counting duplicate items that occur in multiple groups

2020-11-17 Thread Tom Woolman

Yes, good catch. Thanks


Re: [R] counting duplicate items that occur in multiple groups

2020-11-17 Thread Tom Woolman

Hi Bill. Sorry to be so obtuse with the example data, I was trying  
(too hard) not to share any actual values so I just created randomized  
values for my example; of course I should have specified that the  
random values would not provide the expected problem pattern. I should  
have just used simple dummy codes as Bill Dunlap did.


So per Bill's example data for Data1, the expected (hoped for) output  
should be:


  Vendor Account Num_Vendors_Sharing_Bank_Acct
1     V1      A1                             0
2     V2      A2                             3
3     V3      A2                             3
4     V4      A2                             3


Where the new calculated variable is Num_Vendors_Sharing_Bank_Acct.  
The value is 3 for V2, V3 and V4 because they each share bank account  
A2.



Likewise, in the Data2 frame, the same logic applies:

  Vendor Account Num_Vendors_Sharing_Bank_Acct
1     V1      A1                             0
2     V2      A2                             3
3     V3      A2                             3
4     V1      A2                             3
5     V4      A3                             0
6     V2      A4                             0






Thanks!



Re: [R] counting duplicate items that occur in multiple groups

2020-11-17 Thread Bill Dunlap

What should the result be for
  Data1 <- data.frame(Vendor=c("V1","V2","V3","V4"),
                      Account=c("A1","A2","A2","A2"))
?

Must each vendor have only one account?  If not, what should the result be
for
  Data2 <- data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"),
                      Account=c("A1","A2","A2","A2","A3","A4"))
?

-Bill


Re: [R] counting duplicate items that occur in multiple groups

2020-11-17 Thread Bert Gunter

Inline.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman 
wrote:

> Hi everyone.  I have a dataframe that is a collection of Vendor IDs
> plus a bank account number for each vendor.

I interpret this as: "all vendors are unique and each vendor has a single
bank account." Is that correct?

> I'm trying to find a way
> to count the number of duplicate bank accounts that occur in more than
> one unique Vendor_ID,

The following makes no sense to me, as each row is a unique vendor and has
only one bank account.

> and then assign the count value for each row in
> the dataframe in a new variable.
>
> I can do a count of bank accounts that occur within the same vendor
> using dplyr and group_by and count, but I can't figure out a way to
> count duplicates among multiple Vendor_IDs.

I interpret this to mean that you want to count vendor IDs by account.
With only one account per vendor this is trivial; e.g.

set.seed(22)
d1 <- data.frame(id = sample(1:30),
  account = sample(1:20,30, replace = TRUE))

table(d1$account)

## gives
 1  2  3  6  7  8  9 10 11 13 15 16 17 18 19 20
 3  1  2  1  1  1  1  1  4  3  1  2  1  3  2  3

Note that AFAICS your example is useless, as it gives the same number of
different account numbers as ID's, so no duplication can occur.
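
The point about duplication can be checked directly: sample() on a range without replace = TRUE returns a permutation, so every value occurs exactly once. A small sketch, with sizes chosen only for illustration:

```r
set.seed(22)
ids_perm <- sample(1:10)                      # a permutation: each value once
ids_rep  <- sample(1:5, 10, replace = TRUE)   # 10 draws from only 5 values

anyDuplicated(ids_perm)      # 0, i.e. no duplicates are possible
anyDuplicated(ids_rep) > 0   # TRUE, 10 draws from 5 values must repeat
```

anyDuplicated() returns the index of the first repeated element, or 0 when there is none.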

As my interpretations are likely incorrect and this is not what you mean
nor want, either clarify your meaning and provide a useful **minimal**
example; or wait for a reply from someone with a better understanding than
I.

Cheers,
Bert

>
> Dataframe example code:
>
>
> #Create a sample data frame:
>
> set.seed(1)
>
> Data <- data.frame(Vendor_ID = sample(1:1), Bank_Account_ID =
> sample(1:1))
>
>
>
>
> Thanks in advance for any help.

[R] counting duplicate items that occur in multiple groups

2020-11-17 Thread Tom Woolman

Hi everyone.  I have a dataframe that is a collection of Vendor IDs  
plus a bank account number for each vendor. I'm trying to find a way  
to count the number of duplicate bank accounts that occur in more than  
one unique Vendor_ID, and then assign the count value for each row in  
the dataframe in a new variable.


I can do a count of bank accounts that occur within the same vendor  
using dplyr and group_by and count, but I can't figure out a way to  
count duplicates among multiple Vendor_IDs.



Dataframe example code:


#Create a sample data frame:

set.seed(1)

Data <- data.frame(Vendor_ID = sample(1:1), Bank_Account_ID =  
sample(1:1))





Thanks in advance for any help.
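
A minimal sketch of one way to get such a per-row count in base R, using made-up data (the vendor and account values below are hypothetical, and so is the column name n_vendors). For each bank account it counts the distinct vendors that use it, then copies that count onto every row:

```r
# Hypothetical toy data: account A1 is used by vendors V1 and V2,
# A2 by V2 and V3 (V2 lists it twice), A3 by V3 only.
Data <- data.frame(
  Vendor_ID       = c("V1", "V1", "V2", "V2", "V2", "V3", "V3"),
  Bank_Account_ID = c("A1", "A1", "A1", "A2", "A2", "A2", "A3"),
  stringsAsFactors = FALSE
)

# Number of distinct vendors per account (a named 1-d array).
vendors_per_account <- tapply(
  Data$Vendor_ID, Data$Bank_Account_ID,
  function(v) length(unique(v))
)

# Look the count up by account name so that every row carries it.
Data$n_vendors <- as.integer(
  vendors_per_account[as.character(Data$Bank_Account_ID)]
)

Data
```

Rows with n_vendors > 1 belong to accounts shared across vendors. Repeats within a single vendor do not inflate the count, because unique() is applied before length().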

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] counting unique values (summary stats)

2019-03-22 Thread David L Carlson

You have several problems. As David W pointed out, there is no replace= 
argument in the length() function (nor in unique()). The first step in 
debugging your code should be to read the manual page for any function 
returning an error. Also, you did not include a comma at the end of the line 
containing replace=TRUE. Finally, the code for counting the missing values is 
more complicated than it needs to be. 

This code will only work if myData is a data frame that contains only columns 
with numeric data.

options(digits=4)
myData <- USArrests
summary.stats <- data.frame(mean=sapply(myData, mean, na.rm=TRUE), 
 sd=sapply(myData, sd, na.rm=TRUE), 
 min=sapply(myData, min, na.rm=TRUE), 
 max=sapply(myData, max, na.rm=TRUE), 
 median=sapply(myData, median, na.rm=TRUE), 
 length=sapply(myData, length),
 unique=sapply(myData, function (x) length(unique(x))),
 miss.val=sapply(myData, function(y) sum(is.na(y))))
summary.stats

#             mean     sd  min   max median length unique miss.val
# Murder     7.788  4.356  0.8  17.4   7.25     50     43        0
# Assault  170.760 83.338 45.0 337.0 159.00     50     45        0
# UrbanPop  65.540 14.475 32.0  91.0  66.00     50     36        0
# Rape      21.232  9.366  7.3  46.0  20.10     50     48        0
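
Since the code above assumes numeric columns only, one option for a mixed data frame is to keep just the numeric columns first. A minimal sketch, using the built-in iris data (which has one factor column) as a stand-in for myData; the names is_num, numData, n_unique and n_missing are made up for illustration:

```r
myData <- iris   # stand-in data: four numeric columns plus the factor Species

# TRUE for each numeric column, FALSE otherwise.
is_num  <- vapply(myData, is.numeric, logical(1))
numData <- myData[is_num]   # drops the non-numeric Species column

# Per-column number of distinct values and number of missing values.
n_unique  <- vapply(numData, function(x) length(unique(x)), integer(1))
n_missing <- vapply(numData, function(x) sum(is.na(x)), integer(1))

data.frame(unique = n_unique, miss.val = n_missing)
```

vapply() is used instead of sapply() here only so that a column returning something unexpected fails loudly rather than silently changing the shape of the result.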

David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77843-4352

-Original Message-
From: R-help  On Behalf Of David Winsemius
Sent: Thursday, March 21, 2019 5:55 PM
To: reichm...@sbcglobal.net; 'r-help mailing list' 
Subject: Re: [R] counting unique values (summary stats)

On 3/21/19 3:31 PM, reichm...@sbcglobal.net wrote:
> r-help
>
> I have the following little script to create a df of summary stats.  I'm
> having problems obtaining the # of unique values
>
> unique=sapply(myData, function (x)
>   length(unique(x), replace = TRUE))

I just looked up the usage on `length` and do not see any possibility of 
using a "replace" parameter. It's also unclear what sort of data object 
`myData` might be. (And you might consider using column names other than 
the names of R functions.)

-- 

David.

>
> Can I do that, or am I using the wrong R function?
>
> summary.stats <- data.frame(mean=sapply(myData, mean, na.rm=TRUE),
> sd=sapply(myData, sd, na.rm=TRUE),
> min=sapply(myData, min, na.rm=TRUE),
> max=sapply(myData, max, na.rm=TRUE),
> median=sapply(myData, median, na.rm=TRUE),
> length=sapply(myData, length),
> unique=sapply(myData, function (x)
>   length(unique(x), replace = TRUE))
> miss.val=sapply(myData, function(y)
>   sum(length(which(is.na(y))
>
> Jeff Reichman
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] counting unique values (summary stats)

2019-03-21 Thread David Winsemius




On 3/21/19 3:31 PM, reichm...@sbcglobal.net wrote:
> r-help
>
> I have the following little script to create a df of summary stats.  I'm
> having problems obtaining the # of unique values
>
> unique=sapply(myData, function (x)
>   length(unique(x), replace = TRUE))

I just looked up the usage on `length` and do not see any possibility of 
using a "replace" parameter. It's also unclear what sort of data object 
`myData` might be. (And you might consider using column names other than 
the names of R functions.)

--

David.

>
> Can I do that, or am I using the wrong R function?
>
> summary.stats <- data.frame(mean=sapply(myData, mean, na.rm=TRUE),
> sd=sapply(myData, sd, na.rm=TRUE),
> min=sapply(myData, min, na.rm=TRUE),
> max=sapply(myData, max, na.rm=TRUE),
> median=sapply(myData, median, na.rm=TRUE),
> length=sapply(myData, length),
> unique=sapply(myData, function (x)
>   length(unique(x), replace = TRUE))
> miss.val=sapply(myData, function(y)
>   sum(length(which(is.na(y))
>
> Jeff Reichman
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] counting unique values (summary stats)

2019-03-21 Thread reichmanj

r-help

I have the following little script to create a df of summary stats.  I'm
having problems obtaining the # of unique values 

   unique=sapply(myData, function (x)
 length(unique(x), replace = TRUE))

Can I do that, or am I using the wrong R function?

summary.stats <- data.frame(mean=sapply(myData, mean, na.rm=TRUE), 
   sd=sapply(myData, sd, na.rm=TRUE), 
   min=sapply(myData, min, na.rm=TRUE), 
   max=sapply(myData, max, na.rm=TRUE), 
   median=sapply(myData, median, na.rm=TRUE), 
   length=sapply(myData, length),
   unique=sapply(myData, function (x)
 length(unique(x), replace = TRUE))
   miss.val=sapply(myData, function(y) 
 sum(length(which(is.na(y))

Jeff Reichman

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Counting number of sentences by qdap package

2017-10-29 Thread Elahe chalabi via R-help

Hi all,

I have a data frame with a variable Description containing the text of 
speeches, and I would like to count the number of sentences in each speech:


> str(data)
'data.frame':   255 obs. of  3 variables:
$ Group  : Factor w/ 255 levels "AlzheimerGroup1","AlzheimerGroup10",..: 1 
112 179 190 201 212 223 234 245 2 ...
$ Gender : int  1 1 0 0 0 0 0 1 0 0 ...
$ Description: Factor w/ 255 levels "A boy's on the uh falling off the stool 
picking up cookies . The girl's reaching up for it . The girl the lady "| 
__truncated__,..: 63 69 38 134 111 242 196 85 84 233 ...

I want to use the qdap package.
Does anyone know how I should do this?
Thanks for any help!
Elahe
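
As a rough cross-check that does not depend on any package, sentence terminators can be counted in base R. This is only a sketch under a simplifying assumption (every run of ".", "?" or "!" ends exactly one sentence, which miscounts abbreviations and decimals); it is not the qdap approach, and the function name count_sentences and the sample strings are made up for illustration:

```r
# Count runs of sentence-ending punctuation in each element of x.
count_sentences <- function(x) {
  x <- as.character(x)   # Description is a factor in the posted str() output
  lengths(regmatches(x, gregexpr("[.?!]+", x)))
}

# Hypothetical speeches in the same style as the posted data.
speeches <- c(
  "A boy's on the stool . The girl's reaching up for it . The lady is washing dishes .",
  "Is that all ? Yes ."
)

count_sentences(speeches)
```

Applied to the posted data frame, this would be used as count_sentences(data$Description), giving one count per row.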

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Counting with multiple criteria using data table

2017-06-21 Thread Jeff Newmiller

To be fair, the OP did provide brief snippets of data.table usage below the 
data dump indicating some level of effort, but posted it all in HTML (what you 
see we do not see), did not make the example reproducible (dput is great, and 
library calls really clear things up [1][2][3]), and this looks suspiciously 
like homework  (not on topic here, see the Posting Guide).

[1] 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

[2] http://adv-r.had.co.nz/Reproducibility.html

[3] https://cran.r-project.org/web/packages/reprex/index.html
-- 
Sent from my phone. Please excuse my brevity.

On June 21, 2017 4:16:36 PM PDT, Bert Gunter  wrote:
>Have you gone through any R tutorials? If not, why not? If so, maybe
>you need to spend some more time with them.
>
>It looks like you want us to do your work for you. We don't do this.
>See (and follow) the posting guide below for what we might do (we're
>volunteers, so no guarantees).
>
>Cheers,
>Bert
>
>
>Bert Gunter
>
>"The trouble with having an open mind is that people keep coming along
>and sticking things into it."
>-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Counting with multiple criteria using data table

2017-06-21 Thread David Winsemius


> On Jun 21, 2017, at 2:50 PM, Ek Esawi  wrote:
> 
> I have a data.table which is shown below. I want to count combinations of
> columns on i and count on j with by. A few examples are given below the
> table.
> 
> 
> 
> I want to:
> 
> all months to show on the output including those that they have zero value
> 
> I want the three statements combined in on if possible so the output will
> be one data table; that is the outputs are next to each other as manually
> illustrated on the last part (desired output).
> 
> 
> 
> 
> 
> Thanks--EK
> 
> 
> 
> 
> 
>> Test
>
>      Color Grade Value  Month Day
>  1: yellow     A    20    May   1
>  2:  green     B    25   June   2
>  3:  green     A    10  April   3
>  4:  black     A    17 August   3
>  5:    red     C     5    May   5
>  6: orange     D     0   June  13
>  7: orange     E    12  April   5
>  8: orange     F    11 August   8
>  9: orange     F    99  April  23
> 10: orange     F    70    May   7
> 11:  black     A    77   June  11
> 12:  green     B    87 August  33
> 13:  black     A    79  April   9
> 14:  green     A    68    May  14
> 15:  black     C    90   June  31
> 16:  green     D    79 August  11
> 17:  black     E   101  April  17
> 18:    red     F    90   June  21
> 19:    red     F   112 August  13
> 20:    red     F   101  April  20

You should have offered the output of:

 dput(Test)

> 
>> Test[Color=="green"&Grade=="A", .N, by=Month]
>    Month N
> 1: April 1
> 2:   May 1
>
>> Test[Color=="orange"&Grade=="F", .N, by=Month]
>     Month N
> 1: August 1
> 2:  April 1
> 3:    May 1
>
>> Test[Color=="orange"&Grade=="F", .N, by=Month]
>     Month N
> 1: August 1
> 2:  April 1
> 3:    May 1
>
>> Test[Color=="red"&Grade=="F", .N, by=Month]
>     Month N
> 1:   June 1
> 2: August 1
> 3:  April 1
>
> Desired output
>
>         N1 N2 N3
> April    1  1  1
> May      1  1  1
> June     0  0  0
> August   0  1  1

I count 4 data.tables and a total of 11 items so why only 3 columns and 9 items?

Were you tabulating colors by month?

> Test[ (Color=="green"&Grade=="A") |
        (Color=="red"&Grade=="F") |
        (Color=="orange"&Grade=="F")|
        (Color=="orange"&Grade=="F")|
        (Color=="red"&Grade=="F") , table(Month, Color)]
        Color
Month    green orange red
  April      1      1   1
  August     0      1   1
  June       0      0   1
  May        1      1   0
> 


> 
>   [[alternative HTML version deleted]]

Rhelp is plain-text. Do read the Posting Guide:

Another possibility:

 # The conditions, separated by logical OR's, form the first argument
 # to `[.data.table`
 Test[(Color=="green"&Grade=="A") |
      (Color=="red"&Grade=="F") |
      (Color=="orange"&Grade=="F")|
      (Color=="orange"&Grade=="F")|
      (Color=="red"&Grade=="F") ,  table(Month, Grade)]


        Grade
Month    A F
  April  1 2
  August 0 2
  June   0 1
  May    1 1
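
If the goal is a single table with every month present, zero counts included, one option is to tabulate each condition against a fixed set of factor levels. A minimal base R sketch, assuming the data as posted (the Value and Day columns are left out; the helper count_by_month and the result column names are made up for illustration, and a plain data.frame is used instead of a data.table):

```r
Test <- data.frame(
  Color = c("yellow", "green", "green", "black", "red",
            "orange", "orange", "orange", "orange", "orange",
            "black", "green", "black", "green", "black",
            "green", "black", "red", "red", "red"),
  Grade = c("A", "B", "A", "A", "C", "D", "E", "F", "F", "F",
            "A", "B", "A", "A", "C", "D", "E", "F", "F", "F"),
  Month = c("May", "June", "April", "August", "May",
            "June", "April", "August", "April", "May",
            "June", "August", "April", "May", "June",
            "August", "April", "June", "August", "April"),
  stringsAsFactors = FALSE
)

# Fixing the levels makes table() report months with no rows as 0.
months <- c("April", "May", "June", "August")

count_by_month <- function(color, grade) {
  keep <- Test$Color == color & Test$Grade == grade
  as.vector(table(factor(Test$Month[keep], levels = months)))
}

result <- data.frame(
  Month    = months,
  green_A  = count_by_month("green", "A"),
  orange_F = count_by_month("orange", "F"),
  red_F    = count_by_month("red", "F")
)
result
```

The counts follow the posted rows, so they agree with the two cross-tabulations above rather than with the hand-typed desired output (for example, red/F has a June row but no May row).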


> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Counting with multiple criteria using data table

2017-06-21 Thread Bert Gunter

Have you gone through any R tutorials? If not, why not? If so, maybe
you need to spend some more time with them.

It looks like you want us to do your work for you. We don't do this.
See (and follow) the posting guide below for what we might do (we're
volunteers, so no guarantees).

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Counting with multiple criteria using data table

2017-06-21 Thread Ek Esawi

I have a data.table which is shown below. I want to count combinations of
columns on i and count on j with by. A few examples are given below the
table.



I want:

all months to show in the output, including those with a count of zero;

the three statements combined into one if possible, so that the output is
one data table; that is, the outputs sit next to each other, as manually
illustrated in the last part (desired output).





Thanks--EK





> Test

     Color Grade Value  Month Day
 1: yellow     A    20    May   1
 2:  green     B    25   June   2
 3:  green     A    10  April   3
 4:  black     A    17 August   3
 5:    red     C     5    May   5
 6: orange     D     0   June  13
 7: orange     E    12  April   5
 8: orange     F    11 August   8
 9: orange     F    99  April  23
10: orange     F    70    May   7
11:  black     A    77   June  11
12:  green     B    87 August  33
13:  black     A    79  April   9
14:  green     A    68    May  14
15:  black     C    90   June  31
16:  green     D    79 August  11
17:  black     E   101  April  17
18:    red     F    90   June  21
19:    red     F   112 August  13
20:    red     F   101  April  20

> Test[Color=="green"&Grade=="A", .N, by=Month]
   Month N
1: April 1
2:   May 1

> Test[Color=="orange"&Grade=="F", .N, by=Month]
    Month N
1: August 1
2:  April 1
3:    May 1

> Test[Color=="orange"&Grade=="F", .N, by=Month]
    Month N
1: August 1
2:  April 1
3:    May 1

> Test[Color=="red"&Grade=="F", .N, by=Month]
    Month N
1:   June 1
2: August 1
3:  April 1

Desired output

        N1 N2 N3
April    1  1  1
May      1  1  1
June     0  0  0
August   0  1  1

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Counting enumerated items in each element of a character vector

2017-04-26 Thread Boris Steipe

Let's be a bit careful.

You'll probably need a regular expression. But maybe a regex can't work in 
principle, so one can't just gloss over the details.

You said: "blah blah blah" can contain ANY text. If this is true, "blah blah 
blah" could contain the delimiters. If that is the case, a regex is not 
powerful enough in principle and you need a context-sensitive parser.

So let's have a list of valid demarcations. From what you write I can guess 
that ...

text2 <- c(
"blah   1) blah blah blah 1",
"blah   10. blah blah blah 1",
"blah 1)  1) blah blah blah 1",
"blah 1.  10) blah blah blah 1",
"blah 1)  1. blah blah blah 1",
"blah 10.  10. blah blah blah 1"
)

... captures the variation. But that's just my guess from staring at your 
examples. I can't be sure - that's your task to contribute.

On text2, the regular expression ...

"(\d+(\)|\.)\s*){1,2}"

... gives the expected result of
# [1] 1 1 1 1 1 1
... and ...
# [1] 5 5 5 5
... on your text1.

In code:

library(stringr)
str_count(text1, "(\\d+(\\)|\\.)\\s*){1,2}")
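
For anyone without stringr installed, the same count can be sketched in base R with gregexpr() and regmatches(). This assumes the text2 vector defined above (repeated here so the block runs on its own); perl = TRUE is chosen only to make the handling of \d and \s unambiguous, and the names pattern and counts are made up for illustration:

```r
text2 <- c(
  "blah   1) blah blah blah 1",
  "blah   10. blah blah blah 1",
  "blah 1)  1) blah blah blah 1",
  "blah 1.  10) blah blah blah 1",
  "blah 1)  1. blah blah blah 1",
  "blah 10.  10. blah blah blah 1"
)

# One or two consecutive "digits + ) or ." markers count as a single item.
pattern <- "(\\d+(\\)|\\.)\\s*){1,2}"

# regmatches() returns one character vector of matches per element;
# lengths() turns that into the number of matches per element.
counts <- lengths(regmatches(text2, gregexpr(pattern, text2, perl = TRUE)))
counts
```

An element with no match contributes an empty vector, so it is counted as 0 rather than 1.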






> On Apr 26, 2017, at 10:13 AM, Dan Abner  wrote:
> 
> Hi all,
> 
> I am looking for a streamlined way of counting the number of enumerated items 
> in each element of a character vector. For example:
> 
> 
> text1<-c("blah blah blah.
> blah blah blah
> 1) blah blah blah 1
> 2) blah blah blah
> 10) blah 10 blah blah
> blah blah blah
> 1) blah blah blah
> 2) blah blah blah 2
> blah blah blah.","blah blah blah.
> blah blah blah
> 1. blah blah blah 1
> 2. blah blah blah
> 10.blah 10 blah blah
> blah blah blah
> 1. blah blah blah 1
> 2. blah blah blah
> blah blah blah.","blah blah blah. blah blah blah 1 1)blah blah blah 1. 2) 
> blah blah blah 10) blah 10 blah blah blah blah blah 1) blah blah blah 1. 2) 
> blah blah blah. blah blah blah."
> ,"blah blah blah. blah blah blah 1 1.blah blah blah 1. 2. blah blah blah. 10. 
> blah 10 blah blah. blah blah blah 1. blah blah blah 1. 2. blah blah blah. 
> blah blah blah.")
> 
> text1
> 
> ===
> 
> I would like the result to be c(5,5,5,5). Notice that sometimes there are 
> leading hard returns, other times not. Sometimes there are separate lists and 
> the same numbers are used in the enumerated items multiple times within each 
> character string. Sometimes the leading numbers for the enumerated items 
> exceed single digits. Notice that the delimiter may be ) or a period (.). If 
> the delimiter is a period and there are hard returns (example 2), then I 
> expect that will be easy enough to differentiate sentences ending with a 
> number from enumerated items. However, I imagine it would be much more 
> difficult to differentiate the two for example 4.
> 
> Any suggestions are appreciated.
> 
> Best,
> 
> Dan
> 

Re: [R] Counting enumerated items in each element of a character vector

2017-04-26 Thread Boris Steipe

What's the expected output for this sample?

How do _you_ define what should be counted?





> On Apr 26, 2017, at 8:33 AM, Dan Abner  wrote:
> 
> Hi all,
> 
> I was not clear enough in my example code. Please see below where "blah
> blah blah" can be ANY text or numbers: No predictable pattern at all to
> what may or may not be written in place of "blah blah blah".
> 
> text1<-c("blah blah blah.
> blah blah blah
> 1) blah blah blah 1
> 2) blah blah blah
> 10) blah 10 blah blah
> blah blah blah
> 1) blah blah blah
> 2) blah blah blah 2
> blah blah blah.","blah blah blah.
> blah blah blah
> 1. blah blah blah 1
> 2. blah blah blah
> 10.blah 10 blah blah
> blah blah blah
> 1. blah blah blah 1
> 2. blah blah blah
> blah blah blah.","blah blah blah. blah blah blah 1 1)blah blah blah 1. 2) blah
> blah blah 10) blah 10 blah blah blah blah blah 1) blah blah blah 1. 2) blah
> blah blah. blah blah blah."
> ,"blah blah blah. blah blah blah 1 1.blah blah blah 1. 2. blah blah blah.
> 10. blah 10 blah blah. blah blah blah 1. blah blah blah 1. 2. blah blah
> blah. blah blah blah.")
> 
> text1
> 
> Thank you in advance for your suggestions and/or guidance.
> 
> Best,
> 
> Dan
> 
> 

Re: [R] Counting enumerated items in each element of a character vector

2017-04-26 Thread Dan Abner

Hi all,

I was not clear enough in my example code. Please see below where "blah
blah blah" can be ANY text or numbers: No predictable pattern at all to
what may or may not be written in place of "blah blah blah".

text1<-c("blah blah blah.
blah blah blah
1) blah blah blah 1
2) blah blah blah
10) blah 10 blah blah
blah blah blah
1) blah blah blah
2) blah blah blah 2
blah blah blah.","blah blah blah.
blah blah blah
1. blah blah blah 1
2. blah blah blah
10.blah 10 blah blah
blah blah blah
1. blah blah blah 1
2. blah blah blah
blah blah blah.","blah blah blah. blah blah blah 1 1)blah blah blah 1. 2) blah
blah blah 10) blah 10 blah blah blah blah blah 1) blah blah blah 1. 2) blah
blah blah. blah blah blah."
,"blah blah blah. blah blah blah 1 1.blah blah blah 1. 2. blah blah blah.
 10. blah 10 blah blah. blah blah blah 1. blah blah blah 1. 2. blah blah
blah. blah blah blah.")

text1

Thank you in advance for your suggestions and/or guidance.

Best,

Dan


On Wed, Apr 26, 2017 at 12:52 AM, Michael Hannon  wrote:

> Thanks, Ista.  I thought there might be a "tidy" way to do this, but I
> hadn't used stringr.
>
> -- Mike
>
>
> On Tue, Apr 25, 2017 at 8:47 PM, Ista Zahn  wrote:
> > stringr::str_count (and stringi::stri_count that it wraps) interpret
> > the pattern argument as a regular expression by default.
> >
> > Best,
> > Ista
> >
> > On Tue, Apr 25, 2017 at 11:40 PM, Michael Hannon
> >  wrote:
> >> I like Boris's "Hadley" solution.  For the record, I've appended a
> >> version that uses regular expressions, the only benefit of which is
> >> that it could be generalized to find more-complicated patterns.
> >>
> >> -- Mike
> >>
> >> counts <- sapply(text1, function(next_string) {
> >> loc_example <- length(gregexpr("Example", next_string)[[1]])
> >> loc_example
> >> }, USE.NAMES=FALSE)
> >>
> >>> counts
> >> [1] 5 5 5 5
> >>>
> >>
> >> On Tue, Apr 25, 2017 at 5:33 PM, Boris Steipe 
> wrote:
> >>> I should add: there's a str_count() function in the stringr package.
> >>>
> >>> library(stringr)
> >>> str_count(text1, "Example")
> >>> # [1] 5 5 5 5
> >>>
> >>> I guess that would be the neater solution.
> >>>
> >>> B.
> >>>
> >>>
> >>>
> >>>> On Apr 25, 2017, at 8:23 PM, Boris Steipe wrote:
> >>>>
> >>>> How about:
> >>>>
> >>>> unlist(lapply(strsplit(text1, "Example"), function(x) { length(x) - 1 } ))
> >>>>
> >>>> Splitting your string on the five "Examples" in each gives six
> >>>> elements. length(x) - 1 is the number of matches. You can use any
> >>>> regex instead of "example" if you need to tweak what you are
> >>>> looking for.
> >>>>
> >>>> B.
> >>>>
> > On Apr 25, 2017, at 8:14 PM, Dan Abner 
> wrote:
> >
> > Hi all,
> >
> > I am looking for a streamlined way of counting the number of
> enumerated
> > items are each element of a character vector. For example:
> >
> >
> > text1<-c("This is an example.
> > List 1
> > 1) Example 1
> > 2) Example 2
> > 10) Example 10
> > List 2
> > 1) Example 1
> > 2) Example 2
> > These have been examples.","This is another example.
> > List 1
> > 1. Example 1
> > 2. Example 2
> > 10. Example 10
> > List 2
> > 1. Example 1
> > 2. Example 2
> > These have been examples.","This is a third example. List 1 1)
> Example 1.
> > 2) Example 2. 10) Example 10. List 2 1) Example 1. 2) Example 2.
> These have
> > been examples."
> > ,"This is a fourth example. List 1 1. Example 1. 2. Example 2. 10.
> Example
> > 10. List 2 Example 1. 2. Example 2. These have been examples.")
> >
> > text1
> >
> > ===
> >
> > I would like the result to be c(5,5,5,5). Notice that sometimes
> there are
> > > leading hard returns, other times not. Sometimes there are separate
> lists
> > and the same numbers are used in the enumerated items multiple times
> within
> > each character string. Sometimes the leading numbers for the
> enumerated
> > items exceed single digits. Notice that the delimiter may be ) or a
> period
> > (.). If the delimiter is a period and there are hard returns
> (example 2),
> > then I expect that will be easy enough to differentiate sentences
> ending
> > with a number from enumerated items. However, I imagine it would be
> much
> > more difficult to differentiate the two for example 4.
> >
> > Any suggestions are appreciated.
> >
> > Best,
> >
> > Dan
> >

Re: [R] Counting enumerated items in each element of a character vector

2017-04-25 Thread Michael Hannon

Thanks, Ista.  I thought there might be a "tidy" way to do this, but I
hadn't used stringr.

-- Mike


On Tue, Apr 25, 2017 at 8:47 PM, Ista Zahn  wrote:
> stringr::str_count (and stringi::stri_count that it wraps) interpret
> the pattern argument as a regular expression by default.
>
> Best,
> Ista
>
> On Tue, Apr 25, 2017 at 11:40 PM, Michael Hannon
>  wrote:
>> I like Boris's "Hadley" solution.  For the record, I've appended a
>> version that uses regular expressions, the only benefit of which is
>> that it could be generalized to find more-complicated patterns.
>>
>> -- Mike
>>
>> counts <- sapply(text1, function(next_string) {
>> loc_example <- length(gregexpr("Example", next_string)[[1]])
>> loc_example
>> }, USE.NAMES=FALSE)
>>
>>> counts
>> [1] 5 5 5 5
>>>
>>
>> On Tue, Apr 25, 2017 at 5:33 PM, Boris Steipe  
>> wrote:
>>> I should add: there's a str_count() function in the stringr package.
>>>
>>> library(stringr)
>>> str_count(text1, "Example")
>>> # [1] 5 5 5 5
>>>
>>> I guess that would be the neater solution.
>>>
>>> B.
>>>
>>>
>>>
 On Apr 25, 2017, at 8:23 PM, Boris Steipe  wrote:

 How about:

 unlist(lapply(strsplit(text1, "Example"), function(x) { length(x) - 1 } ))


 Splitting your string on the five "Examples" in each gives six elements. 
 length(x) - 1 is the number of
 matches. You can use any regex instead of "example" if you need to tweak 
 what you are looking for.


 B.




> On Apr 25, 2017, at 8:14 PM, Dan Abner  wrote:
>
> Hi all,
>
> I am looking for a streamlined way of counting the number of enumerated
> items in each element of a character vector. For example:
>
>
> text1<-c("This is an example.
> List 1
> 1) Example 1
> 2) Example 2
> 10) Example 10
> List 2
> 1) Example 1
> 2) Example 2
> These have been examples.","This is another example.
> List 1
> 1. Example 1
> 2. Example 2
> 10. Example 10
> List 2
> 1. Example 1
> 2. Example 2
> These have been examples.","This is a third example. List 1 1) Example 1.
> 2) Example 2. 10) Example 10. List 2 1) Example 1. 2) Example 2. These 
> have
> been examples."
> ,"This is a fourth example. List 1 1. Example 1. 2. Example 2. 10. Example
> 10. List 2 Example 1. 2. Example 2. These have been examples.")
>
> text1
>
> ===
>
> I would like the result to be c(5,5,5,5). Notice that sometimes there are
> leading hard returns, other times not. Sometimes there are separate lists
> and the same numbers are used in the enumerated items multiple times 
> within
> each character string. Sometimes the leading numbers for the enumerated
> items exceed single digits. Notice that the delimiter may be ) or a period
> (.). If the delimiter is a period and there are hard returns (example 2),
> then I expect that will be easy enough to differentiate sentences ending
> with a number from enumerated items. However, I imagine it would be much
> more difficult to differentiate the two for example 4.
>
> Any suggestions are appreciated.
>
> Best,
>
> Dan
>

Re: [R] Counting enumerated items in each element of a character vector

2017-04-25 Thread Ista Zahn

stringr::str_count (and stringi::stri_count that it wraps) interpret
the pattern argument as a regular expression by default.
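For anyone following along without stringr, the same regex-versus-literal distinction can be seen in base R. This is only a minimal sketch: count_matches and the vector x are made-up names for illustration, not anything from the thread. gregexpr() treats the pattern as a regular expression unless fixed = TRUE is given (in stringr, wrapping the pattern in fixed() plays the same role).

```r
# Count matches per element. With fixed = FALSE the pattern is a regex;
# with fixed = TRUE it is matched as a literal string.
count_matches <- function(x, pattern, fixed = FALSE) {
  lengths(regmatches(x, gregexpr(pattern, x, fixed = fixed)))
}

x <- c("1. one 2. two", "a1b a2b")

# As a regex, "." matches any character, so "1b" in the second string counts.
regex_counts <- count_matches(x, "1.")

# As a literal, only the two-character string "1." counts.
literal_counts <- count_matches(x, "1.", fixed = TRUE)

print(regex_counts)
print(literal_counts)
```

The two results differ only for the second element, which is the case where an unescaped "." quietly matches more than intended.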

Best,
Ista

On Tue, Apr 25, 2017 at 11:40 PM, Michael Hannon
 wrote:
> I like Boris's "Hadley" solution.  For the record, I've appended a
> version that uses regular expressions, the only benefit of which is
> that it could be generalized to find more-complicated patterns.
>
> -- Mike
>
> counts <- sapply(text1, function(next_string) {
> loc_example <- length(gregexpr("Example", next_string)[[1]])
> loc_example
> }, USE.NAMES=FALSE)
>
>> counts
> [1] 5 5 5 5
>>
>
> On Tue, Apr 25, 2017 at 5:33 PM, Boris Steipe  
> wrote:
>> I should add: there's a str_count() function in the stringr package.
>>
>> library(stringr)
>> str_count(text1, "Example")
>> # [1] 5 5 5 5
>>
>> I guess that would be the neater solution.
>>
>> B.
>>
>>
>>
>>> On Apr 25, 2017, at 8:23 PM, Boris Steipe  wrote:
>>>
>>> How about:
>>>
>>> unlist(lapply(strsplit(text1, "Example"), function(x) { length(x) - 1 } ))
>>>
>>>
>>> Splitting your string on the five "Examples" in each gives six elements. 
>>> length(x) - 1 is the number of
>>> matches. You can use any regex instead of "example" if you need to tweak 
>>> what you are looking for.
>>>
>>>
>>> B.
>>>
>>>
>>>
>>>
 On Apr 25, 2017, at 8:14 PM, Dan Abner  wrote:

 Hi all,

 I am looking for a streamlined way of counting the number of enumerated
 items in each element of a character vector. For example:


 text1<-c("This is an example.
 List 1
 1) Example 1
 2) Example 2
 10) Example 10
 List 2
 1) Example 1
 2) Example 2
 These have been examples.","This is another example.
 List 1
 1. Example 1
 2. Example 2
 10. Example 10
 List 2
 1. Example 1
 2. Example 2
 These have been examples.","This is a third example. List 1 1) Example 1.
 2) Example 2. 10) Example 10. List 2 1) Example 1. 2) Example 2. These have
 been examples."
 ,"This is a fourth example. List 1 1. Example 1. 2. Example 2. 10. Example
 10. List 2 Example 1. 2. Example 2. These have been examples.")

 text1

 ===

 I would like the result to be c(5,5,5,5). Notice that sometimes there are
 leading hard returns, other times not. Sometimes there are separate lists
 and the same numbers are used in the enumerated items multiple times within
 each character string. Sometimes the leading numbers for the enumerated
 items exceed single digits. Notice that the delimiter may be ) or a period
 (.). If the delimiter is a period and there are hard returns (example 2),
 then I expect that will be easy enough to differentiate sentences ending
 with a number from enumerated items. However, I imagine it would be much
 more difficult to differentiate the two for example 4.

 Any suggestions are appreciated.

 Best,

 Dan


Re: [R] Counting enumerated items in each element of a character vector

2017-04-25 Thread Michael Hannon

I like Boris's "Hadley" solution.  For the record, I've appended a
version that uses regular expressions, the only benefit of which is
that it could be generalized to find more-complicated patterns.

-- Mike

counts <- sapply(text1, function(next_string) {
loc_example <- length(gregexpr("Example", next_string)[[1]])
loc_example
}, USE.NAMES=FALSE)

> counts
[1] 5 5 5 5
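
One caveat with taking length() of the gregexpr() result, sketched here on made-up data (x and count_pattern are hypothetical names, not from the thread): when there is no match at all, gregexpr() returns a single -1, so length() reports 1 rather than 0. Counting only the positive match positions avoids that.

```r
# Count matches by summing the positive start positions; a no-match result
# from gregexpr() is a single -1, which contributes 0 to the sum.
count_pattern <- function(x, pattern) {
  vapply(gregexpr(pattern, x), function(m) sum(m > 0), integer(1))
}

x <- c("Example Example", "nothing here")

# The length()-based version from the message above, for comparison.
length_counts <- sapply(x, function(s) length(gregexpr("Example", s)[[1]]),
                        USE.NAMES = FALSE)

safe_counts <- count_pattern(x, "Example")

print(length_counts)  # second element is 1 although nothing matched
print(safe_counts)    # second element is 0
```

On text1 this makes no difference, since every element contains the pattern; it only matters once some strings have no match.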
>

On Tue, Apr 25, 2017 at 5:33 PM, Boris Steipe  wrote:
> I should add: there's a str_count() function in the stringr package.
>
> library(stringr)
> str_count(text1, "Example")
> # [1] 5 5 5 5
>
> I guess that would be the neater solution.
>
> B.
>
>
>
>> On Apr 25, 2017, at 8:23 PM, Boris Steipe  wrote:
>>
>> How about:
>>
>> unlist(lapply(strsplit(text1, "Example"), function(x) { length(x) - 1 } ))
>>
>>
>> Splitting your string on the five "Examples" in each gives six elements. 
>> length(x) - 1 is the number of
>> matches. You can use any regex instead of "example" if you need to tweak 
>> what you are looking for.
>>
>>
>> B.
>>
>>
>>
>>
>>> On Apr 25, 2017, at 8:14 PM, Dan Abner  wrote:
>>>
>>> Hi all,
>>>
>>> I am looking for a streamlined way of counting the number of enumerated
>>> items in each element of a character vector. For example:
>>>
>>>
>>> text1<-c("This is an example.
>>> List 1
>>> 1) Example 1
>>> 2) Example 2
>>> 10) Example 10
>>> List 2
>>> 1) Example 1
>>> 2) Example 2
>>> These have been examples.","This is another example.
>>> List 1
>>> 1. Example 1
>>> 2. Example 2
>>> 10. Example 10
>>> List 2
>>> 1. Example 1
>>> 2. Example 2
>>> These have been examples.","This is a third example. List 1 1) Example 1.
>>> 2) Example 2. 10) Example 10. List 2 1) Example 1. 2) Example 2. These have
>>> been examples."
>>> ,"This is a fourth example. List 1 1. Example 1. 2. Example 2. 10. Example
>>> 10. List 2 Example 1. 2. Example 2. These have been examples.")
>>>
>>> text1
>>>
>>> ===
>>>
>>> I would like the result to be c(5,5,5,5). Notice that sometimes there are
>>> leading hard returns, other times not. Sometimes there are separate lists
>>> and the same numbers are used in the enumerated items multiple times within
>>> each character string. Sometimes the leading numbers for the enumerated
>>> items exceed single digits. Notice that the delimiter may be ) or a period
>>> (.). If the delimiter is a period and there are hard returns (example 2),
>>> then I expect that will be easy enough to differentiate sentences ending
>>> with a number from enumerated items. However, I imagine it would be much
>>> more difficult to differentiate the two for example 4.
>>>
>>> Any suggestions are appreciated.
>>>
>>> Best,
>>>
>>> Dan
>>>

Re: [R] Counting enumerated items in each element of a character vector

2017-04-25 Thread Boris Steipe

I should add: there's a str_count() function in the stringr package.

library(stringr)
str_count(text1, "Example")
# [1] 5 5 5 5

I guess that would be the neater solution.

B.



> On Apr 25, 2017, at 8:23 PM, Boris Steipe  wrote:
> 
> How about:
> 
> unlist(lapply(strsplit(text1, "Example"), function(x) { length(x) - 1 } ))
> 
> 
> Splitting your string on the five "Examples" in each gives six elements. 
> length(x) - 1 is the number of
> matches. You can use any regex instead of "example" if you need to tweak what 
> you are looking for.
> 
> 
> B.
> 
> 
> 
> 
>> On Apr 25, 2017, at 8:14 PM, Dan Abner  wrote:
>> 
>> Hi all,
>> 
>> I am looking for a streamlined way of counting the number of enumerated
>> items in each element of a character vector. For example:
>> 
>> 
>> text1<-c("This is an example.
>> List 1
>> 1) Example 1
>> 2) Example 2
>> 10) Example 10
>> List 2
>> 1) Example 1
>> 2) Example 2
>> These have been examples.","This is another example.
>> List 1
>> 1. Example 1
>> 2. Example 2
>> 10. Example 10
>> List 2
>> 1. Example 1
>> 2. Example 2
>> These have been examples.","This is a third example. List 1 1) Example 1.
>> 2) Example 2. 10) Example 10. List 2 1) Example 1. 2) Example 2. These have
>> been examples."
>> ,"This is a fourth example. List 1 1. Example 1. 2. Example 2. 10. Example
>> 10. List 2 Example 1. 2. Example 2. These have been examples.")
>> 
>> text1
>> 
>> ===
>> 
>> I would like the result to be c(5,5,5,5). Notice that sometimes there are
>> leading hard returns, other times not. Sometimes there are separate lists
>> and the same numbers are used in the enumerated items multiple times within
>> each character string. Sometimes the leading numbers for the enumerated
>> items exceed single digits. Notice that the delimiter may be ) or a period
>> (.). If the delimiter is a period and there are hard returns (example 2),
>> then I expect that will be easy enough to differentiate sentences ending
>> with a number from enumerated items. However, I imagine it would be much
>> more difficult to differentiate the two for example 4.
>> 
>> Any suggestions are appreciated.
>> 
>> Best,
>> 
>> Dan
>> 

Re: [R] Counting enumerated items in each element of a character vector

2017-04-25 Thread Boris Steipe

How about:

unlist(lapply(strsplit(text1, "Example"), function(x) { length(x) - 1 } ))


Splitting your string on the five "Examples" in each gives six elements. 
length(x) - 1 is the number of
matches. You can use any regex instead of "example" if you need to tweak what 
you are looking for.
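
A small caution on the splitting idea, sketched with made-up strings (x is hypothetical, not the poster's data): strsplit() drops a trailing empty piece, so a string that ends with the pattern comes out one short. Counting the matches directly does not have that problem.

```r
x <- c("a Example b Example c",   # pattern in the middle only
       "a Example b Example")     # string ends with the pattern

# Split-based count: number of pieces minus one.
split_counts <- unlist(lapply(strsplit(x, "Example"),
                              function(p) length(p) - 1))

# Match-based count: number of matches found by gregexpr().
match_counts <- lengths(regmatches(x, gregexpr("Example", x)))

print(split_counts)  # 2 1
print(match_counts)  # 2 2
```

In text1 every element ends with "These have been examples.", so the split-based count happens to be right there.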


B.




> On Apr 25, 2017, at 8:14 PM, Dan Abner  wrote:
> 
> Hi all,
> 
> I am looking for a streamlined way of counting the number of enumerated
> items in each element of a character vector. For example:
> 
> 
> text1<-c("This is an example.
> List 1
> 1) Example 1
> 2) Example 2
> 10) Example 10
> List 2
> 1) Example 1
> 2) Example 2
> These have been examples.","This is another example.
> List 1
> 1. Example 1
> 2. Example 2
> 10. Example 10
> List 2
> 1. Example 1
> 2. Example 2
> These have been examples.","This is a third example. List 1 1) Example 1.
> 2) Example 2. 10) Example 10. List 2 1) Example 1. 2) Example 2. These have
> been examples."
> ,"This is a fourth example. List 1 1. Example 1. 2. Example 2. 10. Example
> 10. List 2 Example 1. 2. Example 2. These have been examples.")
> 
> text1
> 
> ===
> 
> I would like the result to be c(5,5,5,5). Notice that sometimes there are
> leading hard returns, other times not. Sometimes there are separate lists
> and the same numbers are used in the enumerated items multiple times within
> each character string. Sometimes the leading numbers for the enumerated
> items exceed single digits. Notice that the delimiter may be ) or a period
> (.). If the delimiter is a period and there are hard returns (example 2),
> then I expect that will be easy enough to differentiate sentences ending
> with a number from enumerated items. However, I imagine it would be much
> more difficult to differentiate the two for example 4.
> 
> Any suggestions are appreciated.
> 
> Best,
> 
> Dan
> 

[R] Counting enumerated items in each element of a character vector

2017-04-25 Thread Dan Abner

Hi all,

I am looking for a streamlined way of counting the number of enumerated
items in each element of a character vector. For example:


text1<-c("This is an example.
List 1
1) Example 1
2) Example 2
10) Example 10
List 2
1) Example 1
2) Example 2
These have been examples.","This is another example.
List 1
1. Example 1
2. Example 2
10. Example 10
List 2
1. Example 1
2. Example 2
These have been examples.","This is a third example. List 1 1) Example 1.
2) Example 2. 10) Example 10. List 2 1) Example 1. 2) Example 2. These have
been examples."
,"This is a fourth example. List 1 1. Example 1. 2. Example 2. 10. Example
10. List 2 Example 1. 2. Example 2. These have been examples.")

text1

===

I would like the result to be c(5,5,5,5). Notice that sometimes there are
leading hard returns, other times not. Sometimes there are separate lists
and the same numbers are used in the enumerated items multiple times within
each character string. Sometimes the leading numbers for the enumerated
items exceed single digits. Notice that the delimiter may be ) or a period
(.). If the delimiter is a period and there are hard returns (example 2),
then I expect that will be easy enough to differentiate sentences ending
with a number from enumerated items. However, I imagine it would be much
more difficult to differentiate the two for example 4.
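
To make the difficulty concrete, here is a rough sketch of counting on the enumeration marker itself rather than on the word "Example". It is an assumption-laden illustration, not a solution: count_enumerated and the toy vector are hypothetical, and the pattern (digits, then ")" or ".", then whitespace, then a letter) is just one plausible definition of an enumerated item.

```r
count_enumerated <- function(x) {
  # One or more digits, then ")" or ".", then whitespace, then a letter.
  pattern <- "[0-9]+[.)][[:space:]]+[[:alpha:]]"
  lengths(regmatches(x, gregexpr(pattern, x)))
}

toy <- c("List 1\n1) Apple\n2) Pear\n10) Plum",  # ")" delimiter, hard returns
         "1. Apple 2. Pear",                      # "." delimiter, one line
         "No list here.",                         # no enumerated items
         "See page 10. Then stop.")               # sentence ending in a number

enum_counts <- count_enumerated(toy)
print(enum_counts)  # 3 2 0 1
```

The last element shows the ambiguity described above: a sentence that ends in a number and is followed by another sentence looks exactly like an enumerated item to this pattern, so strings like the fourth example in text1 would be over-counted.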

Any suggestions are appreciated.

Best,

Dan


Re: [R] Counting number of rain

2015-10-02 Thread Rolf Turner


On 03/10/15 04:42, David Winsemius wrote:


On Oct 2, 2015, at 2:33 AM, Duncan Murdoch wrote:





The zoo package replaces as.Date.numeric() with a function that
assumes an origin of "1970-01-01".  There may be other packages
that also make a replacement like this.  David appears to have one
of them attached, and you don't.


Quite right, Duncan. I failed to include the  even though it was staring me in the face. My wife
says I have an extreme case of "refrigerator blindness" which now
seems to be spreading to other areas of my cognitive activities.

Sorry, Rolf.


Quite alright.  The syndrome is *very* familiar to me! :-)

cheers,

Rolf



--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


Re: [R] Counting number of rain

2015-10-02 Thread David Winsemius

On Oct 2, 2015, at 2:33 AM, Duncan Murdoch wrote:

> On 01/10/2015 11:29 PM, Rolf Turner wrote:
>> On 02/10/15 15:47, David Winsemius wrote:
>> 
>> 
>> 
>>> On Oct 1, 2015, at 6:22 PM, Rolf Turner wrote:

 P.S. I have been unable to find a corresponding vector of the names
 of the days of the week, although I have a very vague recollection
 of the existence of such a vector.  Does it exist, and if so what
 is it called?
>>> 
>>> It could be called up by strptime because it is mapped to a character
>>> vector by the internationalization database:
>>> 
 format( as.Date(1:7)+2, format="%A")
>>> [1] "Sunday"    "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"
>>> [7] "Saturday"
>> 
>> 
>> 
>> When I try that (copying and pasting your code so that there's no chance 
>> of fumble-fingering) I get:
>> 
>>> Error in as.Date.numeric(1:7) : 'origin' must be supplied
>> 
>> Why do these things always happen to *me*???
> 
> The zoo package replaces as.Date.numeric() with a function that assumes
> an origin of "1970-01-01".  There may be other packages that also make a
> replacement like this.  David appears to have one of them attached, and
> you don't.

Quite right, Duncan. I failed to include the  even 
though it was staring me in the face. My wife says I have an extreme case of 
"refrigerator blindness" which now seems to be spreading to other areas of my 
cognitive activities.

Sorry, Rolf.
-- 
David.

> 
> Duncan Murdoch
> 

David Winsemius
Alameda, CA, USA


Re: [R] Counting number of rain

2015-10-02 Thread Duncan Murdoch

On 01/10/2015 11:29 PM, Rolf Turner wrote:
> On 02/10/15 15:47, David Winsemius wrote:
> 
> 
> 
>> On Oct 1, 2015, at 6:22 PM, Rolf Turner wrote:
>>>
>>> P.S. I have been unable to find a corresponding vector of the names
>>> of the days of the week, although I have a very vague recollection
>>> of the existence of such a vector.  Does it exist, and if so what
>>> is it called?
>>
>> It could be called up by strptime because it is mapped to a character
>> vector by the internationalization database:
>>
>>> format( as.Date(1:7)+2, format="%A")
>> [1] "Sunday"    "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"
>> [7] "Saturday"
> 
> 
> 
> When I try that (copying and pasting your code so that there's no chance 
> of fumble-fingering) I get:
> 
>> Error in as.Date.numeric(1:7) : 'origin' must be supplied
> 
> Why do these things always happen to *me*???

The zoo package replaces as.Date.numeric() with a function that assumes
an origin of "1970-01-01".  There may be other packages that also make a
replacement like this.  David appears to have one of them attached, and
you don't.

Duncan Murdoch


Re: [R] Counting number of rain

2015-10-01 Thread David Winsemius


On Oct 1, 2015, at 8:29 PM, Rolf Turner wrote:

> On 02/10/15 15:47, David Winsemius wrote:
> 
> 
> 
>> On Oct 1, 2015, at 6:22 PM, Rolf Turner wrote:
>>> 
>>> P.S. I have been unable to find a corresponding vector of the names
>>> of the days of the week, although I have a very vague recollection
>>> of the existence of such a vector.  Does it exist, and if so what
>>> is it called?
>> 
>> It could be called up by strptime because it is mapped to a character
>> vector by the internationalization database:
>> 
>>> format( as.Date(1:7)+2, format="%A")
>> [1] "Sunday"    "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"
>> [7] "Saturday"
> 
> 
> 
> When I try that (copying and pasting your code so that there's no chance of 
> fumble-fingering) I get:
> 
>> Error in as.Date.numeric(1:7) : 'origin' must be supplied
> 
> Why do these things always happen to *me*???

Or why am I so lucky as to avoid the need for an origin when the help page says 
the call is:

## S3 method for class 'numeric'
as.Date(x, origin, ...)# noting no default in the formals


The code says that origin should be supplied if it is missing:

> as.Date.numeric
function (x, origin, ...) 
{
    if (missing(origin)) 
        origin <- "1970-01-01"
    if (identical(origin, "0000-00-00")) 
        origin <- as.Date("0000-01-01", ...) - 1
    as.Date(origin, ...) + x
}


-- 

David Winsemius
Alameda, CA, USA


Re: [R] Counting number of rain

2015-10-01 Thread Rolf Turner


On 02/10/15 15:47, David Winsemius wrote:




On Oct 1, 2015, at 6:22 PM, Rolf Turner wrote:


P.S. I have been unable to find a corresponding vector of the names
of the days of the week, although I have a very vague recollection
of the existence of such a vector.  Does it exist, and if so what
is it called?


It could be called up by strptime because it is mapped to a character
vector by the internationalization database:


format( as.Date(1:7)+2, format="%A")

[1] "Sunday"    "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"
[7] "Saturday"




When I try that (copying and pasting your code so that there's no chance 
of fumble-fingering) I get:



Error in as.Date.numeric(1:7) : 'origin' must be supplied


Why do these things always happen to *me*???

cheers,

Rolf

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


Re: [R] Counting number of rain

2015-10-01 Thread David Winsemius


On Oct 1, 2015, at 6:22 PM, Rolf Turner wrote:

> On 02/10/15 10:54, peter dalgaard wrote:
> 
>>> On 01 Oct 2015, at 23:04 , Rolf Turner 
>>> wrote:
>>> 
>>> On 02/10/15 03:45, David L Carlson wrote:
>>> 
>>> 
>>> 
 If you want the month names:
 
> mnt <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
 + "July", "Aug", "Sep", "Oct", "Nov", "Dec")
> dimnames(tbl)$Month <- mnt
>>> 
>>> 
>>> 
>>> Unnecessary typing; there is a built-in data set "month.abb" (in
>>> the "base" package) that is identical to your "mnt".
>>> 
>>> Difficult (nearly impossible!) to find, though, if you can't quite
>>> remember the name!  I *knew* I'd seen it, so I persisted and
>>> eventually tracked it down.
>>> 
>>> Strangely ??month or help.search("month") yield no trace of it.
>>> Pages and pages of (useless!) output but no sign of "month.abb"
>>> (nor of "month.name" which gives the unabbreviated month names).
>>> 
>>> Can anyone explain to me why "??" and help.search() are of no help
>>> here?
>> 
>> Umm,
>> 
>> --- Help files with alias or concept or title matching ‘month’
>> using fuzzy matching:
>> 
>> 
>> base::Constants Built-in Constants Aliases: month.abb,
>> month.name  ---
> 
> Hmm. When I did ??month I got a completely different display. It
> contained *absolutely no* mention of month.abb. That *seems* to be
> because I have help_type set to "html". When I re-set help_type to
> "text", I get a display like unto the one that you obtained (and it does 
> indeed lead one to month.abb).
> 
> It seems to me ver' strange that one gets a different collection of
> information under help_type="text" than one does under help_type="html".
> If I were me, I would classify this as a bug.
> 
>> Also, entering "month" gives the completions
>> 
>>> month
>> month.abb  monthplot  months.Date month.name months
>> months.POSIXt
> 
> Yes, I eventually managed to come up with this trick as well.  But that is 
> not really relevant to the phenomenon that "??" or help.search() don't work 
> effectively, or at least not consistently (the effectiveness appearing to 
> depend --- for some bizarre reason --- on the value of help_type).
> 
> cheers,
> 
> Rolf
> 
> P.S. I have been unable to find a corresponding vector of the names of the 
> days of the week, although I have a very vague recollection of the existence 
> of such a vector.  Does it exist, and if so what is it called?

It could be called up by strptime because it is mapped to a character vector by 
the internationalization database:

> format( as.Date(1:7)+2, format="%A")
[1] "Sunday"    "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"   
[7] "Saturday" 


> Or is my recollection an illusion brought on by advancing senility?
> 
> R.
> 
> -- 
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
> 

David Winsemius
Alameda, CA, USA


Re: [R] Counting number of rain

2015-10-01 Thread Rolf Turner


On 02/10/15 10:54, peter dalgaard wrote:

> On 01 Oct 2015, at 23:04 , Rolf Turner wrote:
>
>> On 02/10/15 03:45, David L Carlson wrote:
>>
>>> If you want the month names:
>>>
>>>> mnt <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
>>> + "July", "Aug", "Sep", "Oct", "Nov", "Dec")
>>>> dimnames(tbl)$Month <- mnt
>>
>> Unnecessary typing; there is a built-in data set "month.abb" (in
>> the "base" package) that is identical to your "mnt".
>>
>> Difficult (nearly impossible!) to find, but, if you can't quite
>> remember the name!  I *knew* I'd seen it, so I persisted and
>> eventually tracked it down.
>>
>> Strangely ??month or help.search("month") yield no trace of it.
>> Pages and pages of (useless!) output but no sign of "month.abb"
>> (nor of "month.name" which gives the unabbreviated month names).
>>
>> Can anyone explain to me why "??" and help.search() are of no help
>> here?
>
> Umm,
>
> --- Help files with alias or concept or title matching ‘month’
> using fuzzy matching:
>
> base::Constants Built-in Constants Aliases: month.abb,
> month.name  ---

Hmm. When I did ??month I got a completely different display. It
contained *absolutely no* mention of month.abb. That *seems* to be
because I have help_type set to "html". When I re-set help_type to
"text", I get a display like unto the one that you obtained (and it does 
indeed lead one to month.abb).

It seems to me ver' strange that one gets a different collection of
information under help_type="text" than one does under help_type="html".
If I were me, I would classify this as a bug.

> Also, entering "month" gives the completions
>
>> month
> month.abb  monthplot  months.Date month.name months
> months.POSIXt

Yes, I eventually managed to come up with this trick as well.  But that 
is not really relevant to the phenomenon that "??" or help.search() 
don't work effectively, or at least not consistently (the effectiveness 
appearing to depend --- for some bizarre reason --- on the value of 
help_type).

cheers,

Rolf

P.S. I have been unable to find a corresponding vector of the names of 
the days of the week, although I have a very vague recollection of the 
existence of such a vector.  Does it exist, and if so what is it called?

Or is my recollection an illusion brought on by advancing senility?

R.

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


Re: [R] Counting number of rain

2015-10-01 Thread peter dalgaard


> On 01 Oct 2015, at 23:04 , Rolf Turner  wrote:
> 
> On 02/10/15 03:45, David L Carlson wrote:
> 
> 
> 
>> If you want the month names:
>> 
>>> mnt <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
>> + "July", "Aug", "Sep", "Oct", "Nov", "Dec")
>>> dimnames(tbl)$Month <- mnt
> 
> 
> 
> Unnecessary typing; there is a built-in data set "month.abb" (in the
> "base" package) that is identical to your "mnt".
> 
> Difficult (nearly impossible!) to find, but, if you can't quite remember the 
> name!  I *knew* I'd seen it, so I persisted and eventually tracked it down.
> 
> Strangely ??month or help.search("month") yield no trace of it.  Pages and 
> pages of (useless!) output but no sign of "month.abb" (nor of "month.name" 
> which gives the unabbreviated month names).
> 
> Can anyone explain to me why "??" and help.search() are of no help here?

Umm,

---
Help files with alias or concept or title matching ‘month’ using fuzzy
matching:


base::Constants Built-in Constants
  Aliases: month.abb, month.name

---

Also, entering "month" gives the completions

> month
month.abb  monthplot  months.Date
month.name months months.POSIXt  
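
The completion trick above can also be reproduced non-interactively. A minimal
sketch, assuming a standard R session with the default packages attached:
apropos() lists the objects on the search path whose names match a regular
expression, independently of the help_type setting.

```r
# Objects on the search path whose names start with "month".
hits <- apropos("^month")
print(hits)

# The constants themselves are documented under the help topic "Constants";
# run interactively, the following would open that page:
# help("Constants", package = "base")
```

The exact set of hits depends on which packages are attached, but month.abb
and month.name live in base and should always be among them.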

-pd

> 
> cheers,
> 
> Rolf Turner
> 
> -- 
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com


Re: [R] Counting number of rain

2015-10-01 Thread Rolf Turner


On 02/10/15 03:45, David L Carlson wrote:

> If you want the month names:
>
>> mnt <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
> + "July", "Aug", "Sep", "Oct", "Nov", "Dec")
>> dimnames(tbl)$Month <- mnt




Unnecessary typing; there is a built-in data set "month.abb" (in the
"base" package) that is identical to your "mnt".

Difficult (nearly impossible!) to find, but, if you can't quite remember 
the name!  I *knew* I'd seen it, so I persisted and eventually tracked 
it down.


Strangely ??month or help.search("month") yield no trace of it.  Pages 
and pages of (useless!) output but no sign of "month.abb" (nor of 
"month.name" which gives the unabbreviated month names).


Can anyone explain to me why "??" and help.search() are of no help here?

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


Re: [R] Counting number of rain

2015-10-01 Thread David L Carlson

You should always reply to the list since other posters may have other 
suggestions. Assuming your data frame is called rain:

> str(rain)
'data.frame':   2192 obs. of  4 variables:
 $ Year  : int  1960 1960 1960 1960 1960 1960 1960 1960 1960 1960 ...
 $ Month : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Day   : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Amount: num  0.3 0 0 0 0 2.7 7.1 14 12.6 11.1 ...

> tbl <- xtabs(~Year+Month, rain, subset=Amount > 0.01)
> tbl
      Month
Year    1  2  3  4  5  6  7  8  9 10 11 12
  1960 24 15  2 12 19 22 18 24 22 20 30 29
  1961 26  9 10 18 18 11 18 14 24 28 30 31
  1962 22 14 19  2 18 19 27 26 26 29 15 28
  1963 27 17 15  4  9 23 16 24 19 28 30 22
  1964 15 25  9 13 19 14 23 20 24 30 25 27
  1965 13 21 12 10 21 24 22 21 28 23 28 31

If you want the month names:

> mnt <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
+ "July", "Aug", "Sep", "Oct", "Nov", "Dec")
> dimnames(tbl)$Month <- mnt
> tbl
      Month
Year   Jan Feb Mar Apr May Jun July Aug Sep Oct Nov Dec
  1960  24  15   2  12  19  22   18  24  22  20  30  29
  1961  26   9  10  18  18  11   18  14  24  28  30  31
  1962  22  14  19   2  18  19   27  26  26  29  15  28
  1963  27  17  15   4   9  23   16  24  19  28  30  22
  1964  15  25   9  13  19  14   23  20  24  30  25  27
  1965  13  21  12  10  21  24   22  21  28  23  28  31
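
The same tabulation can be sketched with the built-in month.abb constant
instead of a hand-typed vector. The data below are hypothetical (every
even-numbered day is made "wet") and only stand in for the real rain data
frame; note that month.abb uses "Jul" where the vector above uses "July".

```r
# Hypothetical stand-in for the 'rain' data frame: two years of daily records,
# with 1.5 units of rain on every even-numbered day and none otherwise.
dates <- seq(as.Date("1960-01-01"), as.Date("1961-12-31"), by = "day")
day   <- as.integer(format(dates, "%d"))
rain  <- data.frame(
  Year   = as.integer(format(dates, "%Y")),
  Month  = as.integer(format(dates, "%m")),
  Day    = day,
  Amount = ifelse(day %% 2 == 0, 1.5, 0)
)

# Count the days with more than 0.01 units of rain per year and month.
tbl <- xtabs(~ Year + Month, data = rain, subset = Amount > 0.01)

# Relabel the month columns by position, using the built-in abbreviations.
dimnames(tbl)$Month <- month.abb[as.integer(dimnames(tbl)$Month)]
print(tbl)
```

Indexing month.abb by the existing column labels keeps the names correct even
if some month has no wet days and is therefore missing from the table.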

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

From: smart hendsome [mailto:putra_autum...@yahoo.com] 
Sent: Wednesday, September 30, 2015 9:24 PM
To: David L Carlson
Subject: Re: [R] Counting number of rain

Hi David,

Thanks for your reply, this is my data using dput;

structure(list(Year = c(1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 1960L, 
1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 
1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 
1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 
1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 
1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 
1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 
1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 
1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L, 1961L

Re: [R] Counting occurrences of a set of values

2015-09-10 Thread Frank Schwidom



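# Example data: four rows, two of them identical
df <- data.frame( V1= 1, V2= c( 2, 3, 2, 1), V3= c( 1, 2, 1, 1))
# Sort the rows so that identical rows become adjacent
dfO <- df[ do.call( order, df), ]
# TRUE for every row that repeats an earlier row
dfOD <- duplicated( dfO)
# TRUE at the last row of each run of identical rows
dfODTrigger <- ! c( dfOD[-1], FALSE)
# Run lengths: differences between the positions of the run ends
dfOCounts <- diff( c( 0, which( dfODTrigger)))
cbind( dfO[ dfODTrigger, ], dfOCounts)

  V1 V2 V3 dfOCounts
4  1  1  1         1
3  1  2  1         2
2  1  3  2         1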
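
For comparison, a shorter base-R sketch of the same count, not part of the
reply above: add a helper column of ones (the name n is an arbitrary choice)
and let aggregate() sum it within each distinct combination of V1, V2 and V3.

```r
# Same example data as above.
df <- data.frame(V1 = 1, V2 = c(2, 3, 2, 1), V3 = c(1, 2, 1, 1))

# Attach a column of ones and sum it per distinct (V1, V2, V3) combination.
counts <- aggregate(n ~ V1 + V2 + V3, data = cbind(df, n = 1), FUN = sum)
print(counts)
```

This returns one row per combination that actually occurs, with the count in
column n.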

Regards


On Thu, Sep 10, 2015 at 01:11:24PM +, Thomas Chesney wrote:
> Can anyone suggest a way of counting how frequently sets of values occurs in 
> a data frame? Like table() only with sets.
> 
> So for a dataset:
> 
> V1, V2, V3
> 1, 2, 1
> 1, 3, 2
> 1, 2, 1
> 1, 1, 1
> 
> The output would be something like:
> 
> 1,2,1: 2
> 1,3,2: 1
> 1,1,1: 1
> 
> Thank you,
> 
> Thomas Chesney
> 
> 
> 
> This message and any attachment are intended solely for the addressee
> and may contain confidential information. If you have received this
> message in error, please send it back to me, and immediately delete it. 
> 
> Please do not use, copy or disclose the information contained in this
> message or in any attachment.  Any views or opinions expressed by the
> author of this email do not necessarily reflect the views of the
> University of Nottingham.
> 
> This message has been checked for viruses but the contents of an
> attachment may still contain software viruses which could damage your
> computer system, you are advised to perform your own checks. Email
> communications with the University of Nottingham may be monitored as
> permitted by UK legislation.
> 

Re: [R] Counting occurrences of a set of values

2015-09-10 Thread Thierry Onkelinx

Have a look at the dplyr package

library(dplyr)
n <- 1000
tibble(  # data_frame() is deprecated in current versions; tibble() replaces it
  V1 = sample(0:1, n, replace = TRUE),
  V2 = sample(0:1, n, replace = TRUE),
  V3 = sample(0:1, n, replace = TRUE)
) %>%
  group_by(V1, V2, V3) %>%
  mutate(
    Freq = n()
  )


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2015-09-10 15:11 GMT+02:00 Thomas Chesney :

> Can anyone suggest a way of counting how frequently sets of values occurs
> in a data frame? Like table() only with sets.
>
> So for a dataset:
>
> V1, V2, V3
> 1, 2, 1
> 1, 3, 2
> 1, 2, 1
> 1, 1, 1
>
> The output would be something like:
>
> 1,2,1: 2
> 1,3,2: 1
> 1,1,1: 1
>
> Thank you,
>
> Thomas Chesney
>
>
>

Re: [R] Counting occurrences of a set of values

2015-09-10 Thread Fox, John

Dear Thomas,

How about this?

> table(apply(Data, 1, paste, collapse=","))

1,1,1 1,2,1 1,3,2 
    1     2     1 
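
A variant of the same idea, offered as a sketch rather than as a correction:
building the keys with do.call(paste, ...) works column by column and so avoids
the matrix conversion that apply() performs on a data frame. Data is assumed
here to be the four-row example from the original question.

```r
# The example data from the question (assumed to be what 'Data' holds).
Data <- data.frame(V1 = c(1, 1, 1, 1), V2 = c(2, 3, 2, 1), V3 = c(1, 2, 1, 1))

# Paste the columns together row by row, then tabulate the resulting keys.
keys   <- do.call(paste, c(Data, sep = ","))
counts <- table(keys)
print(counts)
```

Because the columns are pasted directly, each value is converted to text on its
own, which can matter when the data frame mixes numeric and character columns.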

I hope this helps,
 John

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Thomas
> Chesney
> Sent: September 10, 2015 9:11 AM
> To: r-help@r-project.org
> Subject: [R] Counting occurrences of a set of values
> 
> Can anyone suggest a way of counting how frequently sets of values occurs in a
> data frame? Like table() only with sets.
> 
> So for a dataset:
> 
> V1, V2, V3
> 1, 2, 1
> 1, 3, 2
> 1, 2, 1
> 1, 1, 1
> 
> The output would be something like:
> 
> 1,2,1: 2
> 1,3,2: 1
> 1,1,1: 1
> 
> Thank you,
> 
> Thomas Chesney
> 
> 
> 

Re: [R] Counting occurrences of a set of values

2015-09-10 Thread Duncan Murdoch

On 10/09/2015 9:11 AM, Thomas Chesney wrote:
> Can anyone suggest a way of counting how frequently sets of values occurs in 
> a data frame? Like table() only with sets.

Do you want 1,2,1 to be the same as 1,1,2, or different?  What about
1,2,2?  For sets, those are all the same, but for most purposes, they
aren't.  If you really want to keep the ordering, then table() does the
counting you want, it just returns it in an ugly format.
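
Both readings can be sketched in base R, assuming Data holds the four example
rows from the question. For ordered rows, the table() result is flattened with
as.data.frame() and the empty cells dropped; for true sets, each row is sorted
first so that 1,2,1 and 1,1,2 collapse to the same key.

```r
# The example data from the question (assumed to be what 'Data' holds).
Data <- data.frame(V1 = c(1, 1, 1, 1), V2 = c(2, 3, 2, 1), V3 = c(1, 2, 1, 1))

# Ordered rows: cross-tabulate all columns, then keep the non-empty cells.
ordered_counts <- subset(as.data.frame(table(Data)), Freq > 0)
print(ordered_counts)

# Unordered (multi)sets: sort the values within each row before counting.
sorted_rows      <- t(apply(Data, 1, sort))
unordered_counts <- table(apply(sorted_rows, 1, paste, collapse = ","))
print(unordered_counts)
```

Sorting within rows keeps repeated values, so 1,2,2 would still differ from
1,1,2; wrapping the row in unique() before pasting would treat them as the
same set.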

Duncan Murdoch


> 
> So for a dataset:
> 
> V1, V2, V3
> 1, 2, 1
> 1, 3, 2
> 1, 2, 1
> 1, 1, 1
> 
> The output would be something like:
> 
> 1,2,1: 2
> 1,3,2: 1
> 1,1,1: 1
> 
> Thank you,
> 
> Thomas Chesney
> 
> 
> 

[R] Counting occurrences of a set of values

2015-09-10 Thread Thomas Chesney

Can anyone suggest a way of counting how frequently sets of values occurs in a 
data frame? Like table() only with sets.

So for a dataset:

V1, V2, V3
1, 2, 1
1, 3, 2
1, 2, 1
1, 1, 1

The output would be something like:

1,2,1: 2
1,3,2: 1
1,1,1: 1

Thank you,

Thomas Chesney




Re: [R] Counting number of rain

2015-09-08 Thread John Kane

Assuming your data is already in R, please send it in dput() format.  See 
?dput or 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
 and http://adv-r.had.co.nz/Reproducibility.html for more details.

John Kane
Kingston ON Canada


> -Original Message-
> From: r-help@r-project.org
> Sent: Tue, 8 Sep 2015 06:58:58 + (UTC)
> To: r-help@r-project.org
> Subject: [R] Counting number of rain
> 
> Hello R-users,
> I want to ask how to count the number of daily rain data.  My data as
> below:
>  Year Month Day Amount 1901 1 1 0 1901 1 2 3 1901 1 3 0 1901 1 4 0.5 1901
> 1 5 0 1901 1 6 0  1901 1 7 0.3 1901 1 8 0 1901 1 9 0 1901 1 10 0 1901 1
> 11 0.5 1901 1 12 1.8 1901 1 13 0 1901 1 14 0 1901 1 15 2.5 1901 1 16 0
> 1901 1 17 0 1901 1 18 0 1901 1 19 0 1901 1 20 0 1901 1 21 0 1901 1 22 0
> 1901 1 23 0 1901 1 24 0 1901 1 25 0 1901 1 26 16.5 1901 1 27 0.3 1901 1
> 28 0 1901 1 29 0 1901 1 30 0 1901 1 31 0 1901 2 1 0 1901 2 2 0 1901 2 3 0
> 1901 2 4 0 1901 2 5 0 1901 2 6 0 1901 2 7 0 1901 2 8 0.3 1901 2 9 0 1901
> 2 10 0 1901 2 11 0 1901 2 12 1 1901 2 13 0.3 1901 2 14 0 1901 2 15 0 1901
> 2 16 0 1901 2 17 0 1901 2 18 0 1901 2 19 0 1901 2 20 0 1901 2 21 0 1901 2
> 22 0 1901 2 23 0.3 1901 2 24 0 1901 2 25 0 1901 2 26 0.3 1901 2 27 0 1901
> 2 28 0 1901 3 1 0 1901 3 2 0.8 1901 3 3 2.3 1901 3 4 0 1901 3 5 0 1901 3
> 6 0 1901 3 7 0 1901 3 8 0 1901 3 9 0 1901 3 10 2 1901 3 11 0 1901 3 12 0
> 1901 3 13 0 1901 3 14 0 1901 3 15 0 1901 3 16 0 1901 3 17 0 1901 3 18 0
> 1901 3 19 0 1901 3 20 0 1901 3 21 0 1901 3 22 1.5 1901 3 23 1.3 1901 3 24
> 0 1901 3 25 0 1901 3 26 0 1901 3 27 0 1901 3 28 0.3 1901 3 29 0.3  1901 3
> 30 4.6 1901 3 31 0 1901 4 1 0 1901 4 2 4.6 1901 4 3 30.7 1901 4 4 0 1901
> 4 5 0 1901 4 6 0 1901 4 7 0 1901 4 8 0 1901 4 9 0 1901 4 10 0 1901 4 11 0
> 1901 4 12 0 1901 4 13 0 1901 4 14 0 1901 4 15 0.3 1901 4 16 1.3 1901 4 17
> 0 1901 4 18 0 1901 4 19 0.3 1901 4 20 1 1901 4 21 9.4 1901 4 22 0.5 1901
> 4 23 0.3 1901 4 24 0 1901 4 25 0 1901 4 26 0 1901 4 27 0 1901 4 28 0 1901
> 4 29 0 1901 4 30 0 1901 5 1 0 1901 5 2 0 1901 5 3 0 1901 5 4 0 1901 5 5 0
> 1901  5 6 0 1901 5 7 0 1901 5 8 0.5 1901 5 9 2.3 1901 5 10 0.3 1901 5 11
> 0 1901 5 12 0 1901 5 13 0 1901 5 14 0 1901 5 15 0 1901 5 16 0 1901 5 17 0
> 1901 5 18 0 1901 5 19 0 1901 5 20 0 1901 5 21 0.5 1901 5 22 0 1901 5 23 0
> 1901 5 24 0 1901 5 25 0 1901 5 26 4.8 1901 5 27 10.9 1901 5 28 3.6 1901 5
> 29 0 1901 5 30 0 1901 5 31 5.1 1901 6 1 0.5 1901 6 2 0 1901 6 3 2 1901 6
> 4 0  1901 6 5 10.2 1901 6 6 33.3 1901 6 7 0.3 1901 6 8 0 1901 6 9 0 1901
> 6 10 0.5 1901 6 11 0.5 1901 6 12 0.3 1901 6 13 2.8 1901 6 14 5.6 1901 6
> 15 0.3 1901 6 16 6.6 1901 6 17 14.2 1901 6 18 4.8  1901 6 19 8.4 1901 6
> 20 1.8 1901 6 21 1.8 1901 6 22 0.3 1901 6 23 8.6 1901 6 24 0 1901 6 25 0
> 1901 6 26 0 1901 6 27 0 1901 6 28 0 1901 6 29 0 1901 6 30 0 1901 7 1 0
> 1901 7 2 0 1901 7 3 0 1901 7 4 0 1901 7 5 1 1901 7 6 0.5 1901 7 7 0.3
> 1901 7 8 0.3 1901 7 9 6.1 1901 7 10 0.3  1901 7 11 1.5 1901 7 12 0 1901 7
> 13 1.5 1901 7 14 0.3 1901 7 15 3.3 1901 7 16 2.3 1901 7 17 0.5  1901 7 18
> 0 1901 7 19 0 1901 7 20 0 1901 7 21 1.8 1901 7 22 0 1901 7 23 1 1901 7 24
> 0.3 1901  7 25 0.3 1901 7 26 1.3 1901 7 27 17 1901 7 28 6.6 1901 7 29 6.1
> 1901 7 30 0.5 1901 7 31 0.3 1901 8 1 0 1901 8 2 0 1901 8 3 0 1901 8 4 0
> 1901 8 5 0 1901 8 6 3.3 1901 8 7 4.1 1901 8 8 0.3  1901 8 9 0 1901 8 10 0
> 1901 8 11 0 1901 8 12 0 1901 8 13 0 1901 8 14 0 1901 8 15 0 1901 8 16 0
> 1901 8 17 0.5 1901 8 18 0 1901 8 19 0 1901 8 20 0 1901 8 21 0 1901 8 22 0
> 1901 8 23 0.3 1901 8 24 1 1901 8 25 0 1901 8 26 0 1901 8 27 10.2 1901 8
> 28 1.5 1901 8 29 0.5 1901 8 30 1.3  1901 8 31 0 1901 9 1 0 1901 9 2 3
> 1901 9 3 1 1901 9 4 0.5 1901 9 5 0.3 1901 9 6 0 1901 9 7 0 1901 9 8 2.3
> 1901 9 9 0.3 1901 9 10 0 1901 9 11 0 1901 9 12 0 1901 9 13 0 1901 9 14 0
> 1901 9 15 0 1901 9 16 0 1901 9 17 0 1901 9 18 1.8 1901 9 19 8.1 1901 9 20
> 0.3 1901 9 21 5.8 1901 9 22 4.1 1901 9 23 0.3 1901 9 24 1.8 1901 9 25 0
> 1901 9 26 0 1901 9 27 0 1901 9 28 0 1901  9 29 1.8 1901 9 30 0.8 1901 10
> 1 0 1901 10 2 0 1901 10 3 0 1901 10 4 0 1901 10 5 0.3 1901 10 6 0 1901 10
> 7 0 1901 10 8 0 1901 10 9 0 1901 10 10 0 1901 10 11 0.3 1901 10 12 3.8
> 1901 10 13 0.4 1901 10 14 9 1901 10 15 2 1901 10 16 1 1901 10 17 0 1901
> 10 18 0 1901 10 19 0 1901 10 20 0.3 1901 10 21 0 1901 10 22 0 1901 10 23
> 0 1901 10 24 0 1901 10 25 0 1901 10 26 0 1901 10 27 14.5  1901 10 28 6.4
> 1901 10 29 0.8 1901 10 30 0 1901 10 31 0 1901 11 1 0 1901 11 2 0 1901 11
> 3 0  1901 11 4 0 1901 11 5 0 1901 11 6 0 1901 11 7 0 1901 11 8 0 1901 11
> 9 0 1901 11 10 0 1901 11 11 0 1901 11 12 5.1 1901 11 13 0.3 1901 11 14
> 5.8 1901 11 15 0 1901 11 16 0 1901 11 17 1 1901 11 18 0.5 1901 11 19 0
> 1901 11 20 0 1901 11 21 0 1901 11 22 0 1901 11 23 0 1901 11

Re: [R] Counting number of rain

2015-09-08 Thread Dan D

Try the following:

## step 1: write raw data to an array
junk<-scan('clipboard')
# enter the numbers (not the 'year' etc. labels) into R as a vector after
# copying them to the clipboard

junk<-t(array(junk,dim=c(4,length(junk)/4)))
# convert the vector into a 2-d array with 4 columns (year, month, day,
# amount)

## step 2: create a dataframe to store and display the results
nyr<-length(unique(junk[,1]))
ans<-data.frame(array(dim=c(nyr,12))) # a dataframe for storing the results
names(ans)<-c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec')
yrs<-sort(unique(junk[,1]))
row.names(ans)<-yrs

## step 3: calculate
for (yi in 1:nyr){ # loop through the years...
  for (mi in 1:12){ # ...and the months
    # count the rainy days by first subsetting the junk array by rows that
    # match the given year and month, then counting the amounts above 0.01
    ans[yi,mi]<-sum(junk[junk[,1]==yrs[yi] & junk[,2]==mi,4]>0.01)
  }
}
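
The loop above can also be sketched without the clipboard and without explicit
loops. This is an alternative under stated assumptions, not a replacement: the
string raw holds only six records copied from the original post, scan(text =)
stands in for scan('clipboard'), and tapply() does the counting.

```r
# Six records from the posted data, flattened as "year month day amount ...".
raw <- "1901 1 1 0 1901 1 2 3 1901 1 3 0 1901 1 4 0.5 1901 2 1 0 1901 2 8 0.3"

# Read the numbers as one vector, then fold them into four columns by row.
vals <- scan(text = raw, quiet = TRUE)
rain <- as.data.frame(matrix(vals, ncol = 4, byrow = TRUE))
names(rain) <- c("Year", "Month", "Day", "Amount")

# Count the days with more than 0.01 units of rain per year and month.
wet_days <- with(rain, tapply(Amount > 0.01, list(Year = Year, Month = Month), sum))
print(wet_days)
```

With the full data set, year and month combinations that never occur would
show up as NA rather than 0 in the tapply() result.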

Does that help?

- Dan




[R] Counting number of rain

2015-09-08 Thread smart hendsome via R-help

Hello R-users,
I want to ask how to count the number of rainy days in daily rain data.  My data are as below:
 Year Month Day Amount 1901 1 1 0 1901 1 2 3 1901 1 3 0 1901 1 4 0.5 1901 1 5 0 
1901 1 6 0  1901 1 7 0.3 1901 1 8 0 1901 1 9 0 1901 1 10 0 1901 1 11 0.5 1901 1 
12 1.8 1901 1 13 0 1901 1 14 0 1901 1 15 2.5 1901 1 16 0 1901 1 17 0 1901 1 18 
0 1901 1 19 0 1901 1 20 0 1901 1 21 0 1901 1 22 0 1901 1 23 0 1901 1 24 0 1901 
1 25 0 1901 1 26 16.5 1901 1 27 0.3 1901 1 28 0 1901 1 29 0 1901 1 30 0 1901 1 
31 0 1901 2 1 0 1901 2 2 0 1901 2 3 0 1901 2 4 0 1901 2 5 0 1901 2 6 0 1901 2 7 
0 1901 2 8 0.3 1901 2 9 0 1901 2 10 0 1901 2 11 0 1901 2 12 1 1901 2 13 0.3 
1901 2 14 0 1901 2 15 0 1901 2 16 0 1901 2 17 0 1901 2 18 0 1901 2 19 0 1901 2 
20 0 1901 2 21 0 1901 2 22 0 1901 2 23 0.3 1901 2 24 0 1901 2 25 0 1901 2 26 
0.3 1901 2 27 0 1901 2 28 0 1901 3 1 0 1901 3 2 0.8 1901 3 3 2.3 1901 3 4 0 
1901 3 5 0 1901 3 6 0 1901 3 7 0 1901 3 8 0 1901 3 9 0 1901 3 10 2 1901 3 11 0 
1901 3 12 0 1901 3 13 0 1901 3 14 0 1901 3 15 0 1901 3 16 0 1901 3 17 0 1901 3 
18 0 1901 3 19 0 1901 3 20 0 1901 3 21 0 1901 3 22 1.5 1901 3 23 1.3 1901 3 24 
0 1901 3 25 0 1901 3 26 0 1901 3 27 0 1901 3 28 0.3 1901 3 29 0.3  1901 3 30 
4.6 1901 3 31 0 1901 4 1 0 1901 4 2 4.6 1901 4 3 30.7 1901 4 4 0 1901 4 5 0 
1901 4 6 0 1901 4 7 0 1901 4 8 0 1901 4 9 0 1901 4 10 0 1901 4 11 0 1901 4 12 0 
1901 4 13 0 1901 4 14 0 1901 4 15 0.3 1901 4 16 1.3 1901 4 17 0 1901 4 18 0 
1901 4 19 0.3 1901 4 20 1 1901 4 21 9.4 1901 4 22 0.5 1901 4 23 0.3 1901 4 24 0 
1901 4 25 0 1901 4 26 0 1901 4 27 0 1901 4 28 0 1901 4 29 0 1901 4 30 0 1901 5 
1 0 1901 5 2 0 1901 5 3 0 1901 5 4 0 1901 5 5 0 1901  5 6 0 1901 5 7 0 1901 5 8 
0.5 1901 5 9 2.3 1901 5 10 0.3 1901 5 11 0 1901 5 12 0 1901 5 13 0 1901 5 14 0 
1901 5 15 0 1901 5 16 0 1901 5 17 0 1901 5 18 0 1901 5 19 0 1901 5 20 0 1901 5 
21 0.5 1901 5 22 0 1901 5 23 0 1901 5 24 0 1901 5 25 0 1901 5 26 4.8 1901 5 27 
10.9 1901 5 28 3.6 1901 5 29 0 1901 5 30 0 1901 5 31 5.1 1901 6 1 0.5 1901 6 2 
0 1901 6 3 2 1901 6 4 0  1901 6 5 10.2 1901 6 6 33.3 1901 6 7 0.3 1901 6 8 0 
1901 6 9 0 1901 6 10 0.5 1901 6 11 0.5 1901 6 12 0.3 1901 6 13 2.8 1901 6 14 
5.6 1901 6 15 0.3 1901 6 16 6.6 1901 6 17 14.2 1901 6 18 4.8  1901 6 19 8.4 
1901 6 20 1.8 1901 6 21 1.8 1901 6 22 0.3 1901 6 23 8.6 1901 6 24 0 1901 6 25 0 
 1901 6 26 0 1901 6 27 0 1901 6 28 0 1901 6 29 0 1901 6 30 0 1901 7 1 0 1901 7 
2 0 1901 7 3 0 1901 7 4 0 1901 7 5 1 1901 7 6 0.5 1901 7 7 0.3 1901 7 8 0.3 
1901 7 9 6.1 1901 7 10 0.3  1901 7 11 1.5 1901 7 12 0 1901 7 13 1.5 1901 7 14 
0.3 1901 7 15 3.3 1901 7 16 2.3 1901 7 17 0.5  1901 7 18 0 1901 7 19 0 1901 7 
20 0 1901 7 21 1.8 1901 7 22 0 1901 7 23 1 1901 7 24 0.3 1901  7 25 0.3 1901 7 
26 1.3 1901 7 27 17 1901 7 28 6.6 1901 7 29 6.1 1901 7 30 0.5 1901 7 31 0.3 
1901 8 1 0 1901 8 2 0 1901 8 3 0 1901 8 4 0 1901 8 5 0 1901 8 6 3.3 1901 8 7 
4.1 1901 8 8 0.3  1901 8 9 0 1901 8 10 0 1901 8 11 0 1901 8 12 0 1901 8 13 0 
1901 8 14 0 1901 8 15 0 1901 8 16 0 1901 8 17 0.5 1901 8 18 0 1901 8 19 0 1901 
8 20 0 1901 8 21 0 1901 8 22 0 1901 8 23 0.3 1901 8 24 1 1901 8 25 0 1901 8 26 
0 1901 8 27 10.2 1901 8 28 1.5 1901 8 29 0.5 1901 8 30 1.3  1901 8 31 0 1901 9 
1 0 1901 9 2 3 1901 9 3 1 1901 9 4 0.5 1901 9 5 0.3 1901 9 6 0 1901 9 7 0 1901 
9 8 2.3 1901 9 9 0.3 1901 9 10 0 1901 9 11 0 1901 9 12 0 1901 9 13 0 1901 9 14 
0  1901 9 15 0 1901 9 16 0 1901 9 17 0 1901 9 18 1.8 1901 9 19 8.1 1901 9 20 
0.3 1901 9 21 5.8 1901 9 22 4.1 1901 9 23 0.3 1901 9 24 1.8 1901 9 25 0 1901 9 
26 0 1901 9 27 0 1901 9 28 0 1901  9 29 1.8 1901 9 30 0.8 1901 10 1 0 1901 10 2 
0 1901 10 3 0 1901 10 4 0 1901 10 5 0.3 1901 10 6 0 1901 10 7 0 1901 10 8 0 
1901 10 9 0 1901 10 10 0 1901 10 11 0.3 1901 10 12 3.8 1901 10 13 0.4 1901 10 
14 9 1901 10 15 2 1901 10 16 1 1901 10 17 0 1901 10 18 0 1901 10 19 0 1901 10 
20 0.3 1901 10 21 0 1901 10 22 0 1901 10 23 0 1901 10 24 0 1901 10 25 0 1901 10 
26 0 1901 10 27 14.5  1901 10 28 6.4 1901 10 29 0.8 1901 10 30 0 1901 10 31 0 
1901 11 1 0 1901 11 2 0 1901 11 3 0  1901 11 4 0 1901 11 5 0 1901 11 6 0 1901 
11 7 0 1901 11 8 0 1901 11 9 0 1901 11 10 0 1901 11 11 0 1901 11 12 5.1 1901 11 
13 0.3 1901 11 14 5.8 1901 11 15 0 1901 11 16 0 1901 11 17 1 1901 11 18 0.5 
1901 11 19 0 1901 11 20 0 1901 11 21 0 1901 11 22 0 1901 11 23 0 1901 11 24 0 
1901 11 25 0.3 1901 11 26 0 1901 11 27 0 1901 11 28 0 1901 11 29 0 1901 11 30 
3.3 1901 12 1 0 1901 12 2 0 1901 12 3 0 1901 12 4 0 1901 12 5 0 1901 12 6 0 
1901 12 7 0 1901 12 8 0 1901 12 9 0 1901 12 10 0 1901 12 11 0 1901 12 12 0 1901 
12 13 0 1901 12 14 0 1901 12 15 0 1901 12 16 0 1901 12 17 0 1901 12 18 0 1901 
12 19 0 1901 12 20 0 1901 12 21 6.1 1901 12 22 5.6 1901 12 23 0 1901 12 24 0 
1901 12 25 0 1901 12 26 0 1901 12 27 0 1901 12 28 0 1901 12 29 0 1901 12 30 0 
1901 12 31 9.9 1902 1 1 0 1902 1 2 0 1902 1 3 0 1902 1 4 4.1 1902 1 5 0 1902 1 
6 0 1902 1 7 0 1902 1 8 0 1902 1 9 2.5 1902 1 10 0 1902 1 11 0 1902 1 12 0 1902 
1 13 0.3 1902 1 14 0 1902 1 15 0 1902 1 16 0 1902 1 17 0 1902 1 18 0 19

Re: [R] counting similar strings in data.frame

2015-06-26 Thread PIKAL Petr

OK. I do not have a canned solution for you, but

temp <- apply(test,1, table)

gives you the number of occurrences of each string in each row. From this it 
should be possible to extract the names and the counts:

lapply(temp, function(x) x[x>1])

[[1]]
four
   2

[[2]]

one two
  2   2

[[3]]
three
    2

[[4]]
four
   3

Here you have numbers and strings and you need to combine them. However, I am 
not sure how. If you want to use them for further computation, a list 
structure may be as good as a data.frame with many NAs.
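
One possible way to combine them is sketched below. It is only an illustration 
under assumptions: the helper name row_summary is made up for this example, the 
strings are kept as character, and "first" and "second" double are taken to mean 
order of first appearance within the row. Four identical strings in one row are 
not handled.

```r
# Knut's example data (strings kept as character)
test <- data.frame(first  = c("seven", "two", "five", "four"),
                   second = c("three", "one", "three", "one"),
                   third  = c("four", "two", "three", "four"),
                   fourth = c("four", "one", "one", "four"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: summarise one row as double1, double2, triple
row_summary <- function(x) {
  # unique() keeps the strings in order of first appearance in the row
  lev <- unique(x)
  cnt <- table(factor(x, levels = lev))
  doubles <- names(cnt)[cnt == 2]
  triples <- names(cnt)[cnt == 3]
  # indexing past the end of a vector yields NA, which fills the gaps
  c(double1 = doubles[1], double2 = doubles[2], triple = triples[1])
}

count <- as.data.frame(t(apply(test, 1, row_summary)),
                       stringsAsFactors = FALSE)
count
```

Unlike the count data.frame in the question, this uses real NA values rather 
than the string "NA".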

Cheers
Petr


> -Original Message-
> From: Knut Krueger [mailto:r...@knut-krueger.de]
> Sent: Friday, June 26, 2015 12:50 PM
> To: PIKAL Petr; r-h...@stat.math.ethz.ch
> Subject: Re: [R] counting similar strings in data.frame
>
> On 26.06.2015 at 10:38, PIKAL Petr wrote:
> > Hi
> >
> > I am a little bit lost in your logic. Why is triple in your fourth line
> "one"? I expected it to be "four".
> >
> > Petr
> Sorry yes you are right ...
>
> type mismatch
> Knut
>
>
>
>



This e-mail and any documents attached to it may be confidential and are 
intended only for its intended recipients.
If you received this e-mail by mistake, please immediately inform its sender. 
Delete the contents of this e-mail with all attachments and its copies from 
your system.
If you are not the intended recipient of this e-mail, you are not authorized to 
use, disseminate, copy or disclose this e-mail in any manner.
The sender of this e-mail shall not be liable for any possible damage caused by 
modifications of the e-mail or by delay with transfer of the email.

In case that this e-mail forms part of business dealings:
- the sender reserves the right to end negotiations about entering into a 
contract in any time, for any reason, and without stating any reasoning.
- if the e-mail contains an offer, the recipient is entitled to immediately 
accept such offer; The sender of this e-mail (offer) excludes any acceptance of 
the offer on the part of the recipient containing any amendment or variation.
- the sender insists on that the respective contract is concluded only upon an 
express mutual agreement on all its aspects.
- the sender of this e-mail informs that he/she is not authorized to enter into 
any contracts on behalf of the company except for cases in which he/she is 
expressly authorized to do so in writing, and such authorization or power of 
attorney is submitted to the recipient or the person represented by the 
recipient, or the existence of such authorization is known to the recipient of 
the person represented by the recipient.

Re: [R] counting similar strings in data.frame

2015-06-26 Thread Knut Krueger


On 26.06.2015 at 10:38, PIKAL Petr wrote:

Hi

I am a little bit lost in your logic. Why is triple in your fourth line "one"? 
I expected it to be "four".

Petr

Sorry yes you are right ...

type mismatch
Knut


Re: [R] counting similar strings in data.frame

2015-06-26 Thread Knut Krueger


Sorry last count was wrong ...

test =data.frame("first"=c("seven","two","five","four"),
 "second"=c("three","one","three","one"),
 "third"=c("four","two","three","four"),
 "fourth"=c("four","one","one","four"))

count =data.frame("double1"=c("four","two","three","NA"),
 "double2"=c("NA","one","NA","NA"),
 "triple"=c("NA","NA","NA","four"))


Re: [R] counting similar strings in data.frame

2015-06-26 Thread PIKAL Petr

Hi

I am a little bit lost in your logic. Why is triple in your fourth line "one"? 
I expected it to be "four".

Petr

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Knut
> Krueger
> Sent: Friday, June 26, 2015 10:10 AM
> To: r-h...@stat.math.ethz.ch
> Subject: [R] counting similar strings in data.frame
>
> Dear Members,
>
> Is there a better way than loops to count how often each string occurs
> within a row, in order to get the count data.frame below?
>
> test =data.frame("first"=c("seven","two","five","four"),
>   "second"=c("three","one","three","one"),
>   "third"=c("four","two","three","four"),
>   "fourth"=c("four","one","one","four"))
>
>
>
> count =data.frame("double1"=c("four","two","three","NA"),
>   "double2"=c("NA","one","NA","NA"),
>   "triple"=c("NA","NA","NA","one"))
>
>
> double1: first double occurrence in row  (NA if triple available)
> double2: second double occurrence in row (NA if triple available or if
> there is only one double)
> triple: triple occurrence in row (NA if a double available)
>
>
> Kind regards Knut
>



[R] counting similar strings in data.frame

2015-06-26 Thread Knut Krueger


Dear Members,

Is there a better way than loops to count how often each string occurs 
within a row, in order to get the count data.frame below?


test =data.frame("first"=c("seven","two","five","four"),
 "second"=c("three","one","three","one"),
 "third"=c("four","two","three","four"),
 "fourth"=c("four","one","one","four"))



count =data.frame("double1"=c("four","two","three","NA"),
 "double2"=c("NA","one","NA","NA"),
 "triple"=c("NA","NA","NA","one"))


double1: first double occurrence in row  (NA if triple available)
double2: second double occurrence in row (NA if triple available or if 
there is only one double)

triple: triple occurrence in row (NA if a double available)


Kind regards Knut


Re: [R] Counting consecutive events in R

2015-05-14 Thread Johannes Huesing


I normally use rle() for these problems, see ?rle.

For instance,

k <- rbinom(999, 1, .5)
series <- function(run) {
  r <- rle(run)
  # indices (within the rle output) of runs of 1s of length 5 or more
  which(r$lengths >= 5 & r$values == 1)
}
series(k)

returns the indices, within the rle() output, of the runs of 1s that have 
length 5 or longer.
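
Those indices refer to runs, not to positions in the original vector. If the 
positions are wanted, they can be recovered from the cumulative run lengths. 
A minimal sketch, assuming a 0/1 input; the helper name run_positions and its 
min_len argument are made up for this illustration:

```r
# Hypothetical helper: start/end positions of runs of 1s with length >= min_len
run_positions <- function(x, min_len = 5) {
  r <- rle(x)
  ends   <- cumsum(r$lengths)        # last position of every run
  starts <- ends - r$lengths + 1L    # first position of every run
  keep   <- r$lengths >= min_len & r$values == 1
  data.frame(start = starts[keep], end = ends[keep], length = r$lengths[keep])
}

x <- c(0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1)
res <- run_positions(x)
res
```

Here the two qualifying runs occupy positions 2 to 6 and 11 to 16.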
 


Abhinaba Roy  [Thu, May 14, 2015 at 02:16:31PM CEST]:

Hi,

I have the following dataframe

structure(list(Type = c("QRS", "QRS", "QRS", "QRS", "QRS", "QRS",
"QRS", "QRS", "QRS", "QRS", "QRS", "QRS", "RR", "RR", "RR", "PP",
"PP", "PP", "PP", "PP", "PP", "PP", "PP", "PP", "QTc", "QTc",
"QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc",
"QTc", "QTc", "QTc", "QTc"), Time_Point_Start = c("2015-04-01 14:57:15.0.0312",
"2015-04-01 14:57:15.0.7839", "2015-04-01 14:57:16.0.5343",
"2015-04-01 14:57:17.0.2573",
"2015-04-01 14:57:18.0.0234", "2015-04-01 14:57:18.0.7722",
"2015-04-01 14:57:19.0.5265",
"2015-04-01 14:57:24.0.0195", "2015-04-01 14:57:24.0.7839",
"2015-04-01 14:57:25.0.5343",
"2015-04-01 14:57:26.0.2768", "2015-04-01 14:57:27.0.0273",
"2015-04-01 14:58:03.0.0702",
"2015-04-01 14:58:03.0.8190", "2015-04-01 14:58:04.0.5694",
"2015-04-01 14:57:58.0.4134",
"2015-04-01 14:57:59.0.1637", "2015-04-01 14:57:59.0.9126",
"2015-04-01 14:58:00.0.6630",
"2015-04-01 14:58:01.0.4134", "2015-04-01 14:58:02.0.1637",
"2015-04-01 14:58:02.0.9126",
"2015-04-01 14:58:03.0.6630", "2015-04-01 14:58:04.0.4134",
"2015-04-01 14:57:07.0.4212",
"2015-04-01 14:57:08.0.1715", "2015-04-01 14:57:08.0.9204",
"2015-04-01 14:57:09.0.6864",
"2015-04-01 14:57:10.0.4368", "2015-04-01 14:57:11.0.1871",
"2015-04-01 14:57:11.0.9360",
"2015-04-01 14:57:12.0.6591", "2015-04-01 14:57:13.0.4251",
"2015-04-01 14:57:14.0.1754",
"2015-04-01 14:57:14.0.9243", "2015-04-01 14:57:15.0.6903",
"2015-04-01 14:57:16.0.4407",
"2015-04-01 14:57:17.0.1676", "2015-04-01 14:57:17.0.9321"),
   Time_Point_End = c("2015-04-01 14:57:15.0.0858", "2015-04-01
14:57:15.0.8346",
   "2015-04-01 14:57:16.0.6006", "2015-04-01 14:57:17.0.0351",
   "2015-04-01 14:57:18.0.1403", "2015-04-01 14:57:18.0.8385",
   "2015-04-01 14:57:19.0.5889", "2015-04-01 14:57:24.0.0858",
   "2015-04-01 14:57:24.0.8346", "2015-04-01 14:57:25.0.5772",
   "2015-04-01 14:57:26.0.3939", "2015-04-01 14:57:27.0.0936",
   "2015-04-01 14:58:03.0.8190", "2015-04-01 14:58:04.0.5694",
   "2015-04-01 14:58:05.0.3197", "2015-04-01 14:57:59.0.1637",
   "2015-04-01 14:57:59.0.9126", "2015-04-01 14:58:00.0.6630",
   "2015-04-01 14:58:01.0.4134", "2015-04-01 14:58:02.0.1637",
   "2015-04-01 14:58:02.0.9126", "2015-04-01 14:58:03.0.6630",
   "2015-04-01 14:58:04.0.4134", "2015-04-01 14:58:05.0.1793",
   "2015-04-01 14:57:07.0.8775", "2015-04-01 14:57:08.0.6435",
   "2015-04-01 14:57:09.0.3705", "2015-04-01 14:57:10.0.1209",
   "2015-04-01 14:57:10.0.8697", "2015-04-01 14:57:11.0.6201",
   "2015-04-01 14:57:12.0.3861", "2015-04-01 14:57:13.0.1364",
   "2015-04-01 14:57:13.0.8853", "2015-04-01 14:57:14.0.6513",
   "2015-04-01 14:57:15.0.4017", "2015-04-01 14:57:16.0.1248",
   "2015-04-01 14:57:16.0.9165", "2015-04-01 14:57:17.0.6162",
   "2015-04-01 14:57:18.0.3900"), Value = c(0.0546, 0.0507,
   0.0663, 0.0936, 0.117, 0.0663, 0.0624, 0.0663, 0.0507, 0.0429,
   0.117, 0.0663, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488,
   0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7644, 0.033103481,
   0.034056449, 0.032367699, 0.031000613, 0.031405867, 0.031241866,
   0.032367699, 0.034337907, 0.033125921, 0.034337907, 0.034337907,
   0.031241866, 0.034337907, 0.032367699, 0.032930616), Score = c(0L,
   0L, 0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L,
   0L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
   3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), Type_Desc = c(NA, NA, NA,
   NA, 1L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, 1L,
   1L, 1L, 1L, 1L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
   0L, 0L, 0L, 0L, 0L, 0L), Pat_id = c(4L, 4L, 4L, 4L, 4L, 4L,
   4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
   4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
   4L, 4L, 4L)), .Names = c("Type", "Time_Point_Start", "Time_Point_End",
"Value", "Score", "Type_Desc", "Pat_id"), class = "data.frame",
row.names = c(NA,
-39L))


For each unique value in column 'Type' , I want to check for
consecutive 5 rows (if any) of 'Score' > 0.

Now, if there are five consecutive rows with Score > 0 and 'Type_Desc'
= 0, then we print "Type_low" , else if

'Type_Desc' = 1, we print "Type_h

Re: [R] Counting consecutive events in R

2015-05-14 Thread Sarah Goslee

Assuming I understand the problem correctly, you want to check for
runs of at least length five where both Score and Type_Desc assume
particular values. You don't care where they are or what other data
are associated, you just want to know if at least one such run exists
in your data frame.

Here's a function that does that:


checkruns <- function(testdata) {

  # 1 where Score > 0 and Type_Desc is 1; rows with missing Type_Desc give 0
  test1 <- ifelse(testdata$Score > 0 & testdata$Type_Desc == 1 &
                  !is.na(testdata$Type_Desc), 1, 0)
  # 1 where Score > 0 and Type_Desc is 0
  test0 <- ifelse(testdata$Score > 0 & testdata$Type_Desc == 0 &
                  !is.na(testdata$Type_Desc), 1, 0)

  # run-length encode both indicator vectors
  test1.rle <- rle(test1)
  test0.rle <- rle(test0)

  # report whether any run of 1s has length 5 or more
  if(any(test1.rle$lengths >= 5 & test1.rle$values == 1))
    cat("Type_high\n")
  if(any(test0.rle$lengths >= 5 & test0.rle$values == 1))
    cat("Type_low\n")

  invisible()
}
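
The function above looks at the data frame as a whole. If the check is wanted 
separately for each Type, with the Type name in the label as in the original 
question, something along the following lines might work. This is only a 
sketch: label_type is a made-up helper name, and dat is a reduced copy of the 
posted data holding just the three columns the check needs.

```r
# Reduced copy of the posted data: only the columns used by the check
dat <- data.frame(
  Type      = rep(c("QRS", "RR", "PP", "QTc"), times = c(12, 3, 9, 15)),
  Score     = c(0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 3, 0,   # QRS
                0, 0, 0,                              # RR
                0, 0, 2, 2, 2, 2, 2, 0, 0,            # PP
                rep(3, 15)),                          # QTc
  Type_Desc = c(NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, 1, NA,
                NA, NA, NA,
                NA, NA, 1, 1, 1, 1, 1, NA, NA,
                rep(0, 15)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label for one Type, or NA if no qualifying run exists
label_type <- function(d, min_len = 5) {
  ok <- d$Score > 0 & !is.na(d$Type_Desc)
  # Encode each row as "high", "low" or "none" before run-length encoding
  state <- rep("none", nrow(d))
  state[which(ok & d$Type_Desc == 1)] <- "high"
  state[which(ok & d$Type_Desc == 0)] <- "low"
  r <- rle(state)
  hit <- which(r$lengths >= min_len & r$values != "none")
  if (length(hit) == 0) return(NA_character_)
  # Report only the first qualifying run, as the question asks
  paste(d$Type[1], r$values[hit[1]], sep = "_")
}

res <- vapply(split(dat, dat$Type), label_type, character(1))
res[!is.na(res)]
```

For the posted data this yields the two labels from the question, PP_high and 
QTc_low; QRS and RR have no run of five rows with Score > 0.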

Sarah


On Thu, May 14, 2015 at 8:16 AM, Abhinaba Roy  wrote:
> Hi,
>
> I have the following dataframe
>
> structure(list(Type = c("QRS", "QRS", "QRS", "QRS", "QRS", "QRS",
> "QRS", "QRS", "QRS", "QRS", "QRS", "QRS", "RR", "RR", "RR", "PP",
> "PP", "PP", "PP", "PP", "PP", "PP", "PP", "PP", "QTc", "QTc",
> "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc",
> "QTc", "QTc", "QTc", "QTc"), Time_Point_Start = c("2015-04-01 
> 14:57:15.0.0312",
> "2015-04-01 14:57:15.0.7839", "2015-04-01 14:57:16.0.5343",
> "2015-04-01 14:57:17.0.2573",
> "2015-04-01 14:57:18.0.0234", "2015-04-01 14:57:18.0.7722",
> "2015-04-01 14:57:19.0.5265",
> "2015-04-01 14:57:24.0.0195", "2015-04-01 14:57:24.0.7839",
> "2015-04-01 14:57:25.0.5343",
> "2015-04-01 14:57:26.0.2768", "2015-04-01 14:57:27.0.0273",
> "2015-04-01 14:58:03.0.0702",
> "2015-04-01 14:58:03.0.8190", "2015-04-01 14:58:04.0.5694",
> "2015-04-01 14:57:58.0.4134",
> "2015-04-01 14:57:59.0.1637", "2015-04-01 14:57:59.0.9126",
> "2015-04-01 14:58:00.0.6630",
> "2015-04-01 14:58:01.0.4134", "2015-04-01 14:58:02.0.1637",
> "2015-04-01 14:58:02.0.9126",
> "2015-04-01 14:58:03.0.6630", "2015-04-01 14:58:04.0.4134",
> "2015-04-01 14:57:07.0.4212",
> "2015-04-01 14:57:08.0.1715", "2015-04-01 14:57:08.0.9204",
> "2015-04-01 14:57:09.0.6864",
> "2015-04-01 14:57:10.0.4368", "2015-04-01 14:57:11.0.1871",
> "2015-04-01 14:57:11.0.9360",
> "2015-04-01 14:57:12.0.6591", "2015-04-01 14:57:13.0.4251",
> "2015-04-01 14:57:14.0.1754",
> "2015-04-01 14:57:14.0.9243", "2015-04-01 14:57:15.0.6903",
> "2015-04-01 14:57:16.0.4407",
> "2015-04-01 14:57:17.0.1676", "2015-04-01 14:57:17.0.9321"),
> Time_Point_End = c("2015-04-01 14:57:15.0.0858", "2015-04-01
> 14:57:15.0.8346",
> "2015-04-01 14:57:16.0.6006", "2015-04-01 14:57:17.0.0351",
> "2015-04-01 14:57:18.0.1403", "2015-04-01 14:57:18.0.8385",
> "2015-04-01 14:57:19.0.5889", "2015-04-01 14:57:24.0.0858",
> "2015-04-01 14:57:24.0.8346", "2015-04-01 14:57:25.0.5772",
> "2015-04-01 14:57:26.0.3939", "2015-04-01 14:57:27.0.0936",
> "2015-04-01 14:58:03.0.8190", "2015-04-01 14:58:04.0.5694",
> "2015-04-01 14:58:05.0.3197", "2015-04-01 14:57:59.0.1637",
> "2015-04-01 14:57:59.0.9126", "2015-04-01 14:58:00.0.6630",
> "2015-04-01 14:58:01.0.4134", "2015-04-01 14:58:02.0.1637",
> "2015-04-01 14:58:02.0.9126", "2015-04-01 14:58:03.0.6630",
> "2015-04-01 14:58:04.0.4134", "2015-04-01 14:58:05.0.1793",
> "2015-04-01 14:57:07.0.8775", "2015-04-01 14:57:08.0.6435",
> "2015-04-01 14:57:09.0.3705", "2015-04-01 14:57:10.0.1209",
> "2015-04-01 14:57:10.0.8697", "2015-04-01 14:57:11.0.6201",
> "2015-04-01 14:57:12.0.3861", "2015-04-01 14:57:13.0.1364",
> "2015-04-01 14:57:13.0.8853", "2015-04-01 14:57:14.0.6513",
> "2015-04-01 14:57:15.0.4017", "2015-04-01 14:57:16.0.1248",
> "2015-04-01 14:57:16.0.9165", "2015-04-01 14:57:17.0.6162",
> "2015-04-01 14:57:18.0.3900"), Value = c(0.0546, 0.0507,
> 0.0663, 0.0936, 0.117, 0.0663, 0.0624, 0.0663, 0.0507, 0.0429,
> 0.117, 0.0663, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488,
> 0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7644, 0.033103481,
> 0.034056449, 0.032367699, 0.031000613, 0.031405867, 0.031241866,
> 0.032367699, 0.034337907, 0.033125921, 0.034337907, 0.034337907,
> 0.031241866, 0.034337907, 0.032367699, 0.032930616), Score = c(0L,
> 0L, 0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L,
> 0L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), Type_Desc = c(NA, NA, NA,
> NA, 1L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, 1L,
> 1L, 1L, 1L, 1L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
> 0L, 0L, 0L, 0L, 0L, 0L), Pat_id = c(4L, 4L, 4L, 4L, 4L, 4L,
> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
> 4L, 4L, 4L)), .Names = c("Type", "Time_Point_Start", "Time_Point_End",
> "Value", "Score", "Type_Desc", "Pat_id"), class = "data.frame",
> row.names = c(NA,
> -39L))
>
>
> For each unique value in column 'Type' , I want to check for

[R] Counting consecutive events in R

2015-05-14 Thread Abhinaba Roy

Hi,

I have the following dataframe

structure(list(Type = c("QRS", "QRS", "QRS", "QRS", "QRS", "QRS",
"QRS", "QRS", "QRS", "QRS", "QRS", "QRS", "RR", "RR", "RR", "PP",
"PP", "PP", "PP", "PP", "PP", "PP", "PP", "PP", "QTc", "QTc",
"QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc", "QTc",
"QTc", "QTc", "QTc", "QTc"), Time_Point_Start = c("2015-04-01 14:57:15.0.0312",
"2015-04-01 14:57:15.0.7839", "2015-04-01 14:57:16.0.5343",
"2015-04-01 14:57:17.0.2573",
"2015-04-01 14:57:18.0.0234", "2015-04-01 14:57:18.0.7722",
"2015-04-01 14:57:19.0.5265",
"2015-04-01 14:57:24.0.0195", "2015-04-01 14:57:24.0.7839",
"2015-04-01 14:57:25.0.5343",
"2015-04-01 14:57:26.0.2768", "2015-04-01 14:57:27.0.0273",
"2015-04-01 14:58:03.0.0702",
"2015-04-01 14:58:03.0.8190", "2015-04-01 14:58:04.0.5694",
"2015-04-01 14:57:58.0.4134",
"2015-04-01 14:57:59.0.1637", "2015-04-01 14:57:59.0.9126",
"2015-04-01 14:58:00.0.6630",
"2015-04-01 14:58:01.0.4134", "2015-04-01 14:58:02.0.1637",
"2015-04-01 14:58:02.0.9126",
"2015-04-01 14:58:03.0.6630", "2015-04-01 14:58:04.0.4134",
"2015-04-01 14:57:07.0.4212",
"2015-04-01 14:57:08.0.1715", "2015-04-01 14:57:08.0.9204",
"2015-04-01 14:57:09.0.6864",
"2015-04-01 14:57:10.0.4368", "2015-04-01 14:57:11.0.1871",
"2015-04-01 14:57:11.0.9360",
"2015-04-01 14:57:12.0.6591", "2015-04-01 14:57:13.0.4251",
"2015-04-01 14:57:14.0.1754",
"2015-04-01 14:57:14.0.9243", "2015-04-01 14:57:15.0.6903",
"2015-04-01 14:57:16.0.4407",
"2015-04-01 14:57:17.0.1676", "2015-04-01 14:57:17.0.9321"),
Time_Point_End = c("2015-04-01 14:57:15.0.0858", "2015-04-01
14:57:15.0.8346",
"2015-04-01 14:57:16.0.6006", "2015-04-01 14:57:17.0.0351",
"2015-04-01 14:57:18.0.1403", "2015-04-01 14:57:18.0.8385",
"2015-04-01 14:57:19.0.5889", "2015-04-01 14:57:24.0.0858",
"2015-04-01 14:57:24.0.8346", "2015-04-01 14:57:25.0.5772",
"2015-04-01 14:57:26.0.3939", "2015-04-01 14:57:27.0.0936",
"2015-04-01 14:58:03.0.8190", "2015-04-01 14:58:04.0.5694",
"2015-04-01 14:58:05.0.3197", "2015-04-01 14:57:59.0.1637",
"2015-04-01 14:57:59.0.9126", "2015-04-01 14:58:00.0.6630",
"2015-04-01 14:58:01.0.4134", "2015-04-01 14:58:02.0.1637",
"2015-04-01 14:58:02.0.9126", "2015-04-01 14:58:03.0.6630",
"2015-04-01 14:58:04.0.4134", "2015-04-01 14:58:05.0.1793",
"2015-04-01 14:57:07.0.8775", "2015-04-01 14:57:08.0.6435",
"2015-04-01 14:57:09.0.3705", "2015-04-01 14:57:10.0.1209",
"2015-04-01 14:57:10.0.8697", "2015-04-01 14:57:11.0.6201",
"2015-04-01 14:57:12.0.3861", "2015-04-01 14:57:13.0.1364",
"2015-04-01 14:57:13.0.8853", "2015-04-01 14:57:14.0.6513",
"2015-04-01 14:57:15.0.4017", "2015-04-01 14:57:16.0.1248",
"2015-04-01 14:57:16.0.9165", "2015-04-01 14:57:17.0.6162",
"2015-04-01 14:57:18.0.3900"), Value = c(0.0546, 0.0507,
0.0663, 0.0936, 0.117, 0.0663, 0.0624, 0.0663, 0.0507, 0.0429,
0.117, 0.0663, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7488,
0.7488, 0.7488, 0.7488, 0.7488, 0.7488, 0.7644, 0.033103481,
0.034056449, 0.032367699, 0.031000613, 0.031405867, 0.031241866,
0.032367699, 0.034337907, 0.033125921, 0.034337907, 0.034337907,
0.031241866, 0.034337907, 0.032367699, 0.032930616), Score = c(0L,
0L, 0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L,
0L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), Type_Desc = c(NA, NA, NA,
NA, 1L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, 1L,
1L, 1L, 1L, 1L, NA, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L), Pat_id = c(4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L)), .Names = c("Type", "Time_Point_Start", "Time_Point_End",
"Value", "Score", "Type_Desc", "Pat_id"), class = "data.frame",
row.names = c(NA,
-39L))


For each unique value in column 'Type', I want to check for 5
consecutive rows (if any) with 'Score' > 0.

If there are five consecutive rows with Score > 0 and 'Type_Desc' = 0,
then we print "Type_low"; else, if 'Type_Desc' = 1, we print
"Type_high". The search should end once 5 consecutive rows have been
found.

So, for this data frame we will have two statements as follows:

1. PP_high
   (reason: 5 consecutive rows with Score > 0 and 'Type_Desc' = 1)

2. QTc_low
   (reason: 5 consecutive rows with Score > 0 and 'Type_Desc' = 0)

How can this problem be tackled in R?

Thanks,

Abhinaba

[[alternative HTML version deleted]]


Re: [R] Counting Words

2015-01-22 Thread bgnumis bgnum

That's perfect. Many thanks for your much-appreciated help.
On 22/01/2015 19:50, "Chel Hee Lee" wrote:

> > x <- c("hola mundo mundo");
> > table(unlist(strsplit(x, " ")))
>
>  hola mundo
>     1     2
> >
>
> Is this what you are looking for?  I hope this helps.
>
> Chel Hee Lee
>
> On 1/22/2015 8:25 AM, bgnumis bgnum wrote:
>
>> Hi all,
>>
>> I want to count the different words in a text.
>>
>> For example, if the text is "hola mundo mundo", the program should count:
>>
>> hola 1
>> mundo 2
>>
>> Is it possible that R or CRAN has a function for this?
>>
>> [[alternative HTML version deleted]]
>>
>>
>>

[[alternative HTML version deleted]]


Re: [R] Counting Words

2015-01-22 Thread MacQueen, Don

In addition to the other suggestions, which are fine for your simple
example, I would take a trip to the CRAN Task View "Natural Language
Processing", and see if there's anything there.

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 1/22/15, 6:25 AM, "bgnumis bgnum"  wrote:

>Hi all,
>
>I want to count the different words in a text.
>
>For example, if the text is "hola mundo mundo", the program should count:
>
>hola 1
>mundo 2
>
>Is it possible that R or CRAN has a function for this?
>
>   [[alternative HTML version deleted]]
>


Re: [R] Counting Words

2015-01-22 Thread Ista Zahn

table(strsplit("hola mundo mundo", " ")[[1]])

On Thu, Jan 22, 2015 at 9:25 AM, bgnumis bgnum  wrote:
> Hi all,
>
> I want to count the different words in a text.
>
> For example, if the text is "hola mundo mundo", the program should count:
>
> hola 1
> mundo 2
>
> Is it possible that R or CRAN has a function for this?
>
> [[alternative HTML version deleted]]
>


Re: [R] Counting Words

2015-01-22 Thread Chel Hee Lee

> x <- c("hola mundo mundo");
> table(unlist(strsplit(x, " ")))

 hola mundo
    1     2
>

Is this what you are looking for?  I hope this helps.
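
For text with capital letters or punctuation, splitting on a single space is 
not quite enough. A possible base R variant is sketched below; count_words is 
a made-up helper name, and what counts as a letter under [[:alpha:]] depends 
on the locale, so accented words may need extra care.

```r
# Hypothetical helper: case-insensitive word frequencies, most frequent first
count_words <- function(text) {
  # Split on any run of characters that are not letters
  words <- unlist(strsplit(tolower(text), "[^[:alpha:]]+"))
  words <- words[nzchar(words)]   # drop empty strings from leading separators
  sort(table(words), decreasing = TRUE)
}

res <- count_words("Hola mundo, mundo. HOLA mundo!")
res
```

Here "mundo" is counted three times and "hola" twice, regardless of case and 
punctuation.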

Chel Hee Lee

On 1/22/2015 8:25 AM, bgnumis bgnum wrote:

Hi all,

I want to count the different words in a text.

For example, if the text is "hola mundo mundo", the program should count:

hola 1
mundo 2

Is it possible that R or CRAN has a function for this?

[[alternative HTML version deleted]]



[R] Counting Words

2015-01-22 Thread bgnumis bgnum

Hi all,

I want to count the different words in a text.

For example, if the text is "hola mundo mundo", the program should count:

hola 1
mundo 2

Is it possible that R or CRAN has a function for this?

[[alternative HTML version deleted]]


Re: [R] counting sets of consecutive integers in a vector

2015-01-04 Thread Mike Miller

Thanks, Peter.  Why not cbind your idea for the first column with my idea 
for the second column and get it done in one line?:


v <- c(1,2,5,6,7,8,25,30,31,32,33)
M <- cbind( v[ c(1, which( diff(v) !=1 ) + 1 ) ] ,
            rle( v - 1:length(v) )$lengths )
M

     [,1] [,2]
[1,]    1    2
[2,]    5    4
[3,]   25    1
[4,]   30    4
I find that pretty appealing and I'll probably stick with it.  It seems 
quite fast.  Here's an example:


# make fairly long vector
v <- sort(unique(round(10*runif(10
length(v)
[1] 63274

# time the procedure:
ptm <- proc.time() ; M <- cbind( v[ c(1, which( diff(v) !=1 ) + 1 ) ] ,
  rle( v - 1:length(v) )$lengths ) ; proc.time() - ptm
   user  system elapsed
   0.03    0.00    0.03

dim(M)
[1] 23212     2

I probably won't be using vectors any longer than that, and this isn't the 
kind of thing that I do over and over again, so that speed is excellent.


Mike



On Mon, 5 Jan 2015, Peter Alspach wrote:


Tena koe Mike

An alternative, which is slightly faster:

 diffv <- diff(v)
 starts <- c(1, which(diffv!=1)+1)
 cbind(v[starts], c(diff(starts), length(v)-starts[length(starts)]+1))

Peter Alspach
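
The two ideas in this thread can be wrapped in a small function and checked 
against each other. The following is a sketch only; the function name 
consecutive_runs and the column names start and n are made up here, and the 
input is assumed to be sorted, unique and non-empty.

```r
# Hypothetical helper combining the approaches from this thread
consecutive_runs <- function(v) {
  stopifnot(length(v) > 0)
  starts <- c(1, which(diff(v) != 1) + 1)   # index where each run begins
  lens   <- diff(c(starts, length(v) + 1))  # distance to the next run start
  cbind(start = v[starts], n = lens)
}

v <- c(1, 2, 5, 6, 7, 8, 25, 30, 31, 32, 33)
m <- consecutive_runs(v)
m

# Cross-check the run lengths against the rle() formulation
stopifnot(all(m[, "n"] == rle(v - seq_along(v))$lengths))
```

Computing the lengths with diff() over the start indices avoids a second pass 
with rle(), and a single-element vector gives one run of length 1.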

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Mike Miller
Sent: Monday, 5 January 2015 1:03 p.m.
To: R-Help List
Subject: [R] counting sets of consecutive integers in a vector

I have a vector of sorted positive integer values (e.g., positive integers after 
applying sort() and unique()).  For example, this:

c(1,2,5,6,7,8,25,30,31,32,33)

I want to make a matrix from that vector that has two columns: (1) the first 
value in every run of consecutive integer values, and (2) the corresponding 
number of consecutive values.  For example:

c(1:20) would become this...

1  20

...because there are 20 consecutive integers beginning with 1 and
c(1,2,5,6,7,8,25,30,31,32,33) would become

1  2
5  4
25 1
30 4

What would be the best way to accomplish this?  Here is my first effort:

v <- c(1,2,5,6,7,8,25,30,31,32,33)
L <- rle( v - 1:length(v) )$lengths
n <- length( L )
matrix( c( v[ c( 1, cumsum(L)+1 ) ][1:n], L), nrow=n)

     [,1] [,2]
[1,]    1    2
[2,]    5    4
[3,]   25    1
[4,]   30    4

I suppose that works well enough, but there may be a better way, and besides, I 
wouldn't want to deny anyone here the opportunity to solve a fun puzzle.  ;-)

The use for this is that I will be doing repeated seeks of a binary file to 
extract data.  seek() gives the starting point and readBin(n=X) gives the 
number of bytes to read.  So when there are many consecutive variables to be 
read, I can multiply the X in n=X by that number instead of doing many 
different seek() calls.  (The data are in a transposed format where I read in 
every record for some variable as sequential elements.)  I'm probably not the 
first person to deal with this.

Best,

Mike

--
Michael B. Miller, Ph.D.
University of Minnesota
http://scholar.google.com/citations?user=EV_phq4J


Re: [R] counting sets of consecutive integers in a vector

2015-01-04 Thread jim holtman

Here is a solution using data.table

> require(data.table)
> x <- data.table(v, diff = cumsum(c(1, diff(v)) != 1))
> x
     v diff
 1:  1    0
 2:  2    0
 3:  5    1
 4:  6    1
 5:  7    1
 6:  8    1
 7: 25    2
 8: 30    3
 9: 31    3
10: 32    3
11: 33    3
> x[, list(value = v[1L], length = .N), key = 'diff']
   diff value length
1:    0     1      2
2:    1     5      4
3:    2    25      1
4:    3    30      4
> x[, list(value = v[1L], length = .N), key = 'diff'][, -1, with = FALSE]  # get rid of 'diff' column
   value length
1:     1      2
2:     5      4
3:    25      1
4:    30      4


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Jan 4, 2015 at 7:03 PM, Mike Miller  wrote:

> I have a vector of sorted positive integer values (e.g., positive integers
> after applying sort() and unique()).  For example, this:
>
> c(1,2,5,6,7,8,25,30,31,32,33)
>
> I want to make a matrix from that vector that has two columns: (1) the
> first value in every run of consecutive integer values, and (2) the
> corresponding number of consecutive values.  For example:
>
> c(1:20) would become this...
>
> 1  20
>
> ...because there are 20 consecutive integers beginning with 1 and
> c(1,2,5,6,7,8,25,30,31,32,33) would become
>
> 1  2
> 5  4
> 25 1
> 30 4
>
> What would be the best way to accomplish this?  Here is my first effort:
>
> v <- c(1,2,5,6,7,8,25,30,31,32,33)
> L <- rle( v - 1:length(v) )$lengths
> n <- length( L )
> matrix( c( v[ c( 1, cumsum(L)+1 ) ][1:n], L), nrow=n)
>
>      [,1] [,2]
> [1,]    1    2
> [2,]    5    4
> [3,]   25    1
> [4,]   30    4
>
> I suppose that works well enough, but there may be a better way, and
> besides, I wouldn't want to deny anyone here the opportunity to solve a fun
> puzzle.  ;-)
>
> The use for this is that I will be doing repeated seeks of a binary file
> to extract data.  seek() gives the starting point and readBin(n=X) gives
> the number of bytes to read.  So when there are many consecutive variables
> to be read, I can multiply the X in n=X by that number instead of doing
> many different seek() calls.  (The data are in a transposed format where I
> read in every record for some variable as sequential elements.)  I'm
> probably not the first person to deal with this.
>
> Best,
>
> Mike
>
> --
> Michael B. Miller, Ph.D.
> University of Minnesota
> http://scholar.google.com/citations?user=EV_phq4J
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


Re: [R] counting sets of consecutive integers in a vector

2015-01-04 Thread jim holtman

Here is another approach:

> v <- c(1,2,5,6,7,8,25,30,31,32,33)
>
> # split by differences != 1
> t(sapply(split(v, cumsum(c(1, diff(v)) != 1)), function(x){
+ c(value = x[1L], length = length(x))  # output first value and length
+ }))
  value length
0     1      2
1     5      4
2    25      1
3    30      4



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Jan 4, 2015 at 8:27 PM, Peter Alspach <
peter.alsp...@plantandfood.co.nz> wrote:

> Tena koe Mike
>
> An alternative, which is slightly faster:
>
>   diffv <- diff(v)
>   starts <- c(1, which(diffv!=1)+1)
>   cbind(v[starts], c(diff(starts), length(v)-starts[length(starts)]+1))
>
> Peter Alspach
>
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Mike
> Miller
> Sent: Monday, 5 January 2015 1:03 p.m.
> To: R-Help List
> Subject: [R] counting sets of consecutive integers in a vector
>
> I have a vector of sorted positive integer values (e.g., positive integers
> after applying sort() and unique()).  For example, this:
>
> c(1,2,5,6,7,8,25,30,31,32,33)
>
> I want to make a matrix from that vector that has two columns: (1) the
> first value in every run of consecutive integer values, and (2) the
> corresponding number of consecutive values.  For example:
>
> c(1:20) would become this...
>
> 1  20
>
> ...because there are 20 consecutive integers beginning with 1 and
> c(1,2,5,6,7,8,25,30,31,32,33) would become
>
> 1  2
> 5  4
> 25 1
> 30 4
>
> What would be the best way to accomplish this?  Here is my first effort:
>
> v <- c(1,2,5,6,7,8,25,30,31,32,33)
> L <- rle( v - 1:length(v) )$lengths
> n <- length( L )
> matrix( c( v[ c( 1, cumsum(L)+1 ) ][1:n], L), nrow=n)
>
>      [,1] [,2]
> [1,]    1    2
> [2,]    5    4
> [3,]   25    1
> [4,]   30    4
>
> I suppose that works well enough, but there may be a better way, and
> besides, I wouldn't want to deny anyone here the opportunity to solve a fun
> puzzle.  ;-)
>
> The use for this is that I will be doing repeated seeks of a binary file
> to extract data.  seek() gives the starting point and readBin(n=X) gives
> the number of bytes to read.  So when there are many consecutive variables
> to be read, I can multiply the X in n=X by that number instead of doing
> many different seek() calls.  (The data are in a transposed format where I
> read in every record for some variable as sequential elements.)  I'm
> probably not the first person to deal with this.
>
> Best,
>
> Mike
>
> --
> Michael B. Miller, Ph.D.
> University of Minnesota
> http://scholar.google.com/citations?user=EV_phq4J
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> The contents of this e-mail are confidential and may be ...{{dropped:14}}
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


Re: [R] counting sets of consecutive integers in a vector

2015-01-04 Thread Peter Alspach

Tena koe Mike

An alternative, which is slightly faster:

  diffv <- diff(v)
  starts <- c(1, which(diffv!=1)+1)
  cbind(v[starts], c(diff(starts), length(v)-starts[length(starts)]+1))

Peter Alspach

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Mike Miller
Sent: Monday, 5 January 2015 1:03 p.m.
To: R-Help List
Subject: [R] counting sets of consecutive integers in a vector

I have a vector of sorted positive integer values (e.g., positive integers after 
applying sort() and unique()).  For example, this:

c(1,2,5,6,7,8,25,30,31,32,33)

I want to make a matrix from that vector that has two columns: (1) the first 
value in every run of consecutive integer values, and (2) the corresponding 
number of consecutive values.  For example:

c(1:20) would become this...

1  20

...because there are 20 consecutive integers beginning with 1 and
c(1,2,5,6,7,8,25,30,31,32,33) would become

1  2
5  4
25 1
30 4

What would be the best way to accomplish this?  Here is my first effort:

v <- c(1,2,5,6,7,8,25,30,31,32,33)
L <- rle( v - 1:length(v) )$lengths
n <- length( L )
matrix( c( v[ c( 1, cumsum(L)+1 ) ][1:n], L), nrow=n)

     [,1] [,2]
[1,]    1    2
[2,]    5    4
[3,]   25    1
[4,]   30    4

I suppose that works well enough, but there may be a better way, and besides, I 
wouldn't want to deny anyone here the opportunity to solve a fun puzzle.  ;-)

The use for this is that I will be doing repeated seeks of a binary file to 
extract data.  seek() gives the starting point and readBin(n=X) gives the 
number of bytes to read.  So when there are many consecutive variables to be 
read, I can multiply the X in n=X by that number instead of doing many 
different seek() calls.  (The data are in a transposed format where I read in 
every record for some variable as sequential elements.)  I'm probably not the 
first person to deal with this.

Best,

Mike

-- 
Michael B. Miller, Ph.D.
University of Minnesota
http://scholar.google.com/citations?user=EV_phq4J


[R] counting sets of consecutive integers in a vector

2015-01-04 Thread Mike Miller

I have a vector of sorted positive integer values (e.g., positive integers 
after applying sort() and unique()).  For example, this:


c(1,2,5,6,7,8,25,30,31,32,33)

I want to make a matrix from that vector that has two columns: (1) the 
first value in every run of consecutive integer values, and (2) the 
corresponding number of consecutive values.  For example:


c(1:20) would become this...

1  20

...because there are 20 consecutive integers beginning with 1 and 
c(1,2,5,6,7,8,25,30,31,32,33) would become


1  2
5  4
25 1
30 4

What would be the best way to accomplish this?  Here is my first effort:

v <- c(1,2,5,6,7,8,25,30,31,32,33)
L <- rle( v - 1:length(v) )$lengths
n <- length( L )
matrix( c( v[ c( 1, cumsum(L)+1 ) ][1:n], L), nrow=n)

     [,1] [,2]
[1,]    1    2
[2,]    5    4
[3,]   25    1
[4,]   30    4

I suppose that works well enough, but there may be a better way, and 
besides, I wouldn't want to deny anyone here the opportunity to solve a 
fun puzzle.  ;-)


The use for this is that I will be doing repeated seeks of a binary file 
to extract data.  seek() gives the starting point and readBin(n=X) gives 
the number of bytes to read.  So when there are many consecutive variables 
to be read, I can multiply the X in n=X by that number instead of doing 
many different seek() calls.  (The data are in a transposed format where I 
read in every record for some variable as sequential elements.)  I'm 
probably not the first person to deal with this.
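
The seek()/readBin() use described above might look roughly like the following sketch. Everything about the file is an assumption made for illustration: a temporary file holding 40 doubles of 8 bytes each, with the values 1..40 so that the result is easy to check. Note that the n argument of readBin() counts records (elements of `size` bytes), not bytes, so one call per run reads the whole run:

```r
v <- c(1, 2, 5, 6, 7, 8, 25, 30, 31, 32, 33)

# Run starts and run lengths, as in the thread
starts <- c(1, which(diff(v) != 1) + 1)
runs <- cbind(start = v[starts], length = diff(c(starts, length(v) + 1)))

# Hypothetical binary file: 40 doubles, record i holds the value i
tmp <- tempfile(fileext = ".bin")
writeBin(as.numeric(1:40), tmp)
rec_size <- 8L  # bytes per record (assumed layout)

con <- file(tmp, "rb")
out <- unlist(lapply(seq_len(nrow(runs)), function(i) {
  # One seek per run instead of one per record
  seek(con, where = (runs[i, "start"] - 1) * rec_size, origin = "start")
  readBin(con, what = "numeric", n = runs[i, "length"], size = rec_size)
}))
close(con)
unlink(tmp)

out
```

Here 11 records are read with 4 seeks rather than 11.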


Best,

Mike

--
Michael B. Miller, Ph.D.
University of Minnesota
http://scholar.google.com/citations?user=EV_phq4J


Re: [R] Counting within groups / means by groups

2014-11-10 Thread David L Carlson

In addition to Jeff's recommendation, you need to read a basic introduction to 
R. Your data frame is probably not what you think it is:

> group<-c("A", "A", "A", "B", "B", "B", "B", "C")
> value<-c(1,3,2,2,2,4,4,1)
> df<-as.data.frame(cbind(group, value))
> str(df)
'data.frame':   8 obs. of  2 variables:
 $ group: Factor w/ 3 levels "A","B","C": 1 1 1 2 2 2 2 3
 $ value: Factor w/ 4 levels "1","2","3","4": 1 3 2 2 2 4 4 1

By using cbind() you combined a character vector and a numeric vector into a 
matrix so R converted the numeric value to characters since a matrix can hold 
only a single data type. The cbind() function is generic and which version you 
get depends on the first argument.
> cbind(group, value)
     group value
[1,] "A"   "1"
[2,] "A"   "3"
[3,] "A"   "2"
[4,] "B"   "2"
[5,] "B"   "2"
[6,] "B"   "4"
[7,] "B"   "4"
[8,] "C"   "1"

Then you used as.data.frame() to convert the character matrix to a data.frame. 
The default for character variables is to convert those to factors. All you 
need is
> dfa <- data.frame(group, value)
> str(dfa)
'data.frame':   8 obs. of  2 variables:
 $ group: Factor w/ 3 levels "A","B","C": 1 1 1 2 2 2 2 3
 $ value: num  1 3 2 2 2 4 4 1

I changed df to dfa since df() is the density function for the F distribution. 
R is not likely to get confused, but you might.

Then read the manual page on ave() to see why these work and how to adapt them:

> ave(dfa$value, dfa$group, FUN=length)
[1] 3 3 3 4 4 4 4 1
> ave(dfa$value, dfa$group)
[1] 2 2 2 3 3 3 3 1
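
Putting the pieces together, the two requested columns could be added as in this sketch (the column names count_group and mean_group are illustrative; count_group is taken from the original question). Since R 4.0.0, data.frame() keeps character columns as character by default rather than converting them to factors, but ave() groups by either:

```r
group <- c("A", "A", "A", "B", "B", "B", "B", "C")
value <- c(1, 3, 2, 2, 2, 4, 4, 1)

# Build the data frame directly so 'value' stays numeric
dfa <- data.frame(group, value)

# Problem 1: number of rows in each group, repeated on every row of the group
dfa$count_group <- ave(dfa$value, dfa$group, FUN = length)

# Problem 2: group mean (any other summary function can be passed as FUN)
dfa$mean_group <- ave(dfa$value, dfa$group, FUN = mean)

dfa
```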

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Jeff Newmiller
Sent: Monday, November 10, 2014 9:19 AM
To: stude...@gmail.com; r-help@r-project.org
Subject: Re: [R] Counting within groups / means by groups

Help file ?ave should apply here.

Please read the Posting Guide mentioned in the footer of every email on this 
list and on the list manager page for this mailing list. It warns you to read 
the archives before posting and to post in plain text format rather than HTML 
format.
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
---
Sent from my phone. Please excuse my brevity.

On November 10, 2014 6:39:47 AM PST, David Studer  wrote:
>Hi everyone!
>
>I have problems finding a solution to the following two problems:
>
>My sample-dataframe consists of two variables "group" and "value":
>
>group<-c("A", "A", "A", "B", "B", "B", "B", "C")
>value<-c(1,3,2,2,2,4,4,1)
>df<-as.data.frame(cbind(group, value))
>
>Problem 1:
>**
>
>Now I'd like to count the number of group-A-cases, group-B-cases etc
>and
>write
>this number into a new column. It should be like:
>
>count_group<-c(3, 3, 3, 4, 4, 4, 4, 1)
>
>Problem 2:
>***
>
>I'd like to add new column with the mean values (or any other function)
>within
>my groups. E.g:
>
>Group A: (1+3+2)/3=2
>Group B: (2+2+4+4)/4=3
>Group C: =1
>
>Now I'd add another column 2 2 3 3 3 3 1
>
>
>Can anyone help me, how this can be done best?
>
>Thank you!
>David
>
>   [[alternative HTML version deleted]]
>
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.


Re: [R] Counting within groups / means by groups

2014-11-10 Thread Jeff Newmiller

Help file ?ave should apply here.

Please read the Posting Guide mentioned in the footer of every email on this 
list and on the list manager page for this mailing list. It warns you to read 
the archives before posting and to post in plain text format rather than HTML 
format.
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

On November 10, 2014 6:39:47 AM PST, David Studer  wrote:
>Hi everyone!
>
>I have problems finding a solution to the following two problems:
>
>My sample-dataframe consists of two variables "group" and "value":
>
>group<-c("A", "A", "A", "B", "B", "B", "B", "C")
>value<-c(1,3,2,2,2,4,4,1)
>df<-as.data.frame(cbind(group, value))
>
>Problem 1:
>**
>
>Now I'd like to count the number of group-A-cases, group-B-cases etc
>and
>write
>this number into a new column. It should be like:
>
>count_group<-c(3, 3, 3, 4, 4, 4, 4, 1)
>
>Problem 2:
>***
>
>I'd like to add new column with the mean values (or any other function)
>within
>my groups. E.g:
>
>Group A: (1+3+2)/3=2
>Group B: (2+2+4+4)/4=3
>Group C: =1
>
>Now I'd add another column 2 2 3 3 3 3 1
>
>
>Can anyone help me, how this can be done best?
>
>Thank you!
>David
>
>   [[alternative HTML version deleted]]
>
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.


[R] Counting within groups / means by groups

2014-11-10 Thread David Studer

Hi everyone!

I have problems finding a solution to the following two problems:

My sample-dataframe consists of two variables "group" and "value":

group<-c("A", "A", "A", "B", "B", "B", "B", "C")
value<-c(1,3,2,2,2,4,4,1)
df<-as.data.frame(cbind(group, value))

Problem 1:
**

Now I'd like to count the number of group-A-cases, group-B-cases etc. and
write this number into a new column. It should be like:

count_group<-c(3, 3, 3, 4, 4, 4, 4, 1)

Problem 2:
***

I'd like to add a new column with the mean values (or any other function)
within my groups. E.g.:

Group A: (1+3+2)/3=2
Group B: (2+2+4+4)/4=3
Group C: =1

Now I'd add another column: 2 2 2 3 3 3 3 1


Can anyone help me, how this can be done best?

Thank you!
David


Re: [R] counting the number of rows that satisfy a certain criteria

2014-06-21 Thread arun

Hi,
Try:
set.seed(42)
 X <- as.data.frame(matrix(sample(0:1, 4*50,replace=TRUE), ncol=4))
 table(X[1:2])[4]
#[1] 15

sum(rowSums(X[1:2])==2)
#[1] 15
A.K.




On Saturday, June 21, 2014 10:59 AM, Kate Ignatius  
wrote:
I have 4 columns, and about 300K plus rows with 0s and 1s.

I'm trying to count how many rows satisfy a certain criteria... for
instance, how many rows are there that have the first column == 1 as
well as the second column == 1.

I've tried using rowSums and colSums but it keeps giving me this type of error:

Error in rowSums(X[1] == 1 & X[2] == 1) :
  'x' must be an array of at least two dimensions

Thanks in advance!


Re: [R] counting the number of rows that satisfy a certain criteria

2014-06-21 Thread Kate Ignatius

Thanks!

On Sat, Jun 21, 2014 at 11:05 AM, Jorge I Velez
 wrote:
> Hi Kate,
>
> You could try
>
> sum(X[, 1] == 1 &  X[, 2] == 1)
>
> where X is your data set.
>
> HTH,
> Jorge.-
>
>
>
> On Sun, Jun 22, 2014 at 12:57 AM, Kate Ignatius 
> wrote:
>>
>> I have 4 columns, and about 300K plus rows with 0s and 1s.
>>
>> I'm trying to count how many rows satisfy a certain criteria... for
>> instance, how many rows are there that have the first column == 1 as
>> well as the second column == 1.
>>
>> I've tried using rowSums and colSums but it keeps giving me this type of
>> error:
>>
>> Error in rowSums(X[1] == 1 & X[2] == 1) :
>>   'x' must be an array of at least two dimensions
>>
>> Thanks in advance!
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>


Re: [R] counting the number of rows that satisfy a certain criteria

2014-06-21 Thread Jorge I Velez

Hi Kate,

You could try

sum(X[, 1] == 1 &  X[, 2] == 1)

where X is your data set.
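
A short sketch of why this works, using simulated 0/1 data as a stand-in for the real data set (X here is made up). rowSums() needs a two-dimensional input, whereas combining two columns with & gives a plain logical vector, and sum() on a logical vector counts the TRUEs directly:

```r
set.seed(42)
# Simulated stand-in for the real data: 50 rows, 4 columns of 0s and 1s
X <- as.data.frame(matrix(sample(0:1, 4 * 50, replace = TRUE), ncol = 4))

# Logical vector, one element per row; sum() counts the TRUE values
n_both <- sum(X[, 1] == 1 & X[, 2] == 1)

# Equivalent for 0/1 data: rows whose first two columns sum to 2
n_both_rowsums <- sum(rowSums(X[1:2]) == 2)

# Equivalent: subset the rows and count them
n_both_subset <- nrow(X[X[[1]] == 1 & X[[2]] == 1, ])

c(n_both, n_both_rowsums, n_both_subset)
```

The rowSums() variant relies on the columns containing only 0s and 1s; the first form works for any values.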

HTH,
Jorge.-



On Sun, Jun 22, 2014 at 12:57 AM, Kate Ignatius 
wrote:

> I have 4 columns, and about 300K plus rows with 0s and 1s.
>
> I'm trying to count how many rows satisfy a certain criteria... for
> instance, how many rows are there that have the first column == 1 as
> well as the second column == 1.
>
> I've tried using rowSums and colSums but it keeps giving me this type of
> error:
>
> Error in rowSums(X[1] == 1 & X[2] == 1) :
>   'x' must be an array of at least two dimensions
>
> Thanks in advance!
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


[R] counting the number of rows that satisfy a certain criteria

2014-06-21 Thread Kate Ignatius

I have 4 columns, and about 300K plus rows with 0s and 1s.

I'm trying to count how many rows satisfy a certain criteria... for
instance, how many rows are there that have the first column == 1 as
well as the second column == 1.

I've tried using rowSums and colSums but it keeps giving me this type of error:

Error in rowSums(X[1] == 1 & X[2] == 1) :
  'x' must be an array of at least two dimensions

Thanks in advance!


[R] Counting number of days my program runs

2014-05-15 Thread Ashis Deb

Hi all,

I have a package, and I want to count the days from its first
execution until 30 days afterwards.

I hope my question is clear.

Please reply if you have anything to share.
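
One way this might be approached, purely as a sketch: record the date of the first run in a small file and compare it with today's date on later runs. The function name days_since_first_run and the stamp file are hypothetical, and where such a file should live for a real package is left open:

```r
# Hypothetical helper: days elapsed since the first recorded run.
# 'stamp_file' is a path chosen by the caller; it is created on the first call.
days_since_first_run <- function(stamp_file) {
  if (!file.exists(stamp_file)) {
    # First execution: remember today's date (YYYY-MM-DD)
    writeLines(format(Sys.Date()), stamp_file)
  }
  first_run <- as.Date(readLines(stamp_file, n = 1L))
  as.integer(Sys.Date() - first_run)
}

# Usage with a temporary file, for illustration only
stamp <- file.path(tempdir(), "first_run_example.txt")
unlink(stamp)  # make sure the example starts from a clean state
elapsed <- days_since_first_run(stamp)
within_30_days <- elapsed <= 30
```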

Thanks

ASHIS


Re: [R] counting words that are contained in a list

2014-02-15 Thread arun

Hi,

Maybe this helps:

vec1 <- c("victory","happiness","medal","war","service","ribbon", "dates")

vec2 <- c("The World War II Victory Medal was first issued as a service ribbon 
referred to as the Victory Ribbon.", "By 1946, a full medal had been 
established which was referred to as the World War II Victory Medal.", "The 
medal commemorates military service during World War II and is awarded to any 
member of the United States military, including members of the armed forces of 
the Government of the Philippine Islands, who served on active duty, or as a 
reservist, between December 7, 1941 and December 31, 1946","This is awarded for 
service between 7 December 1941 and 31 December 1946, both dates inclusive")
res <- sort(table(factor(unlist(regmatches(tolower(vec2),
    gregexpr(paste(vec1, collapse="|"), vec2, ignore.case=TRUE))),
    levels=vec1)), decreasing=TRUE)
res
#       war     medal   victory   service    ribbon     dates happiness 
#         5         4         3         3         2         1         0 
res[1:5]
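
Note that a pattern such as "war" also matches inside longer words such as "awarded", which inflates the count when whole words are wanted. A sketch of a whole-word variant follows; the two sentences and all variable names are made up for illustration, and the text is assumed to be plain ASCII:

```r
# Illustrative word list and sentences (not the data from the post)
word_list <- c("victory", "happiness", "medal", "war", "service")
sentences <- c("The World War II Victory Medal was awarded for service.",
               "The medal is awarded to any member.")

# Split into lower-case tokens on anything that is not a letter
tokens <- unlist(strsplit(tolower(sentences), "[^a-z]+"))

# Count only tokens that are exactly one of the listed words
counts <- table(factor(tokens[tokens %in% word_list], levels = word_list))
top <- sort(counts, decreasing = TRUE)

# For comparison: substring matching also counts "war" inside "awarded"
substring_war <- length(unlist(regmatches(
  tolower(sentences),
  gregexpr("war", tolower(sentences), fixed = TRUE))))

head(top, 5)
```

In this example the whole-word count for "war" is 1, while the substring count is 3.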


A.K.



Hi guys! 

I have a vector with a list of words, e.g. c("victory","happiness"). 

I have a vector of sentences, e.g. "In WWII the victory was achieved by allied 
forces". 

As the word victory is in my list, victory has a frequency of 1, happiness 0. 

At the end I would like to get the 5 most frequent words from my list that appear 
in the sentences. 

Can you help me?

Uros


Re: [R] counting matched elements in two vectors

2014-01-23 Thread Hervé Pagès

On 01/23/2014 04:49 PM, Hervé Pagès wrote:

Hi Mintewab,

With the IRanges package (from Bioconductor):

   > library(IRanges)
   > countMatches(z, w)
   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 3 1 1 0 1 0 0 0 0 0 0 1 3 2 0 0 1 0 0
  [39] 0 0 0 0 0 0 0 0

And if you don't want to depend on IRanges for such a simple operation,
here is how countMatches() is implemented:

  countMatches <- function(x, table)
  {
  table2 <- match(table, x)
  x2 <- match(x, x)
  tabulate(table2, nbins=length(x))[x2]
  }
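
As a sanity check, the base-R approaches suggested in this thread can be compared on the example data. This is only a sketch: count_matches is a local copy of the function above under a different name, so it runs without IRanges, and the other variable names are illustrative:

```r
z <- -5:40
w <- c(11, 11, 12, 14, 14, 14, 15, 16, 18, 25, 26, 26, 26, 27, 27, 30)

# Local copy of the match/tabulate idea (name chosen for this sketch)
count_matches <- function(x, table) {
  # match(table, x): position in x of each element of table (NA if absent)
  # tabulate(): how often each position of x was hit
  # [match(x, x)]: give duplicated values of x the count of their first occurrence
  tabulate(match(table, x), nbins = length(x))[match(x, x)]
}

by_tabulate <- count_matches(z, w)
by_table    <- as.vector(table(factor(w, levels = z)))
by_sapply   <- vapply(z, function(x) sum(w == x), integer(1))

by_tabulate
```

All three give one count per element of z; the tabulate version avoids looping over z, which may matter for long vectors.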

Cheers,
H.

To install the IRanges package:

   source("http://bioconductor.org/biocLite.R";)
   biocLite("IRanges")

Cheers,
H.

On 01/23/2014 07:43 AM, m.beza...@lse.ac.uk wrote:

Hi all,
I have the following reproducible example

z<-c(-5:40)
w<-c(11, 11, 12, 14, 14, 14, 15, 16, 18, 25, 26, 26, 26, 27, 27, 30)
  r<-z %in% w

now r gives me the presence or absence of elements in z that are in w
but I am interested in getting the number of times each element in z
appears (or doesn't appear)  in w. I want the dimension of my
resulting vector to be the same as that of z. How do I do that?

  Thanks in advance
  Mintewab


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319


Re: [R] counting matched elements in two vectors

2014-01-23 Thread Peter Langfelder

Here's a solution:

# This gives a vector of counts (if z is a data frame, first convert
# it to a matrix)
res = sapply(as.vector(z), function(x) sum(w==x))
# This copies the dimensions of the variable 'z' to 'res':
dim(res) = dim(z)

Peter

On Thu, Jan 23, 2014 at 7:43 AM,   wrote:
>Hi all,
> I have the following reproducible example
>
> z<-c(-5:40)
> w<-c(11, 11, 12, 14, 14, 14, 15, 16, 18, 25, 26, 26, 26, 27, 27, 30)
>  r<-z %in% w
>
> now r gives me the presence or absence of elements in z that are in w but I 
> am interested in getting the number of times each element in z appears (or 
> doesn't appear)  in w. I want the dimension of my resulting vector to be the 
> same as that of z. How do I do that?
>
>  Thanks in advance
>  Mintewab
>
>
> Please access the attached hyperlink for an important electronic 
> communications disclaimer: http://lse.ac.uk/emailDisclaimer
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


Re: [R] counting matched elements in two vectors

2014-01-23 Thread Hervé Pagès

Hi Mintewab,

With the IRanges package (from Bioconductor):

  > library(IRanges)
  > countMatches(z, w)
   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 3 1 1 0 1 0 0 0 0 0 0 1 3 2 0 0 1 0 0
  [39] 0 0 0 0 0 0 0 0

To install the IRanges package:

  source("http://bioconductor.org/biocLite.R")
  biocLite("IRanges")

Cheers,
H.

On 01/23/2014 07:43 AM, m.beza...@lse.ac.uk wrote:

Hi all,
I have the following reproducible example

z<-c(-5:40)
w<-c(11, 11, 12, 14, 14, 14, 15, 16, 18, 25, 26, 26, 26, 27, 27, 30)
  r<-z %in% w

now r gives me the presence or absence of elements in z that are in w but I am 
interested in getting the number of times each element in z appears (or doesn't 
appear)  in w. I want the dimension of my resulting vector to be the same as 
that of z. How do I do that?

  Thanks in advance
  Mintewab

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

Re: [R] counting matched elements in two vectors

2014-01-23 Thread Jeff Newmiller

Thank you for the reproducible example, but your description is missing a clear 
definition of what you want.

For example, if your desired output is 
result <- c(rep(0,16),2,1,0,3,1,1,0,1,0,0,0,0,0,0,1,3,2,0,0,1,rep(0,10))

then one answer might be

as.vector(table(factor(w,levels=z)))
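
For comparison, here is a minimal base-R sketch of the same count (an illustration added to this archive, not part of the original replies). It assumes every value of w also occurs in z, as in the example, and simply checks that three routes give the same vector:

```r
z <- -5:40
w <- c(11, 11, 12, 14, 14, 14, 15, 16, 18, 25, 26, 26, 26, 27, 27, 30)

# Route 1: tabulate w as a factor whose levels are the values of z
counts_table <- as.vector(table(factor(w, levels = z)))

# Route 2: find the position of each w in z, then count the positions
counts_tab <- tabulate(match(w, z), nbins = length(z))

# Route 3: explicit comparison, one element of z at a time
counts_loop <- vapply(z, function(v) sum(w == v), integer(1))

stopifnot(identical(counts_table, counts_tab),
          identical(counts_tab, counts_loop))
```

All three results have the same length as z, which is what the question asked for.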

---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
--- 
Sent from my phone. Please excuse my brevity.

m.beza...@lse.ac.uk wrote:
>   Hi all,
>I have the following reproducible example
>
>z<-c(-5:40)
>w<-c(11, 11, 12, 14, 14, 14, 15, 16, 18, 25, 26, 26, 26, 27, 27, 30)
> r<-z %in% w
>
>now r gives me the presence or absence of elements in z that are in w
>but I am interested in getting the number of times each element in z
>appears (or doesn't appear)  in w. I want the dimension of my resulting
>vector to be the same as that of z. How do I do that?
>
> Thanks in advance
> Mintewab

Re: [R] counting matches in two vectors

2014-01-23 Thread M.Bezabih

Many thanks, Arun. 
Res 1 is exactly what I wanted. 
Mintewab

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of arun
Sent: 23 January 2014 16:27
To: R help
Subject: Re: [R] counting matches in two vectors

Hi,
Maybe this helps:
 z1 <- factor(z)
res1 <- table(z1[cut(w,breaks=c(-Inf,z,Inf),labels=F)])
res1
#
#-5 -4 -3 -2 -1  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
# 0  0  0  0  0  0  0  0  0  0  2  1  0  3  1  1  0  1  0  0  0  0  0  0  1  3 
#21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 
# 2  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 
#or
 res2 <- table(z1[findInterval(w,z)])
 identical(res1,res2)
#[1] TRUE

A.K.

Hi all,
I have the following reproducible example 

z<-c(-5:40)
w<-c(11, 11, 12, 14, 14, 14, 15, 16, 18, 25, 26, 26, 26, 27, 27, 30)
 r<-z %in% w 

now r gives me the presence or absence of elements in z that are  in w but I am 
interested in getting the number of times each element in  z appears (or 
doesn't appear)  in w. I want the dimension of my resulting vector to be the 
same as that of z. How do I do that? 

 Thanks in advance
 Mintewab

Re: [R] counting matches in two vectors

2014-01-23 Thread arun

Also,


 res3 <- table(z1[match(w,z1)])
 identical(res3,res1)
#[1] TRUE

A.K.




On Thursday, January 23, 2014 11:26 AM, arun  wrote:
Hi,
Maybe this helps:
 z1 <- factor(z)
res1 <- table(z1[cut(w,breaks=c(-Inf,z,Inf),labels=F)])
res1
#
#-5 -4 -3 -2 -1  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
# 0  0  0  0  0  0  0  0  0  0  2  1  0  3  1  1  0  1  0  0  0  0  0  0  1  3 
#21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 
# 2  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 
#or
 res2 <- table(z1[findInterval(w,z)])
 identical(res1,res2)
#[1] TRUE



A.K.


Hi all, 
I have the following reproducible example 

z<-c(-5:40) 
w<-c(11, 11, 12, 14, 14, 14, 15, 16, 18, 25, 26, 26, 26, 27, 27, 30) 
 r<-z %in% w 

now r gives me the presence or absence of elements in z that are
in w but I am interested in getting the number of times each element in
z appears (or doesn't appear)  in w. I want the dimension of my 
resulting vector to be the same as that of z. How do I do that? 

 Thanks in advance 
 Mintewab

Re: [R] counting matches in two vectors

2014-01-23 Thread arun

Hi,
Maybe this helps:
 z1 <- factor(z)
res1 <- table(z1[cut(w,breaks=c(-Inf,z,Inf),labels=F)])
res1
#
#-5 -4 -3 -2 -1  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
# 0  0  0  0  0  0  0  0  0  0  2  1  0  3  1  1  0  1  0  0  0  0  0  0  1  3 
#21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 
# 2  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 
#or
 res2 <- table(z1[findInterval(w,z)])
 identical(res1,res2)
#[1] TRUE



A.K.


Hi all, 
I have the following reproducible example 

z<-c(-5:40) 
w<-c(11, 11, 12, 14, 14, 14, 15, 16, 18, 25, 26, 26, 26, 27, 27, 30) 
 r<-z %in% w 

now r gives me the presence or absence of elements in z that are
 in w but I am interested in getting the number of times each element in
 z appears (or doesn't appear)  in w. I want the dimension of my 
resulting vector to be the same as that of z. How do I do that? 

 Thanks in advance 
 Mintewab

[R] counting matched elements in two vectors

2014-01-23 Thread M.Bezabih

   Hi all,
I have the following reproducible example

z<-c(-5:40)
w<-c(11, 11, 12, 14, 14, 14, 15, 16, 18, 25, 26, 26, 26, 27, 27, 30)
 r<-z %in% w

now r gives me the presence or absence of elements in z that are in w but I am 
interested in getting the number of times each element in z appears (or doesn't 
appear)  in w. I want the dimension of my resulting vector to be the same as 
that of z. How do I do that?

 Thanks in advance
 Mintewab


Re: [R] Counting variables repeted in dataframe columns to create a presence-absence table

2013-11-28 Thread arun

Hi,
Try:
data_m <- read.table(text="Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
1  S5305B_IGR S5305B_IGR  S5305B_IGR  S5305B_IGR S5305B_IGR
2  S5305A_IGR S5300A_IGR  S5305A_IGR  S5300A_IGR S5300A_IGR
3  S5300A_IGR S5300B_IGR  S5300A_IGR  S5300B_IGR S5300B_IGR
4  S5300B_IGR S5299B_IGR  S5300B_IGR  S5299B_IGR S5299B_IGR
5  S5299B_IGR S5299A_IGR  S5299B_IGR  S5829B_IGR S5299A_IGR",sep="",header=TRUE,stringsAsFactors=FALSE)
 data_m$new <-1
library(reshape2)
 dM <- melt(data_m,id.vars="new")
xtabs(new~value+variable,dM)
#or
 dcast(dM,value~variable,value.var="new",fill=0)
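
A base-R variant of the same idea, for readers without reshape2 (a sketch added to this archive, not part of the original reply). It rebuilds the five example columns by hand; the object names genes, serovar and pa are made up for the illustration:

```r
data_m <- data.frame(
  Abortusovis07918 = c("S5305B_IGR", "S5305A_IGR", "S5300A_IGR", "S5300B_IGR", "S5299B_IGR"),
  Agona08561       = c("S5305B_IGR", "S5300A_IGR", "S5300B_IGR", "S5299B_IGR", "S5299A_IGR"),
  Anatum08125      = c("S5305B_IGR", "S5305A_IGR", "S5300A_IGR", "S5300B_IGR", "S5299B_IGR"),
  Arizonae65S      = c("S5305B_IGR", "S5300A_IGR", "S5300B_IGR", "S5299B_IGR", "S5829B_IGR"),
  Braenderup08488  = c("S5305B_IGR", "S5300A_IGR", "S5300B_IGR", "S5299B_IGR", "S5299A_IGR"),
  stringsAsFactors = FALSE
)

# Long form: one gene name per cell, paired with the column it came from
genes   <- unlist(data_m, use.names = FALSE)
serovar <- rep(names(data_m), each = nrow(data_m))

# Cross-tabulate, then reduce counts to 0/1 in case a gene is listed twice
pa <- (table(gene = genes, serovar = serovar) > 0) * 1L
pa
```

Rows are the distinct gene names and columns are the serovars; a 1 means the gene appears in that serovar's list.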

A.K.

On Thursday, November 28, 2013 12:18 PM, Gmail  wrote:
Hi!

I'm new in R and I'm writing you asking for some guidance. I had 
analyzed a comparative genomic microarray data of /56 Salmonella/ 
strains to identify absent genes in each of the serovars, and finally I 
got a matrix that looks like that:

> data[1:5,1:5]
   Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
1       S5305B_IGR S5305B_IGR  S5305B_IGR  S5305B_IGR S5305B_IGR
2       S5305A_IGR S5300A_IGR  S5305A_IGR  S5300A_IGR S5300A_IGR
3       S5300A_IGR S5300B_IGR  S5300A_IGR  S5300B_IGR S5300B_IGR
4       S5300B_IGR S5299B_IGR  S5300B_IGR  S5299B_IGR S5299B_IGR
5       S5299B_IGR S5299A_IGR  S5299B_IGR  S5829B_IGR S5299A_IGR

The variables corresponds to those genes identified as absent in each of 
the serovars. I would like to create a presence-absence matrix of those 
genes comparing all the serovars at the same time, I assume that should 
not be complicated but I don't know how to do it.

I would like a matrix similar to the next one:

> data_m[1:5,1:5]
           Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
S5305B_IGR                1          1           1           1               1
S5305A_IGR                1          0           1           0               0
S5300A_IGR                1          1           1           1               1

Any help would be welcome, and thank you in advance,

Oihane

-- 

Oihane Irazoki Sanchez
PhD Student, Molecular Microbiology

Genetics and Microbiology Department, Faculty of Biosciences
Autonomous University of Barcelona
08193 Bellaterra (Barcelona), Spain

Telf: 34 - 935 811 665
E-mail: oihane.iraz...@uab.cat / o.iraz...@gmail.com

[R] Counting variables repeted in dataframe columns to create a presence-absence table

2013-11-28 Thread Gmail

Hi!

I'm new to R and am writing to ask for some guidance. I analyzed 
comparative genomic microarray data from /56 Salmonella/ strains to 
identify the genes absent in each of the serovars, and ended up with a 
matrix that looks like this:

 > data[1:5,1:5]
   Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
1   S5305B_IGR S5305B_IGR  S5305B_IGR  S5305B_IGR S5305B_IGR
2   S5305A_IGR S5300A_IGR  S5305A_IGR  S5300A_IGR S5300A_IGR
3   S5300A_IGR S5300B_IGR  S5300A_IGR  S5300B_IGR S5300B_IGR
4   S5300B_IGR S5299B_IGR  S5300B_IGR  S5299B_IGR S5299B_IGR
5   S5299B_IGR S5299A_IGR  S5299B_IGR  S5829B_IGR S5299A_IGR

The values correspond to the genes identified as absent in each of 
the serovars. I would like to create a presence-absence matrix of those 
genes comparing all the serovars at the same time. I assume it should 
not be complicated, but I don't know how to do it.

I would like a matrix similar to the next one:

 > data_m[1:5,1:5]
           Abortusovis07918 Agona08561 Anatum08125 Arizonae65S Braenderup08488
S5305B_IGR                1          1           1           1               1
S5305A_IGR                1          0           1           0               0
S5300A_IGR                1          1           1           1               1

Any help would be welcome, and thank you in advance,

Oihane


-- 

Oihane Irazoki Sanchez
PhD Student, Molecular Microbiology

Genetics and Microbiology Department, Faculty of Biosciences
Autonomous University of Barcelona
08193 Bellaterra (Barcelona), Spain

Telf: 34 - 935 811 665
E-mail: oihane.iraz...@uab.cat / o.iraz...@gmail.com


Re: [R] Counting numbers in R

2013-10-04 Thread Shane Carey

I got sorted,

Thanks all


On Fri, Oct 4, 2013 at 2:03 PM, S Ellison  wrote:

> > I have a set of data and I need to find out how many points are below a
> > certain value but R will not calculate this properly for me.
> R will. But you aren't.
>
> > Negative numbers seem to be causing the issue.
> You haven't got any negative numbers in your data set. In fact, you
> haven't got any numbers. It's all character strings. Is there a reason for
> that?
>
> Assuming there is, if you have your data in a data frame 'A' and just want
> the count:
>
> table(as.numeric(A$Tm_ugL) <= 0.0002)
>
> If you just want a complete vector of TRUE or FALSE
> as.numeric(A$Tm_ugL) <= 0.0002
>
> does that. If you want to add that to your data frame (is it called A?)
> that looks like
> A$Censored <- as.numeric(A$Tm_ugL) <= 0.0002
>
> But you really shouldn't have numbers in character format; read it as
> numeric. Then it's just
> table(A$Tm_ugL <= 0.0002) and so on. If it's refusing to read as numeric,
> find out why and fix the data.
>
>
> And some comments on code, while I'm here:
>
> > for (i in one:nrow(A))
> ...
> >   if (A[i,two]<=A_LLD)
> Variables called 'one' and 'two' look like a really bad idea. If they are
> equal to 1 and 2, use 1 and 2 (or 1L and 2L if you want to be _sure_ they
> are integer). If not, the names are going to be pretty confusing, no?
>
> > (A_Censored[i,two]<-"TRUE")
> Why use a character string like "TRUE" that R can't interpret as logical
> instead of the logical values TRUE and FALSE?
>
> S Ellison

Re: [R] Counting numbers in R

2013-10-04 Thread S Ellison

> I have a set of data and I need to find out how many points are below a
> certain value but R will not calculate this properly for me. 
R will. But you aren't.

> Negative numbers seem to be causing the issue.
You haven't got any negative numbers in your data set. In fact, you haven't got 
any numbers. It's all character strings. Is there a reason for that?

Assuming there is, if you have your data in a data frame 'A' and just want the 
count:

table(as.numeric(A$Tm_ugL) <= 0.0002)

If you just want a complete vector of TRUE or FALSE
as.numeric(A$Tm_ugL) <= 0.0002

does that. If you want to add that to your data frame (is it called A?) that 
looks like
A$Censored <- as.numeric(A$Tm_ugL) <= 0.0002

But you really shouldn't have numbers in character format; read it as numeric. 
Then it's just
table(A$Tm_ugL <= 0.0002) and so on. If it's refusing to read as numeric, find 
out why and fix the data.
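
To illustrate the advice above, a small sketch (added to this archive, not part of the original message). The data values are made up, including one entry that cannot be read as a number; only the column name Tm_ugL and the 0.0002 limit come from the thread:

```r
# Hypothetical character data, one entry of which is not a plain number
A <- data.frame(Tm_ugL = c("0.0001", "0.0005", "-0.0003", "0.0002", "<0.0002"),
                stringsAsFactors = FALSE)

# Convert once; entries that are not numbers become NA (warning suppressed here)
vals <- suppressWarnings(as.numeric(A$Tm_ugL))

# Logical flag per row, and the count of values at or below the limit
A$Censored <- vals <= 0.0002
n_below <- sum(A$Censored, na.rm = TRUE)

# Entries that refused to convert: these are the ones to fix in the data
bad <- A$Tm_ugL[is.na(vals)]
```

Looking at bad is one way to "find out why" a column is being read as character rather than numeric.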


And some comments on code, while I'm here: 

> for (i in one:nrow(A))
...
>   if (A[i,two]<=A_LLD)
Variables called 'one' and 'two' look like a really bad idea. If they are equal 
to 1 and 2, use 1 and 2 (or 1L and 2L if you want to be _sure_ they are 
integer). If not, the names are going to be pretty confusing, no? 

> (A_Censored[i,two]<-"TRUE")
Why use a character string like "TRUE" that R can't interpret as logical 
instead of the logical values TRUE and FALSE?

S Ellison


Re: [R] counting process in survival

2013-05-31 Thread Terry Therneau


It is hard to know exactly what you mean with such a generic question.
If you mean "treat survival as a counting process", then the answer is yes.  The survival 
package in S (which is the direct ancestor of the Splus package, which is the direct 
ancestor of the R package) was the very first to do this.  I created the feature in 1984.
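
As a brief illustration of what "counting process" input looks like in the survival package (a sketch added to this archive; the data frame d and its values are made up), each row describes one (start, stop] interval for a subject, and Surv() called with three arguments produces a counting-process response:

```r
library(survival)  # recommended package, normally installed alongside R

# Hypothetical data: one row per (start, stop] interval;
# event = 1 if the event occurred at 'stop', 0 otherwise
d <- data.frame(
  id    = c(1, 1, 2, 2, 3),
  start = c(0, 5, 0, 3, 0),
  stop  = c(5, 9, 3, 8, 7),
  event = c(0, 1, 0, 0, 1)
)

s <- with(d, Surv(start, stop, event))
attr(s, "type")   # "counting"
```

A response built this way can then be used on the left-hand side of a model formula, for example in coxph().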


Terry Therneau

On 05/31/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Hi, I have a question: is there a package to do counting process in survival 
analysis with R?


Re: [R] Counting number of consecutive occurrences per rows

2013-05-06 Thread PIKAL Petr

Hi

I slightly modified Jim's code

first part is function to split data frame test according to act, juln and day 
and compute repetitions in each chunk.

fff<- function(x) {
fac <-  factor((x[, "act"]==0)*1+(x[,"act"] == 200)*2, levels=c(1,0,2))
int<-interaction(x[,"juln"], x[,"day"], fac)
res <- cumsum(c(1, abs(diff(as.numeric(int)))))
res
}

test$fac<-fff(test)

Second part evaluates length of each chunk

test$res <- ave(test$fac, test$fac, FUN=length)

Last part computes max (min, sum) of res in each distinct chunk.

fff2<- function(x) {
fac <-  factor((x[, "act"]==0)*1+(x[,"act"] == 200)*2, levels=c(1,0,2), 
labels=c("0", "1-199", "200"))
fac
}

aggregate(test$res, list(test$juln, test$day), max)
aggregate(test$res, list(test$juln, test$day, fff2(test)), max)

Is it what you want?

Petr

From: zuzana zajkova [mailto:zuzu...@gmail.com]
Sent: Friday, May 03, 2013 7:10 PM
To: PIKAL Petr; jholt...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] Counting number of consecutive occurrences per rows

Hi,
I'm sorry that it took me so long to respond; yesterday I finally found 
time to try your suggestions. Thank you for them!
I tried both. They give the same results, but in both there are some things I 
still need to solve. I would appreciate your help.
I include a slightly bigger dataframe (test2, at the end of this email), with 
more differences in the variables, to better explain what I would like 
to calculate in addition.

Jim's code:
I needed to make some changes in assigning the key. Yours worked OK for the 
small "test" data, but when I tried it on my dataframe, which has around 
25000 rows, it didn't work properly.

test2$key[test2$act == 0] <- 1
test2$key[test2$act > 0 & test2$act < 200] <- 2
test2$key[test2$act == 200] <- 3
# this works ok
test2$resChange <- cumsum(c(1, abs(diff(test2$key))))
test2$res <- ave(test2$resChange, test2$resChange, FUN = length)
# I added new column by jul date
test2$resJ <- ave(test2$resChange, test2$resChange, test2$juln, FUN = length)
# this works fine as well, for dividing between day 0 and day 1
test2$resJD <- ave(test2$resChange, test2$resChange, test2$juln, test2$day, FUN 
= length)
# summary by day (this syntax requires test2 to be a data.table)
test2Resume_day <- test2[ , list(maxres = max(res)
   , minres = min(res)
   , sumres = length(unique(resChange)))
   , keyby = c('day', 'key')]
# change 'key'
 test2Resume_day$key <- c('0', '1-199', '200')[test2Resume_day$key]
 test2Resume_day
   day   key maxres minres sumres
1:   0     0      2      2      3
2:   0 1-199      3      1      9
3:   0   200      6      1      7
4:   1     0      1      1      1
5:   1 1-199     10      1      7
6:   1   200      6      1      6
# resume by juln
 test2Resume_jul <- test2[ , list(maxres = max(res)
   , minres = min(res)
  , sumres = length(unique(resChange)))
  , keyby = c('juln', 'key')]  # by juln
 # change 'key'
 test2Resume_jul$key <- c('0', '1-199', '200')[test2Resume_jul$key]
 test2Resume_jul
    juln   key maxres minres sumres
1: 15173     0      2      2      1
2: 15173 1-199      3      1      7
3: 15173   200      6      1      6
4: 15174     0      2      1      3
5: 15174 1-199     10      1      8
6: 15174   200      6      1      6
This is OK, but what I would like to get is a summary by juln and by the 
variable day (0 and 1) as well.
Like this:
juln   day  key    maxres  minres  sumres
15173  0    0
15173  0    1-199
15173  0    200
15173  1    0
15173  1    1-199
15173  1    200
15174  0    0
15174  0    1-199
15174  0    200
15174  1    0
15174  1    1-199
15174  1    200
...
The other thing is that I would like to calculate "sumres" as the sum of the 
run lengths for each "key".
For example, if the res values for key 200 (juln 15173) in the test2 dataframe 
are 1, 1, 2, 2, 1, 2, then sumres should be 9 (1+1+2+2+1+2), not 6 (which I 
suppose comes from counting the number of unique runs).
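
One possible base-R way to get such a table is sketched below (added to this archive, not from the original thread). It uses the small "test" data posted earlier in the thread, whose date column is called jul rather than juln, and the names id, run, len, runs and summ are made up for the illustration. The idea is to number the runs, reduce the data to one row per run, and then summarise the run lengths:

```r
test <- data.frame(
  jul = 14655,
  day = c(rep(1, 10), rep(0, 6)),
  act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0, 34, 200, 200, 145)
)
test$key <- ifelse(test$act == 0, "0", ifelse(test$act == 200, "200", "1-199"))

# A new run starts whenever jul, day or key differs from the previous row
id <- paste(test$jul, test$day, test$key)
test$run <- cumsum(c(TRUE, id[-1] != id[-length(id)]))

# One row per run, with its length
test$len <- 1L
runs <- aggregate(len ~ jul + day + key + run, data = test, FUN = sum)

# Longest, shortest and total run length per jul / day / key
summ <- aggregate(len ~ jul + day + key, data = runs,
                  FUN = function(x) c(maxres = max(x), minres = min(x), sumres = sum(x)))
summ <- do.call(data.frame, summ)  # flatten the matrix column into len.maxres etc.
summ
```

Because sumres adds up one length per run, it equals the number of rows with that key in the given jul/day combination.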

Petr's code:
This also works fine; the thing is that for the aggregation I would need the 
intervals to be like this
[0, 1)
[1, 199]
(199, 200]
and I don't know whether that is possible... I checked the help for cut, but I 
found that the intervals can only be closed on the right or on the left...
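
A possible workaround, sketched here as an addition to the archive (not from the original thread): if act only takes whole-number values between 0 and 200, as in the posted data, then left-closed intervals with break points 0, 1 and 200 give exactly the three groups 0, 1-199 and 200. The act values below are made up to cover the boundaries:

```r
act <- c(0, 1, 2, 61, 199, 200)   # made-up values covering the boundaries

# findInterval() uses left-closed intervals:
# [0,1) -> 1, [1,200) -> 2, [200,Inf) -> 3
key <- findInterval(act, c(0, 1, 200))
grp <- factor(key, levels = 1:3, labels = c("0", "1-199", "200"))

# The same grouping with cut(): right = FALSE closes each interval on the left
grp_cut <- cut(act, breaks = c(0, 1, 200, Inf), right = FALSE,
               labels = c("0", "1-199", "200"))

stopifnot(identical(as.character(grp), as.character(grp_cut)))
```

With non-integer activity values the middle group would also contain values between 199 and 200, so the break points would need adjusting in that case.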

Thank you very much for your time and sharing your knowledge!
Zuzana

## here is the bigger test2 dataframe
> dput(test2)
structure(list(daten = structure(c(15173, 15173, 15173, 15173,
15173, 15173, 15173, 1

Re: [R] Counting number of consecutive occurrences per rows

2013-05-03 Thread zuzana zajkova

"win", "win", "win", "win", "win", "win",
"win", "win", "win", "win", "win", "win", "win", "win", "win",
"win", "win", "win", "win", "win", "win", "win", "win", "win",
"win", "win", "win", "win", "win", "win", "win", "win", "win",
"win", "win", "win", "win", "win", "win", "win", "win", "win",
"win"), night = structure(c(1310962792, 1310963392, 1310963992,
1310964592, 1310965192, 1310965792, 1310966392, 1310966992, 1310967592,
1310968192, 1310968792, 1310969392, 1310969992, 1310970592, 1310971192,
1310971792, 1310972392, 1310972992, 1310973592, 1310974192, 1310974792,
1310975392, 1311107991, 1311108591, 1311109191, 1311109791, 1311110391,
1311110991, 1311111591, 1311112191, 1311112791, 1311113391, 1311113991,
1311114591, 1311115191, 1311115791, 1311116391, 1311116991, 1311117591,
1311118191, 1311118791, 1311119391, 1311119991, 1311034191, 1311034791,
1311038991, 1311039591, 1311040191, 1311040791, 1311041391, 1311041991,
1311042591, 1311043191, 1311043791), class = c("POSIXct", "POSIXt"
), tzone = "GMT"), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0), act = c(196, 200, 199, 200, 197, 198, 197,
200, 200, 197, 200, 200, 198, 200, 1, 1, 0, 0, 1, 2, 200, 200,
200, 200, 200, 200, 199, 61, 0, 194, 198, 198, 196, 193, 194,
193, 197, 198, 199, 200, 197, 199, 199, 200, 198, 200, 200, 198,
200, 34, 1, 1, 0, 0, 199, 200, 199, 7, 0, 0)), .Names = c("daten",
"juln", "fen", "night", "day", "act"), row.names = 9990:10049, class =
"data.frame")





On 29 April 2013 14:35, PIKAL Petr  wrote:

> Hi
>
> rrr<-rle(as.numeric(cut(test$act, c(0,1,199,200), include.lowest=T)))
> test$res <- rep(rrr$lengths, rrr$lengths)
>
> If you put it in function
>
> fff<- function(x, limits=c(0,1,199,200)) {
> rrr<-rle(as.numeric(cut(x, limits, include.lowest=T)))
> res <- rep(rrr$lengths, rrr$lengths)
> res
> }
>
> you can use split/lapply approach
>
> test$res2<-unlist(lapply(split(test$act, factor(test$day, levels=c(1,0))),
> fff))
>
> Beware of correct ordering of days in output. Without correct leveling of
> factor 0 precedes 1.
>
> And for the last part probably aggregate can be the way.
>
> > aggregate(test$res, list(test$jul, cut(test$act, c(0,1,199,200), include.lowest=T)), max)
>   Group.1   Group.2 x
> 1   14655     [0,1] 4
> 2   14655   (1,199] 3
> 3   14655 (199,200] 3
> > aggregate(test$res, list(test$jul, cut(test$act, c(0,1,199,200), include.lowest=T)), min)
>   Group.1   Group.2 x
> 1   14655     [0,1] 4
> 2   14655   (1,199] 1
> 3   14655 (199,200] 2
>
> Regards
> Petr
>
> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> > project.org] On Behalf Of zuzana zajkova
> > Sent: Monday, April 29, 2013 12:45 PM
> > To: r-help@r-project.org
> > Subject: [R] Counting number of consecutive occurrences per rows
> >
> > Hi,
> >
> > I would appreciate if somebody could help me with following
> > calculation.
> > I have a dataframe, by 10 minutes time, for mostly one year data. This
> > is small example:
> >
> > > dput(test)
> > structure(list(jul = structure(c(14655, 14655, 14655, 14655, 14655,
> > 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
> > 14655), origin = structure(0, class = "Date")),
> > time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
> > 1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
> > 1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
> > 1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone =
> > "GMT"),
> > act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
> > 34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> > 0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
> > ), class = "data.frame", row.names = c(510L, 512L, 514L, 516L, 518L,
> > 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
> > 540L))
> >
> > L

Re: [R] Counting number of consecutive occurrences per rows

2013-04-29 Thread PIKAL Petr

Hi

rrr<-rle(as.numeric(cut(test$act, c(0,1,199,200), include.lowest=T)))
test$res <- rep(rrr$lengths, rrr$lengths)

If you put it in function

fff<- function(x, limits=c(0,1,199,200)) {
rrr<-rle(as.numeric(cut(x, limits, include.lowest=T)))
res <- rep(rrr$lengths, rrr$lengths)
res
}

you can use split/lapply approach

test$res2<-unlist(lapply(split(test$act, factor(test$day, levels=c(1,0))), fff))

Beware of correct ordering of days in output. Without correct leveling of 
factor 0 precedes 1.
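
The run-length idea can also be written with an explicit key instead of cut(), which sidesteps the interval boundaries. The following is a sketch added to this archive, not Petr's code; it uses the act and day columns of the posted "test" data, and the helper name run_len is made up:

```r
act <- c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0, 34, 200, 200, 145)
day <- c(rep(1, 10), rep(0, 6))

# 1 = act is 0, 3 = act is 200, 2 = anything else
key <- ifelse(act == 0, 1L, ifelse(act == 200, 3L, 2L))

# Replace every element by the length of the run it belongs to
run_len <- function(k) {
  r <- rle(k)
  rep(r$lengths, r$lengths)
}

res  <- run_len(key)                  # runs over the whole series
res2 <- ave(key, day, FUN = run_len)  # runs restarted within each day value
```

ave() puts the per-group results back in the original row order, so no manual reordering of the day levels is needed.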

And for the last part probably aggregate can be the way.

> aggregate(test$res, list(test$jul, cut(test$act, c(0,1,199,200), include.lowest=T)), max)
  Group.1   Group.2 x
1   14655     [0,1] 4
2   14655   (1,199] 3
3   14655 (199,200] 3
> aggregate(test$res, list(test$jul, cut(test$act, c(0,1,199,200), include.lowest=T)), min)
  Group.1   Group.2 x
1   14655     [0,1] 4
2   14655   (1,199] 1
3   14655 (199,200] 2

Regards
Petr

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of zuzana zajkova
> Sent: Monday, April 29, 2013 12:45 PM
> To: r-help@r-project.org
> Subject: [R] Counting number of consecutive occurrences per rows
> 
> Hi,
> 
> I would appreciate if somebody could help me with following
> calculation.
> I have a dataframe, by 10 minutes time, for mostly one year data. This
> is small example:
> 
> > dput(test)
> structure(list(jul = structure(c(14655, 14655, 14655, 14655, 14655,
> 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
> 14655), origin = structure(0, class = "Date")),
> time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
> 1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
> 1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
> 1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone =
> "GMT"),
> act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
> 34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
> ), class = "data.frame", row.names = c(510L, 512L, 514L, 516L, 518L,
> 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
> 540L))
> 
> Looks like this:
> 
> > test
>       jul                time act day
> 510 14655 2010-02-15 18:25:54 130   1
> 512 14655 2010-02-15 18:35:54  23   1
> 514 14655 2010-02-15 18:45:54  45   1
> 516 14655 2010-02-15 18:55:54 200   1
> 518 14655 2010-02-15 19:05:54 200   1
> 520 14655 2010-02-15 19:15:54 200   1
> 522 14655 2010-02-15 19:25:54 199   1
> 524 14655 2010-02-15 19:35:54 150   1
> 526 14655 2010-02-15 19:45:54   0   1
> 528 14655 2010-02-15 19:55:54   0   1
> 530 14655 2010-02-15 20:05:54   0   0
> 532 14655 2010-02-15 20:15:54   0   0
> 534 14655 2010-02-15 20:25:54  34   0
> 536 14655 2010-02-15 20:35:54 200   0
> 538 14655 2010-02-15 20:45:54 200   0
> 540 14655 2010-02-15 20:55:54 145   0
> 
> 
> What I would like to calculate is the number of consecutive occurrences
> of values 200,  0 and together values from 1 til 199 (in fact the
> values that differ from 200 and 0) in column "act".
> 
> I would like to get something like this (result$res)
> 
> > result
>       jul                time act day res res2
> 510 14655 2010-02-15 18:25:54 130   1   3    3
> 512 14655 2010-02-15 18:35:54  23   1   3    3
> 514 14655 2010-02-15 18:45:54  45   1   3    3
> 516 14655 2010-02-15 18:55:54 200   1   3    3
> 518 14655 2010-02-15 19:05:54 200   1   3    3
> 520 14655 2010-02-15 19:15:54 200   1   3    3
> 522 14655 2010-02-15 19:25:54 199   1   2    2
> 524 14655 2010-02-15 19:35:54 150   1   2    2
> 526 14655 2010-02-15 19:45:54   0   1   4    2
> 528 14655 2010-02-15 19:55:54   0   1   4    2
> 530 14655 2010-02-15 20:05:54   0   0   4    2
> 532 14655 2010-02-15 20:15:54   0   0   4    2
> 534 14655 2010-02-15 20:25:54  34   0   1    1
> 536 14655 2010-02-15 20:35:54 200   0   2    2
> 538 14655 2010-02-15 20:45:54 200   0   2    2
> 540 14655 2010-02-15 20:55:54 145   0   1    1
> 
> And if possible, distinguish among day==1 and day==0 (see the "act"
> values of 0 for example), results as in result$res2.
> 
> After it I would like to make a resume table per days (jul):
> where maxres is max(result$res) for the "act" value where minres is
> min(result$res) for the "act" value where sumres is sum(result$res) for
> the "act" value (for example, if the 200 value ocurrs in different
> times per day(jul) consecutively 3, 5, 1, 6 and 7 times the sumr

Re: [R] Counting number of consecutive occurrences per rows

2013-04-29 Thread jim holtman

try this:

> test <- structure(list(jul = structure(c(14655, 14655, 14655, 14655,
+ 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
+ 14655, 14655, 14655), origin = structure(0, class = "Date")),
+ time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
+ 1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
+ 1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
+ 1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone =
+ "GMT"),
+ act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
+ 34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
+ ), class = "data.frame", row.names = c(510L, 512L, 514L, 516L,
+ 518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
+ 540L))
>
> # add key to separate data
> test$key <- ifelse(test$act == 0
+ , 1L  # 0
+ , ifelse(test$act == 200
+ , 3L  # 200
+ , 2L  # 1-199
+ )
+ )
> # mark changes in sequence
> test$resChange <- cumsum(c(1L, abs(diff(test$key))))
> test$res <- ave(test$resChange, test$resChange, FUN = length)
>
> test$res2 <- ave(test$resChange, test$resChange, test$day, FUN = length)
>
> test
      jul                time act day key resChange res res2
510 14655 2010-02-15 18:25:54 130   1   2         1   3    3
512 14655 2010-02-15 18:35:54  23   1   2         1   3    3
514 14655 2010-02-15 18:45:54  45   1   2         1   3    3
516 14655 2010-02-15 18:55:54 200   1   3         2   3    3
518 14655 2010-02-15 19:05:54 200   1   3         2   3    3
520 14655 2010-02-15 19:15:54 200   1   3         2   3    3
522 14655 2010-02-15 19:25:54 199   1   2         3   2    2
524 14655 2010-02-15 19:35:54 150   1   2         3   2    2
526 14655 2010-02-15 19:45:54   0   1   1         4   4    2
528 14655 2010-02-15 19:55:54   0   1   1         4   4    2
530 14655 2010-02-15 20:05:54   0   0   1         4   4    2
532 14655 2010-02-15 20:15:54   0   0   1         4   4    2
534 14655 2010-02-15 20:25:54  34   0   2         5   1    1
536 14655 2010-02-15 20:35:54 200   0   3         6   2    2
538 14655 2010-02-15 20:45:54 200   0   3         6   2    2
540 14655 2010-02-15 20:55:54 145   0   2         7   1    1
>
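The run lengths obtained above with cumsum() and ave() can also be computed with base R's rle(). The following is only a minimal sketch under that assumption, not the method used in the reply; the variable names (act, day, key, runs, res, res2) are illustrative, and the values are copied from the example data in this thread.

```r
# act and day values copied from the example data in this thread
act <- c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0, 34, 200, 200, 145)
day <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)

# classify each value: 1 = zero, 3 = exactly 200, 2 = anything else
key <- ifelse(act == 0, 1L, ifelse(act == 200, 3L, 2L))

# rle() gives the length of each run of identical consecutive keys;
# repeating each length once per row of its run yields the per-row count
runs <- rle(key)
res <- rep(runs$lengths, times = runs$lengths)

# same idea, but a run also ends whenever the day flag changes
runs2 <- rle(paste(key, day))
res2 <- rep(runs2$lengths, times = runs2$lengths)

cbind(act, day, res, res2)
```

With rle() there is no need for an intermediate change counter, at the cost of having to expand the run lengths back to one value per row with rep().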




Re: [R] Counting number of consecutive occurrences per rows

2013-04-29 Thread jim holtman

Forgot the last part of the question:

> test <- structure(list(jul = structure(c(14655, 14655, 14655, 14655,
+ 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
+ 14655, 14655, 14655), origin = structure(0, class = "Date")),
+ time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
+ 1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
+ 1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
+ 1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone =
+ "GMT"),
+ act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
+ 34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
+ ), class = "data.frame", row.names = c(510L, 512L, 514L, 516L,
+ 518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
+ 540L))
>
> # add key to separate data
> test$key <- ifelse(test$act == 0
+ , 1L  # 0
+ , ifelse(test$act == 200
+ , 3L  # 200
+ , 2L  # 1-199
+ )
+ )
> # mark changes in sequence
> test$resChange <- cumsum(c(1L, abs(diff(test$key))))
> test$res <- ave(test$resChange, test$resChange, FUN = length)
>
> test$res2 <- ave(test$resChange, test$resChange, test$day, FUN = length)
>
> require(data.table)  # use this for aggregation
> test <- data.table(test)
> testResume <- test[
+ , list(maxres = max(res)
+ , minres = min(res)
+ , sumres = length(unique(resChange))
+ )
+ , keyby = c('day', 'key')
+ ]
> # change 'key'
> testResume$key <- c('0', '1-199', '200')[testResume$key]
> testResume
   day   key maxres minres sumres
1:   0     0      4      4      1
2:   0 1-199      1      1      2
3:   0   200      2      2      1
4:   1     0      4      4      1
5:   1 1-199      3      2      2
6:   1   200      3      3      1
>
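Note that sumres in the output above counts the number of runs per group, while the original post describes sumres as the sum of the run lengths (3+5+1+6+7 = 22 in its example). The following is a hedged base R sketch of that second reading, assuming runs are split whenever the day flag changes; the names run_tab and summary_tab are hypothetical, and the data are copied from the example in this thread.

```r
# act and day values copied from the example data in this thread
act <- c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0, 34, 200, 200, 145)
day <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
key <- ifelse(act == 0, "0", ifelse(act == 200, "200", "1-199"))

# one entry per run of identical consecutive (day, key) pairs
runs <- rle(paste(day, key))
last_row <- cumsum(runs$lengths)          # index of the last row of each run
run_tab <- data.frame(day = day[last_row],
                      key = key[last_row],
                      len = runs$lengths)

# summarise the run lengths within each (day, key) group
groups <- split(run_tab, list(run_tab$day, run_tab$key), drop = TRUE)
summary_tab <- do.call(rbind, lapply(groups, function(g) {
  data.frame(day = g$day[1], key = g$key[1],
             maxres = max(g$len), minres = min(g$len), sumres = sum(g$len))
}))
summary_tab
```

Under this reading the sumres values add up to the number of rows in the data, which matches the check mentioned in the original post (144 ten-minute records per full day).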




[R] Counting number of consecutive occurrences per rows

2013-04-29 Thread zuzana zajkova

Hi,

I would appreciate it if somebody could help me with the following calculation.
I have a data frame with one record every 10 minutes, covering most of a year.
This is a small example:

> dput(test)
structure(list(jul = structure(c(14655, 14655, 14655, 14655,
14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655, 14655,
14655, 14655, 14655), origin = structure(0, class = "Date")),
time = structure(c(1266258354, 1266258954, 1266259554, 1266260154,
1266260754, 1266261354, 1266261954, 1266262554, 1266263154,
1266263754, 1266264354, 1266264954, 1266265554, 1266266154,
1266266754, 1266267354), class = c("POSIXct", "POSIXt"), tzone =
"GMT"),
act = c(130, 23, 45, 200, 200, 200, 199, 150, 0, 0, 0, 0,
34, 200, 200, 145), day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0)), .Names = c("jul", "time", "act", "day"
), class = "data.frame", row.names = c(510L, 512L, 514L, 516L,
518L, 520L, 522L, 524L, 526L, 528L, 530L, 532L, 534L, 536L, 538L,
540L))

Looks like this:

> test
      jul                time act day
510 14655 2010-02-15 18:25:54 130   1
512 14655 2010-02-15 18:35:54  23   1
514 14655 2010-02-15 18:45:54  45   1
516 14655 2010-02-15 18:55:54 200   1
518 14655 2010-02-15 19:05:54 200   1
520 14655 2010-02-15 19:15:54 200   1
522 14655 2010-02-15 19:25:54 199   1
524 14655 2010-02-15 19:35:54 150   1
526 14655 2010-02-15 19:45:54   0   1
528 14655 2010-02-15 19:55:54   0   1
530 14655 2010-02-15 20:05:54   0   0
532 14655 2010-02-15 20:15:54   0   0
534 14655 2010-02-15 20:25:54  34   0
536 14655 2010-02-15 20:35:54 200   0
538 14655 2010-02-15 20:45:54 200   0
540 14655 2010-02-15 20:55:54 145   0


What I would like to calculate is the number of consecutive occurrences of
the values 200 and 0, and of the values from 1 to 199 taken together (that
is, all values that differ from 200 and 0), in column "act".

I would like to get something like this (result$res)

> result
      jul                time act day res res2
510 14655 2010-02-15 18:25:54 130   1   3    3
512 14655 2010-02-15 18:35:54  23   1   3    3
514 14655 2010-02-15 18:45:54  45   1   3    3
516 14655 2010-02-15 18:55:54 200   1   3    3
518 14655 2010-02-15 19:05:54 200   1   3    3
520 14655 2010-02-15 19:15:54 200   1   3    3
522 14655 2010-02-15 19:25:54 199   1   2    2
524 14655 2010-02-15 19:35:54 150   1   2    2
526 14655 2010-02-15 19:45:54   0   1   4    2
528 14655 2010-02-15 19:55:54   0   1   4    2
530 14655 2010-02-15 20:05:54   0   0   4    2
532 14655 2010-02-15 20:15:54   0   0   4    2
534 14655 2010-02-15 20:25:54  34   0   1    1
536 14655 2010-02-15 20:35:54 200   0   2    2
538 14655 2010-02-15 20:45:54 200   0   2    2
540 14655 2010-02-15 20:55:54 145   0   1    1

And if possible, I would like to distinguish between day==1 and day==0 (see
the "act" values of 0 for example), with results as in result$res2.

After that, I would like to make a summary table per day (jul), where:
maxres is max(result$res) for the "act" value,
minres is min(result$res) for the "act" value,
sumres is sum(result$res) for the "act" value (for example, if the value 200
occurs at different times in one day (jul) in runs of 3, 5, 1, 6 and 7
consecutive records, sumres would be 3+5+1+6+7 = 22).

Something like this (these are made-up numbers):

jul    act    maxres  minres  sumres
14655  0      4       1       25
14655  200    3       2       48
14655  1-199  3       1       71
14656  0      8       2       38
14656  200    15      3       60
14656  1-199  11      4       46
...
(theoretically the sum of sumres per day(jul) should be 144)


> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)


I hope my explanation is sufficient. I appreciate any hint.
Thank you,

Zuzana

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Counting various elemnts in a vactor

2013-03-26 Thread arun

Hi,
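library(plyr)
# count() returns a data frame with columns x (value) and freq (count)
df1 <- count(df)
# repeat each value 100 times its frequency
rep(df1[,1], df1[,2]*100)
# check the resulting frequencies
count(as.character(rep(df1[,1], df1[,2]*100)))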
#  x freq
#1 A  200
#2 B  200
#3 C  200
#4 D  400
#5 F  400
A.K.




Re: [R] Counting various elemnts in a vactor

2013-03-26 Thread Katherine Gobin

Dear Sir,

Thanks a lot for your great help. I couldn't have figured it out. 

Thanks again.

Regards

Katherine


Re: [R] Counting various elemnts in a vactor

2013-03-26 Thread D. Rizopoulos

try this:

df <- c("F", "C", "F", "B", "D", "A", "D", "D", "A", "F", "D", "F", "B", 
"C")

tab <- table(df)
tab
rep(names(tab), 100 * tab)


I hope it helps.

Best,
Dimitris
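As a quick check of the table()/rep() approach above, the sketch below counts each element and then confirms that the expanded vector holds 100 copies per occurrence. It is only an illustration: the names x and expanded are assumptions, and as.vector() is used to pass plain integer counts to rep().

```r
# the vector from the original question
x <- c("F", "C", "F", "B", "D", "A", "D", "D", "A", "F", "D", "F", "B", "C")

# table() counts how often each distinct element occurs
tab <- table(x)
tab

# repeat each element 100 times its count
expanded <- rep(names(tab), times = 100 * as.vector(tab))

# counts in the expanded vector should be 100 times the original counts
table(expanded)
```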



-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/

[R] Counting various elemnts in a vactor

2013-03-26 Thread Katherine Gobin

Dear R forum

I have a vector, say as given below:

df = c("F", "C", "F", "B", "D", "A", "D", "D", "A", "F", "D", "F", "B", "C")

I need to find:

(1) How many times each element occurs, e.g. in the above vector F occurs 4
times, C occurs 2 times, etc.

(2) Depending on the number of occurrences, I need to repeat each element 100
times its number of occurrences, e.g. I need to repeat F 4 * 100 = 400 times
and C 2 * 100 = 200 times.

I can manage the second part, i.e. the repeating, but I am not able to count
the number of times each element appears in a given vector.

Kindly guide 
 
Katherine












Re: [R] Counting confidence intervals

2013-03-20 Thread David Winsemius

You should look at findInterval. Used with as.numeric it could do what you 
request although it has a much wider range of uses.

-- 
David

Sent from my iPhone
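A minimal sketch of how findInterval() might be used for a single closed interval, here [2, 4] as in the examples elsewhere in this thread. The variable names are illustrative, and this is only one possible reading of the suggestion above.

```r
x <- 1:5

# findInterval() returns 0 for values below 2, 1 for values in [2, 4)
# and 2 for values at or above 4; rightmost.closed = TRUE moves the
# upper endpoint 4 itself into interval 1
idx <- findInterval(x, c(2, 4), rightmost.closed = TRUE)

# TRUE where x lies in the closed interval [2, 4]
inside <- idx == 1
inside

# number of values inside the interval
sum(inside)
```

Because findInterval() expects one sorted vector of break points, this form suits testing many values against one interval; testing one value against many different intervals needs a different approach.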


Re: [R] Counting confidence intervals

2013-03-20 Thread Greg Snow

The TeachingDemos package has %<% and %<=% functions that can be chained
simply, so you could do something like:

sum( 5:1 %<=% 1:5 %<=% 10:14 )

and other similar approaches.

The idea is that you can do comparisons as:

lower %<% x %<% upper

instead of

lower < x & x < upper
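For the question that started this thread (how many intervals contain a given number, say 12), the plain comparison form can be summed directly. The sketch below uses base R only; the interval bounds are hypothetical values made up for illustration, not data from the thread.

```r
# hypothetical interval bounds, made up for illustration
lower <- c(10.0, 11.5, 12.3, 9.0)
upper <- c(13.0, 12.5, 14.0, 11.9)
target <- 12

# TRUE for each interval whose closed range [lower, upper] contains target
covers <- lower <= target & target <= upper
covers

# number of intervals that contain the target
sum(covers)
```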


Re: [R] Counting confidence intervals

2013-03-19 Thread S Ellison

 
> There _is_ a function ?within. 
Drat! of course there is. I even use it, though not often.


> Maybe your function can be 
> named 'between'
Good thought - thanks

Steve E


Re: [R] Counting confidence intervals

2013-03-18 Thread Rui Barradas


Hello,

There _is_ a function ?within. Maybe your function can be named 'between'

Rui Barradas

On 18-03-2013 16:16, S Ellison wrote:

I want to count how many
times a number, say 12, lies in the interval. Can anyone assist?


Has anyone else ever wished there was a moderately general 'inside' or 'within' 
function in R for this problem?

For example, something that behaves more or less like

within <- function(x, interval=NULL, closed=c(TRUE, TRUE),
                   lower=min(interval), upper=max(interval)) {
    # interval must be a length 2 vector
    # closed is taken in the order (lower, upper)
    # lower and upper may be vectors and will be recycled (by "<" etc)
    # if not of length length(x)

    low.comp <- if(closed[1]) "<=" else "<"
    high.comp <- if(closed[2]) ">=" else ">"

    do.call(low.comp, list(lower, x)) & do.call(high.comp, list(upper, x))
}


#Examples
within(1:5, c(2,4))

within(1:5, c(2,4), closed=c(FALSE, TRUE))

within(1:5, lower=5:1, upper=10:14)


S Ellison
LGC
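Since base R already has a function called within(), a renamed variant along the lines suggested in this thread could look like the sketch below. It is a simplified illustration, not S Ellison's code: the interval argument is dropped, the argument order is an assumption, and the name between also exists in packages such as dplyr and data.table, so it would be masked or mask them if those are attached.

```r
# hypothetical helper: TRUE where x lies between lower and upper
# closed[1] controls the lower bound, closed[2] the upper bound
between <- function(x, lower, upper, closed = c(TRUE, TRUE)) {
  above <- if (closed[1]) x >= lower else x > lower
  below <- if (closed[2]) x <= upper else x < upper
  above & below
}

# closed interval [2, 4]
between(1:5, 2, 4)

# half-open interval (2, 4]
between(1:5, 2, 4, closed = c(FALSE, TRUE))

# vector bounds are compared element by element; sum() counts the hits
sum(between(1:5, lower = 5:1, upper = 10:14))
```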
