Re: [R] Counting enumerated items in each element of a character vector

2017-04-26 Thread Boris Steipe
Let's be a bit careful.

You'll probably need a regular expression. But a regex may not be able to do 
this even in principle, so we can't just gloss over the details.

You said: "blah blah blah" can contain ANY text. If this is true, "blah blah 
blah" could itself contain the delimiters. In that case a regex is not 
powerful enough in principle, and you need a context-sensitive parser.
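
To make the ambiguity concrete, here is a minimal base R sketch. The helper count_matches(), the pattern name enumerator, and the two sample strings are made up for illustration; they are not from the original post. The second string contains no list at all, yet a sentence ending in a number looks exactly like an enumerator to the pattern.

```r
# Hypothetical helper: count regex matches in each element of x.
# gregexpr() reports "no match" as -1, so only positive positions are counted.
count_matches <- function(x, pattern) {
  vapply(gregexpr(pattern, x, perl = TRUE),
         function(m) sum(m > 0), integer(1))
}

# Digits followed by ")" or "." (assumed shape of an enumerator).
enumerator <- "\\d+[.)]"

ambiguous <- c("Buy 2) apples and 3) pears",   # two enumerated items
               "We met in 2016. It rained.")   # no list, but "2016." matches

count_matches(ambiguous, enumerator)
# [1] 2 1
```

Unless the free text is guaranteed not to contain such sequences, no pattern of this kind can separate the two cases.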

So let's have a list of valid demarcations. From what you write I can guess 
that ...

text2 <- c(
"blah   1) blah blah blah 1",
"blah   10. blah blah blah 1",
"blah 1)  1) blah blah blah 1",
"blah 1.  10) blah blah blah 1",
"blah 1)  1. blah blah blah 1",
"blah 10.  10. blah blah blah 1"
)

... captures the variation. But that's just my guess from staring at your 
examples. I can't be sure; that is for you to confirm.

On text2, the regular expression ...

"(\d+(\)|\.)\s*){1,2}"

... gives the expected result of
# [1] 1 1 1 1 1 1
... and ...
# [1] 5 5 5 5
... on your text1.

In code:

library(stringr)
# Backslashes are doubled because the pattern is written as an R string.
str_count(text1, "(\\d+(\\)|\\.)\\s*){1,2}")
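
For readers without stringr, the same check can be sketched in base R. The helper count_matches() and the names single and grouped are illustrative assumptions; text2 is copied from above. Comparing the two patterns shows what the {1,2} repetition is for: it folds two adjacent enumerators into one match.

```r
# Hypothetical base R counterpart of str_count(): count regex matches per string.
count_matches <- function(x, pattern) {
  vapply(gregexpr(pattern, x, perl = TRUE),
         function(m) sum(m > 0), integer(1))
}

text2 <- c(
  "blah   1) blah blah blah 1",
  "blah   10. blah blah blah 1",
  "blah 1)  1) blah blah blah 1",
  "blah 1.  10) blah blah blah 1",
  "blah 1)  1. blah blah blah 1",
  "blah 10.  10. blah blah blah 1"
)

single  <- "\\d+[.)]"                   # one enumerator, e.g. "1)" or "10."
grouped <- "(\\d+(\\)|\\.)\\s*){1,2}"   # one or two adjacent enumerators

single_counts  <- count_matches(text2, single)
grouped_counts <- count_matches(text2, grouped)

single_counts
# [1] 1 1 2 2 2 2
grouped_counts
# [1] 1 1 1 1 1 1
```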






> On Apr 26, 2017, at 10:13 AM, Dan Abner  wrote:
> 
> Hi all,
> 
> I am looking for a streamlined way of counting the number of enumerated items 
> in each element of a character vector. For example:
> 
> 
> text1<-c("blah blah blah.
> blah blah blah
> 1) blah blah blah 1
> 2) blah blah blah
> 10) blah 10 blah blah
> blah blah blah
> 1) blah blah blah
> 2) blah blah blah 2
> blah blah blah.","blah blah blah.
> blah blah blah
> 1. blah blah blah 1
> 2. blah blah blah
> 10.blah 10 blah blah
> blah blah blah
> 1. blah blah blah 1
> 2. blah blah blah
> blah blah blah.","blah blah blah. blah blah blah 1 1)blah blah blah 1. 2) 
> blah blah blah 10) blah 10 blah blah blah blah blah 1) blah blah blah 1. 2) 
> blah blah blah. blah blah blah."
> ,"blah blah blah. blah blah blah 1 1.blah blah blah 1. 2. blah blah blah. 10. 
> blah 10 blah blah. blah blah blah 1. blah blah blah 1. 2. blah blah blah. 
> blah blah blah.")
> 
> text1
> 
> ===
> 
> I would like the result to be c(5,5,5,5). Notice that sometimes there are 
> leading hard returns, other times not. Sometimes there are separate lists and 
> the same numbers are used in the enumerated items multiple times within each 
> character string. Sometimes the leading numbers for the enumerated items 
> exceed single digits. Notice that the delimiter may be ) or a period (.). If 
> the delimiter is a period and there are hard returns (example 2), then I 
> expect that will be easy enough to differentiate sentences ending with a 
> number from enumerated items. However, I imagine it would be much more 
> difficult to differentiate the two for example 4.
> 
> Any suggestions are appreciated.
> 
> Best,
> 
> Dan
> 

Re: [R] Counting enumerated items in each element of a character vector

2017-04-26 Thread Boris Steipe
What's the expected output for this sample?

How do _you_ define what should be counted?






Re: [R] Counting enumerated items in each element of a character vector

2017-04-26 Thread Dan Abner
Hi all,

I was not clear enough in my example code. Please see below where "blah
blah blah" can be ANY text or numbers: No predictable pattern at all to
what may or may not be written in place of "blah blah blah".

text1<-c("blah blah blah.
blah blah blah
1) blah blah blah 1
2) blah blah blah
10) blah 10 blah blah
blah blah blah
1) blah blah blah
2) blah blah blah 2
blah blah blah.","blah blah blah.
blah blah blah
1. blah blah blah 1
2. blah blah blah
10.blah 10 blah blah
blah blah blah
1. blah blah blah 1
2. blah blah blah
blah blah blah.","blah blah blah. blah blah blah 1 1)blah blah blah 1. 2) blah
blah blah 10) blah 10 blah blah blah blah blah 1) blah blah blah 1. 2) blah
blah blah. blah blah blah."
,"blah blah blah. blah blah blah 1 1.blah blah blah 1. 2. blah blah blah.
 10. blah 10 blah blah. blah blah blah 1. blah blah blah 1. 2. blah blah
blah. blah blah blah.")

text1

Thank you in advance for your suggestions and/or guidance.

Best,

Dan



Re: [R] Counting enumerated items in each element of a character vector

2017-04-25 Thread Michael Hannon
Thanks, Ista.  I thought there might be a "tidy" way to do this, but I
hadn't used stringr.

-- Mike



Re: [R] Counting enumerated items in each element of a character vector

2017-04-25 Thread Ista Zahn
stringr::str_count (and stringi::stri_count that it wraps) interpret
the pattern argument as a regular expression by default.
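
The difference between regex and literal matching can be sketched in base R, where gregexpr() likewise treats the pattern as a regular expression unless fixed = TRUE is given (in stringr, wrapping the pattern in fixed() plays that role). The helper count_matches() and the sample strings below are made up for illustration.

```r
# Hypothetical helper: count matches per string; extra arguments go to gregexpr().
count_matches <- function(x, pattern, ...) {
  vapply(gregexpr(pattern, x, ...),
         function(m) sum(m > 0), integer(1))
}

x <- c("1. first 2. second", "no list here")

regex_count   <- count_matches(x, ".")                # "." as regex: any character
literal_count <- count_matches(x, ".", fixed = TRUE)  # "." as a literal period

regex_count
# [1] 18 12
literal_count
# [1] 2 0
```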

Best,
Ista


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Counting enumerated items in each element of a character vector

2017-04-25 Thread Michael Hannon
I like Boris's "Hadley" solution.  For the record, I've appended a
version that uses regular expressions, the only benefit of which is
that it could be generalized to find more-complicated patterns.

-- Mike

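# Count occurrences of a pattern in each string.
# gregexpr() returns -1 when nothing matches, so count the positive
# match positions instead of taking length().
counts <- sapply(text1, function(next_string) {
  sum(gregexpr("Example", next_string)[[1]] > 0)
}, USE.NAMES = FALSE)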

> counts
[1] 5 5 5 5
>
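
One caveat when counting with gregexpr(): a string with no match is reported as a single -1, so taking length() of the result yields 1 rather than 0. The sketch below uses two made-up strings (not from the thread) to compare length() with counting the positive match positions.

```r
x <- c("Example one, Example two", "nothing to see")

# length() counts the -1 "no match" marker as if it were a match.
naive <- sapply(x, function(s) length(gregexpr("Example", s)[[1]]),
                USE.NAMES = FALSE)

# Counting only positive positions gives 0 for a string without matches.
safe <- sapply(x, function(s) sum(gregexpr("Example", s)[[1]] > 0),
               USE.NAMES = FALSE)

naive
# [1] 2 1
safe
# [1] 2 0
```

For text1 both versions agree, because every element contains at least one match.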



Re: [R] Counting enumerated items in each element of a character vector

2017-04-25 Thread Boris Steipe
I should add: there's a str_count() function in the stringr package.

library(stringr)
str_count(text1, "Example")
# [1] 5 5 5 5

I guess that would be the neater solution.

B.





Re: [R] Counting enumerated items in each element of a character vector

2017-04-25 Thread Boris Steipe
How about:

unlist(lapply(strsplit(text1, "Example"), function(x) { length(x) - 1 } ))


Splitting each string on its five occurrences of "Example" gives six elements, 
so length(x) - 1 is the number of matches. You can use any regex instead of 
"Example" if you need to tweak what you are looking for.
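
A caveat worth knowing, sketched with two made-up strings and a hypothetical helper split_count(): strsplit() drops a trailing empty piece, so a string that ends with the pattern yields one element fewer and the count comes out one too low. That does not affect text1, where every element ends in other text.

```r
# Hypothetical wrapper around the strsplit() approach.
split_count <- function(x, pattern) {
  vapply(strsplit(x, pattern),
         function(parts) length(parts) - 1L, integer(1))
}

x <- c("a Example b Example c",   # two matches, text after the last one
       "a Example b Example")     # two matches, string ends with the match

split_count(x, "Example")
# [1] 2 1
```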


B.






[R] Counting enumerated items in each element of a character vector

2017-04-25 Thread Dan Abner
Hi all,

I am looking for a streamlined way of counting the number of enumerated
items in each element of a character vector. For example:


text1<-c("This is an example.
List 1
1) Example 1
2) Example 2
10) Example 10
List 2
1) Example 1
2) Example 2
These have been examples.","This is another example.
List 1
1. Example 1
2. Example 2
10. Example 10
List 2
1. Example 1
2. Example 2
These have been examples.","This is a third example. List 1 1) Example 1.
2) Example 2. 10) Example 10. List 2 1) Example 1. 2) Example 2. These have
been examples."
,"This is a fourth example. List 1 1. Example 1. 2. Example 2. 10. Example
10. List 2 Example 1. 2. Example 2. These have been examples.")

text1

===

I would like the result to be c(5,5,5,5). Notice that sometimes there are
leading hard returns, other times not. Sometimes there are separate lists
and the same numbers are used in the enumerated items multiple times within
each character string. Sometimes the leading numbers for the enumerated
items exceed single digits. Notice that the delimiter may be ) or a period
(.). If the delimiter is a period and there are hard returns (example 2),
then I expect that will be easy enough to differentiate sentences ending
with a number from enumerated items. However, I imagine it would be much
more difficult to differentiate the two for example 4.
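
The easy case described above, where every item starts on its own line, might be sketched as follows. The helper count_matches(), the pattern name line_start, and the two sample strings are assumptions made for illustration. The pattern only accepts an enumerator at the start of a line, so a sentence that merely ends in a number is ignored, but the same anchor finds nothing once the hard returns are gone.

```r
# Hypothetical helper: count regex matches in each element of x.
count_matches <- function(x, pattern) {
  vapply(gregexpr(pattern, x, perl = TRUE),
         function(m) sum(m > 0), integer(1))
}

# (?m) lets ^ match at the start of every line, not just the whole string.
line_start <- "(?m)^[ \\t]*\\d+[.)]"

x <- c("Intro.\nList 1\n1. First item 1\n2. Second item\n10. Tenth item",
       "Intro. List 1 1. First item 1. 2. Second item.")

count_matches(x, line_start)
# [1] 3 0
```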

Any suggestions are appreciated.

Best,

Dan
