Hi Gabor, thanks for this great advice. Just one more question: I cannot find in the documentation for gsubfn or strapply how to switch off case sensitivity for the regex, as e.g. the ignore.case = TRUE argument does in gregexpr. Is there a way?

TIA,
Mark

-------------------------------
Mark Heckmann
+ 49 (0) 421 - 1614618
www.markheckmann.de
R-Blog: http://ryouready.wordpress.com

-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
Sent: Tuesday, 30 June 2009 18:31
To: Mark Heckmann
Cc: r-help@r-project.org
Subject: Re: [R] Using regular expressions to detect clusters of consonants in a string

Try this:

library(gsubfn)
s <- "mystring"
strapply(s, "[bcdfghjklmnpqrstvwxyz]+", nchar)[[1]]

which returns a vector of consonant string lengths. Now apply your algorithm to that. See http://gsubfn.googlecode.com for more.

On Tue, Jun 30, 2009 at 11:30 AM, Mark Heckmann<mark.heckm...@gmx.de> wrote:
> Hi,
>
> I want to parse a string extracting the number of occurrences where two
> consonants clump together. Consider for example the word "hallo". Here I
> want the algorithm to return 1. For "chess" I want it to return 2. For the
> word "screw" the result should be negative as it is a clump of three
> consonants not two. Also for the word "abstraction" I do not want the algorithm
> to detect two times a two consonant cluster. In this case the result should
> be negative as well as it is four consonants in a row.
>
> str <- "hallo"
> gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}" , str, ignore.case =TRUE,
> extended = TRUE)[[1]]
>
> [1] 3
> attr(,"match.length")
> [1] 3
>
> The result is correct. Now I change the word to "hall"
>
> str <- "hall"
> gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}" , str, ignore.case =TRUE,
> extended = TRUE)[[1]]
>
> [1] -1
> attr(,"match.length")
> [1] -1
>
> Here my expression fails. How can I write a correct regex to do this? I
> always encounter problems at the beginning or end of a string.
>
> Also:
>
> str <- "abstraction"
> gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}" , str, ignore.case =TRUE,
> extended = TRUE)[[1]]
>
> [1] 4 7
> attr(,"match.length")
> [1] 3 3
>
> This also fails.
>
> Thanks in advance,
> Mark
>
> -------------------------------
> Mark Heckmann
> www.markheckmann.de
> R-Blog: http://ryouready.wordpress.com
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
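
On the case-sensitivity question and the "apply your algorithm to that" step: the following is only a sketch in base R, not the gsubfn approach from the reply. It assumes that "cluster" means a maximal run of consonants of exactly two letters, and the function name count_two_consonant_clusters is made up for illustration. It uses gregexpr with ignore.case = TRUE together with regmatches. With strapply, lowercasing the input first via tolower(s) should give the same effect without needing any extra argument.

```r
# Sketch: count maximal consonant runs that are exactly two letters long,
# ignoring case. Base R only; the function name is hypothetical.
count_two_consonant_clusters <- function(word) {
  # Match every maximal run of consonants; ignore.case = TRUE lets the
  # lower-case character class match upper-case letters as well.
  m <- gregexpr("[bcdfghjklmnpqrstvwxyz]+", word, ignore.case = TRUE)
  runs <- regmatches(word, m)[[1]]
  # Keep only the runs of length two and count them.
  sum(nchar(runs) == 2)
}

count_two_consonant_clusters("hallo")  # 1  ("ll")
count_two_consonant_clusters("CHESS")  # 2  ("CH", "SS")
count_two_consonant_clusters("screw")  # 0  ("scr" has three letters)
count_two_consonant_clusters("hall")   # 1  ("ll" at the end of the string)
```

Because the pattern matches whole runs rather than requiring a vowel afterwards, clusters at the beginning or end of the string are handled like any other. Note that under this definition "abstraction" yields 1, since "ct" is a two-letter run while "bstr" is not counted.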