Mark,
"Abstraction" also contains a valid two-consonant cluster ("ct"). Some logic
could be added to reject words that contain a valid pair of consonants if
they also contain longer runs of consonants.
This may work as a starting point, using strsplit:
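twocons = function(word){
  # split on vowels; what remains are the runs of consonants
  chars = strsplit(tolower(word), "[aeiou]")
  # length of each consonant run
  conlengths = nchar(chars[[1]])
  # count the runs of exactly two consonants
  numtwos = sum(conlengths == 2)
  return(numtwos)
}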
words = c("test", "hello", "fail", "pass", "assess")
lapply(words, twocons)
[[1]]
[1] 1
[[2]]
[1] 1
[[3]]
[1] 0
[[4]]
[1] 1
[[5]]
[1] 2
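A sketch of the extra logic mentioned above (rejecting a word that also contains a longer run of consonants) could look like the following. The name twocons_strict is hypothetical, returning 0 for a rejected word is a design choice rather than anything from this thread, and, as in the original pattern, "y" is treated as a consonant.

```r
# Hypothetical helper: count runs of exactly two consonants, but return 0
# if the word contains any run of three or more consonants.
twocons_strict <- function(word) {
  # runs of consonants are what remains after splitting on vowels
  runs <- strsplit(tolower(word), "[aeiou]")[[1]]
  lens <- nchar(runs)
  # reject words such as "screw" or "abstraction"
  if (any(lens > 2)) return(0L)
  sum(lens == 2)
}

words <- c("hallo", "chess", "screw", "abstraction", "hall")
sapply(words, twocons_strict)
# hallo 1, chess 2, screw 0, abstraction 0, hall 1
```

Unlike twocons, this version gives 0 for "abstraction" even though "ct" on its own would be a valid pair.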
I hope this is helpful,
Greg
Mark Heckmann wrote:
Hi,
I want to parse a string and extract the number of occurrences where two
consonants clump together. Consider for example the word "hallo". Here I
want the algorithm to return 1. For "chess" I want it to return 2. For the
word "screw" the result should be negative, as it is a clump of three
consonants, not two. Also, for the word "abstraction" I do not want the
algorithm to detect two two-consonant clusters. In this case the result
should be negative as well, as it has four consonants in a row.
str <- "hallo"
gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}", str, ignore.case = TRUE)[[1]]
[1] 3
attr(,"match.length")
[1] 3
The result is correct. Now I change the word to "hall"
str <- "hall"
gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}", str, ignore.case = TRUE)[[1]]
[1] -1
attr(,"match.length")
[1] -1
Here my expression fails. How can I write a correct regex to do this? I
always encounter problems at the beginning or end of a string.
Also:
str <- "abstraction"
gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}", str, ignore.case = TRUE)[[1]]
[1] 4 7
attr(,"match.length")
[1] 3 3
This also fails.
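The failure with "hall" comes from the trailing [aeiou]{1}: it requires a vowel after the pair, so a pair at the end of the string can never match. With "abstraction", nothing stops the pattern from matching "tr" inside the longer run "bstr". A minimal sketch of one alternative, assuming Perl-compatible regular expressions (perl = TRUE), replaces the vowel with lookarounds that only check that no consonant comes directly before or after the pair; a lookaround also succeeds at the start or end of the string. The names cons, pat and count_pairs are hypothetical. This still counts the "ct" in "abstraction"; rejecting the whole word would need a separate check for runs of three or more consonants.

```r
# Consonant class as used in the question ("y" counts as a consonant).
cons <- "[bcdfghjklmnpqrstvwxyz]"
# Exactly two consonants: not preceded and not followed by another consonant.
pat <- paste0("(?<!", cons, ")", cons, "{2}(?!", cons, ")")

# Hypothetical helper: number of isolated two-consonant pairs in a string.
count_pairs <- function(str) {
  m <- gregexpr(pat, str, ignore.case = TRUE, perl = TRUE)[[1]]
  # gregexpr returns -1 when there is no match
  sum(m > 0)
}

sapply(c("hallo", "hall", "chess", "screw", "abstraction"), count_pairs)
# hallo 1, hall 1, chess 2, screw 0, abstraction 1
```

Because the lookarounds consume no characters, match.length is 2 for each pair instead of 3.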
Thanks in advance,
Mark
-------------------------------
Mark Heckmann
www.markheckmann.de
R-Blog: http://ryouready.wordpress.com
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Greg Hirson
ghir...@ucdavis.edu
Graduate Student
Agricultural and Environmental Chemistry
1106 Robert Mondavi Institute North
One Shields Avenue
Davis, CA 95616