Re: [R] Regular Expression help

Gabor Grothendieck Sat, 08 Nov 2008 13:38:40 -0800

For the problem at hand I think I would use your solution,
which is both easily understood and fastest.  On the
other hand, the tapply-based solutions are coordinate-free
(i.e. no explicit mucking with indices) and readily
generalize to more than 2 groups -- just replace [^pq] with
[^pqr], say.
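
As a minimal sketch of that generalization (the input vector below is
made up for illustration, and it assumes every string contains at least
one of p, q or r), the sub-based grouping extends to three groups like
this:

```r
# Hypothetical input: each string contains at least one of p, q, r.
x <- c("abpq", "xqrp", "zzrp", "pqr", "aqb")

# Grouping key: the first of p, q, r that occurs in each string.
# [^pqr]* skips everything before it; (.) captures it.
key <- sub("^[^pqr]*(.).*", "\\1", x)

# Collect the elements of x that share a key; no index bookkeeping.
groups <- tapply(x, key, c)
print(groups)
```

Adding a fourth group would only mean adding another letter to the
character class.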


On Sat, Nov 8, 2008 at 4:21 PM, Wacek Kusnierczyk
<[EMAIL PROTECTED]> wrote:
> Gabor Grothendieck wrote:
>> Here are a few more solutions.  x is the input vector
>> of character strings.
>>
>> The first is a slightly shorter version of one of Wacek's.
>> The next three all create an anonymous grouping variable
>> (using sub, substr/gsub and strapply respectively)
>> whose components are "p" and "q" and then tapply
>> is used to separate out the corresponding components
>> of x according to the grouping:
>>
>> sapply(c(p = "^[^pq]*p", q = "^[^pq]*q"), grep, x = x, value = TRUE)
>>
>> tapply(x, sub("^[^pq]*(.).*", "\\1", x), c)
>>
>> tapply(x, substr(gsub("[^pq]", "", x), 1, 1), c)
>>
>> library(gsubfn)
>> tapply(x, strapply(x, "^[^pq]*(.)", simplify = c), c)
>>
>
> wow!  cool stuff.  if you're interested in comparing their efficiency,
> source the attached script.
>
> vQ
>
> generate = function(n, m)
>        replicate(n, paste(sample(letters, m, replace=TRUE), collapse=""))
>
> tests = list(
>
>        wacek =
>        function(data) {
>                p = grep("^[^pq]*p", data)
>                list(p=data[p], q=data[-p])
>        },
>
>        gabor1 =
>        function(data)
>                sapply(c(p="^[^pq]*p", q="^[^pq]*q"), grep, x=data, value=TRUE),
>
>        gabor2 =
>        function(data)
>                tapply(data, sub("^[^pq]*(.).*", "\\1", data), c),
>
>        gabor3 =
>        function(data)
>                tapply(data, substr(gsub("[^pq]", "", data), 1, 1), c),
>
>        gabor4 =
>        { library(gsubfn); function(data)
>                tapply(data, strapply(data, "^[^pq]*(.)", simplify=c), c) }
> )
>
> data = generate(1000,10)
> lapply(names(tests),
>        function(name) {
>                cat(name, ":\n", sep="")
>                print(system.time(replicate(30,tests[[name]](data)))) } )
>
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
