Or use make.unique() in place of apseq()
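
A minimal sketch of what that could look like, assuming character (or coercible) input. The name matchOnce() is made up for illustration and is not part of base R. Unlike apseq(), make.unique() does not sort, so the result lines up with the original order of the sample. One caveat: values that already look like "a.1" could confuse the numbering, so treat this as a sketch rather than a general solution.

```r
# Hypothetical helper: one-to-one matching via make.unique()
matchOnce <- function(x, table) {
  # make.unique() leaves the first occurrence alone and appends ".1", ".2", ...
  # to later duplicates, so the k-th "a" in x can only match the k-th "a" in table
  match(make.unique(as.character(x)), make.unique(as.character(table)))
}

lookupTable <- c("a", "a", "b", "c", "d", "e", "f")
matchSample <- c("a", "a", "a", "b", "d")
matchOnce(matchSample, lookupTable)
# [1]  1  2 NA  3  5
```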

On Sun, 22 Jun 2008, Gabor Grothendieck wrote:

Try this.  apseq() sorts the input and appends a
sequence number (0, 1, ...) to successive
occurrences of each value.  Applying it to both
vectors turns this into a problem that ordinary
match() can solve:

lookupTable <- c("a", "a","b","c","d","e","f")
matchSample <- c("a", "a","a","b","d")

# sort and append sequence no
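apseq <- function(x) {
  x <- sort(x)
  # group id: increases by one at each new distinct value
  s <- cumsum(!duplicated(x))
  # 0-based position of each element within its group
  paste(x, seq_along(s) - match(s, s))
}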

match(apseq(matchSample), apseq(lookupTable))
[1]  1  2 NA  3  5
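
Note that because apseq() sorts, the positions above refer to the sorted vectors; here both inputs happen to be sorted already. If the original order matters, one possible variant is to key each element by its value plus a running occurrence count, without sorting. This is only a sketch: occKey() and matchByOccurrence() are hypothetical names, and it assumes character input without NAs.

```r
# Hypothetical helper: value plus running occurrence count, in input order
occKey <- function(x) {
  x <- as.character(x)
  # ave() applies seq_along() within each group of equal values
  n <- ave(seq_along(x), x, FUN = seq_along)
  paste(x, n)
}

# Hypothetical helper: ordinary match() on the keyed vectors
matchByOccurrence <- function(x, table) match(occKey(x), occKey(table))

matchByOccurrence(c("a", "b", "a", "a"), c("b", "a", "c", "a"))
# [1]  2  1  4 NA
```

The first "a" takes position 2, the second takes position 4, and the third finds nothing left to match.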


On Sun, Jun 22, 2008 at 10:57 PM,  <[EMAIL PROTECTED]> wrote:
Hi folks,

Can anyone suggest an efficient way to do "matching without
replacement", or "one-to-one matching"?  pmatch() doesn't quite provide
what I need...

For example,

lookupTable <- c("a","b","c","d","e","f")
matchSample <- c("a","a","b","d")
##Normal match() behaviour:
match(matchSample,lookupTable)
[1] 1 1 2 4

My problem here is that both "a"s in matchSample are matched to the same
"a" in the lookup table.  I need the elements of the lookup table to be
excluded from the table as they are matched, so that no match can be
found for the second "a".

Function pmatch() comes close to what I need:

pmatch(matchSample,lookupTable)
[1] 1 NA 2 4

Yep!  However, pmatch() incorporates partial matching, which I
definitely don't want:

lookupTable <- c("a","b","c","d","e","aaaaaaaaf")
matchSample <- c("a","a","b","d")
pmatch(matchSample,lookupTable)
[1] 1 6 2 4
## i.e. the second "a" matches "aaaaaaaaf" - I don't want this.

Of course, when identical items ARE duplicated in both sample and lookup
table, I need the matching to reflect this:

lookupTable <- c("a","a","c","d","e","f")
matchSample <- c("a","a","c","d")
##Normal match() behaviour
match(matchSample,lookupTable)
[1] 1 1 3 4

No good - pmatch() is better:

lookupTable <- c("a","a","c","d","e","f")
matchSample <- c("a","a","c","d")
pmatch(matchSample,lookupTable)
[1] 1 2 3 4

...but we still have the partial matching issue...

And of course, as per the usual behaviour of match(), sample elements
missing from the lookup table should return NA:

matchSample <- c("a","frog","e","d") ; print(matchSample)
match(matchSample,lookupTable)

Is there a nifty way to get what I'm after without resorting to a for
loop? (my code's already got too blasted many of those...)

Thanks,

Alec Zwart
CMIS CSIRO
[EMAIL PROTECTED]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]                  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
