I was thinking something like:

setequal <- function(x, y) {
    xu <- unique(x)
    yu <- unique(y)
    if (length(xu) != length(yu)) {
        return(FALSE)
    }
    all(match(xu, yu, 0L) > 0L)
}
This lets you fail early and cheaply when the numbers of distinct values differ (skipping the allocation from the "> 0L" comparisons). Whether this is actually faster depends a lot on how many duplicates x and y contain and on whether you want to optimize for the TRUE or the FALSE case. You'd do much better to make some real hashes in C and compare the keys, but it's probably not worth the complexity.

Pete

____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Thu, Jan 8, 2015 at 2:06 PM, Peter Haverty <phave...@gene.com> wrote:
> How about unique them both and compare the lengths? It's less work,
> especially allocation.
>
> Pete
>
> On Thu, Jan 8, 2015 at 1:30 PM, peter dalgaard <pda...@gmail.com> wrote:
>
>> If you look at the definition of %in%, you'll find that it is implemented
>> using match, so if we did as you suggest, I give it about three days before
>> someone suggests inlining the function call... Readability of source code
>> is not usually our prime concern.
>>
>> The && idea does have some merit, though.
>>
>> Apropos, why is there no setcontains()?
>>
>> -pd
>>
>> > On 06 Jan 2015, at 22:02, Hervé Pagès <hpa...@fredhutch.org> wrote:
>> >
>> > Hi,
>> >
>> > Current implementation:
>> >
>> > setequal <- function (x, y)
>> > {
>> >     x <- as.vector(x)
>> >     y <- as.vector(y)
>> >     all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
>> > }
>> >
>> > First, what about replacing 'match(x, y, 0L) > 0L' and
>> > 'match(y, x, 0L) > 0L' with 'x %in% y' and 'y %in% x', respectively?
>> > They're strictly equivalent, but the latter form is a lot more readable
>> > than the former (isn't this the "raison d'être" of %in%?):
>> >
>> > setequal <- function (x, y)
>> > {
>> >     x <- as.vector(x)
>> >     y <- as.vector(y)
>> >     all(c(x %in% y, y %in% x))
>> > }
>> >
>> > Furthermore, replacing 'all(c(x %in% y, y %in% x))' with
>> > 'all(x %in% y) && all(y %in% x)' improves readability even more and,
>> > more importantly, reduces the memory footprint significantly on big
>> > vectors (e.g. by 15% on integer vectors with 15M elements):
>> >
>> > setequal <- function (x, y)
>> > {
>> >     x <- as.vector(x)
>> >     y <- as.vector(y)
>> >     all(x %in% y) && all(y %in% x)
>> > }
>> >
>> > It also seems to speed things up a little (though not significantly).
>> >
>> > Cheers,
>> > H.
>> >
>> > --
>> > Hervé Pagès
>> >
>> > Program in Computational Biology
>> > Division of Public Health Sciences
>> > Fred Hutchinson Cancer Research Center
>> > 1100 Fairview Ave. N, M1-B514
>> > P.O. Box 19024
>> > Seattle, WA 98109-1024
>> >
>> > E-mail: hpa...@fredhutch.org
>> > Phone: (206) 667-5791
>> > Fax: (206) 667-1319
>> >
>> > ______________________________________________
>> > R-devel@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>> --
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Email: pd....@cbs.dk  Priv: pda...@gmail.com
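For comparison, here is a minimal, self-contained R sketch of the two variants discussed in this thread side by side. The names setequal_unique() and setequal_and() are hypothetical (they are not part of base R), the as.vector() calls in the unique-first version are an assumption carried over from the current base implementation, and nothing here has been benchmarked; it only illustrates the logic of each approach.

```r
# Hypothetical helpers for comparison; neither is part of base R.

# Unique-first variant (Peter Haverty's suggestion): deduplicate both
# inputs and bail out early when the counts of distinct values differ.
setequal_unique <- function(x, y) {
    xu <- unique(as.vector(x))  # as.vector() assumed, as in base setequal
    yu <- unique(as.vector(y))
    if (length(xu) != length(yu)) {
        return(FALSE)
    }
    # Both sides are duplicate-free and the same size, so xu being a
    # subset of yu is enough for the two sets to be equal.
    all(match(xu, yu, 0L) > 0L)
}

# Short-circuit variant (Hervé Pagès's suggestion): `&&` skips the
# second %in% when the first check already fails, and no combined
# logical vector of length(x) + length(y) is ever built.
setequal_and <- function(x, y) {
    x <- as.vector(x)
    y <- as.vector(y)
    all(x %in% y) && all(y %in% x)
}

setequal_unique(c(1, 1, 2), c(2, 1))    # TRUE
setequal_unique(1:3, 1:4)               # FALSE (length check fails)
setequal_and(c("a", "b"), c("a", "c"))  # FALSE
```

The unique-first version pays for two unique() calls up front in exchange for the chance to return early, which is why, as noted above, whether it wins depends on how many duplicates the inputs contain and on which outcome you optimize for.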