I was wondering if there were any R packages supporting fast set intersection.
I am aware of base::intersection, bit::`&`, and set::set_intersection, each of which has its advantages. base::intersection is space-efficient and accepts arbitrary unsorted lists of arbitrary base types. set::set_intersection is very fast but requires space and time proportional to the size of the universe, a contiguous range of integers. set::set_intersection is nicely structured but very slow. For example: q<-sample(1e7,2e6); r<-sample(1e7,1e4) system.time(intersect(q,r)) => 70 mS qb<-as.bit.which(q,1e7); qr<- as.bit.which(r,1e7); system.time(qb&qr) => 3 mS system.time(set_intersection(as.set(q),as.set(r))) => many minutes (with |q|=2e3, time is 3 S) I was wondering if there were other set packages which use different representations of sets (e.g. sorted lists, heaps, etc.) to make some set operations more efficient. In particular, I'd be interested in an intersection operator which is fast and space-efficient when the sets are sparse and quite different in size, such as binary merge or Baeza-Yates (2004). Along the same lines, I wonder if there is a version of 'match' which takes a sorted 'table' argument and uses binary or interpolation search? Thanks, -s ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.