Thanks, I hadn't thought of the Bloom filter variant. That's the solution I was looking for :-)
Thibaut
--
View this message in context: http://www.nabble.com/Finding-small-subset-in-very-large-dataset-tp21964853p21977132.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.