Wait - Habermaas like in Critical Theory????

-----Original Message-----
From: Habermaas, William [mailto:william.haberm...@fatwire.com]
Sent: Monday, June 27, 2011 2:55 PM
To: common-user@hadoop.apache.org
Subject: RE: How to select random n records using mapreduce ?

I did something similar. Basically I had a random sampling algorithm that I called from the mapper. If it returned true I would collect the data, otherwise I would discard it.

Bill

-----Original Message-----
From: ni...@basj.es [mailto:ni...@basj.es] On Behalf Of Niels Basjes
Sent: Monday, June 27, 2011 3:29 PM
To: mapreduce-u...@hadoop.apache.org
Cc: core-u...@hadoop.apache.org
Subject: Re: How to select random n records using mapreduce ?

The only solution I can think of is to create a counter in Hadoop that is incremented each time a mapper lets a record through. As soon as the counter reaches a preselected value, the mappers simply discard any additional input they receive.

Note that this will not be random at all... yet it's the best I can come up with right now.

HTH

On Mon, Jun 27, 2011 at 09:11, Jeff Zhang <zjf...@gmail.com> wrote:
>
> Hi all,
> I'd like to select N random records from a large amount of data using
> Hadoop, and I wonder how I can achieve this. Currently my idea is to let
> each mapper task select N / mapper_number records. Does anyone have such
> experience?
>
> --
> Best Regards
>
> Jeff Zhang
>

--
Best regards / Met vriendelijke groeten,

Niels Basjes
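
For reference, here is a minimal sketch of the sampling ideas in this thread, in plain Python rather than actual Hadoop code. Everything named here (`bernoulli_filter`, `map_sample`, `reduce_sample`, the `rng` parameter) is hypothetical. Bill does not describe his sampling algorithm, so `bernoulli_filter` is only a guess at its shape: keep each record with a fixed probability, which yields roughly, not exactly, N records. The other two functions use a technique nobody in the thread proposed: tag every record with a random key, have each mapper keep its N smallest keys, and have a single reducer keep the N smallest overall. Under those assumptions the result is a uniform sample of exactly N records, regardless of how unevenly the input is split across mappers.

```python
import heapq
import random


def bernoulli_filter(records, p, rng):
    """Guess at a mapper-side filter: keep each record with probability p."""
    return [rec for rec in records if rng.random() < p]


def map_sample(records, n, rng):
    """Mapper side: tag each record with a random key, keep the n smallest keys.

    Returns a list of (key, record) pairs with at most n entries.
    """
    tagged = ((rng.random(), rec) for rec in records)
    return heapq.nsmallest(n, tagged, key=lambda pair: pair[0])


def reduce_sample(partials, n):
    """Reducer side: merge all mapper candidates, keep the n smallest keys."""
    merged = (pair for part in partials for pair in part)
    best = heapq.nsmallest(n, merged, key=lambda pair: pair[0])
    return [rec for _, rec in best]


if __name__ == "__main__":
    rng = random.Random()
    # Three input splits of very different sizes (stand-ins for mapper inputs).
    splits = [range(0, 50), range(50, 60), range(60, 200)]
    partials = [map_sample(split, 10, rng) for split in splits]
    print(reduce_sample(partials, 10))
```

Each mapper emits at most N candidates, so the single reducer sees at most N * mapper_number records. Picking a fixed N / mapper_number per mapper, by contrast, over-represents records from small splits.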