Hi all, I'd like to select N random records from a large data set using Hadoop, and I wonder how I can achieve this. My current idea is to let each mapper task select N / mapper_number records. Does anyone have experience with this?
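To make the idea concrete, here is a rough sketch in plain Python rather than real Hadoop code. It only simulates the mappers as a list of input splits, and the names `reservoir_sample`, `sample_across_mappers`, and `splits` are made up for illustration. Each simulated mapper uses reservoir sampling, so it needs only one pass over its split and does not have to know the split size in advance.

```python
import random


def reservoir_sample(records, k, rng):
    """Keep a uniform random sample of up to k items in one pass over records."""
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Keep the new record with probability k / (i + 1),
            # overwriting a randomly chosen slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir


def sample_across_mappers(splits, n, seed=None):
    """Simulate each mapper picking its share of n records from its own split."""
    rng = random.Random(seed)
    num_mappers = len(splits)
    # Spread n over the mappers; the first `extra` mappers take one more record.
    base, extra = divmod(n, num_mappers)
    sample = []
    for idx, split in enumerate(splits):
        quota = base + (1 if idx < extra else 0)
        sample.extend(reservoir_sample(split, quota, rng))
    return sample


if __name__ == "__main__":
    # Three hypothetical splits of 100 records each.
    splits = [range(0, 100), range(100, 200), range(200, 300)]
    print(sample_across_mappers(splits, 10, seed=1))
```

One caveat with this scheme: the overall sample is only uniform if every split holds roughly the same number of records, and a split with fewer records than its quota returns fewer, so the total can fall short of N.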
-- Best Regards Jeff Zhang