Hi, I need to randomize my input file before processing it. I understand I can chain Hadoop jobs together, so the first job could take the input file and randomize it, and the second could then take the randomized file and do the processing.
The input file has one entry per line, and I want to shuffle the lines before the main processing. Is there a built-in way to do this that I have missed, or will I have to write a Hadoop program to shuffle my input file?

Cheers, John
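
P.S. To make the question concrete, here is a rough sketch of the kind of first-stage job I have in mind. It assumes Hadoop Streaming with a Python script, which is only one possible way to do it. The mapper tags each line with a random key, the framework's sort between map and reduce then reorders the lines by that key, and the reducer strips the key again. The names `map_lines`, `reduce_lines` and `shuffle_locally` are made up for illustration and are not part of Hadoop.

```python
import random
import sys


def map_lines(lines, rng=random):
    """Mapper: prefix every line with a fixed-width random key and a tab."""
    for line in lines:
        # Fixed-width keys so that sorting them as text gives a random order.
        yield f"{rng.random():.17f}\t{line.rstrip(chr(10))}"


def reduce_lines(tagged_lines):
    """Reducer: drop the random key (up to the first tab), keep the line."""
    for tagged in tagged_lines:
        _key, _sep, line = tagged.rstrip("\n").partition("\t")
        yield line


def shuffle_locally(lines, rng=random):
    """Local stand-in for the whole job: map, sort by key, reduce."""
    return list(reduce_lines(sorted(map_lines(lines, rng))))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage as a streaming script: "map" or "reduce" as argument.
    if sys.argv[1] == "map":
        step = map_lines(sys.stdin)
    elif sys.argv[1] == "reduce":
        step = reduce_lines(sys.stdin)
    else:
        step = []
    for out_line in step:
        print(out_line)
```

With more than one reducer I would expect several output files, each in random order, so the second job could read them all as its input. I have not run this on a cluster, so treat it as a sketch of the idea rather than a tested solution.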