Dear All,

I am implementing an algorithm that reads a data file (a .txt file, approximately 90 MB) and compares each line of it with each line of a separate samples file (a .txt file, approximately 20 MB). To do this, I need to pass every line of the samples file as a parameter to the map-reduce job, and together these lines are fairly large.
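To make the comparison concrete, here is a rough sketch in plain Java (no Hadoop involved) of the line-by-line work I have in mind. It is only an illustration under assumptions: "compare" is taken to mean exact string equality, the sample lines are assumed to be distinct, and the class and method names (LineComparer, countMatches) are made up for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Hadoop.
final class LineComparer {

    // Counts, for every sample line, how many data lines are equal to it.
    // Assumption: "compare" means exact string equality.
    static Map<String, Integer> countMatches(List<String> dataLines,
                                             List<String> sampleLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String sample : sampleLines) {
            counts.put(sample, 0);
        }
        // Every data line is checked against every sample line.
        for (String dataLine : dataLines) {
            for (String sample : sampleLines) {
                if (dataLine.equals(sample)) {
                    counts.merge(sample, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Tiny in-memory stand-ins for the two .txt files.
        List<String> data = Arrays.asList("x", "y", "x", "z");
        List<String> samples = Arrays.asList("x", "z", "q");
        System.out.println(countMatches(data, samples));
    }
}
```

In the real job the inner loop would run inside the map function, which is why every task needs access to all of the sample lines.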
Currently I use job.set and job.get to store and retrieve these lines as configuration values, but this is not efficient at all. Could anyone suggest an alternative solution?

Thanks a million!

Boyu Zhang
University of Delaware