You can write a custom InputFormat whose #getSplits(...) returns your
required InputSplit objects (with randomised offsets + lengths, etc.).

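Something along these lines might work (untested sketch against the new
MapReduce API; the class name and config key are just placeholders I made
up). It delegates split computation to TextInputFormat and then keeps only
a random subset of the resulting splits:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class RandomSubsetTextInputFormat extends TextInputFormat {

  // Hypothetical config key: how many splits (blocks) the job should process.
  public static final String NUM_SPLITS_KEY = "randomsubset.num.splits";

  @Override
  public List<InputSplit> getSplits(JobContext job) throws IOException {
    // Let FileInputFormat/TextInputFormat compute the normal per-block splits.
    List<InputSplit> splits = new ArrayList<InputSplit>(super.getSplits(job));
    int wanted = job.getConfiguration().getInt(NUM_SPLITS_KEY, splits.size());
    if (wanted >= splits.size()) {
      return splits;
    }
    // Shuffle and keep the first 'wanted' splits, i.e. a random set of blocks.
    Collections.shuffle(splits);
    return new ArrayList<InputSplit>(splits.subList(0, wanted));
  }
}

Then in the driver you'd do something like:

  job.setInputFormatClass(RandomSubsetTextInputFormat.class);
  job.getConfiguration().setInt(RandomSubsetTextInputFormat.NUM_SPLITS_KEY, 2);
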
On Fri, Feb 7, 2014 at 9:50 PM, Suresh S <suresh...@gmail.com> wrote:
> Dear Friends,
>
>           I have a very large file in HDFS with 3000+ blocks.
>
> I want to run a job with various input sizes, using the same file as
> input. Usually the number of tasks equals the number of blocks/splits.
> Suppose a job with 2 tasks needs to process any two random blocks of the
> given input file.
>
> How can I give a random set of HDFS blocks as input to a job?
>
> Note: my aim is not to process the input file to produce output; I want
> to replicate individual blocks based on load.
>
> *Regards*
> *S.Suresh,*
> *Research Scholar,*
> *Department of Computer Applications,*
> *National Institute of Technology,*
> *Tiruchirappalli - 620015.*
> *+91-9941506562*



-- 
Harsh J
