Regarding minimum number of partitions while reading data from Hadoop

2015-02-19 Thread twinkle sachdeva
Hi, In our job, we need to process the data in small chunks to avoid GC pressure and related issues. For this, we are using the old Hadoop API, as it lets us specify a parameter like minPartitions. Does anyone know if there is a way to do the same via the new Hadoop API as well? How would that approach be different?
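For illustration, a minimal sketch of the old-API pattern the question describes, using sc.hadoopFile with its minPartitions hint; the input path and the partition count of 200 are placeholders, not values from the thread:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    object OldApiMinPartitions {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("old-api-min-partitions"))

        // The old mapred-based API accepts a minPartitions hint directly.
        // "hdfs:///data/input" and 200 are placeholder values.
        val rdd = sc.hadoopFile(
          "hdfs:///data/input",
          classOf[TextInputFormat],
          classOf[LongWritable],
          classOf[Text],
          minPartitions = 200)

        println(s"partitions = ${rdd.partitions.length}")
        sc.stop()
      }
    }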

Re: Regarding minimum number of partitions while reading data from Hadoop

2015-02-19 Thread Sean Owen
I think the newer Hadoop API does not expose a suggested minimum partitions parameter the way the old one did. I believe you can instead set mapreduce.input.fileinputformat.split.{min,max}size on the Hadoop Configuration to suggest a minimum/maximum split size, and thereby bound the number of partitions.
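A hedged sketch of that suggestion: cap the split size on a Hadoop Configuration passed to newAPIHadoopFile, so each task reads at most that much data and the partition count rises accordingly. The path and the 32 MB ceiling are placeholder values:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    object NewApiSplitSize {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("new-api-split-size"))

        // Copy the cluster configuration and cap the split size; a smaller
        // maximum split means more splits, hence more partitions.
        val conf = new Configuration(sc.hadoopConfiguration)
        conf.setLong("mapreduce.input.fileinputformat.split.maxsize",
          32L * 1024 * 1024) // 32 MB ceiling, placeholder value

        val rdd = sc.newAPIHadoopFile(
          "hdfs:///data/input", // placeholder path
          classOf[TextInputFormat],
          classOf[LongWritable],
          classOf[Text],
          conf)

        println(s"partitions = ${rdd.partitions.length}")
        sc.stop()
      }
    }

With the old API's minPartitions you hint a floor on the partition count directly; here the same effect is reached indirectly by capping how large any single split may be.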