Hi,

In our job we need to process the data in small chunks, to avoid GC pressure and related issues. For this we are using the old Hadoop API, since it lets us specify a parameter like minPartitions.

Does anyone know if there is a way to do the same via the new Hadoop API as well, and how that approach would differ?
I think the newer Hadoop API does not expose this suggested min-partitions parameter the way the old one did. I believe you can instead try setting mapreduce.input.fileinputformat.split.{min,max}size on the Hadoop Configuration to suggest a max/min split size, and therefore bound the number of partitions.
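
As a rough sketch of what that might look like (the input path and 64 MB cap are hypothetical; this assumes an existing SparkContext `sc` and the mapreduce-flavored TextInputFormat):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Copy the existing Hadoop configuration and cap each split at 64 MB,
// so the resulting RDD gets roughly ceil(totalSize / maxSplitSize)
// partitions rather than one per HDFS block.
val conf = new Configuration(sc.hadoopConfiguration)
conf.set("mapreduce.input.fileinputformat.split.maxsize",
         (64L * 1024 * 1024).toString)

val rdd = sc.newAPIHadoopFile(
  "hdfs:///data/input",   // hypothetical path
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text],
  conf)
```

Note these settings are only hints to the InputFormat: an individual file smaller than the max split size still produces at least one split on its own.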