You can control the input to a computer program, but not (arbitrarily) how
much output it generates. The only way to generate output files of a fixed
size is to write a custom output format that rolls over to a new file name
every time that size is exceeded, but you will still get a smaller leftover
file at the end of each task. The plumbing for this is pretty ugly, and I
would not recommend it casually.
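
To make the rollover idea concrete, here is a minimal plain-Java sketch of the logic such a writer would need. It is not Hadoop's OutputFormat/RecordWriter API. The class name RollingFileWriter, the part-NNNNN file naming and the newline-delimited records are all assumptions for illustration. A real custom output format would presumably wrap something like this inside its RecordWriter, one instance per task.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical rolling writer (not a Hadoop class): appends newline-terminated
 * records to part-00000, part-00001, ... in dir, starting a new file whenever
 * the next record would push the current file past maxBytes.
 */
class RollingFileWriter implements AutoCloseable {
    private final Path dir;
    private final long maxBytes;
    private int fileIndex = -1;      // index of the file currently open
    private long bytesInCurrent;     // bytes written to the current file
    private OutputStream out;

    RollingFileWriter(Path dir, long maxBytes) {
        this.dir = dir;
        this.maxBytes = maxBytes;
    }

    void write(byte[] record) throws IOException {
        long needed = record.length + 1L; // record plus trailing newline
        // Roll before writing so no file exceeds maxBytes. A single record
        // larger than maxBytes still gets written, alone, into its own file.
        if (out == null || (bytesInCurrent > 0 && bytesInCurrent + needed > maxBytes)) {
            roll();
        }
        out.write(record);
        out.write('\n');
        bytesInCurrent += needed;
    }

    private void roll() throws IOException {
        if (out != null) {
            out.close();
        }
        fileIndex++;
        Path next = dir.resolve(String.format("part-%05d", fileIndex));
        out = new BufferedOutputStream(Files.newOutputStream(next));
        bytesInCurrent = 0;
    }

    int filesWritten() {
        return fileIndex + 1;
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
        }
    }
}
```

Writing 11 nine-byte records ("record-00" to "record-10", ten bytes each with the newline) with a 25-byte limit gives five 20-byte files plus one 10-byte file. That last file is the leftover described above: whatever the limit, each task's final file is usually short.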

You may be able to write a second map-only job that reprocesses the output
from the first job in chunks of X bytes and just writes them out. Use an
IdentityMapper with zero reducers, and set the split size to X. I have not
tried this at home.
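
For the map-only route, each output file's size is decided by how the input is carved into splits. The sketch below is a plain-Java model, not a call into Hadoop, of the split arithmetic in the new-API FileInputFormat as I understand it: the split size is max(minSize, min(maxSize, blockSize)), and the last split of each file may run up to 10% over rather than leave a tiny tail. SplitPlanner and its method names are hypothetical, and sizes can be in any unit (MB below).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not a Hadoop class) of how FileInputFormat appears to
 * cut a single input file into splits. In a map-only identity job, each split
 * should become roughly one output file of about the same size.
 */
class SplitPlanner {
    // A split may overrun by up to 10% instead of leaving a tiny tail split.
    static final double SPLIT_SLOP = 1.1;

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Lengths of the splits for one file; splits never span two files. */
    static List<Long> splitLengths(long fileLength, long splitSize) {
        List<Long> lengths = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            lengths.add(remaining); // the tail of the file
        }
        return lengths;
    }
}
```

For a 1050 MB file and a 200 MB split size, splitLengths returns five 200 MB splits and a 50 MB tail. For a 1010 MB file the tail is folded in, and the last split is 210 MB. Also note that computeSplitSize(128, 1, 200) is 128: in this model a maximum split size above the block size has no effect, and it is the minimum split size that pushes splits past a block. Output files from an identity mapper would only approximately track these sizes, since record readers finish the record that straddles a split boundary and the output format adds its own overhead.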

S.

On 26 October 2011 07:03, Mapred Learn <mapred.le...@gmail.com> wrote:

> > Hi,
> > I am trying to create output files of fixed size by using:
> > -Dmapred.max.split.size=6442450812 (about 6 GB)
> >
> > But the problem is that the input data size and metadata vary, and I
> > have to adjust the above value manually to achieve a fixed size.
> >
> > Is there a way I can programmatically determine a split size that would
> > yield fixed-size output files, for example 200 MB each?
> >
> > Thanks,
> > JJ
