You can control the input to a computer program, but not (arbitrarily) how much output it generates. The only way to get output files of a fixed size is to write a custom output format that rolls over to a new filename every time that size is exceeded, and even then you will still end up with some small pieces left over. The plumbing for this is pretty ugly, and I would not recommend it casually.
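To make the rollover idea concrete, here is a minimal plain-Java sketch of the logic such an output format's record writer would need. It is not written against the Hadoop OutputFormat/RecordWriter API: the class name RollingRecordWriter is hypothetical, and the "files" are kept in memory as byte buffers rather than opened on HDFS. It also shows the leftover problem: the last part is usually smaller than the limit.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: writes newline-terminated records into a sequence of
 * parts, starting a new part whenever the next record would push the current
 * part past maxBytes. Parts live in memory here; a real Hadoop OutputFormat
 * would open a new output file instead.
 */
public class RollingRecordWriter {
    private final long maxBytes;
    private final List<ByteArrayOutputStream> parts = new ArrayList<>();
    private ByteArrayOutputStream current;
    private long currentBytes;

    public RollingRecordWriter(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    public void write(String record) {
        byte[] bytes = (record + "\n").getBytes(StandardCharsets.UTF_8);
        // Roll over if this record would overflow a non-empty part. Records are
        // never split, so parts end at or just under maxBytes; a single record
        // larger than maxBytes still goes into its own (oversized) part.
        if (current == null || (currentBytes > 0 && currentBytes + bytes.length > maxBytes)) {
            current = new ByteArrayOutputStream();
            parts.add(current);
            currentBytes = 0;
        }
        current.write(bytes, 0, bytes.length);
        currentBytes += bytes.length;
    }

    /** Size in bytes of each part written so far, in order. */
    public List<Integer> partSizes() {
        List<Integer> sizes = new ArrayList<>();
        for (ByteArrayOutputStream p : parts) {
            sizes.add(p.size());
        }
        return sizes;
    }

    public static void main(String[] args) {
        RollingRecordWriter w = new RollingRecordWriter(20);
        for (int i = 0; i < 7; i++) {
            w.write("record-" + i); // 9 bytes each, including the newline
        }
        System.out.println(w.partSizes());
    }
}
```

With a 20-byte limit and seven 9-byte records, this prints `[18, 18, 18, 9]`: each full part stops just short of the limit because records are kept whole, and the final part holds whatever remains.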
You may be able to write a second, map-only job that reprocesses the output of the first job in chunks of X bytes and simply writes them out: use an IdentityMapper and set the split size. I have not tried this at home.

S.

On 26 October 2011 07:03, Mapred Learn <mapred.le...@gmail.com> wrote:
> Hi,
>
> I am trying to create output files of fixed size by using :
>
> -Dmapred.max.split.size=6442450812 (6 Gb)
>
> But the problem is that the input Data size and metadata varies and I
> have to adjust above value manually to achieve fixed size.
>
> Is there a way I can programmatically determine split size that would
> yield me fixed sized output files. For eg 200 MB each ?
>
> Thanks,
>
> JJ