I have a question about how input files are split before they are handed out to map functions. Say I have an input directory containing 1000 files whose total size is 100 MB, I have 10 machines in my cluster, and I have configured mapred.map.tasks to 10 in hadoop-site.xml (the relevant entry is sketched below).

1. With this configuration, is there a way to know what size each split will be?

2. Does the split size depend on how many files are in the input directory? What if I have only 10 files in the input directory, but the total size of all the files is still 100 MB? Will that affect the split size?

Thanks.
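P.S. For reference, here is roughly what the mapred.map.tasks entry in my hadoop-site.xml looks like (a minimal sketch; any other properties in the file are omitted):

<configuration>
  <property>
    <name>mapred.map.tasks</name>
    <value>10</value>
    <!-- This is only a hint: the framework decides the actual number
         of map tasks from the input splits it computes. -->
  </property>
</configuration>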