Re: merging into MapFile

2008-12-10 Thread yoav.morag

Let me rephrase my question:
are all the parts of a MapFile necessarily affected by a merge? If so, it's
not scalable, no matter what the block size is.
However, since a MapFile is essentially a directory and not a file, I don't
see a reason why all parts should be affected. Can anyone comment on the
actual implementation of the merge algorithm?
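To make the idea in the question concrete: if the data were range-partitioned into several sorted parts, a merge would only need to rewrite the parts whose key ranges receive new records. The Java sketch below is a hypothetical design for illustration, not how Hadoop's MapFile is implemented. `PartitionedSortedStore`, its split points, and the in-memory `TreeMap` standing in for each on-disk part are all assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, NOT Hadoop's MapFile API: each "part" stands in for one
// sorted on-disk file covering a key range that starts at its lower-bound key.
final class PartitionedSortedStore {
    // lower-bound key -> sorted contents of that part
    private final TreeMap<String, TreeMap<String, String>> parts = new TreeMap<>();

    PartitionedSortedStore(List<String> splitPoints) {
        // "" sorts first, so this part catches every key below the first split point
        parts.put("", new TreeMap<>());
        for (String p : splitPoints) {
            parts.put(p, new TreeMap<>());
        }
    }

    /** Merges a batch of new records and returns how many parts had to be rewritten. */
    int merge(Map<String, String> batch) {
        // Group the new records by the part whose key range contains them.
        TreeMap<String, TreeMap<String, String>> touched = new TreeMap<>();
        for (Map.Entry<String, String> e : batch.entrySet()) {
            String lower = parts.floorKey(e.getKey());
            touched.computeIfAbsent(lower, k -> new TreeMap<>()).put(e.getKey(), e.getValue());
        }
        // "Rewrite" only the touched parts: old records plus new ones, still sorted.
        // Untouched parts are left exactly as they were.
        for (Map.Entry<String, TreeMap<String, String>> t : touched.entrySet()) {
            TreeMap<String, String> rewritten = new TreeMap<>(parts.get(t.getKey()));
            rewritten.putAll(t.getValue());
            parts.put(t.getKey(), rewritten);
        }
        return touched.size();
    }

    /** Total number of records across all parts. */
    int totalRecords() {
        int n = 0;
        for (TreeMap<String, String> part : parts.values()) {
            n += part.size();
        }
        return n;
    }
}
```

With split points "g" and "p", a batch holding apple, kiwi and zebra touches all three parts, while a later batch holding only banana and cherry rewrites just the first part. The cost of a merge then depends on the size of the touched parts rather than on the whole data set.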


Elia Mazzawi-2 wrote:
 
 It has to do with the data block size.
 
 I had many small files, and the performance became much better when I
 merged them.
 
 The default block size is 64 MB, so redo your files to be 64 MB each (what I
 did and recommend), or reconfigure your Hadoop.
 
 <property>
   <name>dfs.block.size</name>
   <value>67108864</value>
   <description>The default block size for new files.</description>
 </property>
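As rough arithmetic for why merging helps: with `dfs.block.size` set to 67108864 bytes (64 × 1024 × 1024), every non-empty file counts as at least one block of its own, however small it is. The sketch counts blocks per file as the ceiling of size over block size; `BlockCount` is a hypothetical helper, and it deliberately ignores replication and NameNode metadata overhead.

```java
// Rough illustration only: counts blocks per file as ceil(size / blockSize),
// ignoring replication and NameNode metadata. BlockCount is a hypothetical helper.
final class BlockCount {
    // 64 * 1024 * 1024 bytes, the dfs.block.size value in the config above
    static final long BLOCK_SIZE = 64L * 1024 * 1024;

    /** Blocks needed for one file of the given size (0 for an empty file). */
    static long blocksForFile(long fileSize) {
        return (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceiling division
    }

    /** Total blocks when each size is stored as a separate file. */
    static long blocksForFiles(long[] fileSizes) {
        long total = 0;
        for (long size : fileSizes) {
            total += blocksForFile(size);
        }
        return total;
    }
}
```

For example, 1,000 files of 1 MB each come to 1,000 blocks, while the same 1,000 MB stored as one file needs only 16 blocks (1000 / 64 = 15.625, rounded up).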
 
 Do something like:
 cat * | rotatelogs ./merged/m 64M
 It will merge and chop the data for you (rotatelogs is the log-rotation
 utility that ships with Apache httpd).
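If `rotatelogs` is not at hand, the same merge-and-chop step can be sketched in plain Java. This is an illustration under assumptions, not an exact equivalent of `rotatelogs`: the `MergeAndChop.mergeAndChop` name, the `prefix.00000` part naming, and the choice to cut only at line boundaries (normalizing line endings to `\n`) are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class MergeAndChop {
    /**
     * Concatenates the inputs in order and writes them to outDir/prefix.00000,
     * prefix.00001, ..., starting a new part whenever the next line would push the
     * current part past maxBytes. Lines are never split, so a single line longer
     * than maxBytes gets a part of its own. Line endings are normalized to '\n'.
     */
    static List<Path> mergeAndChop(List<Path> inputs, Path outDir, String prefix, long maxBytes)
            throws IOException {
        Files.createDirectories(outDir);
        List<Path> parts = new ArrayList<>();
        BufferedWriter out = null;
        long partBytes = 0;
        try {
            for (Path in : inputs) {
                try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        long lineBytes = line.getBytes(StandardCharsets.UTF_8).length + 1L; // +1 for '\n'
                        // Open the first part, or roll over when this line would not fit.
                        if (out == null || (partBytes > 0 && partBytes + lineBytes > maxBytes)) {
                            if (out != null) {
                                out.close();
                            }
                            Path part = outDir.resolve(String.format("%s.%05d", prefix, parts.size()));
                            parts.add(part);
                            out = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                            partBytes = 0;
                        }
                        out.write(line);
                        out.write('\n');
                        partBytes += lineBytes;
                    }
                }
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return parts;
    }
}
```

Called with `maxBytes` set to `64L * 1024 * 1024`, this produces parts of at most 64 MB each, except that a single line longer than that ends up in an oversized part of its own.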
 

-- 
View this message in context: 
http://www.nabble.com/merging-into-MapFile-tp20914388p20930594.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



merging into MapFile

2008-12-09 Thread yoav.morag

Hi all -
can anyone comment on the performance cost of merging many small files into
an increasingly large MapFile? Will that cost depend on the size of
the larger MapFile (since I have to rewrite it), or is there a built-in
strategy to split it into smaller parts, affecting only those that were
touched?
Thanks -
Yoav.
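Since a MapFile's keys have to be appended in sorted order, one way to fold new small files into it is a k-way merge of sorted runs into a fresh output. The Java sketch is a generic k-way merge with Strings standing in for keys, not Hadoop's own merge code; `KWayMerge` is a hypothetical name. It shows why the cost tracks the size of the existing file: when that file is one of the runs, every one of its records is read and written again, even if only a handful of new keys arrive.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Generic k-way merge of already-sorted runs (hypothetical helper, not Hadoop code).
final class KWayMerge {
    static List<String> merge(List<List<String>> sortedRuns) {
        // The heap holds the current smallest unread key of each run.
        PriorityQueue<Head> heap = new PriorityQueue<>();
        for (List<String> run : sortedRuns) {
            Iterator<String> it = run.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head h = heap.poll();          // globally smallest remaining key
            out.add(h.value);              // every record, old or new, is emitted again
            if (h.rest.hasNext()) {
                heap.add(new Head(h.rest.next(), h.rest));
            }
        }
        return out;
    }

    // One run's current key plus the iterator over the rest of that run.
    private static final class Head implements Comparable<Head> {
        final String value;
        final Iterator<String> rest;

        Head(String value, Iterator<String> rest) {
            this.value = value;
            this.rest = rest;
        }

        @Override
        public int compareTo(Head other) {
            return value.compareTo(other.value);
        }
    }
}
```

For example, merging the runs [a, c, e], [b] and [d, f] yields [a, b, c, d, e, f]. Each output record costs one heap operation, so merging k runs holding n records in total takes O(n log k) time, and the existing file's records count toward n on every merge.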
-- 
View this message in context: 
http://www.nabble.com/merging-into-MapFile-tp20914388p20914388.html