Let me rephrase my question:
Are all parts of a MapFile necessarily affected by a merge? If so, it's
not scalable, no matter what the block size is.
However, since a MapFile is essentially a directory and not a single file,
I don't see a reason why all its parts should be affected. Can anyone
comment on the actual implementation of the merge algorithm?
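To make the concern concrete, here is my mental model of a naive merge.
This is only a sketch: the MapFile.Reader/Writer classes are the real
org.apache.hadoop.io.MapFile API, but the merge strategy and the directory
names ("a", "b", "merged") are my assumptions, not necessarily what Hadoop
actually does. In this model every entry of both inputs is rewritten, so
the cost grows with the total size:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.io.MapFile;
  import org.apache.hadoop.io.Text;

  // Sketch: merge MapFiles "a" and "b" into a new MapFile "merged".
  // Directory names are hypothetical; error handling omitted.
  public class MapFileMergeSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      MapFile.Reader a = new MapFile.Reader(fs, "a", conf);
      MapFile.Reader b = new MapFile.Reader(fs, "b", conf);
      // The output is a brand-new MapFile: every entry of both
      // inputs is rewritten, hence my scalability worry.
      MapFile.Writer out = new MapFile.Writer(conf, fs, "merged",
          Text.class, Text.class);
      Text ka = new Text(), va = new Text();
      Text kb = new Text(), vb = new Text();
      boolean hasA = a.next(ka, va);
      boolean hasB = b.next(kb, vb);
      while (hasA || hasB) {
        // Emit the smaller key so the output data file stays sorted.
        if (!hasB || (hasA && ka.compareTo(kb) <= 0)) {
          out.append(ka, va);
          hasA = a.next(ka, va);
        } else {
          out.append(kb, vb);
          hasB = b.next(kb, vb);
        }
      }
      out.close(); a.close(); b.close();
    }
  }

If the real implementation works anything like this, only a smarter,
partitioned layout would avoid rewriting untouched parts.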
Elia Mazzawi-2 wrote:
It has to do with the data block size. I had many small files, and
performance became much better when I merged them.
The default block size is 64 MB, so rewrite your files to be ~64 MB each
(what I did and recommend), or reconfigure your Hadoop:
<property>
  <name>dfs.block.size</name>
  <value>67108864</value>
  <description>The default block size for new files.</description>
</property>
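(For a cluster-wide change that property would typically go in your site
configuration file, e.g. hadoop-site.xml in older releases; note that
existing files keep the block size they were created with.)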
Then do something like
cat * | rotatelogs ./merged/m 64M
(rotatelogs is the Apache httpd log-rotation tool); it will merge the
files and chop the stream into 64 MB pieces for you.
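If you'd rather do the packing inside HDFS instead of with shell tools,
something like the following untested sketch (the paths "small-files" and
"packed.seq" are made up) copies each small file into one big
SequenceFile, keyed by the original file name. Since HDFS chops a single
large file into dfs.block.size blocks anyway, one big file gets you the
same effect:

  import java.io.InputStream;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.IOUtils;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;

  // Sketch: pack all files under "small-files" into one SequenceFile.
  public class PackSmallFiles {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      Path in = new Path("small-files");  // hypothetical input dir
      Path out = new Path("packed.seq");  // hypothetical output
      SequenceFile.Writer writer = SequenceFile.createWriter(
          fs, conf, out, Text.class, BytesWritable.class);
      for (FileStatus st : fs.listStatus(in)) {
        if (st.isDir()) continue;
        // Read the whole small file into memory (fine for small files).
        byte[] buf = new byte[(int) st.getLen()];
        InputStream is = fs.open(st.getPath());
        IOUtils.readFully(is, buf, 0, buf.length);
        is.close();
        // key = original file name, value = raw bytes
        writer.append(new Text(st.getPath().getName()),
            new BytesWritable(buf));
      }
      writer.close();
    }
  }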
yoav.morag wrote:
hi all -
can anyone comment on the performance cost of merging many small files
into an increasingly large MapFile? Will that cost depend on the size of
the larger MapFile (since I have to rewrite it), or is there a built-in
strategy to split it into smaller parts, affecting only those that were
touched?
thanks -
Yoav.