Let me rephrase my question:
are all parts of a MapFile necessarily affected by a merge? If so, it's
not scalable, no matter what the block size is.
However, since a MapFile is essentially a directory and not a single file,
I don't see a reason why all of its parts should be affected. Can anyone
comment on the actual implementation of the merge algorithm?
Elia Mazzawi-2 wrote:
>
> it has to do with the data block size,
>
> I had many small files, and the performance became much better when I
> merged them.
>
> the default block size is 64 MB, so repack your files to <= 64 MB each
> (what I did and recommend),
> or reconfigure your Hadoop.
>
>
> <property>
>   <name>dfs.block.size</name>
>   <value>67108864</value>
>   <description>The default block size for new files.</description>
> </property>
>
>
> Do something like:
> cat * | rotatelogs ./merged/m 64M
> It will merge and chop the data for you.
>
> yoav.morag wrote:
>> hi all -
>> can anyone comment on the performance cost of merging many small files
>> into
>> an increasingly large MapFile ? will that cost be dependent on the size
>> of
>> the larger MapFile (since I have to rewrite it) or is there a built-in
>> strategy to split it into smaller parts, affecting only those which were
>> touched ?
>> thanks -
>> Yoav.
>>
>
>
>
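As a side note, if rotatelogs isn't available, the same merge-and-chop effect can be approximated with GNU coreutils' split. This is only a sketch under assumed names: the small input files are taken to live under ./small, and the 64 MB chunks are written to ./merged.

```shell
# Sketch: merge many small files and cut the stream into <= 64 MB chunks.
# The ./small and ./merged directory names are hypothetical; the sample
# files below just make the example self-contained.
mkdir -p small merged
printf 'first record\n'  > small/part-000
printf 'second record\n' > small/part-001

# Concatenate everything and split the stream by size; chunks come out
# as merged/m_aa, merged/m_ab, and so on.
cat small/* | split -b 64M - merged/m_
```

Unlike rotatelogs, split only cuts on byte boundaries, so a record can straddle two chunks; with `split -C` (line-oriented splitting) GNU split will instead cut at newline boundaries, which is usually what you want for log-style data.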
--
View this message in context:
http://www.nabble.com/merging-into-MapFile-tp20914388p20930594.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.