let me rephrase my question:
are all the parts of a MapFile necessarily affected by a merge? if so, it's
not scalable, no matter what the block size is.
however, since a MapFile is essentially a directory and not a single file, I
don't see a reason why all parts should be affected. can anyone comment on
the actual implementation of the merge algorithm?
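
to make that concrete: a MapFile named "mymap" is really an HDFS directory
holding a sorted "data" SequenceFile plus a small "index" SequenceFile. a rough
sketch of reading one with the standard org.apache.hadoop.io.MapFile API (the
directory and key names are just placeholders, not my actual code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileLayout {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // "mymap" is a directory, not a single file:
        //   mymap/data   - all key/value pairs, sorted by key (a SequenceFile)
        //   mymap/index  - every Nth key plus its byte offset into data
        MapFile.Reader reader = new MapFile.Reader(fs, "mymap", conf);
        Text value = new Text();
        reader.get(new Text("some-key"), value); // binary-search the index, then seek into data
        reader.close();
    }
}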


Elia Mazzawi-2 wrote:
> 
> it has to do with the data block size,
> 
> I had many small files and the performance became much better when I
> merged them,
> 
> the default block size is 64 MB, so redo your files to be <= 64 MB (which is
> what I did and recommend), or reconfigure your Hadoop:
> 
> <property>
>   <name>dfs.block.size</name>
>   <value>67108864</value>
>   <description>The default block size for new files.</description>
> </property>
> 
> do something like
> cat * | rotatelogs ./merged/m 64M
> it will merge and chop the data for you.
> 
> yoav.morag wrote:
>> hi all -
>> can anyone comment on the performance cost of merging many small files into
>> an increasingly large MapFile? will that cost be dependent on the size of
>> the larger MapFile (since I have to rewrite it), or is there a built-in
>> strategy to split it into smaller parts, affecting only those which were
>> touched?
>> thanks -
>> Yoav.
>>   
> 
> 
> 
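
for anyone following the thread, here is a rough, untested sketch of merging a
batch of small SequenceFiles into one MapFile (old org.apache.hadoop.io API;
the paths, Text key/value types, and the in-memory sort are just illustrative
assumptions on my part, not something from Elia's setup):

import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class MergeIntoMapFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // collect everything in memory so keys can be appended in sorted order
        // (MapFile.Writer requires sorted keys) - only workable for small inputs
        TreeMap<Text, Text> sorted = new TreeMap<Text, Text>();
        for (FileStatus st : fs.listStatus(new Path("small-files"))) {
            SequenceFile.Reader in = new SequenceFile.Reader(fs, st.getPath(), conf);
            Text key = new Text();
            Text val = new Text();
            while (in.next(key, val)) {
                sorted.put(new Text(key), new Text(val));
            }
            in.close();
        }

        // the whole MapFile (data + index) gets rewritten here, which is exactly
        // the cost I am asking about for repeated merges
        MapFile.Writer out = new MapFile.Writer(conf, fs, "merged-mapfile",
                Text.class, Text.class);
        for (Map.Entry<Text, Text> e : sorted.entrySet()) {
            out.append(e.getKey(), e.getValue());
        }
        out.close();
    }
}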

