심탁길 wrote:
I need to handle a 4GB gzip-style file. I thought I could map-reduce even such a large gzip file in parallel. In reality, we have to deal with gzip log files larger than the default block size (64MB), and in that situation, full-scanning and processing a large log file on a single commodity machine is not desirable. Is there any idea for solving this kind of issue?
A gzip file with a single member must be processed by a single thread, since decompression must begin at the start of the file. A gzip file with multiple members can be split, provided the boundaries between members can be identified, either with an index or by a magic string indicating the start of each member.
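For what it's worth, here is a minimal sketch of the multi-member idea in Java (the file name and helper method are hypothetical). java.util.zip.GZIPOutputStream's finish() ends the current member without closing the underlying stream, so concatenated members land in one file. Each member begins with the gzip magic bytes 0x1f 0x8b, but those bytes can also occur inside compressed data, so recording member offsets in an index at write time is more reliable than scanning for the magic string afterwards.

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

public class MultiMemberGzip {
    public static void main(String[] args) throws IOException {
        // Write two independent gzip members into one file.  gunzip
        // decompresses the concatenation transparently, but each member
        // can also be decompressed on its own once its offset is known.
        try (OutputStream out = new FileOutputStream("multi.gz")) {
            writeMember(out, "first member\n");
            writeMember(out, "second member\n");
        }
    }

    private static void writeMember(OutputStream out, String text)
            throws IOException {
        GZIPOutputStream gz = new GZIPOutputStream(out);
        gz.write(text.getBytes("UTF-8"));
        // finish() writes this member's trailer without closing the
        // shared underlying stream, so the next member can follow it.
        gz.finish();
    }
}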
Can you instead produce smaller (e.g., 100MB) gzipped inputs?
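If re-compressing the logs is an option, a sketch of that approach (the chunk threshold and file naming are illustrative, and it counts uncompressed bytes for simplicity, so the compressed chunks come out smaller than the threshold): each chunk is a complete, independent gzip file that splits on a record boundary and can serve as its own map input.

import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class ChunkedGzipWriter {
    // Rotate after ~100MB of uncompressed log data per chunk.
    private static final long CHUNK_BYTES = 100L * 1024 * 1024;

    public static void main(String[] args) throws IOException {
        int part = 0;
        long written = 0;
        GZIPOutputStream out = open(part);
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                byte[] bytes = (line + "\n").getBytes("UTF-8");
                // Start a fresh .gz file once the current chunk is full,
                // always splitting between records, never inside one.
                if (written + bytes.length > CHUNK_BYTES && written > 0) {
                    out.close();
                    out = open(++part);
                    written = 0;
                }
                out.write(bytes);
                written += bytes.length;
            }
        }
        out.close();
    }

    private static GZIPOutputStream open(int part) throws IOException {
        return new GZIPOutputStream(
                new FileOutputStream(String.format("log-part-%05d.gz", part)));
    }
}

Doug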