Hi, I need to handle a 4 GB gzip file. I had assumed I could map-reduce even such a large gzip file in parallel, but apparently that is not the case. --; In practice we have to deal with gzip log files larger than the default block size (64 MB), and in that situation full-scanning and processing a large log file on a single commodity machine is not desirable. Is there any idea how to solve this kind of issue?
- How can I map-reduce a large gzip file in parallel?

심탁길