Thanks for everything so far; it has been really helpful. I have one more question: is there a way to merge MapFile index/data files? If so, what is the best way to do it? From the Javadoc it looks possible, but the documentation isn't very explicit about how. I could obviously configure the job to use a single reducer, but with my data size that would be really slow.

Thanks,
-Xavier

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 04, 2008 12:53 PM
To: core-user@hadoop.apache.org
Subject: Re: What's the best way to get to a single key?

Xavier Stevens wrote:
> Is there a way to do this when your input data is using SequenceFile
> compression?

Yes. A MapFile is simply a directory containing two SequenceFiles named "data" and "index". MapFileOutputFormat uses the same compression parameters as SequenceFileOutputFormat. SequenceFileInputFormat recognizes MapFiles and reads the "data" file. So you should be able to just switch from specifying SequenceFileOutputFormat to MapFileOutputFormat in your jobs and everything should work the same, except you'll have index files that permit random access.

Doug
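
To make the merge question at the top of this message concrete: since each MapFile's "data" file is sorted by key, several parts can in principle be combined with a k-way merge, and a fresh sparse index can be written along the way. The following is only a minimal plain-Java sketch of that idea under stated assumptions. It does not use the Hadoop API; the names MapFileMergeSketch, Entry, Merged, Cursor, and the indexInterval parameter are hypothetical, and in-memory lists stand in for the "data" and "index" files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only: models each MapFile part as a sorted
// in-memory list. It does not read or write real Hadoop files.
class MapFileMergeSketch {

    /** One key/value record, standing in for an entry of a "data" file. */
    static final class Entry {
        final String key;
        final String value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    /** Merged output: all records in key order plus a sparse index. */
    static final class Merged {
        final List<Entry> data = new ArrayList<>();
        final List<String> indexKeys = new ArrayList<>();       // every Nth key
        final List<Integer> indexPositions = new ArrayList<>(); // its position in data
    }

    /** Read position within one sorted input part. */
    static final class Cursor {
        final List<Entry> part;
        int pos = 0;
        Cursor(List<Entry> part) { this.part = part; }
        Entry head() { return part.get(pos); }
        boolean advance() { return ++pos < part.size(); }
    }

    /** K-way merge of parts that are each already sorted by key. */
    static Merged merge(List<List<Entry>> parts, int indexInterval) {
        // The heap is ordered by the key each cursor currently points at.
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.head().key.compareTo(b.head().key));
        for (List<Entry> part : parts) {
            if (!part.isEmpty()) heap.add(new Cursor(part));
        }
        Merged out = new Merged();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            // Record an index entry for every indexInterval-th record.
            if (out.data.size() % indexInterval == 0) {
                out.indexKeys.add(smallest.head().key);
                out.indexPositions.add(out.data.size());
            }
            out.data.add(smallest.head());
            // Re-insert the cursor if its part still has records left.
            if (smallest.advance()) heap.add(smallest);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> part0 = Arrays.asList(
            new Entry("a", "1"), new Entry("c", "3"), new Entry("e", "5"));
        List<Entry> part1 = Arrays.asList(
            new Entry("b", "2"), new Entry("d", "4"));
        Merged merged = merge(Arrays.asList(part0, part1), 2);
        for (Entry e : merged.data) System.out.println(e.key + "\t" + e.value);
        System.out.println("index keys: " + merged.indexKeys);
    }
}
```

The merge reads each input sequentially and keeps only one record per part in memory, so it avoids re-sorting everything through a single reducer. Records with equal keys from different parts are all kept in this sketch; whether that is acceptable depends on the data.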