Re: What's the best way to get to a single key?

Doug Cutting Mon, 03 Mar 2008 14:52:47 -0800

Use MapFileOutputFormat to write your data, then call:


http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/MapFileOutputFormat.html#getEntry(org.apache.hadoop.io.MapFile.Reader[],%20org.apache.hadoop.mapred.Partitioner,%20K,%20V)

The documentation is pretty sparse, but the intent is that you open a MapFile.Reader for each mapreduce output, pass the partitioner used, the key, and the value to be read into.

A MapFile maintains an index of keys, so the entire file need not be scanned. If you really only need the value of a single key then you might avoid opening all of the output files. In that case you could use the Partitioner and the MapFile API directly.
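
As a rough sketch of the single-key case: the partitioner tells you which output file holds the key, so only that one MapFile needs to be opened. The code below uses only the JDK and mimics the formula used by Hadoop's HashPartitioner. The class name PartitionLookup, its two methods, and the use of a plain Java object in place of a Writable key are assumptions made for illustration. It also assumes the job used default hash partitioning and the usual part-NNNNN output names.

```java
// Sketch only: models how a hash partitioner maps a key to one output file.
// PartitionLookup and its methods are hypothetical names, not Hadoop classes.
class PartitionLookup {

    // Same arithmetic as Hadoop's HashPartitioner: clear the sign bit so the
    // result is non-negative, then take the remainder by the partition count.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Assumed naming of reduce outputs: part-00000, part-00001, ...
    static String partFileName(int partition) {
        return String.format("part-%05d", partition);
    }

    public static void main(String[] args) {
        int numPartitions = 4;   // must match the number of reduces in the job
        String key = "a";        // stand-in for the real Writable key
        int partition = partitionFor(key, numPartitions);
        System.out.println(key + " -> " + partFileName(partition));
    }
}
```

With the file name in hand, you would open a MapFile.Reader on just that output and call get(key, value). This only works if the lookup uses the same key type and the same Partitioner as the job that wrote the data, since the key's hash code decides the partition.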


Doug


Xavier Stevens wrote:

I am curious how others might be solving this problem.  I want to
retrieve a record from HDFS based on its key.  Are there any methods
that can shortcut this type of search to avoid parsing all data until
you find it?  Obviously Hbase would do this as well, but I wanted to
know if there is a way to do it using just Map/Reduce and HDFS.

Thanks,

-Xavier
