Use MapFileOutputFormat to write your data, then call:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/MapFileOutputFormat.html#getEntry(org.apache.hadoop.io.MapFile.Reader[],%20org.apache.hadoop.mapred.Partitioner,%20K,%20V)
The documentation is pretty sparse, but the intent is that you open a
MapFile.Reader for each mapreduce output, then pass in the readers, the
partitioner the job used, the key, and a value object to be read into.
A MapFile maintains an index of keys, so the entire file need not be
scanned. If you really only need the value of a single key then you
might avoid opening all of the output files. In that case you could
use the Partitioner to work out which output file can hold the key,
then open just that file with the MapFile API directly.
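
To make the idea concrete, here is a rough, self-contained sketch in plain
Java. It does not use the Hadoop classes at all: PartitionedLookup,
SortedPart, write, lookup and the index interval of 16 are all made-up names
and values for illustration. It only models the two things described above:
the partition function picks the single output that can hold the key, and a
sparse index over the sorted keys limits how many records get read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of partitioned, indexed output. Hypothetical; not the Hadoop API. */
final class PartitionedLookup {

    /** Hash partitioning: non-negative hash code modulo the partition count. */
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** One sorted "file" plus a sparse index of every Nth key's position. */
    static final class SortedPart {
        final List<String> keys = new ArrayList<>();
        final List<String> values = new ArrayList<>();
        final TreeMap<String, Integer> index = new TreeMap<>();
        int recordsRead = 0; // counts records touched by get(), for illustration

        SortedPart(TreeMap<String, String> records, int indexInterval) {
            for (Map.Entry<String, String> e : records.entrySet()) {
                if (keys.size() % indexInterval == 0) {
                    index.put(e.getKey(), keys.size());
                }
                keys.add(e.getKey());
                values.add(e.getValue());
            }
        }

        /** Jump to the nearest indexed key at or before the target, then scan forward. */
        String get(String key) {
            Map.Entry<String, Integer> start = index.floorEntry(key);
            if (start == null) {
                return null; // key sorts before everything in this part
            }
            for (int i = start.getValue(); i < keys.size(); i++) {
                recordsRead++;
                int cmp = keys.get(i).compareTo(key);
                if (cmp == 0) return values.get(i);
                if (cmp > 0) return null; // passed where the key would be
            }
            return null;
        }
    }

    /** Splits the data by partition, keeping each part sorted by key. */
    static SortedPart[] write(Map<String, String> data, int numPartitions, int indexInterval) {
        List<TreeMap<String, String>> buckets = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            buckets.add(new TreeMap<>());
        }
        for (Map.Entry<String, String> e : data.entrySet()) {
            buckets.get(partitionFor(e.getKey(), numPartitions)).put(e.getKey(), e.getValue());
        }
        SortedPart[] parts = new SortedPart[numPartitions];
        for (int p = 0; p < numPartitions; p++) {
            parts[p] = new SortedPart(buckets.get(p), indexInterval);
        }
        return parts;
    }

    /** Consults only the one part the partition function selects. */
    static String lookup(SortedPart[] parts, String key) {
        return parts[partitionFor(key, parts.length)].get(key);
    }

    public static void main(String[] args) {
        Map<String, String> data = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            data.put(String.format("key%04d", i), "value" + i);
        }
        SortedPart[] parts = write(data, 4, 16);
        System.out.println(lookup(parts, "key0421")); // prints "value421"
        System.out.println(lookup(parts, "missing")); // prints "null"
    }
}
```

With real job output the same shape should apply: the lookup has to use the
same partitioner and the same number of partitions as the job that wrote the
files, otherwise it will pick the wrong file.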
Doug
Xavier Stevens wrote:
I am curious how others might be solving this problem. I want to
retrieve a record from HDFS based on its key. Are there any methods
that can shortcut this type of search to avoid parsing all data until
you find it? Obviously HBase would do this as well, but I wanted to
know if there is a way to do it using just Map/Reduce and HDFS.
Thanks,
-Xavier