Johan Oskarsson wrote:
> I'm considering using the sequence file output of hadoop jobs to serve
> data from as it would mean I could skip the conversion from sequence
> file -> other file format step.
> To do this efficiently I would need the data to be in one file.
I think it should be more efficient to keep things in separate files.
If you use MapFileOutputFormat, there are methods to randomly access
entries from job output:
http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/MapFileOutputFormat.html
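To illustrate why separate files need not hurt lookups, here is a rough,
Hadoop-free sketch of the idea: the partition function that routed a key
to a reducer also picks the one part file that can contain it, and a
sorted structure then finds the entry inside that file. The class
PartitionedLookup and its methods are hypothetical names made up for
this example, and the TreeMap is only a stand-in for an indexed MapFile.
It assumes the job used hash partitioning; it is not the actual
MapFileOutputFormat code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory model of job output split across several part files.
public class PartitionedLookup {
    // One sorted map per part file; a stand-in for an indexed, sorted file.
    private final List<TreeMap<String, String>> parts = new ArrayList<>();

    public PartitionedLookup(int numParts) {
        for (int i = 0; i < numParts; i++) {
            parts.add(new TreeMap<>());
        }
    }

    // Hash partitioning: clear the sign bit, then take the remainder.
    public static int partitionFor(String key, int numParts) {
        return (key.hashCode() & Integer.MAX_VALUE) % numParts;
    }

    // Writing: each key lands in exactly one part, chosen by its hash.
    public void put(String key, String value) {
        parts.get(partitionFor(key, parts.size())).put(key, value);
    }

    // Reading: recompute the partition, then search only that part.
    public String get(String key) {
        return parts.get(partitionFor(key, parts.size())).get(key);
    }
}
```

A lookup therefore touches one part regardless of how many parts the
job wrote, which is why merging everything into one file is not
required for random access.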
SequenceFileOutputFormat will also let you open all readers, but there's
no random access, since a SequenceFile has no index.
http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/SequenceFileOutputFormat.html
Will these suffice?
Doug