Zheng,
I'll take a look at that.
It seems the easiest thing would be to subclass
SequenceFileInputFormat and override getRecordReader() to return a
RecordReader that wraps SequenceFileRecordReader and overrides
RecordReader.next()... right?
Is it safe to assume that K and V are both Text writables, so I can
just append the bytes of one to the other?
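If both K and V are Text, the append really is just bytes: key bytes, a
separator, then the value bytes, handed to Hive as the new value. Here is
a minimal plain-Java sketch of that byte-level prepend (the tab separator
and the class/method names are my assumptions for illustration, not
anything Hive mandates; the real version would do this inside the wrapping
RecordReader.next() using Text.getBytes()/Text.set()):

```java
import java.nio.charset.StandardCharsets;

public class KeyPrepend {
    // Simulates what the wrapping RecordReader.next() would produce from
    // two Text writables: key bytes + '\t' + value bytes.
    // The tab delimiter is an assumption; use whatever your table expects.
    static byte[] prepend(byte[] key, byte[] value) {
        byte[] out = new byte[key.length + 1 + value.length];
        System.arraycopy(key, 0, out, 0, key.length);
        out[key.length] = '\t';
        System.arraycopy(value, 0, out, key.length + 1, value.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] merged = prepend("k1".getBytes(StandardCharsets.UTF_8),
                                "v1".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(merged, StandardCharsets.UTF_8));
    }
}
```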
Bobby
On Oct 6, 2009, at 8:10 PM, Zheng Shao wrote:
Hi Bobby,
We just need a special FileInputFormat: it should be able to read
SequenceFiles and prepend the key to the value before the record is
returned to the Hive framework.
Then in Hive language, we can say:
add jar my.jar;
CREATE TABLE mytable (key STRING, value STRING)
STORED AS INPUTFORMAT 'com.my.inputformat' OUTPUTFORMAT
'org.apache.hadoop.mapred.SequenceFileOutputFormat';
See http://issues.apache.org/jira/browse/HIVE-177
You may also want to write your own OutputFormat, which splits each
row passed in into key and value and stores them separately. But that
is not needed unless you want to use Hive to INSERT into this table
(LOAD does NOT need this).
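The write side is just the inverse of the prepend: split each row at the
first delimiter into the key and value that go into the SequenceFile. A
plain-Java sketch of that split (class/method names and the tab delimiter
are my assumptions; the real OutputFormat would emit the two halves as the
record's key and value writables):

```java
public class RowSplit {
    // Splits a row at the first tab into {key, value}, the inverse of the
    // prepend done on the read path. A row with no tab becomes a key with
    // an empty value -- that behavior is a design choice, not a Hive rule.
    static String[] split(String row) {
        int i = row.indexOf('\t');
        if (i < 0) {
            return new String[] { row, "" };
        }
        return new String[] { row.substring(0, i), row.substring(i + 1) };
    }

    public static void main(String[] args) {
        String[] kv = split("k1\tsome value");
        System.out.println(kv[0] + " | " + kv[1]);
    }
}
```

Note that only the first tab is treated as the delimiter, so a value that
itself contains tabs round-trips intact through prepend-then-split.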
Zheng
On Tue, Oct 6, 2009 at 6:19 PM, Bobby Rullo <[email protected]> wrote:
Hi there,
It seems that Hive ignores the key when reading Hadoop sequence
files. Is there a way to make it not do that?
If there's no way to do this with a 'stock' Hive build, could
someone point me to the code that reads sequence files in Hive and I
can have a go at it? It's sort of a show-stopper for us - we have a
bunch of large files where the key field is important.
Bobby
--
Yours,
Zheng