Zheng,

I'll take a look at that.

It seems the easiest thing would be to subclass SequenceFileInputFormat and override getRecordReader() to return a RecordReader that wraps SequenceFileRecordReader and overrides RecordReader.next()... right?

Is it safe to assume that K and V are both Text Writables, so I can just append the bytes of one to the other?
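If so, a minimal sketch of that wrapper might look like the following. This is only an illustration, assuming the old `org.apache.hadoop.mapred` API, Text keys and values, and Hive's default Ctrl-A (\001) field delimiter; the class name is made up:

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

// Hypothetical sketch: wraps SequenceFileRecordReader and prepends the
// key to the value, separated by Hive's default field delimiter (Ctrl-A).
// Assumes both key and value are Text.
public class KeyPrependingSequenceFileInputFormat
    extends SequenceFileInputFormat<Text, Text> {

  @Override
  public RecordReader<Text, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    final SequenceFileRecordReader<Text, Text> inner =
        new SequenceFileRecordReader<Text, Text>(job, (FileSplit) split);
    return new RecordReader<Text, Text>() {
      private final Text innerKey = inner.createKey();
      private final Text innerValue = inner.createValue();

      public boolean next(Text key, Text value) throws IOException {
        if (!inner.next(innerKey, innerValue)) {
          return false;
        }
        key.set(innerKey);
        // Build "key \001 value" so the Hive SerDe sees both fields.
        value.set(innerKey);
        value.append(new byte[] {1}, 0, 1);  // Ctrl-A separator
        value.append(innerValue.getBytes(), 0, innerValue.getLength());
        return true;
      }

      public Text createKey() { return new Text(); }
      public Text createValue() { return new Text(); }
      public long getPos() throws IOException { return inner.getPos(); }
      public float getProgress() throws IOException { return inner.getProgress(); }
      public void close() throws IOException { inner.close(); }
    };
  }
}
```

Since Hive only hands the value to the SerDe, appending the key into the value (with a separator the table's row format knows about) is what makes both columns visible in the `mytable (key STRING, value STRING)` schema.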

Bobby
On Oct 6, 2009, at 8:10 PM, Zheng Shao wrote:

Hi Bobby,

We just need a special FileInputFormat: it should be able to read the SequenceFile, and then prepend the key to the value before the record is returned to the Hive framework.

Then in Hive language, we can say:

add jar my.jar;
CREATE TABLE mytable (key STRING, value STRING)
STORED AS INPUTFORMAT 'com.my.inputformat' OUTPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileOutputFormat';

See http://issues.apache.org/jira/browse/HIVE-177

You may also want to write your own OutputFormat, which splits the row passed in into key and value and stores them separately. But that is not needed unless you want to use Hive to INSERT into this table (LOAD does NOT need this).

Zheng

On Tue, Oct 6, 2009 at 6:19 PM, Bobby Rullo <[email protected]> wrote:
Hi there,

It seems that Hive ignores the key when reading Hadoop sequence files. Is there a way to make it not do that?

If there's no way to do this with a 'stock' Hive build, could someone point me to the code that reads sequence files in Hive and I can have a go at it? It's sort of a show-stopper for us - we have a bunch of large files where the key field is important.

Bobby



--
Yours,
Zheng