Hi Bobby,

We just need a special FileInputFormat: one that reads the SequenceFile and
prepends the key to the value before the record is returned to the Hive
framework.
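
As a rough sketch of the "prepend the key" step only (not a full
InputFormat): the helper below is hypothetical, uses plain Strings instead
of Hadoop Writables, and assumes the table uses Hive's default field
delimiter, Ctrl-A, so that the combined text parses into two columns. In a
real RecordReader the same concatenation would be done on the value object
after the underlying SequenceFile reader has filled in the key and value.

```java
// Hypothetical helper, for illustration only: builds the row text that a
// custom RecordReader would hand to Hive in place of the bare value.
final class KeyPrependSketch {
    // Hive's default field delimiter is Ctrl-A (code point 1).
    static final char FIELD_DELIMITER = '\u0001';

    // Returns key + delimiter + value, so a two-column table
    // (key STRING, value STRING) sees the key as its first column.
    static String prependKey(String key, String value) {
        return key + FIELD_DELIMITER + value;
    }
}
```

If the table is declared with a different field delimiter, the constant
would have to match it.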

Then, in HiveQL, we can say:

add jar my.jar;
CREATE TABLE mytable (key STRING, value STRING)
STORED AS INPUTFORMAT 'com.my.inputformat'
OUTPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileOutputFormat';

See http://issues.apache.org/jira/browse/HIVE-177

You may also want to write your own OutputFormat that splits each row
passed in into a key and a value and stores them separately. That is only
needed if you want to use Hive to INSERT into this table (LOAD does NOT
need it).
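
The splitting step on the output side could look roughly like this. Again
this is only a sketch with a hypothetical helper: it works on plain
Strings, assumes the key and value are separated by Hive's default Ctrl-A
delimiter, and (as an arbitrary choice) treats a row with no delimiter as
an empty key plus the whole row as the value.

```java
// Hypothetical helper, for illustration only: the inverse of prepending
// the key, as a custom OutputFormat's writer might do before storing the
// key and value separately.
final class RowSplitSketch {
    // Hive's default field delimiter is Ctrl-A (code point 1).
    static final char FIELD_DELIMITER = '\u0001';

    // Returns a two-element array {key, value}. Only the first delimiter
    // is treated as the key/value boundary; later ones stay in the value.
    static String[] splitRow(String row) {
        int boundary = row.indexOf(FIELD_DELIMITER);
        if (boundary < 0) {
            // Assumption: no delimiter means there is no key.
            return new String[] { "", row };
        }
        return new String[] {
            row.substring(0, boundary),
            row.substring(boundary + 1)
        };
    }
}
```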

Zheng

On Tue, Oct 6, 2009 at 6:19 PM, Bobby Rullo <[email protected]> wrote:

> Hi there,
>
> It seems that Hive ignores the key when reading hadoop sequence files. Is
> there a way to make it not do that?
>
> If there's no way to do this with a 'stock' Hive build, could someone point
> me to the code that reads sequence files in Hive and I can have a go at it?
> It's sort of a show-stopper for us - we have a bunch of large files where
> the key field is important.
>
> Bobby
>



-- 
Yours,
Zheng
