Yes. It is safe as long as your own data files have both K and V as Text writables.
Zheng

On Thu, Oct 8, 2009 at 5:52 PM, Bobby Rullo <[email protected]> wrote:

> Zheng,
>
> I'll take a look at that.
>
> It seems the easiest thing would be to subclass SequenceFileInputFormat
> and override getRecordReader() to return a RecordReader which wraps
> SequenceFileRecordReader and overrides RecordReader.next()... right?
>
> Is it safe to assume that K,V are both Text writables, so I can just append
> the bytes of one to the other?
>
> Bobby
>
> On Oct 6, 2009, at 8:10 PM, Zheng Shao wrote:
>
> Hi Bobby,
>
> We just need a special FileInputFormat - the FileInputFormat should be able
> to read SequenceFile, and then prepend the key to the value before it's
> returned to the Hive framework.
>
> Then in Hive language, we can say:
>
>   add jar my.jar;
>   CREATE TABLE mytable (key STRING, value STRING)
>   STORED AS INPUTFORMAT 'com.my.inputformat'
>   OUTPUTFORMAT 'org.apache.hadoop.io.SequenceFileOutputFormat';
>
> See http://issues.apache.org/jira/browse/HIVE-177
>
> You may also want to write your own OutputFileFormat which splits the row
> passed in into key and value and stores them separately. But that is not
> needed unless you want to use Hive to INSERT into this table (LOAD does NOT
> need this).
>
> Zheng
>
> On Tue, Oct 6, 2009 at 6:19 PM, Bobby Rullo <[email protected]> wrote:
>
>> Hi there,
>>
>> It seems that Hive ignores the key when reading Hadoop sequence files. Is
>> there a way to make it not do that?
>>
>> If there's no way to do this with a 'stock' Hive build, could someone
>> point me to the code that reads sequence files in Hive and I can have a go
>> at it? It's sort of a show-stopper for us - we have a bunch of large files
>> where the key field is important.
>>
>> Bobby
>
> --
> Yours,
> Zheng

--
Yours,
Zheng
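The approach Bobby describes (subclass SequenceFileInputFormat, wrap SequenceFileRecordReader, and prepend the key to the value in next()) could be sketched roughly as below. This is a minimal sketch against the old `org.apache.hadoop.mapred` API of that era, assuming K and V are both Text as discussed; the class name `KeyPlusValueInputFormat`, the package, and the tab separator are all illustrative choices, not anything from the thread or the Hive codebase.

```java
package com.my;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileRecordReader;

/**
 * Sketch: wraps SequenceFileRecordReader and prepends the key to the
 * value (tab-separated here) so that Hive sees both as row columns.
 * Assumes both K and V in the underlying SequenceFile are Text.
 */
public class KeyPlusValueInputFormat extends SequenceFileInputFormat<Text, Text> {

  @Override
  public RecordReader<Text, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    final SequenceFileRecordReader<Text, Text> inner =
        new SequenceFileRecordReader<Text, Text>(job, (FileSplit) split);

    return new RecordReader<Text, Text>() {
      private final Text innerValue = inner.createValue();

      public boolean next(Text key, Text value) throws IOException {
        // Read the underlying (key, value) pair, then emit
        // "key<TAB>value" as the value Hive will deserialize.
        if (!inner.next(key, innerValue)) {
          return false;
        }
        value.set(key.toString() + "\t" + innerValue.toString());
        return true;
      }

      public Text createKey() { return inner.createKey(); }
      public Text createValue() { return new Text(); }
      public long getPos() throws IOException { return inner.getPos(); }
      public float getProgress() throws IOException { return inner.getProgress(); }
      public void close() throws IOException { inner.close(); }
    };
  }
}
```

With a table declared using the default row format (tab-delimited columns, as in Zheng's CREATE TABLE example), the prepended tab makes the original key appear as the first STRING column and the original value as the second.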
