Hi Bobby,

We just need a special FileInputFormat. It should read the SequenceFile and prepend the key to the value before the record is returned to the Hive framework.
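
A rough sketch of the wrapping idea, in plain Java so it runs without Hadoop on the classpath. PairReader and ListReader are made-up stand-ins for Hadoop's RecordReader, not real Hadoop or Hive classes; a real version would presumably extend FileInputFormat and wrap the record reader that reads the SequenceFile. The sketch also assumes string keys and values, and a table that uses Hive's default field delimiter (Ctrl-A, \001), so that the joined value parses as two columns.

```java
import java.util.AbstractMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Hadoop's RecordReader: yields (key, value) pairs.
interface PairReader {
    // Returns the next (key, value) pair, or null at end of input.
    Map.Entry<String, String> next();
}

// Wraps another reader and folds the key into the value, so a consumer
// that ignores the key (as Hive does here) still sees it.
final class KeyPrependingReader implements PairReader {
    // Hive's default field delimiter, Ctrl-A. Assumes the table uses the default.
    static final char FIELD_DELIMITER = '\u0001';

    private final PairReader delegate;

    KeyPrependingReader(PairReader delegate) {
        this.delegate = delegate;
    }

    @Override
    public Map.Entry<String, String> next() {
        Map.Entry<String, String> kv = delegate.next();
        if (kv == null) {
            return null; // end of input: pass it through unchanged
        }
        // New value = key + delimiter + original value; the key is left as-is.
        String row = kv.getKey() + FIELD_DELIMITER + kv.getValue();
        return new AbstractMap.SimpleImmutableEntry<>(kv.getKey(), row);
    }
}

// Hypothetical in-memory reader, used only to exercise the wrapper.
final class ListReader implements PairReader {
    private final Iterator<Map.Entry<String, String>> it;

    ListReader(List<Map.Entry<String, String>> pairs) {
        this.it = pairs.iterator();
    }

    @Override
    public Map.Entry<String, String> next() {
        return it.hasNext() ? it.next() : null;
    }
}
```

With a wrapper like this in place, each value handed to Hive would carry both fields, which is what lets a two-column table pick up the key.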

Then in Hive we can say:

add jar my.jar;
CREATE TABLE mytable (key STRING, value STRING)
STORED AS
  INPUTFORMAT 'com.my.inputformat'
  OUTPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileOutputFormat';

See http://issues.apache.org/jira/browse/HIVE-177

You may also want to write your own OutputFormat that splits each row passed in into a key and a value and stores them separately. That is only needed if you want to use Hive to INSERT into this table (LOAD does NOT need it).

Zheng

On Tue, Oct 6, 2009 at 6:19 PM, Bobby Rullo <[email protected]> wrote:
> Hi there,
>
> It seems that Hive ignores the key when reading hadoop sequence files. Is
> there a way to make it not do that?
>
> If there's no way to do this with a 'stock' Hive build, could someone point
> me to the code that reads sequence files in Hive and I can have a go at it?
> It's sort of a show-stopper for us - we have a bunch of large files where
> the key field is important.
>
> Bobby
>

--
Yours,
Zheng
