Yes. It is safe as long as your own data files have both K and V as Text writables.
Zheng

On Thu, Oct 8, 2009 at 5:52 PM, Bobby Rullo <[email protected]> wrote:

> Zheng,
>
> I'll take a look at that.
>
> It seems the easiest thing would be to subclass SequenceFileInputFormat
> and override getRecordReader() to return a RecordReader which wraps
> SequenceFileRecordReader and overrides RecordReader.next()... right?
>
> Is it safe to assume that K,V are both Text writables, so I can just append
> the bytes of one to the other?
>
> Bobby
>
> On Oct 6, 2009, at 8:10 PM, Zheng Shao wrote:
>
> Hi Bobby,
>
> We just need a special FileInputFormat - the FileInputFormat should be able
> to read SequenceFile, and then prepend the key to the value before it's
> returned to the Hive framework.
>
> Then in Hive language, we can say:
>
>   add jar my.jar;
>   CREATE TABLE mytable (key STRING, value STRING)
>   STORED AS INPUTFORMAT 'com.my.inputformat'
>   OUTPUTFORMAT 'org.apache.hadoop.io.SequenceFileOutputFormat';
>
> See http://issues.apache.org/jira/browse/HIVE-177
>
> You may also want to write your own OutputFileFormat which splits the row
> passed in into key and value and stores them separately. But that is not
> needed unless you want to use Hive to INSERT into this table (LOAD does NOT
> need this).
>
> Zheng
>
> On Tue, Oct 6, 2009 at 6:19 PM, Bobby Rullo <[email protected]> wrote:
>
>> Hi there,
>>
>> It seems that Hive ignores the key when reading Hadoop sequence files. Is
>> there a way to make it not do that?
>>
>> If there's no way to do this with a 'stock' Hive build, could someone
>> point me to the code that reads sequence files in Hive and I can have a go
>> at it? It's sort of a show-stopper for us - we have a bunch of large files
>> where the key field is important.
>>
>> Bobby
>
> --
> Yours,
> Zheng

--
Yours,
Zheng
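The approach Bobby describes (subclass SequenceFileInputFormat, wrap SequenceFileRecordReader, and prepend the key to the value in next()) could be sketched roughly as below. This is a minimal sketch against the old `org.apache.hadoop.mapred` API of that era, assuming K and V are both Text as discussed; the class name `KeyPlusValueInputFormat`, the package, and the tab separator are all illustrative choices, not anything from the thread or the Hive codebase.

```java
package com.my;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileRecordReader;

/**
 * Sketch: wraps SequenceFileRecordReader and prepends the key to the
 * value (tab-separated here) so that Hive sees both as row columns.
 * Assumes both K and V in the underlying SequenceFile are Text.
 */
public class KeyPlusValueInputFormat extends SequenceFileInputFormat<Text, Text> {

  @Override
  public RecordReader<Text, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    final SequenceFileRecordReader<Text, Text> inner =
        new SequenceFileRecordReader<Text, Text>(job, (FileSplit) split);

    return new RecordReader<Text, Text>() {
      private final Text innerValue = inner.createValue();

      public boolean next(Text key, Text value) throws IOException {
        // Read the underlying (key, value) pair, then emit
        // "key<TAB>value" as the value Hive will deserialize.
        if (!inner.next(key, innerValue)) {
          return false;
        }
        value.set(key.toString() + "\t" + innerValue.toString());
        return true;
      }

      public Text createKey() { return inner.createKey(); }
      public Text createValue() { return new Text(); }
      public long getPos() throws IOException { return inner.getPos(); }
      public float getProgress() throws IOException { return inner.getProgress(); }
      public void close() throws IOException { inner.close(); }
    };
  }
}
```

With a table declared using the default row format (tab-delimited columns, as in Zheng's CREATE TABLE example), the prepended tab makes the original key appear as the first STRING column and the original value as the second.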
