Re: hi all:
吴志敏 wrote:
> I want to write the stored segments out to an XML file, but when I read SegmentReader.java I found that it is not a simple thing: it is a Hadoop job that dumps a text file. I just want to dump the parts of the segments' content that I am interested in to XML. Can someone tell me how to do this? Any reply will be appreciated!

Segment data is basically just a bunch of files containing key-value pairs, so there is always the possibility of reading the data directly with the help of SequenceFile.Reader:

http://lucene.apache.org/hadoop/docs/api/org/apache/hadoop/io/SequenceFile.Reader.html

To see what kind of objects to expect, you can examine the beginning of the file, where some metadata is stored, such as the class used for the key and the class used for the value (that metadata is also available from methods of the SequenceFile.Reader class).

For example, to read the Content data from a segment, one could use something like:

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    Text url = new Text();            // key
    Content content = new Content();  // value
    while (reader.next(url, content)) {
        // now just use url and content the way you like
    }

-- Sami Siren
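For the XML half of the question, here is a minimal sketch using only the JDK's StAX writer. It assumes you have already pulled each URL and some text out of the segment (for instance inside the reader.next(url, content) loop); how a Content value is turned into text is left out and depends on your data. The class name SegmentXmlDump, the method toXml, and the element names "segment" and "page" are made up for illustration and are not part of Nutch or Hadoop.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class SegmentXmlDump {

    // Writes each (url, text) pair as <page url="...">text</page>
    // inside a single <segment> root element.
    // "segment" and "page" are hypothetical names chosen for this sketch.
    public static String toXml(Map<String, String> pages) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument();
        xml.writeStartElement("segment");
        for (Map.Entry<String, String> page : pages.entrySet()) {
            xml.writeStartElement("page");
            xml.writeAttribute("url", page.getKey());
            // writeCharacters escapes &, < and > in the text
            xml.writeCharacters(page.getValue());
            xml.writeEndElement();
        }
        xml.writeEndElement();
        xml.writeEndDocument();
        xml.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // Stand-in for the pairs you would collect while iterating the segment.
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("http://example.com/", "Hello & welcome");
        System.out.println(toXml(pages));
    }
}
```

In a real dump you would probably write straight to a file stream inside the read loop rather than collecting everything in a map first; the map here just keeps the sketch self-contained.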
Re: hi all:
Thanks very much, I'll try it.

On 12/9/06, Sami Siren wrote:
> Segment data is basically just a bunch of files containing key-value pairs, so there's always the possibility of reading the data directly with the help of SequenceFile.Reader [...]
hi all:
I want to write the stored segments out to an XML file, but when I read SegmentReader.java I found that it is not a simple thing: it is a Hadoop job that dumps a text file. I just want to dump the parts of the segments' content that I am interested in to XML. Can someone tell me how to do this? Any reply will be appreciated!