On Fri, Jul 24, 2009 at 17:21, Saurabh Suman<[email protected]> wrote:
>
> Hi
> I am using Nutch-1.0. I want to add a field to ParseData's parseMeta.
> In org.apache.nutch.parse.html.HtmlParser, two fields are already set in
> the original code:
>
>     metadata.set(Metadata.ORIGINAL_CHAR_ENCODING, encoding);
>     metadata.set(Metadata.CHAR_ENCODING_FOR_CONVERSION, encoding);
>
> I added a third field:
>
>     metadata.set(Metadata.AGE, "23");
>
> In org.apache.nutch.indexer.IndexerMapReduce, the method
>
>     public void reduce(Text key, Iterator<NutchWritable> values,
>                        OutputCollector<Text, NutchDocument> output,
>                        Reporter reporter) throws IOException
>
> adds two fields to the NutchDocument:
>
>     NutchDocument doc = new NutchDocument();
>     final Metadata metadata = parseData.getContentMeta();
>
>     // add segment, used to map from merged index back to segment files
>     doc.add("segment", metadata.get(Nutch.SEGMENT_NAME_KEY));
>
>     // add digest, used by dedup
>     doc.add("digest", metadata.get(Nutch.SIGNATURE_KEY));
>
> I added a third field, the one I set in HtmlParser, like this:
>
>     doc.add("age", parseData.getParseMeta().get("age"));
>
> After doing so, I get the following exception at the indexing stage:
>
> LinkDb: adding segment:
> file:/home/ithurs/nutch-1.0/crawl/segments/20090724193527
> LinkDb: done
> Indexer: starting
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>     at org.apache.nutch.indexer.Indexer.index(Indexer.java:72)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:152)
>
> Please tell me:
> (i) How can I remove this exception?
> (ii) How can I add a new field to ParseData's parseMeta?

You are probably adding your field to parseMeta, so trying to get it from
contentMeta fails. Just call parseData.getParseMeta() in the indexer and it
may work.

> --
> View this message in context:
> http://www.nabble.com/IO-exception-while-adding-field-in-Parsedata-parsemeta.-tp24645429p24645429.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>

--
Doğacan Güney
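
P.S. A rough sketch of the parseMeta/contentMeta distinction, in case it helps. It deliberately uses plain Java maps as stand-ins and not Nutch's real classes, so FakeParseData, AGE_KEY, addIfPresent and buildDoc are all hypothetical names made up for this illustration. It only shows two things: a value stored in one metadata map is not visible through the other, and the lookup key in the indexer has to match the key used in the parser exactly. Skipping missing values before adding them to the document is an extra precaution I am assuming is useful here; whether a missing value is what actually makes your job fail is not something the stack trace above can confirm.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: plain maps stand in for Nutch's Metadata and NutchDocument.
class ParseMetaSketch {

    // Hypothetical key. The parser and the indexer must use the same string.
    static final String AGE_KEY = "age";

    // Stand-in for ParseData: it carries two separate metadata maps.
    static final class FakeParseData {
        final Map<String, String> contentMeta = new HashMap<>();
        final Map<String, String> parseMeta = new HashMap<>();
    }

    // Adds the field only when a value was actually found.
    static void addIfPresent(Map<String, String> doc, String field, String value) {
        if (value != null) {
            doc.put(field, value);
        }
    }

    // Stand-in for the indexing step: each field is read from the map it was written to.
    static Map<String, String> buildDoc(FakeParseData parseData) {
        Map<String, String> doc = new HashMap<>();
        addIfPresent(doc, "segment", parseData.contentMeta.get("segment"));
        addIfPresent(doc, "age", parseData.parseMeta.get(AGE_KEY));
        return doc;
    }

    public static void main(String[] args) {
        FakeParseData parseData = new FakeParseData();
        parseData.parseMeta.put(AGE_KEY, "23"); // what the parser side would do

        // The value lives in parseMeta only, so the contentMeta lookup finds nothing.
        System.out.println("from contentMeta: " + parseData.contentMeta.get(AGE_KEY));
        System.out.println("from parseMeta:   " + parseData.parseMeta.get(AGE_KEY));
        System.out.println("document fields:  " + buildDoc(parseData).keySet());
    }
}
```

In the real indexer the equivalent would be to read the value through parseData.getParseMeta(), using the same key string that the parser passed to metadata.set, and to check it for null before calling doc.add.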
