Trying to Add an new NutchDoc from plugin
Hi there,

I'm new to the forum and to Nutch as well. I wrote a Nutch plugin that implements IndexingFilter. Now I want to add a new document to the index from within the plugin (to split the current doc). I tried testing it with something like this:

    NutchIndexWriter[] Writers = NutchIndexWriterFactory.getNutchIndexWriters(getConf());
    Writers[0].write(doc);

Here doc is the document passed into the method, not a new one I created (this is just for testing). I get this error:

    it doesn't make sense to have a field that is neither indexed nor stored

Any suggestions?

--
View this message in context: http://old.nabble.com/Trying-to-Add-an-new-NutchDoc-from-plugin-tp27598076p27598076.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.
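As far as I know, the message quoted above is the text of the IllegalArgumentException that Lucene's Field constructor throws when a field is created with both Field.Store.NO and Field.Index.NO. The sketch below only mimics that rule so the failure can be reproduced without Nutch or Lucene on the classpath. SketchField, FieldRuleDemo, and the boolean flags are hypothetical names, not Lucene's API. One possible cause in the plugin, which is an assumption and not confirmed anywhere in this thread, is that the writer returned by the factory has no store/index options configured for the document's fields.

```java
// Stand-in for the rule behind the error message; NOT Lucene's actual source.
// SketchField and its boolean flags are hypothetical names used for illustration.
final class SketchField {
    final String name;
    final String value;
    final boolean stored;
    final boolean indexed;

    SketchField(String name, String value, boolean stored, boolean indexed) {
        // A field that is neither stored nor indexed could never be searched
        // or retrieved, so it is rejected up front.
        if (!stored && !indexed) {
            throw new IllegalArgumentException(
                "it doesn't make sense to have a field that is neither indexed nor stored");
        }
        this.name = name;
        this.value = value;
        this.stored = stored;
        this.indexed = indexed;
    }
}

class FieldRuleDemo {
    // Returns the rejection message for the given flags, or null if the field is accepted.
    static String tryCreate(boolean stored, boolean indexed) {
        try {
            new SketchField("content", "some text", stored, indexed);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("stored only:  " + tryCreate(true, false));
        System.out.println("indexed only: " + tryCreate(false, true));
        System.out.println("neither:      " + tryCreate(false, false));
    }
}
```

If the real cause is similar, the fix would lie in how the fields' store and index options reach the writer, not in the document contents.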
Re: Trying to Add an new NutchDoc from plugin
Maybe I can try. Debugging an indexing plugin is kind of tricky. Can you attach the required files and folders and tell me exactly what procedure to follow? Also mention any settings that need to be modified.

On Tue, Feb 16, 2010 at 12:10 AM, UDd dekelu...@gmail.com wrote:

> Hi there,
>
> I'm new to the forum and to Nutch as well. I wrote a Nutch plugin that implements IndexingFilter. Now I want to add a new document to the index from within the plugin (to split the current doc). I tried testing it with something like this:
>
>     NutchIndexWriter[] Writers = NutchIndexWriterFactory.getNutchIndexWriters(getConf());
>     Writers[0].write(doc);
>
> Here doc is the document passed into the method, not a new one I created (this is just for testing). I get this error:
>
>     it doesn't make sense to have a field that is neither indexed nor stored
>
> Any suggestions?
Re: Trying to Add an new NutchDoc from plugin
Thanks for the quick response.

I wrote a very simple plugin that tries to write the same doc a second time and, if there is an error, puts the error message into a custom field of the original doc:

    public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
            CrawlDatum datum, Inlinks inlinks) throws IndexingException {
        // filter out if url contains archive, label or feeds
        // Text.toString() decodes only the valid bytes; getBytes() may return a longer backing array
        LOGGER.debug("Found Url: " + url.toString());
        NutchIndexWriter[] Writers = NutchIndexWriterFactory.getNutchIndexWriters(getConf());
        //doc.add("js", String.valueOf(Writers.length));
        try {
            Writers[0].write(doc);
        } catch (Exception e) {
            // TODO Auto-generated catch block
            LOGGER.debug("Error adding Doc " + e.getMessage());
            doc.add("js", e.getMessage());
        }
        doc.add("js", "AfterTest");
        return doc;
    }

After the Nutch run I just look at the index with lukeall-1.0.0. I have attached the compiled plugin jar in case you can try to debug it. Alternatively, it would be great if you could tell me how to debug it (I have Nutch working from Eclipse).

http://old.nabble.com/file/p27598879/myplugins.rar myplugins.rar

--
View this message in context: http://old.nabble.com/Trying-to-Add-an-new-NutchDoc-from-plugin-tp27598076p27598879.html
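Since the filter above records only e.getMessage(), the place where the exception was raised is lost. A small helper along these lines might make the debug field or log more useful by capturing the full stack trace as text. StackTraceSketch and stackTraceOf are hypothetical names, and the helper uses only the Java standard library, nothing from Nutch.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper, not part of Nutch: renders a Throwable's stack trace as a String.
class StackTraceSketch {
    static String stackTraceOf(Throwable t) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        // printStackTrace writes the exception class, message, and frames to the writer
        t.printStackTrace(out);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        try {
            throw new IllegalArgumentException("example failure");
        } catch (IllegalArgumentException e) {
            // In the plugin this string could go to LOGGER.debug or into the "js" field
            System.out.println(stackTraceOf(e));
        }
    }
}
```

In the catch block of the filter, the result of stackTraceOf(e) could replace e.getMessage(), assuming a long multi-line value is acceptable in that field.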