Hello all,

I have installed the XML Parser plugin in Nutch 0.9 and it is working correctly.
Running the plugin from the command line, it displays both parsed text and parsed
data. However, the parser did not manage to extract any outlinks.
The outlinks are extracted from the parsed text using the following code,
which basically pulls the links out of the text extracted from the XML.

        Outlink[] outlinks = OutlinkExtractor.getOutlinks(text, getConf());

From my test, the parsed text contains a few links, all of them separated by
spaces. I have verified that the String variable text contains the extracted
contents. Since these links point to a different domain, I have made sure
db.ignore.external.links in the Nutch config is set to false.
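
One check that can be run outside Nutch is whether the links in the parsed
text carry an explicit scheme such as http://. The sketch below is only a
rough standalone test, on the assumption that the extractor recognises links
by pattern matching and may skip bare hosts such as www.example.org. The
class name LinkCheck, the method findLinks, the sample text and the pattern
itself are all hypothetical; this is not the pattern Nutch uses internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkCheck {
    // Simplified, hypothetical pattern: a scheme, "://", then any run of
    // characters up to whitespace, angle brackets or a double quote.
    // Assumption for illustration only, not Nutch's own pattern.
    private static final Pattern URL =
        Pattern.compile("[A-Za-z][A-Za-z0-9+.-]*://[^\\s<>\"]+");

    // Returns every substring of text that looks like a scheme-prefixed URL.
    static List<String> findLinks(String text) {
        List<String> links = new ArrayList<String>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample input: one link with a scheme, one without.
        String text = "see http://example.com/a and www.example.org now";
        for (String link : findLinks(text)) {
            System.out.println(link);
        }
    }
}
```

If the links in the real parsed text only show up here once a scheme is
prepended, that would suggest looking at how the XML stores them rather than
at the crawl configuration.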

I cannot see anything else that would prevent this code from extracting the
links. Does anyone have any idea, or has anyone managed to resolve this issue?

Ta. 
