Hi, I wonder how I can extract the information that the language-identifier plugin produces. Ideally, I would like it to appear when I dump the data from the segments with the following command:
bin/nutch readseg -dump crawl/segments/... output -nofetch -nogenerate -noparse -noparsedata -parsetext

in a form something like:

URL:: http://...
Language:: en

Is it possible to get Nutch to output it like that? If not, is it possible to get the info in some other way? As it is now, I can't seem to find the info anywhere. I've done the invert links and index steps, but I have no clue where my language info is stored or how I can extract it.
