I have done something like that. The problem is that the parsers filter the outlinks according to the configuration of the URL filters. I had to delete the *parse* directories and reparse the segments, and before reparsing I reset the URL filters so that no outlink is filtered out.
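A rough sketch of those steps as shell functions, in case it helps. The function names are made up for illustration. It assumes NUTCH_HOME is set, that the *parse* directories sit directly under the segment, and that each outlink line in the readseg dump has the form "outlink: toUrl: <url> anchor: <text>". How you open up the URL filters depends on which filter plugins are active, so that part is only described in a comment.

```shell
#!/bin/sh
# Sketch only: reparse_segment and extract_outlinks are hypothetical helper names.

# Before calling reparse_segment, make the URL filters accept everything.
# Assumption: only the regex URL filter is active; in that case a
# conf/regex-urlfilter.txt containing the single rule "+." lets every URL through.

# reparse_segment SEGMENT_DIR
# Deletes the existing *parse* directories of the segment and parses it again.
reparse_segment() {
  segment=$1
  rm -r "$segment"/*parse*
  "$NUTCH_HOME/bin/nutch" parse "$segment"
}

# extract_outlinks DUMP_FILE
# Prints the unique outlink URLs found in a "readseg -dump" output file.
# awk ignores the leading indentation, so the URL is the third field.
extract_outlinks() {
  awk '$1 == "outlink:" && $2 == "toUrl:" { print $3 }' "$1" | sort -u
}
```

Usage would then be roughly: reparse_segment $segment, then the readseg -dump command quoted below, then extract_outlinks tmps/dump > outlinks.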
kevin chen wrote:
> You can dump segment info to a directory, let's say "tmps",
> $NUTCH_HOME/bin/nutch readseg -dump $segment tmps -nocontent
>
> Then, go to the directory, you should see a file "dump"
> grep outlink: dump | cut -f5 -d" " > outlinks
>
> On Fri, 2009-07-17 at 18:43 +0200, reinhard schwab wrote:
>
>> is any tool available to dump all outlinks (filtered outlinks included)?
>> (i know the tools to dump crawldb, linkdb and segments)
>> or do i have to implement such a tool and if, how?
>> i want to know them to adapt/manage the url filters.
>> parse the contents with urlfilters disabled?
>>
>> reinhard
