I have done something like that.
The problem is that the parsers filter the outlinks according to the
configuration of the URL filters.
I had to delete the *parse* directories, reset the URL filters so
that no outlink is filtered out, and then reparse the segments.
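
Roughly, the steps look like the sketch below. It is only a sketch under
some assumptions that are not from my setup notes: NUTCH_HOME points at a
Nutch 1.x install, the only active URL filter is the regex one configured in
conf/regex-urlfilter.txt, and the parse output lives in the crawl_parse,
parse_data and parse_text directories of the segment. The function name
reparse_unfiltered is made up for illustration.

```shell
#!/bin/sh
# Sketch: re-parse one segment with the URL filter opened up, so that
# no outlink is filtered out. Assumes Nutch 1.x and the regex URL filter.
reparse_unfiltered() {
    segment=$1
    filter="$NUTCH_HOME/conf/regex-urlfilter.txt"

    # Back up the current filter rules, then accept every URL ("+.").
    cp "$filter" "$filter.bak" || return 1
    printf '%s\n' '+.' > "$filter"

    # Delete the old parse output of the segment (assumed directory names).
    rm -rf "$segment/crawl_parse" "$segment/parse_data" "$segment/parse_text"

    # Re-parse the segment, then restore the original filter rules.
    "$NUTCH_HOME/bin/nutch" parse "$segment"
    status=$?
    mv "$filter.bak" "$filter"
    return $status
}
```

Call it as reparse_unfiltered path/to/segment and then dump the segment
with readseg as described below. If other URL filter plugins are enabled
(prefix, suffix, domain, ...), they would have to be relaxed as well.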

kevin chen wrote:
> You can dump segment info to a directory, let's say "tmps",
> $NUTCH_HOME/bin/nutch readseg -dump $segment tmps -nocontent
>
> Then, go to the directory, you should see a file "dump"
> grep outlink: dump | cut -f5 -d" " > outlinks
>
> On Fri, 2009-07-17 at 18:43 +0200, reinhard schwab wrote:
>   
>> Is any tool available to dump all outlinks (filtered outlinks included)?
>> (I know the tools to dump crawldb, linkdb and segments.)
>> Or do I have to implement such a tool, and if so, how?
>> I want to know the outlinks in order to adapt/manage the URL filters.
>> Should I parse the contents with the URL filters disabled?
>>
>> reinhard
>>     
>
>
>   
