Re: [Nutch-general] getting document link graph

Enis Soztutar Tue, 24 Jul 2007 23:22:00 -0700

The linkdb contains all the information about the web graph. After fetching 
the segments, run bin/nutch invertlinks to build the linkdb, which is a 
MapFile. The entries in the MapFile are <key, value> pairs, where the keys 
are Text objects (containing URLs) and the values are Inlinks objects. 
FYI, the linkdb can also easily be processed by map-reduce jobs.
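As a rough illustration of what invertlinks computes, and why the linkdb fits naturally into map-reduce processing, here is a plain-Java, in-memory sketch. It assumes the outgoing links are already available as a map from source URL to target URLs. The class LinkInverter and its methods are hypothetical and are not part of the Nutch or Hadoop API. The map step emits (target, source) pairs and the reduce step groups the sources for each target, giving URL -> inlinking URLs, which is analogous to the Text -> Inlinks entries in the linkdb. The real LinkDb job runs on Hadoop, stores anchor text with each inlink, and normalizes and filters URLs. None of that is modeled here.

```java
import java.util.*;

// Hypothetical in-memory sketch of link inversion; NOT Nutch's LinkDb class.
public class LinkInverter {

    // "Map" step: for every source URL and each of its outlinks,
    // emit a (target, source) pair.
    public static List<Map.Entry<String, String>> map(Map<String, List<String>> outlinks) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : outlinks.entrySet()) {
            for (String target : e.getValue()) {
                pairs.add(new AbstractMap.SimpleEntry<>(target, e.getKey()));
            }
        }
        return pairs;
    }

    // "Shuffle + reduce" step: group all sources by target URL,
    // producing URL -> set of inlinking URLs (sorted for stable output).
    public static SortedMap<String, SortedSet<String>> reduce(List<Map.Entry<String, String>> pairs) {
        SortedMap<String, SortedSet<String>> inlinks = new TreeMap<>();
        for (Map.Entry<String, String> p : pairs) {
            inlinks.computeIfAbsent(p.getKey(), k -> new TreeSet<>()).add(p.getValue());
        }
        return inlinks;
    }

    // Full inversion: outlink graph in, inlink graph out.
    public static SortedMap<String, SortedSet<String>> invert(Map<String, List<String>> outlinks) {
        return reduce(map(outlinks));
    }

    public static void main(String[] args) {
        Map<String, List<String>> outlinks = new HashMap<>();
        outlinks.put("http://a.example/", List.of("http://b.example/", "http://c.example/"));
        outlinks.put("http://b.example/", List.of("http://c.example/"));
        // Prints: {http://b.example/=[http://a.example/], http://c.example/=[http://a.example/, http://b.example/]}
        System.out.println(invert(outlinks));
    }
}
```

Note that a page with no incoming links (http://a.example/ above) simply has no entry in the inverted map, just as a URL without inlinks has nothing to store in an inlink database.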


DS jha wrote:
> Hi -
>
> I want to read the map of incoming and outgoing links of a document
> and use it for some analysis. Does Nutch store the link graph
> once fetch/parse/index is complete?
>
> After browsing through the code, it does seem that during document
> parsing and storing, incoming and outgoing links are passed
> around between objects, but is that information available once the
> process is complete, by reading the segment or index information?
>
> Thanks,
> Jha
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
