The linkdb contains all the information about the web graph. After fetching the segments, run bin/nutch invertlinks to build the linkdb, which is a MapFile. Its entries are <key, value> pairs, where the keys are Text objects (containing URLs) and the values are Inlinks objects. FYI, the linkdb can also easily be processed by map-reduce jobs.
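To make the idea of "inverting" links concrete, here is a minimal in-memory sketch in plain Java. It is not Nutch code and does not use the Nutch or Hadoop APIs: the class name LinkInverter, the method invert, and the example URLs are all made up for illustration, and plain String and List stand in for the Text and Inlinks objects. It only shows the kind of mapping the linkdb holds, going from "page -> outgoing links" to "page -> incoming links".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Nutch.
public class LinkInverter {

    /** Turn a "page -> outgoing links" map into a "page -> incoming links" map. */
    public static Map<String, List<String>> invert(Map<String, List<String>> outlinks) {
        Map<String, List<String>> inlinks = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : outlinks.entrySet()) {
            String source = entry.getKey();
            for (String target : entry.getValue()) {
                // Record that 'source' links to 'target'.
                inlinks.computeIfAbsent(target, k -> new ArrayList<>()).add(source);
            }
        }
        return inlinks;
    }

    public static void main(String[] args) {
        // Made-up example graph: a links to b and c, b links to c.
        Map<String, List<String>> outlinks = new TreeMap<>();
        outlinks.put("http://a.example/",
                Arrays.asList("http://b.example/", "http://c.example/"));
        outlinks.put("http://b.example/",
                Arrays.asList("http://c.example/"));

        // Print each URL followed by the pages that link to it.
        invert(outlinks).forEach((url, from) -> System.out.println(url + " <- " + from));
    }
}
```

The real linkdb is keyed the same way as the result of invert here: one entry per target URL, with the value listing the pages that point to it.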

DS jha wrote:
> Hi -
>
> I want to read the map of incoming and outgoing links of a document
> and use that for some analysis purpose. Does nutch store link graph
> once fetch/parse/index is complete?
>
> After browsing thru the code, it does seem that during document
> parsing and storing, incoming and outgoing links are getting passed
> around between objects but is that information available once the
> process is complete - by reading segment or index information?
>
> Thanks,
> Jha
