Ashish wrote:
> Hey everyone....
>
> What would be a good way to read inlinks (anchor text associated with
> inlinks, actually), for each crawled page?
>
> Is there some way to make this information available at fetch-time? Any
> pointers to sample code would be a huge help! I'm using Nutch 0.8.1.
> Thanks....

Pages contain only outlinks, so until you build the inverted relationship
(using invertlinks) it won't be available. That's what linkdb is for.

Why do you need this during fetching? You could modify the fetcher to
access linkdb during fetching, or you could modify Generator to include
information from linkdb when it generates new segments, whichever way is
more suitable to your requirements.

-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
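
To illustrate why inlinks are not available until the links have been inverted, here is a minimal, self-contained sketch of the inversion idea in plain Java. It does not use any Nutch classes: `InlinkSketch`, `Outlink`, `Inlink`, and `invert` are hypothetical names invented for this example, and the real linkdb is built by Nutch's own invertlinks step over crawled segments, not by code like this. The point is only that a fetched page knows its own outlinks, so the anchor text pointing at a page can be collected only after every linking page has been processed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types come from Nutch.
public class InlinkSketch {

    /** A link found on a page: the URL it points to and its anchor text. */
    static final class Outlink {
        final String toUrl;
        final String anchor;
        Outlink(String toUrl, String anchor) {
            this.toUrl = toUrl;
            this.anchor = anchor;
        }
    }

    /** The inverted view: the page the link came from and its anchor text. */
    static final class Inlink {
        final String fromUrl;
        final String anchor;
        Inlink(String fromUrl, String anchor) {
            this.fromUrl = fromUrl;
            this.anchor = anchor;
        }
    }

    /**
     * Inverts a "page -> outlinks" map into a "target -> inlinks" map.
     * Every source page has to be visited before the inlink list of any
     * single target can be considered complete.
     */
    static Map<String, List<Inlink>> invert(Map<String, List<Outlink>> outlinksByPage) {
        Map<String, List<Inlink>> inlinksByTarget = new LinkedHashMap<>();
        for (Map.Entry<String, List<Outlink>> page : outlinksByPage.entrySet()) {
            for (Outlink out : page.getValue()) {
                inlinksByTarget
                    .computeIfAbsent(out.toUrl, k -> new ArrayList<>())
                    .add(new Inlink(page.getKey(), out.anchor));
            }
        }
        return inlinksByTarget;
    }

    public static void main(String[] args) {
        // Made-up example pages and anchors.
        Map<String, List<Outlink>> pages = new LinkedHashMap<>();
        pages.put("http://a.example/", Arrays.asList(
            new Outlink("http://c.example/", "great crawler docs")));
        pages.put("http://b.example/", Arrays.asList(
            new Outlink("http://c.example/", "crawler manual"),
            new Outlink("http://a.example/", "home")));

        // Print each target URL with the anchors of the links pointing at it.
        for (Map.Entry<String, List<Inlink>> e : invert(pages).entrySet()) {
            for (Inlink in : e.getValue()) {
                System.out.println(e.getKey() + " <- " + in.fromUrl + " [" + in.anchor + "]");
            }
        }
    }
}
```

In this sketch the anchors for `http://c.example/` come from two different source pages, which is why a lookup at fetch time needs a previously built inverted structure (the role linkdb plays in the reply above) and cannot rely on the page being fetched.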
