Dennis,
  I am in the same dilemma as you are.
  Here are my thoughts.
   
  1. I am planning to write a plugin to do it, so that the plugin can be 
modified based on the site map and levels.
  2. The Fetcher itself can be modified, but then merging in the latest 
fixes and enhancements contributed by the community will be very hard.
  3. Another way is to write a prefetcher that fetches all the URLs from a 
site and writes them to a file. The Nutch crawler can then be triggered to 
crawl the prefetched URLs. Any unnecessary URLs found within the prefetched 
pages will have to be ignored. I am still looking for a way to do this.
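
  The filtering step in option 3 could look roughly like the sketch below. 
It is plain Java and does not use any Nutch API. The class SeedListBuilder, 
the method buildSeedList, and the exclusion patterns are hypothetical names 
made up for illustration. It assumes the prefetcher has already collected the 
URLs and only shows how same-site URLs might be kept, de-duplicated, and 
unwanted ones dropped before they are written to the file.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch.
class SeedListBuilder {

    // Keeps URLs on siteHost, drops any URL matching an exclude pattern,
    // and removes duplicates while preserving discovery order.
    static List<String> buildSeedList(String siteHost,
                                      List<String> discovered,
                                      List<Pattern> excludes) {
        Set<String> kept = new LinkedHashSet<>();
        for (String url : discovered) {
            String host;
            try {
                host = URI.create(url).getHost();
            } catch (IllegalArgumentException e) {
                continue; // skip malformed URLs
            }
            // equalsIgnoreCase(null) is false, so URLs without a host are skipped.
            if (!siteHost.equalsIgnoreCase(host)) {
                continue;
            }
            boolean excluded = excludes.stream().anyMatch(p -> p.matcher(url).find());
            if (!excluded) {
                kept.add(url);
            }
        }
        return new ArrayList<>(kept);
    }

    public static void main(String[] args) {
        List<String> seeds = buildSeedList(
            "example.com",
            List.of("http://example.com/", "http://example.com/login",
                    "http://other.org/", "http://example.com/"),
            List.of(Pattern.compile("/login")));
        seeds.forEach(System.out::println); // one URL per line, as in a seed file
    }
}
```

  The resulting list could then be written one URL per line into the file 
that the crawl is started from.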
   
  Please share your thoughts.
  Thanks
   
  

Dennis Kubes <[EMAIL PROTECTED]> wrote:
  I am trying to modify Nutch to add a level to the website parse data. 
What I mean by this is: suppose you start parsing a website at its 
homepage; that would be level one. Any links in the same site from the 
homepage would be level two, links from those pages would be level three, 
and so on. I am only counting links in the same site.
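
The level definition above amounts to a breadth-first walk from the homepage. 
The following is a minimal sketch of that idea in plain Java, not Nutch code: 
the class LinkLevels, the method computeLevels, and the in-memory outlink map 
are all hypothetical and only stand in for whatever link data the crawl 
actually provides. It assumes a page reachable by several paths takes the 
shallowest level.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch.
class LinkLevels {

    // Breadth-first walk over an in-memory link graph.
    // The homepage is level 1; only links on the homepage's host are counted.
    static Map<String, Integer> computeLevels(String home,
                                              Map<String, List<String>> outlinks) {
        String site = URI.create(home).getHost();
        Map<String, Integer> levels = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        levels.put(home, 1);
        queue.add(home);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            int next = levels.get(page) + 1;
            for (String link : outlinks.getOrDefault(page, List.of())) {
                String host = URI.create(link).getHost();
                // Skip off-site links and pages that already have a (shallower) level.
                if (site != null && site.equals(host) && !levels.containsKey(link)) {
                    levels.put(link, next);
                    queue.add(link);
                }
            }
        }
        return levels;
    }

    public static void main(String[] args) {
        Map<String, List<String>> outlinks = Map.of(
            "http://example.com/", List.of("http://example.com/a", "http://other.org/x"),
            "http://example.com/a", List.of("http://example.com/b", "http://example.com/"));
        // prints {http://example.com/=1, http://example.com/a=2, http://example.com/b=3}
        System.out.println(computeLevels("http://example.com/", outlinks));
    }
}
```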

How would I go about modifying Nutch to handle this? I was thinking 
that I would have to modify the Fetcher to do this, adding the level to the 
parse metadata. What I am not getting is how I would get the link 
level initially. I was thinking I would have to modify something in the 
generator but didn't know what.

Dennis



  Sudhi Seshachala
  http://sudhilogs.blogspot.com/
   


                