Consolidate code for Fetcher and Fetcher2
-----------------------------------------

                 Key: NUTCH-669
                 URL: https://issues.apache.org/jira/browse/NUTCH-669
             Project: Nutch
          Issue Type: Improvement
          Components: fetcher
    Affects Versions: 0.9.0
            Reporter: Todd Lipcon
            Priority: Minor


I'd like to consolidate much of the common code between Fetcher.java and
Fetcher2.java.

As far as I can tell, these are the main differences:
  - Fetcher relies on the Protocol to obey robots.txt and crawl-delay settings,
whereas Fetcher2 implements them itself
  - Fetcher2 uses a different queueing model (queue per crawl host) to 
accomplish the per-host limiting without making the Protocol do it.
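To make the second difference concrete, here is a minimal Java sketch of a queue-per-host model in which the fetcher, not the protocol, enforces the crawl delay. The class `PerHostQueues`, its methods, and the single fixed delay shared by every host are hypothetical illustrations, not Fetcher2's actual classes. robots.txt checks are omitted, and for simplicity the delay is counted from when a URL is handed out rather than from when its fetch finishes.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not Nutch's Fetcher2 classes.
public class PerHostQueues {

    // One FIFO of URLs per host, plus the earliest time that host may be fetched again.
    private static final class HostQueue {
        final Deque<String> urls = new ArrayDeque<>();
        long nextFetchTime = 0L; // milliseconds; 0 means "may fetch now"
    }

    // LinkedHashMap keeps hosts in insertion order so scheduling is deterministic.
    private final Map<String, HostQueue> queues = new LinkedHashMap<>();
    private final long crawlDelayMs; // assumption: one fixed delay for every host

    public PerHostQueues(long crawlDelayMs) {
        this.crawlDelayMs = crawlDelayMs;
    }

    // Route the URL to the queue for its host, creating the queue on first use.
    public void add(String url) {
        String host = URI.create(url).getHost();
        queues.computeIfAbsent(host, h -> new HostQueue()).urls.addLast(url);
    }

    // Return the next URL whose host is outside its delay window at time `now`,
    // or null if every non-empty host is still waiting. The politeness check
    // happens here, in the fetcher, so the protocol only has to fetch.
    public String next(long now) {
        for (HostQueue q : queues.values()) {
            if (!q.urls.isEmpty() && q.nextFetchTime <= now) {
                // Simplification: the delay starts when the URL is handed out,
                // not when its fetch completes.
                q.nextFetchTime = now + crawlDelayMs;
                return q.urls.pollFirst();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        PerHostQueues queues = new PerHostQueues(1000);
        queues.add("http://a.example/1");
        queues.add("http://a.example/2");
        queues.add("http://b.example/1");
        System.out.println(queues.next(0));    // a.example is free
        System.out.println(queues.next(0));    // a.example is waiting, so b.example is used
        System.out.println(queues.next(0));    // a.example is waiting and b.example is empty
        System.out.println(queues.next(1000)); // a.example's delay has elapsed
    }
}
```

Because each host has its own queue and timestamp, a host inside its delay window simply gets skipped while work from other hosts keeps flowing; that is the property that lets the fetcher apply per-host limits without relying on the Protocol.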

I've begun work on this but want to check with people on the following:

- Is there any reason to keep Fetcher at all, given that Fetcher2 seems to offer
a superset of its functionality?

- Is it on the roadmap to remove the robots/delay logic from the Http protocol
and make Fetcher2's division of duties the standard?

- Any other improvements wanted for Fetcher while I am in and around the code?

-- 
This message is automatically generated by JIRA.