[ https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-1934:
----------------------------------------
    Attachment: NUTCH-1934.patch

Patch for trunk. Some early observations:
* Existing Nutch tests pass locally.
* The way I have approached this is to make explicit casts to existing fetchQueue objects, as **FetcherThread** is now an independent class. In my test crawling I have come across no ClassCastExceptions (as of yet!), however this is something we should remain vigilant about, e.g.
{code}
((FetchItemQueues) fetchQueues).getTotalSize()
{code}
* We now have a pretty verbose constructor for **FetcherThread** (hey, what's new, it's the Nutch Fetcher.java); however, this one is verbose even by Nutch Fetcher.java standards.
{code}
public FetcherThread(Configuration conf, AtomicInteger activeThreads,
    FetchItemQueues fetchQueues, QueueFeeder feeder, AtomicInteger spinWaiting,
    AtomicLong lastRequestStart, Reporter reporter, AtomicInteger errors,
    String segmentName, boolean parsing,
    OutputCollector<Text, NutchWritable> output, boolean storingContent,
    AtomicInteger pages, AtomicLong bytes) {
{code}
Some initial comments would be very helpful.
Thanks

> Refactor Fetcher in trunk
> -------------------------
>
>                 Key: NUTCH-1934
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1934
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.10
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>         Attachments: NUTCH-1934.patch
>
>
> Put simply
> [Fetcher|https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java]
> is too big.
> This is kinda strange, as the size of this file is unique (I think) among the
> classes within Nutch. The others are reasonably well modularized and
> split into constituent classes which make sense.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
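
On the explicit-cast concern raised in the update above: the following is a minimal sketch, not code from the patch, of one way to check the type before casting so that a mismatch fails with a descriptive message rather than a bare ClassCastException. The `FetchItemQueues` class below is a hypothetical stand-in for the real Nutch class, and `QueueCasts.totalSize` is an illustrative helper that does not exist in Nutch.

```java
// Hypothetical stand-in for Nutch's FetchItemQueues; NOT the real class.
final class FetchItemQueues {
    private final int totalSize;

    FetchItemQueues(int totalSize) {
        this.totalSize = totalSize;
    }

    int getTotalSize() {
        return totalSize;
    }
}

// Illustrative helper (assumption): centralizes the cast in one place.
final class QueueCasts {
    private QueueCasts() {
    }

    // Checks the runtime type first, then performs the same cast as in the patch.
    static int totalSize(Object fetchQueues) {
        if (!(fetchQueues instanceof FetchItemQueues)) {
            String actual = (fetchQueues == null)
                    ? "null"
                    : fetchQueues.getClass().getName();
            throw new IllegalStateException(
                    "Expected FetchItemQueues but got " + actual);
        }
        return ((FetchItemQueues) fetchQueues).getTotalSize();
    }
}
```

Keeping the cast in a single helper would mean there is only one place to audit if the type of the shared queue object ever changes; whether that is worth the extra indirection here is a judgment call.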