[ 
https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1934:
----------------------------------------
    Attachment: NUTCH-1934.patch

Patch for trunk.
Some early observations:
 * Existing Nutch tests pass locally
 * The way I have approached this is to make explicit casts to existing 
fetchQueue objects, as **FetcherThread** is now an independent class. In my test 
crawls I have come across no ClassCastExceptions (as of yet!). However, this 
is something we should remain vigilant about, e.g.
{code}
((FetchItemQueues) fetchQueues).getTotalSize()
{code}
 * We now have a pretty verbose constructor for **FetcherThread** (hey, what's 
new, it's the Nutch Fetcher.java); however, this one is verbose even by Nutch 
Fetcher.java standards.
{code}
  public FetcherThread(Configuration conf, AtomicInteger activeThreads,
      FetchItemQueues fetchQueues, QueueFeeder feeder, AtomicInteger spinWaiting,
      AtomicLong lastRequestStart, Reporter reporter, AtomicInteger errors,
      String segmentName, boolean parsing,
      OutputCollector<Text, NutchWritable> output, boolean storingContent,
      AtomicInteger pages, AtomicLong bytes) {
{code}
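
One possible way to shorten a constructor like this is to group the shared counters and flags into a single holder object. The sketch below is only an illustration and is not part of the attached patch: FetcherSharedState and SketchFetcherThread are hypothetical names, the field names simply mirror the parameters above, and the Hadoop/Nutch arguments (conf, fetchQueues, feeder, reporter, output) are left out so the example stays self-contained.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder for state shared by all fetcher threads.
// Not part of the NUTCH-1934 patch; field names mirror the constructor above.
final class FetcherSharedState {
  final AtomicInteger activeThreads = new AtomicInteger(0);
  final AtomicInteger spinWaiting = new AtomicInteger(0);
  final AtomicInteger errors = new AtomicInteger(0);
  final AtomicInteger pages = new AtomicInteger(0);
  final AtomicLong bytes = new AtomicLong(0L);
  final AtomicLong lastRequestStart = new AtomicLong(System.currentTimeMillis());
  final String segmentName;
  final boolean parsing;
  final boolean storingContent;

  FetcherSharedState(String segmentName, boolean parsing, boolean storingContent) {
    this.segmentName = segmentName;
    this.parsing = parsing;
    this.storingContent = storingContent;
  }
}

// Stand-in for FetcherThread: one state object replaces nine separate arguments.
final class SketchFetcherThread extends Thread {
  private final FetcherSharedState state;

  SketchFetcherThread(FetcherSharedState state) {
    this.state = state;
  }

  @Override
  public void run() {
    state.activeThreads.incrementAndGet();
    try {
      // Placeholder for the real fetch loop: record one page of 1024 bytes.
      state.lastRequestStart.set(System.currentTimeMillis());
      state.pages.incrementAndGet();
      state.bytes.addAndGet(1024L);
    } finally {
      state.activeThreads.decrementAndGet();
    }
  }
}
```

Every thread would be handed the same FetcherSharedState instance, so the atomic counters stay shared exactly as they are when passed one by one.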

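On the explicit casts mentioned above: if the queue reference is ever held as a more general type, the cast could be guarded so that a wrong type fails with a descriptive message rather than a bare ClassCastException. This is a minimal sketch under that assumption. SketchFetchItemQueues is a hypothetical stand-in for Nutch's FetchItemQueues (only getTotalSize() is modelled), and QueueCasts is an invented helper, not something in the patch.

```java
// Hypothetical stand-in for Nutch's FetchItemQueues; only getTotalSize() is sketched.
final class SketchFetchItemQueues {
  private final int totalSize;

  SketchFetchItemQueues(int totalSize) {
    this.totalSize = totalSize;
  }

  int getTotalSize() {
    return totalSize;
  }
}

// Invented helper that checks the runtime type before casting.
final class QueueCasts {
  private QueueCasts() {
  }

  static SketchFetchItemQueues asFetchItemQueues(Object fetchQueues) {
    if (fetchQueues instanceof SketchFetchItemQueues) {
      return (SketchFetchItemQueues) fetchQueues;
    }
    // Report the actual type so a bad reference is easy to trace.
    String actual = (fetchQueues == null) ? "null" : fetchQueues.getClass().getName();
    throw new IllegalStateException(
        "Expected SketchFetchItemQueues but got " + actual);
  }
}
```

A call site would then read QueueCasts.asFetchItemQueues(fetchQueues).getTotalSize() instead of casting inline.
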
Some initial comments would be very helpful. Thanks.

> Refactor Fetcher in trunk
> -------------------------
>
>                 Key: NUTCH-1934
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1934
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.10
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>         Attachments: NUTCH-1934.patch
>
>
> Put simply 
> [Fetcher|https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java]
>  is too big.
> This is kind of strange, as the size of this file is (I think) unique among 
> the classes within Nutch. The others are reasonably well modularized and 
> split into constituent classes which make sense.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
