[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1360:
----------------------------------------

    Attachment: NUTCH-1360v3.patch

This patch looks like a lot of change, but it is much simpler than the ones I
uploaded previously. It is against 2.x HEAD.

* The configuration property has been changed to fetcher.store.ip.address
* The code is now built into the FetcherReducer class. This removes the need
to obtain the InetAddress more than once, as the previous patches did, and
follows Ferdy's comments to put the _ip_ property and value into the WebPage
metadata field.
* I wrote a simple function, getIp, which takes a URL and a WebPage (the key
and value) and assigns an _ip_ to the WebPage (if the property value is true),
regardless of which queueMode is used when fetching WebPages/Documents.
* This patch should also remove the need to edit plugins for different
protocols, as all protocols will execute the code.
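A minimal, standalone Java sketch of the idea described in the list above, for anyone wanting to reason about the behaviour before applying the patch. It is NOT the code in NUTCH-1360v3.patch: the SimplePage class is a hypothetical stand-in for WebPage (only a metadata map is modelled), and the boolean storeIp parameter is an assumption standing in for reading fetcher.store.ip.address from the Configuration. The sketch resolves the URL's host once and stores the address under the _ip_ key.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the NUTCH-1360 patch code: SimplePage stands in for
// Nutch's WebPage, and storeIp stands in for fetcher.store.ip.address.
public class StoreIpSketch {

    static final String IP_KEY = "_ip_";

    // Stand-in for WebPage: only the metadata map is modelled.
    static class SimplePage {
        final Map<String, ByteBuffer> metadata = new HashMap<>();
    }

    // Resolves the URL's host once and stores the address under "_ip_".
    // Returns the stored address, or null if nothing was stored.
    static String getIp(String url, SimplePage page, boolean storeIp) {
        if (!storeIp) {
            return null; // feature disabled: metadata left untouched
        }
        try {
            String host = URI.create(url).getHost();
            if (host == null || host.isEmpty()) {
                // No host (e.g. mailto:); getByName(null) would return loopback.
                return null;
            }
            // IP literals are only format-checked; host names trigger a DNS lookup.
            String ip = InetAddress.getByName(host).getHostAddress();
            page.metadata.put(IP_KEY, ByteBuffer.wrap(ip.getBytes(StandardCharsets.UTF_8)));
            return ip;
        } catch (IllegalArgumentException | UnknownHostException e) {
            return null; // malformed URL or unresolvable host: skip, do not fail the fetch
        }
    }

    public static void main(String[] args) {
        SimplePage page = new SimplePage();
        getIp("http://127.0.0.1/index.html", page, true);
        ByteBuffer stored = page.metadata.get(IP_KEY);
        // Decode a duplicate so the stored buffer's position is not consumed.
        System.out.println(StandardCharsets.UTF_8.decode(stored.duplicate())); // prints 127.0.0.1
    }
}
```

Doing the lookup in one place on the fetch path, rather than inside each protocol plugin, is what lets every protocol record the address without plugin changes, as the last bullet notes.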

If someone could apply this patch and test it out, it would be excellent.
Thank you
Lewis

> Support the storing of IP address connected to when web crawling
> ----------------------------------------------------------------
>
>                 Key: NUTCH-1360
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1360
>             Project: Nutch
>          Issue Type: New Feature
>          Components: protocol
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.8
>
>         Attachments: NUTCH-1360-nutchgora-v2.patch, 
> NUTCH-1360-nutchgora.patch, NUTCH-1360-trunk.patch, NUTCH-1360v3.patch
>
>
> Simple issue enabling us to capture the specific IP address of the host we
> connect to when fetching a page.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
