[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-1360:
----------------------------------------

    Attachment: NUTCH-1360v3.patch

This patch looks like a lot of change, but it is much simpler than the last ones I uploaded. It is for 2.x HEAD.

* The configuration property has been changed to fetcher.store.ip.address.
* The code is now built into the FetcherReducer class. This removes the need to obtain the InetAddress more than once, as the previous patches did, and follows Ferdy's comments to put the _ip_ property and value into the WebPage metadata field.
* I wrote a simple function, getIp, which takes a URL and a WebPage as key and value and assigns an _ip_ to the WebPage (if the property value is true), regardless of which queueMode we use when fetching WebPages/Documents.
* This patch should also remove the need to edit plugins for different protocols, as all protocols will execute the code.

If someone could apply this patch and test it out, that would be excellent.

Thank you
Lewis

> Suport the storing of IP address connected to when web crawling
> ---------------------------------------------------------------
>
>                 Key: NUTCH-1360
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1360
>             Project: Nutch
>          Issue Type: New Feature
>          Components: protocol
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>             Fix For: 1.8
>
>         Attachments: NUTCH-1360-nutchgora-v2.patch, NUTCH-1360-nutchgora.patch, NUTCH-1360-trunk.patch, NUTCH-1360v3.patch
>
>
> Simple issue enabling us to capture the specific IP address of the host which
> we connect to in order to fetch a page.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
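
For anyone reviewing the patch, the idea behind the getIp helper described in the update can be sketched roughly as follows. This is not the code from NUTCH-1360v3.patch: the class name IpMetadataSketch, the method name storeIp, and the plain Map standing in for the WebPage metadata field are all assumptions made for illustration, and the boolean argument stands in for the value of fetcher.store.ip.address.

```java
import java.net.InetAddress;
import java.net.URL;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the patch's helper; not the actual Nutch code.
class IpMetadataSketch {

  // Metadata key; "_ip_" is the property name mentioned in the update.
  static final String IP_KEY = "_ip_";

  /**
   * Resolves the host of the given URL once and records the address in the
   * metadata map (a stand-in for the WebPage metadata field).
   * "enabled" stands in for the fetcher.store.ip.address setting.
   * Returns the stored address, or null if disabled or unresolvable.
   */
  static String storeIp(URL url, Map<String, String> metadata, boolean enabled) {
    if (!enabled) {
      return null; // feature switched off: leave the metadata untouched
    }
    try {
      String ip = InetAddress.getByName(url.getHost()).getHostAddress();
      metadata.put(IP_KEY, ip);
      return ip;
    } catch (UnknownHostException e) {
      return null; // host could not be resolved: store nothing
    }
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> metadata = new HashMap<>();
    // An IP literal is used so the example needs no DNS lookup.
    storeIp(new URL("http://127.0.0.1/index.html"), metadata, true);
    System.out.println(metadata.get(IP_KEY));
  }
}
```

Because the lookup depends only on the URL's host, a helper of this shape works the same way for every protocol and queue mode. Note that this sketch resolves the host name itself, so the address it records is not guaranteed to be the one a protocol plugin actually connected to.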