[
https://issues.apache.org/jira/browse/NUTCH-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18053183#comment-18053183
]
ASF GitHub Bot commented on NUTCH-3064:
---------------------------------------
lewismc commented on PR #825:
URL: https://github.com/apache/nutch/pull/825#issuecomment-3776075746
The most recent updates address a field duplication issue that could occur
when chaining multiple GeoIP databases.
Here's an example of running `indexchecker`:
```
./runtime/local/bin/nutch indexchecker https://nutch.apache.org
...
accuracyRadius : 1000
isPublicProxy : false
countryIsoCode : US
cityNetworkAddress : 151.101.0.0/21
countryNetworkAddress : 151.101.0.0/21
countryGeoNameId : 6252001
autonomousSystemNumber : 54113
title : Apache Nutch™
content : Apache Nutch™
Apache Nutch™
Apache Nutch™
Community
Development
Docs
Download
News
The Apache Softwa
isHostingProvider : false
isTorExitNode : false
digest : 09f55cdd88bb9a668023f96143ec9605
host : nutch.apache.org
id : https://nutch.apache.org
isAnycast : false
continentCode : NA
isLegitimateProxy : false
ip : 151.101.2.132
timeZone : America/Chicago
isAnonymousVpn : false
isResidentialProxy : false
autonomousSystemOrganization : FASTLY
url : https://nutch.apache.org
isAnonymous : false
tstamp : Tue Jan 20 20:21:34 PST 2026
latLon : 37.751,-97.822
countryInEuropeanUnion : false
continentGeoNameId : 6255149
countryName : United States
continentName : North America
asnNetworkAddress : 151.101.0.0/16
```
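For illustration only, here is a minimal sketch of one way duplicated fields could be avoided when several databases report overlapping attributes (for example, both the City and Country databases can supply country-level data). This is not the plugin's actual code: the class `GeoIpFieldMerge`, the method `merge`, and the "first database wins" policy are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; NOT the index-geoip plugin's implementation.
class GeoIpFieldMerge {

    // Merge per-database lookup results in order. The first database that
    // supplies a field wins, so each field name appears exactly once.
    static Map<String, Object> merge(List<Map<String, Object>> perDatabaseFields) {
        Map<String, Object> merged = new LinkedHashMap<>();
        for (Map<String, Object> fields : perDatabaseFields) {
            for (Map.Entry<String, Object> e : fields.entrySet()) {
                merged.putIfAbsent(e.getKey(), e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Assumed sample values, loosely modelled on the output above.
        Map<String, Object> city = new LinkedHashMap<>();
        city.put("countryIsoCode", "US");
        city.put("timeZone", "America/Chicago");

        Map<String, Object> country = new LinkedHashMap<>();
        country.put("countryIsoCode", "US");
        country.put("countryName", "United States");

        // countryIsoCode is reported by both databases but kept only once.
        System.out.println(merge(List.of(city, country)));
    }
}
```

Keeping the first value is only one possible policy; a real implementation might instead prefer the most specific database or compare values before discarding one.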
Required configuration
```
<property>
  <name>store.ip.address</name>
  <value>true</value>
  <description>Enables us to capture the specific IP address
  (InetSocketAddress) of the host which we connect to via the given
  protocol. Currently supported by: protocol-ftp, protocol-http,
  protocol-okhttp, protocol-htmlunit, protocol-selenium. Note that
  the IP address is required by the plugin index-geoip and when
  writing WARC files.
  </description>
</property>
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor|geoip)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Regular expression naming plugin directory names to
  include. Any plugin not matching this expression is excluded.
  By default Nutch includes plugins to crawl HTML and various other
  document formats via HTTP/HTTPS and indexing the crawled content
  into Solr. More plugins are available to support more indexing
  backends, to fetch ftp:// and file:// URLs, for focused crawling,
  and many other use cases.
  </description>
</property>
<property>
  <name>index.geoip.db.asn</name>
  <value>GeoLite2-ASN.mmdb</value>
  <description>
  GeoIP2/GeoLite2 ASN database file (MMDB format).
  Provides autonomous system number and organization information.
  </description>
</property>
<property>
  <name>index.geoip.db.city</name>
  <value>GeoLite2-City.mmdb</value>
  <description>
  GeoIP2/GeoLite2 City database file (MMDB format).
  Provides city, subdivision, country, continent, and location data.
  </description>
</property>
<property>
  <name>index.geoip.db.country</name>
  <value>GeoLite2-Country.mmdb</value>
  <description>
  GeoIP2/GeoLite2 Country database file (MMDB format).
  Provides country, continent, and represented country information.
  This is a lighter-weight alternative to the City database when only
  country-level information is needed.
  </description>
</property>
```
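As a quick sanity check on the configuration above, the `plugin.includes` value can be tested against a plugin directory name. The sketch below assumes the expression must match the whole name, as the property description suggests; the class `PluginIncludesCheck` and method `isIncluded` are hypothetical helpers for this example, not part of Nutch.

```java
import java.util.regex.Pattern;

// Hypothetical helper; assumes the plugin.includes regex must match the
// entire plugin directory name.
class PluginIncludesCheck {

    static boolean isIncluded(String pluginIncludes, String pluginId) {
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Value copied from the plugin.includes property above.
        String includes = "protocol-http|urlfilter-regex|parse-(html|tika)"
            + "|index-(basic|anchor|geoip)|indexer-solr|scoring-opic"
            + "|urlnormalizer-(pass|regex|basic)";

        System.out.println(isIncluded(includes, "index-geoip"));
        System.out.println(isIncluded(includes, "index-more"));
    }
}
```

If the first check fails for a given configuration, the index-geoip plugin would presumably not be loaded, and none of the GeoIP fields would be added to the indexed document.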
> Upgrade index-geoip to GeoIP2 5.0.2
> -----------------------------------
>
> Key: NUTCH-3064
> URL: https://issues.apache.org/jira/browse/NUTCH-3064
> Project: Nutch
> Issue Type: Task
> Components: index-geoip, plugin
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Minor
> Fix For: 1.22
>
>
> A recent mailing list question about the index-geoip plugin prompted me to
> take a look at it and perform any necessary maintenance.
> As of writing, the latest dependency can be found at
> [https://central.sonatype.com/artifact/com.maxmind.geoip2/geoip2] at v4.2.0.
> At a minimum this ticket will accomplish the dependency update. I'll also
> have a look at documentation and maybe provide some unit tests... which I
> neglected to furnish last time around.