[ https://issues.apache.org/jira/browse/NUTCH-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel updated NUTCH-2387:
-----------------------------------
    Fix Version/s:     (was: 1.14)
                   1.15

> Nutch should not index document with "noindex" meta
> ---------------------------------------------------
>
>                 Key: NUTCH-2387
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2387
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.13
>         Environment: Linux Mint 18
>            Reporter: Eyeris Rodriguez Rueda
>              Labels: index, meta, robots
>             Fix For: 1.15
>
>
> I'm using Nutch 1.12 in local mode and Solr 4.10.3.
> I have noticed that Nutch indexes documents that carry the "noindex"
> robots meta tag.
> I used the Nutch crawl script for a complete cycle:
> bin/crawl -i urls/ crawl/ -2
> with this URL:
> https://humanos.uci.cu/category/humanos/comparte-tu-software/page/3/
> After several tests the problem persists, and approximately 200 documents
> with this robots meta tag are indexed incorrectly.
> I have read the configure method in the IndexerMapReduce.java class. It has
> a line for that property, but for some reason it is not working as expected:
> this.deleteRobotsNoIndex =
> job.getBoolean(INDEXER_DELETE_ROBOTS_NOINDEX, false); (line 97)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
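
A note on the quoted line: the second argument to getBoolean is the default, so the flag is false whenever the property is not set. The sketch below illustrates only that lookup behaviour. It is a stand-in built on java.util.Properties, not Nutch's or Hadoop's actual Configuration class. The key string "indexer.delete.robots.noindex" is assumed to be the value behind INDEXER_DELETE_ROBOTS_NOINDEX, and the class and helper names are hypothetical.

```java
import java.util.Properties;

public class NoIndexFlagSketch {
    // Assumption: the property key behind INDEXER_DELETE_ROBOTS_NOINDEX.
    static final String INDEXER_DELETE_ROBOTS_NOINDEX = "indexer.delete.robots.noindex";

    // Hypothetical stand-in for a getBoolean(key, default) lookup:
    // returns the default when the key is unset or not "true"/"false".
    static boolean getBoolean(Properties conf, String key, boolean defaultValue) {
        String raw = conf.getProperty(key);
        if (raw == null) {
            return defaultValue;
        }
        raw = raw.trim();
        if ("true".equalsIgnoreCase(raw)) {
            return true;
        }
        if ("false".equalsIgnoreCase(raw)) {
            return false;
        }
        return defaultValue;
    }

    // Same shape as the quoted line: the default is false.
    static boolean resolveDeleteRobotsNoIndex(Properties conf) {
        return getBoolean(conf, INDEXER_DELETE_ROBOTS_NOINDEX, false);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Property not set: falls back to the default.
        System.out.println("unset: " + resolveDeleteRobotsNoIndex(conf));

        // Property set explicitly.
        conf.setProperty(INDEXER_DELETE_ROBOTS_NOINDEX, "true");
        System.out.println("set:   " + resolveDeleteRobotsNoIndex(conf));
    }
}
```

If the real lookup behaves the same way, it may be worth checking whether the property is set to true in the local configuration before concluding that the line itself is at fault. This does not by itself explain the behaviour reported above.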