[ https://issues.apache.org/jira/browse/NUTCH-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573114#comment-14573114 ]
Luis Lopez commented on NUTCH-2034:
-----------------------------------

Yes, we can use a general counter and say that or we could even be more specific and count by filter.

> CrawlDB filtered documents counter.
> -----------------------------------
>
>                 Key: NUTCH-2034
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2034
>             Project: Nutch
>          Issue Type: Improvement
>          Components: crawldb
>    Affects Versions: 1.10
>            Reporter: Luis Lopez
>            Priority: Minor
>              Labels: counters, crawldb, filter, info, regex
>             Fix For: 1.11
>
>
> When we are doing big crawls we would like to know how many of the URLs are
> being discarded by the regex filters, this is only presented in the Inject
> class:
> Injector: Total number of urls rejected by filters: 0
> It will be nice to have a counter in the CrawlDB class so we know in every
> round how many were discarded by our filters:
> CrawlDb update: Total number of URLs filtered by regex filters: 31415

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
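
To make the two options from the comment concrete (one general counter versus one counter per filter), here is a minimal, self-contained sketch in plain Java. It does not use the Nutch or Hadoop APIs. The class FilterCounterSketch, the NamedFilter type, the filter names "suffix" and "query", the two regular expressions, and the sample URLs are all hypothetical, chosen only for illustration. The sketch assumes a URL is charged to the first filter that rejects it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch; this is not Nutch's CrawlDb or URLFilter API. */
public class FilterCounterSketch {

    /** Hypothetical named filter: rejects any URL in which the pattern is found. */
    public static final class NamedFilter {
        final String name;
        final Pattern reject;

        public NamedFilter(String name, String rejectRegex) {
            this.name = name;
            this.reject = Pattern.compile(rejectRegex);
        }

        boolean rejects(String url) {
            return reject.matcher(url).find();
        }
    }

    /** Per-filter counts; a URL is charged to the first filter that rejects it. */
    public static Map<String, Long> countRejections(List<NamedFilter> filters,
                                                    List<String> urls) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (NamedFilter f : filters) {
            counts.put(f.name, 0L); // show filters that rejected nothing as 0
        }
        for (String url : urls) {
            for (NamedFilter f : filters) {
                if (f.rejects(url)) {
                    counts.merge(f.name, 1L, Long::sum);
                    break; // stop at the first rejecting filter
                }
            }
        }
        return counts;
    }

    /** The "general counter" is the sum of the per-filter counters. */
    public static long total(Map<String, Long> counts) {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }

    /** Runs the sketch on made-up filters and URLs. */
    public static Map<String, Long> demoCounts() {
        List<NamedFilter> filters = Arrays.asList(
                new NamedFilter("suffix", "\\.(gif|jpg|png)$"),
                new NamedFilter("query", "[?*!@=]"));
        List<String> urls = Arrays.asList(
                "http://example.com/",
                "http://example.com/logo.png",
                "http://example.com/search?q=nutch",
                "http://example.com/a.gif?x=1");
        return countRejections(filters, urls);
    }

    public static void main(String[] args) {
        Map<String, Long> counts = demoCounts();
        counts.forEach((name, n) -> System.out.println("filtered by " + name + ": " + n));
        System.out.println("total filtered: " + total(counts));
    }
}
```

Keeping the per-filter counts and deriving the total from them means the general counter comes for free, so the more specific option does not rule out the simpler one. In an actual job the increments would presumably go through the framework's own counter mechanism rather than an in-memory map.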