[ https://issues.apache.org/jira/browse/NUTCH-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570050#comment-13570050 ]

Tejas Patil commented on NUTCH-1521:
------------------------------------

Hi Lufeng,
In 2.x, some classes have different names compared to their legacy 1.x
counterparts. The "CrawlDbFilter" class in 1.x is replaced by "DbUpdateMapper"
in 2.x. After a quick look at it, I found that your change need not be ported
there, as DbUpdateMapper doesn't seem to perform URL normalization. (Maybe
it's already done by the fetcher, so that the update step doesn't have to
bother with it and can use those URLs right away.)

@Dev: Can anyone confirm this?
                
> CrawlDbFilter pass null url to urlNormailzers
> ---------------------------------------------
>
>                 Key: NUTCH-1521
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1521
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.7
>            Reporter: lufeng
>            Assignee: lufeng
>            Priority: Trivial
>             Fix For: 1.7, 2.2
>
>         Attachments: CrawlDbFilter_v1.patch, NUTCH-1521-trunk.patch, 
> TestCrawlDbFilter.java
>
>
> urlNormalizers will get a null URL if we set CRAWLDB_PURGE_404, and they will
> throw a NullPointerException. The WARN log will then output something like
> "Skipping null NullPointerException".
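The failure mode in the description can be illustrated with a small,
self-contained sketch. It assumes, hypothetically, that purging a gone entry
is done by setting the URL to null before normalization runs, and shows a null
check that skips normalization for purged entries. The names `filter` and
`STATUS_GONE` are illustrative only; this is not Nutch's CrawlDbFilter or
URLNormalizers API, and it is not the attached patch.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

public class PurgeThenNormalizeSketch {

    // Hypothetical status value standing in for a "gone" (404-like) CrawlDb entry.
    public static final int STATUS_GONE = 404;

    /**
     * Hypothetical filter step: when purging is enabled, a gone entry is
     * dropped by nulling its URL. Without the null check below, the null URL
     * would be handed to the normalizer and cause a NullPointerException.
     */
    public static String filter(String url, int status, boolean purge404,
                                UnaryOperator<String> normalizer) {
        if (purge404 && status == STATUS_GONE) {
            url = null; // entry is purged
        }
        if (url == null) {
            return null; // skip normalization for purged entries
        }
        return normalizer.apply(url);
    }

    public static void main(String[] args) {
        // Hypothetical normalizer: lower-cases the whole URL.
        UnaryOperator<String> lower = u -> u.toLowerCase(Locale.ROOT);
        System.out.println(filter("HTTP://Example.com/", 200, true, lower));
        System.out.println(filter("http://example.com/missing", STATUS_GONE, true, lower));
    }
}
```

Running `main` prints the normalized URL for the live entry and `null` for
the purged one, instead of throwing.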

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
