[ 
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doğacan Güney updated NUTCH-443:
--------------------------------

    Attachment: redirect_and_index.patch

Patch for the problem. 

Now, if Fetcher gets null content, it filters it out instead of pushing an 
empty content. 

It may change the semantics very slightly, but I don't think that it will be a 
problem. Before this patch, Fetcher created an empty content and then passed 
the score from datum to content. Parse then passes it from content to parse 
data so that it can distribute the score to outlinks. But empty pages don't 
have outlinks anyway, and they should not be indexed (so an adjust datum has 
no purpose).
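
To illustrate the change described above, here is a minimal sketch that 
contrasts the two behaviors. It is not the code from the patch: the class 
NullContentSketch, the Page record and both method names are hypothetical 
stand-ins, and the real Fetcher works with Nutch's own Content and CrawlDatum 
types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; NOT Nutch's real Fetcher/Content classes.
public class NullContentSketch {

    /** Minimal fetched-page record: a URL plus raw bytes (null when nothing was fetched). */
    static final class Page {
        final String url;
        final byte[] content;

        Page(String url, byte[] content) {
            this.url = url;
            this.content = content;
        }
    }

    /** Behavior before the patch, as described: null content becomes empty content. */
    static List<Page> emitWithEmptyContent(List<Page> fetched) {
        List<Page> out = new ArrayList<>();
        for (Page p : fetched) {
            byte[] bytes = (p.content == null) ? new byte[0] : p.content;
            out.add(new Page(p.url, bytes));
        }
        return out;
    }

    /** Behavior after the patch, as described: pages with null content are dropped. */
    static List<Page> emitFilteringNull(List<Page> fetched) {
        List<Page> out = new ArrayList<>();
        for (Page p : fetched) {
            if (p.content != null) {
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Page> fetched = Arrays.asList(
            new Page("http://example.com/a", new byte[] {1, 2, 3}),
            new Page("http://example.com/redirect", null));

        System.out.println("old: " + emitWithEmptyContent(fetched).size() + " pages passed on");
        System.out.println("new: " + emitFilteringNull(fetched).size() + " pages passed on");
    }
}
```

In the sketch, the old path hands an empty page to later stages, while the new 
path never emits it, so there is nothing to parse or index for that URL.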

Sorry about missing this bug in the first place, but, man, this is a subtle one.


> allow parsers to return multiple Parse object, this will speed up the rss 
> parser
> --------------------------------------------------------------------------------
>
>                 Key: NUTCH-443
>                 URL: https://issues.apache.org/jira/browse/NUTCH-443
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.9.0
>            Reporter: Renaud Richardet
>         Assigned To: Andrzej Bialecki 
>            Priority: Minor
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-443-draft-v1.patch, NUTCH-443-draft-v2.patch, 
> NUTCH-443-draft-v3.patch, NUTCH-443-draft-v4.patch, NUTCH-443-draft-v5.patch, 
> NUTCH-443-draft-v6.patch, NUTCH-443-draft-v7.patch, 
> NUTCH-443.022507.patch.txt, NUTCH-443.02282007-v2.patch, 
> NUTCH-443.02282007.patch, NUTCH-443.08052007.patch, 
> parse-map-core-draft-v1.patch, parse-map-core-untested.patch, parsers.diff, 
> redirect_and_index.patch
>
>
> Allow Parser#parse to return a Map<String,Parse>. This way, the RSS parser 
> can return multiple Parse objects, which will all be indexed separately. 
> Advantage: no need to fetch all feed items separately.
> See the discussion at 
> http://www.nabble.com/RSS-fecter-and-index-individul-how-can-i-realize-this-function-tf3146271.html
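
The idea in the issue description can be sketched roughly as follows. This is 
only an illustration: Parse and FeedItem here are hypothetical stand-ins 
rather than Nutch's real types, and keying the map by each item's URL is an 
assumption, since the description only gives the type Map<String,Parse>.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; Parse and FeedItem are NOT Nutch's real classes.
public class MultiParseSketch {

    /** Stand-in for a parse result: just a title and the extracted text. */
    static final class Parse {
        final String title;
        final String text;

        Parse(String title, String text) {
            this.title = title;
            this.text = text;
        }
    }

    /** Stand-in for one entry of an already-fetched RSS feed. */
    static final class FeedItem {
        final String link;
        final String title;
        final String description;

        FeedItem(String link, String title, String description) {
            this.link = link;
            this.title = title;
            this.description = description;
        }
    }

    /**
     * Returns one Parse per feed item, keyed by the item's link (assumed key),
     * so each item can be indexed on its own without a separate fetch.
     */
    static Map<String, Parse> parseFeed(List<FeedItem> items) {
        Map<String, Parse> result = new LinkedHashMap<>(); // keeps feed order
        for (FeedItem item : items) {
            result.put(item.link, new Parse(item.title, item.description));
        }
        return result;
    }

    public static void main(String[] args) {
        List<FeedItem> items = Arrays.asList(
            new FeedItem("http://example.com/1", "One", "first item"),
            new FeedItem("http://example.com/2", "Two", "second item"));

        for (Map.Entry<String, Parse> e : parseFeed(items).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().title);
        }
    }
}
```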

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
