[
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471703
]
nutch.newbie commented on NUTCH-443:
------------------------------------
I tried the patch with about 100 RSS feeds. Some problems:
1. The atom+xml content type gives trouble. I am not sure if commons feedparser
supports Atom 1.0.
2. In my case the RSS URL sometimes doesn't end with .xml or .rss, so some of
the feeds got indexed the way current Nutch trunk does it, i.e. as HTML.
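
For point 2, a rough sketch of picking feeds by content type instead of by URL suffix. This is not from the patch: FeedTypeSniffer and looksLikeFeed are made-up names, and the body sniffing is only an assumption about what might be good enough for generic XML responses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutch or the NUTCH-443 patch.
final class FeedTypeSniffer {
    // Media types that identify a feed on their own.
    private static final Set<String> FEED_TYPES = new HashSet<>(
        Arrays.asList("application/rss+xml", "application/atom+xml"));

    // Generic XML types where we have to peek at the body.
    private static final Set<String> GENERIC_XML_TYPES = new HashSet<>(
        Arrays.asList("text/xml", "application/xml"));

    // contentTypeHeader may be null; bodyPrefix is the first few hundred
    // characters of the response body (may be empty).
    static boolean looksLikeFeed(String contentTypeHeader, String bodyPrefix) {
        String type = "";
        if (contentTypeHeader != null) {
            // Drop parameters such as "; charset=utf-8".
            int semi = contentTypeHeader.indexOf(';');
            String bare = semi >= 0 ? contentTypeHeader.substring(0, semi) : contentTypeHeader;
            type = bare.trim().toLowerCase(Locale.ROOT);
        }
        if (FEED_TYPES.contains(type)) {
            return true;
        }
        // Only sniff the body when the server gave no type or a generic XML one.
        if (type.isEmpty() || GENERIC_XML_TYPES.contains(type)) {
            String head = bodyPrefix == null ? "" : bodyPrefix.toLowerCase(Locale.ROOT);
            return head.contains("<rss") || head.contains("<feed") || head.contains("<rdf:rdf");
        }
        return false;
    }
}
```

With something like this the feed URL can end in anything, and a text/html response still goes to the HTML parser.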
Just some early feedback. I will do some more testing this weekend. One
question I do have: it still doesn't solve the problem of indexing just
the RSS feeds. Even if I take away all my other parsers, I still need the HTML
parser and index-basic. Maybe it's time for an index-rss?
Cheers
> allow parsers to return multiple Parse object, this will speed up the rss
> parser
> --------------------------------------------------------------------------------
>
> Key: NUTCH-443
> URL: https://issues.apache.org/jira/browse/NUTCH-443
> Project: Nutch
> Issue Type: New Feature
> Components: fetcher
> Affects Versions: 0.9.0
> Reporter: Renaud Richardet
> Priority: Minor
> Fix For: 0.9.0
>
> Attachments: NUTCH-443-draft-v1.patch, NUTCH-443-draft-v2.patch,
> parse-map-core-draft-v1.patch, parse-map-core-untested.patch, parsers.diff
>
>
> allow Parser#parse to return a Map<String,Parse>. This way, the RSS parser
> can return multiple parse objects, that will all be indexed separately.
> Advantage: no need to fetch all feed-items separately.
> see the discussion at
> http://www.nabble.com/RSS-fecter-and-index-individul-how-can-i-realize-this-function-tf3146271.html
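
To make the idea in the description concrete, here is a minimal sketch of a parser that returns one entry per feed item, keyed by the item's link. It does not use the Nutch API: MultiParseSketch and ItemParse are hypothetical stand-ins (ItemParse takes the place of Parse), and it only handles plain RSS 2.0 <item> elements via the JDK's DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not the NUTCH-443 patch.
final class MultiParseSketch {
    // Stand-in for Nutch's Parse: just a title and the item text.
    static final class ItemParse {
        final String title;
        final String text;

        ItemParse(String title, String text) {
            this.title = title;
            this.text = text;
        }
    }

    // Returns one ItemParse per <item>, keyed by the item's <link>, so each
    // item could be indexed under its own URL without a separate fetch.
    static Map<String, ItemParse> parse(String rssXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        Map<String, ItemParse> parses = new LinkedHashMap<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            String link = childText(item, "link");
            if (link.isEmpty()) {
                continue; // no URL to key this item by
            }
            parses.put(link, new ItemParse(childText(item, "title"),
                                           childText(item, "description")));
        }
        return parses;
    }

    // Text of the first descendant element with the given tag, or "".
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```

An indexer could then walk the returned map and treat each key as a separate document URL.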
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers