[ 
https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471780
 ] 

Chris A. Mattmann commented on NUTCH-443:
-----------------------------------------

Nutch Newbie,

   What exactly do you mean when you mention Apache politics? Feedparser wasn't 
selected because it was an Apache sub-project. In fact, that's as far from the 
truth as possible. I selected feedparser at the time (in May 2005 or so) 
because it was the only one of the three RSS reading APIs (informa, feedparser 
and rome) that I could figure out. Just understanding rome and informa took me 
far longer than writing the entire RSS parser with feedparser did.

   That said, things may have changed in the past year and a half. Perhaps rome 
provides an easier API than feedparser now. Perhaps informa is faster. I'm not 
sure what the answers to these and other questions on this subject are. 
However, before anything is said about feedparser, it's only fair that the 
folks who wrote it get to chime in. For that matter, it would probably be a 
good idea to contact Kevin Burton, the lead developer of commons-feedparser, 
and ask him about its relationship to rome and to other APIs such as StAX or 
informa.

Cheers,
  Chris


> allow parsers to return multiple Parse object, this will speed up the rss 
> parser
> --------------------------------------------------------------------------------
>
>                 Key: NUTCH-443
>                 URL: https://issues.apache.org/jira/browse/NUTCH-443
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.9.0
>            Reporter: Renaud Richardet
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: NUTCH-443-draft-v1.patch, NUTCH-443-draft-v2.patch, 
> parse-map-core-draft-v1.patch, parse-map-core-untested.patch, parsers.diff
>
>
> allow Parser#parse to return a Map<String,Parse>. This way, the RSS parser 
> can return multiple parse objects, that will all be indexed separately. 
> Advantage: no need to fetch all feed-items separately.
> see the discussion at 
> http://www.nabble.com/RSS-fecter-and-index-individul-how-can-i-realize-this-function-tf3146271.html
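
For readers skimming the issue description above, here is a minimal sketch of 
the idea of one parser call returning a Map<String,Parse>, so that each feed 
item becomes its own indexable entry without a separate fetch. It is not the 
code from the attached patches: SimpleParse, FeedItem and parseFeed are 
hypothetical stand-ins invented for illustration and are not Nutch classes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FeedParseSketch {

    /** Hypothetical stand-in for a parse result; NOT Nutch's Parse interface. */
    static final class SimpleParse {
        final String title;
        final String text;

        SimpleParse(String title, String text) {
            this.title = title;
            this.text = text;
        }
    }

    /** Hypothetical, already-decoded feed item (link, title, description). */
    static final class FeedItem {
        final String link;
        final String title;
        final String description;

        FeedItem(String link, String title, String description) {
            this.link = link;
            this.title = title;
            this.description = description;
        }
    }

    /**
     * Builds one parse per feed item, keyed by the item's URL.
     * A LinkedHashMap keeps the items in feed order. If two items share
     * a link, the later one replaces the earlier one.
     */
    static Map<String, SimpleParse> parseFeed(List<FeedItem> items) {
        Map<String, SimpleParse> parses = new LinkedHashMap<>();
        for (FeedItem item : items) {
            parses.put(item.link, new SimpleParse(item.title, item.description));
        }
        return parses;
    }

    public static void main(String[] args) {
        List<FeedItem> items = Arrays.asList(
            new FeedItem("http://example.com/a", "First", "Text of the first item"),
            new FeedItem("http://example.com/b", "Second", "Text of the second item"));

        // Each map entry could then be indexed as a separate document.
        for (Map.Entry<String, SimpleParse> entry : parseFeed(items).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue().title);
        }
    }
}
```

Keying the map by item URL is an assumption made for this sketch; the issue 
text only says the map is keyed by a String.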

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
