[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471952 ]
nutch.newbie commented on NUTCH-444:
------------------------------------
Renaud:
Thanks for moving the discussion here. First, to answer your question: yes, it's
based on the MIME-type detection problem. The goal of the trial was to see whether
one could build a feed-only search site, i.e. one that indexes just feeds, but I
didn't succeed. I will give it a go over the weekend.
Dogcan:
Yes, one could simply replace feedparser with Rome or StAX and either submit the
result back here or use it internally. My point was to hear how others see it,
and whether others have run into the problem and what their experience was. Gal
pointed out Rome (at least it is still being developed) and StAX, and you
mentioned that you are doing something with Rome, so I just wanted to know what
others think and what their experience has been, that's all. Yes, you are correct
that I posted it in the wrong place, NUTCH-443. But NUTCH-443 started off as
someone having trouble with RSS, and in my view it is important to discuss the
issue, because the library we are using (feedparser) is not going to solve the
original problem if one tries to create an RSS-only search engine. If it could,
NUTCH-443 would not have surfaced in the first place.
I am looking forward to the day when I can use Nutch as a pure RSS feed search
engine, so, Dogcan, I am very interested in your Rome implementation. Maybe you
can post the code here so that I can participate.
> Possibly use a different library to parse RSS feed for improved performance
> and compatibility
> ---------------------------------------------------------------------------------------------
>
> Key: NUTCH-444
> URL: https://issues.apache.org/jira/browse/NUTCH-444
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher
> Affects Versions: 0.9.0
> Reporter: Renaud Richardet
> Priority: Minor
> Fix For: 0.9.0
>
>
> As discussed by Nutch Newbie, Gal, and Chris on NUTCH-443, the current
> library (feedparser) has the following issues:
> - OutOfMemory when parsing > 100k feeds, since it has to convert the feed to
> JDOM first
> - no support for Atom 1.0
> - there has been no development in the last year
> Alternatives are:
> - Rome
> - Informa
> - custom implementation based on Stax
> - ??
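The "custom implementation based on Stax" alternative can be sketched roughly as follows. This is a minimal illustration, not code from Nutch or from any patch on this issue: the class `StaxFeedTitles` and its `extractTitles` method are hypothetical names, and the sketch only collects item/entry titles, assuming they are text-only elements. The point it tries to show is that a StAX reader pulls one event at a time, so the parser never materializes the whole feed as a tree the way a convert-to-JDOM-first approach does.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/**
 * Hypothetical helper (not part of Nutch): streams an RSS 2.0 or Atom 1.0
 * document with StAX and collects the titles of <item>/<entry> elements
 * without building a DOM/JDOM tree first.
 */
class StaxFeedTitles {

    static List<String> extractTitles(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Feeds are untrusted web content: disable DTDs and external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        List<String> titles = new ArrayList<>();
        boolean inEntry = false; // true while inside <item> (RSS) or <entry> (Atom)
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName(); // ignores namespace prefixes
                    if (name.equals("item") || name.equals("entry")) {
                        inEntry = true;
                    } else if (inEntry && name.equals("title")) {
                        // Assumes a text-only title; advances to </title>.
                        titles.add(reader.getElementText().trim());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("item") || name.equals("entry")) {
                        inEntry = false;
                    }
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws XMLStreamException {
        String rss = "<rss version=\"2.0\"><channel><title>Example</title>"
                + "<item><title>First</title></item>"
                + "<item><title>Second</title></item>"
                + "</channel></rss>";
        List<String> titles = extractTitles(
                new ByteArrayInputStream(rss.getBytes(StandardCharsets.UTF_8)));
        System.out.println(titles); // prints [First, Second]
    }
}
```

The channel-level `<title>` is skipped because it sits outside any `<item>`. A real parser plugin would also need links, dates, descriptions, and Atom titles of `type="xhtml"` (which contain child elements and would make `getElementText()` throw), so treat this only as a starting point for comparing against Rome or Informa.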
--
This message is automatically generated by JIRA.
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers