Hi there,
On 1/30/07 7:00 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> Chris,
>
> I saw your name associated with the rss parser in nutch. My understanding is
> that nutch is using feedparser. I had two questions:
>
> 1. Have you looked at vtd as an rss parser?
I haven't in fact; what are its benefits over those of commons-feedparser?
> 2. Any view on asynchronous communication as the underlying protocol? I do
> not believe that feedparser uses that at this point.
I'm not sure what asynchronous communication would buy you when parsing RSS
feeds: what kind of communication do you mean? Nutch handles the
communications layer for fetching content using a pluggable, Protocol-based
model. The only features that Nutch's RSS parser uses from the underlying
feedparser library are its object model and its callback framework for
parsing RSS/Atom/Feed XML documents. When you mention asynchronous above,
are you talking about the protocol for fetching the different RSS documents?
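
To make the "callback framework" point concrete, here is a minimal sketch of
callback-style item parsing. It is not the parse-rss plugin and it does not
use feedparser: it is plain Python with the standard library's xml.sax, and
the names ItemHandler, split_items and SAMPLE are made up for illustration.
It assumes each item's <link> is unique and uses it as the key, which is the
per-item keying asked about further down the thread.

```python
import xml.sax


class ItemHandler(xml.sax.ContentHandler):
    """Collects the child elements of each <item> and passes them to a callback."""

    def __init__(self, on_item):
        super().__init__()
        self.on_item = on_item  # called once per completed <item>
        self.item = None        # fields of the <item> being read, None outside one
        self.field = None       # name of the child element currently open
        self.text = []          # character chunks of the open child element

    def startElement(self, name, attrs):
        if name == "item":
            self.item = {}
        elif self.item is not None:
            self.field = name
            self.text = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so accumulate them.
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "item":
            self.on_item(self.item)
            self.item = None
        elif self.item is not None and name == self.field:
            self.item[name] = "".join(self.text).strip()
            self.field = None


def split_items(rss_text):
    """Return a dict mapping each item's <link> to that item's fields."""
    by_link = {}

    def on_item(item):
        # Assumption: <link> is unique per item, so it can serve as the key.
        link = item.get("link")
        if link:
            by_link[link] = item

    xml.sax.parseString(rss_text.encode("utf-8"), ItemHandler(on_item))
    return by_link


# Hypothetical two-item feed, used only to exercise the sketch.
SAMPLE = """<rss version="2.0"><channel>
<title>Example feed</title>
<link>http://example.com/</link>
<item><title>First</title><link>http://example.com/1</link></item>
<item><title>Second</title><link>http://example.com/2</link></item>
</channel></rss>"""

docs = split_items(SAMPLE)
```

Each entry in docs is one item keyed by its own link; the channel-level
<title> and <link> are ignored because they sit outside any <item>.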
Thanks!
Cheers,
Chris
>
> Thanks
>
>
> -----Original Message-----
> From: Chris Mattmann <[EMAIL PROTECTED]>
> Date: Tue, 30 Jan 2007 18:16:44
> To:<[email protected]>
> Subject: Re: RSS-fecter and index individul-how can i realize this function
>
> Hi there,
>
> I could most likely be of assistance, if you gave me some more information.
> For instance: I'm wondering if the use case you describe below is already
> supported by the current RSS parse plugin?
>
> The current RSS parser, parse-rss, does in fact index individual items that
> are pointed to by an RSS document. The items are added as Nutch Outlinks,
> and added to the overall queue of URLs to fetch. Doesn't this satisfy what
> you mention below? Or am I missing something?
>
> Cheers,
> Chris
>
>
>
> On 1/30/07 6:01 PM, "kauu" <[EMAIL PROTECTED]> wrote:
>
>> Hi folks,
>>
>> What I want to do is split an RSS file into several pages.
>>
>> As has been discussed before, I want to fetch an RSS page and index it as
>> separate documents in the index, so that a searcher can find each item's
>> info as an individual hit.
>>
>> My idea is to create a protocol that fetches the RSS page and stores it as
>> several pages, each containing just one ITEM tag. But the unique key is the
>> URL, so how can I store them with each ITEM's link tag as the unique key
>> for the document?
>>
>> So my question is how to implement this in nutch-0.8.x.
>>
>> I've checked the code of the protocol-http plug-in, but I can't find the
>> code that stores a page as a document. I want to split the RSS page into
>> several pages before it is stored, so that it is stored as several
>> documents rather than one.
>>
>> Can anyone give me some hints?
>>
>> Any reply will be appreciated!
>>
>>
>>
>>
>>
>> ITEM’s structure
>>
>> <item>
>>   <title>欧洲暴风雪后发制人 致航班延误交通混乱(组图)</title>
>>   <description>暴风雪横扫欧洲,导致多次航班延误 1月24日,几架民航客机在德
>> 国斯图加特机场内等待去除机身上冰雪。1月24日,工作人员在德国南部的慕尼黑机场
>> 清扫飞机跑道上的积雪。 据报道,迟来的暴风雪连续两天横扫中...
>>   </description>
>>   <link>http://news.sohu.com/20070125/n247833568.shtml</link>
>>   <category>搜狐焦点图新闻</category>
>>   <author>[EMAIL PROTECTED]</author>
>>   <pubDate>Thu, 25 Jan 2007 11:29:11 +0800</pubDate>
>>   <comments>http://comment.news.sohu.com/comment/topic.jsp?id=247833847</comments>
>> </item>
>>
>>
>>
>
>
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers