>> 2. It sounds like a pretty fundamental API shift in Nutch, to support a
>> single type of content, RSS. Even if there are more content types that
>> follow this model, as Doug and Renaud both pointed out, there aren't a
>> multitude of them (perhaps archive files, but can you think of any
>> others)?

> Also true.  On the other hand, Nutch provides 98% of an RSS search 
> engine.  It'd be a shame to have to re-invent everything else and it 
> would be great if Nutch could evolve to support RSS well.
>
> Could image search also benefit from this?  One could generate a 
> Parse for each image on a page whose text was from the page.  Product 
> search too, perhaps.

Another application could be splitting certain enterprise documents up,
either based on passage retrieval algorithms or simply based on the table of
contents entries.  For example, a long contract or user guide could be split
up into separate searchable documents.
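
To make the table-of-contents case concrete, here is a rough sketch in plain
Python. It is not Nutch code and does not use any Nutch API; the function name
split_by_toc is made up for illustration, and it assumes each table of contents
entry appears verbatim as a heading on a line of its own, in document order.

```python
import re
from typing import List, Tuple


def split_by_toc(text: str, toc_entries: List[str]) -> List[Tuple[str, str]]:
    """Split text into (title, body) pairs, one per table of contents entry.

    Assumption: every entry appears verbatim on a line of its own, and the
    entries are listed in the same order as they occur in the text.
    """
    # Locate each heading, searching forward from the previous match.
    positions = []
    search_from = 0
    for title in toc_entries:
        pattern = re.compile(r"^" + re.escape(title) + r"[ \t]*$", re.MULTILINE)
        match = pattern.search(text, search_from)
        if match is None:
            continue  # entry not found as a heading, so skip it
        positions.append((title, match.start(), match.end()))
        search_from = match.end()

    # The body of each section runs from the end of its heading
    # to the start of the next heading (or to the end of the text).
    sections = []
    for i, (title, _start, body_start) in enumerate(positions):
        if i + 1 < len(positions):
            body_end = positions[i + 1][1]
        else:
            body_end = len(text)
        sections.append((title, text[body_start:body_end].strip()))
    return sections


if __name__ == "__main__":
    guide = (
        "User Guide\n\n"
        "Installation\nRun the installer.\n\n"
        "Configuration\nEdit the config file.\n"
    )
    for title, body in split_by_toc(guide, ["Installation", "Configuration"]):
        print(title, "->", body)
```

Each (title, body) pair could then be indexed as its own searchable document,
much as each RSS item or image would get its own Parse in the discussion above.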

Best regards,
Alan
_________________________
Alan Tanaman
iDNA Solutions
http://blog.idna-solutions.com


