Hi,

The CGI use case could be treated as a special case of integrating Nutch with another largely incompatible environment, in a loosely-coupled system. A popular way to do this would be to use an XML-based API from a CGI script.

This is yet another case that speaks in favor of adding an "out-of-the-box" XML API to Nutch. There are only a few ways to do it that make sense, IMHO:

* REST - an HTTP GET or POST request, with the query passed as GET or POST parameters. The response is an XML document with the results. Lightweight, easy to implement, and relatively easy to consume. The lack of high-level APIs for it in most programming languages could be a problem, though.

* RSS - a special case of the above, where the response follows a standard schema. Its big advantage is its popularity and the large base of existing tools (libraries, readers, aggregators).

* SOAP - SOAP-encoded request and response. Well integrated into most programming languages, but certainly less efficient (consumes more bandwidth, CPU and memory to create and consume).

* XML-RPC - more lightweight than SOAP, but follows a similar RPC paradigm.
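To make the REST option concrete, here is a minimal client-side sketch in Python. The endpoint URL, the parameter names (query, start, hitsPerPage) and the response schema are all made up for illustration; none of them is an existing Nutch API. The response is a hard-coded sample, so the sketch runs without a server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint and parameter names; not an actual Nutch API.
BASE_URL = "http://localhost:8080/search"


def build_query_url(query, start=0, hits_per_page=10):
    """Encode the search parameters into a GET URL."""
    params = {"query": query, "start": start, "hitsPerPage": hits_per_page}
    return BASE_URL + "?" + urlencode(params)


# Made-up response document, standing in for what a server might return.
SAMPLE_RESPONSE = """<results total="2" start="0">
  <hit><title>Nutch home</title><url>http://example.org/a</url></hit>
  <hit><title>Nutch wiki</title><url>http://example.org/b</url></hit>
</results>"""


def parse_results(xml_text):
    """Return a list of (title, url) pairs from the assumed schema."""
    root = ET.fromstring(xml_text)
    return [(hit.findtext("title"), hit.findtext("url"))
            for hit in root.findall("hit")]


if __name__ == "__main__":
    print(build_query_url("open source search"))
    for title, url in parse_results(SAMPLE_RESPONSE):
        print(title, url)
```

The client needs nothing beyond URL encoding and a stock XML parser, which is the "relatively easy to consume" part; the flip side is that the caller has to know the schema, since nothing generates bindings for it.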

AFAIK, there is a specification called OpenSearch, an extension to RSS, created by Amazon/A9. However, I was unable to find the terms of use for that specification, so it might be encumbered. As I wrote above, using RSS gives strong advantages, so it would be nice to figure out if we can use it.
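For comparison, a plain RSS 2.0 response (without any OpenSearch extensions) could be produced roughly as follows. This is only a sketch: the function name, the shape of the hit dictionaries and the channel link are assumptions, and only the basic RSS 2.0 elements (channel, item, title, link, description) are used.

```python
import xml.etree.ElementTree as ET


def hits_to_rss(query, hits):
    """Serialize search hits as a minimal RSS 2.0 document.

    `hits` is assumed to be a list of dicts with 'title', 'url' and
    'summary' keys; this shape is hypothetical, not a Nutch structure.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Search results for: " + query
    # Placeholder link; a real service would point at its own result page.
    ET.SubElement(channel, "link").text = "http://localhost:8080/search"
    ET.SubElement(channel, "description").text = "Results for the query " + query
    for hit in hits:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = hit["title"]
        ET.SubElement(item, "link").text = hit["url"]
        ET.SubElement(item, "description").text = hit["summary"]
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    sample = [{"title": "Nutch home",
               "url": "http://example.org/a",
               "summary": "Open source web search."}]
    print(hits_to_rss("nutch", sample))
```

Any feed reader or RSS library should be able to consume such a document as-is; search-specific data such as total hit count or paging has no place in plain RSS, which is the gap an extension like OpenSearch is meant to fill.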

Existing APIs from other search engines are unfortunately encumbered by restrictive terms of use, so it is dangerous to re-use them.

I believe that the Nutch community is uniquely positioned to propose and promote an open, unencumbered XML API for search results syndication. Let's have a discussion about this. I have already implemented a REST interface, which I could clean up and contribute, and other people on the list were planning to implement a SOAP interface.

--
Best regards,
Andrzej Bialecki
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers