Hi Doug,
I checked the code this morning. Unfortunately I had no network access over
the weekend, so I implemented NutchSearchRss my own way: I extended rss4j (I
know one goal of Nutch is to keep the search engine small, and rss4j is a small
jar without any dependencies) to generate OpenSearch RSS and NutchSearch RSS.
If you are interested, I can post it to you.
I also found something that may be wrong in the code hosted in svn:
addNode(doc, channel, "nutch", "cache", base+"/cached.jsp?"+id);
addNode(doc, channel, "nutch", "explain", base+"/explain.jsp?"+id
+"&query="+URLEncoder.encode(queryString,"UTF-8"));
it should be:
addNode(doc, item, "nutch", "cache", base+"/cached.jsp?"+id);
addNode(doc, item, "nutch", "explain", base+"/explain.jsp?"+id
+"&query="+URLEncoder.encode(queryString,"UTF-8"));
isn't it?
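To make the point concrete, here is a minimal sketch of attaching the per-hit links to the item element rather than to the channel. The addNode below is a hypothetical stand-in written for this mail, not the servlet's actual helper, and the class name, the build method and its sample arguments are assumptions as well.

```java
import java.net.URLEncoder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ItemNodeSketch {
    // Namespace URI taken from the attached draft spec.
    static final String NUTCH_NS = "http://www.nutch.org/opensearchrss/1.0/";

    // Hypothetical stand-in for the servlet's addNode helper: appends
    // <prefix:name>text</prefix:name> to the given parent element.
    static void addNode(Document doc, Element parent, String prefix,
                        String name, String text) {
        Element child = doc.createElementNS(NUTCH_NS, prefix + ":" + name);
        child.appendChild(doc.createTextNode(text));
        parent.appendChild(child);
    }

    // Builds a one-hit document with the per-hit links attached to <item>.
    static Document build(String base, String id, String queryString) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element rss = doc.createElementNS(null, "rss");
            doc.appendChild(rss);
            Element channel = doc.createElementNS(null, "channel");
            rss.appendChild(channel);
            Element item = doc.createElementNS(null, "item");
            channel.appendChild(item);
            // The links describe one hit, so their parent is item, not channel.
            addNode(doc, item, "nutch", "cache", base + "/cached.jsp?" + id);
            addNode(doc, item, "nutch", "explain", base + "/explain.jsp?" + id
                    + "&query=" + URLEncoder.encode(queryString, "UTF-8"));
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = build("http://localhost/nutch-0.7-dev", "idx=0&id=90", "java");
        NodeList caches = doc.getElementsByTagNameNS(NUTCH_NS, "cache");
        // Shows which element the cache link ended up under.
        System.out.println(caches.item(0).getParentNode().getNodeName());
    }
}
```

With channel as the parent, every hit's links would pile up as siblings of the items, and a client could not tell which hit they belong to.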
Besides, I have drafted a Nutch RSS spec (attached); please review :)
/Jack
On 4/16/05, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Jack Tang wrote:
> > If there are no more comments on this thread, I plan to implement it in the next two weeks.
>
> Please note that I implemented an OpenSearch servlet a few days ago.
> (Folks who are interested in closely tracking Nutch development should
> subscribe to the nutch-commits list too.)
>
> http://svn.apache.org/viewcvs.cgi/incubator/nutch/trunk/src/java/org/apache/nutch/searcher/OpenSearchServlet.java?view=markup
>
> I have not yet written an OpenSearch description document page, but that
> should be easy to implement using JSP
> (http://opensearch.a9.com/spec/opensearchdescription/1.0/).
>
> Nor have I yet done anything to generate HTML from the XML. It would be
> great if you could help develop the architecture to generate HTML result
> pages from the servlet's XML, and to generate the opensearch description
> document.
>
> I still intend to add a few things to the servlet. In particular, I
> intend to dump all stored fields for each document into the item xml, in
> the nutch namespace. I also intend to add navigation links to the channel.
>
> Cheers,
>
> Doug
>
Nutch RSS 1.0
"Nutch RSS presents search results using A9's OpenSearch extensions to RSS,
plus a few Nutch-specific extensions."
Nutch RSS 1.0 is an extension to the RSS 2.0 standard, conforming to the
guidelines for RSS extensibility as outlined by the RSS 2.0 specification.
Version 1.0 of Nutch RSS adds only five new elements, each within the nutch XML
namespace.
Future versions of Nutch RSS will attempt to maintain backwards compatibility
with Nutch RSS 1.0. More complicated search extensions to RSS 2.0, such as ??? .
An example Nutch RSS 1.0 search result document would look like:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<rss version="2.0" xmlns="http://feedvalidator.org/docs/rss2.html"
     xmlns:nutch="http://www.nutch.org/opensearchrss/1.0/"
     xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/">
  <channel>
    <title>Nutch Search: java</title>
    <link>http://localhost/nutch-0.7-dev/opensearch?query=java&amp;start=0&amp;hitsPerPage=10&amp;hitsPerSite=2</link>
    <description>Nutch search results for query: java</description>
    <language>en-us</language>
    <openSearch:totalResults>4000</openSearch:totalResults>
    <openSearch:startIndex>0</openSearch:startIndex>
    <openSearch:itemsPerPage>10</openSearch:itemsPerPage>
    <nutch:itemsPerSite>2</nutch:itemsPerSite>
    <item>
      <title>Networking Apps With Rendezvous</title>
      <link>http://localhost/c/11028.html</link>
      <description>Avoid the difficulties of internetworked applications with
      this fast, robust framework package.</description>
      <nutch:cache>http://localhost/nutch-0.7-dev/cached.jsp?idx=0&amp;id=90</nutch:cache>
      <nutch:explain>http://localhost/nutch-0.7-dev/explain.jsp?idx=0&amp;id=90&amp;query=java</nutch:explain>
      <nutch:anchor>http://localhost/nutch-0.7-dev/anchors.jsp?idx=0&amp;id=90</nutch:anchor>
      <nutch:moreFromSite>http://localhost/nutch-0.7-dev/search.jsp?query=site:localhost+jsp&amp;hitsPerPage=10&amp;hitsPerSite=0</nutch:moreFromSite>
    </item>
    <item>
      <title>Secure Sockets with JSSE &amp; OpenSSL</title>
      <link>http://localhost/c/11201.html</link>
      <description>Find out how to implement robust secure communications
      between your clients and servers, including making your own certificate
      authority.</description>
      <nutch:cache>http://localhost/nutch-0.7-dev/cached.jsp?idx=0&amp;id=92</nutch:cache>
      <nutch:explain>http://localhost/nutch-0.7-dev/explain.jsp?idx=0&amp;id=92&amp;query=java</nutch:explain>
      <nutch:anchor>http://localhost/nutch-0.7-dev/anchors.jsp?idx=0&amp;id=92</nutch:anchor>
      <nutch:moreFromSite>http://localhost/nutch-0.7-dev/search.jsp?query=site:localhost+jsp&amp;hitsPerPage=10&amp;hitsPerSite=0</nutch:moreFromSite>
    </item>
  </channel>
</rss>
The XML Namespace:
The nutch namespace is declared as an attribute of the opening <rss> element.
The namespace URI is:
http://www.nutch.org/opensearchrss/1.0/
Thus the xmlns declaration in the opening rss element appears as follows:
<rss version="2.0" xmlns:nutch="http://www.nutch.org/opensearchrss/1.0/"
...>...</rss>
-----------------------------------------------------------------------------------------------
List of nutch elements introduced by this specification:
nutch:itemsPerSite - The maximum number of search results returned per site
when a search spans multiple sites.
Parent: channel
Note: If no results are available, the server should respond with 0.
For example: <nutch:itemsPerSite>2</nutch:itemsPerSite>
Optional. If this does not appear, the client should assume that 2 items are
returned per site.
nutch:moreFromSite - A URL to the servlet(?) that provides more search results
from the same site.
Parent: item
Note: If itemsPerSite is 0, the URL is useless.
For example:
<nutch:moreFromSite>http://localhost/nutch-0.7-dev/opensearch?query=site%3Alocalhost+jsp&amp;hitsPerPage=10&amp;hitsPerSite=0</nutch:moreFromSite>
Optional. In practice, this should always exist when itemsPerSite > 0.
nutch:cache - A URL to cached.jsp, which shows the cached content stored in the
page database.
Parent: item
For example:
<nutch:cache>http://localhost/nutch-0.7-dev/cached.jsp?idx=0&amp;id=90</nutch:cache>
Optional. In practice, this should always exist.
nutch:explain - A URL to explain.jsp, which explains the page scoring.
Parent: item
For example:
<nutch:explain>http://localhost/nutch-0.7-dev/explain.jsp?idx=0&amp;id=90&amp;query=jsp</nutch:explain>
Optional. In practice, this should always exist.
nutch:anchor - A URL to anchors.jsp, which returns the anchors of a hit
document.
Parent: item
For example:
<nutch:anchor>http://localhost/nutch-0.7-dev/anchors.jsp?idx=0&amp;id=69</nutch:anchor>
Optional.
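A client might read these elements roughly as follows. This is only a sketch under the assumptions of this draft: the class and method names are made up for illustration, and the default of 2 for a missing itemsPerSite comes from the description above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class NutchExtensionReader {
    static final String NUTCH_NS = "http://www.nutch.org/opensearchrss/1.0/";
    // Default stated in this draft for a missing nutch:itemsPerSite.
    static final int DEFAULT_ITEMS_PER_SITE = 2;

    // Parses a result document with namespace support switched on.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse result document", e);
        }
    }

    // Text of the first direct child in the nutch namespace, or null if absent.
    static String nutchChild(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element
                    && NUTCH_NS.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    // itemsPerSite is optional on the channel, so fall back to the default.
    static int itemsPerSite(Document doc) {
        Element channel = (Element) doc.getElementsByTagName("channel").item(0);
        String value = nutchChild(channel, "itemsPerSite");
        return value == null ? DEFAULT_ITEMS_PER_SITE : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        String xml = "<rss version=\"2.0\" xmlns:nutch=\"" + NUTCH_NS + "\"><channel>"
                + "<nutch:itemsPerSite>2</nutch:itemsPerSite>"
                + "<item><nutch:cache>http://localhost/nutch-0.7-dev/cached.jsp"
                + "?idx=0&amp;id=90</nutch:cache></item>"
                + "</channel></rss>";
        Document doc = parse(xml);
        Element item = (Element) doc.getElementsByTagName("item").item(0);
        System.out.println(itemsPerSite(doc));
        System.out.println(nutchChild(item, "cache"));
    }
}
```

Matching on the namespace URI rather than on the "nutch" prefix means the reader keeps working if a server picks a different prefix for the same namespace.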
-------------------------------------------------------------------------------------------
List of standard OpenSearch elements given enhanced meaning by this
specification:
openSearch:totalResults - The number of search results available.
Parent: channel
Note: If no results are available, the server should respond with 0.
For example: <openSearch:totalResults>0</openSearch:totalResults>
Optional. If this does not appear, the client should assume all search results
were returned in this request.
openSearch:startIndex - The index of the first item returned in the result,
starting with 1.
Parent: channel
Optional. If this does not appear, the client should assume that the first index
returned is 1.
openSearch:itemsPerPage - The maximum number of items that appear on one page.
Parent: channel
Optional. If this does not appear, the client should assume that 10 items are
returned per page.
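The fallback rules for these three elements could be applied by a client along these lines. It is a sketch only: the class and method names are hypothetical, and treating a missing totalResults as "the number of items in this response" is my reading of "all search results were returned in this request".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class OpenSearchPaging {
    static final String OS_NS = "http://a9.com/-/spec/opensearchrss/1.0/";

    // Parses a result document with namespace support switched on.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse result document", e);
        }
    }

    // Integer value of the first openSearch element with this name, or the fallback.
    static int intOrDefault(Document doc, String localName, int fallback) {
        NodeList nodes = doc.getElementsByTagNameNS(OS_NS, localName);
        if (nodes.getLength() == 0) {
            return fallback;
        }
        return Integer.parseInt(nodes.item(0).getTextContent().trim());
    }

    // Returns {totalResults, startIndex, itemsPerPage} with the defaults applied.
    static int[] paging(Document doc) {
        int itemCount = doc.getElementsByTagName("item").getLength();
        int total = intOrDefault(doc, "totalResults", itemCount);
        int start = intOrDefault(doc, "startIndex", 1);
        int perPage = intOrDefault(doc, "itemsPerPage", 10);
        return new int[] {total, start, perPage};
    }

    public static void main(String[] args) {
        // No openSearch elements at all, so every value comes from a default.
        Document doc = parse("<rss version=\"2.0\"><channel><item/><item/></channel></rss>");
        int[] p = paging(doc);
        System.out.println(p[0] + " " + p[1] + " " + p[2]);
    }
}
```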
------------------------------------------------------------------------------------------
List of standard RSS elements given enhanced meaning by this specification:
title - The name of the search result provider.
Parent: channel
Notes: Should contain the search terms themselves.
Restrictions: Cannot contain HTML markup.
Required
link - A URL to the website that provides the search results.
Parent: channel
Notes: Should return a URL that can recreate the search in HTML format.
Restrictions: Cannot contain HTML markup.
Required
description - Phrase or sentence describing the search.
Parent: channel
Restriction: Can contain simple escaped HTML markup, such as <b> and <i>
elements.
Required
title - The text title of the search result.
Parent: item
Restrictions: Cannot contain HTML markup.
Optional. In practice, this should always exist.
link - The URL pointing to the content referred to in the search result.
Parent: item
Restrictions: Cannot contain HTML markup.
Optional. However, at least one of link or description must exist, and
usually both elements do.
description - A text snippet describing the search result.
Parent: item
Restriction: Can contain simple escaped HTML markup, such as <b>, <i>, <a>, and
<img> elements.
Optional. However, at least one of link or description must exist, and
usually both elements do.
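The "at least one of link or description" rule for items is easy to check mechanically. Below is a minimal sketch of such a check; the class and method names are hypothetical, and treating an empty element as missing is an assumption of the sketch rather than something this draft states.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class ItemCheck {
    // Parses a single <item> fragment and returns its root element.
    static Element parseItem(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse item", e);
        }
    }

    // True if the item has a direct child with this name and non-blank text.
    static boolean hasChild(Element item, String name) {
        for (Node n = item.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())
                    && !n.getTextContent().trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    // An item needs a link or a description; the title is optional.
    static boolean isValidItem(Element item) {
        return hasChild(item, "link") || hasChild(item, "description");
    }

    public static void main(String[] args) {
        Element titleOnly = parseItem("<item><title>Networking Apps</title></item>");
        Element withLink = parseItem(
                "<item><link>http://localhost/c/11028.html</link></item>");
        System.out.println(isValidItem(titleOnly));
        System.out.println(isValidItem(withLink));
    }
}
```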