Hi Doug

I checked the code this morning. Unfortunately I spent the weekend without
network access, so I implemented NutchSearchRss my own way: I extended
rss4j (I know one goal of Nutch is to keep the search engine small, and
rss4j is a small jar without any dependencies) to generate OpenSearch RSS
and NutchSearch RSS. If you are interested, I can send it to you.

I also found something that may be wrong in the code hosted in svn:

        addNode(doc, channel, "nutch", "cache", base+"/cached.jsp?"+id);
        addNode(doc, channel, "nutch", "explain", base+"/explain.jsp?"+id
                +"&query="+URLEncoder.encode(queryString,"UTF-8"));

it should be:

        addNode(doc, item, "nutch", "cache", base+"/cached.jsp?"+id);
        addNode(doc, item, "nutch", "explain", base+"/explain.jsp?"+id
                +"&query="+URLEncoder.encode(queryString,"UTF-8"));

isn't it?
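
To show why the parent matters, here is a minimal sketch using the JDK's DOM API. The addNode below is a hypothetical stand-in with the same argument order as the call above, not the helper from svn, and the class name and URL are only for illustration. Passing item makes the cache link a child of that hit rather than of the whole channel.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class AddNodeSketch {
    // Hypothetical stand-in for the servlet's helper: appends
    // <prefix:name>text</prefix:name> to the given parent element.
    static Element addNode(Document doc, Element parent,
                           String prefix, String name, String text) {
        Element child = doc.createElement(prefix + ":" + name);
        child.appendChild(doc.createTextNode(text));
        parent.appendChild(child);
        return child;
    }

    // Builds rss > channel > item and attaches the per-hit link to the item.
    static Document build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element rss = doc.createElement("rss");
        doc.appendChild(rss);
        Element channel = doc.createElement("channel");
        rss.appendChild(channel);
        Element item = doc.createElement("item");
        channel.appendChild(item);

        // The link describes one hit, so its parent is the item.
        addNode(doc, item, "nutch", "cache",
                "http://localhost/cached.jsp?idx=0&id=90");
        return doc;
    }

    public static void main(String[] args) {
        Document doc = build();
        Element cache = (Element) doc.getElementsByTagName("nutch:cache").item(0);
        System.out.println(cache.getParentNode().getNodeName());
    }
}
```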

Besides, I drafted a Nutch RSS spec (attached). Please review :)

/Jack

On 4/16/05, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Jack Tang wrote:
> > If there are no more comments on this thread, I plan to implement it in the next two weeks.
> 
> Please note that I implemented an OpenSearch servlet a few days ago.
> (Folks who are interested in closely tracking Nutch development should
> subscribe to the nutch-commits list too.)
> 
> http://svn.apache.org/viewcvs.cgi/incubator/nutch/trunk/src/java/org/apache/nutch/searcher/OpenSearchServlet.java?view=markup
> 
> I have not yet written an OpenSearch description document page, but that
> should be easy to implement using JSP
> (http://opensearch.a9.com/spec/opensearchdescription/1.0/).
> 
> Nor have I yet done anything to generate HTML from the XML.  It would be
> great if you could help develop the architecture to generate HTML result
> pages from the servlet's XML, and to generate the opensearch description
> document.
> 
> I still intend to add a few things to the servlet.  In particular, I
> intend to dump all stored fields for each document into the item xml, in
> the nutch namespace.  I also intend to add navigation links to the channel.
> 
> Cheers,
> 
> Doug
>
Nutch RSS 1.0
"Nutch RSS presents search results using A9's OpenSearch extensions to RSS, 
plus a few Nutch-specific extensions."

Nutch RSS 1.0 is an extension to the RSS 2.0 standard, conforming to the 
guidelines for RSS extensibility as outlined by the RSS 2.0 specification.

Version 1.0 of Nutch RSS adds only five new elements, each within the nutch XML 
namespace. 

Future versions of Nutch RSS will attempt to maintain backwards compatibility 
with Nutch RSS 1.0. More complicated search extensions to RSS 2.0, such as ??? .

An example Nutch RSS 1.0 search result document would look like:


<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<rss version="2.0" xmlns="http://feedvalidator.org/docs/rss2.html"
  xmlns:nutch="http://www.nutch.org/opensearchrss/1.0/"
  xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/">
  <channel>
    <title>Nutch Search: java</title>
    
    <link>http://localhost/nutch-0.7-dev/opensearch?query=java&amp;start=0&amp;hitsPerPage=10&amp;hitsPerSite=2</link>
    <description>Nutch search results for query: java</description>
    <language>en-us</language>
    <openSearch:totalResults>4000</openSearch:totalResults>
    <openSearch:startIndex>0</openSearch:startIndex>
    <openSearch:itemsPerPage>10</openSearch:itemsPerPage>
    <nutch:itemsPerSite>2</nutch:itemsPerSite>
    <item>
      <title>Networking Apps With Rendezvous</title>
      <link>http://localhost/c/11028.html</link>
      <description>Avoid the difficulties of internetworked applications with this fast, robust framework package.</description>
      <nutch:cache>http://localhost/nutch-0.7-dev/cache.jsp?idx=0&amp;id=90</nutch:cache>
      <nutch:explain>http://localhost/nutch-0.7-dev/explain.jsp?idx=0&amp;id=90&amp;query=java</nutch:explain>
      <nutch:anchors>http://localhost/nutch-0.7-dev/anchors.jsp?idx=0&amp;id=90</nutch:anchors>
      <nutch:moreFromSite>http://localhost/nutch-0.7-dev/search.jsp?query=site:localhost+jsp&amp;hitsPerPage=10&amp;hitsPerSite=0</nutch:moreFromSite>
    </item>
    <item>
      <title>Secure Sockets with JSSE &amp; OpenSSL</title>
      <link>http://localhost/c/11201.html</link>
      <description>Find out how to implement robust secure communications between your clients and servers, including making your own certificate authority.</description>
      <nutch:cache>http://localhost/nutch-0.7-dev/cache.jsp?idx=0&amp;id=92</nutch:cache>
      <nutch:explain>http://localhost/nutch-0.7-dev/explain.jsp?idx=0&amp;id=92&amp;query=java</nutch:explain>
      <nutch:anchors>http://localhost/nutch-0.7-dev/anchors.jsp?idx=0&amp;id=69</nutch:anchors>
      <nutch:moreFromSite>http://localhost/nutch-0.7-dev/search.jsp?query=site:localhost+jsp&amp;hitsPerPage=10&amp;hitsPerSite=0</nutch:moreFromSite>
    </item>
  </channel>
</rss>



The XML Namespace:
The nutch namespace is specified as an attribute of the opening <rss/> element. 
The location of this XML schema is: 

http://www.nutch.org/opensearchrss/1.0/
Thus the xmlns declaration in the opening rss element appears as follows:

  <rss version="2.0" xmlns:nutch="http://www.nutch.org/opensearchrss/1.0/" ...>...</rss>

-----------------------------------------------------------------------------------------------
List of nutch elements introduced by this specification:



nutch:itemsPerSite - The maximum number of search results returned per site 
when a search spans multiple sites. 
Parent: channel 
Note: If no results are available, the server should respond with 0.
For example: <nutch:itemsPerSite>2</nutch:itemsPerSite> 
Optional. If this does not appear, the client should assume that 2 items are 
returned per site.

nutch:moreFromSite - A URL to the servlet(?) that provides more search 
results from the same site. 
Parent: item 
Note: If itemsPerSite is 0, the URL is useless.
For example: 
<nutch:moreFromSite>http://localhost/nutch-0.7-dev/opensearch?query=site%3Alocalhost+jsp&amp;hitsPerPage=10&amp;hitsPerSite=0</nutch:moreFromSite>
Optional. In practice, this should always exist when itemsPerSite > 0.

nutch:cache - A URL to cache.jsp, which shows the cached content from the 
page database. 
Parent: item 
For example: 
<nutch:cache>http://localhost/nutch-0.7-dev/cache.jsp?idx=0&amp;id=90</nutch:cache>
Optional. In practice, this should always exist. 


nutch:explain - A URL to explain.jsp, which explains the page scoring. 
Parent: item 
For example: 
<nutch:explain>http://localhost/nutch-0.7-dev/explain.jsp?idx=0&amp;id=90&amp;query=jsp</nutch:explain>
Optional. In practice, this should always exist. 


nutch:anchors - A URL to anchors.jsp, which returns the anchors of a hit 
document. 
Parent: item 
For example: 
<nutch:anchors>http://localhost/nutch-0.7-dev/anchors.jsp?idx=0&amp;id=69</nutch:anchors>
Optional. 

-------------------------------------------------------------------------------------------
List of standard OpenSearch elements given enhanced meaning by this 
specification:

openSearch:totalResults - The number of search results available. 
Parent: channel 
Note: If no results are available, the server should respond with 0.
For example: <openSearch:totalResults>0</openSearch:totalResults> 
Optional. If this does not appear, the client should assume all search results 
were returned in this request. 


openSearch:startIndex - The index of the first item returned in the result, 
starting with 1. 
Parent: channel 
Optional. If this does not appear, the client should assume that the first index 
returned is 1. 


openSearch:itemsPerPage - The maximum number of items that appear on one page. 
Parent: channel 
Optional. If this does not appear, the client should assume that 10 items are 
returned per page. 
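
A client could apply the defaults described above roughly as follows. This is only a sketch under assumptions: the class and method names are made up for illustration, it uses the JDK's namespace-aware DOM parser, and it takes the namespace URIs from the example document. The fallback for totalResults is the number of item elements in the response, which is my reading of "all search results were returned in this request".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ChannelDefaults {
    static final String OPENSEARCH_NS = "http://a9.com/-/spec/opensearchrss/1.0/";
    static final String NUTCH_NS = "http://www.nutch.org/opensearchrss/1.0/";

    // Parses a result document with namespace support switched on.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse result document", e);
        }
    }

    // Integer value of a channel-level element, or the fallback if it is absent.
    static int channelInt(Document doc, String ns, String localName, int fallback) {
        NodeList nodes = doc.getElementsByTagNameNS(ns, localName);
        for (int i = 0; i < nodes.getLength(); i++) {
            Node parent = nodes.item(i).getParentNode();
            if (parent != null && "channel".equals(parent.getLocalName())) {
                return Integer.parseInt(nodes.item(i).getTextContent().trim());
            }
        }
        return fallback;
    }

    static int startIndex(Document doc) {
        return channelInt(doc, OPENSEARCH_NS, "startIndex", 1);
    }

    static int itemsPerPage(Document doc) {
        return channelInt(doc, OPENSEARCH_NS, "itemsPerPage", 10);
    }

    static int itemsPerSite(Document doc) {
        return channelInt(doc, NUTCH_NS, "itemsPerSite", 2);
    }

    static int totalResults(Document doc) {
        // Assumption: with no totalResults, every hit is in this response.
        int itemCount = doc.getElementsByTagName("item").getLength();
        return channelInt(doc, OPENSEARCH_NS, "totalResults", itemCount);
    }
}
```

Matching on namespace URI rather than on the prefix means the client keeps working if a server picks a different prefix for the same namespace.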


------------------------------------------------------------------------------------------
List of standard RSS elements given enhanced meaning by this specification:


title - The name of the search result provider. 
Parent: channel 
Notes: Should contain the search terms themselves. 
Restrictions: Cannot contain HTML markup. 
Required 



link - A URL to the website that provides the search results. 
Parent: channel 
Notes: Should return a URL that can recreate the search in HTML format. 
Restrictions: Cannot contain HTML markup. 
Required 


description - Phrase or sentence describing the search. 
Parent: channel 
Restriction: Can contain simple escaped HTML markup, such as <b> and <i> 
elements. 
Required 


title - The text title of the search result. 
Parent: item 
Restrictions: Cannot contain HTML markup. 
Optional. In practice, this should always exist. 


link - The URL pointing to the content referred to in the search result. 
Parent: item 
Restrictions: Cannot contain HTML markup. 
Optional. However, at least one of either link or description must exist, and 
usually both elements do. 


description - A text snippet describing the search result. 
Parent: item 
Restriction: Can contain simple escaped HTML markup, such as <b>, <i>, <a>, and 
<img> elements. 
Optional. However, at least one of either link or description must exist, and 
usually both elements do. 
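
The rule that each item needs at least one of link or description can be checked with a few lines of DOM code. This is a sketch only; ItemCheck and its methods are hypothetical names, and it uses a plain (non-namespace-aware) JDK parser because link and description carry no prefix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class ItemCheck {
    // Parses a fragment and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse item", e);
        }
    }

    // True if the parent has a direct child element with the given name.
    static boolean hasChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return true;
            }
        }
        return false;
    }

    // An item is acceptable when link or description (or both) is present.
    static boolean isValidItem(Element item) {
        return hasChild(item, "link") || hasChild(item, "description");
    }
}
```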
