Hi,
I suggest you implement your own query filter plugin and pass your custom query syntax to this plugin. Inside the query filter implementation class you have full access to the Lucene query object. If you do it this way you don't need to take care of converting hit objects. However, take a look at the NutchBean if you are interested in getting an idea of how the conversion is done.
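
To illustrate the idea only, here is a minimal, self-contained sketch of a filter that picks clauses for one custom field out of a parsed query and adds them to the query being built. Every type in it (Clause, ParsedQuery, TranslatedQuery, QueryFilterSketch, CustomFieldFilter) is a hypothetical stand-in invented for this example, not the real Nutch or Lucene classes, and the "field:term" syntax is an assumption. A real plugin would implement Nutch's own query filter extension point and work on the actual Lucene query object.

```java
import java.util.ArrayList;
import java.util.List;

// All types below are hypothetical stand-ins, NOT the Nutch/Lucene API.

// One "field:term" clause; an empty field means the default field.
final class Clause {
    final String field;
    final String term;

    Clause(String field, String term) {
        this.field = field;
        this.term = term;
    }
}

// Stand-in for the parsed user query handed to a filter.
final class ParsedQuery {
    final List<Clause> clauses = new ArrayList<>();

    // Splits on whitespace; "site:apache.org" becomes field "site", term "apache.org".
    static ParsedQuery parse(String text) {
        ParsedQuery query = new ParsedQuery();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            int colon = token.indexOf(':');
            if (colon > 0) {
                query.clauses.add(new Clause(token.substring(0, colon), token.substring(colon + 1)));
            } else {
                query.clauses.add(new Clause("", token));
            }
        }
        return query;
    }
}

// Stand-in for the index-level query that the filters build up together.
final class TranslatedQuery {
    final List<String> required = new ArrayList<>();

    void addRequired(String field, String term) {
        required.add(field + ":" + term);
    }
}

// Stand-in for the filter contract: take the parsed query, extend the translation.
interface QueryFilterSketch {
    TranslatedQuery filter(ParsedQuery input, TranslatedQuery translation);
}

// Handles exactly one custom field and ignores every other clause.
final class CustomFieldFilter implements QueryFilterSketch {
    private final String field;

    CustomFieldFilter(String field) {
        this.field = field;
    }

    @Override
    public TranslatedQuery filter(ParsedQuery input, TranslatedQuery translation) {
        for (Clause clause : input.clauses) {
            if (field.equals(clause.field)) {
                translation.addRequired(clause.field, clause.term);
            }
        }
        return translation;
    }
}
```

In this sketch, passing ParsedQuery.parse("nutch site:apache.org") through new CustomFieldFilter("site") leaves the default-field term for other filters and adds only the "site" clause to the translation.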

HTH
Stefan



On 24.11.2005 at 14:00, Bruno Patini Furtado wrote:

Hi,
I need to extract the content of some web pages crawled by Nutch. This is the same functionality behind the cached link on the result pages of the
webapp that comes with this great project.

As I had to use a more complex query language than the one provided by Nutch, I'm running the queries directly against the Lucene index using the Lucene query
language.

As a side effect of this, my search results are a Lucene Hits instance, which cannot be used as a parameter to the operation getDetails:

package org.apache.nutch.searcher;
class NutchBean {
    ...
    public HitDetails getDetails(org.apache.nutch.searcher.Hit hit)...
}

That would return a HitDetails instance from which I could get a URL's
content using the operation getValues:

HitDetails details = nutch.getDetails(nutchHit);
String webPageContent = details.getValues("content")[0];

So my problem is:

   - how can I get the content of a crawled URL by accessing the
     Lucene index directly? or
   - how can I get a Nutch Hit object from a Lucene Hits object? or
   - is there any other way to retrieve the content of a crawled URL?

Any tip or suggestion will be most appreciated :)


--
"Minds are like parachutes, they work best when open."

Bruno Patini Furtado
Software Developer
webpage: www.bpfurtado.net
blog: http://www.livejournal.com/users/bpfurtado/



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
