Hi,
I suggest you implement your own query filter plugin and pass your
custom query syntax to that plugin.
Inside the query filter implementation class you have full access
to the Lucene query object.
If you do it this way, you don't need to take care of converting hit
objects.
However, take a look at NutchBean if you are interested in getting an
idea of how the conversion is done.
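As a rough illustration of the query filter idea, here is a minimal
sketch. It does not use the real Nutch or Lucene classes: ParsedQuery,
LuceneLikeQuery, QueryFilterSketch and the "custom" field name are all
hypothetical stand-ins so that the example compiles on its own. In Nutch
the corresponding extension point is, as far as I know,
org.apache.nutch.searcher.QueryFilter, so please check the actual
interface before copying anything from here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the parsed Nutch query handed to a filter.
final class ParsedQuery {
    final List<String> clauses = new ArrayList<>();

    // Stores each clause as "field:term" and returns this for chaining.
    ParsedQuery add(String field, String term) {
        clauses.add(field + ":" + term);
        return this;
    }
}

// Hypothetical stand-in for the Lucene query that the filters build up.
final class LuceneLikeQuery {
    final List<String> required = new ArrayList<>();
}

// Assumed shape of a query filter: it receives the parsed input and the
// translation built so far, and returns the (possibly extended) translation.
interface QueryFilterSketch {
    LuceneLikeQuery filter(ParsedQuery input, LuceneLikeQuery translation);
}

// Handles only clauses in the hypothetical "custom" field and copies the
// raw custom syntax into the translated query; other clauses are ignored.
final class CustomSyntaxFilter implements QueryFilterSketch {
    private static final String PREFIX = "custom:";

    @Override
    public LuceneLikeQuery filter(ParsedQuery input, LuceneLikeQuery translation) {
        for (String clause : input.clauses) {
            if (clause.startsWith(PREFIX)) {
                translation.required.add(clause.substring(PREFIX.length()));
            }
        }
        return translation;
    }
}
```

The point of the sketch is only the division of labour: the filter sees
the clauses that belong to its own field and adds to the translated query
directly, so the search still runs through the normal Nutch path and
returns Nutch Hit objects.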
HTH
Stefan
On 24.11.2005 at 14:00, Bruno Patini Furtado wrote:
Hi,
I need to extract the content of some web pages crawled by Nutch.
This is the same functionality behind the cached link on the result pages
of the webapp that comes with this great project.
As I had to use a more complex query language than the one provided by
Nutch, I'm running the queries directly against the Lucene index using the
Lucene query language.
As a side effect, my search results are a Lucene Hits class instance,
which cannot be used as a parameter to the operation getDetails:
package org.apache.nutch.searcher;
class NutchBean {
...
public HitDetails getDetails(org.apache.nutch.searcher.Hit hit)...
}
That would return a HitDetails instance from which I could get a URL's
content using the operation getValues:
HitDetails details = nutch.getDetails(nutchHit);
String webPageContent = details.getValues("content")[0];
So my problem is:
- how can I get the content of a crawled URL by accessing the Lucene
  index directly? or
- how can I get a Nutch Hit object from a Lucene Hits object? or
- is there any other way to retrieve the content of a crawled URL?
Any tip or suggestion will be most appreciated :)
--
"Minds are like parachutes, they work best when open."
Bruno Patini Furtado
Software Developer
webpage: www.bpfurtado.net
blog: http://www.livejournal.com/users/bpfurtado/
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general