[Nutch-general] Re: getting page content for nutch search result

Leslie Rohde Thu, 26 Jan 2006 09:59:03 -0800

duh!  Simple as I thought, just much higher up in the "stack".


public byte[] getContent(HitDetails hit) in NutchBean

This works for me.

Although, BTW, digging into this method, it would be
convoluted to pass around a reference to the document
content w/o the NutchBean instance. NutchBean ties
together the several pieces of data required to get
to the documents found via a query. As far as I can
tell at this point, the bean is the only place these
individual parts are brought together.
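
For anyone who wants to do something with the byte[] that comes back, here is a rough sketch of the shape of it. It does not touch the real Nutch classes: HitDetails, ContentSource and InMemoryBean below are stand-ins made up for illustration, and only the getContent(HitDetails) signature is taken from above. Decoding as UTF-8 is also an assumption; a real page may declare a different charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Stand-in types only: none of these are the real Nutch classes.
class HitContentSketch {

    /** Hypothetical stand-in for HitDetails; here it carries only a URL. */
    static final class HitDetails {
        final String url;
        HitDetails(String url) { this.url = url; }
    }

    /** Models the single method quoted above: raw page bytes for a hit. */
    interface ContentSource {
        byte[] getContent(HitDetails hit);
    }

    /** In-memory stand-in for the bean, so the sketch runs without an index. */
    static final class InMemoryBean implements ContentSource {
        private final Map<String, byte[]> pages = new HashMap<>();

        void store(String url, String html) {
            pages.put(url, html.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public byte[] getContent(HitDetails hit) {
            // Returns null when nothing is stored for the hit's URL.
            return pages.get(hit.url);
        }
    }

    /** Fetches the bytes for a hit and decodes them, assuming UTF-8. */
    static String pageText(ContentSource bean, HitDetails hit) {
        byte[] raw = bean.getContent(hit);
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        InMemoryBean bean = new InMemoryBean();
        bean.store("http://example.com/", "<html><body>hello</body></html>");
        HitDetails hit = new HitDetails("http://example.com/");
        System.out.println(pageText(bean, hit));
    }
}
```

The point of routing everything through one ContentSource is the same as the observation above: the caller holds on to the bean and asks it for content per hit, rather than carrying the content around separately.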

leslie



Leslie Rohde wrote:


The list archives contain _some_ questions and answers on this,
but nothing that is definitive.  What I want is like the "cache"
button in google results pages -- not as a user interface feature,
but to be able to access and process the entire web page that
is referenced by a nutch search result.

Page does not do the trick (despite the name).
Content I have not figured out.
SegmentReader was suggested in one of the archived messages,
but the relation it has to search results is far from clear.

I gotta' believe that this is simple and I just don't know where
to look.  All pointers appreciated.

Thanks,
Leslie.



