duh! Simple as I thought, just much higher up in the "stack".
public byte[] getContent(HitDetails hit) in NutchBean
This works for me.
Although, BTW, digging into this method, it would be
convoluted to pass around a reference to the document
content w/o the NutchBean instance. NutchBean ties
together the several pieces of data required to get
to the documents found via a query. As far as I can
tell at this point, the bean is the only place these
individual parts are brought together.
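
One way to keep the bean in one place and still hand the content
lookup to other code is to hide it behind a one-method interface.
What follows is only a minimal sketch under assumptions: StubDetails,
StubBean, ContentSource and CachedPageDemo are hypothetical stand-ins
made up for illustration, not Nutch classes. They mimic only the shape
of the call quoted above (hit details in, raw page bytes out).

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the details of one search hit (NOT a Nutch class).
class StubDetails {
    final String url;
    StubDetails(String url) { this.url = url; }
}

// The single call that the processing code needs from the bean.
interface ContentSource {
    byte[] getContent(StubDetails hit);
}

// Hypothetical stand-in for the bean: it owns the data that ties a hit
// to the stored page content.
class StubBean implements ContentSource {
    private final Map<String, byte[]> store = new HashMap<>();

    void put(String url, String html) {
        store.put(url, html.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public byte[] getContent(StubDetails hit) {
        // Returns null when no content is stored for this hit.
        return store.get(hit.url);
    }
}

class CachedPageDemo {
    // Depends only on ContentSource, so the full bean is never passed around.
    static int pageLength(ContentSource source, StubDetails hit) {
        byte[] raw = source.getContent(hit);
        return raw == null ? -1 : raw.length;
    }

    public static void main(String[] args) {
        StubBean bean = new StubBean();
        bean.put("http://example.com/", "<html>hello</html>");
        StubDetails hit = new StubDetails("http://example.com/");
        System.out.println(pageLength(bean, hit));
    }
}
```

With the real bean, the adapter would presumably just forward to the
getContent call quoted above. Everything else in the sketch is an
assumption.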
leslie
Leslie Rohde wrote:
The list archives contain _some_ questions and answers on this,
but nothing that is definitive. What I want is like the "cache"
button in google results pages -- not as a user interface feature,
but to be able to access and process the entire web page that
is referenced by a Nutch search result.
Page does not do the trick (despite the name).
Content I have not figured out.
SegmentReader was suggested in one of the archived messages,
but the relation it has to search results is far from clear.
I gotta' believe that this is simple and I just don't know where
to look. All pointers appreciated.
Thanks,
Leslie.
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general