Source code of web pages crawled by Nutch

Gaurang Patel Mon, 11 May 2009 14:16:08 -0700

Hi All,

Can anyone help me with this problem?


Here is my problem:

I want to get the source code of the hits returned by the Nutch crawler. I am
not sure whether Nutch stores the content of a web page (i.e. the actual source
code of the page) in the crawl results. I am afraid it does not!

If Nutch does store this content, do you have any idea how I can retrieve it
using the Nutch libraries? I have my eye on these classes: NutchBean, Hit,
HitDetails. Maybe one of them has a method that gives me the contents of the
page, but I am losing hope, as I have not found any method that returns the
content of a web page.

Any kind of help is appreciated.

Regards,
Gaurang
