Well, if I'm not mistaken, you should look in segments/xxxxxxxx/content/part-xxxxx,
but you don't get the HTML there, only the content and/or the metadata, if I
understand Nutch correctly.
Not sure why you want to read the HTML.

-Ray-

2009/4/27 sgirao <[email protected]>

>
> Hello, I'm new at this. I'm using Nutch version 1.0, and I want to
> retrieve the HTML that I crawled.
> I used the wiki http://wiki.apache.org/nutch/ to understand how Nutch
> works.
> I know the crawled pages are in the segments folder, but I have been
> searching for how to get the HTML and I can't find anything!
> If anyone can help me, I'd appreciate it.
>
> P.S. - Forgive my English.
>
> --
> View this message in context:
> http://www.nabble.com/How-to-get-the-html-that-i-crawled-tp23254318p23254318.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
>
