Well, you should look in segments/xxxxxxxx/content/part-xxxxx, if I'm not mistaken. But if I understand Nutch correctly, you don't get the HTML there, only the content and/or the metadata. I'm not sure why you want to read the HTML.
-Ray-

2009/4/27 sgirao <[email protected]>

>
> Hello, i'm new at this, i'm using the nutch version 1.0 , and i want to
> retrieved the html that i crawl.
> I use the wiki http://wiki.apache.org/nutch/ to understand how works the
> nucth.
> I know the things that was crawled are in the folder segments, but i was
> searching how to get the html and i don't find nothing!
> If anyone can help me i appreciated.
>
> P.S. - Forgive my English.
>
> --
> View this message in context:
> http://www.nabble.com/How-to-get-the-html-that-i-crawled-tp23254318p23254318.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
