Sorry, the output of nutch "readseg -get" in my previous posting was wrong. Here is the actual output:

-Corrado

SegmentReader: get 'http://testmachine.test.net/index.html'
Content::
Version: 2
url: http://testmachine.test.net/index.html
base: http://testmachine.test.net/index.html
contentType: text/html
metadata: Content-Length=345 Connection=close ETag="2f4ac-159-421166c12a140" nutch.segment.name=20061108113703 nutch.crawl.score=1.0 Recommended=plugins nutch.content.digest=82e307c71d7476ce729a8e6d3b0de50a Accept-Ranges=bytes Server=Apache/2.2.0 (Fedora) Content-Type=text/html; charset=UTF-8 date=Wed, 08 Nov 2006 10:37:57 GMT Last-Modified=Tue, 31 Oct 2006 07:34:53 GMT
Content:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<HTML>
<HEAD>
<TITLE>
PLUG-IN TEST
</TITLE>
</HEAD>
<meta name="recommended" content="plugins">
<A HREF="http://testmachine.test.net/omniORB/index.html">omniORB</A>
<BR>
<A HREF="http://testmachine.test.net/nutch/index.html">Nutch</A>
</HTML>

Crawl Generate::
Version: 4
Status: 1 (DB_unfetched)
Fetch time: Wed Nov 08 11:36:31 CET 2006
Modified time: Thu Jan 01 01:00:00 CET 1970
Retries since fetch: 0
Retry interval: 30.0 days
Score: 1.0
Signature: null
Metadata: null

Crawl Fetch::
Version: 4
Status: 5 (fetch_success)
Fetch time: Wed Nov 08 11:37:58 CET 2006
Modified time: Thu Jan 01 01:00:00 CET 1970
Retries since fetch: 0
Retry interval: 30.0 days
Score: 1.0
Signature: 82e307c71d7476ce729a8e6d3b0de50a
Metadata: null

Crawl Parse::
Version: 4
Status: 4 (linked)
Fetch time: Wed Nov 08 11:38:05 CET 2006
Modified time: Thu Jan 01 01:00:00 CET 1970
Retries since fetch: 0
Retry interval: 30.0 days
Score: 0.5
Signature: null
Metadata: null

ParseData::
Version: 5
Status: success(1,0)
Title: PLUG-IN TEST
Outlinks: 2
outlink: toUrl: http://testmachine.test.net/omniORB/index.html anchor: omniORB
 outlink: toUrl: http://testmachine.test.net/nutch/index.html anchor: Nutch
Content Metadata: Connection=close Content-Length=345 nutch.crawl.score=1.0 nutch.segment.name=20061108113703 ETag="2f4ac-159-421166c12a140" Recommended=plugins nutch.content.digest=82e307c71d7476ce729a8e6d3b0de50a Accept-Ranges=bytes Content-Type=text/html; charset=UTF-8 Server=Apache/2.2.0 (Fedora) Last-Modified=Tue, 31 Oct 2006 07:34:53 GMT date=Wed, 08 Nov 2006 10:37:57 GMT
Parse Metadata: OriginalCharEncoding=UTF-8 CharEncodingForConversion=UTF-8

ParseText::
PLUG-IN TEST omniORB Nutch
