OK, I've made the following observation: the contentLength field seems to be
populated for .html files but not for .php files.  I renamed a file from
sitemap.php to sitemap.html (the very same file, hence the identical digests
below) and reindexed:


        * segment = 20080806143531
        * digest = 2aefb939d9d04fe5c75261ba490396c2
        * url = http://localhost/www.hdlservices.com/sitemap.php
        * title = HDL Services. Website map
        * tstamp = 20080806183058514
        * type = text/html
        * primaryType = application/octet-stream
        * boost = 0.7468931


        * segment = 20080806165517
        * digest = 2aefb939d9d04fe5c75261ba490396c2
        * url = http://localhost/www.hdlservices.com/sitemap.html
        * title = HDL Services. Website map
        * tstamp = 20080806205506892
        * lastModified = 1218054823000
        * contentLength = 11322
        * type = text/html
        * primaryType = application/octet-stream
        * boost = 0.5714286
Any particular reason why this is the case?
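
In case it helps narrow this down, here is a rough check I can run against
both URLs. It is only a sketch, and it rests on an assumption I have not
confirmed: that index-more copies contentLength from the HTTP Content-Length
response header, so a page served without that header would end up with no
field. The helper names (content_length, fetch_headers, compare) are made up
for this example and are not part of Nutch.

```python
import urllib.request


def content_length(headers):
    """Return the Content-Length header as an int, or None if it is absent.

    Header names are matched case-insensitively, as HTTP requires.
    """
    for name, value in headers.items():
        if name.lower() == "content-length":
            return int(value.strip())
    return None


def fetch_headers(url):
    """Fetch url with a GET request and return its response headers as a dict."""
    with urllib.request.urlopen(url) as response:
        return dict(response.headers.items())


def compare(urls):
    """Map each URL to its Content-Length (None when the server omits it)."""
    return {url: content_length(fetch_headers(url)) for url in urls}
```

Calling compare() with the sitemap.php and sitemap.html URLs above would show
whether the server omits Content-Length for the PHP page. If it does, that
would fit the assumption, but it would not prove that this is how the field
is populated.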

Thanks in advance.


 Hilkiah G. Lavinier MEng (Hons), ACGI 
6 Winston Lane, 
Goodwill, 
Roseau, Dominica


Mbl: (767) 275 3382
Fax: (767) 440 4991
VoIP (646) 432 4487


Email: [EMAIL PROTECTED]
Email: [EMAIL PROTECTED]
IM: Yahoo hilkiah / MSN [EMAIL PROTECTED]
IM: ICQ #8978201  / AOL hilkiah21



----- Original Message ----
From: Hilkiah Lavinier <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, August 6, 2008 4:08:27 PM
Subject: index-more and contentLength field

Hi Guys,

Can anyone explain how the contentLength field is populated?  I've been
indexing a few sites, and some documents have this field while others
don't.  I've looked through the MoreIndexingFilter.java, ParseData.java,
HttpHeaders.java and Metadata.java source files, as well as the logs of the
various crawls (fetch...index), but I can't figure out why.

I'm using Nutch trunk with index-more and query-more enabled.

Regards,

Hilkiah G. Lavinier MEng (Hons), ACGI 

