Probably the best way to check the file type is to read the
response header from the server, since it includes a MIME
type (the Content-Type header). If you do not collect that
info but go by the file endings instead, what happens when
you stumble across a script that generates an Excel sheet,
or a script that generates a GIF image? Well, you will
probably record the file as a script, not as a GIF image or
an Excel sheet.
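
To make the difference concrete, here is a minimal sketch in Python. The function names and the example URLs are made up for illustration, they are not part of ht://Dig. It records the MIME type when the server's Content-Type value is known, and only falls back to the file ending when it is not.

```python
import posixpath
from urllib.parse import urlsplit


def extension_of(url):
    """Return the lowercased file ending of the URL path, or '' if none."""
    path = urlsplit(url).path  # ignores the query string, e.g. "?id=3"
    return posixpath.splitext(path)[1].lower()


def media_type_of(content_type_header):
    """Strip parameters such as '; charset=utf-8' from a Content-Type value."""
    return content_type_header.split(";", 1)[0].strip().lower()


def classify(url, content_type_header=None):
    """Prefer the server's MIME type; fall back to the file ending."""
    if content_type_header:
        return media_type_of(content_type_header)
    return extension_of(url) or "(no extension)"


if __name__ == "__main__":
    # Hypothetical script that generates a GIF image.
    url = "http://www.example.com/cgi-bin/chart.cgi"
    print(classify(url))               # judged by file ending only
    print(classify(url, "image/gif"))  # judged by the response header
```

Going by the ending alone, the chart script is recorded as ".cgi"; with the header it is recorded as "image/gif".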

On the other hand, it might not be important to record how
many documents of a dynamic nature a script can produce.
But if you have a web server that generates all its content
from a database instead of a filesystem, then you might want
to know how many HTML pages the script produces, and so on.

I guess it's easy to gather file endings. I have a script
that sorts all the URLs that HTDIG has passed through and
presents them on a page (so that our webadmin can check
which files have been indexed), and getting that log from
HTDIG is easy.
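
Counting the file endings from such a list could look roughly like the sketch below. This is not my actual script. It assumes you already have the indexed URLs as a plain list (for example one URL per line in a text file), and how you get that list out of the log is left open here.

```python
from collections import Counter
import posixpath
from urllib.parse import urlsplit


def count_endings(urls):
    """Count how many URLs share each file ending (case-insensitive)."""
    counts = Counter()
    for url in urls:
        path = urlsplit(url).path  # drop the query string before splitting
        ending = posixpath.splitext(path)[1].lower() or "(none)"
        counts[ending] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical URLs standing in for the list taken from the log.
    urls = [
        "http://www.example.com/index.html",
        "http://www.example.com/about.HTML",
        "http://www.example.com/img/logo.gif",
        "http://www.example.com/",
    ]
    for ending, n in sorted(count_endings(sorted(urls)).items()):
        print(ending, n)
```

Note that a URL like "/cgi-bin/chart.cgi" would still be counted under ".cgi" here, whatever the script actually returns.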

When it comes to gathering the server response headers, I
really don't know if that information is stored in the
logs. But you can always take the list of URLs and, with
the help of "curl", go through that list and save only the
MIME types from the responses.

(curl comes with Mac OS X, I believe. And from your
email address I guess you're a Mac user.)
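
The curl idea could be sketched like this in Python. It is only a sketch under a few assumptions: curl is on your PATH, the URL list is already in memory, and the function names are made up for illustration. It runs "curl -s -I -L" for each URL (silent, headers only, follow redirects) and keeps only the Content-Type value. Some servers answer a header-only request differently from a normal one, so treat the result as a good guess rather than a guarantee.

```python
import subprocess


def parse_content_type(header_text):
    """Return the media type from raw response headers, or None if absent."""
    found = None
    for line in header_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-type":
            # Drop parameters such as "; charset=UTF-8".
            found = value.split(";", 1)[0].strip().lower()
    # The last match wins, i.e. the final response after any redirects.
    return found


def fetch_content_type(url):
    """Ask curl for the response headers of one URL and parse them."""
    result = subprocess.run(
        ["curl", "-s", "-I", "-L", url],
        capture_output=True, text=True, timeout=30,
    )
    return parse_content_type(result.stdout)


def record_mime_types(urls):
    """Map each URL to the MIME type the server reports (None if unknown)."""
    return {url: fetch_content_type(url) for url in urls}


if __name__ == "__main__":
    # Hypothetical URL; replace with the list taken from your own log.
    for url, mime in record_mime_types(["http://www.example.com/"]).items():
        print(url, mime)
```

The parsing is kept separate from the curl call so you can check it against saved header output without touching the network.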

Best Regards
Martin Quensel



_______________________________________________
ht://Dig general mailing list: <[email protected]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
