Probably the best way to check the file type is to read the response header from the server, which includes a MIME type (the Content-Type header). If you do not collect that information but go by the file endings instead, what happens when you stumble across a script that generates an Excel sheet, or a script that generates a GIF image? You will probably record the file as a script, not as a GIF image or an Excel sheet.
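A minimal Python sketch of the difference, using only the standard library. The function names are hypothetical. `mime_from_extension` guesses the type from the file ending of the URL path, which is all a log of file endings can tell you. `mime_from_header` sends a HEAD request and reads the Content-Type the server actually reports. This assumes the server answers HEAD requests sensibly, which not every dynamic script does.

```python
import mimetypes
import urllib.request
from urllib.parse import urlparse


def mime_from_extension(url):
    """Guess a MIME type from the file ending of the URL path only.

    Returns None when the ending is missing or unknown.
    """
    path = urlparse(url).path  # drop scheme, host and query string
    mime, _encoding = mimetypes.guess_type(path)
    return mime


def mime_from_header(url, timeout=10):
    """Send a HEAD request and return the Content-Type the server reports.

    Parameters such as '; charset=...' are stripped. If the server sends no
    Content-Type at all, the email.message default 'text/plain' is returned.
    HTTP errors (4xx/5xx) are raised as urllib.error.HTTPError.
    """
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.headers.get_content_type()
```

For a hypothetical URL such as `http://example.com/cgi-bin/chart.cgi`, `mime_from_extension` only sees the `.cgi` ending. `mime_from_header` returns whatever the script declares, for example `image/gif`.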
On the other hand, it might not be important to record how many documents of a dynamic nature a script can produce. But if you have a web server that generates all its content from a database rather than from a filesystem, you might want to know how many HTML pages the script produces, etc.

I guess it's easy to gather file endings. I have a script that sorts all the URLs that HTDIG has passed through and presents them on a page (so that our web admin can check which files have been indexed), and getting that log from HTDIG is easy. When it comes to gathering the server response headers, I really don't know whether that information is stored in the logs. But you can always take the list of URLs and, with the help of "curl", go through that list and save only the MIME responses. (curl comes with Mac OS X, I believe, and from your email address I guess you're a Mac user.)

Best Regards
Martin Quensel

_______________________________________________
ht://Dig general mailing list: <[email protected]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.) https://lists.sourceforge.net/lists/listinfo/htdig-general
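The curl approach mentioned in the message can be sketched in Python as well. This is an illustration under assumptions, not part of HTDIG. It expects a plain list of URLs (for example, one exported from the sorted HTDIG URL log, one URL per line). It tallies the file endings, then sends one HEAD request per URL and tallies the Content-Type values the servers report. That is roughly what running `curl -sI` on each URL and keeping only the Content-Type line would give you. All function names are hypothetical.

```python
import collections
import posixpath
import urllib.request
from urllib.parse import urlparse


def file_ending(url):
    """Return the lowercased file ending of the URL path, or '' if none."""
    _, ext = posixpath.splitext(urlparse(url).path)
    return ext.lower()


def count_endings(urls):
    """Tally how many URLs share each file ending (no network access)."""
    return collections.Counter(file_ending(u) for u in urls)


def count_content_types(urls, timeout=10):
    """Send one HEAD request per URL and tally the reported Content-Type.

    URLs that are malformed, unreachable or answer with an HTTP error
    are counted under the key 'error'.
    """
    counts = collections.Counter()
    for url in urls:
        try:
            # Request() itself raises ValueError for a malformed URL.
            req = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                counts[resp.headers.get_content_type()] += 1
        except (OSError, ValueError):  # URLError/HTTPError subclass OSError
            counts["error"] += 1
    return counts
```

Comparing `count_endings(urls).most_common()` with `count_content_types(urls).most_common()` shows how many `.cgi` or `.php` URLs actually turned out to be HTML pages, images or spreadsheets. The second call makes one request per URL, so it is best run only against your own server.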

