Simon Wilcox wrote:

Yes, in this case I know there are no messy rewrites going on (it's a site from an IIS server :)

The specific case I have is an old site with close to 1,000 files in its
directory tree, of which only 380 are actually referenced from the
navigable pages. Even allowing for some pages that are standalone, that's a lot of cruft I can clean out!

Er... wouldn't some kind of web spider be the obvious approach? E.g. find one that works (no clues available here), or trundle a search engine such as htdig over the site, which "should"(R)(tm) hit all visible pages. Then subtract the access log contents from the directory listing, and Bob's your uncle.

That's "should" because any javascript-dependent links, or bugs in htdig, would leave some pages unvisited.
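
The subtraction step could look something like the rough Python sketch below. It is only an illustration under some assumptions: the access log is in Common Log Format (IIS normally writes W3C extended logs, so the regex would need adapting there), a request for a directory is served by a file called index.html, and the function names are made up for this example.

```python
import os
import re
from urllib.parse import unquote, urlsplit

# Assumption: Common Log Format, where the request appears as
# "GET /some/path HTTP/1.0" inside double quotes.
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+)[^"]*"')


def requested_paths(log_lines, index_name="index.html"):
    """Return the set of site-relative paths requested in the log."""
    seen = set()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip lines that do not look like requests
        # Drop the query string, decode %xx escapes, strip the leading slash.
        path = unquote(urlsplit(match.group(1)).path).lstrip("/")
        # Assumption: a directory request is answered by index_name.
        if path == "" or path.endswith("/"):
            path += index_name
        seen.add(path)
    return seen


def files_under(doc_root):
    """Return every file below doc_root as a forward-slash relative path."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), doc_root)
            found.add(rel.replace(os.sep, "/"))
    return found


def unreferenced(doc_root, log_lines):
    """Files on disk that never show up in the access log."""
    return sorted(files_under(doc_root) - requested_paths(log_lines))
```

Feed it the document root and the log lines written while the spider ran, e.g. `unreferenced("/path/to/site", open("access.log"))`. The comparison is case-sensitive, which may not match how the original server resolved names, so treat the result as a list of candidates to check by hand rather than files that are safe to delete.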

Cheers

Tim




