I have just been tasked with finding/building a tool that can 1) spider /
build a site map of our web site and 2) copy identified files to a new
directory structure.  What we are looking for is to take a site with
more than a decade of flotsam and jetsam, separate the fluff from the
chaff (how's that for metaphor mixing), and possibly move the good
content to a new location.  If we can't move it automatically, we
should at least create a report of what should be moved.
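
Once a spider has produced the set of files that are actually referenced, the "report or copy" half is fairly mechanical. Below is a rough Python sketch using only the standard library. The function names (find_orphans, plan_moves, apply_moves) are made up for illustration. It assumes the spider's URLs have already been mapped to paths relative to the old web root; that mapping depends on your server setup and is not shown.

```python
import shutil
from pathlib import Path


def find_orphans(old_root: str, referenced: set[str]) -> list[str]:
    """Files under old_root (POSIX paths relative to it) that nothing links to."""
    root = Path(old_root)
    on_disk = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    return sorted(on_disk - set(referenced))


def plan_moves(old_root: str, new_root: str, keep) -> dict:
    """Map each kept file (relative to old_root) to the same relative path under new_root."""
    old, new = Path(old_root), Path(new_root)
    return {old / rel: new / rel for rel in sorted(keep)}


def apply_moves(plan: dict, dry_run: bool = True) -> list[str]:
    """Return a report of the plan; only copy files when dry_run is False."""
    report = []
    for src, dst in plan.items():
        if not src.is_file():
            report.append(f"MISSING {src}")
            continue
        report.append(f"COPY {src} -> {dst}")
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
    return report
```

Defaulting to a dry run gives you the "report of what should be moved" for free. Copying rather than moving leaves the old site intact until someone has reviewed the report and the find_orphans list.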

I'm not too concerned with the traditional links in html/cfml files; I
am confident I can find/modify/build something that would work through
this part of our site's content.  The part I am unfamiliar with is how to
get at the links inside thousands of PDF documents used on our site.  I
imagine there must be some way to scan and parse links out of PDFs.  If
we can parse these files to search for content, then this should be
doable.  But how would one do this, and has it already been done?

Can anybody provide any suggestions or pointers or other concerns?

Subscription: http://www.houseoffusion.com/groups/CF-Talk/subscribe.cfm