I have just been tasked with finding or building a tool that can 1) spider our web site and build a site map, and 2) copy identified files to a new directory structure. The goal is to take a site with more than a decade of flotsam and jetsam, separate the fluff from the chaff (how's that for metaphor mixing), and possibly move the good content to a new location. If the tool can't move the content automatically, it should at least produce a report of what should be moved.
I'm not too concerned with the traditional links in HTML/CFML files; I am confident I can find, modify, or build something that will work through that part of our site's content. The part I am unfamiliar with is how to get at the links inside the thousands of PDF documents used on our site. I imagine there must be some way to scan PDFs and parse links out of them. If these files can be parsed to search their content, then this should be doable. But how would one do it, and has it already been done? Can anybody provide suggestions, pointers, or other concerns I should consider?

TIA
Ian
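
P.S. To make the PDF part concrete, here is roughly the kind of thing I have in mind, as a minimal sketch in Python rather than a working solution. It assumes the links are stored as URI actions written as literal strings in uncompressed objects, so it will miss links inside compressed object streams, hex-encoded strings, and URLs that are just typed into the page text; a real PDF library would be needed for those. The function names and the site directory are made up for illustration.

```python
import re
from pathlib import Path

# Matches a URI action entry such as: /URI (http://example.com/a.pdf)
# Assumption: the target is a literal string in an uncompressed object.
URI_PATTERN = re.compile(rb"/URI\s*\(((?:\\.|[^\\()])*)\)")


def extract_pdf_links(data: bytes) -> list[str]:
    """Return the URI targets found in the raw bytes of one PDF."""
    links = []
    for match in URI_PATTERN.finditer(data):
        raw = match.group(1)
        # Undo backslash escapes of parentheses and backslashes.
        raw = re.sub(rb"\\([()\\])", rb"\1", raw)
        links.append(raw.decode("latin-1"))
    return links


def report_links(root: str) -> dict[str, list[str]]:
    """Map each PDF under root (a hypothetical site directory) to its links."""
    report = {}
    for pdf in sorted(Path(root).rglob("*.pdf")):
        report[str(pdf)] = extract_pdf_links(pdf.read_bytes())
    return report
```

Calling report_links on the web root would give a per-file list of link targets, which could feed the "what should be moved" report alongside the links spidered from the HTML/CFML pages.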