> On Mon, May 02, 2005 at 01:27:41PM +0100, Richard Lyons wrote:
> > I am considering how to crawl a site which is dynamically generated,
> > and create a static version of all generated pages (or selected
> > generated pages). I guess it would be simplest to start with an
> > existing crawler, and bolt on some code. Or, alternatively, write a
> > script (perl, I fear) to modify the cache built by a crawler.
> >
> > The idea is to allow a static ecommerce site to be generated from any
> > database-generated shopping cart system.
> >
> > Any advice where to begin?
Well, I don't know an "elegant" solution... one dirty approach would be to
download the site first with "wget -r". You would then end up with lots of
files with names like this:

  index.php?lang=es&tipo=obras&com=extracto
  index.php?lang=es&tipo=obras&com=lista
  index.php?lang=es&tipo=obras&com=susobras

It would then be fairly easy to write a simple perl script that replaces
the special characters with more "static-like" ones, so you would get
something like:

  index_lang-es_tipo-obras_com-extracto.html
  index_lang-es_tipo-obras_com-lista.html
  index_lang-es_tipo-obras_com-susobras.html

You would also have to parse the content of each file and rewrite the
links inside it so they point at the new names. Maybe too complicated?
(A rough sketch of such a script is appended below, after my signature.)

Regards: Nacho

-- 
No book comes out of a vacuum (G. Buehler)
http://www.lascartasdelavida.com
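
P.S. Here is a rough, untested sketch of the kind of perl script I mean,
in case it helps as a starting point. The mirror directory name ("mirror")
and the ".php?key=value&..." pattern are only assumptions taken from the
example file names above, so adjust to taste:

#!/usr/bin/perl
# Untested sketch: rename the files saved by "wget -r" to static-looking
# names and rewrite the links inside the pages so they still work.
# The directory "mirror" and the ".php?key=value&..." pattern are
# assumptions taken from the example file names above.
use strict;
use warnings;
use File::Find;

my $dir = 'mirror';    # wherever "wget -r" left the files
my %renamed;           # old basename => new basename

# Pass 1: find every file whose name still carries a query string.
my @dynamic;
find(sub { push @dynamic, $File::Find::name if -f $_ && /\?/ }, $dir);

# Rename e.g. "index.php?lang=es&tipo=obras&com=extracto"
# to "index_lang-es_tipo-obras_com-extracto.html".
for my $old (@dynamic) {
    my $new = $old;
    $new =~ s/\.php\?/_/;    # drop the ".php?" separator
    $new =~ s/=/-/g;         # key=value -> key-value
    $new =~ s/&/_/g;         # parameter separator -> underscore
    $new .= '.html';
    rename $old, $new or warn "rename $old: $!";
    $renamed{ (split m{/}, $old)[-1] } = (split m{/}, $new)[-1];
}

# Pass 2: rewrite the links inside every page.  In the HTML itself the
# "&" is usually written "&amp;", so both spellings are substituted.
find(sub {
    return unless -f $_ && /\.html$/;
    local $/;                              # slurp the whole file
    open my $in, '<', $_ or return;
    my $html = <$in>;
    close $in;
    while ( my ($old, $new) = each %renamed ) {
        (my $enc = $old) =~ s/&/&amp;/g;
        $html =~ s/\Q$old\E/$new/g;
        $html =~ s/\Q$enc\E/$new/g;
    }
    open my $out, '>', $_ or die "cannot rewrite $_: $!";
    print {$out} $html;
    close $out;
}, $dir);

Depending on the site, wget's -k (--convert-links) and -E (--html-extension)
options may already do part of this for you, but then you have less control
over the resulting file names.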