On Sun, May 08, 2005 at 09:48:07AM +0200, Nacho wrote:
> > On Mon, May 02, 2005 at 01:27:41PM +0100, Richard Lyons wrote:
> > > I am considering how to crawl a site which is dynamically generated,
> > > and create a static version of all generated pages (or selected
[...]
> 
> Well, I don't know an "elegant" solution... one dirty approach would be
> to first download the site with "wget -r"; then you would get lots of
> files with names like this:
> 
> index.php?lang=es&tipo=obras&com=extracto
> index.php?lang=es&tipo=obras&com=lista
> index.php?lang=es&tipo=obras&com=susobras
> 
> So it would be quite easy to write a simple Perl script that replaces
> the special characters with others more "static-like", and you would
> get something like:
> 
> index_lang-es_tipo-obras_com-extracto.html
> index_lang-es_tipo-obras_com-lista.html
> index_lang-es_tipo-obras_com-susobras.html
> 
> Also, surely you would have to parse the content of each file and
> substitute the links inside them.
> 
> Maybe too complicated?
Yes... that is the kind of thing I was imagining (something along the
lines of the sketch below).  It will probably be quite simple once I
get started.  But first I need to find time :-(

Thanks for the pointer.

-- 
richard
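P.S. For the archive, a rough and untested sketch of that
rename-and-rewrite pass, in Perl since that is what Nacho suggested.
It assumes wget -r has left files named like the index.php?... examples
quoted above sitting in the current directory; a real run over wget's
whole directory tree, and links written with &amp; instead of a bare &,
would need a little more work:

#!/usr/bin/perl
use strict;
use warnings;

# Turn a dynamic file name into a "static-like" one, e.g.
#   index.php?lang=es&tipo=obras&com=extracto
#   -> index_lang-es_tipo-obras_com-extracto.html
sub staticize {
    my ($name) = @_;
    return $name unless $name =~ /\?/;
    $name =~ s/\.php\?/_/;   # "index.php?lang=..." -> "index_lang=..."
    $name =~ s/=/-/g;        # key=value            -> key-value
    $name =~ s/&/_/g;        # separate parameters with "_"
    return "$name.html";
}

# First pass: rename every file that has a query string in its name.
my %newname;
for my $old (grep { -f && /\?/ } glob('*')) {
    my $new = staticize($old);
    $newname{$old} = $new;
    rename $old, $new or warn "rename $old -> $new failed: $!";
}

# Second pass: rewrite links inside the renamed files so they point at
# the new static names instead of the old index.php?... URLs.
for my $file (values %newname) {
    open my $in, '<', $file or do { warn "cannot read $file: $!"; next };
    my $html = do { local $/; <$in> };   # slurp the whole file
    close $in;
    for my $old (keys %newname) {
        my $new = $newname{$old};
        $html =~ s/\Q$old\E/$new/g;
    }
    open my $out, '>', $file or do { warn "cannot write $file: $!"; next };
    print {$out} $html;
    close $out;
}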