Hi guys, just a quick query.

Most web browsers offer the option to save a page locally, i.e. download the page and any images on it, change the image links to point at the local copies, and change any relative hrefs to absolute ones. I know this can be done manually using HTML::TokeParser and LWP::UserAgent. My question is: has this work been done already? Is it already a sub or function of an existing module that I have not heard of yet?

I have already written a script to do this, using TokeParser amongst other things, but it seems to mess up in some situations. It uses regexes for some of the work, and it goes through the following steps:

1. Use UserAgent to fetch the page in question and save it to a file.
2. Use TokeParser to get a hash of the images in the HTML.
3. Fetch those images.
4. Use a regex substitution to replace each relative img src with the full URL that TokeParser got, and write the result to the file.
5. Ditto with relative hrefs.
6. Open the file in slurp mode and remove newlines so the HTML is all on one line, then use regexes to remove potentially nasty stuff like embed, applet, etc. (I don't want this script to be tricked into running ActiveX controls or Java applets from our server.)
7. Print the new page from our own server under our own SSL cert. (This is the reason for the whole script: to retrieve an outside page without leaving our secure cert.)

I am guessing that I am not the first person to need this functionality, so I thought it wise to ask the experts whether this work has already been done and is in a module somewhere.

Any tips would be fantastic.

Many thanks,
rgds
Frank.
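
For reference, the link-rewriting part (steps 2, 4 and 5) could be sketched without regexes roughly as follows. This is only a minimal sketch, not the script described above: it assumes HTML::TokeParser, HTML::Entities and URI are installed, the sub name absolutize_links is made up for illustration, and it rebuilds each img and a start tag from its parsed attributes, so the original quoting and spacing inside those tags is not preserved.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::Entities qw(encode_entities);
use URI;

# Which attribute carries the URL for each tag we rewrite.
my %url_attr = (img => 'src', a => 'href');

# Hypothetical helper: return $html with relative img src and a href
# values resolved against $base. Everything else is passed through as-is.
sub absolutize_links {
    my ($html, $base) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";
    my $out = '';

    while (my $token = $parser->get_token) {
        my ($type, $tag) = @{$token}[0, 1];
        my $name = $type eq 'S' ? $url_attr{$tag} : undef;

        if (defined $name && defined $token->[2]{$name}) {
            my ($attr, $attrseq) = @{$token}[2, 3];
            # new_abs leaves already-absolute URLs unchanged.
            $attr->{$name} = URI->new_abs($attr->{$name}, $base)->as_string;

            # Rebuild the start tag, keeping the original attribute order.
            # A trailing "/" (as in <img ... />) may show up as an
            # attribute named "/", so it is skipped here.
            $out .= "<$tag";
            for my $key (grep { $_ ne '/' } @$attrseq) {
                $out .= sprintf ' %s="%s"', $key,
                    encode_entities($attr->{$key}, '<>&"');
            }
            $out .= '>';
        }
        else {
            # Original source text: element 1 for text tokens,
            # the last element for every other token type.
            $out .= $type eq 'T' ? $token->[1] : $token->[-1];
        }
    }
    return $out;
}
```

With a base of http://www.example.com/dir/page.html (a placeholder URL), an img tag with src="pic.gif" should come back with src="http://www.example.com/dir/pic.gif", and an href of "/x" should become "http://www.example.com/x". Working from parser tokens rather than regexes avoids the need to squash the HTML onto one line first.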
