Re: [PHP] Using Curl to replicate a site
On Thu, 2009-12-10 at 16:25 +, Ashley Sheridan wrote:
> On Thu, 2009-12-10 at 11:25 -0500, Robert Cummings wrote:
> > Joseph Thayne wrote:
> > > If the site can be a few minutes behind (say 15-30 minutes), then what
> > > I recommend is to create a caching script that will update the
> > > necessary files if the md5 checksum has changed at all (or a specified
> > > time period has passed). Then store those files locally, and run local
> > > copies of the files. Your performance will be much better than if you
> > > have to request the page from another server every time. You could run
> > > this script every 15-30 minutes, depending on your needs, via a cron
> > > job.
> >
> > Use URL rewriting or capture 404 errors to handle the proxy request. No
> > need to download and cache the entire site if everyone is just
> > requesting the homepage.
> >
> > Cheers,
> > Rob.
> > --
> > http://www.interjinn.com
> > Application and Templating Framework for PHP
>
> Yeah, I was going to use the page request to trigger the caching
> mechanism, as it's unlikely that all pages will be equally popular. I'll
> let you all know how it goes!
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk

Well, I got it working just great in the end. Aside from the odd issue with
relative URLs used to reference images and JavaScript files, which I had to
sort out, everything seems to be working fine and is live. I've got it on a
12-hour refresh, as the site will probably not be changing very often at
all. Thanks for all the pointers!

Thanks,
Ash
http://www.ashleysheridan.co.uk
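For anyone who hits the same relative-URL problem, one quick fix is to
inject a <base> tag into the fetched markup so the browser resolves relative
image and script paths against the original host. A minimal sketch; the host
and cache path are placeholders, not anything from this thread:

<?php
// Point relative src/href references back at the original site by adding a
// <base href> just after the opening <head> tag. Host is a placeholder.
function fix_relative_urls($html, $baseUrl)
{
    return preg_replace(
        '/<head([^>]*)>/i',
        '<head$1><base href="' . htmlspecialchars($baseUrl) . '">',
        $html,
        1 // only the first <head>
    );
}

$html = file_get_contents('cache/index.html'); // hypothetical cached page
echo fix_relative_urls($html, 'http://www.example.com/');

The trade-off is that relative page links then also resolve to the original
site; if internal navigation should stay on the microsite, a targeted
preg_replace() over the src attributes is the safer route.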
Re: [PHP] Using Curl to replicate a site
On Thu, 2009-12-10 at 11:25 -0500, Robert Cummings wrote:
> Joseph Thayne wrote:
> > If the site can be a few minutes behind (say 15-30 minutes), then what
> > I recommend is to create a caching script that will update the necessary
> > files if the md5 checksum has changed at all (or a specified time period
> > has passed). Then store those files locally, and run local copies of the
> > files. Your performance will be much better than if you have to request
> > the page from another server every time. You could run this script every
> > 15-30 minutes, depending on your needs, via a cron job.
>
> Use URL rewriting or capture 404 errors to handle the proxy request. No
> need to download and cache the entire site if everyone is just requesting
> the homepage.
>
> Cheers,
> Rob.
> --
> http://www.interjinn.com
> Application and Templating Framework for PHP

Yeah, I was going to use the page request to trigger the caching mechanism,
as it's unlikely that all pages will be equally popular. I'll let you all
know how it goes!

Thanks,
Ash
http://www.ashleysheridan.co.uk
Re: [PHP] Using Curl to replicate a site
Joseph Thayne wrote:
> If the site can be a few minutes behind (say 15-30 minutes), then what I
> recommend is to create a caching script that will update the necessary
> files if the md5 checksum has changed at all (or a specified time period
> has passed). Then store those files locally, and run local copies of the
> files. Your performance will be much better than if you have to request
> the page from another server every time. You could run this script every
> 15-30 minutes, depending on your needs, via a cron job.

Use URL rewriting or capture 404 errors to handle the proxy request. No need
to download and cache the entire site if everyone is just requesting the
homepage.

Cheers,
Rob.
--
http://www.interjinn.com
Application and Templating Framework for PHP
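The 404-capture variant Rob mentions might look something like the sketch
below: Apache serves already-cached pages as ordinary files, and a directive
such as "ErrorDocument 404 /fetch.php" hands only the misses to a PHP
script. The script name and remote host are made up for illustration:

<?php
// fetch.php -- sketch of the 404-capture approach. Cached pages live in the
// document root and are served by Apache directly; only uncached paths
// reach this script via ErrorDocument 404.

$remoteBase = 'http://www.example.com'; // placeholder for the mirrored site

// Apache exposes the originally requested path to an ErrorDocument handler
// via REDIRECT_URL.
$path = isset($_SERVER['REDIRECT_URL']) ? $_SERVER['REDIRECT_URL'] : '/';

$ch = curl_init($remoteBase . $path);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
curl_close($ch);

if ($html === false) {
    header('HTTP/1.0 404 Not Found');
    exit;
}

// Write the page into the document root so the next request never reaches
// PHP at all. A real version must validate $path against "../" tricks.
$target = $_SERVER['DOCUMENT_ROOT'] . $path;
if (substr($target, -1) === '/') {
    $target .= 'index.html';
}
if (!is_dir(dirname($target))) {
    mkdir(dirname($target), 0755, true);
}
file_put_contents($target, $html);

header('HTTP/1.1 200 OK'); // an ErrorDocument handler sends 404 by default
echo $html;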
Re: [PHP] Using Curl to replicate a site
Ashley Sheridan wrote:
> On Thu, 2009-12-10 at 11:10 -0500, Robert Cummings wrote:
> > Ashley Sheridan wrote:
> > > Hi,
> > >
> > > I need to replicate a site on another domain, and in this case, an
> > > iframe won't really do, as I need to remove some of the graphics,
> > > etc. around the content. The owner of the site I need to copy has
> > > asked for the site to be duplicated, and unfortunately in this case,
> > > because of the CMS he's used (which is owned by the hosting company
> > > he uses), I need a way to have the site replicated on an already
> > > existing domain as a microsite, but in a way that is always up to
> > > date.
> > >
> > > I'm fine using cURL to grab the site, and even altering the content
> > > that is returned, but I was thinking about a caching mechanism. Has
> > > anyone any suggestions on this?
> >
> > Sounds like you're creating a proxy with post-processing/caching on
> > the forwarded content. It should be fairly straightforward to direct
> > page requests to your proxy app, then make the remote request,
> > post-process, cache, and send to the browser. The only gotcha will be
> > forms, if you do caching.
> >
> > Cheers,
> > Rob.
> > --
> > http://www.interjinn.com
> > Application and Templating Framework for PHP
>
> The only forms are processed on another site, so there's nothing I can
> really do about that, as they return to the original site. How would I
> go about doing what you suggested, though? I'd assumed I should use
> cURL, but your email suggests not to?

Nope, wasn't suggesting not to. You can use many techniques, but cURL is
probably the most robust. The best way to facilitate this, IMHO, is to have
a rewrite rule that directs all traffic for the proxy site to your
application. Then rewrite the REQUEST_URI to point to the page on the real
domain. Check your cache for the content; if it's empty, use cURL to
retrieve the content, apply your post-processing (to strip out what you
don't want and apply a new page layout, or whatever), cache the content if
it isn't already cached (this can be a simple database table with the
request URI and a timestamp), then output the content.

Cheers,
Rob.
--
http://www.interjinn.com
Application and Templating Framework for PHP
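A rough sketch of the flow Rob describes, assuming Apache's mod_rewrite
funnels every request into one script. A flat-file cache stands in for the
database table Rob mentions, and all names, paths, and the remote host are
illustrative:

<?php
// proxy.php -- assumes an Apache rule such as:
//   RewriteEngine On
//   RewriteCond %{REQUEST_FILENAME} !-f
//   RewriteRule ^ proxy.php [L]

$remoteBase = 'http://www.example.com';     // site being mirrored (placeholder)
$cacheDir   = dirname(__FILE__) . '/cache';
$ttl        = 1800;                         // seconds before a page goes stale

$uri       = $_SERVER['REQUEST_URI'];
$cacheFile = $cacheDir . '/' . md5($uri) . '.html';

// 1. Serve straight from the cache while the copy is still fresh.
if (file_exists($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
    readfile($cacheFile);
    exit;
}

// 2. Otherwise fetch the real page with cURL.
$ch = curl_init($remoteBase . $uri);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
curl_close($ch);

if ($html === false) {
    // Fall back to a stale copy rather than erroring out.
    if (file_exists($cacheFile)) {
        readfile($cacheFile);
    }
    exit;
}

// 3. Post-process: strip the unwanted graphics and apply the new layout.
//    Site-specific; a str_replace() or preg_replace() pass is often enough.

// 4. Cache, then output.
file_put_contents($cacheFile, $html);
echo $html;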
Re: [PHP] Using Curl to replicate a site
If the site can be a few minutes behind (say 15-30 minutes), then what I
recommend is to create a caching script that will update the necessary files
if the md5 checksum has changed at all (or a specified time period has
passed). Then store those files locally, and run local copies of the files.
Your performance will be much better than if you have to request the page
from another server every time. You could run this script every 15-30
minutes, depending on your needs, via a cron job.

Joseph

Ashley Sheridan wrote:
> Hi,
>
> I need to replicate a site on another domain, and in this case, an iframe
> won't really do, as I need to remove some of the graphics, etc. around
> the content. The owner of the site I need to copy has asked for the site
> to be duplicated, and unfortunately in this case, because of the CMS he's
> used (which is owned by the hosting company he uses), I need a way to
> have the site replicated on an already existing domain as a microsite,
> but in a way that is always up to date.
>
> I'm fine using cURL to grab the site, and even altering the content that
> is returned, but I was thinking about a caching mechanism. Has anyone any
> suggestions on this?
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
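A minimal sketch of the cron-driven cache Joseph describes, for a single
page; the URL and cache path are placeholders, not taken from the thread:

<?php
// refresh.php -- fetch the remote page and rewrite the local copy only
// when the md5 checksum has actually changed.
$remote = 'http://www.example.com/index.html';
$cache  = '/var/www/microsite/cache/index.html';

$ch = curl_init($remote);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$body = curl_exec($ch);
curl_close($ch);

if ($body !== false && (!file_exists($cache) || md5($body) !== md5_file($cache))) {
    file_put_contents($cache, $body);
}

A crontab line such as "*/15 * * * * php /path/to/refresh.php" would give
the 15-minute cycle Joseph suggests.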
Re: [PHP] Using Curl to replicate a site
On Thu, 2009-12-10 at 11:10 -0500, Robert Cummings wrote:
> Ashley Sheridan wrote:
> > Hi,
> >
> > I need to replicate a site on another domain, and in this case, an
> > iframe won't really do, as I need to remove some of the graphics, etc.
> > around the content. The owner of the site I need to copy has asked for
> > the site to be duplicated, and unfortunately in this case, because of
> > the CMS he's used (which is owned by the hosting company he uses), I
> > need a way to have the site replicated on an already existing domain
> > as a microsite, but in a way that is always up to date.
> >
> > I'm fine using cURL to grab the site, and even altering the content
> > that is returned, but I was thinking about a caching mechanism. Has
> > anyone any suggestions on this?
>
> Sounds like you're creating a proxy with post-processing/caching on the
> forwarded content. It should be fairly straightforward to direct page
> requests to your proxy app, then make the remote request, post-process,
> cache, and send to the browser. The only gotcha will be forms, if you do
> caching.
>
> Cheers,
> Rob.
> --
> http://www.interjinn.com
> Application and Templating Framework for PHP

The only forms are processed on another site, so there's nothing I can
really do about that, as they return to the original site. How would I go
about doing what you suggested, though? I'd assumed I should use cURL, but
your email suggests not to?

Thanks,
Ash
http://www.ashleysheridan.co.uk
Re: [PHP] Using Curl to replicate a site
Ashley Sheridan wrote:
> Hi,
>
> I need to replicate a site on another domain, and in this case, an iframe
> won't really do, as I need to remove some of the graphics, etc. around
> the content. The owner of the site I need to copy has asked for the site
> to be duplicated, and unfortunately in this case, because of the CMS he's
> used (which is owned by the hosting company he uses), I need a way to
> have the site replicated on an already existing domain as a microsite,
> but in a way that is always up to date.
>
> I'm fine using cURL to grab the site, and even altering the content that
> is returned, but I was thinking about a caching mechanism. Has anyone any
> suggestions on this?

Sounds like you're creating a proxy with post-processing/caching on the
forwarded content. It should be fairly straightforward to direct page
requests to your proxy app, then make the remote request, post-process,
cache, and send to the browser. The only gotcha will be forms, if you do
caching.

Cheers,
Rob.
--
http://www.interjinn.com
Application and Templating Framework for PHP
[PHP] Using Curl to replicate a site
Hi,

I need to replicate a site on another domain, and in this case, an iframe
won't really do, as I need to remove some of the graphics, etc. around the
content. The owner of the site I need to copy has asked for the site to be
duplicated, and unfortunately in this case, because of the CMS he's used
(which is owned by the hosting company he uses), I need a way to have the
site replicated on an already existing domain as a microsite, but in a way
that is always up to date.

I'm fine using cURL to grab the site, and even altering the content that is
returned, but I was thinking about a caching mechanism. Has anyone any
suggestions on this?

Thanks,
Ash
http://www.ashleysheridan.co.uk