Re: [PHP] Using Curl to replicate a site

2009-12-11 Thread Ashley Sheridan
On Thu, 2009-12-10 at 16:25 +0000, Ashley Sheridan wrote:

 On Thu, 2009-12-10 at 11:25 -0500, Robert Cummings wrote:
 
  Joseph Thayne wrote:
   If the site can be a few minutes behind (say 15-30 minutes), then what 
   I recommend is to create a caching script that will update the necessary 
   files if the md5 checksum has changed at all (or a specified time period 
   has passed).  Then store those files locally, and run local copies of the 
   files.  Your performance will be much better than if you have to request 
   the page from another server every time.  You could run this script 
   every 15-30 minutes, depending on your needs, via a cron job.
  
  Use URL rewriting or capture 404 errors to handle the proxy request. No 
  need to download and cache the entire site if everyone is just 
  requesting the homepage.
  
  Cheers,
  Rob.
  -- 
  http://www.interjinn.com
  Application and Templating Framework for PHP
  
 
 
 Yeah, I was going to use the page request to trigger the caching
 mechanism, as it's unlikely that all pages will be equally popular.
 I'll let you all know how it goes!
 
 Thanks,
 Ash
 http://www.ashleysheridan.co.uk
 
 


Well, I got it working just great in the end. Aside from the odd issue
with relative URLs used in referencing images and JavaScript that I had
to sort out, everything seems to be working fine and is live. I've got
it on a 12-hour refresh, as the site will probably not be changing very
often at all. Thanks for all the pointers!
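
For anyone hitting the same relative-URL problem, the fix was along these
lines (a simplified sketch rather than the exact code; www.example.com
stands in for the real site, and it assumes relative paths are relative
to the site root):

<?php
// Prepend the remote base URL to src/href values that aren't already
// absolute, root-relative, or fragment links.
function absolutiseUrls($html, $remoteBase)
{
    return preg_replace(
        '#\b(src|href)=(["\'])(?!https?://|/|\#)#i',
        '$1=$2' . rtrim($remoteBase, '/') . '/',
        $html
    );
}

// e.g. $html = absolutiseUrls($html, 'http://www.example.com/');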

Thanks,
Ash
http://www.ashleysheridan.co.uk




[PHP] Using Curl to replicate a site

2009-12-10 Thread Ashley Sheridan
Hi,

I need to replicate a site on another domain, and in this case an
iframe won't really do, as I need to remove some of the graphics, etc.
around the content. The owner of the site I need to copy has asked for
it to be duplicated, and unfortunately, because of the CMS he uses
(which is owned by his hosting provider), I need a way to have the site
replicated on an already existing domain as a microsite, but in a way
that it is always up-to-date.

I'm fine using cURL to grab the site, and even altering the content that
is returned, but I was thinking about a caching mechanism. Does anyone
have any suggestions on this?

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: [PHP] Using Curl to replicate a site

2009-12-10 Thread Robert Cummings

Ashley Sheridan wrote:

Hi,

I need to replicate a site on another domain, and in this case an
iframe won't really do, as I need to remove some of the graphics, etc.
around the content. The owner of the site I need to copy has asked for
it to be duplicated, and unfortunately, because of the CMS he uses
(which is owned by his hosting provider), I need a way to have the site
replicated on an already existing domain as a microsite, but in a way
that it is always up-to-date.

I'm fine using cURL to grab the site, and even altering the content that
is returned, but I was thinking about a caching mechanism. Does anyone
have any suggestions on this?


Sounds like you're creating a proxy with post processing/caching on the 
forwarded content. It should be fairly straightforward to direct page 
requests to your proxy app, then make the remote request, and 
post-process, cache, then send to the browser. The only gotcha will be 
for forms if you do caching.
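
Something like this for the fetch-and-forward step (just an untested
sketch; www.example.com is a stand-in for the real site being mirrored):

<?php
// Build the remote URL from the incoming request.
$remote = 'http://www.example.com' . $_SERVER['REQUEST_URI'];

$ch = curl_init($remote);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the page as a string
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow any redirects
$html = curl_exec($ch);
curl_close($ch);

// Post-process here: strip the unwanted graphics/markup and drop the
// content into the new page layout before sending it on.
echo $html;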


Cheers,
Rob.
--
http://www.interjinn.com
Application and Templating Framework for PHP




Re: [PHP] Using Curl to replicate a site

2009-12-10 Thread Ashley Sheridan
On Thu, 2009-12-10 at 11:10 -0500, Robert Cummings wrote:

 Ashley Sheridan wrote:
  Hi,
  
  I need to replicate a site on another domain, and in this case an
  iframe won't really do, as I need to remove some of the graphics, etc.
  around the content. The owner of the site I need to copy has asked for
  it to be duplicated, and unfortunately, because of the CMS he uses
  (which is owned by his hosting provider), I need a way to have the site
  replicated on an already existing domain as a microsite, but in a way
  that it is always up-to-date.
  
  I'm fine using cURL to grab the site, and even altering the content that
  is returned, but I was thinking about a caching mechanism. Does anyone
  have any suggestions on this?
 
 Sounds like you're creating a proxy with post processing/caching on the 
 forwarded content. It should be fairly straightforward to direct page 
 requests to your proxy app, then make the remote request, and 
 post-process, cache, then send to the browser. The only gotcha will be 
 for forms if you do caching.
 
 Cheers,
 Rob.
 -- 
 http://www.interjinn.com
 Application and Templating Framework for PHP
 


The only forms are processed on another site, so there's nothing I can
really do about that, as they return to the original site.

How would I go about doing what you suggested, though? I'd assumed I'd
use cURL, but your email suggests not to?

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: [PHP] Using Curl to replicate a site

2009-12-10 Thread Joseph Thayne
If the site can be a few minutes behind (say 15-30 minutes), then what 
I recommend is to create a caching script that will update the necessary 
files if the md5 checksum has changed at all (or a specified time period 
has passed).  Then store those files locally, and run local copies of the 
files.  Your performance will be much better than if you have to request 
the page from another server every time.  You could run this script 
every 15-30 minutes, depending on your needs, via a cron job.
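
Something along these lines (an untested sketch; the page list, remote
domain, and cache directory are made-up examples):

<?php
// Cron-driven cache refresh. Run every 15-30 minutes, e.g.:
//   */20 * * * * php /path/to/refresh-cache.php

$remote   = 'http://www.example.com';            // site being mirrored
$cacheDir = '/path/to/cache';                    // where local copies live
$pages    = array('/', '/about.html', '/contact.html');

foreach ($pages as $page) {
    $html = file_get_contents($remote . $page);  // could equally use cURL
    if ($html === false) {
        continue;                                // fetch failed, keep the old copy
    }

    $cacheFile = $cacheDir . '/' . md5($page) . '.html';

    // Only rewrite the local file if the md5 checksum has changed.
    if (!file_exists($cacheFile) || md5($html) !== md5_file($cacheFile)) {
        file_put_contents($cacheFile, $html);
    }
}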


Joseph

Ashley Sheridan wrote:

Hi,

I need to replicate a site on another domain, and in this case an
iframe won't really do, as I need to remove some of the graphics, etc.
around the content. The owner of the site I need to copy has asked for
it to be duplicated, and unfortunately, because of the CMS he uses
(which is owned by his hosting provider), I need a way to have the site
replicated on an already existing domain as a microsite, but in a way
that it is always up-to-date.

I'm fine using cURL to grab the site, and even altering the content that
is returned, but I was thinking about a caching mechanism. Does anyone
have any suggestions on this?

Thanks,
Ash
http://www.ashleysheridan.co.uk



  





Re: [PHP] Using Curl to replicate a site

2009-12-10 Thread Robert Cummings

Ashley Sheridan wrote:

On Thu, 2009-12-10 at 11:10 -0500, Robert Cummings wrote:

Ashley Sheridan wrote:
 Hi,
 
 I need to replicate a site on another domain, and in this case an
 iframe won't really do, as I need to remove some of the graphics, etc.
 around the content. The owner of the site I need to copy has asked for
 it to be duplicated, and unfortunately, because of the CMS he uses
 (which is owned by his hosting provider), I need a way to have the site
 replicated on an already existing domain as a microsite, but in a way
 that it is always up-to-date.
 
 I'm fine using cURL to grab the site, and even altering the content that
 is returned, but I was thinking about a caching mechanism. Does anyone
 have any suggestions on this?

Sounds like you're creating a proxy with post processing/caching on the 
forwarded content. It should be fairly straightforward to direct page 
requests to your proxy app, then make the remote request, and 
post-process, cache, then send to the browser. The only gotcha will be 
for forms if you do caching.


Cheers,
Rob.
--
http://www.interjinn.com
Application and Templating Framework for PHP



The only forms are processed on another site, so there's nothing I can
really do about that, as they return to the original site.

How would I go about doing what you suggested, though? I'd assumed I'd
use cURL, but your email suggests not to?


Nope, wasn't suggesting not to. You can use many techniques, but cURL is
probably the most robust. The best way to facilitate this, IMHO, is to
have a rewrite rule that directs all traffic for the proxy site to your
application. Then rewrite the REQUEST_URI to point to the page on the
real domain. Check your cache for the content; if it's empty, use cURL
to retrieve the content, apply your post-processing (to strip out what
you don't want and apply a new page layout or whatever), cache the
content if it isn't already cached (this can be a simple database table
with the request URI and a timestamp), then output the content.
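
In very rough outline it might look like this (an untested sketch; the
rewrite rule, database details, table name, and www.example.com are all
placeholders):

<?php
// Apache rewrite rule sending every request to this script could be:
//   RewriteEngine On
//   RewriteCond %{REQUEST_URI} !^/proxy\.php
//   RewriteRule ^ /proxy.php [L,QSA]

$remote = 'http://www.example.com';              // the real site
$uri    = $_SERVER['REQUEST_URI'];               // page being requested
$maxAge = 1800;                                  // re-fetch after 30 minutes
$db     = new PDO('mysql:host=localhost;dbname=proxy', 'user', 'pass');

// Placeholder post-processing: strip unwanted markup and apply the new
// layout here. This stub just returns the page untouched.
function postProcess($html)
{
    return $html;
}

// 1. Check the cache: a simple table with uri (primary key), content,
//    and a fetched_at timestamp.
$stmt = $db->prepare('SELECT content, fetched_at FROM page_cache WHERE uri = ?');
$stmt->execute(array($uri));
$row = $stmt->fetch(PDO::FETCH_ASSOC);

if ($row && (time() - strtotime($row['fetched_at'])) < $maxAge) {
    echo $row['content'];                        // cache hit, send it straight out
    exit;
}

// 2. Cache miss (or stale): fetch the page from the real domain with cURL.
$ch = curl_init($remote . $uri);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
curl_close($ch);

// 3. Post-process, cache, and output.
$html = postProcess($html);
$stmt = $db->prepare(
    'REPLACE INTO page_cache (uri, content, fetched_at) VALUES (?, ?, NOW())'
);
$stmt->execute(array($uri, $html));

echo $html;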


Cheers,
Rob.
--
http://www.interjinn.com
Application and Templating Framework for PHP




Re: [PHP] Using Curl to replicate a site

2009-12-10 Thread Robert Cummings

Joseph Thayne wrote:
If the site can be a few minutes behind (say 15-30 minutes), then what 
I recommend is to create a caching script that will update the necessary 
files if the md5 checksum has changed at all (or a specified time period 
has passed).  Then store those files locally, and run local copies of the 
files.  Your performance will be much better than if you have to request 
the page from another server every time.  You could run this script 
every 15-30 minutes, depending on your needs, via a cron job.


Use URL rewriting or capture 404 errors to handle the proxy request. No 
need to download and cache the entire site if everyone is just 
requesting the homepage.


Cheers,
Rob.
--
http://www.interjinn.com
Application and Templating Framework for PHP




Re: [PHP] Using Curl to replicate a site

2009-12-10 Thread Ashley Sheridan
On Thu, 2009-12-10 at 11:25 -0500, Robert Cummings wrote:

 Joseph Thayne wrote:
  If the site can be a few minutes behind (say 15-30 minutes), then what 
  I recommend is to create a caching script that will update the necessary 
  files if the md5 checksum has changed at all (or a specified time period 
  has passed).  Then store those files locally, and run local copies of the 
  files.  Your performance will be much better than if you have to request 
  the page from another server every time.  You could run this script 
  every 15-30 minutes, depending on your needs, via a cron job.
 
 Use URL rewriting or capture 404 errors to handle the proxy request. No 
 need to download and cache the entire site if everyone is just 
 requesting the homepage.
 
 Cheers,
 Rob.
 -- 
 http://www.interjinn.com
 Application and Templating Framework for PHP
 


Yeah, I was going to use the page request to trigger the caching
mechanism, as it's unlikely that all pages will be equally popular.
I'll let you all know how it goes!

Thanks,
Ash
http://www.ashleysheridan.co.uk