Re: [PHP] [scalability, performance, DoS] To or not to process images at runtime

2008-08-07 Thread Per Jessen
Marcelo de Moraes Serpa wrote:

 My next project will be a kind of online photo viewer. All of these
 photos will need to have a watermark applied to them. The problem is
 that, depending on the picture, different watermarks need to be
 applied. The easiest solution would be to process these pictures at
 runtime using GD, apply the watermark(s) and serve them. The other
 approach would be to pre-process them (maybe using GD) and create
 different copies on the disk, the obvious advantage being that they
 could be served directly by the webserver (apache), but it would be
 much harder to manage (need to fix a watermark error? Re-process and
 re-create the images on the disk...) and would take much more disk
 space. I would rather process them at runtime, per request; however,
 this site will probably have lots of traffic. So, I've reached a
 dead end. Could someone share his/her experiences and thoughts and help
 me decide? :)

I think it depends on the amount of traffic you expect - 

high - off-line
low-to-medium - on-line, on-demand, but cached.

Disk-space is cheap, especially if you don't need to be worried about
backup etc.  I'm not sure why you think applying watermarks in an
off-line process would be any less manageable than doing it on-line.

 FYI, the application would be custom built from the ground up using
 PHP 5 (not sure if we will use a framework; if we do, it will
 probably be CakePHP). At first, there would be no clusters,
 proxies or balancers, just a plain dedicated server with a good CPU,
 about 4GB RAM and lots of disk space.

Sounds like you are planning to do the processing off-line then.  You
could even do a mix - if you've got a lot of photos (millions and
millions), applying the watermarks could take a while in itself, so you
could leave that running slowly in the background, but combine it with
an on-line process that does on-demand watermarking (when the photo is
displayed).


/Per Jessen, Zürich


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] [scalability, performance, DoS] To or not to process images at runtime

2008-08-07 Thread Nathan Nobbe
On Wed, Aug 6, 2008 at 3:04 PM, Marcelo de Moraes Serpa [EMAIL PROTECTED]
 wrote:

 Hello,

 My next project will be a kind of online photo viewer. All of these photos
 will need to have a watermark applied to them. The problem is that, depending
 on the picture, different watermarks need to be applied. The easiest
 solution would be to process these pictures at runtime using GD, apply the
 watermark(s) and serve them. The other approach would be to pre-process
 them (maybe using GD) and create different copies on the disk, the obvious
 advantage being that they could be served directly by the webserver
 (apache), but it would be much harder to manage (need to fix a watermark
 error? Re-process and re-create the images on the disk...) and would take
 much more disk space. I would rather process them at runtime, per request;
 however, this site will probably have lots of traffic. So, I've reached a
 dead end. Could someone share his/her experiences and thoughts and help me
 decide? :)

 FYI, the application would be custom built from the ground up using PHP 5
 (not sure if we will use a framework; if we do, it will probably be
 CakePHP). At first, there would be no clusters, proxies or
 balancers, just a plain dedicated server with a good CPU, about 4GB RAM and
 lots of disk space.

 PS: I've put DoS in the subject tagline, meaning Denial of Service, as I
 think that dynamic processing of images multiplied by lots of requests
 could result in DoS.


for the code that will invoke the watermarking, put it behind another layer,
so that you can easily alter it in the future as the site grows.  for
example, you might use the strategy pattern, and your initial strategy will
use the current webserver directly.  however, as the site begins to grow, you
can add additional webservers, dedicated to running gd on top of php.  you
can then write a strategy which will pass the requests off to those box(es),
and it will be transparent to your existing code that knows only of the
strategy interface.
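
A minimal sketch of that layering in PHP 5 might look like the following.
The names WatermarkStrategy, LocalGdStrategy, RemoteGdStrategy and
PhotoService, the endpoint URL and the directories are all hypothetical and
do not come from the thread; the actual GD and HTTP calls are left as
comments.

```php
<?php
// Hypothetical names throughout; this only illustrates the layering.

interface WatermarkStrategy
{
    /** Returns the path where the watermarked copy of $sourcePath lives. */
    public function watermark($sourcePath, $watermarkPath);
}

/** Initial strategy: do the work on the current web server. */
class LocalGdStrategy implements WatermarkStrategy
{
    private $outputDir;

    public function __construct($outputDir)
    {
        $this->outputDir = rtrim($outputDir, '/');
    }

    public function watermark($sourcePath, $watermarkPath)
    {
        $target = $this->outputDir . '/' . basename($sourcePath);
        // The GD calls (imagecreatefromjpeg, imagecopy, imagejpeg) would go here.
        return $target;
    }
}

/** Later strategy: hand the job to a dedicated image box. */
class RemoteGdStrategy implements WatermarkStrategy
{
    private $endpoint;
    private $sharedDir;

    public function __construct($endpoint, $sharedDir)
    {
        $this->endpoint  = $endpoint;
        $this->sharedDir = rtrim($sharedDir, '/');
    }

    public function watermark($sourcePath, $watermarkPath)
    {
        // An HTTP request to $this->endpoint would go here; the result is
        // assumed to appear on the filesystem shared by both boxes.
        return $this->sharedDir . '/' . basename($sourcePath);
    }
}

/** Calling code knows only the interface, never the concrete strategy. */
class PhotoService
{
    private $strategy;

    public function __construct(WatermarkStrategy $strategy)
    {
        $this->strategy = $strategy;
    }

    public function publish($sourcePath, $watermarkPath)
    {
        return $this->strategy->watermark($sourcePath, $watermarkPath);
    }
}
```

Moving from one box to several would then mean constructing PhotoService
with a RemoteGdStrategy instead of a LocalGdStrategy; the code that calls
publish() stays as it is.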

also, as you grow, distributed filesystems are key.  for example, your
front-end webserver can handle requests from users on the site, dispatch a
request (restful for instance) to another box, dedicated to gd.  since both
boxes share a common filesystem via nfs (or other) the gd box can create the
watermark, which will then be immediately available to the front-end box,
which it could signal w/ another request to say 'hey, the watermark is
ready'.

-nathan


Re: [PHP] [scalability, performance, DoS] To or not to process images at runtime

2008-08-07 Thread Per Jessen
Bernhard Kohl wrote:

 I think it also depends on the size of your images. If they are huge
 megapixel files processing them on the fly might cause severe lag.
 Still adding a watermark to an image with 100-200 thousand pixels is
 done within milliseconds on a modern machine.
 

(You probably meant to send this to the list)

The OP spoke about a kind of online photo viewer, so I assumed e.g.
JPEGs at 1024x768 as a typical size, so about 786K pixels.


/Per Jessen, Zürich






Re: [PHP] [scalability, performance, DoS] To or not to process images at runtime

2008-08-07 Thread Marcelo de Moraes Serpa
@Per Jessen

 Disk-space is cheap, especially if you don't need to be worried about
 backup etc.  I'm not sure why you think applying watermarks in an
 off-line process would be any less manageable than doing it on-line.

Well, the processing will be online in the sense that it will be triggered
via an admin interface. The pictures will then be batch-processed by a php
script using GD and saved to the disk and later served statically, without
the overhead of applying the watermark per-request, at runtime.
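
One way such a batch script could be organised is sketched below. The
function name batchWatermark and the callback signature are assumptions,
not something from the thread; the callback is where the actual GD code
would live. Photos whose copy is already newer than the original are
skipped, so after a watermark fix the old copies would have to be deleted
before re-running.

```php
<?php
/**
 * Calls $apply for every *.jpg in $srcDir that has no up-to-date copy in
 * $dstDir. $apply is any PHP callback taking (source path, target path).
 * Returns the list of target paths that were (re)generated.
 */
function batchWatermark($srcDir, $dstDir, $apply)
{
    clearstatcache();
    $done  = array();
    $files = glob(rtrim($srcDir, '/') . '/*.jpg');
    if ($files === false) {
        $files = array();
    }
    foreach ($files as $source) {
        $target = rtrim($dstDir, '/') . '/' . basename($source);
        // Skip photos whose watermarked copy is at least as new as the original.
        if (is_file($target) && filemtime($target) >= filemtime($source)) {
            continue;
        }
        call_user_func($apply, $source, $target);
        $done[] = $target;
    }
    return $done;
}
```

A call might look like batchWatermark('/srv/photos/originals',
'/srv/photos/public', 'applyMyWatermark'), where both directories and the
applyMyWatermark function are placeholders.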

Less manageable because I would have to keep copies of the pictures on the
disk. If I ever wanted to change these watermarks, I would have to somehow
recreate them. That is more work than applying the watermark at runtime,
per request, since in that case I would just apply the watermarks I wanted
and then serve the stream directly from memory.


 Sounds like you are planning to do the processing off-line then.  You
 could even do a mix - if you've got a lot of photos (millions and
 millions), applying the watermarks could take a while in itself, so you
 could leave that running slowly in the background, but combine it with
 an on-line process that does on-demand watermarking (when the photo is
 displayed).

Yes, applying the watermarks offline in a batch to lots of images could
take a while, but the album wouldn't be published before this process is
done. So, I don't really understand what you mean by mixing the two
approaches.

@Nathan

 for the code that will invoke the watermarking, put it behind another layer,
 so that you can easily alter it in the future as the site grows.  for
 example, you might use the strategy pattern, and your initial strategy will
 use the current webserver directly.  however, as the site begins to grow, you
 can add additional webservers, dedicated to running gd on top of php.  you
 can then write a strategy which will pass the requests off to those box(es),
 and it will be transparent to your existing code that knows only of the
 strategy interface.

 also, as you grow, distributed filesystems are key.  for example, your
 front-end webserver can handle requests from users on the site, dispatch a
 request (restful for instance) to another box, dedicated to gd.  since both
 boxes share a common filesystem via nfs (or other) the gd box can create the
 watermark, which will then be immediately available to the front-end box,
 which it could signal w/ another request to say 'hey, the watermark is
 ready'.


You have come up with some great insights; the strategy idea seems nice and
could work. Adding dedicated image processing boxes is a good idea, even
better if the software to apply the watermark is written in C, but I don't
think my use case justifies such an investment of time and money.

Another thing you mentioned that is of great interest to me is the use
of a distributed filesystem. Since I think I will just pre-process the
images in batch to add the watermark, the use of HDD space will grow
considerably as time goes by and the app grows. Is this approach transparent
enough that architectural changes to the app wouldn't be necessary?

Thank you all for the replies!

Marcelo.








Re: [PHP] [scalability, performance, DoS] To or not to process images at runtime

2008-08-07 Thread Per Jessen
Marcelo de Moraes Serpa wrote:

 Less manageable because I would have to keep copies of the pictures on
 the disk. If I ever wanted to change these watermarks, I would have to
 somehow recreate them. That is more work than applying the watermark
 at runtime, per request, since in that case I would just apply the
 watermarks I wanted and then serve the stream directly from memory.

Hmm, I don't usually think more work = less manageable, but that's a
matter for you.

My personal take on this type of thing - 

I would go for the on-demand watermarking, but with a cached copy of
everything that is watermarked.  on-demand = when a photo is
published the first time.  Like Bernhard said earlier, it probably
takes a few milliseconds to apply a watermark, so the very first time a
photo is viewed, the viewer might just experience the slightest delay.  

With apache this is really easy to do:

RewriteEngine on
# Only rewrite when the requested file is missing or empty
RewriteCond %{REQUEST_FILENAME}   !-s
RewriteRule ^(.+)$ apply_watermark.php?name=$1 [L]

This means: if the photo-with-watermark doesn't exist (or is empty),
run apply_watermark.php to apply a watermark, write the
photo-with-watermark to cache/disk, and then output the watermarked
photo.
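
What apply_watermark.php could look like is sketched below. This is not
the author's script: the three directory/file constants, the function
names, the file-name whitelist and the bottom-right placement are all
assumptions, and the watermark is assumed to be a PNG smaller than the
photo. The name check is there because the script takes its input
straight from the URL.

```php
<?php
// Assumed locations; adjust to the real layout.
define('ORIGINALS_DIR', '/srv/photos/originals');
define('CACHE_DIR', '/srv/photos/public');      // directory Apache serves from
define('WATERMARK_FILE', '/srv/photos/watermark.png');

/** Returns $name if it is a plain JPEG file name, otherwise null. */
function safePhotoName($name)
{
    return preg_match('/\A[A-Za-z0-9_-]+\.jpe?g\z/', $name) === 1 ? $name : null;
}

/** Composites the watermark into the bottom-right corner and writes $target. */
function writeWatermarked($source, $watermark, $target)
{
    $photo = imagecreatefromjpeg($source);
    $mark  = imagecreatefrompng($watermark);
    if ($photo === false || $mark === false) {
        return false;
    }
    imagealphablending($photo, true);
    imagecopy(
        $photo, $mark,
        imagesx($photo) - imagesx($mark), imagesy($photo) - imagesy($mark),
        0, 0, imagesx($mark), imagesy($mark)
    );
    // Write under a temporary name first so a half-written file is never served.
    $tmp = $target . '.' . getmypid() . '.tmp';
    $ok  = imagejpeg($photo, $tmp, 85) && rename($tmp, $target);
    imagedestroy($photo);
    imagedestroy($mark);
    return $ok;
}

// Only runs when invoked through the web server with ?name=...
if (PHP_SAPI !== 'cli' && isset($_GET['name'])) {
    $name = safePhotoName($_GET['name']);
    if ($name === null || !is_file(ORIGINALS_DIR . '/' . $name)) {
        header('HTTP/1.0 404 Not Found');
        exit;
    }
    $target = CACHE_DIR . '/' . $name;
    if (!writeWatermarked(ORIGINALS_DIR . '/' . $name, WATERMARK_FILE, $target)) {
        header('HTTP/1.0 500 Internal Server Error');
        exit;
    }
    header('Content-Type: image/jpeg');
    readfile($target);
}
```

After the first request the file exists in the served directory, so the
RewriteCond no longer matches and Apache delivers it without touching PHP.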

If you need to change the watermark, just erase the cached copies and
they're regenerated next time someone wants to view a photo.  To save
on disk-space if that is a concern, you can run regular purges of
cached copies that haven't been viewed for a while:

find cachedir -type f -atime +30 -print0 | xargs -0 rm -f


/Per Jessen, Zürich





[PHP] [scalability, performance, DoS] To or not to process images at runtime

2008-08-06 Thread Marcelo de Moraes Serpa
Hello,

My next project will be a kind of online photo viewer. All of these photos
will need to have a watermark applied to them. The problem is that, depending
on the picture, different watermarks need to be applied. The easiest
solution would be to process these pictures at runtime using GD, apply the
watermark(s) and serve them. The other approach would be to pre-process
them (maybe using GD) and create different copies on the disk, the obvious
advantage being that they could be served directly by the webserver (apache),
but it would be much harder to manage (need to fix a watermark error?
Re-process and re-create the images on the disk...) and would take much more
disk space. I would rather process them at runtime, per request; however,
this site will probably have lots of traffic. So, I've reached a dead end.
Could someone share his/her experiences and thoughts and help me decide? :)

FYI, the application would be custom built from the ground up using PHP 5
(not sure if we will use a framework; if we do, it will probably be
CakePHP). At first, there would be no clusters, proxies or
balancers, just a plain dedicated server with a good CPU, about 4GB RAM and
lots of disk space.

PS: I've put DoS in the subject tagline, meaning Denial of Service, as I think
that dynamic processing of images multiplied by lots of requests could result
in DoS.

Thanks in advance,

Marcelo.