Re: [PHP] [scalability, performance, DoS] To or not to process images at runtime
Marcelo de Moraes Serpa wrote:

> My next project will be a kind of online photo viewer. All of these photos will need to have a watermark applied to them. The problem is that, depending on the picture, different watermarks need to be applied.
>
> The easiest solution would be to process these pictures at runtime using GD, apply the watermark(s) and serve them. The other approach would be to pre-process them (maybe using GD) and create different copies on the disk, the obvious advantage being that they could be served directly via the webserver (Apache), but it would be much harder to manage (need to fix a watermark error? Re-process and re-create the images on the disk...) and would take much more disk space.
>
> I would rather process them at runtime, per request; however, this site will probably have lots of traffic. So, I've reached a dead end. Could someone share his/her experiences and thoughts and help me decide? :)

I think it depends on the amount of traffic you expect:

- high: off-line
- low-to-medium: on-line, on-demand, but cached

Disk space is cheap, especially if you don't need to worry about backup etc. I'm not sure why you think applying watermarks in an off-line process would be any less manageable than doing it on-line.

> FYI, the application would be custom built from the ground up using PHP 5 (not sure if we will use a framework; if we do, it will probably be CakePHP). At first, there would be no clusters, proxies or balancers, just a plain dedicated server with a good CPU, about 4GB RAM and lots of disk space.

Sounds like you are planning to do the processing off-line then. You could even do a mix: if you've got a lot of photos (millions and millions), applying the watermarks could take a while in itself, so you could leave that running slowly in the background, but combine it with an on-line process that does on-demand watermarking (when the photo is displayed).
/Per Jessen, Zürich -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
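The background half of the mixed approach described above might look roughly like the following PHP sketch. It is an illustration only, not anyone's actual setup: `watermarkBatch()`, the directory arguments, and the `$apply` callback (which stands in for whatever GD code does the real work) are all hypothetical names, and the code assumes PHP 7 or later.

```php
<?php
// Hypothetical background pass: watermark every original that has no cached copy yet.
// $apply is an assumed callback (source path, target path) => bool doing the GD work.
function watermarkBatch(string $originalsDir, string $cacheDir, callable $apply, int $pauseMicros = 200000): int
{
    $done = 0;
    foreach (glob($originalsDir . '/*.jpg') ?: [] as $source) {
        $target = $cacheDir . '/' . basename($source);
        // Skip photos that the on-demand path (or an earlier run) already produced.
        if (is_file($target) && filesize($target) > 0) {
            continue;
        }
        if ($apply($source, $target)) {
            $done++;
        }
        // Pause between photos so the batch does not starve live requests of CPU.
        usleep($pauseMicros);
    }
    return $done;
}
```

Because the loop skips anything already present in the cache directory, it can run alongside on-demand watermarking without redoing work, and it can be stopped and restarted at any point.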
Re: [PHP] [scalability, performance, DoS] To or not to process images at runtime
On Wed, Aug 6, 2008 at 3:04 PM, Marcelo de Moraes Serpa [EMAIL PROTECTED] wrote:

> Hello,
>
> My next project will be a kind of online photo viewer. All of these photos will need to have a watermark applied to them. The problem is that, depending on the picture, different watermarks need to be applied.
>
> The easiest solution would be to process these pictures at runtime using GD, apply the watermark(s) and serve them. The other approach would be to pre-process them (maybe using GD) and create different copies on the disk, the obvious advantage being that they could be served directly via the webserver (Apache), but it would be much harder to manage (need to fix a watermark error? Re-process and re-create the images on the disk...) and would take much more disk space.
>
> I would rather process them at runtime, per request; however, this site will probably have lots of traffic. So, I've reached a dead end. Could someone share his/her experiences and thoughts and help me decide? :)
>
> FYI, the application would be custom built from the ground up using PHP 5 (not sure if we will use a framework; if we do, it will probably be CakePHP). At first, there would be no clusters, proxies or balancers, just a plain dedicated server with a good CPU, about 4GB RAM and lots of disk space.
>
> PS: I've put DoS (Denial of Service) in the subject tagline as I think that dynamic processing of images combined with lots of requests could result in a DoS.

for the code that will invoke the watermarking, put it behind another layer, so that you can easily alter it in the future as the site grows. for example, you might use the strategy pattern, and your initial strategy will use the current webserver directly. however, as the site begins to grow, you can add additional webservers, dedicated to running gd on top of php. you can then write a strategy which will pass the requests off to those box(es), and it will be transparent to your existing code, which knows only of the strategy interface.

also, as you grow, distributed filesystems are key. for example, your front-end webserver can handle requests from users on the site and dispatch a request (restful, for instance) to another box, dedicated to gd. since both boxes share a common filesystem via nfs (or other), the gd box can create the watermarked image, which will then be immediately available to the front-end box, which it could signal w/ another request to say 'hey, the watermark is ready'.

-nathan
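A minimal PHP sketch of the strategy idea described above follows. Every name in it (`WatermarkStrategy`, `LocalGdStrategy`, `RemoteBoxStrategy`, `PhotoPublisher`, and the remote endpoint URL) is hypothetical and chosen for illustration; the remote box is assumed to write its result onto a shared filesystem. It assumes PHP 7 or later with the GD extension for the local strategy.

```php
<?php
interface WatermarkStrategy
{
    /** Writes a watermarked copy of $source to $target; returns true on success. */
    public function apply(string $source, string $watermark, string $target): bool;
}

/** Initial strategy: run GD on the current web server. */
final class LocalGdStrategy implements WatermarkStrategy
{
    public function apply(string $source, string $watermark, string $target): bool
    {
        $photo = @imagecreatefromjpeg($source);
        $mark  = @imagecreatefrompng($watermark);
        if ($photo === false || $mark === false) {
            return false;
        }
        // Place the watermark in the bottom-right corner of the photo.
        $x = max(0, imagesx($photo) - imagesx($mark));
        $y = max(0, imagesy($photo) - imagesy($mark));
        imagecopy($photo, $mark, $x, $y, 0, 0, imagesx($mark), imagesy($mark));
        return imagejpeg($photo, $target, 85);
    }
}

/** Later strategy: hand the job to a dedicated GD box that shares the filesystem. */
final class RemoteBoxStrategy implements WatermarkStrategy
{
    private $endpoint;

    public function __construct(string $endpoint)
    {
        $this->endpoint = $endpoint; // e.g. "http://gd-box.internal/watermark" (assumed URL)
    }

    public function apply(string $source, string $watermark, string $target): bool
    {
        $query = http_build_query(['source' => $source, 'watermark' => $watermark, 'target' => $target]);
        // The remote box is assumed to write $target onto the shared filesystem.
        $reply = @file_get_contents($this->endpoint . '?' . $query);
        return $reply !== false && is_file($target);
    }
}

/** Calling code depends only on the interface, so strategies can be swapped freely. */
final class PhotoPublisher
{
    private $strategy;

    public function __construct(WatermarkStrategy $strategy)
    {
        $this->strategy = $strategy;
    }

    public function publish(string $source, string $watermark, string $target): bool
    {
        return $this->strategy->apply($source, $watermark, $target);
    }
}
```

Switching from `new PhotoPublisher(new LocalGdStrategy())` to `new PhotoPublisher(new RemoteBoxStrategy($url))` would then be a one-line change at the point where the publisher is constructed.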
Re: [PHP] [scalability, performance, DoS] To or not to process images at runtime
Bernhard Kohl wrote:

> I think it also depends on the size of your images. If they are huge megapixel files, processing them on the fly might cause severe lag. Still, adding a watermark to an image with 100-200 thousand pixels is done within milliseconds on a modern machine.

(You probably meant to send this to the list.)

The OP spoke about a kind of online photo viewer, so I assumed e.g. JPEGs at 1024x768 as a typical size, so about 786K pixels.

/Per Jessen, Zürich
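For a 1024x768 image, the arithmetic can be checked with a few lines of PHP. The helper `estimateImageCost()` is hypothetical, and the figure of about 4 bytes per pixel for a decoded truecolor image is a rule-of-thumb assumption, not a measured value; real memory use will be somewhat higher.

```php
<?php
// Hypothetical helper: pixel count and a rough decoded-size estimate for one image.
// Assumption: ~4 bytes per pixel once decoded to truecolor; real overhead varies.
function estimateImageCost(int $width, int $height, int $bytesPerPixel = 4): array
{
    $pixels = $width * $height;
    return ['pixels' => $pixels, 'bytes' => $pixels * $bytesPerPixel];
}

$cost = estimateImageCost(1024, 768);
printf("%d pixels, about %.1f MiB decoded\n", $cost['pixels'], $cost['bytes'] / 1048576);
// prints "786432 pixels, about 3.0 MiB decoded"
```

Multiplying that per-image figure by the number of requests processed at the same time gives a first estimate of how much memory runtime watermarking could tie up under load.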
Re: [PHP] [scalability, performance, DoS] To or not to process images at runtime
@Per Jessen

> Disk-space is cheap, especially if you don't need to be worried about backup etc. I'm not sure why you think applying watermarks in an off-line process would be any less manageable than doing it on-line.

Well, the processing will be online in the sense that it will be triggered via an admin interface. The pictures will then be batch-processed by a PHP script using GD, saved to the disk, and later served statically, without the overhead of applying the watermark per request, at runtime.

Less manageable because I would have to keep copies of the pictures on the disk. If I ever want to change these watermarks, I would have to somehow recreate them. That is more work than the per-request approach, where I would just apply the watermarks I wanted at runtime and then serve the stream directly from memory.

> Sounds like you are planning to do the processing off-line then. You could even do a mix - if you've got a lot of photos (millions and millions), applying the watermarks could take a while in itself, so you could leave that running slowly in the background, but combine it with an on-line process that does on-demand watermarking (when the photo is displayed).

Yes, applying the watermarks offline in a batch to lots of images could take a while, but the album wouldn't be published before this process is done. So, I don't really understand what you mean by mixing the two approaches.

@Nathan

> for the code that will invoke the watermarking, put it behind another layer, so that you can easily alter it in the future as the site grows. for example, you might use strategy pattern, and your initial strategy will use the current webserver directly. however, as the site begins to grow, you can add additional webservers, dedicated to running gd on top of php. you can then write a strategy which will pass the requests off to those box(es), and it will be transparent to your existing code that knows only of the strategy interface.
>
> also, as you grow, distributed filesystems are key. for example, your front-end webserver can handle requests from users on the site, dispatch a request (restful for instance) to another box, dedicated to gd. since both boxes share a common filesystem via nfs (or other) the gd box can create the watermark, which will then be immediately available to the front-end box, which it could signal w/ another request to say 'hey, the watermark is ready'.

You have come up with some great insights; the strategy idea seems nice and could work. Adding dedicated image-processing boxes is a good idea, even better if the software that applies the watermark is written in C, but I don't think my use case justifies such an investment of time and money.

Another thing you mentioned that is of great interest to me is the use of a distributed filesystem. Since I think I will just pre-process the images in batch to add the watermark, the use of HDD space will grow considerably as time goes by and the app grows. Is this approach transparent enough that architectural changes to the app wouldn't be necessary?

Thank you all for the replies!

Marcelo.

On Thu, Aug 7, 2008 at 3:52 AM, Per Jessen [EMAIL PROTECTED] wrote:

> Bernhard Kohl wrote:
>
>> I think it also depends on the size of your images. If they are huge megapixel files, processing them on the fly might cause severe lag. Still, adding a watermark to an image with 100-200 thousand pixels is done within milliseconds on a modern machine.
>
> (You probably meant to send this to the list.)
>
> The OP spoke about a kind of online photo viewer, so I assumed e.g. JPEGs at 1024x768 as a typical size, so about 786K pixels.
>
> /Per Jessen, Zürich
Re: [PHP] [scalability, performance, DoS] To or not to process images at runtime
Marcelo de Moraes Serpa wrote:

> Less manageable because I would have to keep copies of the pictures on the disk. If I ever want to change these watermarks, I would have to somehow recreate them. It is more work to do than if I used the per-request runtime applying of watermark approach, since in this case, I would just apply the watermarks I wanted and then serve the stream directly from memory.

Hmm, I don't usually think more work = less manageable, but that's a matter for you.

My personal take on this type of thing: I would go for the on-demand watermarking, but with a cached copy of everything that is watermarked. On-demand = when a photo is published the first time. Like Bernhard said earlier, it probably takes a few milliseconds to apply a watermark, so the very first time a photo is viewed, the viewer might just experience the slightest delay.

With Apache this is really easy to do:

    RewriteEngine on
    RewriteCond %{REQUEST_FILENAME} !-s
    RewriteRule ^(.+)$ apply_watermark.php?name=$1

This means: if the photo-with-watermark doesn't exist (or is empty), run apply_watermark.php to apply a watermark, write the photo-with-watermark to cache/disk, and then output the watermarked photo.

If you need to change the watermark, just erase the cached copies and they're regenerated the next time someone wants to view a photo. To save on disk space, if that is a concern, you can run regular purges of cached copies that haven't been viewed for a while (here, 30 days):

    find cachedir -type f -atime +30 -print0 | xargs -0 rm -f

/Per Jessen, Zürich
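The script behind that rewrite rule is not shown in the thread. What follows is one possible sketch of an `apply_watermark.php`, under assumptions that are not from the original messages: all photos are JPEGs in a single flat directory next to the script, unwatermarked originals live in an `originals/` subdirectory that is not publicly served, and the watermark is a file named `watermark.png`. The function names are hypothetical, and the code assumes PHP 7.1 or later with the GD extension.

```php
<?php
/** Reduces the requested name to a bare JPEG file name so "../" tricks cannot escape the directory. */
function safePhotoName(string $name): ?string
{
    $base = basename($name);
    return preg_match('/\A[A-Za-z0-9_-]+\.jpe?g\z/i', $base) === 1 ? $base : null;
}

/** Applies the watermark and writes the result atomically; returns true on success. */
function buildCachedCopy(string $source, string $watermark, string $target): bool
{
    $photo = @imagecreatefromjpeg($source);
    $mark  = @imagecreatefrompng($watermark);
    if ($photo === false || $mark === false) {
        return false;
    }
    // Bottom-right placement is an arbitrary choice for this sketch.
    $x = max(0, imagesx($photo) - imagesx($mark));
    $y = max(0, imagesy($photo) - imagesy($mark));
    imagecopy($photo, $mark, $x, $y, 0, 0, imagesx($mark), imagesy($mark));
    // Write to a temp file, then rename, so other requests never read a half-written JPEG.
    $tmp = $target . '.' . uniqid('tmp', true);
    if (!imagejpeg($photo, $tmp, 85)) {
        return false;
    }
    return rename($tmp, $target);
}

// Web entry point: only runs when invoked through the web server with ?name=...
if (PHP_SAPI !== 'cli' && isset($_GET['name'])) {
    $name = is_string($_GET['name']) ? safePhotoName($_GET['name']) : null;
    if ($name === null || !is_file(__DIR__ . '/originals/' . $name)) {
        http_response_code(404);
        exit;
    }
    $target = __DIR__ . '/' . $name; // the path Apache will serve statically next time
    if (!buildCachedCopy(__DIR__ . '/originals/' . $name, __DIR__ . '/watermark.png', $target)) {
        http_response_code(500);
        exit;
    }
    header('Content-Type: image/jpeg');
    header('Content-Length: ' . filesize($target));
    readfile($target);
}
```

Rejecting any name that is not a plain JPEG file name matters here because the `name` parameter comes straight from the requested URL; without that check, the script could be pointed at arbitrary files.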
[PHP] [scalability, performance, DoS] To or not to process images at runtime
Hello,

My next project will be a kind of online photo viewer. All of these photos will need to have a watermark applied to them. The problem is that, depending on the picture, different watermarks need to be applied.

The easiest solution would be to process these pictures at runtime using GD, apply the watermark(s) and serve them. The other approach would be to pre-process them (maybe using GD) and create different copies on the disk, the obvious advantage being that they could be served directly via the webserver (Apache), but it would be much harder to manage (need to fix a watermark error? Re-process and re-create the images on the disk...) and would take much more disk space.

I would rather process them at runtime, per request; however, this site will probably have lots of traffic. So, I've reached a dead end. Could someone share his/her experiences and thoughts and help me decide? :)

FYI, the application would be custom built from the ground up using PHP 5 (not sure if we will use a framework; if we do, it will probably be CakePHP). At first, there would be no clusters, proxies or balancers, just a plain dedicated server with a good CPU, about 4GB RAM and lots of disk space.

PS: I've put DoS (Denial of Service) in the subject tagline as I think that dynamic processing of images combined with lots of requests could result in a DoS.

Thanks in advance,

Marcelo.