Storing and Serving Uploaded Images

2002-06-29 Thread Richard Clarke

List,
The following situation is one that arises many times for me when
creating mod_perl applications for people, and I always find myself
thinking there must be a better way to do it. I wondered if the list would
like to share their thoughts on the best approach. It concerns storing and
serving images/media uploaded by users of the website.

An example could be a website letting you set up your own shop to sell
products. The shop maker may allow you to upload preview images of products.
Assuming the product data is stored in a database (MySQL here), I personally
wouldn't store the binary image in the database. A solution springing to
mind is to store a hash/id in the database and have a common directory
(/htdocs/_previews/) which holds the pictures named after that hash/id.
That way, either the mod_perl application can auto-create the link using
src=/htdocs/_previews/imageid.jpg, or a lightweight handler can be used:
/getimage?id=asdf09sd8fsa could then rewrite the URI to the real location,
or perform a content subrequest and let Apache serve the image that way.
Of course there are many solutions, but I'm wondering: is there a best one?
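
A minimal sketch of such a lightweight handler, using the mod_perl 1 API (the package name, the id validation rule and the /_previews layout are my own assumptions, not something settled in this thread):

```perl
# Hypothetical handler: maps /getimage?id=... onto the real file and
# lets Apache serve it via an internal redirect.
package My::GetImage;
use strict;
use Apache::Constants qw(OK NOT_FOUND);

sub handler {
    my $r    = shift;
    my %args = $r->args;
    my $id   = $args{id} || '';

    # Validate the id so the client can't walk the filesystem.
    return NOT_FOUND unless $id =~ /^\w+$/;

    # Rewrite to the real location; Apache serves the static file.
    $r->internal_redirect("/_previews/$id.jpg");
    return OK;
}
1;
```

It would be wired up in httpd.conf with something like a `<Location /getimage>` block containing `SetHandler perl-script` and `PerlHandler My::GetImage`.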

Any thoughts appreciated. I realise that the same situation might occur
using vanilla CGI, but mod_perl provides unique ways of solving the problem,
hence I post to this list.

Richard.





Re: Storing and Serving Uploaded Images

2002-06-29 Thread Stas Bekman

Richard Clarke wrote:
 List,
 The following situation is one that arises many times for me when
 creating mod_perl applications for people, and I always find myself
 thinking there must be a better way to do it. I wondered if the list would
 like to share their thoughts on the best approach. It concerns storing and
 serving images/media uploaded by users of the website.
 
 An example could be a website letting you set up your own shop to sell
 products. The shop maker may allow you to upload preview images of products.
 Assuming the product data is stored in a database (MySQL here), I personally
 wouldn't store the binary image in the database. A solution springing to
 mind is to store a hash/id in the database and have a common directory
 (/htdocs/_previews/) which holds the pictures named after that hash/id.
 That way, either the mod_perl application can auto-create the link using
 src=/htdocs/_previews/imageid.jpg, or a lightweight handler can be used:
 /getimage?id=asdf09sd8fsa could then rewrite the URI to the real location,
 or perform a content subrequest and let Apache serve the image that way.
 Of course there are many solutions, but I'm wondering: is there a best one?
 
 Any thoughts appreciated. I realise that the same situation might occur
 using vanilla CGI, but mod_perl provides unique ways of solving the problem,
 hence I post to this list.

I doubt this has anything to do specifically with mod_perl: since you
are talking about storage/retrieval techniques, it'll work the same with
any other technology out there. It's an interesting topic, though.

A *good* filesystem can serve well as a database, though you should be
aware of the issue of how many files you store in each directory: the
more files you put into one directory, the slower the access time. Modern
filesystems (definitely don't use a FAT-based fs) implement internal
hashing of file names, but you have to check the filesystem that you use
for its limits. Retrieval slows down significantly once you pass that
limit, because the search becomes linear. In that case you should do your
own hashing, so that you map a filename 'abcdef.gif' into
a/b/c/abcdef.gif (3 levels).
How many levels of hashing to use depends on how many files you plan to
store and how many files you can put into each directory before the
filesystem falls back to a linear lookup. Too many levels is not good
either, since each extra sub-directory slows things down. Once you have
the numbers, it's easy to calculate how many levels to use.
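
The mapping above can be sketched as a small helper (the function name is my own; it is just an illustration of the scheme, with the number of levels as a parameter you tune once you know your filesystem's limits):

```perl
#!/usr/bin/perl
# Sketch of the directory-hashing scheme described above: the first N
# characters of the file name become N levels of sub-directories.
use strict;
use warnings;
use File::Spec;

sub hashed_path {
    my ($name, $levels) = @_;
    # take the first $levels characters as nested sub-directories
    my @subdirs = (split //, $name)[0 .. $levels - 1];
    return File::Spec->catfile(@subdirs, $name);
}

print hashed_path('abcdef.gif', 3), "\n";   # a/b/c/abcdef.gif
```

Keeping the level count in one place like this is what makes the code "transparent to the hashing function": adding a level later is a one-argument change plus a file move.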

Make your code transparent to the hashing function, so that in the future
you can easily scale and move to extra levels of hashing. Of course, if
you can benchmark BLOB retrieval from the RDBMS against fetches from the
filesystem, that will help you make the decision.

It also depends on the caching patterns: if certain images are fetched
frequently, the kernel/filesystem will do the caching for you. Of course
you can do extra caching yourself (squid/mod_proxy/etc.), but if you can
get it for free at the OS level, that could be even better.

Also check Perrin's article, though if I remember correctly it doesn't
cover this issue:
http://perl.apache.org/release/docs/tutorials/apps/scale_etoys/etoys.html

P.S. To hash to 3 levels you can use something like:

% perl -le '$a = "super_pc.gif"; print join "/", (split //, $a, 4)[0..2], $a'
s/u/p/super_pc.gif

Of course, you can use a more effective hash function.
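
For instance (a sketch; the function name is my own), taking the directory levels from an MD5 digest of the name instead of the name itself spreads files evenly even when many names share a common prefix:

```perl
#!/usr/bin/perl
# Sketch: hash on a digest of the name, not the name itself, so that
# common prefixes ("img_0001.jpg", "img_0002.jpg", ...) still land in
# different sub-directories.
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

sub digest_path {
    my ($name, $levels) = @_;
    # first $levels hex digits of the digest become the sub-directories
    my @subdirs = (split //, md5_hex($name))[0 .. $levels - 1];
    return join '/', @subdirs, $name;
}

print digest_path('super_pc.gif', 3), "\n";
</imports>
```

With hex digits there are at most 16 entries per level, so three levels give 4096 buckets; the same transparency point applies, since only the level argument changes when you need more.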

__
Stas Bekman            JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide --- http://perl.apache.org
mailto:[EMAIL PROTECTED] http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com