Richard Clarke wrote:
List,
The following situation arises frequently when I create mod_perl applications
for clients, and I always find myself thinking there must be a better way to
handle it. I wondered if the list would like to share their thoughts on the
best approach. It concerns storing and serving images/media uploaded by users
of a website.
An example could be a website that lets people set up their own shops to sell
products. The shop builder may allow them to upload preview images of the products.
Assuming the product data is stored in a database, I personally wouldn't
store the binary image in the database (assuming MySQL here). A solution
that springs to mind is to store a hash/id in the database and have a common
directory (/htdocs/_previews/) which holds the pictures named after that
hash/id. That way, either the mod_perl application can auto-create the link
using src=/htdocs/_previews/imageid.jpg, or a lightweight handler can be
used: /getimage?id=asdf09sd8fsa could then rewrite the URI to
the real location or perform a content subrequest and let Apache serve the
image that way. Of course there are many solutions, but I'm wondering: is
there a best one?
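To make the handler idea a bit more concrete, here is roughly what I have in
mind. This is only a sketch, assuming mod_perl 1.x; the package name, the
/_previews/ directory and the filename pattern are placeholders:

    package My::GetImage;
    use strict;
    use Apache::Constants qw(OK NOT_FOUND);

    sub handler {
        my $r    = shift;
        my %args = $r->args;
        my $id   = $args{id} or return NOT_FOUND;

        # never let user-supplied input walk around the filesystem
        return NOT_FOUND unless $id =~ /^[\w-]+\.(?:gif|jpe?g|png)$/;

        # hand the request back to Apache and let it serve the real file
        $r->internal_redirect("/_previews/$id");
        return OK;
    }
    1;

It would then be mapped to /getimage in httpd.conf with a <Location /getimage>
block using SetHandler perl-script and PerlHandler My::GetImage.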
Any thoughts appreciated. I realise that the same situation might occur
with vanilla CGI, but mod_perl provides unique ways of solving the problem,
hence I'm posting to this list.
I doubt this has anything to do specifically with mod_perl: since you
are talking about storage/retrieval techniques, it'll work the same with
any other technology out there. It's an interesting topic, though.
A *good* filesystem can serve well as a database, though you should be
aware of how many files you store in each directory: the more files you
put into one directory, the slower the access time. Modern filesystems
(definitely don't use a FAT-based fs) implement internal hashing of file
names, but you have to check the filesystem you use for its limits.
Retrieval slows down significantly once you pass that limit, because the
lookup becomes linear. In that case you should do your own hashing, so
that you map a filename 'abcdef.gif' into a/b/c/abcdef.gif (3 levels of
hashing).
Again, how many levels of hashing to use depends on how many files you
plan to store and how many files you can put into each directory before the
filesystem falls back to the linear lookup. Too many levels is not good
either, since each extra sub-directory slows things down. Once you have the
numbers, it's easy to calculate how many levels to use.
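For example, a helper along these lines keeps the number of levels a single
knob you can turn later. The root directory and level count here are only
made up for illustration:

    use strict;
    use File::Path ();
    use File::Spec ();

    my $ROOT   = "/htdocs/_previews";   # wherever you keep the images
    my $LEVELS = 3;                     # tune once you know your fs limits

    # map "super_pc.gif" to "/htdocs/_previews/s/u/p/super_pc.gif"
    # (assumes names are at least $LEVELS characters long)
    sub hashed_path {
        my $name    = shift;
        my @subdirs = (split //, $name)[0 .. $LEVELS - 1];
        return File::Spec->catfile($ROOT, @subdirs, $name);
    }

    # on upload, make sure the sub-directories exist first
    sub store_path {
        my $name  = shift;
        my $path  = hashed_path($name);
        my ($dir) = $path =~ m{^(.*)/[^/]+\z};
        File::Path::mkpath($dir) unless -d $dir;
        return $path;
    }

If the rest of the code only ever calls hashed_path()/store_path(), changing
$LEVELS later doesn't require touching anything else.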
Make your code transparent to the hashing function, so that in the future you
can easily scale and move to extra levels of hashing. Of course, if you
can benchmark things, comparing the retrieval speed of the RDBMS's BLOBs
with a filesystem fetch, that will help you make the decision.
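If you want to measure it, something like the following would do. This is
just a sketch; the DSN, credentials, table and column names are all invented
for the example:

    use strict;
    use Benchmark qw(cmpthese);
    use DBI;

    my $dbh = DBI->connect("dbi:mysql:shop", "user", "password",
                           { RaiseError => 1 });
    my $sth = $dbh->prepare("SELECT image FROM previews WHERE id = ?");

    # fetch the image as a BLOB from MySQL
    sub fetch_blob {
        $sth->execute("super_pc.gif");
        my ($image) = $sth->fetchrow_array;
        return $image;
    }

    # slurp the same image from the hashed directory tree
    sub fetch_file {
        open my $fh, "<", "/htdocs/_previews/s/u/p/super_pc.gif" or die $!;
        binmode $fh;
        local $/;
        my $image = <$fh>;
        close $fh;
        return $image;
    }

    # run each for at least 2 CPU seconds and compare the rates
    cmpthese(-2, { blob => \&fetch_blob, file => \&fetch_file });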
It also depends on the caching patterns: if certain images are fetched
frequently, the kernel/filesystem will do the caching for you. Of course you
can do extra caching yourself (squid/mod_proxy/etc.), but if you can get it
for free at the OS level, that may be even better.
Check also Perrin's article, but if I remember correctly it doesn't talk
about this issue.
http://perl.apache.org/release/docs/tutorials/apps/scale_etoys/etoys.html
p.s. to hash (3 levels) you can use something like:
% perl -le '$a = "super_pc.gif"; print join "/", (split //, $a, 4)[0..2], $a'
s/u/p/super_pc.gif
Of course you can use a more effective hashing function.
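For instance, hashing on a digest of the name spreads the files evenly even
when the names themselves are not well distributed. A sketch using the
standard Digest::MD5 module (the 3 levels are again just an example):

    use strict;
    use Digest::MD5 qw(md5_hex);

    # spread files by the first characters of an MD5 digest of the name,
    # e.g. "super_pc.gif" -> "<d1>/<d2>/<d3>/super_pc.gif", where d1..d3
    # are the first three hex digits of the digest
    sub digest_path {
        my $name   = shift;
        my $digest = md5_hex($name);
        return join "/", (split //, $digest)[0 .. 2], $name;
    }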
--
Stas Bekman            JAm_pH -- Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide --- http://perl.apache.org
mailto:[EMAIL PROTECTED] http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org http://ticketmaster.com