Richard Clarke wrote:
> List,
>
> I find the following situation one which arises many times for me when
> creating mod_perl applications for people. However, I always find myself
> thinking there is a better way to do it. I wondered if the list would like
> to share their thoughts on the best way. It concerns storing and serving
> images/media uploaded by users of the webpage.
>
> An example could be a website letting you set up your own shops to sell
> products. The shop maker may allow you to upload preview images of
> products. Assuming the product data is stored in a database, I personally
> wouldn't store the binary image in the database (assuming MySQL here). A
> solution springing to mind is to store a hash/id in the database and have
> a common directory (/htdocs/_previews/) which holds the pictures named
> after that hash/id. That way, either the mod_perl application can
> auto-create the link using src=/htdocs/_previews/imageid.jpg, or a
> lightweight handler can be used. For example, /getimage?id=asdf09sd8fsa
> could then rewrite the URI to the real location or perform a content
> subrequest and let Apache serve the image that way. Of course there are
> many solutions, but I'm wondering: is there a best one?
>
> Any thoughts appreciated. I realise that the same situation might occur
> using vanilla CGI, but mod_perl provides unique ways of solving the
> problem, hence I post to this list.
I doubt this has anything to do specifically with mod_perl: since you are talking about storage/retrieval techniques, it'll work the same for any other technology out there. Though it's an interesting topic.

A *good* filesystem can serve well as a database, though you should be aware of how many files you store in each directory: the more files you put into one directory, the slower the access time. Modern filesystems (definitely don't use a FAT-based fs) implement internal hashing of file names, but you have to check the filesystem you use for its limits. Retrieval slows down significantly once you pass that limit, because the search becomes linear. In that case you should do your own hashing, so you map a filename 'abcdef.gif' into a/b/c/abcdef.gif (3 levels).

How many levels of hashing to use depends on how many files you plan to store and how many files you can put into each directory without pushing the filesystem into linear lookup. Too many levels is not good either, since each extra sub-directory slows things down. Once you have the numbers, it's easy to calculate how many levels to use. Make your code transparent to the hashing function, so that in the future you can easily scale and move to extra levels of hashing.

Of course, if you can benchmark things, comparing the RDBMS's BLOB retrieval speed with filesystem fetches will help you make the decision. It also depends on the caching patterns: if certain images are fetched frequently, the kernel/filesystem will do the caching for you. You can do extra caching yourself (squid/mod_proxy/etc.), but if you can get it for free at the OS level, that could be even better.

Check also Perrin's article, though if I remember correctly it doesn't discuss this issue:
http://perl.apache.org/release/docs/tutorials/apps/scale_etoys/etoys.html
To hash into 3 levels you can use something like:

  % perl -le '$a = "super_pc.gif"; print join "/", (split //, $a, 4)[0..2], $a'
  s/u/p/super_pc.gif

Of course you can use a more effective hashing scheme.

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:[EMAIL PROTECTED] http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com