Re: Where to store uploads

2007-07-03 Thread Foo JH
Depending on the number of files you're expecting, you may want to limit 
the number of files you put in a single folder. For example, in your 
shared folder you may want to create 26 subfolders - one for each letter 
of the alphabet. Then you drop the files in the subfolder matching the 
first letter of the filename. There's a bit of creative play on the 
subfolder 'hash', depending again on your expected filename format.
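
A minimal sketch of the first-letter scheme described above, in Python. The function name, the "/shared/uploads" root, and the "_" catch-all folder for names that do not start with a-z are assumptions for illustration, not something from this thread.

```python
import string
from pathlib import PurePosixPath

LETTERS = frozenset(string.ascii_lowercase)

def letter_bucket(root: str, filename: str) -> PurePosixPath:
    """Return root/<first letter>/<filename>."""
    first = filename[:1].lower()
    # Assumption: names that do not start with a-z share one "_" folder.
    bucket = first if first in LETTERS else "_"
    return PurePosixPath(root) / bucket / filename

print(letter_bucket("/shared/uploads", "Report.pdf"))
print(letter_bucket("/shared/uploads", "2007-photo.jpg"))
```

This keeps the lookup trivial, but the folders are only as evenly filled as the first letters of your filenames are.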


Clinton Gormley wrote:

Following on from the thread "questions on serving big file & SQL
statement parsing", I have a related question:

Where do you store your uploads?

I can think of two approaches:

1) In the DB, store the name of the server to which your file has been 
   uploaded


2) Store your upload in a shared partition (eg on a SAN, NFS, 
   iSCSI/OCFS2)


The advantage I see in the second approach is redundancy, the
disadvantage is that there will be a slight performance cost.

Anybody have recommendations/war-stories about these or other
approaches?

thanks

Clint

  




Re: Where to store uploads

2007-07-03 Thread Perrin Harkins

On 7/3/07, Clinton Gormley <[EMAIL PROTECTED]> wrote:

1) In the DB, store the name of the server to which your file has been
   uploaded


I try to avoid files in the DB.  It always ends in tears.


2) Store your upload in a shared partition (eg on a SAN, NFS,
   iSCSI/OCFS2)


That's ok if you need them on every server.  Many applications just
upload a file and process it on one server, so they don't need this.

- Perrin


Re: Where to store uploads

2007-07-03 Thread Michael Peters
Clinton Gormley wrote:

> I can think of two approaches:
>
> 1) In the DB, store the name of the server to which your file has been
>    uploaded

Don't do that. The moment you put a file into the database you lose all
of the nice tools that your OS gives you for working with files (grep, ls,
find, etc).  Plus when you send those files to the client you have to
query and stream them from the database instead of just using the
filesystem.

> 2) Store your upload in a shared partition (eg on a SAN, NFS,
>    iSCSI/OCFS2)

If you need to access those files from multiple machines, then this is
much better than sticking them in the db.

> The advantage I see in the second approach is redundancy, the
> disadvantage is that there will be a slight performance cost.

Why would there be a higher performance cost? I can't imagine that a
shared filesystem would really have much more overhead than a database.

-- 
Michael Peters
Developer
Plus Three, LP


Re: Where to store uploads

2007-07-03 Thread Clinton Gormley
On Tue, 2007-07-03 at 10:26 -0400, Perrin Harkins wrote:
> On 7/3/07, Clinton Gormley <[EMAIL PROTECTED]> wrote:
> > 1) In the DB, store the name of the server to which your file has been
> >    uploaded
> 
> I try to avoid files in the DB.  It always ends in tears.

Sorry - I meant, store this in the DB:
 - ID:      1234
 - type:    image/jpeg
 - path:    12/34/1234.jpg
 - server:  images1.domain.com

So that your program would construct a URL pointing to the correct
server, or a translation layer would forward the request to the correct
server
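
A rough sketch of the "construct a URL from the DB record" idea, in Python. The field names mirror the record above; the class name, the function name, and the plain http scheme are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UploadRecord:
    """Metadata row kept in the DB; the file itself stays on disk."""
    id: int
    type: str
    path: str
    server: str

def upload_url(rec: UploadRecord, scheme: str = "http") -> str:
    # Join the server name and the relative path into a URL.
    return f"{scheme}://{rec.server}/{rec.path.lstrip('/')}"

rec = UploadRecord(id=1234, type="image/jpeg",
                   path="12/34/1234.jpg", server="images1.domain.com")
print(upload_url(rec))  # http://images1.domain.com/12/34/1234.jpg
```

A translation layer would do the same lookup but proxy or redirect the request instead of handing the URL to the client.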

> 
> > 2) Store your upload in a shared partition (eg on a SAN, NFS,
> >    iSCSI/OCFS2)
> 
> That's ok if you need them on every server.  Many applications just
> upload a file and process it on one server, so they don't need this.

Sure - I was thinking primarily of image hosting.

thanks

Clint
> 
> - Perrin



Re: Where to store uploads

2007-07-03 Thread Clinton Gormley
On Tue, 2007-07-03 at 10:26 -0400, Michael Peters wrote:
> Clinton Gormley wrote:
> 
> > I can think of two approaches:
> >
> > 1) In the DB, store the name of the server to which your file has been
> >    uploaded
> 
> Don't do that. The moment you put a file into the database you lose all
> of the nice tools that your OS gives you for working with files (grep, ls,
> find, etc).  Plus when you send those files to the client you have to
> query and stream them from the database instead of just using the
> filesystem.
> 


I didn't realise that line was so confusing :)

I didn't mean: stick the file in the DB. 

I meant, stick the file into a directory on a particular machine, and
then put this into the DB:
 - ID:      1234
 - type:    image/jpeg
 - path:    12/34/1234.jpg
 - server:  images1.domain.com

So that your program would construct a URL pointing to the correct
server, or a translation layer would forward the request to the correct
server

Clint 



Re: Where to store uploads

2007-07-03 Thread Jonathan Vanasco


On Jul 3, 2007, at 10:38 AM, Clinton Gormley wrote:


I didn't mean: stick the file in the DB.

I meant, stick the file into a directory on a particular machine, and
then put this into the DB:
 - ID:      1234
 - type:    image/jpeg
 - path:    12/34/1234.jpg
 - server:  images1.domain.com

So that your program would construct a URL pointing to the correct
server, or a translation layer would forward the request to the
correct server


i always do that... metadata in db and file on os

if you expect a large number of files, you should hash the file
name and store the files in buckets


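ie (in perl)
use Digest::MD5 qw(md5_hex);
my $root = 'uploads';         # hypothetical upload root
my $name = 'file';
my $hash = md5_hex($name);    # 32 hex chars
# bucket on the hash, not the name: two hash chars per directory level
my $path = sprintf( "/%s/%s/%s/%s", $root, substr($hash,0,2), substr($hash,2,2), $name );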


you can't store by name alone because of character distribution among
english words and numbers -- you'll end up with buckets that have 20k
files and others that have 1.  md5 will give you a good distro in 32
hex chars ( or bump to a higher base like base64 and show it in 22 chars !)
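
To make the distribution point concrete, here is a small Python sketch that compares bucketing by first character with bucketing by the first two md5 hex chars. The filenames are invented for illustration (an "img_NNNNN.jpg" pattern is an assumption, not something from this thread).

```python
import hashlib
from collections import Counter

def md5_bucket(name: str, width: int = 2) -> str:
    """First `width` hex chars of the md5 of the file name."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()[:width]

# Hypothetical upload names that all share a prefix.
names = [f"img_{i:05d}.jpg" for i in range(10000)]

by_letter = Counter(n[0] for n in names)
by_hash = Counter(md5_bucket(n) for n in names)

print("first letter:", len(by_letter), "buckets, largest =", max(by_letter.values()))
print("md5 prefix:  ", len(by_hash), "buckets, largest =", max(by_hash.values()))
```

With names like these, every file lands in the single "i" bucket under the first-letter scheme, while the md5 prefix spreads them over up to 256 buckets.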


if you're doing actual numbers, put in buckets working backwards --  
ie, by the power of 1,10,100 etc, and not reading frontwards.  you'll  
get more even distribution that way.
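
A short Python sketch of "working backwards" for numeric IDs, assuming sequential integer IDs and two directory levels of two digits each; the function name and the "/uploads" root are hypothetical.

```python
from collections import Counter

def numeric_bucket_path(root: str, file_id: int, ext: str) -> str:
    """Bucket on the trailing digits: last two first, then the next two."""
    level1 = file_id % 100           # ones and tens
    level2 = (file_id // 100) % 100  # hundreds and thousands
    return f"{root}/{level1:02d}/{level2:02d}/{file_id}.{ext}"

print(numeric_bucket_path("/uploads", 1234, "jpg"))  # /uploads/34/12/1234.jpg

# Why backwards: compare leading and trailing digits for IDs 1..2000.
ids = range(1, 2001)
leading = Counter(str(i)[0] for i in ids)
trailing = Counter(i % 10 for i in ids)
print(leading["1"], leading["2"])  # 1111 112
print(sorted(set(trailing.values())))  # [200]
```

The leading digit "1" covers 1111 of the 2000 IDs, while every trailing digit covers exactly 200, which is the unevenness the paragraph above warns about.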


these are 2 mathematical principles... i can't remember their names.   
but they are good reads if you can find the names.  the irs uses the  
latter one for tax audits.


also keep in mind the os performance with files.  most os's do the  
best at ~1k files per directory; some are good up to 5k


md5 with base16 can give you a 3 char bucket name = 16*16*16 = 4096 buckets
md5 with base32 can give you a 2 char bucket name = 32*32 = 1024 buckets
md5 with base64 can give you a 2 char bucket name = 64*64 = 4096 buckets


if only base32 were more common; that's a good sweet spot.  for
simplicity, i usually do 3 base16 chars.  but 2 base32 might be
better for your os.
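
For reference, a Python sketch of the same md5 digest rendered in each base, plus the bucket counts from the arithmetic above. Stripping the padding, lowercasing the base32 form, and using the URL-safe base64 alphabet (so no "/" ends up in a directory name) are my assumptions, not part of the original suggestion.

```python
import base64
import hashlib

digest = hashlib.md5(b"file").digest()  # 16 raw bytes = 128 bits

b16 = digest.hex()
b32 = base64.b32encode(digest).decode("ascii").rstrip("=").lower()
b64 = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")

print(len(b16), len(b32), len(b64))  # 32 26 22

# Number of possible bucket names for a prefix of n chars in each base.
buckets = {"base16 x3": 16 ** 3, "base32 x2": 32 ** 2, "base64 x2": 64 ** 2}
print(buckets)

# Bucket names taken from the front of each encoding.
print(b16[:3], b32[:2], b64[:2])
```

Note that base64 names differ only by case in some positions, so they are a poor fit for case-insensitive filesystems; base16 and lowercase base32 avoid that problem.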



// Jonathan Vanasco






Re: Where to store uploads

2007-07-06 Thread Frank Wiles
On Tue, 3 Jul 2007 10:26:55 -0400
"Perrin Harkins" <[EMAIL PROTECTED]> wrote:

> > 2) Store your upload in a shared partition (eg on a SAN, NFS,
> >    iSCSI/OCFS2)
> 
> That's ok if you need them on every server.  Many applications just
> upload a file and process it on one server, so they don't need this.

  Another option for this is to use MogileFS
  (http://www.danga.com/mogilefs/) to store your files.  You can
  control the redundancy at the app level and it just "figures" 
  out where the file is when you request it. 

  Another great tool from the makers of memcached, Perlbal, 
  LiveJournal.com, etc. 

 ---
   Frank Wiles, Revolution Systems, LLC. 
 Personal : [EMAIL PROTECTED]  http://www.wiles.org
 Work : [EMAIL PROTECTED] http://www.revsys.com 



Re: Where to store uploads

2007-07-06 Thread Perrin Harkins

On 7/6/07, Frank Wiles <[EMAIL PROTECTED]> wrote:

  Another option for this is to use MogileFS
  (http://www.danga.com/mogilefs/) to store your files.  You can
  control the redundancy at the app level and it just "figures"
  out where the file is when you request it.


Keep in mind, you have to access these files using a special library
provided with it.  Existing code has to be changed in order to take
advantage of it.

- Perrin


Re: Where to store uploads

2007-07-09 Thread Boysenberry Payne

I handle files with a DB pointer.
It works really well for me.  I house all of the files
on a "static" server and put pointers in the DB.
Then I just update the DB with new pointers or remove
pointers as needed.
When looking up a file, I request the file pointer from the DB
and then use that info to grab the file from the static server.
Works like a charm.
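
A rough sketch of that pointer workflow in Python with an in-memory SQLite table. The table layout, function names, and the "static.example.com" host are all hypothetical; the actual schema is not given in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE uploads ("
    " id INTEGER PRIMARY KEY, server TEXT NOT NULL, path TEXT NOT NULL)"
)

def add_pointer(conn, file_id, server, path):
    # Insert a new pointer, or overwrite the existing one for this id.
    conn.execute(
        "INSERT OR REPLACE INTO uploads (id, server, path) VALUES (?, ?, ?)",
        (file_id, server, path),
    )

def remove_pointer(conn, file_id):
    conn.execute("DELETE FROM uploads WHERE id = ?", (file_id,))

def lookup_url(conn, file_id):
    # Returns None when no pointer exists for this id.
    row = conn.execute(
        "SELECT server, path FROM uploads WHERE id = ?", (file_id,)
    ).fetchone()
    return None if row is None else f"http://{row[0]}/{row[1]}"

add_pointer(conn, 1234, "static.example.com", "12/34/1234.jpg")
print(lookup_url(conn, 1234))
remove_pointer(conn, 1234)
print(lookup_url(conn, 1234))
```

Removing a pointer here does not delete the file on the static server; that cleanup would be a separate step.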

-bop

