subject:"\[GENERAL\] Database versus filesystem for storing images"

Re: [GENERAL] Database versus filesystem for storing images

2007-01-08 Thread Clodoaldo


2007/1/6, Maurice Aubrey [EMAIL PROTECTED]:

Clodoaldo wrote:
 But the main factor to push me in the file system direction is the
 HTTP cache management. I want the internet web clients and proxies to
  cache the images. The Apache web server has it ready and easy. If
 the images where to be stored in the DB I would have to handle the
 HTTP cache headers myself. Another code layer. Not too big a deal,
 but if Apache give me it for free...

There's a hybrid approach which has worked well for us.

You store the binary data in the database along with a signature.

On the Apache side, you write a 404 handler that, based on the request,
fetches the binary from the database and writes it locally to the
filesystem based on the signature (using a multi-level hashing scheme
possibly as detailed in previous posts).

When a request comes in to Apache, if the file exists it is served
directly without any db interaction. OTOH, if it's missing, your 404
handler kicks in to build it and you get a single trip to the db.

You get the benefits of keeping the data in the db (transaction
semantics, etc.) but also get the scalability and caching benefits
of having the front-end webservers handle delivery.

If you lose the locally cached data it's not an issue. They'll be
faulted back into existence on demand.

With multiple webservers, you can just allow the data to be cached on
each machine, or if there's too much data for that, have your load
balancer divide the requests to different webserver pools based on the
signature.

As an extension, if you need different versions of the data (like
different sizes of an image, etc.), you can modify your URLs to indicate
the version wanted and have the 404 handler take that into account when
building them. You only store the original content in the database but
could have any number of transformed versions on the webservers. Again,
losing those versions is not an issue and do not require backup.


Very interesting approach. And I think it is also original as I have
not seen any mention of it. Thanks for sharing it.

--
Clodoaldo Pinto Neto

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/

Re: [GENERAL] Database versus filesystem for storing images

2007-01-08 Thread Andrew Chernow


apache has very good page and image caching.  You could take advantage
of that using this technique.

 I wonder why this HTTP cache headers argument didn't surface in this
 heated debate.

I did other up this argument by the way.

Andrew


Clodoaldo wrote:

2007/1/5, Jorge Godoy [EMAIL PROTECTED]:

Andrew Chernow [EMAIL PROTECTED] writes:
 meet those requirements.  It is far more effecient to have apache 
access

 them

Where weren't we meeting his/her requirements?  All the discussion is 
around
available means to do that.  One option is having the files on the 
database,
the other is on the filesystem.  From my understanding we're 
discussing the

benefits of each one.  Aren't we?


Yes, although I suggested two solutions I asked for anything that
would be considered the best practice. Now I think there is not a best
practice or better, there should be one best practice for each of the
solutions.

I have done an intranet application that stored images in the
database. It worked perfectly and I used the same engine in another
intranet application to store not only images but any document which
also worked perfectly.  The decision to go the dabatase only route was
easy: The filesystem space would have to be negotiated while the space
occupied by the databases were not controlled and used an advanced
storage solution that gave lots of terabytes to be used at will. Also
the any document application should not loose a single document and
access control should be strictly enforced which was much easier to do
with the database since I had no control over the webserver and even
if I had I think the database access is still easier to control than
the filesystem access. That was in a corporate intranet.

What I'm doing now is an internet application. While the FS x DB
synchronicity is very important in some kinds of document management,
it is not in this application. Indeed if a few images are lost each
day it has no meaning in a 500K to 1M inventory. The offended clients
just upload them again. No one will be sued. The images are all
public. No need to control the access.

But the main factor to push me in the file system direction is the
HTTP cache management. I want the internet web clients and proxies to
cache the images. The Apache web server has it ready and easy. If the
images where to be stored in the DB I would have to handle the HTTP
cache headers myself. Another code layer. Not too big a deal, but if
Apache give me it for free...

I wonder why this HTTP cache headers argument didn't surface in this
heated debate. Aren't DB developers/admins aware of the internet
client's bandwidth limitations? Or they just assume the application
would handle the HTTP cache headers? In the applications I created for
intranet bandwidth was almost a non issue and I didn't care to make
them bandwidth efficient, but for the internet the problem is there
and it is big.

Regards,


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [GENERAL] Database versus filesystem for storing images

2007-01-08 Thread Maurice Aubrey


Clodoaldo wrote:
But the main factor to push me in the file system direction is the 
HTTP cache management. I want the internet web clients and proxies to

 cache the images. The Apache web server has it ready and easy. If
the images where to be stored in the DB I would have to handle the
HTTP cache headers myself. Another code layer. Not too big a deal,
but if Apache give me it for free...


There's a hybrid approach which has worked well for us.

You store the binary data in the database along with a signature.

On the Apache side, you write a 404 handler that, based on the request,
fetches the binary from the database and writes it locally to the
filesystem based on the signature (using a multi-level hashing scheme
possibly as detailed in previous posts).

When a request comes in to Apache, if the file exists it is served
directly without any db interaction. OTOH, if it's missing, your 404
handler kicks in to build it and you get a single trip to the db.

You get the benefits of keeping the data in the db (transaction
semantics, etc.) but also get the scalability and caching benefits
of having the front-end webservers handle delivery.

If you lose the locally cached data it's not an issue. They'll be
faulted back into existence on demand.

With multiple webservers, you can just allow the data to be cached on
each machine, or if there's too much data for that, have your load
balancer divide the requests to different webserver pools based on the
signature.

As an extension, if you need different versions of the data (like
different sizes of an image, etc.), you can modify your URLs to indicate
the version wanted and have the 404 handler take that into account when
building them. You only store the original content in the database but
could have any number of transformed versions on the webservers. Again,
losing those versions is not an issue and do not require backup.

Maurice

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [GENERAL] Database versus filesystem for storing images

2007-01-08 Thread Maurice Aubrey


Clodoaldo wrote:
But the main factor to push me in the file system direction is the 
HTTP cache management. I want the internet web clients and proxies to

 cache the images. The Apache web server has it ready and easy. If
the images where to be stored in the DB I would have to handle the
HTTP cache headers myself. Another code layer. Not too big a deal,
but if Apache give me it for free...


There's a hybrid approach which has worked well for us.

You store the binary data in the database along with a signature.

On the Apache side, you write a 404 handler that, based on the request,
fetches the binary from the database and writes it locally to the
filesystem based on the signature (using a multi-level hashing scheme
possibly as detailed in previous posts).

When a request comes in to Apache, if the file exists it is served
directly without any db interaction. OTOH, if it's missing, your 404
handler kicks in to build it and you get a single trip to the db.

You get the benefits of keeping the data in the db (transaction
semantics, etc.) but also get the scalability and caching benefits
of having the front-end webservers handle delivery.

If you lose the locally cached data it's not an issue. They'll be
faulted back into existence on demand.

With multiple webservers, you can just allow the data to be cached on
each machine, or if there's too much data for that, have your load
balancer divide the requests to different webserver pools based on the
signature.

As an extension, if you need different versions of the data (like
different sizes of an image, etc.), you can modify your URLs to indicate
the version wanted and have the 404 handler take that into account when
building them. You only store the original content in the database but
could have any number of transformed versions on the webservers. Again,
losing those versions is not an issue and do not require backup.

Maurice


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [GENERAL] Database versus filesystem for storing images

2007-01-08 Thread Scott Ribe

 My point is: if I need to be 100% sure that what is referenced on the database
 is accessible all the time when the reference is, then I need to have this on
 the database

Not necessarily. It does take carefully controlling access, with a good deal
of thought and error-checking on the part of the code that has write access
to the files, but it can certainly be done.

 But if there are other people touching
 things -- moving / creating / deleting / changing files and directories --
 then things get more complicated to manage.

Absolutely. But allowing the kinds of tools mentioned earlier for examining
files does not require giving anyone write access ;-)


-- 
Scott Ribe
[EMAIL PROTECTED]
http://www.killerbytes.com/
(303) 722-0567 voice



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [GENERAL] Database versus filesystem for storing images

2007-01-06 Thread imageguy


Clodoaldo wrote:
 5 Jan 2007 06:59:18 -0800, imageguy [EMAIL PROTECTED]:
 
  I think I know the answer,

 If you know the answer please tell it as I have read some discussions
 on the web and although I have decided on a solution I'm still not
 sure about the best answer, if there is a best answer after all.

Sorry, didn't mean to sound like and expert on this, I am actually
quite a newbie.  From all of the discussions I have read and even the
ones in this thread, including your own comments below, it would seem
that to store the files in the files system you need some sort of
application erver or :middleware - like a webserver - to handle the
retreiving and serving of the files.
My organization is developing a commercial application for document
tracking. It is not a Browser application, but rather a more
traditional windows thick client app.

At the present time we do not intend to deploy any sort of application
server - web server, ftp server, and not all of the workstations will
have access to a consistent network share.

So in this case, it is my understanding that our only real choice is to
store the documents and images in the database itself.

... unless someone knows of a postgresql function that would allow us
to server the file from the filesystem via the dbserver ??



  but if you don't have an application
  server - ie a webserver, etc,

 Yes I have an application server, the Apache server.

  and many of the workstations/clients
  that need access to the images but may not have access to a network
  share,

 network share? I don't understand. The images will be loaded by html
 pages with the img tag like in img
 src=http://domain.com/images/xxx.jpg;

  isn't the database the only choice ?

 No. It is one of the choices. The other is to store the images in the
 file system, in a directory readable by Apache.

See above.  WE are trying to reduce the dependancies on other
applications to ensure a simple deployment of the application.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/

Re: [GENERAL] Database versus filesystem for storing images

2007-01-06 Thread Clodoaldo


2007/1/5, Jorge Godoy [EMAIL PROTECTED]:

Andrew Chernow [EMAIL PROTECTED] writes:
 meet those requirements.  It is far more effecient to have apache access
 them

Where weren't we meeting his/her requirements?  All the discussion is around
available means to do that.  One option is having the files on the database,
the other is on the filesystem.  From my understanding we're discussing the
benefits of each one.  Aren't we?


Yes, although I suggested two solutions I asked for anything that
would be considered the best practice. Now I think there is not a best
practice or better, there should be one best practice for each of the
solutions.

I have done an intranet application that stored images in the
database. It worked perfectly and I used the same engine in another
intranet application to store not only images but any document which
also worked perfectly.  The decision to go the dabatase only route was
easy: The filesystem space would have to be negotiated while the space
occupied by the databases were not controlled and used an advanced
storage solution that gave lots of terabytes to be used at will. Also
the any document application should not loose a single document and
access control should be strictly enforced which was much easier to do
with the database since I had no control over the webserver and even
if I had I think the database access is still easier to control than
the filesystem access. That was in a corporate intranet.

What I'm doing now is an internet application. While the FS x DB
synchronicity is very important in some kinds of document management,
it is not in this application. Indeed if a few images are lost each
day it has no meaning in a 500K to 1M inventory. The offended clients
just upload them again. No one will be sued. The images are all
public. No need to control the access.

But the main factor to push me in the file system direction is the
HTTP cache management. I want the internet web clients and proxies to
cache the images. The Apache web server has it ready and easy. If the
images where to be stored in the DB I would have to handle the HTTP
cache headers myself. Another code layer. Not too big a deal, but if
Apache give me it for free...

I wonder why this HTTP cache headers argument didn't surface in this
heated debate. Aren't DB developers/admins aware of the internet
client's bandwidth limitations? Or they just assume the application
would handle the HTTP cache headers? In the applications I created for
intranet bandwidth was almost a non issue and I didn't care to make
them bandwidth efficient, but for the internet the problem is there
and it is big.

Regards,
--
Clodoaldo Pinto Neto

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [GENERAL] Database versus filesystem for storing images

2007-01-06 Thread Clodoaldo


2007/1/6, Andrew Chernow [EMAIL PROTECTED]:

 apache has very good page and image caching.  You could take advantage
 of that using this technique.

  I wonder why this HTTP cache headers argument didn't surface in this
  heated debate.

I did other up this argument by the way.


Sorry, I understood you were talking about server side caching while
what I refer to is client side caching.

Regards,
--
Clodoaldo Pinto Neto

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [GENERAL] Database versus filesystem for storing images

2007-01-06 Thread Merlin Moncure


On 1/6/07, Jorge Godoy [EMAIL PROTECTED] wrote:

Andrew Chernow [EMAIL PROTECTED] writes:

I mean, how do you handle integrity with data
 outside the database?
 You don't, the file system handles integrity of the stored data.  Although,
 one must careful to avoid db and fs orphans.  Meaning, a record with no
 corresponding file or a file with no corresponging record.  Always
 write()/insert an image file to the system within a transaction, including
 writing the image out to the fs.  Make sure to unlink any paritally written
 image files.

And how do you guarantee that after a failure?  You're restoring two
different sets of data here:

  - backup from your database
  - backup from your files


you have a point -- keeping two sets of data in sync is more difficult
than working of a single interface (you have to implement your own
referential integrity of sorts),  but that has to be balanced against
the fact that postgresql is not great (yet) at storing and retrieving
huge numbers of large binary objects.  then again, neither are most
filesystems in my opinion.  also, and this is coming from a sql junky,
sql is not really that good at handling binary data, you have to go
right to binary prepared statements before things become even remotely
reasonable.

taking your side for a moment, the backup argument (against databases)
is a little bit disingenuous because while dumping a 10 terabyte
database is a pain, backing up a 10 terabyte filesystem is no picnic
either, rsync will get seizures...you have to implement some smart
logic that takes advantage of the fact that the information rarely if
ever changes.

andrew's posts hint at a way to do that that could be implemented
directly in a database or a filesystem...one would choose a filesystem
for this because the overhead is lower but there are other ways...


How do you link them together on that specific operation?  Or even on a daily
basis, if you get corrupted data...


if you take this approach, you have to layer a middleware over the
filesystem and use that always.  it's a programming challenge but it
can be done...


How do you plan your backup routine
 In regards to backup, backup the files one-by-one.  Grab the lastest image
 file refs from the database and start backing up those images.  Each
 successfully backed up image should be followed by inserting that file's
 database record into a remote db server.  If anything fails, cleanup the
 partial image file (to avoid orphaned data) and rollout the transaction.

 just one idea.  i'm sure there are other ways of doing it.  point is, this is
 completely possible to do reliably.

Wouldn't replication with, e.g., Slony be easier?  And wouldn't letting the
database handle all the integrity be easier?  I mean, create an images table
and then make your record depends on this table, so if there's no record with
the image, you won't have any references to it left.


I think if you were to replicate a really big database, for something
like this, a log based replication approach (pitr, or a modified
flavor of it) would be a better bet.  pg_dump should not even enter
into your vocabulary, unless you did some hackery like storing data in
progressive tables.


It would also make the backup plan easier: backup the database.


backing up big *anythings* can be tough, pg_dump is not a scalable tool.


Not counting that depending on your choice of filesystem and image size you
might get a very poor performance.


performance of storing large blobs in the database is not great...if
you can get 50mb/sec writing to a filesystem, count yourself lucky if
you get 10mb/sec writing to the database, and this is only if you make
your client code very efficient.

I think at some point in the future postgresql might be a useful tool
in the way you are describing.  currently, there are some performance
issues, more flexibility in backup options would be required, and
maybe some other things.

merlin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [GENERAL] Database versus filesystem for storing images

2007-01-06 Thread Jeremy Haile

 I wonder why this HTTP cache headers argument didn't surface in this
 heated debate. 

I mentioned this earlier as well.  Although you could do it in the app
layer - it would be easier to just let the web server handle it.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/

Re: [GENERAL] Database versus filesystem for storing images

2007-01-06 Thread John McCawley

Is there any overwhelming reason you can't just stick an apache server 
on your DB server?  Unless you expect this thing to get hit *hard*, the 
performance of having them both on the same machine is pretty acceptable 
(I know, everyone's opinion about what constitutes heavy usage 
differs).  If this is a simple intranet application with around 100 
users, the performance should be OK.


If I had to write a client-side app to pull images off of a server, I'd 
much rather code HTTP GETs into my app than deal with binary data from 
the database.  I've done both, and HTTP GETs are much easier.  Bear in 
mind I haven't pulled binaries from a database in about 5 years, so 
things might be easier now.


imageguy wrote:


Clodoaldo wrote:
 


5 Jan 2007 06:59:18 -0800, imageguy [EMAIL PROTECTED]:
   


I think I know the answer,
 


If you know the answer please tell it as I have read some discussions
on the web and although I have decided on a solution I'm still not
sure about the best answer, if there is a best answer after all.
   



Sorry, didn't mean to sound like and expert on this, I am actually
quite a newbie.  From all of the discussions I have read and even the
ones in this thread, including your own comments below, it would seem
that to store the files in the files system you need some sort of
application erver or :middleware - like a webserver - to handle the
retreiving and serving of the files.
My organization is developing a commercial application for document
tracking. It is not a Browser application, but rather a more
traditional windows thick client app.

At the present time we do not intend to deploy any sort of application
server - web server, ftp server, and not all of the workstations will
have access to a consistent network share.

So in this case, it is my understanding that our only real choice is to
store the documents and images in the database itself.

... unless someone knows of a postgresql function that would allow us
to server the file from the filesystem via the dbserver ??


 


but if you don't have an application
server - ie a webserver, etc,
 


Yes I have an application server, the Apache server.

   


and many of the workstations/clients
that need access to the images but may not have access to a network
share,
 


network share? I don't understand. The images will be loaded by html
pages with the img tag like in img
src=http://domain.com/images/xxx.jpg;

   


isn't the database the only choice ?
 


No. It is one of the choices. The other is to store the images in the
file system, in a directory readable by Apache.
   



See above.  WE are trying to reduce the dependancies on other
applications to ensure a simple deployment of the application.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/
 



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [GENERAL] Database versus filesystem for storing images

2007-01-06 Thread Dawid Kuroczko


On 1/5/07, Jorge Godoy [EMAIL PROTECTED] wrote:

Andrew Chernow [EMAIL PROTECTED] writes:

 And how do you guarantee that after a failure?  You're restoring two
 different sets of data here:

 How do you link them together on that specific operation?  Or even on a daily
 basis, if you get corrupted data...

 I answered that already.

I'm sorry.  It must be the flu, the pain or something else, but I really don't
remember reading your message about how you can be 100% sure that all
references to the filesystem have their corresponding files present and also
all present files have their respective database entry.


By designing the system so that you can be sure.  For instance delegate
removing data from filesystem to a dedicated queue table within database,
and carefully check that code.  Let no other software delete data.

If you need PITR, you can stop the remover utility during backups (think of
it as a VACUUM for filesystem ;)).


I've seen HA measures (I don't imagine anyone sacrificing their customers
copying 3 billion files and a few hundred terabytes while still maintaining an
adequate service rate with part of its infra-structure down, just to use your
example to that answer...), ideas about requiring an answer from the
filesystem before considering the transaction done DB-wise (who grants you
that the image really went to the disk and is not on cache when the machine
has a power failure and shuts down abruptly?)...


And who grants you that the WAL log file really went to the disk and is not on
cache when the machine has a power failure and shuts down abruptly?

Use a trustworthy hardware.  You would have to do it anyway, if you wanted
to go with all-in-DB approach.


 Some people have seen this as a disadvantage on this thread, I personally
 don't see it that why.

I am questioning two points that show two situations where it is bad.
Specially if those images are important to the records (e.g. product failure
images, prize winning images, product specs, prototype images, blueprints --
after all, we don't need to restrict our files to images, right? --,
agreements, spreadsheets with the last years of company account movements,
documents received from lawyers, etc.).


I don't think noone is saying that storing images in DB isn't better from
data integrity point of view.  But it has drawbacks, which sometimes make
pepople store their images elsewhere in real life.

Of course if one had infinite budget...  But other than that, if you are storing
lots of data, and you can afford a trade-off between data safety (that you
can loose some data or that your data can be bloated by dead data) and
costs, then you have to consider storing data otherwise.


 I guess it depends on access needs, many files and how much data you have.
 What if you had 3 billion files across a few hundred terabytes?  Can you say
 with experience how the database would hold up in this situation?

I'd have partitioning if I had a case like that.  Part of those would be
delegated to one machine, part to another and so on.  Even if that solution --
partitioning -- makes the overall MTBF lower...


And how do you handle data integrity between many machines? The answer
is of 2PC, I guess.  But still, managing integrity between many machines is
also hard -- remember, you don't have RI constraints for remote data.


And I still can't imagine how you guarantee that all 3 billion files have
their corresponding entries on the database.  Couting them is not enough since
I can have one file with the wrong name present on the filesystem or some
duplicate record on the DB...


Depends what you mean by integrity.

For my needs it is sufficient that I can guarantee that every object mentioned
in a database is present on the filesystem.  Orphaned files are not a problem
and can be removed (if there really is need to) by a monthly cron job (find
old files not mentioned in DB, find not accessed old files and check if they
are in DB, etc. etc.).

More important still, is that I can make a snapshot of data.  With filesystem
it would be just:
 stop remover utility
 pg_dumptar (or whatever) the files
 start remover utility

With DB, one would have to start pg_dump and wait until it finishes.
And observe
how your tables start to bloat as the data needs to be preserved for
data integrity
reasons.  How your database is more and mroe filled with dead tuples.  And how,
after it finishes VACUUM struggles to clean up the tables.  The more data you,
have, the more frequently it changes, the bigger problem it is.
Filesystem is simply
more efficient at storing data (the non-transactionness, and limited
metadata being
the tradeoff).

I don't say all in DB is fundamentally wrong.  It's just that its
niches are low
bandwidth services, like intranets or rarely visited data, some forms of very
important data store services, where one must be absolutely sure about
safety and integrity, and finally, let's call it academic/hobbyst research ;-)

files outside of DB

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread imageguy


Scott Ribe wrote:
 Personally, I'd put them on the file system, because then backup software
 can perform incremental backups. In the database, that becomes more of a
 difficulty. One suggestion, don't use a file name from a hash to store the
 image, just use the serial id, and break them up by hundreds or thousands,
 iow image 1123 might be in images/000/01/01123.

 --
 Scott Ribe
 [EMAIL PROTECTED]
 http://www.killerbytes.com/
 (303) 722-0567 voice

I think I know the answer, but if you don't have an application
server - ie a webserver, etc, and many of the workstations/clients
that need access to the images but may not have access to a network
share, isn't the database the only choice ?

 - or is there a postgresql function/utility that will server the
file from the file system based on the reference/link embeded in the
database ??

Geoff.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Clodoaldo


5 Jan 2007 06:59:18 -0800, imageguy [EMAIL PROTECTED]:


I think I know the answer,


If you know the answer please tell it as I have read some discussions
on the web and although I have decided on a solution I'm still not
sure about the best answer, if there is a best answer after all.


but if you don't have an application
server - ie a webserver, etc,


Yes I have an application server, the Apache server.


and many of the workstations/clients
that need access to the images but may not have access to a network
share,


network share? I don't understand. The images will be loaded by html
pages with the img tag like in img
src=http://domain.com/images/xxx.jpg;


isn't the database the only choice ?


No. It is one of the choices. The other is to store the images in the
file system, in a directory readable by Apache.


 - or is there a postgresql function/utility that will server the
file from the file system based on the reference/link embeded in the
database ??


I think some procedure languages can read files. In this case what
would be the gain in introducing a middle man, the db server?

Regards,
--
Clodoaldo Pinto Neto

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Jeremy Haile

It's almost always better to store the images on the file system and
just store the filename or relative path in the database.  

This is more efficient, doesn't bloat the database by storing files in
it, and is easier to get proper browser caching behavior (depending on
how your app is setup).   I try to avoid BLOBs whenever possible.

Cheers,
Jeremy Haile
 

On Fri, 5 Jan 2007 17:18:10 -0200, Clodoaldo
[EMAIL PROTECTED] said:
 5 Jan 2007 06:59:18 -0800, imageguy [EMAIL PROTECTED]:
 
  I think I know the answer,
 
 If you know the answer please tell it as I have read some discussions
 on the web and although I have decided on a solution I'm still not
 sure about the best answer, if there is a best answer after all.
 
  but if you don't have an application
  server - ie a webserver, etc,
 
 Yes I have an application server, the Apache server.
 
  and many of the workstations/clients
  that need access to the images but may not have access to a network
  share,
 
 network share? I don't understand. The images will be loaded by html
 pages with the img tag like in img
 src=http://domain.com/images/xxx.jpg;
 
  isn't the database the only choice ?
 
 No. It is one of the choices. The other is to store the images in the
 file system, in a directory readable by Apache.
 
   - or is there a postgresql function/utility that will server the
  file from the file system based on the reference/link embeded in the
  database ??
 
 I think some procedure languages can read files. In this case what
 would be the gain in introducing a middle man, the db server?
 
 Regards,
 -- 
 Clodoaldo Pinto Neto
 
 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread John McCawley

Don't store your images in the database.  Store them on the filesystem 
and store their path in the database.  Anyone that tells you otherwise 
is a stark raving madman :)


My system is very heavily used, and our pg_dump is only a few gigs.  
Meanwhile our images/documents storage is well over a hundred gigs.  I'd 
hate to think that I'd have to dump and restore 100 gigs every time I 
wanted to dump the newest data to the development database.



As far as how they actually get to the client machine, typically these 
days people use web servers for this sort of thing. 


Clodoaldo wrote:


5 Jan 2007 06:59:18 -0800, imageguy [EMAIL PROTECTED]:



I think I know the answer,



If you know the answer please tell it as I have read some discussions
on the web and although I have decided on a solution I'm still not
sure about the best answer, if there is a best answer after all.


but if you don't have an application
server - ie a webserver, etc,



Yes I have an application server, the Apache server.


and many of the workstations/clients
that need access to the images but may not have access to a network
share,



network share? I don't understand. The images will be loaded by html
pages with the img tag like in img
src=http://domain.com/images/xxx.jpg;


isn't the database the only choice ?



No. It is one of the choices. The other is to store the images in the
file system, in a directory readable by Apache.


 - or is there a postgresql function/utility that will server the
file from the file system based on the reference/link embeded in the
database ??



I think some procedure languages can read files. In this case what
would be the gain in introducing a middle man, the db server?

Regards,



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Jorge Godoy

John McCawley [EMAIL PROTECTED] writes:

 Don't store your images in the database.  Store them on the filesystem and
 store their path in the database.  Anyone that tells you otherwise is a stark
 raving madman :)

 My system is very heavily used, and our pg_dump is only a few gigs.  Meanwhile
 our images/documents storage is well over a hundred gigs.  I'd hate to think
 that I'd have to dump and restore 100 gigs every time I wanted to dump the
 newest data to the development database.

How do you plan your backup routine and how do you guarantee that on a failure
all needed data is restored?  I mean, how do you handle integrity with data
outside the database?

-- 
Jorge Godoy  [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread James Neff

... and Moses said unto them, 'The eleventh commandment :  thou shalt 
store images in a database!'...


What if you had another database where you stored just the images and 
not back it up if you don't want to?


As an application developer, I like the idea of storing files and images 
in the database because it makes it much easier to control access and 
security from an application standpoint.


I think Microsoft SQL Server stores blobs in a separate file, and only 
retains pointers in the actually database field for that blob.  So when 
you SELECT on that blob MS SQL reads the external file for you as if it 
lived in the database.  I don't know if Postgres does the same thing, 
but if it did, you wouldn't have to worry about bloating database files.


Sounds like this is for an Apache web application.  Think about how web 
sites like Flickr and Webshots store their images in a database.  You 
could write a cool Apache mod so that the url:  
http://mycompany.com/images/01234.jpg;  would go through this module, 
pull the appropriate image from the database and send it back; all the 
while the client is none-the-wiser.  Just a thought.


I think its one of those things where there's not right or wrong 
answer.  Instead you just have to do the minimum of what your 
application requires.  If you don't need application-level control over 
the files, then by all means store them on the file system.  But if you 
need to control security than you have to prevent physical access to the 
file (which means no file system storage) and pull the image from the 
database through the application.


My two cents,
James



John McCawley wrote:
Don't store your images in the database.  Store them on the filesystem 
and store their path in the database.  Anyone that tells you otherwise 
is a stark raving madman :)


My system is very heavily used, and our pg_dump is only a few gigs.  
Meanwhile our images/documents storage is well over a hundred gigs.  
I'd hate to think that I'd have to dump and restore 100 gigs every 
time I wanted to dump the newest data to the development database.



As far as how they actually get to the client machine, typically these 
days people use web servers for this sort of thing.

Clodoaldo wrote:


5 Jan 2007 06:59:18 -0800, imageguy [EMAIL PROTECTED]:



I think I know the answer,



If you know the answer please tell it as I have read some discussions
on the web and although I have decided on a solution I'm still not
sure about the best answer, if there is a best answer after all.


but if you don't have an application
server - ie a webserver, etc,



Yes I have an application server, the Apache server.


and many of the workstations/clients
that need access to the images but may not have access to a network
share,



network share? I don't understand. The images will be loaded by html
pages with the img tag like in img
src=http://domain.com/images/xxx.jpg;


isn't the database the only choice ?



No. It is one of the choices. The other is to store the images in the
file system, in a directory readable by Apache.


 - or is there a postgresql function/utility that will server the
file from the file system based on the reference/link embeded in the
database ??



I think some procedure languages can read files. In this case what
would be the gain in introducing a middle man, the db server?

Regards,



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Andrew Chernow


 Don't store your images in the database.  Store them on the filesystem and
 store their path in the database
I 100% agree.  Use the database as a lookup into the filesystem.  Don't load the 
database up with terabytes of non-searchable binary data?  not sure how that 
would help you?


Here is one idea:

Have a 64-bit sequence that you use to generate an image_id (becomes file name). 
 Hash that id out over a 3-level deep directory structure that allows 4096 
entries per directory.  Should give you 64-bit worth of files.


CREATE OR REPLACE FUNCTION get_image_path(image_id BIGINT)
RETURNS TEXT AS $$
  DECLARE
-- 40963, avoids integer out of range
dir1_val BIGINT := 68719476736;
dir1 BIGINT;
dir2 BIGINT;
dir3 BIGINT;
path TEXT;

  BEGIN
dir1 := (image_id / dir1_val) % 4096;
dir2 := (image_id / (4096 * 4096)) % 4096;
dir3 := (image_id / 4096) % 4096;
RETURN '/BASE_PATH/' || dir1 || '/' || dir2 ||
   '/' || dir3 || '/' || image_id;
  END;
$$ LANGUAGE PLPGSQL;

test=# select get_image_path(20039932);
   get_image_path
-
 /BASE_PATH/2/3752/2991/20039932
(1 row)

I mean, how do you handle integrity with data
 outside the database?
You don't, the file system handles integrity of the stored data.  Although, one 
must careful to avoid db and fs orphans.  Meaning, a record with no 
corresponding file or a file with no corresponging record.  Always 
write()/insert an image file to the system within a transaction, including 
writing the image out to the fs.  Make sure to unlink any paritally written 
image files.


How do you plan your backup routine
In regards to backup, backup the files one-by-one.  Grab the lastest image file 
refs from the database and start backing up those images.  Each successfully 
backed up image should be followed by inserting that file's database record into 
a remote db server.  If anything fails, cleanup the partial image file (to avoid 
orphaned data) and rollout the transaction.


just one idea.  i'm sure there are other ways of doing it.  point is, this is 
completely possible to do reliably.


andrew



Jorge Godoy wrote:

John McCawley [EMAIL PROTECTED] writes:


Don't store your images in the database.  Store them on the filesystem and
store their path in the database.  Anyone that tells you otherwise is a stark
raving madman :)

My system is very heavily used, and our pg_dump is only a few gigs.  Meanwhile
our images/documents storage is well over a hundred gigs.  I'd hate to think
that I'd have to dump and restore 100 gigs every time I wanted to dump the
newest data to the development database.


How do you plan your backup routine and how do you guarantee that on a failure
all needed data is restored?  I mean, how do you handle integrity with data
outside the database?



---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread John McCawley

This is a web app, so in my example all of the images live on a web 
server, and our data lives on a separate database server.  We have a 
completely duplicated setup offsite, and mirror images of every server 
at the backup site.  Every night we use rsync to duplicate everything 
offsite.  Also, a cron job pg_dumps every night and copies the dump over 
to the backup DB server.


And before anybody gives me any guff, our office is in New Orleans, and 
we went through Katrina with less than an hour of downtime, and without 
losing anything.  So there ;)




Jorge Godoy wrote:


John McCawley [EMAIL PROTECTED] writes:

 


Don't store your images in the database.  Store them on the filesystem and
store their path in the database.  Anyone that tells you otherwise is a stark
raving madman :)

My system is very heavily used, and our pg_dump is only a few gigs.  Meanwhile
our images/documents storage is well over a hundred gigs.  I'd hate to think
that I'd have to dump and restore 100 gigs every time I wanted to dump the
newest data to the development database.
   



How do you plan your backup routine and how do you guarantee that on a failure
all needed data is restored?  I mean, how do you handle integrity with data
outside the database?

 



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Jeanna Geier

We use WebDAV and Apache's Slide to store our images and, as someone pointed 
out earlier, store the links to the images in our database.

WebDAV has provided us with excellent access control and security...
http://www.webdav.org/
http://jakarta.apache.org/slide/index.html

Just my 1/2 cents,
-Jeanna

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of James Neff
Sent: Friday, January 05, 2007 2:27 PM
To: John McCawley
Cc: Clodoaldo; imageguy; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Database versus filesystem for storing images


... and Moses said unto them, 'The eleventh commandment :  thou shalt 
store images in a database!'...

What if you had another database where you stored just the images and 
not back it up if you don't want to?

As an application developer, I like the idea of storing files and images 
in the database because it makes it much easier to control access and 
security from an application standpoint.

I think Microsoft SQL Server stores blobs in a separate file, and only 
retains pointers in the actually database field for that blob.  So when 
you SELECT on that blob MS SQL reads the external file for you as if it 
lived in the database.  I don't know if Postgres does the same thing, 
but if it did, you wouldn't have to worry about bloating database files.

Sounds like this is for an Apache web application.  Think about how web 
sites like Flickr and Webshots store their images in a database.  You 
could write a cool Apache mod so that the url:  
http://mycompany.com/images/01234.jpg;  would go through this module, 
pull the appropriate image from the database and send it back; all the 
while the client is none-the-wiser.  Just a thought.

I think its one of those things where there's not right or wrong 
answer.  Instead you just have to do the minimum of what your 
application requires.  If you don't need application-level control over 
the files, then by all means store them on the file system.  But if you 
need to control security than you have to prevent physical access to the 
file (which means no file system storage) and pull the image from the 
database through the application.

My two cents,
James



John McCawley wrote:
 Don't store your images in the database.  Store them on the filesystem 
 and store their path in the database.  Anyone that tells you otherwise 
 is a stark raving madman :)

 My system is very heavily used, and our pg_dump is only a few gigs.  
 Meanwhile our images/documents storage is well over a hundred gigs.  
 I'd hate to think that I'd have to dump and restore 100 gigs every 
 time I wanted to dump the newest data to the development database.


 As far as how they actually get to the client machine, typically these 
 days people use web servers for this sort of thing.
 Clodoaldo wrote:

 5 Jan 2007 06:59:18 -0800, imageguy [EMAIL PROTECTED]:


 I think I know the answer,


 If you know the answer please tell it as I have read some discussions
 on the web and although I have decided on a solution I'm still not
 sure about the best answer, if there is a best answer after all.

 but if you don't have an application
 server - ie a webserver, etc,


 Yes I have an application server, the Apache server.

 and many of the workstations/clients
 that need access to the images but may not have access to a network
 share,


 network share? I don't understand. The images will be loaded by html
 pages with the img tag like in img
 src=http://domain.com/images/xxx.jpg;

 isn't the database the only choice ?


 No. It is one of the choices. The other is to store the images in the
 file system, in a directory readable by Apache.

  - or is there a postgresql function/utility that will server the
 file from the file system based on the reference/link embeded in the
 database ??


 I think some procedure languages can read files. In this case what
 would be the gain in introducing a middle man, the db server?

 Regards,


 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread John McCawley

My comment about stark raving madmen was somewhat tongue-in-cheek.  
There is certainly a case to be made for images in a database under some 
circumstances.  However, for the average Joe web developer, and 
certainly someone new to the whole problem, I think storing them on the 
filesystem is going to be by far the most painless and efficient approach.


James Neff wrote:

... and Moses said unto them, 'The eleventh commandment :  thou shalt 
store images in a database!'...


What if you had another database where you stored just the images and 
not back it up if you don't want to?


As an application developer, I like the idea of storing files and 
images in the database because it makes it much easier to control 
access and security from an application standpoint.


I think Microsoft SQL Server stores blobs in a separate file, and only 
retains pointers in the actually database field for that blob.  So 
when you SELECT on that blob MS SQL reads the external file for you as 
if it lived in the database.  I don't know if Postgres does the same 
thing, but if it did, you wouldn't have to worry about bloating 
database files.


Sounds like this is for an Apache web application.  Think about how 
web sites like Flickr and Webshots store their images in a database.  
You could write a cool Apache mod so that the url:  
http://mycompany.com/images/01234.jpg;  would go through this module, 
pull the appropriate image from the database and send it back; all the 
while the client is none-the-wiser.  Just a thought.


I think its one of those things where there's not right or wrong 
answer.  Instead you just have to do the minimum of what your 
application requires.  If you don't need application-level control 
over the files, then by all means store them on the file system.  But 
if you need to control security than you have to prevent physical 
access to the file (which means no file system storage) and pull the 
image from the database through the application.


My two cents,
James



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Jeremy Haile

How does it make it easier to control access and security?  If your web
app makes a decision about allowing access to the database, it can just
as easily make a decision about allowing access to the filesystem. 
Storing the images on the file system doesn't mean that there isn't a
piece of code that determines whether or not users can access a
particular file.  

I see security and access as a non-issue in making this decision. 
Either way, it's got to be controlled at the application level (if it is
even necessary).



On Fri, 05 Jan 2007 15:26:45 -0500, James Neff
[EMAIL PROTECTED] said:
 ... and Moses said unto them, 'The eleventh commandment :  thou shalt 
 store images in a database!'...
 
 What if you had another database where you stored just the images and 
 not back it up if you don't want to?
 
 As an application developer, I like the idea of storing files and images 
 in the database because it makes it much easier to control access and 
 security from an application standpoint.
 
 I think Microsoft SQL Server stores blobs in a separate file, and only 
 retains pointers in the actually database field for that blob.  So when 
 you SELECT on that blob MS SQL reads the external file for you as if it 
 lived in the database.  I don't know if Postgres does the same thing, 
 but if it did, you wouldn't have to worry about bloating database
 files.
 
 Sounds like this is for an Apache web application.  Think about how web 
 sites like Flickr and Webshots store their images in a database.  You 
 could write a cool Apache mod so that the url:  
 http://mycompany.com/images/01234.jpg;  would go through this module, 
 pull the appropriate image from the database and send it back; all the 
 while the client is none-the-wiser.  Just a thought.
 
 I think its one of those things where there's not right or wrong 
 answer.  Instead you just have to do the minimum of what your 
 application requires.  If you don't need application-level control over 
 the files, then by all means store them on the file system.  But if you 
 need to control security than you have to prevent physical access to the 
 file (which means no file system storage) and pull the image from the 
 database through the application.
 
 My two cents,
 James
 
 
 
 John McCawley wrote:
  Don't store your images in the database.  Store them on the filesystem 
  and store their path in the database.  Anyone that tells you otherwise 
  is a stark raving madman :)
 
  My system is very heavily used, and our pg_dump is only a few gigs.  
  Meanwhile our images/documents storage is well over a hundred gigs.  
  I'd hate to think that I'd have to dump and restore 100 gigs every 
  time I wanted to dump the newest data to the development database.
 
 
  As far as how they actually get to the client machine, typically these 
  days people use web servers for this sort of thing.
  Clodoaldo wrote:
 
  5 Jan 2007 06:59:18 -0800, imageguy [EMAIL PROTECTED]:
 
 
  I think I know the answer,
 
 
  If you know the answer please tell it as I have read some discussions
  on the web and although I have decided on a solution I'm still not
  sure about the best answer, if there is a best answer after all.
 
  but if you don't have an application
  server - ie a webserver, etc,
 
 
  Yes I have an application server, the Apache server.
 
  and many of the workstations/clients
  that need access to the images but may not have access to a network
  share,
 
 
  network share? I don't understand. The images will be loaded by html
  pages with the img tag like in img
  src=http://domain.com/images/xxx.jpg;
 
  isn't the database the only choice ?
 
 
  No. It is one of the choices. The other is to store the images in the
  file system, in a directory readable by Apache.
 
   - or is there a postgresql function/utility that will server the
  file from the file system based on the reference/link embeded in the
  database ??
 
 
  I think some procedure languages can read files. In this case what
  would be the gain in introducing a middle man, the db server?
 
  Regards,
 
 
  ---(end of broadcast)---
  TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org/
 
 
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Jorge Godoy

John McCawley [EMAIL PROTECTED] writes:

 This is a web app, so in my example all of the images live on a web server,
 and our data lives on a separate database server.  We have a completely
 duplicated setup offsite, and mirror images of every server at the backup
 site.  Every night we use rsync to duplicate everything offsite.  Also, a cron
 job pg_dumps every night and copies the dump over to the backup DB server.

 And before anybody gives me any guff, our office is in New Orleans, and we
 went through Katrina with less than an hour of downtime, and without losing
 anything.  So there ;)

Anyway, you have no guarantee that all your images exist on file and that all
existing files have a corresponding entry in your database.

-- 
Jorge Godoy  [EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Jorge Godoy

Andrew Chernow [EMAIL PROTECTED] writes:

I mean, how do you handle integrity with data
 outside the database?
 You don't, the file system handles integrity of the stored data.  Although,
 one must careful to avoid db and fs orphans.  Meaning, a record with no
 corresponding file or a file with no corresponging record.  Always
 write()/insert an image file to the system within a transaction, including
 writing the image out to the fs.  Make sure to unlink any paritally written
 image files.

And how do you guarantee that after a failure?  You're restoring two
different sets of data here: 

  - backup from your database
  - backup from your files

How do you link them together on that specific operation?  Or even on a daily
basis, if you get corrupted data...

How do you plan your backup routine
 In regards to backup, backup the files one-by-one.  Grab the lastest image
 file refs from the database and start backing up those images.  Each
 successfully backed up image should be followed by inserting that file's
 database record into a remote db server.  If anything fails, cleanup the
 partial image file (to avoid orphaned data) and rollout the transaction.

 just one idea.  i'm sure there are other ways of doing it.  point is, this is
 completely possible to do reliably.

Wouldn't replication with, e.g., Slony be easier?  And wouldn't letting the
database handle all the integrity be easier?  I mean, create an images table
and then make your record depends on this table, so if there's no record with
the image, you won't have any references to it left. 

It would also make the backup plan easier: backup the database.

Not counting that depending on your choice of filesystem and image size you
might get a very poor performance. 



-- 
Jorge Godoy  [EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Andrew Chernow


 And how do you guarantee that after a failure?  You're restoring two
 different sets of data here:

 How do you link them together on that specific operation?  Or even on a daily
 basis, if you get corrupted data...

I answered that already.

 Not counting that depending on your choice of filesystem and image size you
 might get a very poor performance.
apache has very good page and image caching.  You could take advantage of that 
using this technique.


Another nice feature is the database and images can be handled spearately.  Some 
people have seen this as a disadvantage on this thread, I personally don't see 
it that why.


I guess it depends on access needs, many files and how much data you have.  What 
if you had 3 billion files across a few hundred terabytes?  Can you say with 
experience how the database would hold up in this situation?


andrew



Jorge Godoy wrote:

Andrew Chernow [EMAIL PROTECTED] writes:


I mean, how do you handle integrity with data
outside the database?

You don't, the file system handles integrity of the stored data.  Although,
one must careful to avoid db and fs orphans.  Meaning, a record with no
corresponding file or a file with no corresponging record.  Always
write()/insert an image file to the system within a transaction, including
writing the image out to the fs.  Make sure to unlink any paritally written
image files.


And how do you guarantee that after a failure?  You're restoring two
different sets of data here: 


  - backup from your database
  - backup from your files

How do you link them together on that specific operation?  Or even on a daily
basis, if you get corrupted data...


How do you plan your backup routine

In regards to backup, backup the files one-by-one.  Grab the lastest image
file refs from the database and start backing up those images.  Each
successfully backed up image should be followed by inserting that file's
database record into a remote db server.  If anything fails, cleanup the
partial image file (to avoid orphaned data) and rollout the transaction.

just one idea.  i'm sure there are other ways of doing it.  point is, this is
completely possible to do reliably.


Wouldn't replication with, e.g., Slony be easier?  And wouldn't letting the
database handle all the integrity be easier?  I mean, create an images table
and then make your record depends on this table, so if there's no record with
the image, you won't have any references to it left. 


It would also make the backup plan easier: backup the database.

Not counting that depending on your choice of filesystem and image size you
might get a very poor performance. 






---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Bruno Wolff III

On Fri, Jan 05, 2007 at 15:26:45 -0500,
  James Neff [EMAIL PROTECTED] wrote:
 ... and Moses said unto them, 'The eleventh commandment :  thou shalt 
 store images in a database!'...
 
 What if you had another database where you stored just the images and 
 not back it up if you don't want to?

I think the main reason to keep images in the database is if you need
transactional semantics. If you are updating images and transactions that
started before the update, need to see the old version you are going to
want them in the database. I suspect this need isn't very common though.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Jeremy Haile

Yeah - it can make it easier to implement transactional semantics by
storing them in the database, although for simple operations it wouldn't
be hard to replicate this manually.  And you are going to incur a
performance penalty by storing them in the database.  

Another thing to consider is that storing them in the file system makes
it much easier to browse the images using third-party tools, update
them, archive them (by gzipping or whatever).  This is much more
difficult if they are stored in the database.


On Fri, 5 Jan 2007 15:51:59 -0600, Bruno Wolff III [EMAIL PROTECTED]
said:
 On Fri, Jan 05, 2007 at 15:26:45 -0500,
   James Neff [EMAIL PROTECTED] wrote:
  ... and Moses said unto them, 'The eleventh commandment :  thou shalt 
  store images in a database!'...
  
  What if you had another database where you stored just the images and 
  not back it up if you don't want to?
 
 I think the main reason to keep images in the database is if you need
 transactional semantics. If you are updating images and transactions that
 started before the update, need to see the old version you are going to
 want them in the database. I suspect this need isn't very common though.
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Scott Marlowe

On Fri, 2007-01-05 at 15:54, Jeremy Haile wrote:
 Yeah - it can make it easier to implement transactional semantics by
 storing them in the database, although for simple operations it wouldn't
 be hard to replicate this manually.  And you are going to incur a
 performance penalty by storing them in the database.  
 
 Another thing to consider is that storing them in the file system makes
 it much easier to browse the images using third-party tools, update
 them, archive them (by gzipping or whatever).  This is much more
 difficult if they are stored in the database.

The biggest performance penalty from them being in the same database as
your data is that they're going to be the majority of your kernel /
database buffers.  So, as mentioned earlier, it's almost a necessity
(for performance reasons) to put them into their own db server.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Jorge Godoy

Jeremy Haile [EMAIL PROTECTED] writes:

 Another thing to consider is that storing them in the file system makes
 it much easier to browse the images using third-party tools, update
 them, archive them (by gzipping or whatever).  This is much more
 difficult if they are stored in the database.

If you touch the files with third-party tools how are you going to prevent
that they aren't missing when the database say they are there?  If you're
referencing them somehow, you have to guarantee that they are there...  Or am
I the only one that is thinking about referential integrity with those files? 

-- 
Jorge Godoy  [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Andrew Chernow


 Or am
I the only one that is thinking about referential integrity with those files?
Not at all.  I'm not sure how 3rd party tools like apache, `ls`, `gzip`, `find`, 
nfs, etc... are breaking integrity.  Any php, jsp, C or shell script you write 
would be doing the same thing, accessing the data.  All your doing is making 
your system more accessible to a wider range of tools, other than your own.


Just like you are cautious about not deleting the pg_data folder, big no-no, you 
need to be cautious about not deleting or modifying these image files. 
Basically, the image files are an extension of the database that you would glue 
together.  I think there is a clear separation of tasks here.  I think this is 
required if you were handling any sizeable amounts of data.


The other thing is the original poster needs apache to access these images. 
This is a requirement of his/her project.  Probably a good idea to meet those 
requirements.  It is far more effecient to have apache access them directly then 
pounding your database with web requests for image file data.  It is good 
design, and distribution of tasks, to get the image paths from the database and 
and have apache server the data; select images paths from php or something.  Now 
you can have the data anywhere, on a different server, over an nfs mount, gfs, 
wherever.  Much more flexible and distributed.


Andrew


Jorge Godoy wrote:

Jeremy Haile [EMAIL PROTECTED] writes:


Another thing to consider is that storing them in the file system makes
it much easier to browse the images using third-party tools, update
them, archive them (by gzipping or whatever).  This is much more
difficult if they are stored in the database.


If you touch the files with third-party tools how are you going to prevent
that they aren't missing when the database say they are there?  If you're
referencing them somehow, you have to guarantee that they are there...  Or am
I the only one that is thinking about referential integrity with those files? 



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Jorge Godoy

Andrew Chernow [EMAIL PROTECTED] writes:

 And how do you guarantee that after a failure?  You're restoring two
 different sets of data here:

 How do you link them together on that specific operation?  Or even on a daily
 basis, if you get corrupted data...

 I answered that already.

I'm sorry.  It must be the flu, the pain or something else, but I really don't
remember reading your message about how you can be 100% sure that all
references to the filesystem have their corresponding files present and also
all present files have their respective database entry.

I've seen HA measures (I don't imagine anyone sacrificing their customers
copying 3 billion files and a few hundred terabytes while still maintaining an
adequate service rate with part of its infra-structure down, just to use your
example to that answer...), ideas about requiring an answer from the
filesystem before considering the transaction done DB-wise (who grants you
that the image really went to the disk and is not on cache when the machine
has a power failure and shuts down abruptly?)...

I might have missed your message, though.  Would you be gentle to quote that
again, please?

 Another nice feature is the database and images can be handled spearately.

What might be bad.

 Some people have seen this as a disadvantage on this thread, I personally
 don't see it that why.

I am questioning two points that show two situations where it is bad.
Specially if those images are important to the records (e.g. product failure
images, prize winning images, product specs, prototype images, blueprints --
after all, we don't need to restrict our files to images, right? --,
agreements, spreadsheets with the last years of company account movements,
documents received from lawyers, etc.). 

 I guess it depends on access needs, many files and how much data you have.
 What if you had 3 billion files across a few hundred terabytes?  Can you say
 with experience how the database would hold up in this situation?

I'd have partitioning if I had a case like that.  Part of those would be
delegated to one machine, part to another and so on.  Even if that solution --
partitioning -- makes the overall MTBF lower...

And I still can't imagine how you guarantee that all 3 billion files have
their corresponding entries on the database.  Couting them is not enough since
I can have one file with the wrong name present on the filesystem or some
duplicate record on the DB... 


-- 
Jorge Godoy  [EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Jeremy Haile

Referential integrity would be one positive for storing the files in the
database.  I wasn't responding to that.  I'm simply saying that browsing
them with third-party tools, updating, compressing/archiving, etc. is
easier if they are not in the database.  Those are all actions that I've
found useful on other projects when storing user-uploaded images.

Depending upon the number of images on disk, it might not be hard to
verify that all the images referenced from the DB are there.  You could
have a small program that steps through each record and verifies its
existence on disk.  That could easily be run after a restore or as a
batch-process.  If you're talking about trillions of images - sure
that'd be tough.  If these images are extremely important - maybe that
solution is unacceptable.  But this is just a case of too little
information to make a decision.  

There are pros and cons to both approach - in every project I've worked
on that faced this decision, I felt the pros of storing it in the file
system outweighed the pros of storing it in the DB.  But there is no
right or wrong answer to the question (unless of course you had special
circumstances that made one option clearly superior - but I don't think
we know enough details to make that call)

My two cents,
Jeremy Haile

On Fri, 05 Jan 2007 20:24:05 -0200, Jorge Godoy [EMAIL PROTECTED]
said:
 Jeremy Haile [EMAIL PROTECTED] writes:
 
  Another thing to consider is that storing them in the file system makes
  it much easier to browse the images using third-party tools, update
  them, archive them (by gzipping or whatever).  This is much more
  difficult if they are stored in the database.
 
 If you touch the files with third-party tools how are you going to
 prevent
 that they aren't missing when the database say they are there?  If you're
 referencing them somehow, you have to guarantee that they are there... 
 Or am
 I the only one that is thinking about referential integrity with those
 files? 
 
 -- 
 Jorge Godoy  [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Ragnar

On fös, 2007-01-05 at 15:49 -0500, Andrew Chernow wrote:
 I 100% agree.  Use the database as a lookup into the filesystem.  Don't load 
 the 
 database up with terabytes of non-searchable binary data?  not sure how that 
 would help you?
 

  I mean, how do you handle integrity with data
   outside the database?
 You don't, the file system handles integrity of the stored data.  Although, 
 one 
 must careful to avoid db and fs orphans.  Meaning, a record with no 
 corresponding file or a file with no corresponging record.  Always 
 write()/insert an image file to the system within a transaction, including 
 writing the image out to the fs.  Make sure to unlink any paritally written 
 image files.

what happens if you rollback a transaction that just updated 
an image file?

for that matter, what happens if one transaction is using or
even reading an image while another is updating it?

gnari



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Andrew Chernow


 what happens if you rollback a transaction that just updated
 an image file?

 for that matter, what happens if one transaction is using or
 even reading an image while another is updating it?

One thing I mentioned was about a point in time backup, not updating the image. 
 This would rollback the transaction on the backup server, not the live one.


I was also proposing to never update an image.  Just generate a new file, you 
got 64-bits worth of ids.  If you need to rollback an update to image on the 
live server, unlink() the file and rollback.   If all is good, unlink the old 
version.


andrew



Ragnar wrote:

On fös, 2007-01-05 at 15:49 -0500, Andrew Chernow wrote:
I 100% agree.  Use the database as a lookup into the filesystem.  Don't load the 
database up with terabytes of non-searchable binary data?  not sure how that 
would help you?





 I mean, how do you handle integrity with data
  outside the database?
You don't, the file system handles integrity of the stored data.  Although, one 
must careful to avoid db and fs orphans.  Meaning, a record with no 
corresponding file or a file with no corresponging record.  Always 
write()/insert an image file to the system within a transaction, including 
writing the image out to the fs.  Make sure to unlink any paritally written 
image files.


what happens if you rollback a transaction that just updated 
an image file?


for that matter, what happens if one transaction is using or
even reading an image while another is updating it?

gnari



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Andrew Chernow


 copying 3 billion files and a few hundred terabytes while still maintaining an
 adequate service rate with part of its infra-structure down, just to use your

I wasn't saying to do this each time you run a backup, geez that would be 
horrible.  Pickup from where you left off the last time you backed up 
data/records.  How many images and how much data is being generated in a 60 
second period?  I dought 3 billion files and hundreds of terabytes.  When you 
know what your data generation is, you know what resources you need to replicate 
this information to a backup server (local or remote).


How is this any different than db replication.  It would have to backup the same 
amount of information?  You would require the same horse power and bandwidth.


andrew


Jorge Godoy wrote:

Andrew Chernow [EMAIL PROTECTED] writes:


And how do you guarantee that after a failure?  You're restoring two
different sets of data here:
How do you link them together on that specific operation?  Or even on a daily
basis, if you get corrupted data...

I answered that already.


I'm sorry.  It must be the flu, the pain or something else, but I really don't
remember reading your message about how you can be 100% sure that all
references to the filesystem have their corresponding files present and also
all present files have their respective database entry.

I've seen HA measures (I don't imagine anyone sacrificing their customers
copying 3 billion files and a few hundred terabytes while still maintaining an
adequate service rate with part of its infra-structure down, just to use your
example to that answer...), ideas about requiring an answer from the
filesystem before considering the transaction done DB-wise (who grants you
that the image really went to the disk and is not on cache when the machine
has a power failure and shuts down abruptly?)...

I might have missed your message, though.  Would you be gentle to quote that
again, please?


Another nice feature is the database and images can be handled spearately.


What might be bad.


Some people have seen this as a disadvantage on this thread, I personally
don't see it that why.


I am questioning two points that show two situations where it is bad.
Specially if those images are important to the records (e.g. product failure
images, prize winning images, product specs, prototype images, blueprints --
after all, we don't need to restrict our files to images, right? --,
agreements, spreadsheets with the last years of company account movements,
documents received from lawyers, etc.). 


I guess it depends on access needs, many files and how much data you have.
What if you had 3 billion files across a few hundred terabytes?  Can you say
with experience how the database would hold up in this situation?


I'd have partitioning if I had a case like that.  Part of those would be
delegated to one machine, part to another and so on.  Even if that solution --
partitioning -- makes the overall MTBF lower...

And I still can't imagine how you guarantee that all 3 billion files have
their corresponding entries on the database.  Couting them is not enough since
I can have one file with the wrong name present on the filesystem or some
duplicate record on the DB... 





---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Jorge Godoy

Ragnar [EMAIL PROTECTED] writes:

 for that matter, what happens if one transaction is using or
 even reading an image while another is updating it?

I believe that this also depends on how the file is updated.  Some
applications create a temporary file with the new (or changed) content and
then replace the old file with this one.

There are also cases where you can keep a file descriptor open to one file
that has no name (it existed once) and that was replaced.  

It all depends on how / who's generating and how / who's consuming the
information.

-- 
Jorge Godoy  [EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Jorge Godoy

Andrew Chernow [EMAIL PROTECTED] writes:

 Or am
I the only one that is thinking about referential integrity with those files?
 Not at all.  I'm not sure how 3rd party tools like apache, `ls`, `gzip`,
 `find`, nfs, etc... are breaking integrity.  Any php, jsp, C or shell script

For gzip, for example:

- DB record contains /some/dir/file.ext
- Filesystem contains /some/dir/file.ext.gz

NFS can also be guilty if it fails or the server goes down.  If I have a share
mounted as /some/remote/dir and I say that the file is at
/some/remote/dir/file.ext but the NFS server is down then it is the same as
if the file didn't exist at all since it can't be reached.  

For both cases, if the file is inside the database and I am referencing it
then I know that it *is* there.  Referential integrity takes care of that for
me with no cost or any other action of mine. 

 you write would be doing the same thing, accessing the data.  All your doing
 is making your system more accessible to a wider range of tools, other than
 your own.

And introducint more failing points.

 Just like you are cautious about not deleting the pg_data folder, big no-no,
 you need to be cautious about not deleting or modifying these image
 files. Basically, the image files are an extension of the database that you
 would glue together.  I think there is a clear separation of tasks here.  I
 think this is required if you were handling any sizeable amounts of data.

So you have added the possibility of manipulating (which is different from
reading or accessing) the files directly but you say don't touch them!.

 The other thing is the original poster needs apache to access these
 images. This is a requirement of his/her project.  Probably a good idea to

And nothing prevents those files from being served from the database.

 meet those requirements.  It is far more effecient to have apache access
 them

Where weren't we meeting his/her requirements?  All the discussion is around
available means to do that.  One option is having the files on the database,
the other is on the filesystem.  From my understanding we're discussing the
benefits of each one.  Aren't we? 

 directly then pounding your database with web requests for image file data.

It might be.  If you can be certain that the image *is* there when it tries to
access it.  Both examples above -- gzip + NFS -- show two ways of having
different things inside the DB and on the FS. 

 It is good design, and distribution of tasks, to get the image paths from the
 database and and have apache server the data; select images paths from php or
 something.  Now you can have the data anywhere, on a different server, over an
 nfs mount, gfs, wherever.  Much more flexible and distributed.

And also more uncertain that the referred data is there at all.  

-- 
Jorge Godoy  [EMAIL PROTECTED]


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Andrew Chernow


 And introducint more failing points.
depends on how you do it.  not everything has to go in a database to be 
reliable.  Part of good engineering is realizing when to use a tool and when not 
to.


I think a 10K view of the issue is in order.  The bigger picture is being 
missed, or I am not communicating well.


I bet if I gave you a million dollars, you could implement what I proposed; in a 
reliable, redundant fasion.  You could probably convince anyone of its merits, 
making my case for me.  All you would have to do is entertain the idea ... I 
mean you sound smart enough ... probably smarter than me :)


andrew



Jorge Godoy wrote:

Andrew Chernow [EMAIL PROTECTED] writes:


Or am
I the only one that is thinking about referential integrity with those files?

Not at all.  I'm not sure how 3rd party tools like apache, `ls`, `gzip`,
`find`, nfs, etc... are breaking integrity.  Any php, jsp, C or shell script


For gzip, for example:

- DB record contains /some/dir/file.ext
- Filesystem contains /some/dir/file.ext.gz

NFS can also be guilty if it fails or the server goes down.  If I have a share
mounted as /some/remote/dir and I say that the file is at
/some/remote/dir/file.ext but the NFS server is down then it is the same as
if the file didn't exist at all since it can't be reached.  


For both cases, if the file is inside the database and I am referencing it
then I know that it *is* there.  Referential integrity takes care of that for
me with no cost or any other action of mine. 


you write would be doing the same thing, accessing the data.  All your doing
is making your system more accessible to a wider range of tools, other than
your own.


And introducint more failing points.


Just like you are cautious about not deleting the pg_data folder, big no-no,
you need to be cautious about not deleting or modifying these image
files. Basically, the image files are an extension of the database that you
would glue together.  I think there is a clear separation of tasks here.  I
think this is required if you were handling any sizeable amounts of data.


So you have added the possibility of manipulating (which is different from
reading or accessing) the files directly but you say don't touch them!.


The other thing is the original poster needs apache to access these
images. This is a requirement of his/her project.  Probably a good idea to


And nothing prevents those files from being served from the database.


meet those requirements.  It is far more effecient to have apache access
them


Where weren't we meeting his/her requirements?  All the discussion is around
available means to do that.  One option is having the files on the database,
the other is on the filesystem.  From my understanding we're discussing the
benefits of each one.  Aren't we? 


directly then pounding your database with web requests for image file data.


It might be.  If you can be certain that the image *is* there when it tries to
access it.  Both examples above -- gzip + NFS -- show two ways of having
different things inside the DB and on the FS. 


It is good design, and distribution of tasks, to get the image paths from the
database and and have apache server the data; select images paths from php or
something.  Now you can have the data anywhere, on a different server, over an
nfs mount, gfs, wherever.  Much more flexible and distributed.


And also more uncertain that the referred data is there at all.  



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Jorge Godoy

Andrew Chernow [EMAIL PROTECTED] writes:

 I wasn't saying to do this each time you run a backup, geez that would be
 horrible.  Pickup from where you left off the last time you backed up
 data/records.  How many images and how much data is being generated in a 60
 second period?  I dought 3 billion files and hundreds of terabytes.  When you
 know what your data generation is, you know what resources you need to
 replicate this information to a backup server (local or remote).

I'm not talking about backups.  I'm talking about restores.  

 How is this any different than db replication.  It would have to backup the
 same amount of information?  You would require the same horse power and
 bandwidth.

The difference is that I'd restore the data and then sync the difference from
some point in time to now.  The referential integrity would be guaranteed by
the database itself and I won't have any pointers to files that doesn't exist
or files without pointers to it.

-- 
Jorge Godoy  [EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [GENERAL] Database versus filesystem for storing images

2007-01-05 Thread Jorge Godoy

Andrew Chernow [EMAIL PROTECTED] writes:

 depends on how you do it.  not everything has to go in a database to be
 reliable.  Part of good engineering is realizing when to use a tool and when
 not to.

 I think a 10K view of the issue is in order.  The bigger picture is being
 missed, or I am not communicating well.

 I bet if I gave you a million dollars, you could implement what I proposed;
 in a reliable, redundant fasion.  You could probably convince anyone of its
 merits, making my case for me.  All you would have to do is entertain the
 idea ... I mean you sound smart enough ... probably smarter than me :)

LOL.  :-)  This proposal was made to Demi Moore and her marriage ended in a
disaster... ;-) 

My point is: if I need to be 100% sure that what is referenced on the database
is accessible all the time when the reference is, then I need to have this on
the database; if it can be missed sometimes then using the filesystem is
acceptable. 

I'd include on the 100% category fiscal records, client's documents, company
projects, etc.  On the second the drawing the user made of his grandmother,
the picture of the pretty neighbor in the shower, all reports generated
automatically from data existing on the database itself, etc.


I have systems with both solutions and the ones that give me more trouble to
manage are the ones that use external files.  If nobody has access to the
server, then it is a piece of cake.  But if there are other people touching
things -- moving / creating / deleting / changing files and directories --
then things get more complicated to manage.  Inside the DB they usually don't
dare to touch things because they know that they might destroy very important
data.  It is something that is inherent to people that everything the board of
directors use might cost their job.  Even if it is just the coffee
plate... :-) 


Anyway, do you need my account number? ;-)  You can leave the transfer charges
to me, after all friends do that ;-)  


And I am not smarter than anyone that listen to and learn with other people.
I am still learning here since I'm really a newbie in this database world.
Thanks for that, though :-)

-- 
Jorge Godoy  [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [GENERAL] Database versus filesystem for storing images

2007-01-03 Thread Scott Ribe

Personally, I'd put them on the file system, because then backup software
can perform incremental backups. In the database, that becomes more of a
difficulty. One suggestion, don't use a file name from a hash to store the
image, just use the serial id, and break them up by hundreds or thousands,
iow image 1123 might be in images/000/01/01123.

-- 
Scott Ribe
[EMAIL PROTECTED]
http://www.killerbytes.com/
(303) 722-0567 voice



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

[GENERAL] Database versus filesystem for storing images

2006-12-31 Thread Clodoaldo


A web application will store hundreds of thousands images.

The images meta data will be stored in postgresql but I'm in doubt
about where to store the images themselves, if in the database as byte
strings or in the file system.

Would one of them have a faster image search than the other?

In the database the images would be indexed by a serial id, that of
the file submitter, and an image number. They would always be searched
by the same key.

In the file system there would be a two or three level directory
structure and the directory names would be hexadecimal numbers like
'/images_directory/f/0/d' taken from the 2/3 first letters of the file
name, which would be a hash.

Whenever I want to know the hash name of an image so I can find it in
the file system I would have to search it first in the database by the
index. So I suppose the database will always be faster.

But there is an advantage in using the file system. It is that the
Apache's HTTP file caching management is there for free. If I store
the images in the database I will have to handle the request and
response HTTP headers myself if I want the images to be cached by the
clients.

What is the best practice in this situation? I mean not only the two
options as above but any.

Regards,
--
Clodoaldo Pinto Neto

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [GENERAL] Database versus filesystem for storing images

2006-12-31 Thread Guy Rouillier


Clodoaldo wrote:

A web application will store hundreds of thousands images.


snip


What is the best practice in this situation? I mean not only the two
options as above but any.


This discussion comes up regularly.  See the archives for a thread 
titled Storing images in PostgreSQL databases (again) for the latest.


--
Guy Rouillier

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/

44 matches

Mail list logo