> To follow up with Ted, nobody said using the filesystem is bad,

No, it is the most efficient way.

Not in my environment. All db servers have RAID 10 over 8 SCSI 15K
disks. Pulling from them is always faster than a webserver pulling
from its SATA drive.

Also, there is the issue of going to the database versus going to the
filesystem. Timing wise, you

Add an image server (or 20) and change the HTML to point to the image
server.
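One way to sketch that suggestion: pick an image host deterministically per file and emit HTML pointing at it. The hostnames and the hashing scheme here are invented for illustration, not anything from the thread.

```python
import hashlib

# Hypothetical image-server pool; add more hosts as traffic grows.
IMAGE_SERVERS = ["img1.example.com", "img2.example.com", "img3.example.com"]

def image_url(filename):
    """Hash the filename to pick a server, so the same image always
    resolves to the same host (and browser caches stay warm)."""
    digest = hashlib.md5(filename.encode()).hexdigest()
    server = IMAGE_SERVERS[int(digest, 16) % len(IMAGE_SERVERS)]
    return "http://%s/images/%s" % (server, filename)

# The HTML just points at whichever host the hash picked.
tag = '<img src="%s">' % image_url("cat.jpg")
```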

I can't imagine Flickr doing that.

I believe that the only "new" thing I have to add is for newbies.

I believe that for a newbie, it would be easier to use the filesystem
rather than the DB.

True, you then have to do some extra cleanup/management work for
deleted records, so that the related images go away.
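That cleanup work is small but easy to forget. A minimal sketch of the idea, using an invented one-table schema and SQLite just for illustration: when the record goes, the file on disk goes with it.

```python
import os
import sqlite3
import tempfile

# Illustrative schema (not from the thread): one row per stored image path.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, path TEXT)")

def delete_image(db, image_id):
    """Drop the DB record and the file it points at, so images for
    deleted records actually go away instead of orphaning on disk."""
    row = db.execute("SELECT path FROM images WHERE id = ?",
                     (image_id,)).fetchone()
    if row is None:
        return
    db.execute("DELETE FROM images WHERE id = ?", (image_id,))
    db.commit()
    try:
        os.remove(row[0])
    except OSError:
        pass  # file already gone; nothing left to clean up

# demo: create a scratch file, register it, then delete the record
fd, path = tempfile.mkstemp()
os.close(fd)
db.execute("INSERT INTO images (path) VALUES (?)", (path,))
delete_image(db, 1)
```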

But storing them in the DB invariably ends up with too many issues
involving DB storage size and query buffer size, compounded by data
escaping/security issues.

Strange... I came to the opposite conclusion. Using prepared
statements eliminates data escaping issues, etc. And putting the files
in the db removes the extra cleanup/management stuff. And easier to
backup (though not efficient).
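The prepared-statement point can be shown in a few lines. This is a sketch using SQLite from Python rather than anything PHP-specific: the placeholder binds the raw bytes directly, so no escaping is ever applied to the data, and hostile-looking content round-trips untouched.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (name TEXT PRIMARY KEY, data BLOB)")

# Bytes that would be dangerous if interpolated into SQL as text.
payload = b"\x89PNG\r\n\x1a\n" + b"'; DROP TABLE files; --"

# The ? placeholders are bound, not escaped-and-concatenated.
db.execute("INSERT INTO files (name, data) VALUES (?, ?)",
           ("logo.png", payload))

stored = db.execute("SELECT data FROM files WHERE name = ?",
                    ("logo.png",)).fetchone()[0]
```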

Again, the problem of replication or distribution does not require a
database. If you are saying that your single database will contain all
your bitmap files, then that's messed up and your database will be a
bottleneck.

You've stated a problem: A large amount of data spread across multiple
machines, this is a real problem domain, but it absolutely does not say
why a database is the right solution or even a solution at all.

I guess you skimmed what I wrote. What I wrote was about using a
database for meta data: server and file location. I was talking
about using that data to intelligently know where the file was .... on
a file system. Mostly because it is cheaper to scale that way, as you
can tune things to add more replicas for redundancy and performance.
That can hit scalability problems with many hundreds of servers.
But that's easy to solve by breaking down the meta data and storage
parts. Have a smaller set of storage servers talk to each meta data
(database) server, such that you can run the databases on the same
cheap machines that run the storage stuff. Then you have a task server
that manages the meta data servers and storage servers.
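The meta-data layer described above can be sketched in miniature: the database stores only where each file lives (server + path, one row per replica), and the bytes stay on plain filesystems. Server names, the schema, and the replica count here are all invented for illustration.

```python
import sqlite3

# Tiny stand-in for a meta data (database) server.
meta = sqlite3.connect(":memory:")
meta.execute("CREATE TABLE locations (file_id TEXT, server TEXT, path TEXT)")

def add_replicas(file_id, servers, path):
    """One row per replica, so reads can be spread across storage
    servers for redundancy and performance."""
    for s in servers:
        meta.execute("INSERT INTO locations VALUES (?, ?, ?)",
                     (file_id, s, path))
    meta.commit()

def locate(file_id):
    """Ask the meta data DB which storage servers hold the file;
    the caller then fetches the bytes from one of them directly."""
    return meta.execute(
        "SELECT server, path FROM locations WHERE file_id = ?",
        (file_id,)).fetchall()

add_replicas("photo42", ["store-a", "store-b"], "/vol3/photo42.jpg")
```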

Chaining things this way may seem like a lot of steps to go through. But it
can be very efficient throughput-wise, which matters far more than a
benchmark. To anyone who has designed CPUs, this will look a little
familiar (though more flexible).

At that point you also never have a complete backup on one machine. I
remember that being a weird thing....

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
