Perrin Harkins writes:
> The trouble here should be obvious: sooner or later it becomes hard to scale
> the database. You can cache the read-only data, but the read/write data
> isn't so simple.

Good point. Fortunately, the problem isn't new.

> Theoretically, the big players like Oracle and DB2 offer clustering
> solutions to deal with this, but they don't seem to get used very
> often.

Oracle was built on an SMP assumption. They added clustering later.
It doesn't scale well, which is probably why you haven't heard of
people using their parallel server solutions. I don't know much about
DB2, but I'm pretty sure it assumes shared memory.

Tandem's Non-Stop SQL is a shared-nothing architecture. It scales
well, but isn't cheap to walk in the door.

> Other sites find ways to divide their traffic up (users 1 - n go to
> this database, n - m go to that one, etc.)

Partitioning is a great way to get scalability, if you can do it.

> However, you can usually scale up enough just by getting a bigger
> box to run your database on until you reach the realm of
> Yahoo and Amazon, so this doesn't become an issue for most sites.

I agree. This is why I think Apache/mod_perl is a great solution for
the majority of web apps. The scaling issues supposedly being solved
by J2EE don't exist.

On another note, one of the ways to make sure your database scales
better is to keep the database as simple as possible. I've seen a lot
of solutions which rely on stored procedures to "get performance".
All this does is make the database slower and more of a bottleneck.

> But how can you actually make a shared nothing system for a commerce web
> site? They may not be sharing local memory, but you'll need read/write
> access to the same data, which means shared locking and waiting somewhere
> along the line.

I meant "shared nothing" in the sense of multiprocessor architectures.
SMP (symmetric multiprocessing) relies on shared memory. This is the
J2EE/E10K model. "Shared nothing" is the Neo Classical model. Really
these are NUMAs (non-uniform memory architecture), because most
servers are SMPs.

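As a rough sketch of the partitioning idea above (users 1 - n go to one
database, n - m go to another), the routing can be a plain range lookup.
This is Python rather than Perl, and the range boundaries, the database
names, and the database_for_user function are all made up for
illustration; a real site would load the ranges from configuration.

```python
from bisect import bisect_left

# Hypothetical layout: (highest user id in the range, database name).
# Boundaries and names are invented for this example.
PARTITIONS = [
    (100_000, "db1"),  # users 1 .. 100000
    (200_000, "db2"),  # users 100001 .. 200000
    (300_000, "db3"),  # users 200001 .. 300000
]

# Sorted upper bounds, kept separately so bisect can search them.
UPPER_BOUNDS = [upper for upper, _ in PARTITIONS]


def database_for_user(user_id: int) -> str:
    """Return the name of the database that owns this user's rows."""
    if user_id < 1:
        raise ValueError(f"invalid user id: {user_id}")
    # Index of the first range whose upper bound is >= user_id.
    index = bisect_left(UPPER_BOUNDS, user_id)
    if index == len(PARTITIONS):
        raise ValueError(f"no partition configured for user id {user_id}")
    return PARTITIONS[index][1]
```

Each request then talks only to the database that owns its user, so the
databases never have to coordinate with each other. That only holds if
the data really does split cleanly by user.
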
Here's a classic from Stonebraker on the subject:

    http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf

DeWitt has a lot of papers on parallelism and distributed db design:

    http://www.cs.wisc.edu/~dewitt/

Cheers,
Rob