Did some research over the weekend. First, some good news. Heavy-duty sites that run Postgres include:
A) RubyForge (rubyforge.org). They have 300K records: not huge, but moderately large.
B) The .org domain registry.
C) SourceForge used to run Postgres. AFAIK, they moved away because IBM made that a condition for funding them, not because of any issues with Postgres.
Now, the bad news: Postgres is still not clusterable. What Postgres calls clustering is using triggers to copy data from one server to another. That is not what is usually meant by a database cluster: there would be no good way to handle globally unique columns, conflicting writes, or transactions in general. (If triggers plus copying count as clustering, then I have a geographically distributed MySQL "cluster" that runs on rsync and cron!)

When I say database cluster, I mean that you can put a bullet through one of the servers while your query is running and the query still returns the right results, as if nothing happened. This could be achieved by something like mounting a RAID-5 (or RAID-10) volume, software or hardware, on two Postgres boxes so that the data is mirrored across both servers, with the servers somehow aware of which one is writing which block so that they do not clobber each other's writes. That seems pretty far away. To be precise, it would not really be RAID but transparent, RAID-like mirroring over the network, and that does not exist in the mainstream Linux kernel.

However, it seems Oracle is giving away similar technology (called OpenGFS) (http://otn.oracle.com/tech/linux/open_source.html), as well as tools to run this kind of network over FireWire or Fibre Channel. (100 Mbps might end up being too slow for such an exercise.) IMHO, Oracle is giving this technology away so that they can sell more RAC licenses on Linux. Oracle or not, I would prefer to see something similar to OpenGFS integrated into the mainstream kernel. (I believe the closest thing in the mainstream kernel today would be putting network block devices into some kind of software RAID volume.)

Other than clustering, things that are currently not in Postgres but exist in Oracle:
A) Materialized views (views that are computed ahead of time rather than just-in-time).
B) Index-organized tables (where the entire table is stored in the index.
   Great for performance on narrow tables.)
C) Database links (linking columns to point at values in other databases).
D) Point-in-time recovery (restoring the database to its state as of a given time).
E) Nested transactions.
F) Savepoints.
G) Good cursor support. Right now, executing a query fetches the entire result set into memory, which does not scale. This is easy to do for simple cases but probably very difficult for queries inside transactions.
H) Execute batch / bulk updates etc. (there is COPY, but that is not standard).
I) Peripheral tools. A lot of people use Oracle Forms and Reports; I have not heard of anything similar for Postgres. It would also be nice to have some good migration tools.
J) Tablespaces.
K) Multi-column function-based indexes.

Many things only just arrived in 7.3 and are not yet thoroughly tested. They include:
A) Stored procedures that can return result sets (table functions).
B) Schema support.
C) Prepared queries.
D) Dependency tracking.
E) A good security and privilege model (before 7.3 it was rather elementary).
F) Improved internationalization support (this still has some way to go).

tarun

_______________________________________________
ilugd mailing list
[EMAIL PROTECTED]
http://frodo.hserus.net/mailman/listinfo/ilugd