Did some research over the weekend.

First of all, some good news:
Heavy-duty sites that run Postgres include:
A) RubyForge (rubyforge.org). They have 300K records, which is not huge but
moderately high.
B) The .org domain registry.
C) SourceForge used to run Postgres. AFAIK, they moved away because IBM made
that a condition for funding them, not because of any issues with
Postgres.

Now, the bad news:

Postgres is still not clusterable. What Postgres calls clustering is using
triggers to copy data from one server to another. This is not what is
usually meant by a database cluster: there is no good way to handle
globally unique columns, conflicting writes, or transactions in
general.

(If you call triggers-and-copy clustering, then I have a geographically
distributed MySQL "cluster" that uses rsync and cron!!)
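
To make the unique-column problem concrete, here is a toy sketch in Python. It is not how Postgres or any replication add-on is implemented; the Server class, its methods and the sample key are all made up for illustration. It only assumes that each server checks uniqueness locally and copies rows to its peer afterwards, which is the trigger-and-copy model described above.

```python
# Toy model of trigger-and-copy replication between two independent servers.
# Everything here (Server, apply_replicated, the sample key) is hypothetical.

class Server:
    def __init__(self, name):
        self.name = name
        self.rows = {}        # unique key -> value
        self.conflicts = []   # replicated rows that could not be applied

    def insert(self, key, value):
        """Local insert: only the LOCAL unique constraint is checked."""
        if key in self.rows:
            raise KeyError(f"{self.name}: duplicate key {key!r}")
        self.rows[key] = value
        return (key, value)   # what the trigger would ship to the peer

    def apply_replicated(self, key, value, origin):
        """Apply a row copied from the peer; record a conflict if it clashes."""
        if key in self.rows and self.rows[key] != value:
            self.conflicts.append((key, self.rows[key], value, origin))
            return False
        self.rows[key] = value
        return True


def replicate(pending, target, origin):
    """Ship rows to the other server after the fact, as a trigger copy would."""
    for key, value in pending:
        target.apply_replicated(key, value, origin)


a, b = Server("A"), Server("B")

# Both servers accept the same "globally unique" key before any copy happens.
pending_a = [a.insert("user@example.org", "written on A")]
pending_b = [b.insert("user@example.org", "written on B")]

replicate(pending_a, b, origin="A")
replicate(pending_b, a, origin="B")

print(a.rows, a.conflicts)
print(b.rows, b.conflicts)
```

Both inserts succeed locally, both copies are rejected on arrival, and the two servers end up with different data for the same key. Nothing in the copy step can decide which write should win.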

When I say database cluster, I mean that you can put a bullet through one of
the servers while your query is running and the query still returns the right
results as if nothing happened. This could be achieved by something like
mounting RAID-5 (or RAID-10), software or hardware, across two Postgres boxes
to mirror data over the two servers, with each server aware of which blocks
the other is writing so that they do not clobber each other's writes. That
seems pretty far away. To be precise, it is not really RAID but RAID-like
transparent mirroring over the network, and it does not exist in the
mainstream Linux kernel. However, it seems Oracle is giving away similar
technology (called OpenGFS, see
http://otn.oracle.com/tech/linux/open_source.html) as well as tools to run
this kind of network over FireWire or fibre (100 Mbps might end up being too
slow for such an exercise). IMHO, Oracle is giving this technology away so
that they can sell more RAC licenses on Linux. Oracle or not, I would prefer
to see something similar to OpenGFS integrated into the mainstream kernel (I
believe the closest thing in the mainstream kernel today is mounting network
block devices in some kind of software RAID volume).
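
Here is a rough sketch of what I mean by mirroring with each server knowing who is writing which block. It is a toy in-memory model in Python, not OpenGFS or any real kernel facility; MirroredVolume, the host names pg1/pg2 and the one-owner-per-block rule are assumptions made up for this example.

```python
# Toy model: one logical volume mirrored on two hosts, with per-block
# write ownership. All names here are hypothetical.

class MirroredVolume:
    def __init__(self, hosts):
        self.replicas = {h: {} for h in hosts}  # host -> {block_no: data}
        self.alive = set(hosts)
        self.owner = {}                         # block_no -> host allowed to write

    def write(self, host, block_no, data):
        if host not in self.alive:
            raise IOError(f"{host} is down")
        holder = self.owner.get(block_no, host)
        # A live owner blocks other writers; a dead owner's blocks can be taken over.
        if holder != host and holder in self.alive:
            raise PermissionError(f"block {block_no} is owned by {holder}")
        self.owner[block_no] = host
        for h in self.alive:                    # mirror the write to every live replica
            self.replicas[h][block_no] = data

    def read(self, block_no):
        for h in sorted(self.alive):            # any surviving replica will do
            return self.replicas[h][block_no]
        raise IOError("no replica left")

    def kill(self, host):
        """The 'bullet through the server' step."""
        self.alive.discard(host)


vol = MirroredVolume(["pg1", "pg2"])
vol.write("pg1", 0, b"accounts")

blocked = False
try:
    vol.write("pg2", 0, b"clobber")             # pg1 owns block 0, so this is refused
except PermissionError:
    blocked = True

vol.kill("pg1")
survived = vol.read(0)                          # still readable from pg2
vol.write("pg2", 0, b"accounts v2")             # pg2 takes over the dead owner's block
print(blocked, survived, vol.read(0))
```

The real difficulty is everything this leaves out: doing the ownership handshake over a network, and deciding reliably that the other box is actually dead.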

Other than clustering, things that are currently not in Postgres but exist
in Oracle:
A) materialized views (views that are calculated early on and not
just-in-time)
B) index-organized tables (where the entire table is stored in the index.
Great for performance on narrow tables)
C) database links (this is linking columns to point at values in other
databases)
D) point-in-time recovery (restore the database to its state as of time XXX)
E) nested transactions
F) savepoints
G) good cursor support (right now, executing a query fetches the entire
result set into memory, which is not scalable). This is easy to do for simple
cases but probably very difficult for queries inside transactions
H) batch/bulk execution of updates etc. (there is COPY, but that is not
standard)
I) peripheral tools. A lot of people use Oracle Forms and Reports. I have not
heard of anything similar for Postgres. It would also be nice to have some
good migration tools.
J) tablespaces
K) multi-column function-based indexes
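
For anyone not familiar with item A, the difference between an ordinary view and a materialized one can be sketched like this. The example uses Python's built-in sqlite3 module only because it needs no setup; SQLite has no materialized views either, so the "materialized" side is emulated with a plain table plus a manual refresh. The table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES ('alice', 10), ('alice', 5), ('bob', 7);

    -- Ordinary view: the query is re-run every time the view is read.
    CREATE VIEW totals_view AS
        SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer;

    -- Emulated materialized view: the result is computed once and stored.
    CREATE TABLE totals_snapshot AS
        SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer;
""")


def refresh_snapshot(conn):
    """Recompute the stored result (a real database would offer a refresh command)."""
    conn.executescript("""
        DELETE FROM totals_snapshot;
        INSERT INTO totals_snapshot
            SELECT customer, SUM(amount) FROM orders GROUP BY customer;
    """)


def bob_total(source):
    # source is one of the two fixed names created above, not user input.
    row = conn.execute(
        f"SELECT total FROM {source} WHERE customer = 'bob'"
    ).fetchone()
    return row[0]


conn.execute("INSERT INTO orders VALUES ('bob', 3)")

live = bob_total("totals_view")        # computed just-in-time, sees the new row
stale = bob_total("totals_snapshot")   # computed earlier, does not
refresh_snapshot(conn)
fresh = bob_total("totals_snapshot")   # up to date again after the refresh
print(live, stale, fresh)
```

Reads from the stored copy are cheap, but the copy can go stale, and that is the trade-off a built-in implementation has to manage.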

Many things only just came in with 7.3 and are not completely tested. They include:
A) stored procedures that can return result sets (table functions)
B) schema support
C) prepared queries
D) dependency tracking
E) good security and privilege model (before 7.3 it was rather elementary)
F) improved internationalization support (this still has to go some way)

tarun

