On June 14th 2012, YesGood wrote:
> And, but it's possible to build a load balancing cluster of database with
> LVS?
> where all clients connects to database servers in write and read mode.
> or
> there are other best tools for this target?
It's possible, provided the database servers support such a pattern. In database terms, you're looking for something like "synchronous multi-master replication". As far as I know, PG doesn't support multi-master replication out of the box, but there are a few third-party tools that claim to add such support to PG.

http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling
looks like a good starting point. Maybe you'd also like to take a look at
http://www.postgresql.org/docs/current/interactive/high-availability.html and
http://www.postgresql.org/docs/current/interactive/different-replication-solutions.html
(both for load balancing as well as HA).

The next question is whether the overall design is actually an improvement over a simple active/passive HA cluster.

> I want to build a cluster with postgreSQL, and then I would use the port
> 5432 in the keepalived.conf
> For the script in the MISC_CHECK, there are some rules?
> And I don't know nagios, but with the nagios, what condition I should check?

Basically, Nagios consists of a daemon, a web interface and a set of so-called "plugins". The latter are developed independently from Nagios and are simple command-line utilities, available via http://nagiosplugins.org/. For example this one:

---cut
$ /usr/lib/nagios/plugins/check_pgsql --help
check_pgsql v1.4.15 (nagios-plugins 1.4.15)
Copyright (c) 1999-2007 Nagios Plugin Development Team
	<[email protected]>

Test whether a PostgreSQL Database is accepting connections.

Usage:
check_pgsql [-H <host>] [-P <port>] [-c <critical time>] [-w <warning time>]
 [-t <timeout>] [-d <database>] [-l <logname>] [-p <password>]
[...]
This plugin tests a PostgreSQL DBMS to determine whether it is active and
accepting queries. In its current operation, it simply connects to the
specified database, and then disconnects. If no database is specified, it
connects to the template1 database, which is present in every functioning
PostgreSQL DBMS. The plugin will connect to a local postmaster if no host
is specified. To connect to a remote host, be sure that the remote
postmaster accepts TCP/IP connections (start the postmaster with the -i
option).
[...]
---cut

would give a good start for such a "check_misc" script (a rough sketch follows further down, after the DRBD part).

> For the replicas in the database servers pool, I thinking work with the
> DRBD+ocfs2.

From what you're writing, I assume the following:

- you'll be installing PG on two nodes "behind" an LVS load balancer.
- DRBD is used to give both nodes shared block storage.
- a shared OCFS2 filesystem runs on top of the DRBD volume.

In order to speed up database queries, a lot of information is cached in the local memory (RAM) of each node. So if one node writes a block onto the OCFS2 filesystem with a DBMS that is not multi-master-aware, the other node won't know about the changed information: the writing DBMS isn't aware of any other DBMS it would have to notify. If the other node is then asked for the same information, it may answer with locally cached, but outdated data. So if your application reads from one PG node while write requests go to a different PG node, you may invalidate your data.

That's just a very simple example. In reality, your nodes will not only write "wrong" records to their shared database, they may also damage indexes, overwrite consistent data with inconsistent data or simply shred your data in ways your DBMS never anticipated and can't repair.
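Regarding the MISC_CHECK "rules" question from above, here's a minimal sketch of how check_pgsql could be wired into keepalived. The addresses, the DR forwarding mode and the script path are made-up examples, adapt them to your setup. As far as I remember, the only real rule for a MISC_CHECK script is its exit code: 0 means the real server stays in the pool, anything else removes it (unless you use misc_dynamic).

---cut
# /etc/keepalived/keepalived.conf (fragment, example addresses)
virtual_server 192.168.10.100 5432 {
    delay_loop  10
    lb_algo     wlc
    lb_kind     DR
    protocol    TCP

    real_server 192.168.10.11 5432 {
        MISC_CHECK {
            misc_path    "/etc/keepalived/check_pg.sh 192.168.10.11"
            misc_timeout 10
        }
    }
    # second real_server block analogous
}
---cut

The wrapper script itself can be as simple as:

---cut
#!/bin/sh
# /etc/keepalived/check_pg.sh <host>
# keepalived only cares about the exit code:
# 0 = real server stays in the pool, non-zero = it gets removed
/usr/lib/nagios/plugins/check_pgsql -H "$1" -P 5432 -t 5 >/dev/null 2>&1
---cut

Keep in mind that keepalived runs this on the director, so the PG instances have to accept TCP connections from the director's IP (listen_addresses and pg_hba.conf).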
Coming back to the caching problem: this issue could be avoided if PG were configured not to use any caching at all and serviced every read request from disk (I don't know whether that's even possible). However, that way you're also giving up what makes PG fast: its cache in memory. Retrieving data from memory is a lot faster than retrieving it from disk. Depending on the exact configuration, writes may also be temporarily delayed in the RAM of each node, but that's another story and may shred even more data much faster in this setup.

A multi-master-aware DBMS would lock the affected database records per write (much like the lock manager for OCFS2 does) and ensure that any changed data has also been written to all other nodes. Network latency usually becomes an issue here, which is why many of those systems rely on non-Ethernet networks for their synchronization work.

Then the next dimension: even if this clustering didn't damage your data, it most likely won't improve your performance either. Write performance in particular is likely to suffer:

- DRBD services read requests from local storage (unless that local storage has failed), so reads are usually fast and not an issue. Writing can be an issue, as write requests have to be performed by both nodes in the DRBD cluster before they may be marked as "complete". So your database is not only waiting for the local disk to complete the write, it's also waiting for the network to transmit the changed block and for the other node to acknowledge and write that block to its own disk. At the very least, you're adding this network latency to every write transaction. DRBD permits some tuning here (protocols A-C), but the faster protocols allow inconsistency between the nodes, so for a shared filesystem you're stuck with the synchronous mode.

- OCFS2 requires a lock manager; this lock manager makes sure only one node at a time is able to write to a file on the shared filesystem. So if you're running two non-multi-master-aware database systems on the same shared filesystem, only one of them is actually able to write to the shared transaction log. The other one will either throw an error message (-> no additional performance) or wait until the lock becomes available (-> no additional performance).

- OCFS2 supports block sizes from 512 bytes to 4 KB. DRBD internally works with a 4 KB block size. So in probably the worst scenario, you're using OCFS2 with a 512-byte block size, and for every OCFS2 block write DRBD will attempt to re-sync 4 KB of actual data. If DRBD and OCFS2 aren't properly aligned with each other, this may eat up any remaining performance pretty fast.

> And use heartbeat for the health-checking monitoring in the pool of
> database servers.
> It's a feasible scenario?

Heartbeat is a complete cluster manager, not a monitoring tool. It's usually used to automatically manage what needs to be done when one node fails or a failed node later shows up again.

If you're considering DRBD replication (which is limited to two servers), I'd rather recommend setting up a "typical" active/passive HA setup: one node is active, while the other node only replicates any changes from the master. To improve performance, I'd rather recommend tuning the PG configuration. I'm no PG expert, so I can't give much advice on this; however, I'm aware that many Linux distributions initially set up PG with a very conservative, extremely slow default configuration that leaves much room for improvement in terms of performance.
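Just to give an idea of which knobs I mean (again, I'm no PG expert, so treat this purely as a pointer to where to look, not as recommended values), the distribution default for e.g. shared_buffers is often tiny. A sketch for a hypothetical dedicated machine with 16 GB RAM might look like:

---cut
# postgresql.conf (fragment) -- example values for an assumed dedicated
# 16 GB machine; measure and consult the PG docs before copying anything
shared_buffers       = 4GB     # distribution defaults are often only a few MB
effective_cache_size = 12GB    # hint to the planner about the OS page cache
work_mem             = 16MB    # per sort/hash operation, so keep it moderate
maintenance_work_mem = 512MB
wal_buffers          = 16MB
checkpoint_segments  = 32      # fewer, but larger checkpoints
---cut

shared_buffers only takes effect after a restart, and with a multi-GB value you may also have to raise the kernel's shmmax limit before PG will start at all.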
Anders

--
1&1 Internet AG          Expert Systems Architect (IT Operations)
Brauerstrasse 50         v://49.721.91374.0
D-76135 Karlsruhe        f://49.721.91374.225
