[sqlalchemy] Re: Using multiple databases for reliability (NOT sharding)

2010-12-17 Thread Lloyd Kvam


On Dec 17, 1:21 pm, Michael Bayer mike...@zzzcomputing.com wrote:
 Doesn't seem like anyone has any thoughts here.   Its certainly not something 
 I've tried, but the general area of study here is multi-master replication: 
  http://en.wikipedia.org/wiki/Multi-master_replication.   The various ways of 
 dealing with whos dirty and whatnot fall under that realm of study.   For 
 this kind of thing I'd try to use existing techniques and tools as much as 
 possible since its not a trivial affair.

 On Dec 15, 2010, at 7:31 AM, Marcin Krol wrote:

  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA1

  Hello everyone,

  I'm in process of writing a distributed application and I'd like to use
  SQLAlchemy as backend.

  However, it's not performance that is my concern, but reliability.

  Suppose I have n server nodes.

  Writing:

  I would like to be able to write on any node from 1 to n, and have write
  results being sent to DB on every node.

  Reading:

  I would like to randomly select one of the n clean (up-to-date) nodes
  and use it for reading objects (normal SA session handling).

  Rationale: the write stream in my case is not going to be very large
  (storing test results), while read stream is going to be heavy, so I can
  afford this sort of architecture.

  On the face of it, implementing such scenario manually should be simple:
  just wrap around SA session and have the object sent to each backend in
  turn (yes, I know, it's a performance hit but that's not a big problem
  in this project).

  However, suppose one of the nodes gets offline at some moment: it would
  have to be marked as 'dirty' and synchronized somehow with other nodes
  when returned to 'online' status. This gets complex and risky.

  Alternatively, I could go with the low tech version: always assign
  particular client to a particular server node, and back the DB up /
  replicate it elsewhere. But this cuts into availability and makes me
  maintain n backups / replicas.

  - --

  Regards,
  mk

  - --
  Premature optimization is the root of all fun.
  -BEGIN PGP SIGNATURE-
  Version: GnuPG v1.4.9 (MingW32)
  Comment: Using GnuPG with Mozilla -http://enigmail.mozdev.org/

  iQEcBAEBAgAGBQJNCLU/AAoJEFMgHzhQQ7hO/VYH+wXF08U/+dSJ0op9/h9KgnO3
  fclL3eTuRu1ppZtISoEf3VFoJoE6bzlOU2FYd/YviGHHgU3MoK+QsgL6rPiA1lGp
  wITsKExnl4jZPvGBe4pT+QQivzVMdENNTuIClGjLJq+DiqXYL7gkdzU2qukdHQB7
  JhyVyvKicU0h+E6jvlv8CpVg2WpLNyGXrmpSTap0Fs3FnUcs18P7hZCsZWNxt+mw
  nMFD9Zp/BTGiB0eOJDC6reL+ZtjDc23/oKskTp3tFI4m3KOri+k1XyO8i1DEPbiH
  fVvUPy2610+Im8/y3a1gnyxktECIhpDRsErE5lm4pXfe01dDchSkQc5eDIyECdY=
  =whqS
  -END PGP SIGNATURE-

  --
  You received this message because you are subscribed to the Google Groups 
  sqlalchemy group.
  To post to this group, send email to sqlalch...@googlegroups.com.
  To unsubscribe from this group, send email to 
  sqlalchemy+unsubscr...@googlegroups.com.
  For more options, visit this group 
  athttp://groups.google.com/group/sqlalchemy?hl=en.

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.



[sqlalchemy] Re: Using multiple databases for reliability (NOT sharding)

2010-12-17 Thread Lloyd Kvam
(Sorry, I mis-clicked before.)

I've used MySQL replication for sharing data from multiple databases.
You can organize the servers in a loop and configure them to pass
along the changes replicated by other servers.  Of course if any
server fails, the loop is broken until that server is running again.
This might be good enough for you if all of the servers are local and
failures are rare.

I was able to satisfy my requirements by having an extra server that I
called the collector.  The individual servers send their logfiles to
the collector irregularly, but roughly daily.  The collector passes
the log files through rewriting log headers as necessary so that it
can masquerade as the last server in the loop.  server#1 replicates
from the collector.  server#2 to server#n replicate from #1, but cheat
sending their logs to the collector.  The replication logic for all
the normal servers works as though they were in a looped replication
stream.

This is not for everyone.  Transactions go through out of order.
Inserts create their keys independently of each other.  (I use the
server_ID as part of the primary keys.)  We have controls so that some
data must be processed on a particular server.  However, updates to
uncontrolled data can happen out of sequence since the transfer of log
files to the collector is unregulated.  The replication stream can not
flow if the collector or server#1 are down, so you still have critical
failure points.  Our objective was distributing data and closing the
loop even when some servers were off the network.


On Dec 17, 1:21 pm, Michael Bayer mike...@zzzcomputing.com wrote:
 Doesn't seem like anyone has any thoughts here.   Its certainly not something 
 I've tried, but the general area of study here is multi-master replication: 
  http://en.wikipedia.org/wiki/Multi-master_replication.   The various ways of 
 dealing with whos dirty and whatnot fall under that realm of study.   For 
 this kind of thing I'd try to use existing techniques and tools as much as 
 possible since its not a trivial affair.

 On Dec 15, 2010, at 7:31 AM, Marcin Krol wrote:

  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA1

  Hello everyone,

  I'm in process of writing a distributed application and I'd like to use
  SQLAlchemy as backend.

  However, it's not performance that is my concern, but reliability.

  Suppose I have n server nodes.

  Writing:

  I would like to be able to write on any node from 1 to n, and have write
  results being sent to DB on every node.

  Reading:

  I would like to randomly select one of the n clean (up-to-date) nodes
  and use it for reading objects (normal SA session handling).

  Rationale: the write stream in my case is not going to be very large
  (storing test results), while read stream is going to be heavy, so I can
  afford this sort of architecture.

  On the face of it, implementing such scenario manually should be simple:
  just wrap around SA session and have the object sent to each backend in
  turn (yes, I know, it's a performance hit but that's not a big problem
  in this project).

  However, suppose one of the nodes gets offline at some moment: it would
  have to be marked as 'dirty' and synchronized somehow with other nodes
  when returned to 'online' status. This gets complex and risky.

  Alternatively, I could go with the low tech version: always assign
  particular client to a particular server node, and back the DB up /
  replicate it elsewhere. But this cuts into availability and makes me
  maintain n backups / replicas.

  - --

  Regards,
  mk

  - --
  Premature optimization is the root of all fun.
  -BEGIN PGP SIGNATURE-
  Version: GnuPG v1.4.9 (MingW32)
  Comment: Using GnuPG with Mozilla -http://enigmail.mozdev.org/

  iQEcBAEBAgAGBQJNCLU/AAoJEFMgHzhQQ7hO/VYH+wXF08U/+dSJ0op9/h9KgnO3
  fclL3eTuRu1ppZtISoEf3VFoJoE6bzlOU2FYd/YviGHHgU3MoK+QsgL6rPiA1lGp
  wITsKExnl4jZPvGBe4pT+QQivzVMdENNTuIClGjLJq+DiqXYL7gkdzU2qukdHQB7
  JhyVyvKicU0h+E6jvlv8CpVg2WpLNyGXrmpSTap0Fs3FnUcs18P7hZCsZWNxt+mw
  nMFD9Zp/BTGiB0eOJDC6reL+ZtjDc23/oKskTp3tFI4m3KOri+k1XyO8i1DEPbiH
  fVvUPy2610+Im8/y3a1gnyxktECIhpDRsErE5lm4pXfe01dDchSkQc5eDIyECdY=
  =whqS
  -END PGP SIGNATURE-

  --
  You received this message because you are subscribed to the Google Groups 
  sqlalchemy group.
  To post to this group, send email to sqlalch...@googlegroups.com.
  To unsubscribe from this group, send email to 
  sqlalchemy+unsubscr...@googlegroups.com.
  For more options, visit this group 
  athttp://groups.google.com/group/sqlalchemy?hl=en.

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.