I have been thinking about this post a lot, particularly in the context
of the solution we "hacked up".
On 10/3/07, Orlando Andico <[EMAIL PROTECTED]> wrote:
> On 10/3/07, Rogelio Serrano <[EMAIL PROTECTED]> wrote:
> ..
> > so what's so bad about stateless protocols? i prefer them actually.
> > especially over the internet. you can do distributed transactions
> > over them and stay sane.
>
>
> I have a simple question regarding your distributed transactions.
>
> - You have a server cluster, each machine is capable of executing a
> business process transaction.
>
Yes. This used to be a web server; now it's a custom server that handles
email, HTTP requests, and the hacked-up binary protocol.
> - You have a request dispatcher, which sends requests to any one of
> the machines, depending on load and a variety of other parameters
>
No need. If one frontline machine is overloaded it sends back a
redirect reply to the client. All servers know each other's load, and
the overload threshold is really an arbitrary number.
In practice one machine is usually the one accepting requests until it
reaches the threshold. I never got around to configuring round-robin
DNS, and I don't think I like LVS anyway.
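The redirect-on-overload scheme described above could be sketched
roughly as follows; the node names, the shared load table, and the
threshold value are illustrative assumptions, not the actual setup:

```python
# Hypothetical sketch of redirect-based load shedding: a server that is
# over its threshold redirects the client to the least-loaded peer.
OVERLOAD_THRESHOLD = 0.8  # arbitrary, as noted in the text

# each server knows every other server's load; a plain dict stands in
# for whatever gossip mechanism keeps this table up to date
cluster_load = {
    "node1.example.ph": 0.95,
    "node2.example.ph": 0.40,
    "node3.example.ph": 0.10,
}

def process(request):
    return f"handled {request}"

def handle_request(my_name, request):
    """Serve the request, or redirect to the least-loaded peer."""
    if cluster_load[my_name] <= OVERLOAD_THRESHOLD:
        return ("200 OK", process(request))
    # overloaded: pick the least-loaded peer and send a redirect reply
    peer = min((n for n in cluster_load if n != my_name),
               key=cluster_load.get)
    return ("302 Found", f"http://{peer}/{request}")
```

Because every server redirects rather than proxies, no central
dispatcher is needed, which matches the answer above.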
> - You need to process say 1 million transactions per second
>
> - Now you issue a request ("transfer $1 billion dollars from account 1
> to account 2"), send it to Machine #5, this process requires a lot of
> back-end activity (which must complete as one transaction)
>
The frontline client proxy retrieves account 1 and account 2 from the
backend server. Each request to the backend server has almost no side
effects: everything the server needs to process the request is already
in the referenced resource and the request packet itself. This is the
key design requirement.
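A rough sketch of that design requirement: the request packet carries
everything the backend needs, and a request id makes retries safe to
replay. The field names and helper functions here are hypothetical:

```python
# Sketch of a self-contained, retry-safe request packet. Any backend
# replica can process it without per-client session state.
def make_transfer_request(from_account, to_account, amount, request_id):
    # everything needed to process the request is in the packet itself
    return {
        "id": request_id,      # lets retried requests be de-duplicated
        "op": "transfer",
        "from": from_account,
        "to": to_account,
        "amount": amount,
    }

def backend_process(packet, accounts, seen_ids):
    """Idempotent handler: replaying the same request id is a no-op."""
    if packet["id"] in seen_ids:
        return "duplicate-ignored"
    accounts[packet["from"]] -= packet["amount"]
    accounts[packet["to"]] += packet["amount"]
    seen_ids.add(packet["id"])
    return "applied"
```

With requests shaped like this, the frontline server can retry freely
after a failure without double-applying a transfer.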
> - Halfway through the business process, half of your machines die
> (someone tripped over the power cord to the rack) including the one
> which was doing the $1 billion transaction
>
> What do you do?
>
If the frontline machine goes down, the transaction is forgotten
anyway. We use a two-phase commit protocol with the backend servers;
if only a few of the backend servers go down, the frontline server
detects the failure and retries the transaction. We have had failovers,
but those were planned, and failover time is not critical for us.
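The two-phase commit with the backend servers might look roughly like
this minimal sketch, with the frontline server acting as coordinator;
the class and method names are assumptions, not the actual protocol:

```python
# Minimal two-phase commit sketch: phase 1 asks every participant to
# vote, phase 2 commits only if all voted yes, otherwise aborts.
class Backend:
    def __init__(self):
        self.staged = None
        self.committed = []

    def prepare(self, txn):
        # phase 1: stage the transaction and vote yes
        self.staged = txn
        return True

    def commit(self):
        # phase 2: make the staged transaction durable
        self.committed.append(self.staged)
        self.staged = None

    def abort(self):
        self.staged = None

def run_transaction(backends, txn):
    # phase 1: every participant must vote yes
    if all(b.prepare(txn) for b in backends):
        for b in backends:
            b.commit()
        return "committed"
    for b in backends:
        b.abort()
    return "aborted"  # the frontline server would retry here
```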
Failover does take several seconds because we use ping to detect
whether a node is still alive. The only faster approach I can think of
is to have two machines process the same request at the same time, with
the first one to reply winning. We could use three, but we would need
one more UPS and the data center only has two.
The key to fast failover, then, is fast detection of dead nodes.
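The "first reply wins" idea can be sketched with two simulated
replicas; the node names and delays below are made up for illustration:

```python
# Sketch of hedged requests: send the same request to two replicas and
# take whichever answers first, masking a slow or dead node.
import concurrent.futures
import time

def replica(name, delay, request):
    time.sleep(delay)  # simulate network / processing latency
    return f"{name}: handled {request}"

def race(request):
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(replica, "fast-node", 0.01, request),
            pool.submit(replica, "slow-node", 0.20, request),
        ]
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result()
```

The cost is doubled work per request, which is why the answer above
notes it would need another UPS-backed machine.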
> What if a specific machine dies?
>
Same as above.
> What if you start running out of capacity, can you add machines to the
> cluster in an application-transparent and zero-hands way?
>
I don't understand. You have to install the system on the hard disk
anyway, then insert the machine in the rack and turn it on.
>
> Finally, can you do this in 10MB of code?
>
Well, the binaries for our servers are only 2 MB total; expensive
optimizations would actually make them bigger.
> Such a product exists. I know of nothing that even comes within a mile
> of it in the Open Source world.
Yeah, me neither.
--
Lay low and nourish in obscurity
_________________________________________________
Philippine Linux Users' Group (PLUG) Mailing List
[email protected] (#PLUG @ irc.free.net.ph)
Read the Guidelines: http://linux.org.ph/lists
Searchable Archives: http://archives.free.net.ph