Re: [HACKERS] 2-phase commit

2003-10-25 Thread Rob Butler
Of course I have no time to work on it :(, but in my opinion an XA interface
and support in the JDBC driver are absolutely necessary.  I think that 2PC
will generally be used more for transactions between the DB and JMS than
for transactions across two DBs.

Glad to see some progress on 2PC with Postgres though.

Later
Rob

>
> The next step is going to be writing 2PC support to the JDBC driver using
> the new backend commands. XA interface would be very nice too, but I'm
> personally not that interested in that. Any volunteers?
>
> Please comment! I'd like to know what you guys think about this. Am I
> heading into the right direction?
>


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] 2-phase commit

2003-10-24 Thread Heikki Linnakangas
On Fri, 10 Oct 2003, Heikki Linnakangas wrote:

> On Thu, 9 Oct 2003, Bruce Momjian wrote:
>
> > Agreed.  Let's get it into 7.5 and see it in action.  If we need to
> > adjust it, we can, but right now, we need something for distributed
> > transactions, and this seems like the logical direction.
>
> I've started working on two-phase commits last week, and the very
> basic stuff is now working. Still a lot of bugs though.

I have done more work on my 2PC patch. I still need to work out
notifications and CREATE statements, but otherwise I'm quite happy with it
now. I received no feedback on the first version, so I'll try to clarify
how it works a bit.

The patch is against the current cvs tip. I'll post it to the
patches-list, and you can also grab it from here:
http://www.hut.fi/~hlinnaka/twophase2.diff

The patch introduces three new commands, PREPCOMMIT, COMMITPREPARED and
ABORTPREPARED.

PREPCOMMIT is called in place of COMMIT to put the active transaction
block into the prepared state. PREPCOMMIT takes a string argument that
becomes the Global Transaction Identifier (GID) for the transaction. The
GID is the handle that the COMMITPREPARED/ABORTPREPARED commands use to
finish the second phase of the commit. After the PREPCOMMIT command
finishes, the transaction is no longer associated with any specific backend.

COMMITPREPARED/ABORTPREPARED commands are used to finish the prepared
transaction. They can be issued from any backend.

There's also a new system view, pg_prepared_xacts, that shows all prepared
transactions.
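
The command semantics described above can be modelled with a small Python
sketch. This is a toy in-memory model, not the patch's code: ToyServer, its
method names and PreparedXactError are made up for illustration, and the
real state lives inside the server, not in a dictionary. The point it shows
is that a prepared transaction is parked under its GID, belongs to no
backend, and can be finished by whoever knows the GID.

```python
class PreparedXactError(Exception):
    """Raised when a GID is reused or unknown (hypothetical error type)."""


class ToyServer:
    """Toy stand-in for a server that supports the three new commands."""

    def __init__(self):
        self.committed = {}   # data visible to every backend
        self.prepared = {}    # GID -> pending changes, owned by no backend

    def prepcommit(self, gid, pending_changes):
        """First phase: park the transaction's changes under a GID."""
        if gid in self.prepared:
            raise PreparedXactError(f"GID {gid!r} is already in use")
        self.prepared[gid] = dict(pending_changes)

    def commitprepared(self, gid):
        """Second phase: make the parked changes visible."""
        self.committed.update(self._take(gid))

    def abortprepared(self, gid):
        """Second phase: throw the parked changes away."""
        self._take(gid)

    def pg_prepared_xacts(self):
        """Stand-in for the new system view: list the pending GIDs."""
        return sorted(self.prepared)

    def _take(self, gid):
        try:
            return self.prepared.pop(gid)
        except KeyError:
            raise PreparedXactError(f"no prepared transaction {gid!r}") from None
```

With this model, calling prepcommit("foobar_update1", {"a": 2}) leaves the
committed data untouched until commitprepared("foobar_update1") is issued,
which mirrors what the pg_prepared_xacts view is meant to make visible.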

Here's a little step-by-step tutorial for trying out the patch:
1. apply patch, patch -p0 < twophase2.diff
2. compile
3. create a new database system with initdb.
4. run postmaster
5. psql template1
6. CREATE TABLE foobar (a integer);
7. INSERT INTO foobar values (1);

8. BEGIN; UPDATE foobar SET a = 2 WHERE a = 1;
9. SELECT * FROM foobar;
10. PREPCOMMIT 'foobar_update1';

The transaction is now in prepared state, and it's no longer associated
with this backend, as you can see by issuing:

11. SELECT * FROM foobar;
12. SELECT * FROM pg_prepared_xacts;

Let's commit it then.

13. COMMITPREPARED 'foobar_update1';
14. SELECT * FROM pg_prepared_xacts;
15. SELECT * FROM foobar;

Next, repeat steps 8-15, but try killing the postmaster somewhere after the
PREPCOMMIT in step 10, and observe that the transaction is not lost. Also
try doing another update from a different backend, and see that the locks
held by the prepared transaction survive the crash.


I also took a look at Satoshi's patches. The main difference is that
his implementation made modifications to the BE/FE protocol, while my
implementation works at the statement level. His patches don't handle
shutdowns or broken connections yet, but that was on his TODO list.

When I started working on 2PC, I didn't know about Satoshi's patches;
otherwise I probably would have taken them as a starting point.

The next step is going to be adding 2PC support to the JDBC driver using
the new backend commands. An XA interface would be very nice too, but I'm
personally not that interested in that. Any volunteers?

Please comment! I'd like to know what you guys think about this. Am I
heading in the right direction?

Some people have expressed concerns about performance issues with 2PC in
general. Please note that this patch doesn't change the traditional
commit routines, so it won't affect your performance if you don't use 2PC.

- Heikki




Re: [HACKERS] 2-phase commit

2003-10-23 Thread Bruce Momjian
Satoshi Nagayasu wrote:
> Bruce,
> 
> Ok, I will write my proposal.
> 
> BTW, my 2PC work is now suspended because of my master's thesis.
> My master's thesis will (must) be finished in the next few months.
> 
> To finish the 2PC work, I feel 2 or 3 months are needed after that.

Oh, OK, that is helpful.  Perhaps Heikki Linnakangas could help too.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] 2-phase commit

2003-10-23 Thread Satoshi Nagayasu
Bruce,

Ok, I will write my proposal.

BTW, my 2PC work is now suspended because of my master's thesis.
My master's thesis will (must) be finished in the next few months.

To finish the 2PC work, I feel 2 or 3 months are needed after that.

Bruce Momjian wrote:
> Satoshi, can you get this ready for inclusion in 7.5?  We need a formal
> proposal of how it will work from the user's perspective (new
> commands?), and how it will work internally.  It seems Heikki Linnakangas
> has also started working on this and perhaps he can help.
> 
> Ideally, we should have this proposal when we start 7.5 development in a
> few weeks.
> 
> I know some people have concerns about 2-phase commit, from a
> performance perspective and from a network failure perspective, but I
> think there are enough people who want it that we should see how this
> can be implemented with the proper safeguards.
> 
> ---
> 
> Satoshi Nagayasu wrote:
> 
>>Andrew Sullivan <[EMAIL PROTECTED]> wrote:
>>
>>>On Fri, Oct 10, 2003 at 09:46:35AM +0900, Tatsuo Ishii wrote:
>>>
>>>>Satoshi, the only guy who made a trial implementation of 2PC for
>>>>PostgreSQL, has already showed that 2PC is not that slow.
>>>
>>>If someone has a fast implementation, so much the better.  I'm not
>>>opposed to fast implementations! 
>>
>>The pgbench results of my experimental 2PC implementation
>>and plain postgresql are available.
>>
>>PostgreSQL 7.3
>>  http://snaga.org/pgsql/pgbench/pgbench-REL7_3.log
>>
>>Experimental 2PC in PostgreSQL 7.3
>>  http://snaga.org/pgsql/pgbench/pgbench-TPC0_0_2.log
>>
>>I can't see a grave overhead from this comparison.
>>
>>
>>>A
>>>
>>>-- 
>>>
>>>Andrew Sullivan 204-4141 Yonge Street
>>>Afilias CanadaToronto, Ontario Canada
>>><[EMAIL PROTECTED]>  M2P 2A8
>>> +1 416 646 3304 x110
>>>
>>>
>>>
>>
>>
>>-- 
>>NAGAYASU Satoshi <[EMAIL PROTECTED]>
>>
>>
>>
> 
> 


-- 
NAGAYASU Satoshi <[EMAIL PROTECTED]>




Re: [HACKERS] 2-phase commit

2003-10-23 Thread Bruce Momjian

Satoshi, can you get this ready for inclusion in 7.5?  We need a formal
proposal of how it will work from the user's perspective (new
commands?), and how it will work internally.  It seems Heikki Linnakangas
has also started working on this and perhaps he can help.

Ideally, we should have this proposal when we start 7.5 development in a
few weeks.

I know some people have concerns about 2-phase commit, from a
performance perspective and from a network failure perspective, but I
think there are enough people who want it that we should see how this
can be implemented with the proper safeguards.

---

Satoshi Nagayasu wrote:
> 
> Andrew Sullivan <[EMAIL PROTECTED]> wrote:
> > On Fri, Oct 10, 2003 at 09:46:35AM +0900, Tatsuo Ishii wrote:
> > > Satoshi, the only guy who made a trial implementation of 2PC for
> > > PostgreSQL, has already showed that 2PC is not that slow.
> > 
> > If someone has a fast implementation, so much the better.  I'm not
> > opposed to fast implementations! 
> 
> The pgbench results of my experimental 2PC implementation
> and plain postgresql are available.
> 
> PostgreSQL 7.3
>   http://snaga.org/pgsql/pgbench/pgbench-REL7_3.log
> 
> Experimental 2PC in PostgreSQL 7.3
>   http://snaga.org/pgsql/pgbench/pgbench-TPC0_0_2.log
> 
> I can't see a grave overhead from this comparison.
> 
> > 
> > A
> > 
> > -- 
> > 
> > Andrew Sullivan 204-4141 Yonge Street
> > Afilias CanadaToronto, Ontario Canada
> > <[EMAIL PROTECTED]>  M2P 2A8
> >  +1 416 646 3304 x110
> > 
> > 
> > 
> 
> 
> -- 
> NAGAYASU Satoshi <[EMAIL PROTECTED]>
> 
> 
> 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] 2-phase commit

2003-10-14 Thread Hans-Jürgen Schönig
>> Why would you spend time on implementing a mechanism whose ultimate
>> benefit is supposed to be increasing reliability and performance, when you
>> already realize that it will have to lock up at the slightest sight of
>> trouble?  There are better mechanisms out there that you can use instead.
>
> If you want cross-server transactions, what other methods are there that
> are more reliable?  It seems network unreliability is going to be a
> problem no matter what method you use.


I guess we need something like PITR to make this work because otherwise 
I cannot see a way to get in sync again.
Maybe I should call the desired mechanism "Entire cluster back to 
transaction X recovery".
Did anybody hear about PITR recently?

How else would you recover from any kind of problem?
No matter what you are doing, network reliability will be a problem, so we
have to live with it.
Having some way of "going back to something consistent" is necessary anyway.
People might now argue that committed transactions might be lost. As long as
people know which ones, that's OK. 90% of all people will understand that
in case of a crash something evil might happen.

	Hans

--
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/2952/30706 or +43/660/816 40 77
www.cybertec.at, www.postgresql.at, kernel.cybertec.at




Re: [HACKERS] 2-phase commit

2003-10-14 Thread Heikki Linnakangas
On Thu, 9 Oct 2003, Bruce Momjian wrote:

> Agreed.  Let's get it into 7.5 and see it in action.  If we need to
> adjust it, we can, but right now, we need something for distributed
> transactions, and this seems like the logical direction.

I've started working on two-phase commits last week, and the very
basic stuff is now working. Still a lot of bugs though.

I posted the stuff I've put together to patches-list. I'd appreciate any
comments.

- Heikki




Re: [HACKERS] 2-phase commit

2003-10-14 Thread Hans-Jürgen Schönig
>> I'm tired of this kind of "2PC is too slow" arguments. I think
>> Satoshi, the only guy who made a trial implementation of 2PC for
>> PostgreSQL, has already showed that 2PC is not that slow.
>
> Where does Satoshi's implementation sit right now?  Will it patch to v7.4?
> Can it provide us with a base to work from, or is it complete?

It is not ready yet.
You can find it at ...
http://snaga.org/pgsql/

It is based on 7.3

* the 2-phase commit protocol (precommit and commit)
* the multi-master replication using 2PC
* distributed transaction (distributed query)

current work

* restarting (from 2nd phase) when the session is disconnected in
  2nd phase (XLOG stuffs)
* XA compliance

future work

* hot failover and recovery in PostgreSQL cluster
* data partitioning on different servers

I compiled it a while ago.
Seems to be pretty nice :).
	Hans

--
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/2952/30706 or +43/660/816 40 77
www.cybertec.at, www.postgresql.at, kernel.cybertec.at




Re: [HACKERS] 2-phase commit

2003-10-14 Thread Peter Galbavy
Jan Wieck wrote:
> 2PC is not too slow in normal operations when everything is purring
> like little kittens and you're just wasting your excess bandwidth on
> it. The point is that it behaves horrible and like a dirty backstreet
> cat at the time when things go wrong ... basically it's a neat thing
> to have, but from the second you need it it becomes useless.

I can't see anyone being forced to use it once it maybe/is supported. Like
many tools, "ouch!" is a good reaction when used untrained/incorrectly.

Peter




Re: [HACKERS] 2-phase commit

2003-10-13 Thread Jan Wieck
Bruce Momjian wrote:

> Tatsuo Ishii wrote:
>>> Yes.  I don't think that 2PC is a solution for robustness in face of
>>> network failure.  It's too slow, to begin with.  Some sort of
>>> multi-master system is very desirable for network failures, &c., but
>>> I don't think anybody does active/hot standby with 2PC any more; the
>>> performance is too bad.
>> I'm tired of this kind of "2PC is too slow" arguments. I think
>> Satoshi, the only guy who made a trial implementation of 2PC for
>> PostgreSQL, has already showed that 2PC is not that slow.
> Agreed.  Let's get it into 7.5 and see it in action.  If we need to
> adjust it, we can, but right now, we need something for distributed
> transactions, and this seems like the logical direction.

Are you guys kidding or what?

2PC is not too slow in normal operations when everything is purring like
little kittens and you're just wasting your excess bandwidth on it. The
point is that it behaves horribly, like a dirty backstreet cat, at the
time when things go wrong ... basically it's a neat thing to have, but
from the second you need it, it becomes useless.

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [HACKERS] 2-phase commit

2003-10-13 Thread Jordan Henderson
On Monday 13 October 2003 20:11, Rod Taylor wrote:
> > I think another way it could be handled is with nested transactions.
> > Just have the promise phase be an inner transaction commit but have an
> > outer transaction bracket that one for the actual commit.
>
> Not really. In the event of a crash, most 2PC systems will expect the
> participant to come back in the same state it crashed in.
>

Yes, this is correct.  There are certain phases of the protocol in which the
transaction state must be reinstated from the log file after a crash of the
DB server.  The reinstatement must occur before the server accepts any
connections.  Additionally, the coordinator must be fully recoverable as
well.  Depending on the phase of the commit/abort, the coordinator may
contact child servers after it crashes.  The requirement is that during log
replay, the transaction structures might have to be fully reconstructed and
remain in place after log replay has completed, until the disposition of the
(sub)transaction is settled by the coordinator.  All of this depends on the
phase, of course.
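
The recovery requirement described above can be sketched in a few lines of
Python. This is only an illustration under simplifying assumptions: the log
is a plain list of (kind, gid) tuples, and the record kinds 'prepare',
'commit' and 'abort' as well as the function names are hypothetical, not
taken from any real WAL format. It shows that replay must rebuild every
transaction that was prepared but not yet settled, and that connections are
accepted only after replay has finished.

```python
def replay(log_records):
    """Rebuild the set of in-doubt transactions from a log.

    Each record is a (kind, gid) tuple. A 'prepare' record puts the
    transaction into the in-doubt set; a later 'commit' or 'abort'
    record settles it and removes it again.
    """
    in_doubt = {}
    for kind, gid in log_records:
        if kind == "prepare":
            in_doubt[gid] = "prepared"
        elif kind in ("commit", "abort"):
            in_doubt.pop(gid, None)   # outcome already settled
        else:
            raise ValueError(f"unknown log record kind: {kind!r}")
    return in_doubt


def start_server(log_records):
    """Replay the log first; only then report the server as open."""
    in_doubt = replay(log_records)
    # The in-doubt transactions stay in place until the coordinator
    # (or an administrator) decides their outcome.
    return {"accepting_connections": True, "in_doubt": in_doubt}
```

A transaction that was only prepared before the crash is still present in
the "in_doubt" mapping after start_server() returns, which is the state the
coordinator expects to find.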

> Our nested-transaction implementation (like our standard transaction
> implementation) aborts all transactions on crash.

Jordan Henderson




Re: [HACKERS] 2-phase commit

2003-10-13 Thread Rod Taylor
> I think another way it could be handled is with nested transactions.
> Just have the promise phase be an inner transaction commit but have an
> outer transaction bracket that one for the actual commit.

Not really. In the event of a crash, most 2PC systems will expect the
participant to come back in the same state it crashed in.

Our nested-transaction implementation (like our standard transaction
implementation) aborts all transactions on crash.




Re: [HACKERS] 2-phase commit

2003-10-13 Thread Dann Corbit
> -Original Message-
> From: Jeroen T. Vermeulen [mailto:[EMAIL PROTECTED] 
> Sent: Saturday, October 11, 2003 5:36 AM
> To: Dann Corbit
> Cc: Christopher Browne; [EMAIL PROTECTED]
> Subject: Re: [HACKERS] 2-phase commit
> 
> 
> On Fri, Oct 10, 2003 at 09:37:53PM -0700, Dann Corbit wrote:
> > Why not apply the effort to something already done and compatibly 
> > licensed?
> > 
> > This:
> > http://dog.intalio.com/ots.html
> > 
> > Appears to be a Berkeley style licensed: 
> > http://dog.intalio.com/license.html
> > 
> > Transaction monitor.
> 
> I'd say this is complementary, not an alternative to 2PC 
> implementation issues.  

My notion is that a specification has already been created that describes
how the system should operate, what the APIs are, etc.  I think that most
of the work is involved in that area.  The notion is that if you program
to this spec, it will already have been well thought out and the result
should be standards based when completed.
 
> The transaction monitor lives on the other side of the 
> problem.  2PC is needed in the database _so that_ the 
> transaction monitor can do its job.

Theoretically, if any database in the chain supports 2PC, you could make
all connected systems 2PC compliant by using the one functional system
as a persistent store.  But you are right.  PostgreSQL still would need
the "I promise to commit when you ask" method if it is to really support
it.

I think another way it could be handled is with nested transactions.
Just have the promise phase be an inner transaction commit but have an
outer transaction bracket that one for the actual commit.
 
> That said, having a 3-tier model is probably a good idea if 
> distributed transaction management is what we want.  :-)

In real life, I think it is _always_ done this way.



Re: [HACKERS] 2-phase commit

2003-10-11 Thread Jeroen T. Vermeulen
On Fri, Oct 10, 2003 at 09:37:53PM -0700, Dann Corbit wrote:
> Why not apply the effort to something already done and compatibly
> licensed?
> 
> This:
> http://dog.intalio.com/ots.html
> 
> Appears to be a Berkeley style licensed:
> http://dog.intalio.com/license.html
> 
> Transaction monitor.

I'd say this is complementary, not an alternative to 2PC implementation
issues.  

The transaction monitor lives on the other side of the problem.  2PC is
needed in the database _so that_ the transaction monitor can do its job.

That said, having a 3-tier model is probably a good idea if distributed
transaction management is what we want.  :-)


Jeroen




Re: [HACKERS] 2-phase commit

2003-10-10 Thread Dann Corbit
Here is a sourceforge version of the same thing
http://openorb.sourceforge.net/

> -Original Message-
> From: Dann Corbit 
> Sent: Friday, October 10, 2003 9:38 PM
> To: Christopher Browne; [EMAIL PROTECTED]
> Subject: Re: [HACKERS] 2-phase commit
> 
> 
> Why not apply the effort to something already done and 
> compatibly licensed?
> 
> This:
> http://dog.intalio.com/ots.html
> 
> Appears to be a Berkeley style licensed: 
> http://dog.intalio.com/license.html
> 
> Transaction monitor.
> 
> "Overview
> The OpenORB Transaction Service is a very scalable 
> transaction monitor which also provides several extensions 
> like XA management, a management interface to control all 
> transaction processes and a high reliable recovery system. 
> 
> By coordinating OpenORB and OpenORB Transaction Service, you 
> provide a reliable and powerful foundation for building large 
> scalable distributed applications. 
> 
> Datasheet
> The OpenORB Transaction Service is a fully compliant 
> implementation of the OMG Transaction Service specification. 
> The OpenORB Transaction Service features are :  
>   Management of distributed transactions with a two phase 
> commit protocol 
>  Sub Transactions management ( nested transactions ) 
>  Propagation of the transaction context between CORBA objects 
>  Management of distributed transactions propagation through 
> databases with the XA protocol 
>  Automatic logs to be able to make recovery in case of failures 
>  Can be used as a transaction initiator or subordinate 
>  High-performance, multiple thread architecture 
>  Developed with POA 
>  Provides a management interface to control all transactions 
>  Full support of JTA 
>  JDBC pooling and automatic resource enlistment 
> 
> 
> Download
> To download the OpenORB Transaction Service, do one of the 
> following :  
>   CVS : you can use CVS to grab the sources directly.  
>  FTP : you get either a CVS snapshot or a prebuilt version 
> To use one of these possibilities, go to the Download Services page. 
> 
> ChangeLog
> August 15th 2001. Version 1.2.0.  
>   Changed the transaction client side to support late binding 
> to the transaction monitor. 
>  Bug fixed in the transactional client interceptor. This bug 
> was due to a change in the OpenORB behavior concerning the slot 
> 
> 
> To get previous change log, please refer to the CHANGELOG 
> file available within this service distribution."
> 



Re: [HACKERS] 2-phase commit

2003-10-10 Thread Dann Corbit
Why not apply the effort to something already done and compatibly
licensed?

This:
http://dog.intalio.com/ots.html

Appears to be Berkeley-style licensed:
http://dog.intalio.com/license.html

Transaction monitor.

"Overview
The OpenORB Transaction Service is a very scalable transaction monitor
which also provides several extensions like XA management, a management
interface to control all transaction processes and a high reliable
recovery system. 

By coordinating OpenORB and OpenORB Transaction Service, you provide a
reliable and powerful foundation for building large scalable distributed
applications. 

Datasheet
The OpenORB Transaction Service is a fully compliant implementation of
the OMG Transaction Service specification. 
The OpenORB Transaction Service features are:
* Management of distributed transactions with a two phase commit
  protocol
* Sub Transactions management (nested transactions)
* Propagation of the transaction context between CORBA objects
* Management of distributed transactions propagation through databases
  with the XA protocol
* Automatic logs to be able to make recovery in case of failures
* Can be used as a transaction initiator or subordinate
* High-performance, multiple thread architecture
* Developed with POA
* Provides a management interface to control all transactions
* Full support of JTA
* JDBC pooling and automatic resource enlistment


Download
To download the OpenORB Transaction Service, do one of the following:
* CVS: you can use CVS to grab the sources directly.
* FTP: you get either a CVS snapshot or a prebuilt version
To use one of these possibilities, go to the Download Services page.

ChangeLog
August 15th 2001. Version 1.2.0.
* Changed the transaction client side to support late binding to the
  transaction monitor.
* Bug fixed in the transactional client interceptor. This bug was due to
  a change in the OpenORB behavior concerning the slot


To get previous change log, please refer to the CHANGELOG file available
within this service distribution."



Re: [HACKERS] 2-phase commit

2003-10-10 Thread Christopher Browne
Martha Stewart called it a Good Thing when [EMAIL PROTECTED] ("Dann Corbit") wrote:
>> I can't see a grave overhead from this comparison.
>
> 2PC is absolutely essential when you have to have both parts of the
> transaction complete for a logical unit of work.  For a project that
> needs it, if you don't have it you will be forced to go to another
> tool, or perform lots of custom programming to work around it.
>
> If you have 2PC and it is ten times slower than without it, you will
> still need it for projects requiring that capability.

Just so.

I would be completely unsurprised if an attempt to use 2PC to support
generalized "multimaster replication" would involve 10-fold slowdowns
as compared to having all the activity take place on one database.

Which would imply that 2PC is not a tool that may be appropriately
used to naively do replication.  But that should not come as any grand
surprise.

To each tool the right job, and to each job the right tool...

There seems to be enough room for there to be evidence both of 2PC
being useful for improving performance, and for it to cut
performance:

 - TPC benchmarks often specify the inclusion of Tuxedo as a
   component; the combination of vendors would surely NOT put it
   on the list if it were not an aid to performance;

 - There is also indication that there can be a cost, notably in the
   form of the concerns of deadlock, but it should also be obvious
   that slow network links would lead to _hideous_ increases in
   latency.
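
The latency point can be made concrete with a rough back-of-the-envelope
model, sketched here in Python. It rests on assumptions that are not from
this thread: the coordinator contacts all participants in parallel, each of
the two phases costs one network round trip, and disk flushes and lock
waits are ignored. The function name and the example numbers are made up
for illustration; treat the result as a lower bound, not a measurement.

```python
def two_phase_commit_latency(round_trip_times_ms):
    """Estimate the network cost of one two-phase commit.

    Both the prepare phase and the commit phase have to wait for the
    slowest participant, so under the stated assumptions the network
    part of the commit takes two round trips to that participant.
    """
    if not round_trip_times_ms:
        raise ValueError("at least one participant is required")
    return 2 * max(round_trip_times_ms)
```

Under this model, two participants that are each 1 ms away cost about 2 ms
per commit, while adding a single participant behind an 80 ms link raises
that to about 160 ms for every transaction, however fast the other servers
are.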

As you say, even if there is a substantial cost, it's still worthwhile
if a project needs it.

> Now, a good model to start with is a very good idea.  So some
> discussion and analysis is a good thing.  From the looks of it,
> Satoshi Nagayasu has done a very good job.  Having a functional 2PC
> would be a huge feather in the cap of PostgreSQL.

It would seem so.  I look forward to seeing how this progresses.
-- 
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','acm.org').
http://cbbrowne.com/info/linuxdistributions.html
"XFS might  (or might not)  come out before  the year 3000.  As far as
kernel patches go,  SGI are brilliant.  As far as graphics, especially
OpenGL,  go,  SGI is  untouchable.  As  far as   filing  systems go, a
concussed doormouse in a tarpit would move faster."  -- jd on Slashdot



Re: [HACKERS] 2-phase commit

2003-10-10 Thread Dann Corbit
> -Original Message-
> From: Satoshi Nagayasu [mailto:[EMAIL PROTECTED] 
> Sent: Friday, October 10, 2003 12:26 PM
> To: Andrew Sullivan
> Cc: [EMAIL PROTECTED]
> Subject: Re: [HACKERS] 2-phase commit
> 
> Andrew Sullivan <[EMAIL PROTECTED]> wrote:
> > On Fri, Oct 10, 2003 at 09:46:35AM +0900, Tatsuo Ishii wrote:
> > > Satoshi, the only guy who made a trial implementation of 2PC for 
> > > PostgreSQL, has already showed that 2PC is not that slow.
> > 
> > If someone has a fast implementation, so much the better.  I'm not 
> > opposed to fast implementations!
> 
> The pgbench results of my experimental 2PC implementation
> and plain postgresql are available.
> 
> PostgreSQL 7.3
>   http://snaga.org/pgsql/pgbench/pgbench-REL7_3.log
> 
> Experimental 2PC in PostgreSQL 7.3
>   http://snaga.org/pgsql/pgbench/pgbench-TPC0_0_2.log
> 
> I can't see a grave overhead from this comparison.

2PC is absolutely essential when you have to have both parts of the
transaction complete for a logical unit of work.  For a project that
needs it, if you don't have it you will be forced to go to another tool,
or perform lots of custom programming to work around it.

If you have 2PC and it is ten times slower than without it, you will
still need it for projects requiring that capability.

Now, a good model to start with is a very good idea.  So some discussion
and analysis is a good thing.  From the looks of it, Satoshi Nagayasu
has done a very good job.  Having a functional 2PC would be a huge
feather in the cap of PostgreSQL.

IMO-YMMV



Re: [HACKERS] 2-phase commit

2003-10-10 Thread Satoshi Nagayasu

Andrew Sullivan <[EMAIL PROTECTED]> wrote:
> On Fri, Oct 10, 2003 at 09:46:35AM +0900, Tatsuo Ishii wrote:
> > Satoshi, the only guy who made a trial implementation of 2PC for
> > PostgreSQL, has already showed that 2PC is not that slow.
> 
> If someone has a fast implementation, so much the better.  I'm not
> opposed to fast implementations! 

The pgbench results of my experimental 2PC implementation
and plain postgresql are available.

PostgreSQL 7.3
  http://snaga.org/pgsql/pgbench/pgbench-REL7_3.log

Experimental 2PC in PostgreSQL 7.3
  http://snaga.org/pgsql/pgbench/pgbench-TPC0_0_2.log

I can't see a grave overhead from this comparison.

> 
> A
> 
> -- 
> 
> Andrew Sullivan 204-4141 Yonge Street
> Afilias CanadaToronto, Ontario Canada
> <[EMAIL PROTECTED]>  M2P 2A8
>  +1 416 646 3304 x110
> 
> 
> 


-- 
NAGAYASU Satoshi <[EMAIL PROTECTED]>




Re: [HACKERS] 2-phase commit

2003-10-10 Thread Andrew Sullivan
On Thu, Oct 09, 2003 at 11:53:46PM -0400, Christopher Browne wrote:
> 
> If 2PC gets implemented, that simply means that there will be another
> module that some will be interested in, and which many people won't
> bother using.  Which shouldn't seem to be a particularly big deal.

I think the reason this is controversial, however, is that while PL/R
(e.g.) doesn't make big changes to the internals, 2PC certainly will
touch the fundamentals.

A

-- 

Andrew Sullivan 204-4141 Yonge Street
Afilias CanadaToronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110




Re: [HACKERS] 2-phase commit

2003-10-10 Thread Andrew Sullivan
On Fri, Oct 10, 2003 at 09:46:35AM +0900, Tatsuo Ishii wrote:
> Satoshi, the only guy who made a trial implementation of 2PC for
> PostgreSQL, has already showed that 2PC is not that slow.

If someone has a fast implementation, so much the better.  I'm not
opposed to fast implementations! 

A

-- 

Andrew Sullivan 204-4141 Yonge Street
Afilias CanadaToronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110




Re: [HACKERS] 2-phase commit

2003-10-10 Thread Zeugswetter Andreas SB SD

I was wondering whether we need to keep WAL online for 2PC,
or whether only something like clog is sufficient.

What if:
1. phase 1 commit must pass the slave xid that will be used for 2nd phase
   (it needs to return some sort of identification anyway)
2. the coordinator must keep a list of slave xid's along with 
   corresponding (commit/rollback) info

Is that not sufficient? Why would WAL be needed in the first place?
This is not replication; the slave has its own WAL anyway.

I also don't buy the lockup argument. If somebody today connects with
psql, starts a transaction, modifies something, and then never commits
or aborts, there is no built-in mechanism that will eventually kill it
either. 2PC will simply need a means for the administrator to roll back
or commit an in-doubt transaction manually.
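
A minimal sketch of the bookkeeping proposed above, assuming the coordinator keeps one entry per slave xid and that an administrator can settle entries that stay in doubt. The names (CoordinatorLog, Outcome, resolve_manually) are hypothetical and are not taken from any patch in this thread.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    IN_DOUBT = "in doubt"
    COMMIT = "commit"
    ROLLBACK = "rollback"


@dataclass
class CoordinatorLog:
    """Coordinator-side list of slave xids with their commit/rollback info."""
    entries: dict = field(default_factory=dict)

    def record_prepared(self, node: str, xid: int) -> None:
        # Phase 1 handed back the slave xid; no decision is known yet.
        self.entries[(node, xid)] = Outcome.IN_DOUBT

    def decide(self, outcome: Outcome) -> None:
        # The coordinator's decision applies to every entry still in doubt.
        for key, current in self.entries.items():
            if current is Outcome.IN_DOUBT:
                self.entries[key] = outcome

    def in_doubt(self) -> list:
        return [k for k, v in self.entries.items() if v is Outcome.IN_DOUBT]

    def resolve_manually(self, node: str, xid: int, outcome: Outcome) -> None:
        # Administrator override for a transaction that stays in doubt.
        if self.entries.get((node, xid)) is not Outcome.IN_DOUBT:
            raise ValueError("transaction is not in doubt")
        self.entries[(node, xid)] = outcome
```

The sketch only tracks decisions in memory; a real coordinator would have to persist this list before sending the second-phase message.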

Andreas



Re: [HACKERS] 2-phase commit

2003-10-09 Thread Christopher Browne
The world rejoiced as [EMAIL PROTECTED] (Tatsuo Ishii) wrote:
> I'm tired of this kind of "2PC is too slow" argument. I think
> Satoshi, the only guy who made a trial implementation of 2PC for
> PostgreSQL, has already shown that 2PC is not that slow.

I'm tired of it for a different reason, namely that there are "use
cases" where speed is not _relevant_.  The REAL problem that is taking
place is that people are talking past each other.

- Some say, "It's too slow; no point in doing it."

  The fact that it may be too slow _for them_ means they probably
  shouldn't use it.  I somehow doubt that there are Vastly Faster
  alternatives waiting in the wings.

- The other problem that gets pointed out:  "2PC is inherently
  fragile, and prone to deadlock."

  Again, those that _need_ to use 2PC will forcibly need to address
  those concerns in the way they manage their systems.

  Those that can't afford the fragility are not 'customers' for use of
  2PC.  And, pointing back to the speed controversy, it is not at all
  obvious that there is any other alternative for handling distributed
  processing that _totally addresses_ the concerns about fragility.

Those that can't afford these costs associated with 2PC will simply
Not Use It.

Probably in much the same way that most people _aren't_ using
replication.  And most people _aren't_ using PL/R.  And most people
_aren't_ using any number of the contributed things.

If 2PC gets implemented, that simply means that there will be another
module that some will be interested in, and which many people won't
bother using.  Which shouldn't seem to be a particularly big deal.
-- 
"aa454","@","freenet.carleton.ca"
http://www.ntlug.org/~cbbrowne/
The way to a man's heart is with a broadsword.



Re: [HACKERS] 2-phase commit

2003-10-09 Thread Marc G. Fournier


On Fri, 10 Oct 2003, Tatsuo Ishii wrote:

> > Yes.  I don't think that 2PC is a solution for robustness in face of
> > network failure.  It's too slow, to begin with.  Some sort of
> > multi-master system is very desirable for network failures, &c., but
> > I don't think anybody does active/hot standby with 2PC any more; the
> > performance is too bad.
>
> I'm tired of this kind of "2PC is too slow" argument. I think
> Satoshi, the only guy who made a trial implementation of 2PC for
> PostgreSQL, has already shown that 2PC is not that slow.

Where does Satoshi's implementation sit right now?  Will it patch to v7.4?
Can it provide us with a base to work from, or is it complete?




Re: [HACKERS] 2-phase commit

2003-10-09 Thread Bruce Momjian
Tatsuo Ishii wrote:
> > Yes.  I don't think that 2PC is a solution for robustness in face of
> > network failure.  It's too slow, to begin with.  Some sort of
> > multi-master system is very desirable for network failures, &c., but
> > I don't think anybody does active/hot standby with 2PC any more; the
> > performance is too bad.
> 
> I'm tired of this kind of "2PC is too slow" argument. I think
> Satoshi, the only guy who made a trial implementation of 2PC for
> PostgreSQL, has already shown that 2PC is not that slow.

Agreed.  Let's get it into 7.5 and see it in action.  If we need to
adjust it, we can, but right now, we need something for distributed
transactions, and this seems like the logical direction.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073



Re: [HACKERS] 2-phase commit

2003-10-09 Thread Tatsuo Ishii
> Yes.  I don't think that 2PC is a solution for robustness in face of
> network failure.  It's too slow, to begin with.  Some sort of
> multi-master system is very desirable for network failures, &c., but
> I don't think anybody does active/hot standby with 2PC any more; the
> performance is too bad.

I'm tired of this kind of "2PC is too slow" argument. I think
Satoshi, the only guy who made a trial implementation of 2PC for
PostgreSQL, has already shown that 2PC is not that slow.
--
Tatsuo Ishii



Re: [HACKERS] 2-phase commit

2003-10-09 Thread Andrew Sullivan
On Thu, Oct 09, 2003 at 02:17:28PM -0400, Robert Treat wrote:
> Can you elaborate on "your purposes"?  Do they fall into the
> "XA-compatibility" bit or the "Robustness in the face of network
> failure"?  

Yes.  I don't think that 2PC is a solution for robustness in face of
network failure.  It's too slow, to begin with.  Some sort of
multi-master system is very desirable for network failures, &c., but
I don't think anybody does active/hot standby with 2PC any more; the
performance is too bad.

I'm interested in the ability to use it for XA(ish) compatibility and
heterogeneous database support.  Arguments with
people-who-think-Gartner-reports-are-good-guides-for-what-to-do would
be a lot easier if I had that, to begin with.

A 

-- 

Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<[EMAIL PROTECTED]>                              M2P 2A8
                                         +1 416 646 3304 x110




Re: [HACKERS] 2-phase commit

2003-10-09 Thread Robert Treat
On Thu, 2003-10-09 at 12:07, Andrew Sullivan wrote:
> On Thu, Oct 09, 2003 at 11:22:05AM -0400, Mike Mascari wrote:
> > The implementation chosen depends upon the answer, does it not? Is
> > there an implementation (e.g. 3PC) that can simulate 2PC behavior for
> > interoperability purposes and satisfy both requirements?
> 
> I don't know.  What I know is that someone showed up working on 2PC,
> and got a frosty reception.  I'm trying to learn what criteria would
> make the work acceptable.  For my purposes, the feature would be
> really nice, so I'd hate to see the opportunity lost.  If someone has
> an idea even how 3PC might be implemented, I'd be happy to hear it.
> 

Can you elaborate on "your purposes"?  Do they fall into the
"XA-compatibility" bit or the "Robustness in the face of network
failure"?  

On the likely chance that 50% fall into 1 and the other 50% into 2, can
we accept a solution that doesn't address both?

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL




Re: [HACKERS] 2-phase commit

2003-10-09 Thread Andrew Sullivan
On Thu, Oct 09, 2003 at 11:22:05AM -0400, Mike Mascari wrote:
> The implementation chosen depends upon the answer, does it not? Is
> there an implementation (e.g. 3PC) that can simulate 2PC behavior for
> interoperability purposes and satisfy both requirements?

I don't know.  What I know is that someone showed up working on 2PC,
and got a frosty reception.  I'm trying to learn what criteria would
make the work acceptable.  For my purposes, the feature would be
really nice, so I'd hate to see the opportunity lost.  If someone has
an idea even how 3PC might be implemented, I'd be happy to hear it.

A

-- 

Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<[EMAIL PROTECTED]>                              M2P 2A8
                                         +1 416 646 3304 x110




Re: [HACKERS] 2-phase commit

2003-10-09 Thread Rod Taylor
On Thu, 2003-10-09 at 11:14, Peter Eisentraut wrote:
> Bruce Momjian writes:
> 
> > If you want cross-server transactions, what other methods are there that
> > are more reliable?
> 
> 3-phase commit

How about a real world example of a transaction manager that has
actually implemented 3PC?

But yes, the ability for the participants to talk to each-other in the
event the controller is unavailable seems an obvious fix.




Re: [HACKERS] 2-phase commit

2003-10-09 Thread Bruce Momjian
Peter Eisentraut wrote:
> Bruce Momjian writes:
> 
> > If you want cross-server transactions, what other methods are there that
> > are more reliable?
> 
> 3-phase commit

OK, how is that going to make things safer, or does it just shrink the
failure window?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073



Re: [HACKERS] 2-phase commit

2003-10-09 Thread Mike Mascari
Bruce Momjian wrote:

> Peter Eisentraut wrote:
> 
>>Andrew Sullivan writes:
>>
>>>Does the proposal of allowing dbas to run that risk, provided there's a
>>>mechanism to tell them about it, satisfy the objection (assuming, of
>>>course, 2PC can be turned off)?
>>
>>Why would you spend time on implementing a mechanism whose ultimate
>>benefit is supposed to be increasing reliability and performance, when you
>>already realize that it will have to lock up at the slightest sign of
>>trouble?  There are better mechanisms out there that you can use instead.
> 
> If you want cross-server transactions, what other methods are there that
> are more reliable?  It seems network unreliability is going to be a
> problem no matter what method you use.

What is the stated goal of distributed transactions in PostgreSQL?

1) XA-compatibility/interoperability

or

2) Robustness in the face of network failure

The implementation chosen depends upon the answer, does it not? Is
there an implementation (e.g. 3PC) that can simulate 2PC behavior for
interoperability purposes and satisfy both requirements?

Mike Mascari
[EMAIL PROTECTED]


Re: [HACKERS] 2-phase commit

2003-10-09 Thread Zeugswetter Andreas SB SD

> > Why would you spend time on implementing a mechanism whose ultimate
> > benefit is supposed to be increasing reliability and performance, when you
> > already realize that it will have to lock up at the slightest sign of
> > trouble?  There are better mechanisms out there that you can use instead.
> 
> If you want cross-server transactions, what other methods are there that
> are more reliable?  It seems network unreliability is going to be a
> problem no matter what method you use.

And unless you have 2-phase (or 3-phase) commit, all other methods are going 
to be worse, since their time window for possible critical failure is
going to be substantially larger. (extending 2-phase to 3-phase should not be 
too difficult)

A lot of use cases for 2PC are not for manipulating the same data on more than 
one server (replication), but different data that needs to be manipulated in an
all or nothing transaction. In this scenario it is not about reliability but about 
physically locating data (e.g. in LA vs New York) where it is needed most often.

Andreas



Re: [HACKERS] 2-phase commit

2003-10-09 Thread Peter Eisentraut
Bruce Momjian writes:

> If you want cross-server transactions, what other methods are there that
> are more reliable?

3-phase commit

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] 2-phase commit

2003-10-09 Thread Andrew Sullivan
On Thu, Oct 09, 2003 at 04:22:13PM +0200, Peter Eisentraut wrote:
> Why would you spend time on implementing a mechanism whose ultimate
> benefit is supposed to be increasing reliability and performance, when you
> already realize that it will have to lock up at the slightest sign of
> trouble?  There are better mechanisms out there that you can use instead.

"The slightest sign of trouble" seems to me to be overstating the
matter rather.  It cannot recover in the case where the first phase
of commit has happened everywhere, and then the master crashes.  

We are talking, after all, about a pretty exotic feature in the first
place.  I presume that anyone who is using it is also using it on
machines which have ultra-high-reliable, the cpu can catch on fire
and the box stays up sort of hardware.  I'll grant you that running a
pair of B0b'5 C0mpu73r5 Ultra kewl sooper fa5t overclocked specials
with serial ATA with the write cache enabled is a recipe for data
loss.  But that's a disaster no matter what.

But you cannot have XA-like stuff without 2PC.  You can't easily have
heterogeneous systems without 2PC.  And folks have already generously
volunteered to work on this problem; I think that they deserve
support, assuming we can come up with some idea of what kinds of
compromises are acceptable ones.  There's no question that 2PC
requires some unpleasant compromises.  But if you want someone to be
able to add a Postgres member to a heterogeneous cluster, you're
going to need to be able to accept some compromises, because the DBA
(or, more likely, his management) already has.

I'm not sure that 2PC is actually intended to increase reliability or
performance, by the way.

A

-- 

Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<[EMAIL PROTECTED]>                              M2P 2A8
                                         +1 416 646 3304 x110




Re: [HACKERS] 2-phase commit

2003-10-09 Thread Bruce Momjian
Peter Eisentraut wrote:
> Andrew Sullivan writes:
> 
> > Does the proposal of allowing dbas to run that risk, provided there's a
> > mechanism to tell them about it, satisfy the objection (assuming, of
> > course, 2PC can be turned off)?
> 
> Why would you spend time on implementing a mechanism whose ultimate
> benefit is supposed to be increasing reliability and performance, when you
> already realize that it will have to lock up at the slightest sign of
> trouble?  There are better mechanisms out there that you can use instead.

If you want cross-server transactions, what other methods are there that
are more reliable?  It seems network unreliability is going to be a
problem no matter what method you use.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073



Re: [HACKERS] 2-phase commit

2003-10-09 Thread Peter Eisentraut
Andrew Sullivan writes:

> Does the proposal of allowing dbas to run that risk, provided there's a
> mechanism to tell them about it, satisfy the objection (assuming, of
> course, 2PC can be turned off)?

Why would you spend time on implementing a mechanism whose ultimate
benefit is supposed to be increasing reliability and performance, when you
already realize that it will have to lock up at the slightest sign of
trouble?  There are better mechanisms out there that you can use instead.

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] 2-phase commit

2003-10-09 Thread Andrew Sullivan
On Wed, Oct 08, 2003 at 05:43:49PM -0400, Bruce Momjian wrote:
> 
> OK, I think we came to the conclusion that we want 2-phase commit, but
> want some way to mark a server as offline/read-only, or notify an

That sounds to me like the conclusion, to the extent there was one,
yes.  I'd still like to hear from those who continue to have strong
objections on the grounds of the impossibility of a guaranteed
recovery method.  Does the proposal of allowing dbas to run that
risk, provided there's a mechanism to tell them about it, satisfy the
objection (assuming, of course, 2PC can be turned off)?

A

-- 

Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<[EMAIL PROTECTED]>                              M2P 2A8
                                         +1 416 646 3304 x110




Re: [HACKERS] 2-phase commit

2003-10-08 Thread Bruce Momjian
Andrew Sullivan wrote:
> On Sat, Sep 27, 2003 at 09:13:27AM -0300, Marc G. Fournier wrote:
> > 
> > I think it was Andrew that suggested it ... when the slave times out, it
> > should "trigger" a READ ONLY mode on the slave, so that when/if the master
> > tries to start to talk to it, it can't ...
> > 
> > As for the master itself, it should be smart enough that if it times out,
> > it knows to actually abandon the slave and not continue to try ...
> 
> Yes, but now we're talking as though this is master-slave
> replication.  Actually, "master" and "slave" are only useful terms in
> a transaction for 2PC.  So every machine is both a master and a
> slave.
> 
> It seems that one way out is just to fall back to "read only" as soon
> as a single failure happens.  That's the least graceful but maybe
> safest approach to failure, analogous to what fsck does to your root
> filesystem at boot time.  Of course, since there's no "read only"
> mode at the moment, this is all pretty hand-wavy on my part :-/

OK, I think we came to the conclusion that we want 2-phase commit, but
want some way to mark a server as offline/read-only, or notify an
administrator.  Can we communicate this to the Japanese guys working on
2-phase commit so they can start working toward including in 7.5?


Added to TODO:

* Add two-phase commit to all distributed transactions with
  offline/readonly server status or administrator notification 
  for failure
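
A rough sketch of what the TODO item could mean in practice, assuming a server object that flips to read-only and notifies an administrator when a 2PC peer fails. PostgreSQL had no read-only mode at the time of this thread, so the Server class, the notify callback, and the list of write verbs are all hypothetical.

```python
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "CREATE", "DROP", "ALTER"}


class Server:
    """Toy model of a server with an offline/read-only status flag."""

    def __init__(self, notify=print):
        self.read_only = False
        self.notify = notify  # administrator notification hook

    def report_2pc_failure(self, detail):
        # Least graceful but safest reaction: stop accepting writes.
        self.read_only = True
        self.notify(f"2PC failure: {detail}; server is now read-only")

    def execute(self, statement):
        words = statement.split()
        verb = words[0].upper() if words else ""
        if self.read_only and verb in WRITE_VERBS:
            raise PermissionError("server is read-only after a 2PC failure")
        return "ok"
```

Whether the switch should affect every session or only the sessions involved in the failed transaction is exactly the open question in the thread; the sketch takes the server-wide route.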

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073



Re: [HACKERS] 2-phase commit

2003-10-07 Thread Hans-Jürgen Schönig
Marc G. Fournier wrote:
> On Sat, 27 Sep 2003, Bruce Momjian wrote:
>
>> I have been thinking it might be time to start allowing external
>> programs to be called when certain events occur that require
>> administrative attention --- this would be a good case for that.
>> Administrators could configure shell scripts to be run when the network
>> connection fails or servers drop off the network, alerting them to the
>> problem.  Throwing things into the server logs isn't _active_ enough.
>
> Actually, apparently you can do this now ... there is apparently a "mail
> module" for PostgreSQL that you can use to have the database send emails
> out ...


I guess something such as

CREATE TRIGGER my_trig ON BEGIN / COMMIT
EXECUTE ...

would be nice. I think this could be used for many purposes (not
necessarily 2PC), if a trigger could handle database events and not
just events on tables:

ON BEGIN
ON COMMIT
ON CREATE TABLE, ...

We could have used that so often in the past in countless applications.

	Regards,

		Hans

--
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/2952/30706 or +43/660/816 40 77
www.cybertec.at, www.postgresql.at, kernel.cybertec.at




Re: [HACKERS] 2-phase commit

2003-09-29 Thread Dann Corbit
A really nice overview of how various transaction managers are modeled:

http://www.ti5.tu-harburg.de/Lecture/99ws/TP/06-OverviewOfTPSystemsAndProducts/sld001.htm



Re: [HACKERS] 2-phase commit

2003-09-29 Thread Christopher Browne
[EMAIL PROTECTED] ("Dann Corbit") writes:
> Tuxedo

Note that this is probably the only one of the lot that is _really_
worth looking at in a serious way, as the XA standard was essentially
based on Tuxedo.  (Irrelevant Aside: BEA had releases of CICS running
on both Unix and Windows NT, so it isn't quite fair to call that
"mainframe" code...)

There might be some value in looking at how Berkeley DB supports XA,
as there is actually support for using Berkeley DB as an XA resource
manager.



While it would obviously be exceedingly inappropriate to copy any of
SleepyCat's software, there is some very useful background information
there on "care and feeding" which can give an idea of how a TP monitor
might be used and configured.
-- 
"cbbrowne","@","libertyrms.info"

Christopher Browne
(416) 646 3304 x124 (land)



Re: [HACKERS] 2-phase commit

2003-09-29 Thread Andrew Sullivan
On Mon, Sep 29, 2003 at 12:48:30PM -0400, Andrew Sullivan wrote:
> In every circumstance where a stand-alone machine would have it. 

Oops.  Wrong stage.  Never mind.

A

-- 

Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<[EMAIL PROTECTED]>                              M2P 2A8
                                         +1 416 646 3304 x110




Re: [HACKERS] 2-phase commit

2003-09-29 Thread Dann Corbit
Commercial systems use:

Mainframe:
CICS

UNIX:
Tuxedo
Encina

Win32:
MTS

DEC/COMPAQ/HP:
ACMS

Probably lots of others that I have never heard about.



Re: [HACKERS] 2-phase commit

2003-09-29 Thread Andrew Sullivan
On Mon, Sep 29, 2003 at 12:59:55PM -0400, Bruce Momjian wrote:
> working on. I think we have to get beyond the idea that this can be made
> failure-proof, and just outline the behaviors for failure, and it has to
> be configurable by the administrator.

Exactly.  There are plenty of cases where graceless failure is
acceptable to someone as the right answer to the compromise.  Of
course, this is not to pretend they're not compromises.  There's a
world of difference between saying, "This is not safe, but if you
want to do it, here are some potential failure modes," and, "Hey, you
can use this even though it can't roll back 100% of the time, because
your application should check that."  Any comparison with any actual
application I have had to use is strictly coincidental. ;-)

A

-- 

Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<[EMAIL PROTECTED]>                              M2P 2A8
                                         +1 416 646 3304 x110




Re: [HACKERS] 2-phase commit

2003-09-29 Thread Rod Taylor
On Mon, 2003-09-29 at 15:55, Peter Eisentraut wrote:
> Manfred Spraul writes:
> 
> > Ok. Let's assume one coordinator, two participants.
> > Global commit sent to both by coordinator. One replies with ok, the
> > other one remains silent.
> > What should the coordinator do? It can't fail the transaction - the
> > first participant has committed its part. It can't complete the
> > transaction, because the ok from the 2nd participant is still outstanding.
> 
> If a participant doesn't reply in an orderly fashion (say, after timeout),
> it just gets kicked out of the whole mechanism.  That isn't the
> interesting part.  The interesting part is what happens when the
> coordinator fails.

The hot-standby coordinator picks up where the first one left off. Just
like when the participant fails the hot-standby for that participant
steps up to the plate.

For the application server side in Java, I believe the standard is OTS
(Object Transaction Service).





Re: [HACKERS] 2-phase commit

2003-09-29 Thread Peter Eisentraut
Manfred Spraul writes:

> Ok. Let's assume one coordinator, two participants.
> Global commit sent to both by coordinator. One replies with ok, the
> other one remains silent.
> What should the coordinator do? It can't fail the transaction - the
> first participant has committed its part. It can't complete the
> transaction, because the ok from the 2nd participant is still outstanding.

If a participant doesn't reply in an orderly fashion (say, after timeout),
it just gets kicked out of the whole mechanism.  That isn't the
interesting part.  The interesting part is what happens when the
coordinator fails.

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] 2-phase commit

2003-09-29 Thread Manfred Spraul
Peter Eisentraut wrote:

> Tom Lane writes:
>
>> No.  The real problem with 2PC in my mind is that its failure modes
>> occur *after* you have promised commit to one or more parties.  In
>> multi-master, if you fail you know it before you have told the client
>> his data is committed.
>
> I have a book here which claims that the solution to the problems of
> 2-phase commit is 3-phase commit, which goes something like this:
>
> coordinator                     participant
> -----------                     -----------
> INITIAL                         INITIAL
>       prepare -->
> WAIT
>                                 <-- vote commit
>                                 READY
> (all voted commit)
>       prepare-to-commit -->
> PRE-COMMIT
>                                 <-- ready-to-commit
>                                 PRE-COMMIT
>       global-commit -->
> COMMIT                          COMMIT
>
> If the coordinator fails and all participants are in state READY, they can
> safely decide to abort after some timeout.  If some participant is already
> in state PRE-COMMIT, it becomes the new coordinator and sends the
> global-commit message.
>
> Details are left as an exercise. :-)

Ok. Let's assume one coordinator, two participants.
Global commit sent to both by coordinator. One replies with ok, the
other one remains silent.
What should the coordinator do? It can't fail the transaction - the
first participant has committed its part. It can't complete the
transaction, because the ok from the 2nd participant is still outstanding.
I think Bruce is right: It's an admin decision. If a timeout expires, a 
user supplied app should be called, with a safe default (database 
shutdown?).
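
One way to picture the "user supplied app with a safe default" idea, as a sketch only. The function names and the set of allowed actions ("retry", "readonly", "shutdown") are assumptions made for illustration; the message itself only suggests database shutdown as a possible default.

```python
ALLOWED_ACTIONS = ("retry", "readonly", "shutdown")


def shutdown_default(missing):
    # Safe default floated in the message: shut the database down.
    return "shutdown"


def finish_global_commit(expected, acked, on_timeout=shutdown_default):
    """Return "done" if every participant acknowledged the global commit,
    otherwise the action chosen by the administrator-supplied hook."""
    missing = sorted(set(expected) - set(acked))
    if not missing:
        return "done"
    action = on_timeout(missing)
    # An unknown answer from the hook falls back to the safe default.
    return action if action in ALLOWED_ACTIONS else "shutdown"
```

For example, finish_global_commit(["a", "b"], ["a"]) returns "shutdown" unless the administrator passes a hook that picks another action.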

--
   Manfred


Re: [HACKERS] 2-phase commit

2003-09-29 Thread Dann Corbit
> -Original Message-
> From: Bruce Momjian [mailto:[EMAIL PROTECTED] 
> Sent: Monday, September 29, 2003 7:10 AM
> To: Marc G. Fournier
> Cc: Hiroshi Inoue; Tom Lane; 'Zeugswetter Andreas SB SD'; 
> 'Andrew Sullivan'; [EMAIL PROTECTED]
> Subject: Re: [HACKERS] 2-phase commit
> 
> 
> Marc G. Fournier wrote:
> > > Master                Slave
> > > ------                -----
> > > commit ready -->
> > >                   <-- OK
> > > commit done  -->XX
> > >
> > > is the "commit done" message needed ?
> > 
> > Of course ... how else will the Slave commit?  From my understanding,
> > the concept is that the master sends a commit ready to the slave, but
> > the OK back is that "OK, I'm ready to commit whenever you are", at
> > which point the master does its commit and tells the slave to do its
> > ...
> 
> Or the slave could reject the request.
> 
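
The exchange quoted above (commit ready, OK or reject, commit done) can be sketched as a toy in-process model. This is an illustration under simplifying assumptions: there is no network, no timeout, and no crash recovery, and the Participant class and two_phase_commit function are hypothetical names.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1 ("commit ready"): promise to commit on request, or reject.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        # Phase 2 ("commit done"): make the prepared work permanent.
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Ask everyone first; a list (not a generator) so nobody is skipped.
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "aborted"
```

The failure case discussed in this thread sits between the two loops: once every vote is in and the first commit() has been sent, a lost "commit done" message leaves the remaining participant prepared but undecided.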

Here is a BSD-like licensed transaction monitor:

http://tyrex.sourceforge.net/tpmonitor.html

The stuff that eventually became Tuxedo and Encina was open source from
MIT (not sure what came of it).  You used to be able to download the
source code for their transaction monitor that worked on the IBM RS/2.

This is the Transaction Internet Protocol:
http://www.ietf.org/html.charters/OLD/tip-charter.html
It should be considered very seriously as a general solution to the
problem.

I mention this, because a transaction monitor is the next logical step
in managing database activity.
Two phase commit is a subset of transaction processing.

Interesting discussion:
http://www.developer.com/db/article.php/10920_2246481_2
http://www.developer.com/java/data/article.php/10932_3066301_4

Article worth a look (win32 specific, but talks about developing a
transaction monitor):
http://archive.devx.com/free/mgznarch/vcdj/1998/octmag98/dtc1.asp

Some simple background for those who have not spent much time looking
into it:
http://www.geocities.com/rajesh_purohit/db/twophasecommit.html




Re: [HACKERS] 2-phase commit

2003-09-29 Thread Rod Taylor
> No, I'm not.  One needs to decide how to handle the situation where a
> slave database in a 2PC transaction goes away and comes back, for
> whatever reasons that may happen.  Since the idea here is to come up
> with ways of handling the failure of 2PC in some cases, we need
> something which notices that members are not playing nice. 

Yes, you're right. The part about the member reinitializing led me to
believe that you were thinking replication (read it as copying data from
source location to bring it back up to speed -- which is not what you
intended). 






Re: [HACKERS] 2-phase commit

2003-09-29 Thread Peter Eisentraut
Tom Lane writes:

> No.  The real problem with 2PC in my mind is that its failure modes
> occur *after* you have promised commit to one or more parties.  In
> multi-master, if you fail you know it before you have told the client
> his data is committed.

I have a book here which claims that the solution to the problems of
2-phase commit is 3-phase commit, which goes something like this:

coordinator                     participant
-----------                     -----------
INITIAL                         INITIAL
      prepare -->
WAIT
                                <-- vote commit
                                READY
(all voted commit)
      prepare-to-commit -->
PRE-COMMIT
                                <-- ready-to-commit
                                PRE-COMMIT
      global-commit -->
COMMIT                          COMMIT


If the coordinator fails and all participants are in state READY, they can
safely decide to abort after some timeout.  If some participant is already
in state PRE-COMMIT, it becomes the new coordinator and sends the
global-commit message.

Details are left as an exercise. :-)
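
As a partial answer to the exercise, here is a sketch of the recovery rule stated above: the surviving participants compare states after the coordinator has failed and a timeout has passed. Treating INITIAL like READY (abort) and COMMIT like PRE-COMMIT (commit) is an assumption added here, and the function name is hypothetical.

```python
PARTICIPANT_STATES = ("INITIAL", "READY", "PRE-COMMIT", "COMMIT")


def recover_after_coordinator_failure(states):
    """Given {participant name: state}, return (decision, new coordinator)."""
    unknown = set(states.values()) - set(PARTICIPANT_STATES)
    if unknown:
        raise ValueError(f"unknown participant state(s): {sorted(unknown)}")
    for name, state in states.items():
        if state in ("PRE-COMMIT", "COMMIT"):
            # Someone saw prepare-to-commit, so every participant had voted
            # commit; this participant takes over and sends global-commit.
            return ("commit", name)
    # Nobody got past READY: aborting after the timeout is safe.
    return ("abort", None)
```

The sketch assumes the participants can still reach each other; it does not cover a network partition in which the two halves see different sets of states.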

-- 
Peter Eisentraut   [EMAIL PROTECTED]




Re: [HACKERS] 2-phase commit

2003-09-29 Thread Rod Taylor
> > It seems that one way out is just to fall back to "read only" as soon
> > as a single failure happens.  That's the least graceful but maybe
> > safest approach to failure, analogous to what fsck does to your root
> > filesystem at boot time.  Of course, since there's no "read only"
> > mode at the moment, this is all pretty hand-wavy on my part :-/
> 
> Yes, but that affects all users, not just the transaction we were
> working on. I think we have to get beyond the idea that this can be made
> failure-proof, and just outline the behaviors for failure, and it has to
> be configurable by the administrator.

Yes, but holding locks on the affected rows IS appropriate until the
administrator issues something like:

ALTER SYSTEM ABORT GLOBAL TRANSACTION 123;




Re: [HACKERS] 2-phase commit

2003-09-29 Thread Andrew Sullivan
On Fri, Sep 26, 2003 at 05:15:37PM -0400, Rod Taylor wrote:
> > The first problem is the restart/rejoin problem.  When a 2PC member
> > goes away, it is supposed to come back with all its former locks and
> > everything in place, so that it can know what to do.  This is also
> > extremely tricky, but I think the answer is sort of easy.  A member
> > which re-joins without crashing (that is, it has open transactions,
> 
> I think you may be confusing 2PC with replication.

No, I'm not.  One needs to decide how to handle the situation where a
slave database in a 2PC transaction goes away and comes back, for
whatever reasons that may happen.  Since the idea here is to come up
with ways of handling the failure of 2PC in some cases, we need
something which notices that members are not playing nice. 

> PostgreSQL's 2PC implementation should follow enough of the XA rules to
> play nice in a mixed environment where something else is managing the
> transactions (application servers are becoming more common all the
> time).

I agree.  But we still need to decide how to handle cases where
things go away, and if there are some transaction managers that don't
fit that model, then we should not accept such managers.  Of course,
what such managers do is important data in deciding what sorts of
compromises are acceptable.

A
-- 

Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                          Toronto, Ontario Canada
<[EMAIL PROTECTED]>                     M2P 2A8
                                        +1 416 646 3304 x110




Re: [HACKERS] 2-phase commit

2003-09-29 Thread Andrew Sullivan
On Sat, Sep 27, 2003 at 08:36:36AM +, Jeff wrote:
> 
> What do commercial databases do about 2PC or other multi-master solutions?
> You've done a good job of convincing me that it's unreliable no matter what
> (through your posts on this topic over a long time). However, I would think
> that something like Oracle or DB2 have some kind of answer for
> multi-master, and I'm curious what it is. If they don't, is it reasonable
> to make a test case that leaves their database inconsistent or hanging?

Most real replication systems are not doing 2PC.  For me, 2PC-based
replication is not real interesting anyway, because the point of
multi-master replication is often at least partly speed, and 2PC is
nothing if not a good way to make sure that every database is at
least as slow as the slowest node.

But 2PC is important for application-server-based, XA-type work, and
for heterogeneous databases.  Both of those would be real nice
features to support.

A

-- 

Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                          Toronto, Ontario Canada
<[EMAIL PROTECTED]>                     M2P 2A8
                                        +1 416 646 3304 x110




Re: [HACKERS] 2-phase commit

2003-09-29 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Marc G. Fournier wrote:
>>> Or the slave could reject the request.
>> 
>> Huh?  The slave has that option??  In what circumstance?

> I thought the slave could reject if someone local already had the row
> locked.

All normal reasons for transaction failure are supposed to be checked
for before the slave responds that it's ready to commit.  Otherwise it's
supposed to say it can't commit.

Basically the weak spot of 2PC is that it assumes there are no possible
reasons for failure after "ready to commit" is sent.  You can make that
approximately true, with sufficient investment of resources, but it's
definitely not a pleasant assumption.

regards, tom lane



Re: [HACKERS] 2-phase commit

2003-09-29 Thread Andrew Sullivan
On Mon, Sep 29, 2003 at 11:14:30AM -0300, Marc G. Fournier wrote:
> >
> > Or the slave could reject the request.
> 
> Huh?  The slave has that option??  In what circumstance?

In every circumstance where a stand-alone machine would have it. 
Machine A may not yet know about conflicting transactions on machine
B.  This is why 2PC is hard ;-)

A

-- 

Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                          Toronto, Ontario Canada
<[EMAIL PROTECTED]>                     M2P 2A8
                                        +1 416 646 3304 x110




Re: [HACKERS] 2-phase commit

2003-09-29 Thread Bruce Momjian
Andrew Sullivan wrote:
> On Sat, Sep 27, 2003 at 09:13:27AM -0300, Marc G. Fournier wrote:
> > 
> > I think it was Andrew that suggested it ... when the slave times out, it
> > should "trigger" a READ ONLY mode on the slave, so that when/if the master
> > tries to start to talk to it, it can't ...
> > 
> > As for the master itself, it should be smart enough that if it times out,
> > it knows to actually abandon the slave and not continue to try ...
> 
> Yes, but now we're talking as though this is master-slave
> replication.  Actually, "master" and "slave" are only useful terms in
> a transaction for 2PC.  So every machine is both a master and a
> slave.
> 
> It seems that one way out is just to fall back to "read only" as soon
> as a single failure happens.  That's the least graceful but maybe
> safest approach to failure, analogous to what fsck does to your root
> filesystem at boot time.  Of course, since there's no "read only"
> mode at the moment, this is all pretty hand-wavy on my part :-/

Yes, but that affects all users, not just the transaction we were
working on. I think we have to get beyond the idea that this can be made
failure-proof, and just outline the behaviors for failure, and it has to
be configurable by the administrator.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] 2-phase commit

2003-09-29 Thread Andrew Sullivan
On Sun, Sep 28, 2003 at 11:58:24AM -0700, Kevin Brown wrote:
> > But the postmaster doesn't connect to any database, and in a serious
> > failure, might not be able to start one.
> 
> Ah, true.  But I figured that in the context of 2PC and replication that
> most of the associated failures were likely to occur in an active
> backend or something equivalent, where a stored procedure was likely to
> be accessible.

As you go on to note, that's not always a possibility.  For instance,
server C crashes and can't come back because, say, its WAL is
scrambled.  All it will currently be able to do is scream at you in
the logs, which won't solve all the problems one has with 2PC (among
other problems).

A

-- 

Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                          Toronto, Ontario Canada
<[EMAIL PROTECTED]>                     M2P 2A8
                                        +1 416 646 3304 x110




Re: [HACKERS] 2-phase commit

2003-09-29 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Marc G. Fournier wrote:
> >>> Or the slave could reject the request.
> >> 
> >> Huh?  The slave has that option??  In what circumstance?
> 
> > I thought the slave could reject if someone local already had the row
> > locked.
> 
> All normal reasons for transaction failure are supposed to be checked
> for before the slave responds that it's ready to commit.  Otherwise it's
> supposed to say it can't commit.
> 
> Basically the weak spot of 2PC is that it assumes there are no possible
> reasons for failure after "ready to commit" is sent.  You can make that
> approximately true, with sufficient investment of resources, but it's
> definitely not a pleasant assumption.

Yep.  There is no full solution.  I think it is like running with fsync
off --- if the OS crashes, you have to clean up --- if you fail on a
2-phase commit, you have to clean up.  Multi-master will be the same.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] 2-phase commit

2003-09-29 Thread Bruce Momjian
Zeugswetter Andreas SB SD wrote:
> 
> > > > Or the slave could reject the request.
> > > 
> > > Huh?  The slave has that option??  In what circumstance?
> > 
> > I thought the slave could reject if someone local already had the row
> > locked.
> 
> No, not at all. The slave would need to reject phase 1 "commit ready"
> for this.

Oh, yea, thanks.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] 2-phase commit

2003-09-29 Thread Andrew Sullivan
On Sat, Sep 27, 2003 at 09:13:27AM -0300, Marc G. Fournier wrote:
> 
> I think it was Andrew that suggested it ... when the slave times out, it
> should "trigger" a READ ONLY mode on the slave, so that when/if the master
> tries to start to talk to it, it can't ...
> 
> As for the master itself, it should be smart enough that if it times out,
> it knows to actually abandon the slave and not continue to try ...

Yes, but now we're talking as though this is master-slave
replication.  Actually, "master" and "slave" are only useful terms in
a transaction for 2PC.  So every machine is both a master and a
slave.

It seems that one way out is just to fall back to "read only" as soon
as a single failure happens.  That's the least graceful but maybe
safest approach to failure, analogous to what fsck does to your root
filesystem at boot time.  Of course, since there's no "read only"
mode at the moment, this is all pretty hand-wavy on my part :-/

A


-- 

Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                          Toronto, Ontario Canada
<[EMAIL PROTECTED]>                     M2P 2A8
                                        +1 416 646 3304 x110




Re: [HACKERS] 2-phase commit

2003-09-29 Thread Zeugswetter Andreas SB SD

> > > Or the slave could reject the request.
> > 
> > Huh?  The slave has that option??  In what circumstance?
> 
> I thought the slave could reject if someone local already had the row
> locked.

No, not at all. The slave would need to reject phase 1 "commit ready"
for this.

Andreas



Re: [HACKERS] 2-phase commit

2003-09-29 Thread Tom Lane
Hiroshi Inoue <[EMAIL PROTECTED]> writes:
> But is it 2-phase commit protocol in the first place ?

> That is, in your example below

>  Example:

> Master                  Slave
> ------                  -----
> commit ready -->
>                     <-- OK
> commit done  --> XX

> is the "commit done" message needed ?

Absolutely --- otherwise, we'd not be having this whole discussion.  The
problem is that the slave is holding ready to commit but doesn't know
whether he should or not ... or alternatively, he did commit but the
master didn't get the acknowledgement.

It's not that big a deal for the master to remember past committed
transactions until it knows all slaves have acknowledged committing
them; you only need a bit or so per transaction.  It's a much bigger
deal if the slave has to hold the transaction ready-to-commit for a
long time.  That transaction is holding locks, and also the sheer
volume of log data is way bigger.  (For comparison, we recycle pg_xlog
details about a transaction much sooner than we recycle pg_clog.)

I think you really want some way for the slave to decide it can time out
and abort the transaction after all ... but I don't see how you do
that without breaking the 2PC protocol.
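
The bookkeeping on the master side described above (remember a committed
transaction only until every slave has acknowledged it) could look roughly
like the following Python sketch. CommitLog and its method names are invented
for illustration; they are not part of any PostgreSQL interface, and the
sketch keeps everything in memory, whereas a real master would have to make
this state survive a crash.

```python
class CommitLog:
    """Remembers committed transactions until every slave has acknowledged."""

    def __init__(self):
        # transaction id -> set of slaves that have not acknowledged yet
        self._unacked = {}

    def record_commit(self, gid, slaves):
        pending = set(slaves)
        if pending:
            self._unacked[gid] = pending

    def acknowledge(self, gid, slave):
        waiting = self._unacked.get(gid)
        if waiting is None:
            return  # already forgotten or never recorded: nothing to do
        waiting.discard(slave)
        if not waiting:
            # Every slave has acknowledged, so the state can be recycled.
            del self._unacked[gid]

    def must_remember(self, gid):
        return gid in self._unacked
```

The cost Tom describes on the slave side (locks held and log data retained
while a transaction sits ready-to-commit) has no counterpart in this sketch;
it only shows why the master's share of the state is small.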

regards, tom lane



Re: [HACKERS] 2-phase commit

2003-09-29 Thread Bruce Momjian
Marc G. Fournier wrote:
> > > > is the "commit done" message needed ?
> > >
> > > Of course ... how else will the Slave commit?  From my understanding, the
> > > concept is that the master sends a commit ready to the slave, but the OK
> > > back is that "OK, I'm ready to commit whenever you are", at which point
> > > the master does its commit and tells the slave to do its ...
> >
> > Or the slave could reject the request.
> 
> Huh?  The slave has that option??  In what circumstance?

I thought the slave could reject if someone local already had the row
locked.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] 2-phase commit

2003-09-29 Thread Zeugswetter Andreas SB SD

> > > Master                  Slave
> > > ------                  -----
> > > commit ready -->
> > >                     <-- OK
> > > commit done  --> XX
> > >
> > > is the "commit done" message needed ?
> > 
> > Of course ... how else will the Slave commit?  From my understanding, the
> > concept is that the master sends a commit ready to the slave, but the OK
> > back is that "OK, I'm ready to commit whenever you are", at which point
> > the master does its commit and tells the slave to do its ...
> 
> Or the slave could reject the request.

At this point only because of a hardware error. In case of network
problems, the "commit done" either did not reach the slave or the "success"
answer did not reach the master.

That is what it's all about. Phase 2 is supposed to be low overhead and very
fast, to keep the time window for failure (which produces in-doubt
transactions) as short as possible.

Andreas



Re: [HACKERS] 2-phase commit

2003-09-29 Thread Zeugswetter Andreas SB SD

> I don't think there is any way to handle cases where the master or slave
> just disappears.  The other machine isn't under the server's control, so
> it has no way of knowing. I think we have to allow the administrator
> to set a timeout, or ask to wait indefinitely, and allow them to call an
> external program to record the event or notify administrators.
> Multi-master replication has the same issues.

It needs to wait indefinitely; a timeout is not acceptable since it leads to
inconsistent data. Human (or monitoring software) intervention is needed
if they can't reach each other in a reasonable time.

I think this needs to be kept dumb. Different sorts of use cases will simply
need different answers to resolve in-doubt transactions. What is needed is an
interface that allows listing and commit/rollback of in-doubt transactions
(preferably from a newly started client, or a direct command for the postmaster).
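
To make the shape of such an interface concrete, here is a small Python
sketch of a registry that only lists in-doubt transactions and applies
whatever decision the administrator (or monitoring software) passes in. The
class InDoubtRegistry and its methods are hypothetical names chosen for this
example and do not correspond to any existing PostgreSQL command or API.

```python
class InDoubtRegistry:
    """Keeps in-doubt transactions until someone resolves them explicitly."""

    def __init__(self):
        self._in_doubt = {}  # transaction id -> free-form description
        self.resolved = {}   # transaction id -> "commit" or "rollback"

    def add(self, gid, description=""):
        self._in_doubt[gid] = description

    def list_in_doubt(self):
        return sorted(self._in_doubt)

    def resolve(self, gid, decision):
        # The registry stays "dumb": it never decides by itself or times out.
        if decision not in ("commit", "rollback"):
            raise ValueError("decision must be 'commit' or 'rollback'")
        if gid not in self._in_doubt:
            raise KeyError(gid)
        del self._in_doubt[gid]
        self.resolved[gid] = decision
```

Keeping the policy outside the registry matches the point made above that
different use cases need different answers for resolving in-doubt
transactions.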

Andreas



Re: [HACKERS] 2-phase commit

2003-09-29 Thread Hiroshi Inoue
> -Original Message-
> From: Zeugswetter Andreas SB SD [mailto:[EMAIL PROTECTED] 
> > 
> >  Example:
> > 
> > Master                  Slave
> > ------                  -----
> > commit ready -->
> 
> This is the commit for phase 1. This commit is allowed to return all
> sorts of errors, like violated deferred checks, out of disk space, ...
> 
> >                     <-- OK
> > commit done  --> XX
> 
> This is the commit for phase 2; the slave *must* answer with "success"
> in all but hardware failure cases. (Note that the master could
> instead send rollback, e.g. because some other slave aborted.)
> 
> > is the "commit done" message needed ?
> 
> So, yes this is needed

Thanks.
I mistakenly thought that the "commit done" message was the last response
from the participant to the coordinator. I missed the "OK" message before it.
Where were my eyes ?

regards,
Hiroshi Inoue




Re: [HACKERS] 2-phase commit

2003-09-29 Thread Marc G. Fournier


On Mon, 29 Sep 2003, Bruce Momjian wrote:

> Marc G. Fournier wrote:
> > > Master                  Slave
> > > ------                  -----
> > > commit ready -->
> > >                     <-- OK
> > > commit done  --> XX
> > >
> > > is the "commit done" message needed ?
> >
> > Of course ... how else will the Slave commit?  From my understanding, the
> > concept is that the master sends a commit ready to the slave, but the OK
> > back is that "OK, I'm ready to commit whenever you are", at which point
> > the master does its commit and tells the slave to do its ...
>
> Or the slave could reject the request.

Huh?  The slave has that option??  In what circumstance?



Re: [HACKERS] 2-phase commit

2003-09-29 Thread Jeff
Tom Lane wrote:

> Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
>>> ... You can make this work, but the resource costs
>>> are steep.
> 
>> So, after 'n' seconds of waiting, we abandon the slave and the slave
>> abandons the master.
> 
> [itch...]  But you surely cannot guarantee that the slave and the master
> time out at exactly the same femtosecond.  What happens when the comm
> link comes back online just when one has timed out and the other not?
> (Hint: in either order, it ain't good.  Double plus ungood if, say, the
> comm link manages to deliver the master's "commit confirm" message a
> little bit after the master has timed out and decided to abort after all.)
> 
> In my book, timeout-based solutions to this kind of problem are certain
> disasters.
> 
> regards, tom lane

What do commercial databases do about 2PC or other multi-master solutions?
You've done a good job of convincing me that it's unreliable no matter what
(through your posts on this topic over a long time). However, I would think
that something like Oracle or DB2 have some kind of answer for
multi-master, and I'm curious what it is. If they don't, is it reasonable
to make a test case that leaves their database inconsistent or hanging?

I can (probably) get access to a SQL Server system to run some tests, if
someone is interested.

regards,
jeff davis






Re: [HACKERS] 2-phase commit

2003-09-29 Thread Bruce Momjian
Tom Lane wrote:
> > [At participant(master)'s side]
> >   Because the commit operation is done, does nothing.
> 
> > [At coordinator(slave)'s side]
> >    1) After a while
> >    2) re-establish the communication path between the
> >       participant(master)'s TM.
> >    3) resend the "commit request" to the participant's TM.
> >    1) 2) 3) would be repeated until the coordinator receives
> >    the "commit ok" message from the participant.
> 
> [ scratches head ] I think you are using the terms "master" and "slave"
> oppositely than I would.  But in any case, this is not an answer to the
> concern I had.  You're assuming that the "coordinator(slave)" side is
> willing to resend a request indefinitely, and also that the
> "participant(master)" side is willing to retain per-transaction commit
> state indefinitely so that it can correctly answer belated questions
> from the other side.  What I was complaining about was that I don't
> think either side can afford to remember per-transaction state
> indefinitely.  2PC in the abstract is a useless academic abstraction ---
> where the rubber meets the road is defining how you cope with failures
> in the commit protocol.

I don't think there is any way to handle cases where the master or slave
just disappears.  The other machine isn't under the server's control, so
it has no way of knowing. I think we have to allow the administrator
to set a timeout, or ask to wait indefinitely, and allow them to call an
external program to record the event or notify administrators.
Multi-master replication has the same issues.

My original point was that multi-master replication has the same
limitations, but people still want it.  Same for two-phase commit --- it
has the same limitations, but people want it.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] 2-phase commit

2003-09-29 Thread Bruce Momjian
Marc G. Fournier wrote:
> > Master                  Slave
> > ------                  -----
> > commit ready -->
> >                     <-- OK
> > commit done  --> XX
> >
> > is the "commit done" message needed ?
> 
> Of course ... how else will the Slave commit?  From my understanding, the
> concept is that the master sends a commit ready to the slave, but the OK
> back is that "OK, I'm ready to commit whenever you are", at which point
> the master does its commit and tells the slave to do its ...

Or the slave could reject the request.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] 2-phase commit

2003-09-29 Thread Marc G. Fournier


On Mon, 29 Sep 2003, Hiroshi Inoue wrote:

>
>
> Hiroshi Inoue wrote:
> >
> > Tom Lane wrote:
> > >
> > > Hiroshi Inoue <[EMAIL PROTECTED]> writes:
> > > > The simplest scenario (though there could be variations) is
> > >
> > > > [At participant(master)'s side]
> > > >   Because the commit operation is done, does nothing.
> > >
> > > > [At coordinator(slave)'s side]
> > > >    1) After a while
> > > >    2) re-establish the communication path between the
> > > >       participant(master)'s TM.
> > > >    3) resend the "commit request" to the participant's TM.
> > > >    1) 2) 3) would be repeated until the coordinator receives
> > > >    the "commit ok" message from the participant.
> > >
> > > [ scratches head ] I think you are using the terms "master" and "slave"
> > > oppositely than I would.
> >
> > Oops my mistake, sorry.
> > But is it 2-phase commit protocol in the first place ?
>
> That is, in your example below
>
>  Example:
>
> Master                  Slave
> ------                  -----
> commit ready -->
>                     <-- OK
> commit done  --> XX
>
> is the "commit done" message needed ?

Of course ... how else will the Slave commit?  From my understanding, the
concept is that the master sends a commit ready to the slave, but the OK
back is that "OK, I'm ready to commit whenever you are", at which point
the master does its commit and tells the slave to do its ...




Re: [HACKERS] 2-phase commit

2003-09-29 Thread Zeugswetter Andreas SB SD

> > > > The simplest scenario (though there could be variations) is
> > >
> > > > [At participant(master)'s side]
> > > >   Because the commit operation is done, does nothing.
> > >
> > > > [At coordinator(slave)'s side]
> > > >    1) After a while
> > > >    2) re-establish the communication path between the
> > > >       participant(master)'s TM.
> > > >    3) resend the "commit request" to the participant's TM.
> > > >    1) 2) 3) would be repeated until the coordinator receives
> > > >    the "commit ok" message from the participant.
> > >
> > > [ scratches head ] I think you are using the terms "master" and "slave"
> > > oppositely than I would.
> > 
> > Oops my mistake, sorry.
> > But is it 2-phase commit protocol in the first place ?
> 
> That is, in your example below
> 
>  Example:
> 
> Master                  Slave
> ------                  -----
> commit ready -->

This is the commit for phase 1. This commit is allowed to return all
sorts of errors, like violated deferred checks, out of disk space, ...

>                     <-- OK
> commit done  --> XX

This is the commit for phase 2; the slave *must* answer with "success"
in all but hardware failure cases. (Note that the master could
instead send rollback, e.g. because some other slave aborted.)

> is the "commit done" message needed ?

So, yes this is needed.
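
The two phases described above can be sketched in a few lines of Python. The
Slave class and run_two_phase_commit function are illustrative names only,
and the sketch deliberately ignores the hard part of this thread: lost
messages and crashes between the two phases.

```python
class Slave:
    def __init__(self, name, prepare_ok=True):
        self.name = name
        self.prepare_ok = prepare_ok
        self.state = "active"

    def commit_ready(self):
        # Phase 1: every check that may fail (deferred constraints,
        # disk space, ...) has to happen here, before answering OK.
        self.state = "prepared" if self.prepare_ok else "aborted"
        return self.prepare_ok

    def commit_done(self):
        # Phase 2: must succeed once the slave has answered OK.
        if self.state != "prepared":
            raise RuntimeError(self.name + " was never prepared")
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def run_two_phase_commit(slaves):
    votes = [slave.commit_ready() for slave in slaves]
    if all(votes):
        for slave in slaves:
            slave.commit_done()
        return "committed"
    # At least one slave rejected phase 1, so the master sends rollback.
    for slave in slaves:
        slave.rollback()
    return "aborted"
```

If any slave answers phase 1 with an error, every slave ends up aborted;
otherwise all of them move from prepared to committed.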

Andreas



Re: [HACKERS] 2-phase commit

2003-09-29 Thread Hiroshi Inoue
I seem to have misunderstood the problem completely.
I apologize to you all (especially Tom) for disturbing
this thread.

I wonder if there might be such a nice solution when
some of the systems or communications are dead.
And as many people already mentioned, there's not much
leeway if we only adopt an XA-based protocol.

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/

Tom Lane wrote:
> 
> Hiroshi Inoue <[EMAIL PROTECTED]> writes:
> > The simplest scenario (though there could be variations) is
> 
> > [At participant(master)'s side]
> >   Because the commit operation is done, does nothing.
> 
> > [At coordinator(slave)'s side]
> >    1) After a while
> >    2) re-establish the communication path between the
> >       participant(master)'s TM.
> >    3) resend the "commit request" to the participant's TM.
> >    1) 2) 3) would be repeated until the coordinator receives
> >    the "commit ok" message from the participant.
> 
> [ scratches head ] I think you are using the terms "master" and "slave"
> oppositely than I would.  But in any case, this is not an answer to the
> concern I had.  You're assuming that the "coordinator(slave)" side is
> willing to resend a request indefinitely, and also that the
> "participant(master)" side is willing to retain per-transaction commit
> state indefinitely so that it can correctly answer belated questions
> from the other side.  What I was complaining about was that I don't
> think either side can afford to remember per-transaction state
> indefinitely.  2PC in the abstract is a useless academic abstraction ---
> where the rubber meets the road is defining how you cope with failures
> in the commit protocol.
> 
> regards, tom lane

Re: [HACKERS] 2-phase commit

2003-09-28 Thread Hiroshi Inoue
Tom Lane wrote:
> 
> Hiroshi Inoue <[EMAIL PROTECTED]> writes:
> > The simplest scenario (though there could be variations) is
> 
> > [At participant(master)'s side]
> >   Because the commit operation is done, does nothing.
> 
> > [At coordinator(slave)'s side]
> >    1) After a while
> >    2) re-establish the communication path between the
> >       participant(master)'s TM.
> >    3) resend the "commit request" to the participant's TM.
> >    1) 2) 3) would be repeated until the coordinator receives
> >    the "commit ok" message from the participant.
> 
> [ scratches head ] I think you are using the terms "master" and "slave"
> oppositely than I would.  But in any case, this is not an answer to the
> concern I had.  You're assuming that the "coordinator(slave)" side is
> willing to resend a request indefinitely, and also that the
> "participant(master)" side is willing to retain per-transaction commit
> state indefinitely so that it can correctly answer belated questions
> from the other side.  What I was complaining about was that I don't
> think either side can afford to remember per-transaction state
> indefinitely.

OK maybe I understand your complaint.
Basically such a situation can occur when either side
is down. Especially when the coordinator(master) is down,
the participants are in trouble. In such cases, e.g. the XA
interface allows heuristic-commit on the participants.

In case one or more participants are down, the coordinator
may have to remember per-transaction state indefinitely.
Is it a big problem ?

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/

Re: [HACKERS] 2-phase commit

2003-09-28 Thread Hiroshi Inoue

Hiroshi Inoue wrote:
> 
> Tom Lane wrote:
> >
> > Hiroshi Inoue <[EMAIL PROTECTED]> writes:
> > > The simplest scenario (though there could be variations) is
> >
> > > [At participant(master)'s side]
> > >   Because the commit operation is done, does nothing.
> >
> > > [At coordinator(slave)'s side]
> > >    1) After a while
> > >    2) re-establish the communication path between the
> > >       participant(master)'s TM.
> > >    3) resend the "commit request" to the participant's TM.
> > >    1) 2) 3) would be repeated until the coordinator receives
> > >    the "commit ok" message from the participant.
> >
> > [ scratches head ] I think you are using the terms "master" and "slave"
> > oppositely than I would.
> 
> Oops my mistake, sorry.
> But is it 2-phase commit protocol in the first place ?

That is, in your example below

 Example:

Master                  Slave
------                  -----
commit ready -->
                    <-- OK
commit done  --> XX

is the "commit done" message needed ?

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/

Re: [HACKERS] 2-phase commit

2003-09-28 Thread Hiroshi Inoue
Tom Lane wrote:
> 
> Hiroshi Inoue <[EMAIL PROTECTED]> writes:
> > The simplest scenario (though there could be variations) is
> 
> > [At participant(master)'s side]
> >   Because the commit operation is done, does nothing.
> 
> > [At coordinator(slave)'s side]
> >    1) After a while
> >    2) re-establish the communication path between the
> >       participant(master)'s TM.
> >    3) resend the "commit request" to the participant's TM.
> >    1) 2) 3) would be repeated until the coordinator receives
> >    the "commit ok" message from the participant.
> 
> [ scratches head ] I think you are using the terms "master" and "slave"
> oppositely than I would.

Oops my mistake, sorry.
But is it 2-phase commit protocol in the first place ?

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/

Re: [HACKERS] 2-phase commit

2003-09-28 Thread Tom Lane
Hiroshi Inoue <[EMAIL PROTECTED]> writes:
> The simplest scenario (though there could be variations) is

> [At participant(master)'s side]
>   Because the commit operation is done, does nothing.

> [At coordinator(slave)'s side]
>    1) After a while
>    2) re-establish the communication path between the
>       participant(master)'s TM.
>    3) resend the "commit request" to the participant's TM.
>    1) 2) 3) would be repeated until the coordinator receives
>    the "commit ok" message from the participant.

[ scratches head ] I think you are using the terms "master" and "slave"
oppositely than I would.  But in any case, this is not an answer to the
concern I had.  You're assuming that the "coordinator(slave)" side is
willing to resend a request indefinitely, and also that the
"participant(master)" side is willing to retain per-transaction commit
state indefinitely so that it can correctly answer belated questions
from the other side.  What I was complaining about was that I don't
think either side can afford to remember per-transaction state
indefinitely.  2PC in the abstract is a useless academic abstraction ---
where the rubber meets the road is defining how you cope with failures
in the commit protocol.
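
A minimal sketch of the retry scheme under discussion, assuming a toy in-memory model rather than any real PostgreSQL interface. The names `Participant`, `handle_commit` and `resolve` are hypothetical, and each attempt is modelled as a pair of flags saying whether the request and the reply got through:

```python
class Participant:
    """Toy participant: holds prepared transactions and remembers finished ones."""

    def __init__(self):
        self.prepared = set()   # transactions waiting for a verdict
        self.committed = set()  # retained so that belated retries can be answered

    def prepare(self, txid):
        self.prepared.add(txid)

    def handle_commit(self, txid):
        if txid in self.prepared:
            self.prepared.remove(txid)
            self.committed.add(txid)
            return "commit ok"
        if txid in self.committed:
            return "commit ok"  # duplicate request, answered from retained state
        return "unknown"        # state was forgotten, no safe answer exists


def resolve(participant, txid, attempts):
    """Resend the commit request until a "commit ok" reply arrives.

    attempts is a list of (request_arrives, reply_arrives) booleans that
    stands in for the unreliable communication path.
    Returns the number of the successful attempt, or None if still in doubt.
    """
    for n, (request_arrives, reply_arrives) in enumerate(attempts, start=1):
        if not request_arrives:
            continue            # request lost, try again "after a while"
        answer = participant.handle_commit(txid)
        if reply_arrives and answer == "commit ok":
            return n
    return None                 # the sender must keep the request and retry later
```

The `committed` set is the per-transaction state in question: this sketch never prunes it, and a participant that does prune it can only answer "unknown" to a late retry.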

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] 2-phase commit

2003-09-28 Thread Marc G. Fournier

On Mon, 29 Sep 2003, Hiroshi Inoue wrote:

> The simplest scenario (though there could be variations) is
>
> [At participant(master)'s side]
>   Because the commit operation is done, it does nothing.
>
> [At coordinator(slave)'s side]
>    1) After a while
>    2) re-establish the communication path between the
>       participant(master)'s TM.
>    3) resend the "commit request" to the participant's TM.
>       1)2)3) would be repeated until the coordinator receives
>       the "commit ok" message from the participant.
>
> If there's no objection from you, I would assume I'm right.

'K, but what happens if the slave never gets a 'commit ok'?  Does the
slave keep trying ad nauseam?

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] 2-phase commit

2003-09-28 Thread Hiroshi Inoue
Hiroshi Inoue wrote:
> 
> > -Original Message-
> > From: Tom Lane
> >
> > Bruce Momjian <[EMAIL PROTECTED]> writes:
> > > Tom Lane wrote:
> > >> You're not considering the possibility of a transient communication
> > >> failure.
> >
> > > Can't the master re-send the request after a timeout?
> >
> > Not "it can", but "it has to".
> 
> Why?  Mainly the coordinator(slave), not the participant(master),
> has the responsibility to resolve the in-doubt transaction.

As far as I can see, it's the above point which prevents the
advance of this topic, and the issue must be solved ASAP.

As opposed to your answer
   Not "it can", but "it has to",
my answer is
   Yes "it can", but "it doesn't have to".

The simplest scenario (though there could be variations) is

[At participant(master)'s side]
  Because the commit operation is done, it does nothing.

[At coordinator(slave)'s side]
   1) After a while
   2) re-establish the communication path between the
      participant(master)'s TM.
   3) resend the "commit request" to the participant's TM.
      1)2)3) would be repeated until the coordinator receives
      the "commit ok" message from the participant.

If there's no objection from you, I would assume I'm right.
Please don't dodge my question this time.

regards,
Hiroshi Inoue
http://www.geocities.jp/inocchichichi/psqlodbc/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] 2-phase commit

2003-09-28 Thread Rod Taylor
> > Actually, all that's really necessary is the ability to call a stored
> > procedure when some event occurs.  The stored procedure can take it from
> > there, and since it can be written in C it can do anything the postgres
> > user can do (for good or for ill, of course).
> 
> But the postmaster doesn't connect to any database, and in a serious
> failure, might not be able to start one.

In the event of a catastrophic failure, the 'nothing is running' scenario is
one that standard monitoring software should pick up on easily enough, and
one that PostgreSQL cannot help with anyway (normally this is admin error).

Something simple much like pg_locks with transaction state (idle,
waiting on local lock, waiting on 3rd party, etc.), time transaction
started, time of last status change would be plenty. The monitor
software folks (Big Brother, etc. etc.) can write jobs to query those
elements and create the appropriate SNMP events when say waiting on 3rd
party for > N minutes (log at 1, trouble ticket at 2, SysAdmin page at
5, escalate to VP Pager at 20 minutes or whatever corporate policy is).
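
A rough sketch of the kind of job a monitoring tool could run against such a view. The row layout, the state string and the function names are assumptions made for illustration. No such view exists yet, and the thresholds are just the example policy above:

```python
# Thresholds in minutes, highest first, taken from the example policy above.
ESCALATION_STEPS = [
    (20, "escalate to VP pager"),
    (5, "page SysAdmin"),
    (2, "open trouble ticket"),
    (1, "log"),
]


def escalation_for(state, minutes_in_state):
    """Return the action due for one transaction, or None if nothing is due."""
    if state != "waiting on 3rd party":
        return None
    for threshold, action in ESCALATION_STEPS:
        if minutes_in_state >= threshold:
            return action
    return None


def scan(rows):
    """rows: iterable of (transaction_id, state, minutes_in_state) tuples,
    as a monitoring job might read them from the hypothetical view."""
    alerts = {}
    for txid, state, minutes in rows:
        action = escalation_for(state, minutes)
        if action is not None:
            alerts[txid] = action
    return alerts
```

Turning the returned actions into SNMP events is left to the monitoring software, which is the point of keeping the backend side down to a view.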

An alternative is to package an SNMP daemon (much like the stats daemon)
into the backend to generate SNMP events -- but I think this is overkill
if views are available.




Re: [HACKERS] 2-phase commit

2003-09-28 Thread Kevin Brown
Bruce Momjian wrote:
> Kevin Brown wrote:
> > Actually, all that's really necessary is the ability to call a stored
> > procedure when some event occurs.  The stored procedure can take it from
> > there, and since it can be written in C it can do anything the postgres
> > user can do (for good or for ill, of course).
> 
> But the postmaster doesn't connect to any database, and in a serious
> failure, might not be able to start one.

Ah, true.  But I figured that in the context of 2PC and replication that
most of the associated failures were likely to occur in an active
backend or something equivalent, where a stored procedure was likely to
be accessible.

But yes, you certainly want to account for failures where the database
itself is unavailable.  So I guess my original comment isn't strictly
true.  :-)


-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] 2-phase commit

2003-09-27 Thread Bruce Momjian
Kevin Brown wrote:
> Bruce Momjian wrote:
> > Marc G. Fournier wrote:
> > > 
> > > 
> > > On Sat, 27 Sep 2003, Bruce Momjian wrote:
> > > 
> > > > I have been thinking it might be time to start allowing external
> > > > programs to be called when certain events occur that require
> > > > administrative attention --- this would be a good case for that.
> > > > Administrators could configure shell scripts to be run when the network
> > > > connection fails or servers drop off the network, alerting them to the
> > > > problem.  Throwing things into the server logs isn't _active_ enough.
> > > 
> > > Actually, apparently you can do this now ... there is apparently a "mail
> > > module" for PostgreSQL that you can use to have the database send emails
> > > out ...
> > 
> > The only part that needs to be added is the ability to call an external
> > program when some event occurs, like a database write failure.
> 
> Actually, all that's really necessary is the ability to call a stored
> procedure when some event occurs.  The stored procedure can take it from
> there, and since it can be written in C it can do anything the postgres
> user can do (for good or for ill, of course).

But the postmaster doesn't connect to any database, and in a serious
failure, might not be able to start one.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] 2-phase commit

2003-09-27 Thread Kevin Brown
Bruce Momjian wrote:
> Marc G. Fournier wrote:
> > 
> > 
> > On Sat, 27 Sep 2003, Bruce Momjian wrote:
> > 
> > > I have been thinking it might be time to start allowing external
> > > programs to be called when certain events occur that require
> > > administrative attention --- this would be a good case for that.
> > > Administrators could configure shell scripts to be run when the network
> > > connection fails or servers drop off the network, alerting them to the
> > > problem.  Throwing things into the server logs isn't _active_ enough.
> > 
> > Actually, apparently you can do this now ... there is apparently a "mail
> > module" for PostgreSQL that you can use to have the database send emails
> > out ...
> 
> The only part that needs to be added is the ability to call an external
> program when some event occurs, like a database write failure.

Actually, all that's really necessary is the ability to call a stored
procedure when some event occurs.  The stored procedure can take it from
there, and since it can be written in C it can do anything the postgres
user can do (for good or for ill, of course).


-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] 2-phase commit

2003-09-27 Thread Hiroshi Inoue
> -Original Message-
> From: Tom Lane
> 
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> You're not considering the possibility of a transient communication
> >> failure.
> 
> > Can't the master re-send the request after a timeout?
> 
> Not "it can", but "it has to". 

Why?  Mainly the coordinator(slave), not the participant(master),
has the responsibility to resolve the in-doubt transaction.

regards,
Hiroshi Inoue


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [HACKERS] 2-phase commit

2003-09-27 Thread Bruce Momjian
Marc G. Fournier wrote:
> 
> 
> On Sat, 27 Sep 2003, Bruce Momjian wrote:
> 
> > I have been thinking it might be time to start allowing external
> > programs to be called when certain events occur that require
> > administrative attention --- this would be a good case for that.
> > Administrators could configure shell scripts to be run when the network
> > connection fails or servers drop off the network, alerting them to the
> > problem.  Throwing things into the server logs isn't _active_ enough.
> 
> Actually, apparently you can do this now ... there is apparently a "mail
> module" for PostgreSQL that you can use to have the database send emails
> out ...

The only part that needs to be added is the ability to call an external
program when some event occurs, like a database write failure.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] 2-phase commit

2003-09-27 Thread Marc G. Fournier


On Sat, 27 Sep 2003, Bruce Momjian wrote:

> I have been thinking it might be time to start allowing external
> programs to be called when certain events occur that require
> administrative attention --- this would be a good case for that.
> Administrators could configure shell scripts to be run when the network
> connection fails or servers drop off the network, alerting them to the
> problem.  Throwing things into the server logs isn't _active_ enough.

Actually, apparently you can do this now ... there is apparently a "mail
module" for PostgreSQL that you can use to have the database send emails
out ...


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] 2-phase commit

2003-09-27 Thread Richard Huxton
On Saturday 27 September 2003 15:47, Bruce Momjian wrote:
> Richard Huxton wrote:
[snip]
> > I might be (well, am actually) a bit out of my depth here, but surely
> > what happens is if you have machines A,B,C and *any* of them thinks
> > machine C has a problem then it does. If C can still communicate with the
> > others then it is told to reinitialise/go away/start the sirens. If C
> > can't communicate then it's all a bit academic.
> >
[snip]
>
> I have been thinking it might be time to start allowing external
> programs to be called when certain events occur that require
> administrative attention --- this would be a good case for that.
> Administrators could configure shell scripts to be run when the network
> connection fails or servers drop off the network, alerting them to the
> problem.  Throwing things into the server logs isn't _active_ enough.

Actually, from the discussion I'd assumed there was some sort of plug-in 
"policy daemon" that was making decisions when things went wrong. Given the 
different scenarios 2 phase-commit will be used in, one size is unlikely to 
fit all.

The idea of a more general system is _very_ interesting. I know Wietse Venema 
has decided to provide an external "policy" interface for his Postfix 
mailserver, precisely because he wants to keep the core system fairly clean.
-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] 2-phase commit

2003-09-27 Thread Shridhar Daithankar
On Saturday 27 September 2003 20:17, Bruce Momjian wrote:
> Richard Huxton wrote:
> I have been thinking it might be time to start allowing external
> programs to be called when certain events occur that require
> administrative attention --- this would be a good case for that.
> Administrators could configure shell scripts to be run when the network
> connection fails or servers drop off the network, alerting them to the
> problem.  Throwing things into the server logs isn't _active_ enough.

I would say calling events from external libraries would be a good extension. 
That could allow for extending postgresql in novel ways, e.g. calling a 
log record copy event after a WAL record is written, for near real time 
replication. :-)

 Shridhar


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] 2-phase commit

2003-09-27 Thread Bruce Momjian
Richard Huxton wrote:
> > [itch...]  But you surely cannot guarantee that the slave and the master
> > time out at exactly the same femtosecond.  What happens when the comm
> > link comes back online just when one has timed out and the other not?
> > (Hint: in either order, it ain't good.  Double plus ungood if, say, the
> > comm link manages to deliver the master's "commit confirm" message a
> > little bit after the master has timed out and decided to abort after all.)
> >
> > In my book, timeout-based solutions to this kind of problem are certain
> > disasters.
> 
> I might be (well, am actually) a bit out of my depth here, but surely what 
> happens is if you have machines A,B,C and *any* of them thinks machine C has 
> a problem then it does. If C can still communicate with the others then it is 
> told to reinitialise/go away/start the sirens. If C can't communicate then 
> it's all a bit academic.
> 
> Granted, if you have intermittent problems on a link and set your timeouts 
> badly then you'll have a very brittle system, but if A thinks C has died, you 
> can't just reverse that decision.

I have been thinking it might be time to start allowing external
programs to be called when certain events occur that require
administrative attention --- this would be a good case for that. 
Administrators could configure shell scripts to be run when the network
connection fails or servers drop off the network, alerting them to the
problem.  Throwing things into the server logs isn't _active_ enough.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] 2-phase commit

2003-09-27 Thread Marc G. Fournier


On Sat, 27 Sep 2003, Tom Lane wrote:

> Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
> >> ... You can make this work, but the resource costs
> >> are steep.
>
> > So, after 'n' seconds of waiting, we abandon the slave and the slave
> > abandons the master.
>
> [itch...]  But you surely cannot guarantee that the slave and the master
> time out at exactly the same femtosecond.  What happens when the comm
> link comes back online just when one has timed out and the other not?
> (Hint: in either order, it ain't good.

I think it was Andrew that suggested it ... when the slave times out, it
should "trigger" a READ ONLY mode on the slave, so that when/if the master
tries to start to talk to it, it can't ...

As for the master itself, it should be smart enough that if it times out,
it knows to actually abandon the slave and not continue to try ...

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] 2-phase commit

2003-09-27 Thread Richard Huxton
On Saturday 27 September 2003 06:59, Tom Lane wrote:
> Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
> >> ... You can make this work, but the resource costs
> >> are steep.
> >
> > So, after 'n' seconds of waiting, we abandon the slave and the slave
> > abandons the master.
>
> [itch...]  But you surely cannot guarantee that the slave and the master
> time out at exactly the same femtosecond.  What happens when the comm
> link comes back online just when one has timed out and the other not?
> (Hint: in either order, it ain't good.  Double plus ungood if, say, the
> comm link manages to deliver the master's "commit confirm" message a
> little bit after the master has timed out and decided to abort after all.)
>
> In my book, timeout-based solutions to this kind of problem are certain
> disasters.

I might be (well, am actually) a bit out of my depth here, but surely what 
happens is if you have machines A,B,C and *any* of them thinks machine C has 
a problem then it does. If C can still communicate with the others then it is 
told to reinitialise/go away/start the sirens. If C can't communicate then 
it's all a bit academic.

Granted, if you have intermittent problems on a link and set your timeouts 
badly then you'll have a very brittle system, but if A thinks C has died, you 
can't just reverse that decision.

-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] 2-phase commit

2003-09-26 Thread Tom Lane
Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
>> ... You can make this work, but the resource costs
>> are steep.

> So, after 'n' seconds of waiting, we abandon the slave and the slave
> abandons the master.

[itch...]  But you surely cannot guarantee that the slave and the master
time out at exactly the same femtosecond.  What happens when the comm
link comes back online just when one has timed out and the other not?
(Hint: in either order, it ain't good.  Double plus ungood if, say, the
comm link manages to deliver the master's "commit confirm" message a
little bit after the master has timed out and decided to abort after all.)

In my book, timeout-based solutions to this kind of problem are certain
disasters.
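
A toy model of the one case named in the parenthetical above, under loud assumptions: `outcome` is a hypothetical function, all times are seconds after the master sends "commit confirm", and the slave's acknowledgement is taken to travel back instantly. It only shows how two independent timeouts can leave the two sides disagreeing:

```python
def outcome(delivered_at, master_timeout, slave_timeout):
    """Return (master_decision, slave_decision) for one transaction.

    delivered_at: when the comm link finally delivers "commit confirm".
    Each side aborts on its own once its timeout has passed.
    """
    # The slave commits if the message reaches it before it has given up.
    slave = "commit" if delivered_at <= slave_timeout else "abort"
    # The master stays committed only if the slave acted before the master gave up.
    if slave == "commit" and delivered_at <= master_timeout:
        master = "commit"
    else:
        master = "abort"
    return master, slave
```

With `master_timeout=10`, `slave_timeout=11` and a delivery at 10.5 seconds, the slave commits a transaction that the master has already decided to abort.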

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] 2-phase commit

2003-09-26 Thread Christopher Kings-Lynne
> Not "it can", but "it has to".  The master *must* keep hold of that
> request forever (or until the slave responds, or until we reconfigure
> the system not to consider that slave valid anymore).  Similarly, the
> slave cannot forget the maybe-committed transaction on pain of not being
> a valid slave anymore.  You can make this work, but the resource costs
> are steep.  For instance, in Postgres, you don't get to truncate the WAL
> log, for what could be a really really long time --- more disk space
> than you wanted to spend on WAL anyway.  The locks held by the
> maybe-committed transaction are another potentially unpleasant problem;
> you can't release them, no matter what else they are blocking.

So, after 'n' seconds of waiting, we abandon the slave and the slave
abandons the master.

Such a condition is probably a fairly serious failure anyway, and
something that an admin would need to expect.  The admin would also need
to expect to allocate a heap of disk space for WAL.

Chris



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] 2-phase commit

2003-09-26 Thread Gavin Sherry
On Fri, 26 Sep 2003, Christopher Browne wrote:

> [EMAIL PROTECTED] (Bruce Momjian) writes:
> > Patrick Welche wrote:
> >> On Fri, Sep 26, 2003 at 02:49:30PM -0300, Marc G. Fournier wrote:
> >> ...
> >> > if we are talking two computers sitting next to each other on a switch,
> >> > you'd expect those to be low ... but if you were talking about two
> >> > separate geographical locations (and yes, I realize you are adding lag to
> >> > the mix with waiting for responses), you'd expect those #s to rise ...
> >>
> >> Which I thought was the whole point of using a group communication
> >> protocol such as spread in postgresql-r. It seemed solved there...
> >
> > Right, but I think we want to try to do two-phase commit without
> > spread.  Spread seems overkill for this usage.
>
> Is there some big demerit to _having_ that "overkill"?  If there is no
> major price to pay, then I don't see why it isn't reasonable to simply
> say "Sure, we'll use that!"

I recall Darren Johnson (who is working on replication with spread) saying
that it required a lot of bandwidth in real world scenarios.

Gavin

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] 2-phase commit

2003-09-26 Thread Mike Mascari
Marc G. Fournier wrote:

> On Fri, 26 Sep 2003, Tom Lane wrote:
> 
>>Bruce Momjian <[EMAIL PROTECTED]> writes:
>>
>>>Tom Lane wrote:
>>>
>>>>You're not considering the possibility of a transient communication
>>>>failure.
>>
>>>Can't the master re-send the request after a timeout?
>>
>>Not "it can", but "it has to".  The master *must* keep hold of that
>>request forever (or until the slave responds, or until we reconfigure
>>the system not to consider that slave valid anymore).  Similarly, the
>>slave cannot forget the maybe-committed transaction on pain of not being
>>a valid slave anymore.
> 
> Hr ... is there no way of having part of the protocol being a message
> sent back that it's a valid/invalid slave?  ie. slave has an uncommitted
> transaction, never hears back from master to actually do the commit, so
> after x-secs * y-retries any messages it does try to send to the master
> have a bit flag set to 'invalid'?

If I understand Andrew Sullivan's request, the purpose of integrating
2PC into PostgreSQL is more for distributed queries than for
replication, via an XA interface:

http://sybooks.sybase.com/onlinebooks/group-xsarc/xsge/xatuxedo/@ebt-link;pt=61?target=%25N%13_446_START_RESTART_N%25

If that is the desire (XA-compatibility) then PostgreSQL might be
talking to an Oracle database or a BEA Tuxedo TPM acting as the
coordinator. So PostgreSQL won't have an opportunity to modify the
protocol in any meaningful way if it wishes to interoperate with
XA-based transaction managers.

If it is being used only amongst other PostgreSQL backends for
replication, then why not use one of the optimistic replication protocols:

http://www.inf.ethz.ch/personal/alonso/PAPERS/commit-fast.pdf

Mike Mascari
[EMAIL PROTECTED]



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] 2-phase commit

2003-09-26 Thread Rod Taylor
> The first problem is the restart/rejoin problem.  When a 2PC member
> goes away, it is supposed to come back with all its former locks and
> everything in place, so that it can know what to do.  This is also
> extremely tricky, but I think the answer is sort of easy.  A member
> which re-joins without crashing (that is, it has open transactions,

I think you may be confusing 2PC with replication.

PostgreSQL's 2PC implementation should follow enough of the XA rules to
play nice in a mixed environment where something else is managing the
transactions (application servers are becoming more common all the
time).

As far as inter-PostgreSQL replication / queries are concerned we can
choose whatever semantics we like -- just realize that they are 2
different problems.




Re: [HACKERS] 2-phase commit

2003-09-26 Thread Rod Taylor
On Fri, 2003-09-26 at 13:58, Bruce Momjian wrote:
> Patrick Welche wrote:
> > On Fri, Sep 26, 2003 at 02:49:30PM -0300, Marc G. Fournier wrote:
> > ... 
> > > if we are talking two computers sitting next to each other on a switch,
> > > you'd expect those to be low ... but if you were talking about two
> > > separate geographical locations (and yes, I realize you are adding lag to
> > > the mix with waiting for responses), you'd expect those #s to rise ...
> > 
> > Which I thought was the whole point of using a group communication protocol
> > such as spread in postgresql-r. It seemed solved there...
> 
> Right, but I think we want to try to do two-phase commit without spread.
> Spread seems overkill for this usage.

Out of curiosity, how does one use spread to accomplish 2PC? Isn't the
logic the Application Server would need to follow rather different with
a group communication based control than with XA / 2PC style
communication? How does one map to the other?




Re: [HACKERS] 2-phase commit

2003-09-26 Thread Marc G. Fournier


On Fri, 26 Sep 2003, Christopher Browne wrote:

> [EMAIL PROTECTED] (Bruce Momjian) writes:
> > Patrick Welche wrote:
> >> On Fri, Sep 26, 2003 at 02:49:30PM -0300, Marc G. Fournier wrote:
> >> ...
> >> > if we are talking two computers sitting next to each other on a switch,
> >> > you'd expect those to be low ... but if you were talking about two
> >> > separate geographical locations (and yes, I realize you are adding lag to
> >> > the mix with waiting for responses), you'd expect those #s to rise ...
> >>
> >> Which I thought was the whole point of using a group communication
> >> protocol such as spread in postgresql-r. It seemed solved there...
> >
> > Right, but I think we want to try to do two-phase commit without
> > spread.  Spread seems overkill for this usage.
>
> Is there some big demerit to _having_ that "overkill"?  If there is no
> major price to pay, then I don't see why it isn't reasonable to simply
> say "Sure, we'll use that!"

Reliance on a third party library to be installed to provide the
functionality ... if it were meant as an "add on" instead of "standard
feature", then sure ...

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] 2-phase commit

2003-09-26 Thread Andrew Sullivan
On Fri, Sep 26, 2003 at 02:05:36PM -0400, Tom Lane wrote:
> a valid slave anymore.  You can make this work, but the resource costs
> are steep.  For instance, in Postgres, you don't get to truncate the WAL

But people who want 2PC are more than willing to pay all that cost. 

A
-- 

Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<[EMAIL PROTECTED]>                              M2P 2A8
                                         +1 416 646 3304 x110


---(end of broadcast)---
TIP 8: explain analyze is your friend

