Re: [HACKERS] Re: AW: Re: MySQL and BerkleyDB (fwd)

2001-01-25 Thread dom

> yup :-) Maybe this could even be raised to the SQL level,
> so applications could use it? I have not seen this elsewhere,
> but why not?

  Yes, please :-) if someone can code this sooner than I can (quite
likely, since I have other projects to deal with concurrently).

-- 
<< Not everything there is perfect, but the gardeners are certainly honored >>

Dominique Quatravaux <[EMAIL PROTECTED]>



Re: [HACKERS] Re: AW: Re: MySQL and BerkleyDB (fwd)

2001-01-24 Thread Bruce Momjian


Added to TODO.detail/replication.

> > >   I had thought that the pre-commit information could be stored in an
> > > auxiliary table by the middleware program; we would then have
> > > to re-implement some sort of higher-level WAL (I thought of the list
> > > of the commands performed in the current transaction, with a sequence
> > > number for each of them that would guarantee correct ordering between
> > > concurrent transactions in case of a REDO). But I fear I am missing
> > 
> > This wouldn't work for the READ COMMITTED isolation level.
> > But why do you want to log commands into WAL, where each modification
> > is already logged in, hm, the correct order?
> > Well, it makes sense if you're looking for async replication, but
> > you don't need this for two-phase commit, and you should be aware of
> > the problems with the READ COMMITTED isolevel.
> > 
> 
> I believe the issue here is that while SERIALIZABLE isolation means all
> transactions can be replayed serially, our default is READ COMMITTED,
> meaning that open transactions see the work of committed transactions,
> even if those transactions committed after our transaction started.
> (FYI, see my chapter on transactions for help,
> http://www.postgresql.org/docs/awbook.html.)
> 
> To do a higher-level WAL, you would have to record not only the
> queries, but also which other transactions had committed as of the
> start of each command in your transaction.
> 
> Ideally, you could number every commit by its XID in your log, and then
> when processing the query, pass the "committed" transaction ids that
> were visible at the time each command began.
> 
> In other words, you can replay the queries in transaction commit order,
> except that you have to have some transactions commit at specific
> points while other transactions are open, e.g.:
> 
> XID   Open XIDs   Query
> 500               UPDATE t SET col = 3;
> 501   500         BEGIN;
> 501   500         UPDATE t SET col = 4;
> 501               UPDATE t SET col = 5;
> 501               COMMIT;
> 
> This is a silly example, but it shows that 500 must commit after the
> first command in transaction 501, but before the second command in the
> transaction.  This is because UPDATE t SET col = 5 actually sees the
> changes made by transaction 500 in READ COMMITTED isolation level.
> 
> I am not advocating this query-replay approach.  I think WAL is a
> better choice.  I just wanted to outline why replaying the queries in
> commit order is insufficient.
> 
> > Back to two-phase commit - it's the easiest part of the work required
> > for distributed transaction processing.
> > Currently we place a single commit record into the log, and the
> > transaction is committed when this record (and so all other records
> > of the transaction) is on disk.
> > Two-phase commit:
> > 
> > 1. For the 1st phase we'll place a "prepared-to-commit" record into
> >    the log, and this phase is accomplished once the record is flushed
> >    to disk.  At this point the transaction may be committed at any
> >    time, because all its modifications are logged.  But it may still
> >    be rolled back if this phase fails on other sites of the
> >    distributed system.
> > 
> > 2. When all sites are prepared to commit, we'll place a "committed"
> >    record into the log.  There is no need to flush it, because in the
> >    event of a crash the recoverer will have to ask the other sites
> >    about the status of all "prepared" transactions anyway.
> > 
> > That's all!  It is really hard to implement distributed lock- and
> > communication-managers, but there is no problem with logging two
> > records instead of one.  Period.
> 
> Great.
> 
> 
> -- 
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   [EMAIL PROTECTED]                    |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> 


-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026



Re: [HACKERS] Re: AW: Re: MySQL and BerkleyDB (fwd)

2001-01-24 Thread Bruce Momjian


Added to TODO.detail/replication.

> >   I had thought that the pre-commit information could be stored in an
> > auxiliary table by the middleware program; we would then have
> > to re-implement some sort of higher-level WAL (I thought of the list
> > of the commands performed in the current transaction, with a sequence
> > number for each of them that would guarantee correct ordering between
> > concurrent transactions in case of a REDO). But I fear I am missing
> 
> This wouldn't work for the READ COMMITTED isolation level.
> But why do you want to log commands into WAL, where each modification
> is already logged in, hm, the correct order?
> Well, it makes sense if you're looking for async replication, but
> you don't need this for two-phase commit, and you should be aware of
> the problems with the READ COMMITTED isolevel.
> 
> Back to two-phase commit - it's the easiest part of the work required
> for distributed transaction processing.
> Currently we place a single commit record into the log, and the
> transaction is committed when this record (and so all other records
> of the transaction) is on disk.
> Two-phase commit:
> 
> 1. For the 1st phase we'll place a "prepared-to-commit" record into
>    the log, and this phase is accomplished once the record is flushed
>    to disk.  At this point the transaction may be committed at any
>    time, because all its modifications are logged.  But it may still
>    be rolled back if this phase fails on other sites of the
>    distributed system.
> 
> 2. When all sites are prepared to commit, we'll place a "committed"
>    record into the log.  There is no need to flush it, because in the
>    event of a crash the recoverer will have to ask the other sites
>    about the status of all "prepared" transactions anyway.
> 
> That's all!  It is really hard to implement distributed lock- and
> communication-managers, but there is no problem with logging two
> records instead of one.  Period.
> 
> Vadim
> 


-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026



RE: [HACKERS] Re: AW: Re: MySQL and BerkleyDB (fwd)

2001-01-24 Thread Mikheev, Vadim

> > 1. For 1st phase we'll place into log "prepared-to-commit" record
> >and this phase will be accomplished after record is 
> >flushed on disk.
> >At this point transaction may be committed at any time because of
> >all its modifications are logged. But it still may be rolled back
> >if this phase failed on other sites of distributed system.
> 
> The 1st phase will also need to do all the delayed constraint checks,
> and all other work a commit currently does, that could possibly lead
> to a transaction abort.  The 2nd phase of 2-phase commit is not 

It was assumed.

Vadim



Re: [HACKERS] Re: AW: Re: MySQL and BerkleyDB (fwd)

2001-01-23 Thread Bruce Momjian

> >   I had thought that the pre-commit information could be stored in an
> > auxiliary table by the middleware program; we would then have
> > to re-implement some sort of higher-level WAL (I thought of the list
> > of the commands performed in the current transaction, with a sequence
> > number for each of them that would guarantee correct ordering between
> > concurrent transactions in case of a REDO). But I fear I am missing
> 
> This wouldn't work for the READ COMMITTED isolation level.
> But why do you want to log commands into WAL, where each modification
> is already logged in, hm, the correct order?
> Well, it makes sense if you're looking for async replication, but
> you don't need this for two-phase commit, and you should be aware of
> the problems with the READ COMMITTED isolevel.
> 

I believe the issue here is that while SERIALIZABLE isolation means all
transactions can be replayed serially, our default is READ COMMITTED,
meaning that open transactions see the work of committed transactions,
even if those transactions committed after our transaction started.
(FYI, see my chapter on transactions for help,
http://www.postgresql.org/docs/awbook.html.)

To do a higher-level WAL, you would have to record not only the
queries, but also which other transactions had committed as of the
start of each command in your transaction.

Ideally, you could number every commit by its XID in your log, and then
when processing the query, pass the "committed" transaction ids that
were visible at the time each command began.

In other words, you can replay the queries in transaction commit order,
except that you have to have some transactions commit at specific
points while other transactions are open, e.g.:

XID   Open XIDs   Query
500               UPDATE t SET col = 3;
501   500         BEGIN;
501   500         UPDATE t SET col = 4;
501               UPDATE t SET col = 5;
501               COMMIT;

This is a silly example, but it shows that 500 must commit after the
first command in transaction 501, but before the second command in the
transaction.  This is because UPDATE t SET col = 5 actually sees the
changes made by transaction 500 in READ COMMITTED isolation level.
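
To see the visibility rule itself, here is roughly what two concurrent
psql sessions observe under READ COMMITTED (a sketch; table t and the
values are just the ones from the example above):

    -- session B:
    BEGIN;
    SELECT col FROM t;    -- still sees the old value

    -- session A, concurrently:
    BEGIN;
    UPDATE t SET col = 3;
    COMMIT;

    -- session B again, same open transaction:
    SELECT col FROM t;    -- now sees 3: each new command sees rows
                          -- committed after B's BEGIN
    COMMIT;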

I am not advocating this query-replay approach.  I think WAL is a
better choice.  I just wanted to outline why replaying the queries in
commit order is insufficient.

> Back to two-phase commit - it's the easiest part of the work required
> for distributed transaction processing.
> Currently we place a single commit record into the log, and the
> transaction is committed when this record (and so all other records
> of the transaction) is on disk.
> Two-phase commit:
> 
> 1. For the 1st phase we'll place a "prepared-to-commit" record into
>    the log, and this phase is accomplished once the record is flushed
>    to disk.  At this point the transaction may be committed at any
>    time, because all its modifications are logged.  But it may still
>    be rolled back if this phase fails on other sites of the
>    distributed system.
> 
> 2. When all sites are prepared to commit, we'll place a "committed"
>    record into the log.  There is no need to flush it, because in the
>    event of a crash the recoverer will have to ask the other sites
>    about the status of all "prepared" transactions anyway.
> 
> That's all!  It is really hard to implement distributed lock- and
> communication-managers, but there is no problem with logging two
> records instead of one.  Period.

Great.


-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026



RE: [HACKERS] Re: AW: Re: MySQL and BerkleyDB (fwd)

2001-01-23 Thread Mikheev, Vadim

>   I had thought that the pre-commit information could be stored in an
> auxiliary table by the middleware program; we would then have
> to re-implement some sort of higher-level WAL (I thought of the list
> of the commands performed in the current transaction, with a sequence
> number for each of them that would guarantee correct ordering between
> concurrent transactions in case of a REDO). But I fear I am missing

This wouldn't work for the READ COMMITTED isolation level.
But why do you want to log commands into WAL, where each modification
is already logged in, hm, the correct order?
Well, it makes sense if you're looking for async replication, but
you don't need this for two-phase commit, and you should be aware of
the problems with the READ COMMITTED isolevel.

Back to two-phase commit - it's the easiest part of the work required
for distributed transaction processing.
Currently we place a single commit record into the log, and the
transaction is committed when this record (and so all other records
of the transaction) is on disk.
Two-phase commit:

1. For the 1st phase we'll place a "prepared-to-commit" record into
   the log, and this phase is accomplished once the record is flushed
   to disk.  At this point the transaction may be committed at any
   time, because all its modifications are logged.  But it may still
   be rolled back if this phase fails on other sites of the
   distributed system.

2. When all sites are prepared to commit, we'll place a "committed"
   record into the log.  There is no need to flush it, because in the
   event of a crash the recoverer will have to ask the other sites
   about the status of all "prepared" transactions anyway.

That's all!  It is really hard to implement distributed lock- and
communication-managers, but there is no problem with logging two
records instead of one.  Period.
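
If this two-record sequence were ever raised to the SQL level, it
might look something like this (the syntax below is invented purely
for illustration; nothing like it exists yet):

    BEGIN;
    UPDATE t SET col = 4;

    PREPARE COMMIT 'gid-42';   -- phase 1: write the "prepared-to-commit"
                               -- record and flush the log to disk

    COMMIT PREPARED 'gid-42';  -- phase 2: write the "committed" record;
                               -- no flush needed: after a crash the
                               -- recoverer must ask the other sites
                               -- about prepared transactions anyway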

Vadim



Re: [HACKERS] Re: AW: Re: MySQL and BerkleyDB (fwd)

2001-01-23 Thread dom

> This 'pre-commit' 'really commit' two-step (get 'yer cowboy hats, right
> here) is what's needed, and is currently missing from pgsql. 

  Hello,

  I'm very interested in this topic since I am involved in a
distributed, several-PostgreSQLs-backed, open-source,
buzzword-compliant database replication middleware (still in the draft
stage though --- this is not an announcement :-).
  I had thought that the pre-commit information could be stored in an
auxiliary table by the middleware program; we would then have to
re-implement some sort of higher-level WAL (I thought of the list of
the commands performed in the current transaction, with a sequence
number for each of them that would guarantee correct ordering between
concurrent transactions in case of a REDO).  But I fear I am missing a
number of important issues there; so could you please comment on my
idea?
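  Concretely, I was thinking of something like this for the auxiliary
table (a rough sketch; all the names are only a first guess):

    -- middleware's command log, one row per command to replay:
    CREATE TABLE replication_log (
        global_seq  SERIAL PRIMARY KEY,  -- total order across transactions
        xact_id     INTEGER NOT NULL,    -- middleware's transaction id
        local_seq   INTEGER NOT NULL,    -- order within the transaction
        command     TEXT    NOT NULL     -- SQL text to replay on REDO
    );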
  * what should I be sure to record in the higher-level WAL if I want
  consistency?
  * how could one collect consistent ordering information without
  impacting performance too much?  Will ordering suffice to guarantee
  correctness of the REDO?  (I mean, are there sources of
  nondeterminism in PostgreSQL, such as resource exhaustion etc., that
  I should be aware of?)
  * would it be easier or harder to help implement 2-phase commit
  inside PostgreSQL?  (But I am not quite a PostgreSQL hacker yet!)

  Many thanks in advance !

-- 
<< Not everything there is perfect, but the gardeners are certainly honored >>

Dominique Quatravaux <[EMAIL PROTECTED]>



Re: [HACKERS] Re: AW: Re: MySQL and BerkleyDB (fwd)

2001-01-22 Thread Ross J. Reedstrom

On Mon, Jan 22, 2001 at 12:41:38PM -0500, Joel Burton wrote:
> On Mon, 22 Jan 2001, Ross J. Reedstrom wrote:
> 
> > And this case can be handled within one database by having multiple
> > schema, one for each package. It's not there yet, but it's a simpler
> > solution than the generic solution. The problem (as others have mentioned)
> > is that we don't want to open the door to remote access until we have a
> > two-phase transaction commit mechanism in place. Doing it any other way
> > is not a 'partial solution', it's a corrupt database waiting to happen.
> 
> What does '2-phase transaction commit mechanism' mean in this case?

Same thing it means elsewhere.  Typing "two phase commit" into Google
gets me this URL:

http://webopedia.internet.com/Computer_Science/Transaction_Processing/two_phase_commit.html

Which says:

   A feature of transaction processing systems that enables databases
   to be returned to the pre-transaction state if some error condition
   occurs. A single transaction can update many different databases. The
   two-phase commit strategy is designed to ensure that either all the
   databases are updated or none of them, so that the databases remain
   synchronized.

   Database changes required by a transaction are initially stored
   temporarily by each database. The transaction monitor then
   issues a "pre-commit" command to each database which requires an
   acknowledgment. If the monitor receives the appropriate response from
   each database, the monitor issues the "commit" command, which causes
   all databases to simultaneously make the transaction changes permanent.


This 'pre-commit' 'really commit' two-step (get 'yer cowboy hats, right
here) is what's needed, and is currently missing from pgsql. 
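
As a sketch of what the monitor's side of that two-step would look
like (the PREPARE/COMMIT syntax here is invented for illustration;
pgsql has no such commands today):

    -- on db1:  BEGIN; UPDATE accounts SET bal = bal - 100 WHERE id = 1;
    -- on db2:  BEGIN; UPDATE accounts SET bal = bal + 100 WHERE id = 2;

    -- phase 1: each participant must durably promise it can commit
    -- on db1:  PREPARE COMMIT 'xfer-7';   -- acknowledgment required
    -- on db2:  PREPARE COMMIT 'xfer-7';   -- acknowledgment required

    -- phase 2: only after all acks does the monitor finalize; if any
    -- participant fails to ack, it rolls back on every database instead
    -- on db1:  COMMIT PREPARED 'xfer-7';
    -- on db2:  COMMIT PREPARED 'xfer-7';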


Ross



Re: [HACKERS] Re: AW: Re: MySQL and BerkleyDB (fwd)

2001-01-22 Thread Joel Burton

On Mon, 22 Jan 2001, Ross J. Reedstrom wrote:

> And this case can be handled within one database by having multiple
> schema, one for each package. It's not there yet, but it's a simpler
> solution than the generic solution. The problem (as others have mentioned)
> is that we don't want to open the door to remote access until we have a
> two-phase transaction commit mechanism in place. Doing it any other way
> is not a 'partial solution', it's a corrupt database waiting to happen.

What does '2-phase transaction commit mechanism' mean in this case?

-- 
Joel Burton   <[EMAIL PROTECTED]>
Director of Information Systems, Support Center of Washington




Re: [HACKERS] Re: AW: Re: MySQL and BerkleyDB (fwd)

2001-01-22 Thread Ross J. Reedstrom

On Mon, Jan 22, 2001 at 12:18:54PM -0500, Joel Burton wrote:
> On Mon, 22 Jan 2001, Zeugswetter Andreas SB wrote:
> 
> > 
> > > Is anyone looking at doing this?  Is this purely a MySQL-ism, or is it
> > > something that everyone else has except us?
> > 
> > We should not only support access to all db's under one postmaster,
> > but also remote access to other postmasters' databases.
> > All biggie db's allow this in one way or another (synonyms,
> > qualified object names), including 2-phase commit.
> > Ideally this includes access to other db manufacturers, flat files,
> > bdb ...  Meaning that this is a problem needing a generic approach.
> 
> Of course, a generic, powerful approach would be great.
> 
> However, a simple, limited approach would be a solution for (I
> suspect) 97% of the cases, which is that one software package creates a
> database to store mailing list names, and another creates a database to
> store web permissions, and you want to write a query that encompasses
> both, w/o semi-tedious COPY TO FILEs to temporarily move a table back and
> forth. And of course, a simple solution might be completed faster :-)
> 
> How could this be handled? 
> 

And this case can be handled within one database by having multiple
schema, one for each package. It's not there yet, but it's a simpler
solution than the generic solution. The problem (as others have mentioned)
is that we don't want to open the door to remote access until we have a
two-phase transaction commit mechanism in place. Doing it any other way
is not a 'partial solution', it's a corrupt database waiting to happen.


> * a syntax for db-table names, such as mydb.mytable or something like
> that. (do we have any unused punctuation? :-) )

This is the sort of syntax that SQL9* specifies for cross-schema access.
So far, it fits into the parser just fine.

> * aliases, so that tblFoo in dbA can be called as ToFoo in dbB

This can be done with views, once schema are in place.
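
Once schema are in, the 97% case above could look something like this
(hypothetical until schema support actually lands):

    CREATE SCHEMA lists;        -- e.g. the mailing-list package
    CREATE SCHEMA webperms;     -- e.g. the web-permissions package
    CREATE TABLE lists.tblfoo (id integer, name text);

    -- the alias case: webperms code sees lists' table under a local name
    CREATE VIEW webperms.tofoo AS SELECT * FROM lists.tblfoo;

    -- a cross-package query, no COPY TO FILE shuffle needed:
    SELECT * FROM webperms.tofoo;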

Ross
-- 
Open source code is like a natural resource, it's the result of providing
food and sunshine to programmers, and then staying out of their way.
[...] [It] is not going away because it has utility for both the developers 
and users independent of economic motivations.  Jim Flynn, Sunnyvale, Calif.