Re: [HACKERS] incoherent view of serializable transactions

2009-01-21 Thread Kevin Grittner
Here's a shot at a more radical revision, to try to address concerns
raised over my failure in the previous (very minimal) suggested patch
to address PostgreSQL behavior close to where the spec's behavior is
described, and my dragging in of language directly from the spec in a
confusing context.  I'd appreciate any corrections or suggestions
before I massage it into sgml.
 
Also, I don't know if I should leave it with the one example or
whether there should be more.  I could leave in the old example,
although the popular example of reversing updates (one transaction
updates all card rows to 'face-up' where they are 'face-down' and vice
versa) seems easier to understand.  One or both of these?  Other
suggestions?
 
Also, I tried using SELECT FOR SHARE and SELECT FOR HOLD as the
complete solution or instead of one of the table locks, but was able
to generate anomalies in all such cases.  If someone has a less
extreme technique for blocking the anomalies in the receipt example
(even when the SELECT of the deposit date for a receipt is in a
separate statement earlier in the transaction), please let me know, so
that I can include it.
 
If time permits I might take a stab at expanding the section on data
consistency checks at the application level; however, that seems less
urgent than correcting the obsolescent discussions of the SQL standard
and describing some of the anomalies not covered by a discussion of
consistency checks.
 
Thanks,
 
-Kevin
 
 
13.2. Transaction Isolation
 
The SQL standard defines four levels of transaction isolation in terms
of three phenomena that must be prevented between concurrent
transactions, with additional constraints on Serializable
transactions. These undesirable phenomena are: 
 
dirty read 
 
A transaction reads data written by a concurrent uncommitted
transaction. 
 
nonrepeatable read 
 
A transaction re-reads data it has previously read and finds that data
has been modified by another transaction (that committed since the
initial read). 
 
phantom read 
 
A transaction re-executes a query returning a set of rows that satisfy
a search condition and finds that the set of rows satisfying the
condition has changed due to another recently-committed transaction.
 
The four transaction isolation levels and the corresponding behaviors
are described in Table 13-1.
 
Table 13-1. SQL Transaction Isolation Levels
 
table here
 
The standard also requires that serializable transactions behave as
though they were run one at a time, even though their execution may
actually overlap. Since the phenomena described above relate to the
visibility of the effects of concurrent transactions, and each
serializable transaction must behave as though it were run in its
entirety either before or after every other transaction, none of the
above phenomena can occur within a serializable transaction.
 
In practice there is another popular transaction isolation level, not
mentioned in the standard, generally known as Snapshot isolation
level.  A transaction executed at this transaction isolation level
sees a consistent view of the data; changes made by other transactions
are not visible to it.  Because of this, none of the phenomena
described above are possible.  Additionally, when concurrent
transactions running at this level attempt to modify the same data,
the update conflict causes causes transaction rollback to prevent many
forms of update anomalies.  Still, data may be viewed or stored in a
state which is not consistent with any serial execution of
transactions run at this level, so although it is more strict than
required for Repeatable Read, it does not meet the standard's
definition of the Serializable transaction isolation level.
 
In PostgreSQL you can request any of the four standard transaction
isolation levels, but internally there are only two distinct isolation
levels, which correspond to the levels Read Committed and Snapshot.
When you select the level Read Uncommitted you really get Read
Committed, and when you select Repeatable Read or Serializable you
really get Snapshot.  Since the standard does not provide for the
Snapshot isolation level, PostgreSQL reports it as Serializable.  The
behavior of the available isolation levels is detailed in the
following subsections. 
 
To set the transaction isolation level of a transaction, use the
command SET TRANSACTION. or specify the desired transaction isolation
level on a BEGIN TRANSACTION or START TRANSACTION statement.
 
 
13.2.1. Read Committed Isolation Level
 
unchanged
 
 
13.2.2. Snapshot Isolation Level
 
The Snapshot level (reported as Serializable) provides the strictest
transaction isolation available in PostgreSQL. This level approximates
serial transaction execution, as if transactions had been executed one
after another, serially, rather than concurrently. However,
applications using this level must be prepared to retry transactions
due to serialization failures.
 
When a transaction is on the this level, a SELECT query sees 

Re: [HACKERS] incoherent view of serializable transactions

2009-01-08 Thread Kevin Grittner
 I wrote: 
 Peter Eisentraut pete...@gmx.net wrote: 
 Kevin Grittner wrote:
 is a natural consequence of the fact --- There is nothing
 natural about any of this.  Why is it a consequence and how?
  
 How could you possibly get any of those phenomena if there are no
 concurrent transactions?
 
 I see what you mean now, but you could write out that logic in more
 detail.
  
 Those weren't my words; I was quoting the SQL spec.
 
Last night I was reviewing my proposed patch from this thread, to try
to address other expressed concerns, and noticed that I had used this
language from the SQL spec in the patch.  I see your point now. 
Without the same context as the spec, and when intended for a
different audience, this language probably isn't the best.  It now
also occurs to me that the spec is a copyrighted work, and it probably
isn't appropriate to copy a chunk that big into PostgreSQL docs.
 
I'll write something in my own words to replace this.
 
Thanks for the input, and sorry for misunderstanding.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2009-01-07 Thread Peter Eisentraut

Kevin Grittner wrote:

is a natural consequence of the fact --- There is nothing natural
about any of this.  Why is it a consequence and how?
 
How could you possibly get any of those phenomena if there are no

concurrent transactions?


I see what you mean now, but you could write out that logic in more detail.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2009-01-07 Thread Kevin Grittner
 Peter Eisentraut pete...@gmx.net wrote: 
 Kevin Grittner wrote:
 is a natural consequence of the fact --- There is nothing
 natural about any of this.  Why is it a consequence and how?
  
 How could you possibly get any of those phenomena if there are no
 concurrent transactions?
 
 I see what you mean now, but you could write out that logic in more
 detail.
 
Those weren't my words; I was quoting the SQL spec.  It came about
half a page after they had defined serializable transactions by saying
that they must produce the same effect as if they had been run (in
some order) one at a time.  The spec then defined phenomena that can
occur during the execution of concurrent SQL-transactions and gave a
table of which phenomena could occur at which transaction isolation
level.  Immediately after the table was the note about natural
consequence.
 
Since these phenomena are defined in terms of the visibility of the
effects of concurrent transactions, and serializable transactions must
have the same effect as if they were run one at a time, a natural
consequence is that none of the effects can occur.  What gaps in the
logic to you see?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2009-01-06 Thread Kevin Grittner
 Paul Schlie sch...@comcast.net wrote: 
 Sorry if I'm restating the obvious, however I don't understand the
 confusion, as it seems the standard's definition isn't mysterious;
 it simply requires that the resulting state from the concurrent
 execution of transactions (and implicitly any subset) designated to
 occur at the isolation level SERIALIZABLE be equivalent to SOME
 LITERALLY SERIAL execution of said transactions.
 
I think that some of the confusion may result from changes in the
standard.  As far as I can recall, the language requiring that the
SERIALIZABLE transaction isolation level be truly serializable was not
in early versions of the standard, and it may be that there is some
reluctance to concede that a shift in the standard has rendered
PostgreSQL out of compliance on this point.
 
As I see it, the discussion on this thread is around recognition of
the requirements of the current standard within the PostgreSQL
documentation.
 
There is a related thread on which I'm attempting to come up with
documentation to assist those familiar with true serializable behavior
who are attempting to recognize application coding patterns where the
differences between that and snapshot isolation are material, with
tips on how to handle these differences.  There seems to be some
question whether the patterns in which anomalies occur are common
enough to merit comment.
 
If you could reference any concise and accessible work on these
anomalies and practical workarounds in application code, it would be
much appreciated.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2009-01-06 Thread Paul Schlie
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 Paul Schlie sch...@comcast.net wrote:
 Sorry if I'm restating the obvious, however I don't understand the
 confusion, as it seems the standard's definition isn't mysterious;
 it simply requires that the resulting state from the concurrent
 execution of transactions (and implicitly any subset) designated to
 occur at the isolation level SERIALIZABLE be equivalent to SOME
 LITERALLY SERIAL execution of said transactions.
  
 I think that some of the confusion may result from changes in the
 standard.  As far as I can recall, the language requiring that the
 SERIALIZABLE transaction isolation level be truly serializable was not
 in early versions of the standard, and it may be that there is some
 reluctance to concede that a shift in the standard has rendered
 PostgreSQL out of compliance on this point.
  
 As I see it, the discussion on this thread is around recognition of
 the requirements of the current standard within the PostgreSQL
 documentation.
  
 There is a related thread on which I'm attempting to come up with
 documentation to assist those familiar with true serializable behavior
 who are attempting to recognize application coding patterns where the
 differences between that and snapshot isolation are material, with
 tips on how to handle these differences.  There seems to be some
 question whether the patterns in which anomalies occur are common
 enough to merit comment.
  
 If you could reference any concise and accessible work on these
 anomalies and practical workarounds in application code, it would be
 much appreciated.

Personally; although compliance may reduce the execution performance of
such so designated transactions, it will correspondingly warrant correct
results, and should be the goal rather than documenting non-conformance;
as those who wish to embed more direct control over transaction evaluation
into their specification to enable their improved concurrent execution
efficiency by utilizing more relaxed evaluation semantics, remain free to
do without penalty. (Simple examples of the risk of non-compliance already
seem sufficiently identified in your example and first reference cited).

Merely documenting that transactions designated to be evaluated at the
isolation level SERIALIZABLE may not yield expected results, as currently
identified, seems sufficient in the short term; and as/if enough interest
develops otherwise, so may an effort to warrant compliance; I suspect.

(as that known to most often be fine, can't be relied upon in practice)



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2009-01-06 Thread Kevin Grittner
 Paul Schlie sch...@comcast.net wrote: 
  Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 There is a related thread on which I'm attempting to come up with
 documentation to assist those familiar with true serializable
 behavior who are attempting to recognize application coding
 patterns where the differences between that and snapshot isolation
 are material, with tips on how to handle these differences.  There
 seems to be some question whether the patterns in which anomalies
 occur are common enough to merit comment.
  
 If you could reference any concise and accessible work on these
 anomalies and practical workarounds in application code, it would
 be much appreciated.
 
 Personally; although compliance may reduce the execution performance
 of such so designated transactions, it will correspondingly warrant
 correct results, and should be the goal rather than documenting
 non-conformance; as those who wish to embed more direct control over
 transaction evaluation into their specification to enable their
 improved concurrent execution efficiency by utilizing more relaxed
 evaluation semantics, remain free to do without penalty. (Simple
 examples of the risk of non-compliance already seem sufficiently
 identified in your example and first reference cited).
 
 Merely documenting that transactions designated to be evaluated at
 the isolation level SERIALIZABLE may not yield expected results, as
 currently identified, seems sufficient in the short term; and as/if
 enough interest develops otherwise, so may an effort to warrant
 compliance; I suspect.
 
 (as that known to most often be fine, can't be relied upon in
 practice)
 
Thank you for your perspective.  I'm not sure that I totally followed
you, so let me restate to see if it sounds right to you.  You are
suggesting that minimal discussion of the problem, the initial example
I provided, and more discussion of how to ensure correct semantics
would be what is needed?  Filling in more detail if interest is
expressed by users?
 
If so, the draft of a partial replacement for the partial replacement
of text in Serializable Isolation versus True Serializability may be
close to what you're suggesting -- if additional guidance on when to
use what additional locks is provided.  I'll paste below my signature
for comment.  It's a little rough yet, but looking to see if I'm on
the right track.
 
The first paragraph is a slightly modified form of a suggestion from
Robert Haas in:
 
http://archives.postgresql.org/pgsql-hackers/2008-12/msg01732.php
 
-Kevin
 
 
PostgreSQL's MVCC framework, snapshot isolation, and limited automatic
row-level locking permit a greater degree of concurrency than some
other databases; however, even when the transaction isolation level is
set to serializable, serialization anomalies can occur in some
situations.  When it is important to prevent these anomalies, explicit
row-level or table-level locking can be used at the expense of reduced
concurrency.
 
Since PostgreSQL protects a serializable transaction against changes
in the view of the data, and uses locks to prevent modification of
data which is being modified by a concurrent transaction, the
anomalies can only occur when a transaction reads data which is
modified by a concurrent transaction, and uses that as the basis of
database modifications which are read by a concurrent transaction. 
Data consistency checks at the application level have a problem with
this in general, and are addressed in section 13.4.  Some examples of
other types of anomalies follow, with suggestions on how to use
explicit locking to prevent the anomalies where needed.
 
Consider a system which involves recording receipts, each of which
must go into a daily deposit.  There is a control table with one row
containing the current deposit date for receipts.  Each transaction
which is inserting a receipt selects the deposit date from the control
table within its transaction, and uses it for the receipt's deposit
date.  Somewhere mid-afternoon the control table's date is updated,
all subsequent receipts should fall into the new day, and a report is
run listing the receipts for the day and giving the deposit total.
 
If all transactions involved were truly serializable, any SELECT of
receipts for a date prior to the deposit date of the control table
would see the complete, final set of receipts.  Under the PostgreSQL
implementation, unless explicit locking is used, although data
eventually gets to that state there can be a window of time during
which a SELECT can return an incomplete list of receipts for a date
which appears to be closed, even if all transactions for modifying and
viewing data are SERIALIZABLE.  This window of time runs from the
commit of the transaction which updated the control table until the
commit of any pending transactions which are inserting receipts and
which obtained a snapshot before the update of the control table.
 
To prevent this anomaly, a lock can be taken out on the receipt 

Re: [HACKERS] incoherent view of serializable transactions

2009-01-05 Thread Kevin Grittner
 Peter Eisentraut pete...@gmx.net wrote:
 A language lawyer might also point out that the note that contains 
 the explicitness isn't actually part of the formal standard.  The
only 
 thing that the standard formally defines are the excluded phenomena.
 
Previously quoted, from the standard:
 
The execution of concurrent SQL-transactions at isolation level
SERIALIZABLE is guaranteed to be serializable. A serializable
execution is defined to be an execution of the operations of
concurrently executing SQL-transactions that produces the same
effect as some serial execution of those same SQL-transactions. A
serial execution is one in which each SQL-transaction executes to
completion before the next SQL-transaction begins.
 
 More to the point, think about how a user might want to think about
these 
 issues.
 
 The standard also requires that serializable transactions behave as
though 
 [...] --- User: The standard requires it, but is it also
implemented?  
 (Apparently not, but that is explained somewhere else.)
 
That's something I thought about, but failed to find a good way to
incorporate at that point, and I thought the discussion in the
following sections covered it.  Perhaps a reference to those following
sections at the point of definition?
 
 is a natural consequence of the fact --- There is nothing natural
 about any of this.  Why is it a consequence and how?
 
How could you possibly get any of those phenomena if there are no
concurrent transactions?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2009-01-05 Thread Paul Schlie
Sorry if I'm restating the obvious, however I don't understand the
confusion, as it seems the standard's definition isn't mysterious; it
simply requires that the resulting state from the concurrent execution
of transactions (and implicitly any subset) designated to occur at the
isolation level SERIALIZABLE be equivalent to SOME LITERALLY SERIAL
execution of said transactions.

Thereby although of course transactions literally executed serially
comply; they need not be executed literally serially, as long as some
mechanism is used to warrant that the state resulting from the completion
of every such transaction in some sequence be equivalent to the literally
serial execution of the transactions in the same sequence.

Thereby even if only a subset of such concurrently executed transactions
may be complete at some point in time, any potentially remaining such
transactions may be restarted for whatever reason and remain compliant
(as the resulting state from the concurrent execution of such transactions
will be equivalent to some literal serial execution of said transactions
upon the completion of each so executed transaction; correspondingly any
implementation which can not warrant the same, is simply non-compliant).

Thereby although no pre-specified completion order exists per-se, the
order of completion of transactions implicitly determine if incomplete
transactions may need to be restarted if dependant on a preceding completed
transaction's resulting mutated state (or alternatively allowed to proceed
if previously blocked on said potentially mutated state).

Thereby although a collection of such concurrently executed transactions
my have mutual interdependencies, each subsequent complete transaction in
effect break dependency deadlocks as may otherwise logically exist; and
thereby may affect optimal logical execution order of remaining such
transactions potentially concurrently; and be warranted to never deadlock.
I believe IMHO.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2009-01-03 Thread Peter Eisentraut
On Wednesday 31 December 2008 02:33:26 Kevin Grittner wrote:
 I'm still working on section Serializable Isolation versus True
 Serializability, but here are all the changes I can see which precede
 it.  Has the review of the SQL specs convinced everyone that this much
 is appropriate?

I don't agree with these changes.  You make it sound like serializability is 
an additional condition on the serializable isolation level on top of the 
no-phantom-reads condition.  I think that is not true, both mathematically 
and from the wording of the SQL standard.  It is an equivalent condition or a 
consequence, depending on how you view it.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2009-01-03 Thread Peter Eisentraut
On Tuesday 30 December 2008 19:28:01 Kevin Grittner wrote:
 Here we go.  I've labeled the transactions consistently with new
 thread I started trying to characterize the full scope of issues and
 workarounds.

OK, I believe it now. :-)

The missing predicate locking is again at fault here.  Because this ...

 -- connection 1 (start of T0)
 start transaction isolation level serializable;
 insert into receipt values (3, (select deposit_date from ctl where k =
 'receipt'), 4.00);

... would lock ctl where k = 'receipt' ...

 -- connection 2 (T1)
 start transaction isolation level serializable;
 update ctl set deposit_date = date '2008-12-23' where k = 'receipt';
 commit transaction;

... and then this would have to wait ...

 -- connection 2 (TN version 2)
 start transaction isolation level serializable;
 insert into receipt_totals
   select r.deposit_date, c.deposit_date, sum(amount)
 from ctl c join receipt r
   on ( r.deposit_date  c.deposit_date
and not exists
(
  select * from receipt r2
where r2.deposit_date  c.deposit_date
  and r2.deposit_date  r.deposit_date
)
  )
 group by r.deposit_date, c.deposit_date;
 commit transaction;

... and this could never be executed before T0 commits.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2009-01-03 Thread Gregory Stark
Peter Eisentraut pete...@gmx.net writes:

 On Wednesday 31 December 2008 02:33:26 Kevin Grittner wrote:
 I'm still working on section Serializable Isolation versus True
 Serializability, but here are all the changes I can see which precede
 it.  Has the review of the SQL specs convinced everyone that this much
 is appropriate?

 I don't agree with these changes.  You make it sound like serializability is 
 an additional condition on the serializable isolation level on top of the 
 no-phantom-reads condition.  I think that is not true, both mathematically 
 and from the wording of the SQL standard.  It is an equivalent condition or a 
 consequence, depending on how you view it.

The standard explicitly says that the no-phantom-reads condition is a
consequence of the serializability constraint. Did you miss that whole
discussion this past week?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2009-01-03 Thread Peter Eisentraut
On Sunday 04 January 2009 00:53:51 Gregory Stark wrote:
 Peter Eisentraut pete...@gmx.net writes:
  On Wednesday 31 December 2008 02:33:26 Kevin Grittner wrote:
  I'm still working on section Serializable Isolation versus True
  Serializability, but here are all the changes I can see which precede
  it.  Has the review of the SQL specs convinced everyone that this much
  is appropriate?
 
  I don't agree with these changes.  You make it sound like serializability
  is an additional condition on the serializable isolation level on top of
  the no-phantom-reads condition.  I think that is not true, both
  mathematically and from the wording of the SQL standard.  It is an
  equivalent condition or a consequence, depending on how you view it.

 The standard explicitly says that the no-phantom-reads condition is a
 consequence of the serializability constraint. Did you miss that whole
 discussion this past week?

The SQL standard says it in about three different ways, and textbooks might 
define it an three more ways.  That still doesn't negate my concern with the 
way his patch phrases the issue.

A language lawyer might also point out that the note that contains 
the explicitness isn't actually part of the formal standard.  The only 
thing that the standard formally defines are the excluded phenomena.

More to the point, think about how a user might want to think about these 
issues.

The standard also requires that serializable transactions behave as though 
[...] --- User: The standard requires it, but is it also implemented?  
(Apparently not, but that is explained somewhere else.)

is a natural consequence of the fact --- There is nothing natural about any 
of this.  Why is it a consequence and how?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-30 Thread Peter Eisentraut

Greg Stark wrote:
And I don't see why you discard visibility as unimportant. All the 
transaction isolations are defined in terms of the results if the 
transactions. Those results include both the database state and the data 
returned by the queries. Otherwise phantom read is a meaningless concept.


Basically, if he wants to make a rigid argument that some scenario 
violates the serializability promise, then it is necessary to prove:


(1) There is no serial schedule for the set of transactions that 
achieves the same outcome.  (This proof is probably hard to work out, as 
many there is no proofs are.)


- or -

(2) A phantom read situation occurs.

His original argument uses terms like window where something is 
visible (to whom?), which can probably be transformed into a proof for 
(2), but is not convincing (to me) by itself.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-30 Thread Kevin Grittner
 Peter Eisentraut pete...@gmx.net wrote: 
 Greg Stark wrote:
 And I don't see why you discard visibility as unimportant. All
 the transaction isolations are defined in terms of the results if
 the transactions. Those results include both the database state and
 the data returned by the queries. Otherwise phantom read is a
 meaningless concept.
 
 Basically, if he wants to make a rigid argument that some scenario 
 violates the serializability promise, then it is necessary to prove:
 
 (1) There is no serial schedule for the set of transactions that 
 achieves the same outcome.
 
 - or -
 
 (2) A phantom read situation occurs.
 
Agreed, except that (2) is a subset of (1), so (1) alone is
sufficient.  (How can a read not be repeatable if there are no
concurrent transactions?)
 
I feel that I did provide a proof that the transactions couldn't
represent any serial execution in my original email.  This seems to be
disputed with the argument that a SELECT from a database is not
required to provide coherent data that represents some point in the
serializable stream of transactions.  I disagree, but as I pointed out
previously, it is trivial to capture the results of any such query
into a table and thereby persist the problem within the database.
 
Here we go.  I've labeled the transactions consistently with new
thread I started trying to characterize the full scope of issues and
workarounds.
 
-- setup
drop if exists ctl, receipt, receipt_totals;
create table ctl (k text not null primary key, deposit_date date not
null);
insert into ctl values ('receipt', date '2008-12-22');
create table receipt (receipt_no int not null primary key,
deposit_date date not null, amount numeric(13,2));
insert into receipt values (1, (select deposit_date from ctl where k =
'receipt'), 1.00);
insert into receipt values (2, (select deposit_date from ctl where k =
'receipt'), 2.00);
create table receipt_totals (deposit_date date not null primary key,
next_date date not null, deposit_total numeric(13,2) not null);

-- connection 1 (start of T0)
start transaction isolation level serializable;
insert into receipt values (3, (select deposit_date from ctl where k =
'receipt'), 4.00);

-- connection 2 (T1)
start transaction isolation level serializable;
update ctl set deposit_date = date '2008-12-23' where k = 'receipt';
commit transaction;

-- connection 2 (TN version 1)
start transaction isolation level serializable;
select * from ctl;
-- (deposit_date shows as 2008-12-23)
select * from receipt where deposit_date = date '2008-12-22';
-- (Only receipts 1 and 2 show for 2008-12-22.)
commit transaction;

-- connection 2 (TN version 2)
start transaction isolation level serializable;
insert into receipt_totals
  select r.deposit_date, c.deposit_date, sum(amount)
from ctl c join receipt r
  on ( r.deposit_date  c.deposit_date
   and not exists
   (
 select * from receipt r2
   where r2.deposit_date  c.deposit_date
 and r2.deposit_date  r.deposit_date
   )
 )
group by r.deposit_date, c.deposit_date;
commit transaction;

-- connection 1 (end of T0)
commit transaction;
 
After all this is done, a select from receipt_totals shows:
 
 deposit_date | next_date  | deposit_total
--++---
 2008-12-22   | 2008-12-23 |  3.00
(1 row)
 
Here goes the proof, although I'm not going to be overly formal in the
language.
 
(1)  If these transactions were serialized, T0 must come before T1,
because T0 uses the ctl.deposit_date before it is updated by T1.

(2)  If these transactions were serialized, T1 must come before TN
(either version), because the TN transactions see the ctl.deposit_date
set by T1.
 
(3)  If these transactions were serialized, the TN transactions must
come before T0, since they don't see the row inserted by T0.
 
(4)  Since serialization requires that T0  T1  TN  T0 (comparing
time sequence) the transactions cannot be considered serialized.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-30 Thread Kevin Grittner
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: 
 If you have a concrete suggestion (= patch) for the documentation,
I'm 
 all ears.
 
I'm still working on section Serializable Isolation versus True
Serializability, but here are all the changes I can see which precede
it.  Has the review of the SQL specs convinced everyone that this much
is appropriate?
 
It also seems like the Data Consistency Checks at the Application
Level section could use a little more detail.  Since referential
integrity checks are so well understood, and don't work reliably under
snapshot serialization without explicit locks, perhaps that could be
added?
 
-Kevin
 



serializable-doc1.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-29 Thread Peter Eisentraut
On Tuesday 23 December 2008 16:51:03 Kevin Grittner wrote:
 If you look at the serializable queries in the original post for this
 thread, it's not hard to see that this standard is not met.  The
 insert of receipt 3 appears to happen before the update of the control
 record, since it has the old deposit date.  The transaction which
 selects from both tables sees the update to the control record, so it
 must come after that.  Yet it doesn't see the results of the first
 transaction.  There is no sequence of serial execution which is
 consistent with the behavior.

I am not sure yet whether or not your complaint is valid, but your arguments 
are not very rigid.

Serializability is not defined in terms of what is visible, but what the state 
of the database is.  If you can order the transactions without overlap so 
that the state of the database is the same as in your original schedule, the 
schedule is serializable.  It is not of concern what was visible in 
between.  You may, however, be able to transform that argument to proving 
that a phantom read is possible, which is how the SQL standard ultimately 
defines serializability.

Also note that discussing what is visible necessarily implies the existence of 
another transaction that does the reading, and that transaction does not 
appear to be defined in your arguments.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-29 Thread Kevin Grittner
 Peter Eisentraut pete...@gmx.net wrote:
 
 On Tuesday 23 December 2008 16:51:03 Kevin Grittner wrote:
 If you look at the serializable queries in the original post for
this
 thread, it's not hard to see that this standard is not met.  The
 insert of receipt 3 appears to happen before the update of the
control
 record, since it has the old deposit date.  The transaction which
 selects from both tables sees the update to the control record, so
it
 must come after that.  Yet it doesn't see the results of the first
 transaction.  There is no sequence of serial execution which is
 consistent with the behavior.
 
 I am not sure yet whether or not your complaint is valid, but your
arguments 
 are not very rigid.
 
 Serializability is not defined in terms of what is visible, but what
the 
 state of the database is.
 
I don't read the standard that way.
 
I'll repeat the relevant part of the standard:
 
The execution of concurrent SQL-transactions at isolation level
SERIALIZABLE is guaranteed to be serializable. A serializable
execution is defined to be an execution of the operations of
concurrently executing SQL-transactions that produces the same
effect as some serial execution of those same SQL-transactions. A
serial execution is one in which each SQL-transaction executes to
completion before the next SQL-transaction begins.
 
There is no end-to-end sequence in which the transactions listed in my
original email can be run that will produce the same effect that you
get within PostgreSQL, at least if you consider the results of a
SELECT statement to be significant.  (I do.)
 
 If you can order the transactions without overlap so 
 that the state of the database is the same as in your original
schedule, the 
 schedule is serializable.  It is not of concern what was visible in

 between.
 
On what do you base that assertion?
 
 You may, however, be able to transform that argument to proving 
 that a phantom read is possible, which is how the SQL standard
ultimately 
 defines serializability.
 
No it isn't.  The above quoted text is.
 
 Also note that discussing what is visible necessarily implies the
existence 
 of 
 another transaction that does the reading, and that transaction does
not 
 appear to be defined in your arguments.
 
It was this one, in my original post:
 
start transaction isolation level serializable ;
select * from ctl;
-- (deposit_date shows as 2008-12-23)
select * from receipt;
-- (Only receipts 1 and 2 show for 2008-12-22.)
commit;
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-29 Thread Kevin Grittner
 Peter Eisentraut pete...@gmx.net wrote:
 Serializability is not defined in terms of what is visible, but what
the 
 state of the database is.
 
A belated thought on this: It would be easy enough to add a
daily_receipt_total table to the example to capture the result of the
query, instead of assuming that it merely prints on the daily bank
deposit report.  That would certainly meet your test of ending with an
invalid database state after all the transactions commit.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-29 Thread Greg Stark

Sorry for top posting -- damn phone.

The standard defines all the *other* transaction isolations in terms  
of phantom reads, dirty reads, and, uh one other phenomenon. But it  
defines serializability literally, as the transactions having the same  
effect as ift they were run serially. It explicitly says the fact that  
phantom reads can't occur in serializable is only a consequence of tfe  
definition.


And I don't see why you discard visibility as unimportant. All the  
transaction isolations are defined in terms of the results if the  
transactions. Those results include both the database state and the  
data returned by the queries. Otherwise phantom read is a  
meaningless concept.


--
Greg


On 29 Dec 2008, at 07:20, Peter Eisentraut pete...@gmx.net wrote:


On Tuesday 23 December 2008 16:51:03 Kevin Grittner wrote:

If you look at the serializable queries in the original post for this
thread, it's not hard to see that this standard is not met.  The
insert of receipt 3 appears to happen before the update of the  
control

record, since it has the old deposit date.  The transaction which
selects from both tables sees the update to the control record, so it
must come after that.  Yet it doesn't see the results of the first
transaction.  There is no sequence of serial execution which is
consistent with the behavior.


I am not sure yet whether or not your complaint is valid, but your  
arguments

are not very rigid.

Serializability is not defined in terms of what is visible, but what  
the state
of the database is.  If you can order the transactions without  
overlap so
that the state of the database is the same as in your original  
schedule, the

schedule is serializable.  It is not of concern what was visible in
between.  You may, however, be able to transform that argument to  
proving
that a phantom read is possible, which is how the SQL standard  
ultimately

defines serializability.

Also note that discussing what is visible necessarily implies the  
existence of
another transaction that does the reading, and that transaction does  
not

appear to be defined in your arguments.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-26 Thread Kevin Grittner
 Emmanuel Cecchet m...@frogthinker.org wrote: 
 There was an interesting comment by Oracle folks: Oracle does 
 not provide serializability but only snapshot isolation, and still
users 
 prefer to 'downgrade' to read committed for better performance. The 
 Oracle guys experience seemed to indicate that there was no need for

 serializability (well, that's also less work for them ;-)) in their 
 customer base.
 
Although I note that, while this work was done in 2006 at the
University of Sidney, Mr. Cahill is now an employee of Oracle
Corporation
 
The interesting thing about this technique is that it works without
the kind of blocking which is the bane of other implementation
techniques.  Reads don't block writes or vice versa.  In my receipt
example, the fact that there is a transitory state where the view of
the data would not be consistent would cause no problems unless
someone actually tried to view the data within that (probably very
small) window of time.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-24 Thread Kevin Grittner
 Emmanuel Cecchet m...@frogthinker.org 12/22/08 11:42 PM 
 If you want to know how to build SERIALIZABLE with a database that 
 provides SI (Snapshot Isolation), read 
 http://portal.acm.org/citation.cfm?doid=1376616.137669
 
The link didn't seem to work for me, but I think I found the article
you meant:  Serializable Isolation for Snapshot Databases
by Michael J. Cahill, et al

An interesting work.  If nothing else, it will help flesh out the
documentation of anomalies.  If the PostgreSQL community ever
does want to better approach true serializable behavior, this
should provide a good theoretical basis.
 
Thanks very much for the cite.
 
 I still don't 
 get why people would use SERIALIZABLE since there is no efficient 
 implementation of it.
 
Read the last paragraph of the secion 1 (Introduction)
of the article you just cited.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-24 Thread Emmanuel Cecchet

Hi Kevin,


The link didn't seem to work for me, but I think I found the article
you meant:  Serializable Isolation for Snapshot Databases
by Michael J. Cahill, et al

An interesting work.  If nothing else, it will help flesh out the
documentation of anomalies.  If the PostgreSQL community ever
does want to better approach true serializable behavior, this
should provide a good theoretical basis.
  

Sorry for the broken link. Yes this is the paper.
Note that the paper was not necessarily enthusiastically received by the 
community when presented at the conference. While this is an interesting 
academic paper, it's practicality left a majority of the audience 
perplex. There was an interesting comment by Oracle folks: Oracle does 
not provide serializability but only snapshot isolation, and still users 
prefer to 'downgrade' to read committed for better performance. The 
Oracle guys experience seemed to indicate that there was no need for 
serializability (well, that's also less work for them ;-)) in their 
customer base.
Having both a foot in academia and in industry, I understand the 
intellectual interest for serializability on the academic side, but I 
still have not seen a single use case in industry where this was a 
requirement (but my db experience is probably narrow).


Have nice serializable holidays ;-)
manu

--
Emmanuel Cecchet
FTO @ Frog Thinker 
Open Source Development  Consulting

--
Web: http://www.frogthinker.org
email: m...@frogthinker.org
Skype: emmanuel_cecchet


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-24 Thread Ron Mayer

Robert Haas wrote:

... serializable transaction ...

If we were to construct a database that had one giant lock for the
entire database so that only a single query could execute at one time,
transactions would be serializable (because they'd in fact be
serialized).  However, performance would suck.  


I wonder if this giant-lock-for-isolation-level-serializable
is a mode postgres should support.  ISTM it would meet the
letter of the spec, and at least some of the people using
transaction isolation level serializable are doing so precisely
because they *want* the database to deal with all possible
serialization issues, and accepting performance penalties.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-24 Thread Robert Haas
On Dec 24, 2008, at 6:46 PM, Ron Mayer rm...@cheapcomplexdevices.com  
wrote:



Robert Haas wrote:

... serializable transaction ...

If we were to construct a database that had one giant lock for the
entire database so that only a single query could execute at one  
time,

transactions would be serializable (because they'd in fact be
serialized).  However, performance would suck.


I wonder if this giant-lock-for-isolation-level-serializable
is a mode postgres should support.  ISTM it would meet the
letter of the spec, and at least some of the people using
transaction isolation level serializable are doing so precisely
because they *want* the database to deal with all possible
serialization issues, and accepting performance penalties.


No. :-)

Cheers,

...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-23 Thread Kevin Grittner
 Tom Lane t...@sss.pgh.pa.us wrote: 
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 At this point, SERIALIZABLE transactions appear to have worked,
with
 receipt 3 happening before the update of deposit_date; however,
there
 was a window of time when the update to deposit_date was visible
and
 receipt 3 was not.
  
 This absolutely can't happen in a standard-compliant
implementation.
 
 I think you mean you'd like to believe that can't happen in a
 standard-compliant implementation.
 
The execution of concurrent SQL-transactions at isolation level
SERIALIZABLE is guaranteed to be serializable. A serializable
execution is defined to be an execution of the operations of
concurrently executing SQL-transactions that produces the same
effect as some serial execution of those same SQL-transactions. A
serial execution is one in which each SQL-transaction executes to
completion before the next SQL-transaction begins.
 
If you look at the serializable queries in the original post for this
thread, it's not hard to see that this standard is not met.  The
insert of receipt 3 appears to happen before the update of the control
record, since it has the old deposit date.  The transaction which
selects from both tables sees the update to the control record, so it
must come after that.  Yet it doesn't see the results of the first
transaction.  There is no sequence of serial execution which is
consistent with the behavior.
 
Perhaps you are suggesting that it's not possible in a practical sense
for a DBMS to meet this standard?  See below.
 
 It doesn't include any of the
 specific behaviors that are forbidden by the spec, though, so I'm
less
 than convinced.
 
Following the diagram of specific behaviors allowed at each
level is good evidence that these phenomena don't *define*
serializable:
 
NOTE 53 * The exclusion of these phenomena for SQL-transactions
executing at isolation level SERIALIZABLE is a consequence of the
requirement that such transactions be serializable.
 
 An appropriate way to prevent the problem is probably for the
 transaction that changes the deposit_date to take out a
write-excluding
 lock on the receipts table before it does so.
 
Well, a serializable transaction operating to standard would probably
take out a write-excluding lock on the control table row when it is
read.  This would block the update to the control table until
completion of any pending receipts on the old date.  The blocked
update would then take out a read-excluding lock before updating the
row, which would then block receipt transactions looking to check the
date.  As soon as the update to the control table is committed, you
can see the new date, and you can have confidence that the old date's
receipts will all show on a SELECT.
 
Again, I'm not suggesting that we change the behavior of PostgreSQL to
match this, but that we document the difference so that those looking
to switch to PostgreSQL from databases which provide true serializable
transactions know that they have to explicitly lock rows to maintain a
coherent view of the data.  In the queries shown in the original post,
the INSERT of the receipts could read the deposit date FOR SHARE to
provide the equivalent functionality, although they would have to be
rejiggered a bit, since that isn't allowed in subqueries.
 
This isn't some hypothetical maybe some day some product might
implement this, but it'll never catch on sort of thing -- Microsoft
and Sybase SQL Server had this from version 1.  I used it from 1990
until the conversion to PostgreSQL over the last couple years. 
Serializable transactions worked as advertised.  Was there more
blocking than PostgreSQL?  Yes.  Were there more deadlocks than
PostgreSQL?  Yes.  Did it impact performance?  Well, PostgreSQL is
faster by enough that users commented that the applications seemed
snappier when we switched the database underneath them from Sybase
to PostgreSQL.
 
I'm going on second-hand information here, but I'm told that IBM DB2
has used similar techniques to provide true serializable transactions
for even longer.
 
I'm somewhat mystified at the reaction this topic gets here.  :-/
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-23 Thread Emmanuel Cecchet

Kevin Grittner wrote:

This isn't some hypothetical maybe some day some product might
implement this, but it'll never catch on sort of thing -- Microsoft
and Sybase SQL Server had this from version 1.  I used it from 1990
until the conversion to PostgreSQL over the last couple years. 
  
Have you ever used serializable transactions with Sybase? The locking is 
actually based on memory-pages and you end-up with deadlocks if you 
don't pad your data structures to prevent false sharing. Oracle also 
provides SI like Postgres and I don't think they are doing that bad.

I'm going on second-hand information here, but I'm told that IBM DB2
has used similar techniques to provide true serializable transactions
for even longer.
 
I'm somewhat mystified at the reaction this topic gets here.  :-
I am somewhat mystified by the interest some people still have in 
serializable transactions. Why don't users program the application to 
deal with a lower isolation (actually I think they do)?


But I am probably missing the point which was to fix the doc?
Emmanuel

--
Emmanuel Cecchet
FTO @ Frog Thinker 
Open Source Development  Consulting

--
Web: http://www.frogthinker.org
email: m...@frogthinker.org
Skype: emmanuel_cecchet


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-23 Thread Kevin Grittner
 Emmanuel Cecchet m...@frogthinker.org 12/23/08 8:59 AM  
 Kevin Grittner wrote:
 This isn't some hypothetical maybe some day some product might
 implement this, but it'll never catch on sort of thing --
Microsoft
 and Sybase SQL Server had this from version 1.  I used it from 1990
 until the conversion to PostgreSQL over the last couple years. 

 Have you ever used serializable transactions with Sybase?
 
Every day for over 15 years.
 
 The locking is 
 actually based on memory-pages and you end-up with deadlocks if you 
 don't pad your data structures to prevent false sharing.
 
We only padded one table, which we were using to assign sequence
numbers.  Eventually they did come out with support for row level
locking, which could be chosen on a table by table basis, which
eliminated the need for the padding columns.  We didn't go to
row-level locking for all tables because the performance hit was
significant -- worse than the blocking.  Blocking was occasionally an
issue.  Deadlocks were initially a problem, but they can be handled
automatically with resubmit, which we eventually covered gracefully,
and they weren't an issue for us after that.
 
 Oracle also 
 provides SI like Postgres and I don't think they are doing that bad.
 
I don't quire understand.  Could you clarify?
  
 I'm going on second-hand information here, but I'm told that IBM
DB2
 has used similar techniques to provide true serializable
transactions
 for even longer.

 I'm somewhat mystified at the reaction this topic gets here.  :-
 I am somewhat mystified by the interest some people still have in 
 serializable transactions. Why don't users program the application to

 deal with a lower isolation (actually I think they do)?
 
There really are good reasons.  I'm not up to going through that now,
but if there is genuine interest in the topic perhaps I can follow up
later.
 
 But I am probably missing the point which was to fix the doc?
 
Thank you!
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-23 Thread Heikki Linnakangas

Kevin Grittner wrote:
Emmanuel Cecchet m...@frogthinker.org 12/23/08 8:59 AM  
I am somewhat mystified by the interest some people still have in 
serializable transactions. Why don't users program the application to



deal with a lower isolation (actually I think they do)?
 
There really are good reasons.  I'm not up to going through that now,

but if there is genuine interest in the topic perhaps I can follow up
later.


Well, the reason why people rely on isolation provided by database in 
general is to make it easier to develop applications. One less thing to 
worry about. That's why people use RDBMS, transactions, etc. to begin with.



But I am probably missing the point which was to fix the doc?
 
Thank you!


If you have a concrete suggestion (= patch) for the documentation, I'm 
all ears.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-23 Thread Kevin Grittner
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com 12/23/08
9:47 AM  
 Kevin Grittner wrote:
 Emmanuel Cecchet m...@frogthinker.org 12/23/08 8:59 AM  
 Why don't users program the application to
 deal with a lower isolation (actually I think they do)?
  
 There really are good reasons.  I'm not up to going through that
now,
 but if there is genuine interest in the topic perhaps I can follow
up
 later.
 
 Well, the reason why people rely on isolation provided by database in

 general is to make it easier to develop applications. One less thing
to 
 worry about. That's why people use RDBMS, transactions, etc. to begin
with.
 
This is the heart of it.  The argument is getting made that you don't
need serializable transactions because you can duplicate the
concurrency control with a combination of read_committed transactions
and explicit lock requests.  I could also duplicate the functionality
of a DBMS with a bunch of disk files and custom, hard-coded access
logic, but I sure wouldn't want to do so.
 
 If you have a concrete suggestion (= patch) for the documentation,
I'm 
 all ears.
 
Well, I figured I should try to get a consensus here before submitting
a patch.  Last time I tried submitting a simple patch to remove the
line about the PostgreSQL community not knowing about any other
databases which use predicate locking, I got shot down hard.
 
Does the phantom receipt example from the original post seem like a
reasonable thing to include in the doc patch?  Does someone have a
more comprehensive summary of where the issues are than I've been able
to generate so far?
 
The patch should also include information on workarounds.  I'd be
inclined to frame them in terms of how to duplicate the behavior of
serializable transactions on other databases from which the reader
might be converting, which would start with using SELECT FOR SHARE for
any SELECT within a serializable transaction.  That doesn't cover
everything, because, unlike Sybase, et al, it doesn't block
conflicting INSERTs.  I'm not sure I have a firm conceptual grasp on
when and how that renders the SELECT FOR SHARE technique incomplete,
but I could try to think it through.  If someone already has
comprehensive guidelines on that, it would save time and might improve
the quality of the docs.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-23 Thread Gregory Stark
Kevin Grittner kevin.gritt...@wicourts.gov writes:

 Emmanuel Cecchet m...@frogthinker.org 12/23/08 8:59 AM  
 Have you ever used serializable transactions with Sybase?
  
 Every day for over 15 years.

Afaict doing a few google searches Sybase doesn't do predicate locking either.
It would very much surprise me if they did since they've always had the most
primitive locking infrastructure of the three major databases. Locking records
or pages isn't going to provide true standards-compliant serializable
transactions in the way you're describing.

Predicate locking means being able to lock records which don't actually exist
yet. Ie, locking all records WHERE COLUMN=0 even if there are no such
records. This has to block any other transaction from inserting such a record
or updating another record to set COLUMN to 0.

 Oracle also provides SI like Postgres and I don't think they are doing that
 bad.
  
 I don't quire understand.  Could you clarify?

The point is Oracle doesn't provide this kind of true serializable isolation
and people still find it useful. In fact Sybase and DB2 also don't provide
true serializable transactions -- nobody does. It's a fantasy.

 There really are good reasons.  I'm not up to going through that now,
 but if there is genuine interest in the topic perhaps I can follow up
 later.

I suppose I'm curious whether you're mistaken and your app isn't safe on
Sybase because it's depending on truly serializable transactions and Sybase
isn't doing that, or if you have examples of transactions which Sybase
provides proper serialized semantics for but Postgres doesn't.

 But I am probably missing the point which was to fix the doc?

But missing the point and having pointless arguments is so much more fun than
documentation writing :)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-23 Thread Kevin Grittner
 Gregory Stark st...@enterprisedb.com wrote: 
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Emmanuel Cecchet m...@frogthinker.org 12/23/08 8:59 AM  
 Have you ever used serializable transactions with Sybase?
  
 Every day for over 15 years.
 
 Afaict doing a few google searches Sybase doesn't do predicate
locking 
 either.
 It would very much surprise me if they did since they've always had
the most
 primitive locking infrastructure of the three major databases.
Locking 
 records
 or pages isn't going to provide true standards-compliant
serializable
 transactions in the way you're describing.
 
 Predicate locking means being able to lock records which don't
actually 
 exist
 yet. Ie, locking all records WHERE COLUMN=0 even if there are no
such
 records. This has to block any other transaction from inserting such
a 
 record
 or updating another record to set COLUMN to 0.
 
The page locking provides this because every index page or data page
the serializable transaction looks at is locked against updates until
the end of the transaction.  If it can see all the COLUMN=0 rows
through an index, the index locks protect the transaction.  If a table
scan is required, the entire table is locked against all
modifications.  (That's right, it is not unusual to have entire tables
locked against any modification until the end of a database
transaction.)
 
 Oracle also provides SI like Postgres and I don't think they are
doing that
 bad.
  
 I don't quire understand.  Could you clarify?
 
 The point is Oracle doesn't provide this kind of true serializable
isolation
 and people still find it useful.
 
Sure, and I find PostgreSQL useful.  I'm not proposing to change it.
 
 In fact Sybase and DB2 also don't provide
 true serializable transactions -- nobody does. It's a fantasy.
 
They do.  They have for over 15 years.  If people will read it, I'll
try to find the a web page where they give all the details of the
strategy.
 
 There really are good reasons.  I'm not up to going through that
now,
 but if there is genuine interest in the topic perhaps I can follow
up
 later.
 
 I suppose I'm curious whether you're mistaken and your app isn't safe
on
 Sybase because it's depending on truly serializable transactions and
Sybase
 isn't doing that, or if you have examples of transactions which
Sybase
 provides proper serialized semantics for but Postgres doesn't.
 
All the examples provided in this thread would be handled by Sybase
with proper serializable semantics.  When I proposed changing the docs
to omit the reference to our lack of knowledge about other database
products, there was a full example of code that didn't serialize
according to the mathematical definition.  I cut and pasted into
Sybase and provided the results -- a deadlock.
 
Can you provide any example or logical explanation of where the
technique I outline above (locking against modification for every
index and data page read during the transaction until the end of the
transaction) would NOT provide true serializable behavior?  (Keep in
mind that that's the broad stroke overview -- the full details include
various lock escalation techniques, etc.)
 
 But I am probably missing the point which was to fix the doc?
 
 But missing the point and having pointless arguments is so much more
fun 
 than
 documentation writing :)
 
Frankly, I prefer other sports.   :-(
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-23 Thread Gregory Stark

Reading the spec it does seem our initial statement The SQL standard defines
four levels of transaction isolation in terms of three phenomena that must be
prevented between concurrent transactions is not accurate.

The spec defines the first three modes in terms of P1,P2,P3 but serializable
is defined, as Kevin describes, as literally serializable. It is included in
the table but with the note:

  NOTE 71 — The exclusion of these phenomena for SQL-transactions executing at
  isolation level SERIALIZABLE is a consequence of the requirement that such
  transactions be serializable.

So it's clear that that's not the definition. Excluding P1,P2,P3 is necessary
but not sufficient to meet the spec's definition of Serializable.



Kevin Grittner kevin.gritt...@wicourts.gov writes:

 Gregory Stark st...@enterprisedb.com wrote: 
 Afaict doing a few google searches Sybase doesn't do predicate locking 
 either.
  
 The page locking provides this because every index page or data page
 the serializable transaction looks at is locked against updates until
 the end of the transaction.  If it can see all the COLUMN=0 rows
 through an index, the index locks protect the transaction.  If a table
 scan is required, the entire table is locked against all
 modifications.  (That's right, it is not unusual to have entire tables
 locked against any modification until the end of a database
 transaction.)


Ah, so they don't actually use the term predicate locking which is why my
google-fu was inadequate. I'm not sure if this is technically predicate
locking or not. It sounds like it locks a whole lot more than just the
predicate.

 All the examples provided in this thread would be handled by Sybase
 with proper serializable semantics.  When I proposed changing the docs
 to omit the reference to our lack of knowledge about other database
 products, there was a full example of code that didn't serialize
 according to the mathematical definition.  I cut and pasted into
 Sybase and provided the results -- a deadlock.
  
 Can you provide any example or logical explanation of where the
 technique I outline above (locking against modification for every
 index and data page read during the transaction until the end of the
 transaction) would NOT provide true serializable behavior?  (Keep in
 mind that that's the broad stroke overview -- the full details include
 various lock escalation techniques, etc.)

I imagine they preemptively escalate to locking the table if you're going to
do a sequential scan? Otherwise an inserter might insert on a page you haven't
read yet (and therefore haven't locked yet)?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-23 Thread Stephan Szabo

On Tue, 23 Dec 2008, Kevin Grittner wrote:

 The page locking provides this because every index page or data page
 the serializable transaction looks at is locked against updates until
 the end of the transaction.  If it can see all the COLUMN=0 rows
 through an index, the index locks protect the transaction.  If a table
 scan is required, the entire table is locked against all
 modifications.  (That's right, it is not unusual to have entire tables
 locked against any modification until the end of a database
 transaction.)

Well, predicate locking for serializable also should only lock the
appropriate conditions. Getting a deadlock between two serializable
transactions for conditions that can be serialized would seemingly also be
disallowed by the definition of serializable since there would exist no
serial ordering of the transactions that has that effect.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-23 Thread Kevin Grittner
 Gregory Stark st...@enterprisedb.com wrote: 
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Gregory Stark st...@enterprisedb.com wrote: 
 Afaict doing a few google searches Sybase doesn't do predicate
locking 
 either.
  
 The page locking provides this because every index page or data
page
 the serializable transaction looks at is locked against updates
until
 the end of the transaction.  If it can see all the COLUMN=0 rows
 through an index, the index locks protect the transaction.  If a
table
 scan is required, the entire table is locked against all
 modifications.  (That's right, it is not unusual to have entire
tables
 locked against any modification until the end of a database
 transaction.)
 
 Ah, so they don't actually use the term predicate locking which is
why my
 google-fu was inadequate. I'm not sure if this is technically
predicate
 locking or not. It sounds like it locks a whole lot more than just
the
 predicate.
 
Well, I'm not sure whether it is or not; it's a matter of definition. 
If predicate locking is required for true serializable transactions,
and this provides true serializable transactions, it must be, eh? 
Also, an argument could be made that if it locks every page which must
be viewed for execution based on the search predicates, it is doing
predicate locking -- if only indirectly.
 
 All the examples provided in this thread would be handled by Sybase
 with proper serializable semantics.  When I proposed changing the
docs
 to omit the reference to our lack of knowledge about other database
 products, there was a full example of code that didn't serialize
 according to the mathematical definition.  I cut and pasted into
 Sybase and provided the results -- a deadlock.
  
 Can you provide any example or logical explanation of where the
 technique I outline above (locking against modification for every
 index and data page read during the transaction until the end of
the
 transaction) would NOT provide true serializable behavior?  (Keep
in
 mind that that's the broad stroke overview -- the full details
include
 various lock escalation techniques, etc.)
 
 I imagine they preemptively escalate to locking the table if you're
going to
 do a sequential scan? Otherwise an inserter might insert on a page
you 
 haven't
 read yet (and therefore haven't locked yet)?
 
I believe they do go straight to the table lock for a table scan, but
it isn't necessary for full semantics that the transaction lock all
pages in advance.  For most purposes the serializable transaction can
proceed to lock pages as it gets to them.  It will block or deadlock
if a conflict arises.  The transaction may serialize behind a
transaction which started later and read some page it hadn't gotten to
yet, but that doesn't violate the spec or cause any anomalies.  The
key phrase in the spec here is produces the same effect as *some*
serial execution [emphasis added].
 
It will escalate from page locks to a table lock if a (configurable)
number or percentage of a table's pages get locked.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-23 Thread Kevin Grittner
 Stephan Szabo ssz...@megazone.bigpanda.com wrote: 
 On Tue, 23 Dec 2008, Kevin Grittner wrote:
 
 The page locking provides this because every index page or data
page
 the serializable transaction looks at is locked against updates
until
 the end of the transaction.  If it can see all the COLUMN=0 rows
 through an index, the index locks protect the transaction.  If a
table
 scan is required, the entire table is locked against all
 modifications.  (That's right, it is not unusual to have entire
tables
 locked against any modification until the end of a database
 transaction.)
 
 Well, predicate locking for serializable also should only lock the
 appropriate conditions. Getting a deadlock between two serializable
 transactions for conditions that can be serialized would seemingly
also be
 disallowed by the definition of serializable since there would exist
no
 serial ordering of the transactions that has that effect.
 
Clever.  If a DBMS is capable of reporting that it was unable to
serialize a transaction in a situation where there is a logical
possibility that a different implementation might have succeeded, it
hasn't implemented true serializable transactions?  That strikes me as
being on a level with Codd's assertion that any database which takes
two different statements which can be proven to be logically identical
and runs one with a different plan than the other should not be called
relational.  (i.e., possibly true from a technical perspective, but
hardly useful in the real world.)
 
Perhaps it would be best to accept this as proof that there is no
current DBMS implementing true serializable transactions and move on
to the issue of documenting real difference between PostgreSQL other
products which come closer to compliance, so that real people
converting from those products don't suffer real production problems. 
As I see it, that's what matters.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-23 Thread Jeff Davis
On Tue, 2008-12-23 at 00:42 -0500, Emmanuel Cecchet wrote:
 I still don't 
 get why people would use SERIALIZABLE since there is no efficient 
 implementation of it.

If they need true serializable transactions, and don't care about
efficiency.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-23 Thread Robert Haas
 The page locking provides this because every index page or data page
 the serializable transaction looks at is locked against updates until
 the end of the transaction.  If it can see all the COLUMN=0 rows
 through an index, the index locks protect the transaction.  If a table
 scan is required, the entire table is locked against all
 modifications.  (That's right, it is not unusual to have entire tables
 locked against any modification until the end of a database
 transaction.)

You've mentioned a couple of times that you're surprised by the
reaction of the community to your proposed documentation changes.  I
have (a more muted version of) the same reaction as several previous
posters, and I think in the context of this paragraph I can explain
why.

If we were to construct a database that had one giant lock for the
entire database so that only a single query could execute at one time,
transactions would be serializable (because they'd in fact be
serialized).  However, performance would suck.  Therefore, database
developers and researchers have spent the last twenty (or more?) years
trying to come up with ways to allow multiple transactions to execute
in parallel while maintaining the appearance of serialization.
They've done this by introducing lower level locks: table-level,
page-level, row-level.  For most users, the artifacts that have been
introduced by these fine-grained locks are outweighed by the
performance benefits of greater concurrency - but, as you point out,
not necessarily always.

I don't think I can overstate the extent to which fine-grained locking
is viewed as a good thing.  It wouldn't surprise me to find out that
the locking behavior that you're relying on in Sybase is one which
some group of Sybase developers (if there still are any) are busily
laboring to eliminate.  I think you can see this reflected in Greg
Stark's comment as well: It would very much surprise me if they did
since they've always had the most primitive locking infrastructure of
the three major databases.

With respect to predicate locking, what you're describing is NOT
predicate locking.  It's coarse-grained locking that largely or
completely obviates the need for predicate locking by greatly reducing
concurrency.  Now, from the point of view of the application developer
who needs very strong serializability guarantees but doesn't care
about concurrency, that's six of one, half a dozen of the other, but
to me that's the opposite of the typical situation.  Maybe our
documentation could say something along the lines of PostgreSQL's
MVCC framework and row-level locking permit a greater degree of
concurrency than some other databases.  Even when the transaction
isolation level is set to serializable, serialization anomalies can
occur in the following situations.  When it is important to prevent
these anomalies, explicit row-level or table-level locking can be used
at the expense of reduced concurrency.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-23 Thread Simon Riggs

On Tue, 2008-12-23 at 10:10 -0600, Kevin Grittner wrote:

 Well, I figured I should try to get a consensus here before submitting
 a patch.  Last time I tried submitting a simple patch to remove the
 line about the PostgreSQL community not knowing about any other
 databases which use predicate locking, I got shot down hard.

The docs got changed though.

I think the current docs make too much of a deal about how hard it is to
do predicate locking in databases. Most RDBMS use predicate locking via
indexes, ie the locking happens in the index. One might also argue that
it is potentially more efficient design, as TPC-C shows, though such
cases of application scalability are rare in the extreme and the utility
of MVCC is by far the best general approach in terms of ease of use and
performance.

The example in the docs is not a realistic example, so your new one is
useful.

I would want you to update it though to show how use of row level locks
can be used to enforce correct behaviour when required, so provide a
problem and its solution. It will b useful for people moving from
systems like Sybase that use locking often fall foul of the *lack* of
locking in MVCC and write programs that won't work correctly as a
result.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-23 Thread Kevin Grittner
 Robert Haas robertmh...@gmail.com wrote: 
 
 For most users, the artifacts that have been
 introduced by these fine-grained locks are outweighed by the
 performance benefits of greater concurrency - but, as you point out,
 not necessarily always.
 
That's what I don't understand.  I never did point that out.  I never
suggested it.  I never requested a change to the software.  I just
suggested that we document the artifacts.
 
Ah, well; thanks for the feedback.
 
 With respect to predicate locking, what you're describing is NOT
 predicate locking.
 
I never would have considered it to be such except for some people
insisting that a true serializable implementation is impossible
without predicate locking.  Either that assertion is false or this is
a form of predicate locking, crude as it might be.
 
 Maybe our
 documentation could say something along the lines of PostgreSQL's
 MVCC framework and row-level locking permit a greater degree of
 concurrency than some other databases.  Even when the transaction
 isolation level is set to serializable, serialization anomalies can
 occur in the following situations.  When it is important to prevent
 these anomalies, explicit row-level or table-level locking can be
used
 at the expense of reduced concurrency.
 
That's responsive to my concern and nicely worded.  Thanks.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-23 Thread Kevin Grittner
 Simon Riggs si...@2ndquadrant.com wrote: 

 On Tue, 2008-12-23 at 10:10 -0600, Kevin Grittner wrote:
 
 I think the current docs make too much of a deal about how hard it is
to
 do predicate locking in databases. Most RDBMS use predicate locking
via
 indexes, ie the locking happens in the index. One might also argue
that
 it is potentially more efficient design, as TPC-C shows, though such
 cases of application scalability are rare in the extreme and the
utility
 of MVCC is by far the best general approach in terms of ease of use
and
 performance.
 
 The example in the docs is not a realistic example, so your new one
is
 useful.
 
The one currently in the docs is the simplest case I can see of an
anomaly which leaves the database in the wrong state after all the
commits go through.  I think it should probably be kept for that
reason.  The out-of-sequence appearance of changes before all commits
happen seems much more likely to bite people, though.  Agreed?
 
 I would want you to update it though to show how use of row level
locks
 can be used to enforce correct behaviour when required, so provide a
 problem and its solution. It will b useful for people moving from
 systems like Sybase that use locking often fall foul of the *lack*
of
 locking in MVCC and write programs that won't work correctly as a
 result.
 
Understood and agreed.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-23 Thread Emmanuel Cecchet

Jeff Davis wrote:

On Tue, 2008-12-23 at 00:42 -0500, Emmanuel Cecchet wrote:
  
I still don't 
get why people would use SERIALIZABLE since there is no efficient 
implementation of it.



If they need true serializable transactions, and don't care about
efficiency.
  
Is there such people who really understand the thing and don't complain 
about the performance? ;-)


Happy holidays,
Emmanuel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-22 Thread Martijn van Oosterhout
On Mon, Dec 22, 2008 at 11:00:53AM -0600, Kevin Grittner wrote:
 As I've understood limitations of the PostgreSQL implementation of
 SERIALIZABLE transactions, at least the only example given in the
 documentation, revolve around a rather unlikely situation:
  
 Given concurrent transactions T1 and T2 and non-overlapping sets of
 data A and B, T1 reads data including A and uses the data to modify B
 while T2 reads data including B and uses that data to modify A, where
 the modifications performed by either would affect the modifications
 made by the other, if they were visible.

In so far as the modifications are just INSERTs (no UPDATEs or
DELETEs), yes. This case is covered in the documentation.

 Imagine, as an example, a system which involves recording receipts,
 each of which must go into a daily deposit.  There is a control table
 with one row containing the current deposit date for receipts. 
 Somewhere mid-afternoon that date is updated, all subsequent receipts
 fall into the new day, and a report is run listing the receipts for the
 day and giving the deposit total.

This is a variation of the above and has the same proper solution:
predicate locking. However, in this case the records in question are
already present so you can workaround it easily. First do a SELECT FOR
UPDATE on all the records you want to update. This will serialize all
parallel transactions to either before or after you. Then do your
update.

 This absolutely can't happen in a standard-compliant implementation.
 At a minimum, this window where visible data lacks coherency should be
 noted in the documentation.  I don't know if there's any way to fix
 this without killing performance.

Predicate locking is nasty and we don't try. I'm not sure if anybody
else does.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature


Re: [HACKERS] incoherent view of serializable transactions

2008-12-22 Thread Kevin Grittner
 Martijn van Oosterhout klep...@svana.org wrote:
 On Mon, Dec 22, 2008 at 11:00:53AM -0600, Kevin Grittner wrote:
 As I've understood limitations of the PostgreSQL implementation of
 SERIALIZABLE transactions, at least the only example given in the
 documentation, revolve around a rather unlikely situation:
  
 Given concurrent transactions T1 and T2 and non-overlapping sets of
 data A and B, T1 reads data including A and uses the data to modify
B
 while T2 reads data including B and uses that data to modify A,
where
 the modifications performed by either would affect the
modifications
 made by the other, if they were visible.

 In so far as the modifications are just INSERTs (no UPDATEs or
 DELETEs), yes. This case is covered in the documentation.
 
Let's not understate the scope of the issue.
UPDATEs and DELETEs count, too.  For example:
 
-- connection 1
cir= create table mytab (class int not null, value int not null);
CREATE TABLE
cir= copy mytab from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
 110
 120
 2100
 2200
 \.
cir= start transaction isolation level serializable;
START TRANSACTION
cir= update mytab set value = (select sum(value) from mytab where
class = 2) where class = 1 and value = 10;
UPDATE 1
 
-- connection 2
cir= start transaction isolation level serializable;
START TRANSACTION
cir= update mytab set value = (select sum(value) from mytab where
class = 1) where class = 2 and value = 100;
UPDATE 1
cir= commit transaction;
COMMIT
 
-- connection 1
cir= commit transaction;
COMMIT
cir= select * from mytab;
 class | value
---+---
 1 |20
 2 |   200
 1 |   300
 2 |30
(4 rows)
 
 Imagine, as an example, a system which involves recording receipts,
 each of which must go into a daily deposit.  There is a control
table
 with one row containing the current deposit date for receipts. 
 Somewhere mid-afternoon that date is updated, all subsequent
receipts
 fall into the new day, and a report is run listing the receipts for
the
 day and giving the deposit total.

 This is a variation of the above and has the same proper solution:
 predicate locking. However, in this case the records in question are
 already present so you can workaround it easily. First do a SELECT
FOR
 UPDATE on all the records you want to update. This will serialize
all
 parallel transactions to either before or after you. Then do your
 update.
 
My point isn't that there aren't workarounds, it is that people might
reasonably assume that SERIALIZABLE transactions provide sufficient
concurrency control for this, since the only example we give of a
problem is a rather contrived update anomaly.  The fact that even in
cases where the data settles into good form at commit leave windows
where race conditions could cause occasional bad results without extra
explicit locking is not obvious.
 
 This absolutely can't happen in a standard-compliant
implementation.
 At a minimum, this window where visible data lacks coherency should
be
 noted in the documentation.  I don't know if there's any way to fix
 this without killing performance.

 Predicate locking is nasty and we don't try. I'm not sure if anybody
 else does.
 
I know for a fact that Sybase ASE does.  I've heard from reasonably
reliable sources that DB2 does.  I know that Microsoft SQL Server did
for some time after the split from the Sybase code base, but I'm not
sure they've continued that; in fact there was a reference to
concurrency issues in wikipedia which implied that they no longer do.
 
The implementation is not pretty -- they do it by locking accessed
pages and insertion points, and in some cases entire tables.  It is so
ugly that at one point in discussing similar issues Tom said that it
couldn't really qualify as predicate locking, but in the face of the
fact that they have covered all the bases to provide true serializable
transactions, and that theory says that only predicate locking can do
that, he conceded that it was predicate locking -- but really ugly and
in a form he would never support.
 
Anyway, I didn't argue that we should provide truly serializable
transactions, just that we should provide a less contrived example of
where the PostgreSQL implementation can show anomalies, so that people
don't get burned through a false sense of security.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-22 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 At this point, SERIALIZABLE transactions appear to have worked, with
 receipt 3 happening before the update of deposit_date; however, there
 was a window of time when the update to deposit_date was visible and
 receipt 3 was not.
 
 This absolutely can't happen in a standard-compliant implementation.

I think you mean you'd like to believe that can't happen in a
standard-compliant implementation.  It doesn't include any of the
specific behaviors that are forbidden by the spec, though, so I'm less
than convinced.

An appropriate way to prevent the problem is probably for the
transaction that changes the deposit_date to take out a write-excluding
lock on the receipts table before it does so.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-22 Thread Emmanuel Cecchet

Kevin,

If you want to know how to build SERIALIZABLE with a database that 
provides SI (Snapshot Isolation), read 
http://portal.acm.org/citation.cfm?doid=1376616.137669
Note that in practice, READ COMMITTED is the most largely used isolation 
level and its limitations are relatively well understood by the average 
programmer that can program his application accordingly. I still don't 
get why people would use SERIALIZABLE since there is no efficient 
implementation of it.


My 2 cents.
Emmanuel


Kevin Grittner wrote:

As I've understood limitations of the PostgreSQL implementation of
SERIALIZABLE transactions, at least the only example given in the
documentation, revolve around a rather unlikely situation:
 
Given concurrent transactions T1 and T2 and non-overlapping sets of

data A and B, T1 reads data including A and uses the data to modify B
while T2 reads data including B and uses that data to modify A, where
the modifications performed by either would affect the modifications
made by the other, if they were visible.
 
For reasons I'll omit here, that scenario didn't worry me for my

current uses of PostgreSQL.
 
I've found another form of deviation from the standard SERIALIZABLE

behavior, though, which does worry me.  Although the above appears to be
the only situation where the end result after everything commits is
inconsistent with standard SERIALIZABLE behavior, the PostgreSQL
implementation allows transactions to view the data in states which
would never be possible during the application of the transactions in
series in the order they will appear to have been applied after the
commit.
 
Imagine, as an example, a system which involves recording receipts,

each of which must go into a daily deposit.  There is a control table
with one row containing the current deposit date for receipts. 
Somewhere mid-afternoon that date is updated, all subsequent receipts

fall into the new day, and a report is run listing the receipts for the
day and giving the deposit total.
 
Under a standard-compliant implementation of SERIALIZABLE, this is

straightforward: a transaction which is inserting a receipt selects the
deposit date to use in its transaction, and any SELECT of receipts for a
date prior to the current deposit date will see the accurate, final
data.  Under the PostgreSQL implementation, although data eventually
gets to a coherent state, there can be a window of time where a SELECT
can return an incomplete list of receipts for a date which appears to be
closed, even if all transactions for modifying and viewing data are
SERIALIZABLE.
 
-- setup

create table ctl (k text not null primary key, deposit_date date not
null);
insert into ctl values ('receipt', date '2008-12-22');
create table receipt (receipt_no int not null primary key, deposit_date
date not null, amount numeric(13,2));
insert into receipt values (1, (select deposit_date from ctl where k =
'receipt'), 1.00);
insert into receipt values (2, (select deposit_date from ctl where k =
'receipt'), 2.00);

-- connection 1
start transaction isolation level serializable ;
insert into receipt values (3, (select deposit_date from ctl where k =
'receipt'), 4.00);

-- connection 2
start transaction isolation level serializable ;
update ctl set deposit_date = date '2008-12-23' where k = 'receipt';
commit transaction;
start transaction isolation level serializable ;
select * from ctl;
-- (deposit_date shows as 2008-12-23)
select * from receipt;
-- (Only receipts 1 and 2 show for 2008-12-22.)
commit;

-- connection 1
commit transaction;

-- connection 2
start transaction isolation level serializable ;
select * from receipt;
-- (All receipts for the 2008-12-22 deposit date now show.)
commit transaction;
 
At this point, SERIALIZABLE transactions appear to have worked, with

receipt 3 happening before the update of deposit_date; however, there
was a window of time when the update to deposit_date was visible and
receipt 3 was not.
 
This absolutely can't happen in a standard-compliant implementation.

At a minimum, this window where visible data lacks coherency should be
noted in the documentation.  I don't know if there's any way to fix
this without killing performance.
 
-Kevin


  



--
Emmanuel Cecchet
FTO @ Frog Thinker 
Open Source Development  Consulting

--
Web: http://www.frogthinker.org
email: m...@frogthinker.org
Skype: emmanuel_cecchet


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers