An update: following the recommendations on this list, I ran a number of 
experiments:

- I ran with all foreign keys deleted. There was a 4% drop in the rate of 
deadlocks/transaction, which went from 0.32 per transaction to 0.31. So we 
still have pretty much the same failure rate. One interesting side effect was 
that throughput went up by 9%. So maintaining (a lot of) foreign constraints 
costs us a lot

- I ran with max_pred_locks_per_tran as high as 26,400. No difference

- I rebuilt the database with fillfactor=15 for all the tables and indexes that 
are involved in the transactions that fail. This was to see if the problem is 
PGSQL upgrading row level locks to page level locks. With a low fillfactor, the 
chances of two transactions landing on the same page is low. We have around 9 
rows per data page instead of the original ~60. (The index pages are more 
tightly packed).   I ran a range of thread counts from 5 to 60 for the threads 
that issue transactions.  The failure rate per transaction dropped to around 
half for the thread count of 5, but that's misleading since with a fillfactor 
of15, our database size went up by around 6X, reducing the effectiveness of 
PGSQL and OS disk caches, resulting in a throughput of around half of what we 
used to see. So the reduced failure rate is just a result of fewer threads 
competing for resources. When I run with enough threads to max out the system 
with fillfactor=15 or 100, I get the same failure rates

- In case folks hadn't noticed, Ryan Johnson is getting very similar failure 
rates with dbt-5. So this isn't a case of our having made a silly mistake in 
our coding of the app or the stored procedures

Above experiments were the easy one. I am now working on rewriting the app code 
and the 6 stored procedures to see if I can execute the whole transaction in a 
single stored procedure

Thanks,
Reza

> -----Original Message-----
> From: Craig Ringer [mailto:cr...@2ndquadrant.com]
> Sent: Sunday, July 27, 2014 8:58 PM
> To: Reza Taheri; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] High rate of transaction failure with the Serializable
> Isolation Level
> 
> On 07/26/2014 02:58 AM, Reza Taheri wrote:
> > Hi Craig,
> >
> >> According to the attached SQL, each frame is a separate phase in the
> operation and performs many different operations.
> >> There's a *lot* going on here, so identifying possible
> >> interdependencies isn't something I can do in a ten minute skim read over
> my morning coffee.
> >
> > You didn't think I was going to bug you all with a trivial problem,
> > did you? :-) :-)
> 
> One can hope, but usually in vain...
> 
> > Yes, I am going to have to take an axe to the code and see what pops out.
> Just to put this in perspective, the transaction flow and its statements are
> borrowed verbatim from the TPC-E benchmark. There have been dozens of
> TPC-E disclosures with MS SQL Server, and there are Oracle and DB2 kits that,
> although not used in public disclosures for various non-technical reasons, are
> used internally in by the DB and server companies. These 3 products, and
> perhaps more, were used extensively in the prototyping phase of TPC-E.
> >
> > So, my hope is that if there is a "previously unidentified interdependency
> between transactions" as you point out, it will be due to a mistake we made
> in coding this for PGSQL. Otherwise, we will have a hard time convincing all
> the council member companies that we need to change the schema or the
> business logic to make the kit work with PGSQL.
> 
> Hopefully so.
> 
> Personally I think it's moderately likely that PostgreSQL's much stricter
> enforcement of serializable isolation is detecting anomalies that other
> products do not, so it's potentially preventing errors.
> 
> It would be nice to have the ability to tune this; sometimes there are
> anomalies you wish to ignore or permit. At present it is an all or nothing 
> affair
> - no predicate locking (REPEATABLE READ isolation) or strict predicate locking
> (SERIALIZABLE isolation).
> 
> I recommend running some of the examples in the SERIALIZABLE
> documentation on other products. If they don't fail where they do in Pg,
> then the other products either have less strict (and arguably therefor less
> correct) serialisable isolation enforcement or they rely on blocking predicate
> locks. In the latter case it should be easy to tell because statements will 
> block
> on locks where no ordinary row or table level lock could be taken.
> 
> If you do run comparative tests I would be quite interested in seeing the
> results.
> 
> --
>  Craig Ringer
> https://urldefense.proofpoint.com/v1/url?u=http://www.2ndquadrant.com
> /&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=b9TKmA0CPjroD2HLPTH
> U27nI9PJr8wgKO2rU9QZyZZU%3D%0A&m=jyJSmq%2BgXoPIgY%2BNtgswlUg
> zHSm45s%2FmevjxBmPKrIs%3D%0A&s=401415fb62d6f76b22bc76469cbefb85
> 8342612f1fea2d359fe2bb3f18cab1ab
>  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to