[PERFORM] PostgreSQL and ODBC handles

2017-04-13 Thread Reza Taheri
Summary: I am facing a contention problem with ODBC on the client side. strace 
and perf top show that we are serializing on what appear to be accesses to the 
ODBC statement handle. Contention goes down if I use multiple processes instead 
of multiple threads within a process. Also, all the threads get the same 
connection handle number and the same statement handle number. Is there a way 
to force different connection and statement handles? I have asked this question 
on the ODBC mailing list, and they suggested it could be something in the 
postgresql driver.

Details: Running the TPCx-V benchmark, we hit a performance bottleneck as the 
load increases. We have plenty of CPU and disk resources available in our 
driver VM, client VM, and database backend VM (all with high vCPU counts) on a 
dedicated server. When we increase the number of threads of execution, 
throughput not only fails to go up, it actually degrades. I am running with 80 
threads in one process. When I divide these threads among 5 processes, 
performance nearly doubles. So the problem is not in the database backend. Each 
thread has its 
own database connection and its own statement handle.
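
For reference, each thread's setup looks roughly like this (a simplified 
sketch, not the actual kit code; the DSN string and the omitted error handling 
are illustrative):

#include <sql.h>
#include <sqlext.h>
#include <pthread.h>

SQLHENV env;    /* one environment handle per process, shared by all threads */

static void *worker(void *arg)
{
    SQLHDBC  dbc;
    SQLHSTMT stmt;

    /* Each thread allocates its own connection and statement handle;
     * nothing below the environment handle is shared across threads. */
    SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);
    SQLDriverConnect(dbc, NULL, (SQLCHAR *) "DSN=tpcxv;", SQL_NTS,
                     NULL, 0, NULL, SQL_DRIVER_NOPROMPT);
    SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);

    /* ... run transactions on stmt ... */

    SQLFreeHandle(SQL_HANDLE_STMT, stmt);
    SQLDisconnect(dbc);
    SQLFreeHandle(SQL_HANDLE_DBC, dbc);
    return NULL;
}

int main(void)
{
    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
    SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER) SQL_OV_ODBC3, 0);
    /* ... spawn 80 workers with pthread_create() ... */
    return 0;
}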

Looking more closely at the client, this is what I see in the strace output 
when everything flows through one process:

17:52:52.762491 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL)  
= -1 EAGAIN (Resource temporarily unavailable) <0.000102>
17:52:52.762635 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL)  
= 0 <0.000664>
17:52:52.763540 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL)  
= -1 EAGAIN (Resource temporarily unavailable) <0.27>
17:52:52.763616 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1)
= 1 <0.76>
17:52:52.763738 futex(0x7fae463a9f00, FUTEX_WAIT_PRIVATE, 2, NULL)  
= -1 EAGAIN (Resource temporarily unavailable) <0.16>
17:52:52.763793 futex(0x7fae463a9f00, FUTEX_WAKE_PRIVATE, 1)
= 1 <0.10>
17:52:52.763867 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL)  
= -1 EAGAIN (Resource temporarily unavailable) <0.38>
17:52:52.763982 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1)
= 1 <0.37>
17:52:52.764078 futex(0x7fae1820, FUTEX_WAKE_PRIVATE, 1)
= 0 <0.10>
17:52:52.764182 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL)  
= -1 EAGAIN (Resource temporarily unavailable) <0.30>
17:52:52.764264 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1)
= 1 <0.75>
17:52:52.764401 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1)
= 1 <0.14>
17:52:52.764455 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL)  
= -1 EAGAIN (Resource temporarily unavailable) <0.11>
17:52:52.764507 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1)
= 1 <0.25>
17:52:52.764579 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1)
= 1 <0.10>
17:52:52.764821 sendto(227, "\x51\x00\x00\x00\x0b\x43\x4f\x4d\x4d\x49\x54\x00", 
12, MSG_NOSIGNAL, NULL, 0) = 12 <0.29>
17:52:52.764911 recvfrom(227, 0x7fae18058760, 4096, 16384, 0, 0)
= -1 EAGAIN (Resource temporarily unavailable) <0.000107>
17:52:52.765065 poll([{fd=227, events=POLLIN}], 1, 4294967295)  
= 1 ([{fd=227, revents=POLLIN}]) <0.17>
17:52:52.765185 recvfrom(227, 
"\x43\x00\x00\x00\x0b\x43\x4f\x4d\x4d\x49\x54\x00\x5a\x00\x00\x00\x05\x49", 
4096, MSG_NOSIGNAL, NULL, NULL) = 18 <0.18>
17:52:52.765258 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL)  
= 0 <0.000470>
17:52:52.765764 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL)  
= -1 EAGAIN (Resource temporarily unavailable) <0.52>
17:52:52.765908 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1)
= 1 <0.73>
17:52:52.766045 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1)
= 1 <0.40>
17:52:52.766246 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL)  
= -1 EAGAIN (Resource temporarily unavailable) <0.26>


And perf top shows:

  9.89%  [kernel] [k] _raw_spin_unlock_irqrestore
  4.86%  [kernel] [k] finish_task_switch
  3.53%  [kernel] [k] _raw_spin_lock
  3.00%  libodbc.so.2.0.0 [.] __check_stmt_from_dbc_v
  2.80%  [kernel] [k] __do_softirq
  2.43%  psqlodbcw.so [.] 0x0003b146
  1.95%  libodbc.so.2.0.0
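
One thing I still need to rule out (my own speculation, nothing anyone has 
confirmed yet): unixODBC wraps driver calls in its own mutexes, and the 
per-driver Threading attribute in odbcinst.ini controls how much of that 
locking is applied. A sketch, with the section name and driver path as 
illustrative placeholders:

; /etc/odbcinst.ini
[PostgreSQL]
Description = PostgreSQL ODBC driver
Driver      = /usr/lib64/psqlodbcw.so
; 0-3: how much mutex protection the driver manager adds around driver calls;
; lower values assume the driver itself is thread-safe
Threading   = 0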

[PERFORM] Announcing the public availability of the TPCx-V prototype

2015-09-10 Thread Reza Taheri
Hello PGSQL performance community,
You might remember me from these past postings:

* In 2012, we announced that the TPC was using PostgreSQL in a 
benchmark for virtualized databases. Unlike the traditional TPC Enterprise 
benchmarks, which are defined in paper specifications, this Express benchmark 
comes with a complete end-to-end benchmarking kit, which at this point in time, 
runs on Linux VMs (we have tested it with PGSQL 9.3 on RHEL 7.1) on the X86 
architecture.
On that occasion, I was looking for a PGSQL feature similar to MS SQL Server's 
"clustered indexes" or Oracle's "index-organized tables". (I was told about 
"index-only scans", which helped, but not as much as clustered indexes or IOTs.)

* In 2013, I asked about a performance hit during checkpoints, which 
was quickly diagnosed as the well-known dirty_background_bytes problem

* Last year, I asked about the high rate of transaction aborts due to 
serialization failures. It turned out that even dbt5, which implements TPC-E, 
was running into the same issue, and simply resubmitting the transaction was an 
acceptable solution

We are done with the benchmark development, and it has entered TPC's "Public 
Review" phase. In parallel, the TPC council has approved the public release of 
a prototype of the benchmark kit to gather more experimental data, and to speed 
up the approval of the benchmark.

The Benchmark Specification, the Benchmark Kit, and the User's Guide are 
available from tpc.org/tpcx-v. In addition to these components of the benchmark 
standard, the subcommittee members have developed two other tools to help 
benchmarkers get a fast start:  a downloadable VM in the form of an ovf-format 
VM template that contains a complete benchmarking environment with all the 
software already installed and pre-configured; and a PowerCLI script that, in 
the VMware vSphere environment, allows you to quickly clone a VM template into 
a large number of benchmark VMs.

The review period is open until Thursday, October 15th. The subcommittee 
intends to resolve the comments from the Formal Review by November 12th, and 
bring forward a motion to the Council to approve the Benchmark Standard.

We would love to get feedback from the PGSQL community. The subcommittee is 
taking feedback via FogBugz at www.tpc.org/bugdb 
under the project "TPC-Virt".

Anticipating some of the questions/comments, here are some thoughts:

-  We wish we had a more representative workload, multiple kits for 
multiple hardware architectures, and so on. Those are questions for another day. 
Having an end-to-end database benchmarking kit for X86 virtualization with a 
workload derived from TPC-E is a pretty good first step.

-  You don't have to run the full multi-VM configuration of TPCx-V if 
you are just playing with the kit and not intending to have an audited result. 
You can run the benchmark on a single database on a single VM (or even a 
native, un-virtualized server!) This would be very similar to a simple TPC-E 
config. If we can get feedback based on a single-VM config, we will still be 
grateful.

-  Having said that, there are enough subtle differences between the 
TPC-E schema and the TPCx-V schema that running TPCx-V on a single VM doesn't 
exactly take you to TPC-E.

Thanks,
Reza Taheri for the TPCx-V subcommittee


Re: [PERFORM] High rate of transaction failure with the Serializable Isolation Level

2014-08-20 Thread Reza Taheri
An update: following the recommendations on this list, I ran a number of 
experiments:

- I ran with all foreign keys deleted. There was a 4% drop in the rate of 
deadlocks per transaction, which went from 0.32 per transaction to 0.31, so we 
still have pretty much the same failure rate. One interesting side effect was 
that throughput went up by 9%, so maintaining (a lot of) foreign key 
constraints costs us a lot.

- I ran with max_pred_locks_per_transaction as high as 26,400. No difference.

- I rebuilt the database with fillfactor=15 for all the tables and indexes that 
are involved in the transactions that fail. This was to see if the problem is 
PGSQL upgrading row-level locks to page-level locks. With a low fillfactor, the 
chances of two transactions landing on the same page are low. We have around 9 
rows per data page instead of the original ~60. (The index pages are more 
tightly packed.) I ran a range of thread counts from 5 to 60 for the threads 
that issue transactions. The failure rate per transaction dropped to around 
half at a thread count of 5, but that's misleading, since with a fillfactor 
of 15, our database size went up by around 6X, reducing the effectiveness of 
the PGSQL and OS disk caches and cutting throughput to around half of what we 
used to see. So the reduced failure rate is just a result of fewer threads 
competing for resources. When I run with enough threads to max out the system, 
I get the same failure rates with fillfactor=15 or 100.

- In case folks hadn't noticed, Ryan Johnson is getting very similar failure 
rates with dbt-5. So this isn't a case of our having made a silly mistake in 
our coding of the app or the stored procedures.

The above experiments were the easy ones. I am now working on rewriting the app 
code and the 6 stored procedures to see if I can execute the whole transaction 
in a single stored procedure.
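
The shape of that change, roughly (a sketch; the frame and wrapper names below 
are illustrative stand-ins, and parameters are bound elsewhere):

/* Today: six server round trips per Trade-Result transaction. */
SQLExecDirect(stmt, (SQLCHAR *) "SELECT * FROM traderesultframe1(?)", SQL_NTS);
/* ... frames 2 through 6 ... */
SQLTransact(env, dbc, SQL_COMMIT);

/* Goal: one procedure runs all six frames server-side, so the whole
 * transaction is a single round trip and a single failure point.
 * (traderesult_all is a hypothetical consolidated function.) */
SQLExecDirect(stmt, (SQLCHAR *) "SELECT * FROM traderesult_all(?)", SQL_NTS);
SQLTransact(env, dbc, SQL_COMMIT);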

Thanks,
Reza

 -Original Message-
 From: Craig Ringer [mailto:cr...@2ndquadrant.com]
 Sent: Sunday, July 27, 2014 8:58 PM
 To: Reza Taheri; pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] High rate of transaction failure with the Serializable
 Isolation Level
 
 On 07/26/2014 02:58 AM, Reza Taheri wrote:
  Hi Craig,
 
  According to the attached SQL, each frame is a separate phase in the
 operation and performs many different operations.
  There's a *lot* going on here, so identifying possible
  interdependencies isn't something I can do in a ten minute skim read over
 my morning coffee.
 
  You didn't think I was going to bug you all with a trivial problem,
  did you? :-) :-)
 
 One can hope, but usually in vain...
 
  Yes, I am going to have to take an axe to the code and see what pops out.
 Just to put this in perspective, the transaction flow and its statements are
 borrowed verbatim from the TPC-E benchmark. There have been dozens of
 TPC-E disclosures with MS SQL Server, and there are Oracle and DB2 kits that,
 although not used in public disclosures for various non-technical reasons, are
 used internally by the DB and server companies. These 3 products, and
 perhaps more, were used extensively in the prototyping phase of TPC-E.
 
  So, my hope is that if there is a previously unidentified interdependency
 between transactions as you point out, it will be due to a mistake we made
 in coding this for PGSQL. Otherwise, we will have a hard time convincing all
 the council member companies that we need to change the schema or the
 business logic to make the kit work with PGSQL.
 
 Hopefully so.
 
 Personally I think it's moderately likely that PostgreSQL's much stricter
 enforcement of serializable isolation is detecting anomalies that other
 products do not, so it's potentially preventing errors.
 
 It would be nice to have the ability to tune this; sometimes there are
 anomalies you wish to ignore or permit. At present it is an all or nothing 
 affair
 - no predicate locking (REPEATABLE READ isolation) or strict predicate locking
 (SERIALIZABLE isolation).
 
 I recommend running some of the examples in the SERIALIZABLE
 documentation on other products. If they don't fail where they do in Pg,
 then the other products either have less strict (and arguably therefore less
 correct) serialisable isolation enforcement or they rely on blocking predicate
 locks. In the latter case it should be easy to tell because statements will 
 block
 on locks where no ordinary row or table level lock could be taken.
 
 If you do run comparative tests I would be quite interested in seeing the
 results.
 
 --
  Craig Ringer
 http://www.2ndquadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your

[PERFORM] Re: High rate of transaction failure with the Serializable Isolation Level

2014-08-02 Thread Reza Taheri
Hi Ryan,
I just noticed that the mail alias manager has stalled the post below because 
of the attachment size. But you should have gotten it directly.

If anyone else is interested in a copy, let me know, and I will forward it

Thanks,
Reza

 -Original Message-
 From: Reza Taheri
 Sent: Monday, July 28, 2014 8:57 PM
 To: 'Ryan Johnson'
 Cc: pgsql-performance@postgresql.org
 Subject: RE: High rate of transaction failure with the Serializable Isolation
 Level
 
 Hi Ryan,
 We presented a paper at the TPCTC of last year's VLDB (attached). It
 described the architecture of the kit, and some of the tuning. Another tuning
 change was setting /proc/sys/vm/dirty_background_bytes to a small value
 (like 1000) on very-large memory machines, which was a problem I
 brought up on this same mailing list a while ago and got great advice. Also,
 make sure you do a SQLFreeStmt(stmt, SQL_DROP) at the end of
 transactions, not SQL_CLOSE.
 
 Let me know if you have any question about the paper
 
 Thanks,
 Reza
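
A minimal illustration of that last tip (a sketch; stmt is a statement handle 
allocated as usual):

/* At the end of each transaction: free the statement handle outright ... */
SQLFreeStmt(stmt, SQL_DROP);    /* releases the handle and all its resources */

/* ... rather than only closing the cursor and keeping the handle around: */
SQLFreeStmt(stmt, SQL_CLOSE);   /* closes any open cursor; handle stays allocated */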
 
  -Original Message-
  From: Ryan Johnson [mailto:ryan.john...@cs.utoronto.ca]
  Sent: Saturday, July 26, 2014 4:51 PM
  To: Reza Taheri
  Cc: pgsql-performance@postgresql.org
  Subject: Re: High rate of transaction failure with the Serializable
  Isolation Level
 
  That does sound pretty similar, modulo the raw performance difference.
  I have no idea how many MEE threads there were; it was just a quick
  run with exactly zero tuning, so I use whatever dbt5 does out of the box.
  Actually, though, if you have any general tuning tips for TPC-E I'd be
  interested to learn them (PM if that's off topic for this discussion).
 
  Regards,
  Ryan
 
  On 26/07/2014 7:33 PM, Reza Taheri wrote:
   Hi Ryan,
   Thanks a lot for sharing this. When I run with 12 CE threads and 3-5
   MEE
  threads (how many MEE threads do you have?) @  80-90 tps, I get
  something in the 20-30% of trade-result transactions rolled back
  depending on how I count. E.g., in a 5.5-minute run with 3 MEE
  threads, I saw 87.5 tps. There were 29200 successful trade-result
  transactions. Of these, 5800 were rolled back, some more than once for
  a total of 8450 rollbacks. So I'd say your results and ours tell similar 
  stories!
  
   Thanks,
   Reza
  
   -Original Message-
   From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-
   performance-ow...@postgresql.org] On Behalf Of Ryan Johnson
   Sent: Saturday, July 26, 2014 2:06 PM
   To: Reza Taheri
   Cc: pgsql-performance@postgresql.org
   Subject: Re: High rate of transaction failure with the Serializable
   Isolation Level
  
   Dredging through some old run logs, 12 dbt-5 clients gave the
   following when everything was run under SSI (fully serializable,
   even the transactions that allow repeatable read isolation). Not
   sure how that
  translates to your results.
   Abort rates were admittedly rather high, though perhaps lower than
   what you report.
  
    Transaction           %  Average :  90th %   Total  Rollbacks       %  Warning  Invalid
    -----------------  ------  ---------------  ------  ---------  ------  -------  -------
    Trade Result        5.568   0.022 :  0.056    2118        417  19.69%        0       91
    Broker Volume       5.097   0.009 :  0.014    1557          0   0.00%        0        0
    Customer Position  13.530   0.016 :  0.034    4134          1   0.02%        0        0
    Market Feed         0.547   0.033 :  0.065     212         45  21.23%        0       69
    Market Watch       18.604   0.031 :  0.061    5683          0   0.00%        0        0
    Security Detail    14.462   0.015 :  0.020    4418          0   0.00%        0        0
    Trade Lookup        8.325   0.059 :  0.146    2543          0   0.00%      432        0
    Trade Order         9.110   0.006 :  0.008    3227        444  13.76%        0        0
    Trade Status       19.795   0.030 :  0.046    6047          0   0.00%        0        0
    Trade Update        1.990   0.064 :  0.145     608          0   0.00%      432        0
    Data Maintenance      N/A   0.012 :  0.012       1          0   0.00%        0        0
    -----------------  ------  ---------------  ------  ---------  ------  -------  -------
   28.35 trade-result transactions per second (trtps)
  
   Regards,
   Ryan
  
   On 26/07/2014 3:55 PM, Reza Taheri wrote:
   Hi Ryan,
   That's a very good point. We are looking at dbt5. One question:
   what
   throughput rate, and how many threads of execution did you use for
  dbt5?
   The failure rates I reported were at ~120 tps with 15 trade-result
 threads.
   Thanks,
   Reza
  
   -Original Message-
   From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-
   performance-ow...@postgresql.org] On Behalf Of Ryan Johnson
   Sent: Friday, July 25, 2014 2:36 PM
   To: pgsql-performance@postgresql.org
   Subject: Re: High rate of transaction failure with the
   Serializable Isolation Level
  
   On 25/07/2014 2:58 PM, Reza Taheri wrote:
   Hi Craig,
  
   According to the attached

[PERFORM] Re: High rate of transaction failure with the Serializable Isolation Level

2014-07-26 Thread Reza Taheri
Hi Ryan,
That's a very good point. We are looking at dbt5. One question: what throughput 
rate, and how many threads of execution did you use for dbt5?  The failure 
rates I reported were at ~120 tps with 15 trade-result threads.

Thanks,
Reza

 -Original Message-
 From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-
 performance-ow...@postgresql.org] On Behalf Of Ryan Johnson
 Sent: Friday, July 25, 2014 2:36 PM
 To: pgsql-performance@postgresql.org
 Subject: Re: High rate of transaction failure with the Serializable Isolation
 Level
 
 On 25/07/2014 2:58 PM, Reza Taheri wrote:
  Hi Craig,
 
  According to the attached SQL, each frame is a separate phase in the
 operation and performs many different operations.
  There's a *lot* going on here, so identifying possible
  interdependencies isn't something I can do in a ten minute skim read over
 my morning coffee.
  You didn't think I was going to bug you all with a trivial problem,
  did you? :-) :-)
 
  Yes, I am going to have to take an axe to the code and see what pops out.
 Just to put this in perspective, the transaction flow and its statements are
 borrowed verbatim from the TPC-E benchmark. There have been dozens of
 TPC-E disclosures with MS SQL Server, and there are Oracle and DB2 kits that,
 although not used in public disclosures for various non-technical reasons, are
 used internally by the DB and server companies. These 3 products, and
 perhaps more, were used extensively in the prototyping phase of TPC-E.
 
  So, my hope is that if there is a previously unidentified interdependency
 between transactions as you point out, it will be due to a mistake we made
 in coding this for PGSQL. Otherwise, we will have a hard time convincing all
 the council member companies that we need to change the schema or the
 business logic to make the kit work with PGSQL.
 
  Just pointing out my uphill battle!!
 You might compare against dbt-5 [1], just to see if the same problem occurs. I
 didn't notice such high abort rates when I ran that workload a few weeks
 ago. Just make sure to use the latest commit, because the released version
 has fatal bugs.
 
 [1] https://github.com/petergeoghegan/dbt5
 
 Ryan
 
 
 




[PERFORM] Re: High rate of transaction failure with the Serializable Isolation Level

2014-07-26 Thread Reza Taheri
Hi Ryan,
Thanks a lot for sharing this. When I run with 12 CE threads and 3-5 MEE 
threads (how many MEE threads do you have?) @  80-90 tps, I get something in 
the 20-30% of trade-result transactions rolled back depending on how I count. 
E.g., in a 5.5-minute run with 3 MEE threads, I saw 87.5 tps. There were 29200 
successful trade-result transactions. Of these, 5800 were rolled back, some 
more than once for a total of 8450 rollbacks. So I'd say your results and ours 
tell similar stories!

Thanks,
Reza

 -Original Message-
 From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-
 performance-ow...@postgresql.org] On Behalf Of Ryan Johnson
 Sent: Saturday, July 26, 2014 2:06 PM
 To: Reza Taheri
 Cc: pgsql-performance@postgresql.org
 Subject: Re: High rate of transaction failure with the Serializable Isolation
 Level
 
 Dredging through some old run logs, 12 dbt-5 clients gave the following when
 everything was run under SSI (fully serializable, even the transactions that
 allow repeatable read isolation). Not sure how that translates to your 
 results.
 Abort rates were admittedly rather high, though perhaps lower than what
 you report.
 
 Transaction           %  Average :  90th %   Total  Rollbacks       %  Warning  Invalid
 -----------------  ------  ---------------  ------  ---------  ------  -------  -------
 Trade Result        5.568   0.022 :  0.056    2118        417  19.69%        0       91
 Broker Volume       5.097   0.009 :  0.014    1557          0   0.00%        0        0
 Customer Position  13.530   0.016 :  0.034    4134          1   0.02%        0        0
 Market Feed         0.547   0.033 :  0.065     212         45  21.23%        0       69
 Market Watch       18.604   0.031 :  0.061    5683          0   0.00%        0        0
 Security Detail    14.462   0.015 :  0.020    4418          0   0.00%        0        0
 Trade Lookup        8.325   0.059 :  0.146    2543          0   0.00%      432        0
 Trade Order         9.110   0.006 :  0.008    3227        444  13.76%        0        0
 Trade Status       19.795   0.030 :  0.046    6047          0   0.00%        0        0
 Trade Update        1.990   0.064 :  0.145     608          0   0.00%      432        0
 Data Maintenance      N/A   0.012 :  0.012       1          0   0.00%        0        0
 -----------------  ------  ---------------  ------  ---------  ------  -------  -------
 28.35 trade-result transactions per second (trtps)
 
 Regards,
 Ryan
 
 On 26/07/2014 3:55 PM, Reza Taheri wrote:
  Hi Ryan,
  That's a very good point. We are looking at dbt5. One question: what
 throughput rate, and how many threads of execution did you use for dbt5?
 The failure rates I reported were at ~120 tps with 15 trade-result threads.
 
  Thanks,
  Reza
 
  -Original Message-
  From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-
  performance-ow...@postgresql.org] On Behalf Of Ryan Johnson
  Sent: Friday, July 25, 2014 2:36 PM
  To: pgsql-performance@postgresql.org
  Subject: Re: High rate of transaction failure with the Serializable
  Isolation Level
 
  On 25/07/2014 2:58 PM, Reza Taheri wrote:
  Hi Craig,
 
  According to the attached SQL, each frame is a separate phase in
  the
  operation and performs many different operations.
  There's a *lot* going on here, so identifying possible
  interdependencies isn't something I can do in a ten minute skim
  read over
  my morning coffee.
  You didn't think I was going to bug you all with a trivial problem,
  did you? :-) :-)
 
  Yes, I am going to have to take an axe to the code and see what pops
 out.
  Just to put this in perspective, the transaction flow and its
  statements are borrowed verbatim from the TPC-E benchmark. There
 have
  been dozens of TPC-E disclosures with MS SQL Server, and there are
  Oracle and DB2 kits that, although not used in public disclosures for
  various non-technical reasons, are used internally by the DB and
  server companies. These 3 products, and perhaps more, were used
 extensively in the prototyping phase of TPC-E.
  So, my hope is that if there is a previously unidentified
  interdependency
  between transactions as you point out, it will be due to a mistake
  we made in coding this for PGSQL. Otherwise, we will have a hard time
  convincing all the council member companies that we need to change
  the schema or the business logic to make the kit work with PGSQL.
  Just pointing out my uphill battle!!
  You might compare against dbt-5 [1], just to see if the same problem
  occurs. I didn't notice such high abort rates when I ran that
  workload a few weeks ago. Just make sure to use the latest commit,
  because the released version has fatal bugs.
 
  [1] https://github.com/petergeoghegan/dbt5
 
  Ryan
 
 
 

Re: [PERFORM] High rate of transaction failure with the Serializable Isolation Level

2014-07-25 Thread Reza Taheri
Hi Craig,

 According to the attached SQL, each frame is a separate phase in the 
 operation and performs many different operations.
 There's a *lot* going on here, so identifying possible interdependencies 
 isn't something I can do in a ten minute skim
 read over my morning coffee.

You didn't think I was going to bug you all with a trivial problem, did you? 
:-) :-)

Yes, I am going to have to take an axe to the code and see what pops out. Just 
to put this in perspective, the transaction flow and its statements are 
borrowed verbatim from the TPC-E benchmark. There have been dozens of TPC-E 
disclosures with MS SQL Server, and there are Oracle and DB2 kits that, 
although not used in public disclosures for various non-technical reasons, are 
used internally by the DB and server companies. These 3 products, and 
perhaps more, were used extensively in the prototyping phase of TPC-E.

So, my hope is that if there is a previously unidentified interdependency 
between transactions as you point out, it will be due to a mistake we made in 
coding this for PGSQL. Otherwise, we will have a hard time convincing all the 
council member companies that we need to change the schema or the business 
logic to make the kit work with PGSQL.

Just pointing out my uphill battle!!

 If there are foreign keys to it from other tables, they count too.

Yes, we have a lot of foreign keys. I dropped them all a few weeks ago with no 
impact. But when I start the axing process, they will be one of the first to go.

Thanks,
Reza

 -Original Message-
 From: Craig Ringer [mailto:cr...@2ndquadrant.com]
 Sent: Thursday, July 24, 2014 6:30 PM
 To: Reza Taheri; pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] High rate of transaction failure with the Serializable
 Isolation Level
 
 On 07/25/2014 03:50 AM, Reza Taheri wrote:
  Hi Craig,
  It's not just that statement that is relevant.
  Is that statement run standalone, or as part of a larger transaction?
 
  Yes, the size of the transaction seems to matter here. It is a complex
 transaction (attached). Each frame is one stored procedure, and the 6
 frames are called one after the other with no pause. After frame6 returns,
 we call SQLTransact(..., ...,  SQL_COMMIT). Below is the failure rate of the
 various frames:
 
  112 tid 18883: SQL Failed: DoTradeResultFrame3
  102 tid 18883: SQL Failed: DoTradeResultFrame4
18188 tid 18883: SQL Failed: DoTradeResultFrame5
 8566 tid 18883: SQL Failed: DoTradeResultFrame6
 4492 tid 18883: ERROR: TradeResultDB: commit failed
 
  So, no failures in frames 1 and 2, and then the failure rate grows as we
 approach the end of the transaction.
 
 According to the attached SQL, each frame is a separate phase in the
 operation and performs many different operations.
 
 There's a *lot* going on here, so identifying possible interdependencies isn't
 something I can do in a ten minute skim read over my morning coffee.
 
 I think the most useful thing to do here is to start cutting and simplifying 
 the
 case, trying to boil it down to the smallest thing that still causes the 
 problem.
 
 That'll likely either find a previously unidentified interdependency between
 transactions or, if you're unlucky, a Pg bug. Given the complexity of the
 operations there I'd be very surprised if it wasn't the former.
 
  If the INSERTing transaction previously queried for a key that was
  created by a concurrent transaction this can occur as there is no
 serialization execution order of the transactions that could produce the same
 result.
 
  As far as the inserts, your point is well-taken. But in this case, I
  have eliminated the transactions that query or otherwise manipulate
  the SETTLEMENT table. The only access to it is the single insert in
  this transaction
 
 If there are foreign keys to it from other tables, they count too.
 
 --
  Craig Ringer
 http://www.2ndquadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




[PERFORM] Re: High rate of transaction failure with the Serializable Isolation Level

2014-07-25 Thread Reza Taheri
We are hoping the spec will get wrapped up in the next 6 months, but industry 
standard councils move very slowly!  However, if there is interest in getting 
involved and helping, the TPC might be receptive to earlier access.

BTW, just to let folks know how large of an undertaking this has been, this is 
an estimate of the lines of code developed so far:

*  700 lines of run-time shell scripts
*  700 lines of build-time shell scripts
*  3.5K lines of DDL and DML
*  4K lines of C code to test the DML
*  22K lines of C, C++, and Java code in the benchmark driver
*  45K lines of C++ code in VGen

 -Original Message-
 From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-
 performance-ow...@postgresql.org] On Behalf Of Josh Berkus
 Sent: Friday, July 25, 2014 2:47 PM
 To: pgsql-performance@postgresql.org
 Subject: Re: High rate of transaction failure with the Serializable Isolation
 Level
 
 On 07/23/2014 06:18 PM, Reza Taheri wrote:
  [By way of introduction, we are a TPC subcommittee that is developing
  a benchmark with cloud-like characteristics for virtualized databases.
  The end-to-end benchmarking kit will be publicly available, and will
  run on PGSQL]
 
 Awesome!  Any idea when it will be available?  Our community could really
 use some updated benchmark tooling ...
 
 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com/
 
 



Re: [PERFORM] High rate of transaction failure with the Serializable Isolation Level

2014-07-24 Thread Reza Taheri
Hi Kevin,
Thanks for the reply

 As already pointed out by Craig, statements don't have serialization 
 failures; transactions do.  In some cases a transaction may become
 doomed to fail by the action of a concurrent transaction, but the actual 
 failure cannot occur until the next statement is run on the
 connection with the doomed transaction; it may have nothing to do with the 
 statement itself.

That's an interesting concept. I suppose I could test it by moving statements 
around to see what happens.


 Note that I have seen reports of cases where max_pred_locks_per_transaction 
 needed to be set to 20x the default to
 reduce serialization failures to an acceptable level.


I was running with the following two parameters set to 640; I then raised them 
to 6400, and saw no difference

max_locks_per_transaction = 6400
max_pred_locks_per_transaction = 6400

Thanks,
Reza

 -Original Message-
 From: Kevin Grittner [mailto:kgri...@ymail.com]
 Sent: Thursday, July 24, 2014 7:03 AM
 To: Reza Taheri; pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] High rate of transaction failure with the Serializable
 Isolation Level
 
 Reza Taheri rtah...@vmware.com wrote:
 
  I am running into very high failure rates when I run with the
  Serializable Isolation Level. I have simplified our configuration to a
  single database with a constant workload, a TPC-E workload if you
  will, to focus on this problem. We are running with PGSQL 9.2.4
 
 I don't remember any bug fixes that would be directly related to what you
 describe in the last 15 months, but it might be better to do any testing with
 fixes for known bugs:
 
  http://www.postgresql.org/support/versioning/
 
  When we raise the Trade-Result transaction to SQL_TXN_SERIALIZABLE, we
  face a storm of conflicts. Out of
  37,342 Trade-Result transactions, 15,707 hit an error, and have to be
  rolled back and retried one or more times. The total failure count
  (due to many transactions failing more than once) is 31,388.
 
  What is unusual is that the majority of the failures occur in a
  statement that should not have any isolation conflicts.
 
 As already pointed out by Craig, statements don't have serialization failures;
 transactions do.  In some cases a transaction may become doomed to fail
 by the action of a concurrent transaction, but the actual failure cannot occur
 until the next statement is run on the connection with the doomed
 transaction; it may have nothing to do with the statement itself.
 
 If you want to understand the theory of how SERIALIZABLE transactions are
 implemented in PostgreSQL, these links may help:
 
  http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf
 
  http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/storage/lmgr/README-SSI;hb=master
 
  http://wiki.postgresql.org/wiki/Serializable
 
 For a more practical set of examples about the differences in using
 REPEATABLE READ and SERIALIZABLE transaction isolation levels in
 PostgreSQL, see:
 
  http://wiki.postgresql.org/wiki/SSI
 
 If you are just interested in reducing the number of serialization failures, 
 see
 the suggestions near the end of this section of the
 documentation:
 
  http://www.postgresql.org/docs/9.2/interactive/transaction-iso.html#XACT-SERIALIZABLE
 
 Any of these items (or perhaps a combination of them) may ameliorate the
 problem.  Note that I have seen reports of cases where

[PERFORM] High rate of transaction failure with the Serializable Isolation Level

2014-07-23 Thread Reza Taheri
Hello PGSQL performance community,
[By way of introduction, we are a TPC subcommittee that is developing a 
benchmark with cloud-like characteristics for virtualized databases. The 
end-to-end benchmarking kit will be publicly available, and will run on PGSQL]

I am running into very high failure rates when I run with the Serializable 
Isolation Level. I have simplified our configuration to a single database with 
a constant workload, a TPC-E workload if you will, to focus on this 
problem. We are running with PGSQL 9.2.4, ODBC 2.2.14 (as well as 2.3.3pre, 
which didn't help), RHEL 6.4, and a 6-way VM with 96GB of memory on a 4-socket 
Westmere server.

With our 9 transactions running with a mix of SQL_TXN_READ_COMMITTED and 
SQL_TXN_REPEATABLE_READ, we get less than 1% deadlocks, all of which occur 
because each row in one table, BROKER, may be read or written by multiple 
transactions at the same time. So, there are legitimate conflicts, which we 
deal with using an exponential backoff algorithm that sleeps for 
10ms/30ms/90ms/etc.
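
The retry loop is shaped roughly like this (a minimal sketch; run_transaction() 
is an illustrative stand-in for the real transaction code, and a commit 
failure's diagnostic actually lands on the connection handle rather than the 
statement handle, which is glossed over here):

#include <sql.h>
#include <sqlext.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the real transaction body (e.g. Trade-Result). */
extern SQLRETURN run_transaction(SQLHDBC dbc, SQLHSTMT stmt);

/* PostgreSQL reports deadlocks as SQLSTATE 40P01 and serialization failures
 * as 40001; both mean "roll back, back off, and retry". */
static int is_retryable(SQLHSTMT stmt)
{
    SQLCHAR state[6] = "";
    SQLGetDiagRec(SQL_HANDLE_STMT, stmt, 1, state, NULL, NULL, 0, NULL);
    return strcmp((char *) state, "40001") == 0 ||
           strcmp((char *) state, "40P01") == 0;
}

int run_with_backoff(SQLHDBC dbc, SQLHSTMT stmt)
{
    long delay_us = 10000;                   /* 10ms, then 30ms, 90ms, ... */

    for (;;) {
        if (SQL_SUCCEEDED(run_transaction(dbc, stmt)) &&
            SQL_SUCCEEDED(SQLEndTran(SQL_HANDLE_DBC, dbc, SQL_COMMIT)))
            return 0;                        /* committed */

        int retry = is_retryable(stmt);      /* read diag before rollback */
        SQLEndTran(SQL_HANDLE_DBC, dbc, SQL_ROLLBACK);
        if (!retry)
            return -1;                       /* a real error; give up */
        usleep(delay_us);                    /* exponential backoff */
        delay_us *= 3;
    }
}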

When we raise the Trade-Result transaction to SQL_TXN_SERIALIZABLE, we face a 
storm of conflicts. Out of 37,342 Trade-Result transactions, 15,707 hit an 
error, and have to be rolled back and retried one or more times. The total 
failure count (due to many transactions failing more than once) is 31,388.

What is unusual is that the majority of the failures occur in a statement that 
should not have any isolation conflicts. About 17K of failures are from the 
statement below:
2014-07-23 11:27:15 PDT 26085 ERROR:  could not serialize access due to 
read/write dependencies among transactions
2014-07-23 11:27:15 PDT 26085 DETAIL:  Reason code: Canceled on identification 
as a pivot, during write.
2014-07-23 11:27:15 PDT 26085 HINT:  The transaction might succeed if retried.
2014-07-23 11:27:15 PDT 26085 CONTEXT:  SQL statement update   TRADE
set T_COMM = comm_amount,
T_DTS = trade_dts,
T_ST_ID = st_completed_id,
T_TRADE_PRICE = trade_price
where   T_ID = trade_id
PL/pgSQL function traderesultframe5(ident_t,value_t,character,timestamp 
without time zone,trade_t,s_price_t) line 15 at SQL statement

This doesn't make sense since at any given time, only one transaction might 
possibly be accessing the row that is being updated. There should be no 
conflicts if we have row-level locking/isolation.

The second most common conflict happens 7.6K times in the statement below:
2014-07-23 11:27:23 PDT 26039 ERROR:  could not serialize access due to 
read/write dependencies among transactions
2014-07-23 11:27:23 PDT 26039 DETAIL:  Reason code: Canceled on identification 
as a pivot, during conflict in checking.
2014-07-23 11:27:23 PDT 26039 HINT:  The transaction might succeed if retried.
2014-07-23 11:27:23 PDT 26039 CONTEXT:  SQL statement insert
into SETTLEMENT (SE_T_ID,
SE_CASH_TYPE,
SE_CASH_DUE_DATE,
SE_AMT)
values (trade_id,
cash_type,
due_date,
se_amount
)
PL/pgSQL function traderesultframe6(ident_t,timestamp without time 
zone,character varying,value_t,timestamp without time 
zone,trade_t,smallint,s_qty_t,character) line 23 at SQL statement

I don't understand why an insert would hit a serialization conflict.

We also have 4.5K conflicts when we try to commit:
2014-07-23 11:27:23 PDT 26037 ERROR:  could not serialize access due to 
read/write dependencies among transactions
2014-07-23 11:27:23 PDT 26037 DETAIL:  Reason code: Canceled on identification 
as a pivot, during commit attempt.
2014-07-23 11:27:23 PDT 26037 HINT:  The transaction might succeed if retried.
2014-07-23 11:27:23 PDT 26037 STATEMENT:  COMMIT


Does PGSQL raise locks to page level when we run with SQL_TXN_SERIALIZABLE? Are 
there any knobs I can play with to alleviate this?  FWIW, the same transactions 
on MS SQL Server see almost no conflicts.

Thanks,
Reza


Re: [PERFORM] PGSQL, checkpoints, and file system syncs

2014-04-03 Thread Reza Taheri
 Try setting the vm.dirty_bytes sysctl. Something like 256MB might be a good
 starting point.
 
 This comes up fairly often, see e.g.:
 http://www.postgresql.org/message-id/flat/27C32FD4-0142-44FE-8488-
 9f366dc75...@mr-paradox.net
 
 - Heikki

Thanks, Heikki. That sounds like my problem alright. I will play with these 
parameters right away, and will report back.
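
For concreteness, that would be something like the following (a sketch; 256MB 
is just the suggested starting point, to be tuned per system):

# apply immediately (256MB = 268435456 bytes)
sysctl -w vm.dirty_bytes=268435456
# or persistently, in /etc/sysctl.conf:
#   vm.dirty_bytes = 268435456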

Cheers,
Reza




Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-10 Thread Reza Taheri
Hi Merlin,
We are moving up to a larger testbed, and are planning to use 9.2. But the 
results will not be comparable to our 8.4 results due to differences in hardware. 
But that comparison is a useful one. I'll try for a quick test on the new 
hardware with 8.4 before moving to 9.2.

Thanks,
Reza

 -Original Message-
 From: Merlin Moncure [mailto:mmonc...@gmail.com]
 Sent: Tuesday, July 10, 2012 12:06 PM
 To: Reza Taheri
 Cc: Greg Smith; pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V
 performance
 
 On Thu, Jul 5, 2012 at 10:33 PM, Reza Taheri rtah...@vmware.com wrote:
  Just to be clear, we have a number of people from different companies
 working on the kit. This is not a VMware project, it is a TPC project. But I
 hear you regarding coming in from the cold and asking for a major db engine
 feature. I know that I have caused a lot of rolling eyes. Believe me, I have
 had the same (no, worse!) reaction from every one of the commercial
 database companies in response to similar requests over the past 25 years.
 
 No rolling of eyes from me.  Clustered indexes work and if your table access
 mainly hits the table through that index you'll see enormous reductions in
 i/o.  Index only scans naturally are a related optimization in the same vein.
 Denying that is just silly.  BTW, putting postgres through a standard non
 trivial benchmark suite over reasonable hardware, reporting results,
 identifying bottlenecks, etc.
 is incredibly useful.  Please keep it up, and don't be afraid to ask for help
 here.  (one thing I'd love to see is side by side results comparing 8.4 to 
 9.1 to
 9.2).
 
 merlin



Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Reza Taheri
Hi Daniel,
Yes, it sounds like GIT will take us half the way there by getting rid of much 
of the index I/O if we cluster the tables. We can set the fillfactor parameter 
to keep tables sorted after updates. I am not sure what impact inserts will 
have since the primary key keeps growing with new inserts, so perhaps the table 
will maintain the cluster order and the benefits of GIT for new rows, too. GIT 
won't save CPU cycles the way a clustered/integrated index would, and actually 
adds to the CPU cost since the data page has to be searched for the desired 
tuple.
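
For anyone trying this, the setup would look something like the following (a 
sketch with illustrative table and index names; note that CLUSTER is a one-time 
physical rewrite that PGSQL does not maintain on later inserts and updates):

-- Keep pages only partially full so updated rows can stay on their page,
-- preserving the clustered order for longer
ALTER TABLE trade SET (fillfactor = 70);
-- Physically rewrite the table in the order of its primary-key index
CLUSTER trade USING trade_pkey;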

Thanks,
Reza

 -Original Message-
 From: Daniel Farina [mailto:dan...@heroku.com]
 Sent: Wednesday, July 04, 2012 6:40 AM
 To: Craig Ringer
 Cc: Reza Taheri; pgsql-performance@postgresql.org; Robert Haas
 Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V
 performance
 
 On Tue, Jul 3, 2012 at 10:43 PM, Craig Ringer ring...@ringerc.id.au wrote:
  On 07/04/2012 07:13 AM, Reza Taheri wrote:
 
  Following the earlier email introducing the TPC-V benchmark, and that
  we are developing an industry standard benchmarking kit for TPC-V
  using PostgreSQL, here is a specific performance issue we have run into.
 
 
  Which version of PostgreSQL are you using?
 
  How has it been tuned beyond the defaults - autovacuum settings,
  shared_buffers, effective_cache_size, WAL settings, etc?
 
  How much RAM is on the blade? What OS and version are on the blade?
 
 
  Comparing the table sizes, we are close to 2X larger (more on this in
  a later note). But the index size is what stands out. Our overall
  index usage (again, after accounting for different numbers of rows) is
  4.8X times larger. 35% of our I/Os are to the index space. I am
  guessing that the 4.8X ballooning has something to do with this, and
  that in itself explains a lot about our high I/O rate, as well as
  higher CPU/tran cycles compared to MS SQL (we are  2.5-3 times slower).
 
  This is making me wonder about bloat issues and whether proper
  vacuuming is being done. If the visibility map and free space map
  aren't maintained by proper vaccum operation everything gets messy,
 fast.
 
  Well, MS SQL used a clustered index for CT, i.e., the data is held
  in the leaf pages of the index B-Tree. The data and index are in one
  data structure. Once you lookup the index, you also have the data at
  zero additional cost.
 
  [snip]
 
 
 
  Is the PGSQL community willing to invest in a feature that a) has been
  requested by many others already; and b) can make a huge difference in
  a benchmark that can lend substantial credibility to PGSQL performance?
 
 
  while PostgreSQL doesn't support covering indexes or clustered indexes
  at this point, 9.2 has added support for index-only scans, which are a
  half-way point of sorts. See:
 
http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-
 em.html
http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
 
  http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9
  337a21f98ac4ce850bb4145acf47ca27
 
  If at all possible please see how your test is affected by this
  PostgreSQL
  9.2 enhancement. It should make a big difference, and if it doesn't
  it's important to know why.
 
  (CC'd Robert Haas)
 
  I'm not sure what the best option for getting a 9.2 beta build for
  Windows is.
 
 
  As for the invest side - that's really a matter for EnterpriseDB,
  Command Prompt, Red Hat, and the other backers who're employing
 people
  to work on the DB. Consider asking on pgsql-hackers, too; if nothing
  else you'll get a good explanation of the current state and progress toward
 clustered indexes.
 
  Some links that may be useful to you are:
 
http://wiki.postgresql.org/wiki/Todo
Things that it'd be good to support/implement at some point.
  Surprisingly, covering/clustered indexes aren't on there or at least aren't
 easily found.
  It's certainly a much-desired feature despite its apparent absence
  from the TODO.
 
 I think there is, deservingly, a lot of hesitation to implement a strictly
 ordered table construct.  A similar feature that didn't quite get finished --
 but maybe can be beaten into shape -- is the grouped-index-tuple
 implementation:
 
 http://community.enterprisedb.com/git/
 
 It is mentioned on the TODO page.  It's under the category that is perhaps
  poorly syntactically overloaded in the word "cluster".
 
 --
 fdr



Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Reza Taheri
Hi Robert,
Yes, the same concept. Oracle's IOT feature is used often with TPC benchmarks.

Thanks,
Reza

 -Original Message-
 From: Robert Klemme [mailto:shortcut...@googlemail.com]
 Sent: Thursday, July 05, 2012 5:30 AM
 To: Reza Taheri
 Cc: pgsql-performance@postgresql.org; Andy Bond (ab...@redhat.com);
 Greg Kopczynski; Jignesh Shah
 Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V
 performance
 
 On Wed, Jul 4, 2012 at 1:13 AM, Reza Taheri rtah...@vmware.com wrote:
 
  Checking online, the subject of clustered indexes for PostgreSQL comes
  up often. PGSQL does have a concept called clustered table, which
  means a table has been organized in the order of an index. This would
  help with sequential accesses to a table, but has nothing to do with this
 problem.
  PGSQL folks sometimes refer to what we want as integrated index.
 
 I do understand this correctly that we are speaking about the concept which
 is known under the term index organized table (IOT) in Oracle land,
 correct?
 
  http://docs.oracle.com/cd/E11882_01/server.112/e25789/indexiot.htm#CBBJEBIH
 
 Kind regards
 
 robert
 
 --
 remember.guy do |as, often| as.you_can - without end
 http://blog.rubybestpractices.com/



Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Reza Taheri
Hi Samuel,
The SSDs were used as a cache for the spinning drives. Here is a 30-second 
iostat sample representative of the whole run:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          24.87    0.00   12.54   62.39    0.00    0.20

Device:  rrqm/s   wrqm/s      r/s     w/s     rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  svctm   %util
sdd        0.00   137.37  3058.40  106.17  34691.60   974.13     22.54     15.75   4.98   0.32  100.00
sde        0.00   136.07  3063.37  107.70  35267.07   975.07     22.86     15.58   4.92   0.32  100.00
sdf        0.00   135.37  3064.23  109.53  35815.60   979.60     23.19     15.82   4.99   0.32  100.00
sdg        0.00   136.97  3066.57  116.67  35196.53  1014.53     22.75     15.87   4.99   0.31  100.00
sdi        0.00  2011.03     0.00   87.90      0.00  8395.73    191.03      0.13   1.45   1.42   12.51
sdk        0.00   136.63  3066.83  107.53  35805.07   976.67     23.17     16.01   5.04   0.32  100.00
sdm        0.00   138.50  3054.40  111.10  34674.27   998.40     22.54     15.52   4.91   0.32  100.00
sdj        0.00   136.73  3058.70  118.20  35227.20  1019.73     22.82     15.81   4.98   0.31  100.00
sdl        0.00   137.53  3044.97  109.33  34448.00   987.47     22.47     15.78   5.00   0.32  100.00

The data and index tablespaces were striped across the 8 LUNs, and saw an 
average 5ms response. We can beef up the storage to handle more I/Os so that 
our utilization doesn't stay below 40%, but that misses the point: we have an 
I/O rate twice that of the commercial database because they used clustered indexes.

I provided more config details in an earlier email.

As for asking for development to game a benchmark, no one is asking for 
benchmark specials. The question of enhancements in response to benchmark needs 
is an age-old question. We can get into that, but it's really a different 
discussion. Let me just expose the flip side of it: are we willing to watch 
people use other databases to run benchmarks but feel content that no features 
were developed specifically in response to benchmark results?

I am trying to engage with the community. We can drown the mailing list with 
details. So I decided to open the discussion with the high level points, and we 
will give you all the details that you want as we move forward.

Thanks,
Reza

From: Samuel Gendler [mailto:sgend...@ideasculptor.com]
Sent: Thursday, July 05, 2012 12:46 PM
To: Reza Taheri
Cc: Robert Klemme; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V performance


On Thu, Jul 5, 2012 at 12:13 PM, Reza Taheri 
rtah...@vmware.com wrote:
Hi Robert,
Yes, the same concept. Oracle's IOT feature is used often with TPC benchmarks.

Reza, it would be very helpful if you were to provide the list with a lot more 
information about your current software and hardware configuration before 
coming to the conclusion that the only possible way forward is with a 
significant architectural change to the db engine itself.  Not only is it not 
at all clear that you are extracting maximum performance from your current 
hardware and software, but I doubt anyone is particularly interested in doing a 
bunch of development purely to game a benchmark.  There has been significant 
discussion of the necessity and viability of the feature you are requesting in 
the past, so you should probably start where those discussions left off rather 
than starting the discussion all over again from the beginning.  Of course, if 
vmware were to sponsor development of the feature in question, it probably 
wouldn't require nearly as much buy-in from the wider community.

Getting back to the current performance issues -  I have little doubt that the 
MS SQL benchmark was set up and run by people who were intimately familiar with 
MS SQL performance tuning.  You stated in your earlier email that your team 
doesn't have significant postgresql-specific experience, so it isn't 
necessarily surprising that your first attempt at tuning didn't get the results 
that you are looking for. You stated that you have 14 SSDs and 90 spinning 
drives, but you don't specify how they are combined and how the database is 
laid out on top of them.  There is no mention of how much memory is available 
to the system. We don't know how you've configured postgresql's memory 
allocation or how your config weights the relative costs of index lookups, 
sequential scans, etc.  The guidelines for this mailing list include 
instructions for what information should be provided when asking about 
performance improvements.  http://archives.postgresql.org/pgsql-performance/  
Let's start by ascertaining how your benchmark results can be improved without 
engaging in a significant development effort on the db engine itself.





Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Reza Taheri
Well, I keep failing to send an email with an attachment. Do I need a 
moderator's approval?

Yes, running on VMs and a lower bin processor. With the virtualization 
overhead, etc., I figure we would be running right around 2/3 of the Dell 
throughput if we were running the same DBMS.

I sent the following message twice today with attachments (postgresql.conf, 
etc.), and it hasn't been posted yet. Here it is without an attachment.



From: Reza Taheri 
Sent: Thursday, July 05, 2012 11:34 AM
To: 'Craig Ringer'
Cc: pgsql-performance@postgresql.org; Robert Haas
Subject: RE: [PERFORM] The need for clustered indexes to boost TPC-V performance

OK, some config details.
We are using:

*   Two blades of an HP BladeSystem c-Class c7000 with 2-socket Intel E5520 
(Nehalem-EP) processors and 48GB of memory per blade
o   8 cores, 16 threads per blade
o   48GB of RAM per blade
*   Storage was an EMC VNX5700 with 14 SSDs fronting 32 15K RPM drives
*   The Tier B database VM was alone on a blade with 16 vCPUs, 40GB of 
memory, 4 virtual drives with various RAID levels
*   The driver and Tier A VMs were on the second blade
o   So we set PGHOST on the client system to point to the server
*   RHEL 6.1
*   PostgreSQL 8.4
*   unixODBC 2.3.2

We stuck with PGSQL 8.4 since it is the stock version shipped with RHEL 6. I am 
building a new, larger testbed, and will switch to PGSQL 9 with that.

Postgres.conf is attached.

Thanks,
Reza

 -Original Message-
 From: Andy Colson [mailto:a...@squeakycode.net]
 Sent: Thursday, July 05, 2012 5:42 PM
 To: Samuel Gendler
 Cc: Reza Taheri; Robert Klemme; pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V
 performance
 
 On 07/05/2012 03:52 PM, Samuel Gendler wrote:
 
 
  On Thu, Jul 5, 2012 at 1:37 PM, Reza Taheri rtah...@vmware.com
 mailto:rtah...@vmware.com wrote:
 
 
  I provided more config details in an earlier email.
 
 
 
 
  I hate to disagree, but unless I didn't get a message sent to the list, you
 haven't provided any details about your postgresql config or otherwise
 adhered to the guidelines for starting a discussion of a performance
 problem around here.  I just searched my mailbox and no email from you
 has any such details.  Several people have asked for them, including myself.
 You say you will give any details we want, but this is at least the 3rd or 4th
 request for such details and they have not yet been forthcoming.
 
 
 Reza, I went back and looked myself.  I see no specs on OS, or hardware
 unless you mean this:
 
 
  http://bit.ly/QeWXhE. This was run on a similar server, and the database
 size is close to ours.
 
 
 You're running on Windows then?  That server is 96GB of RAM and 8 cores (a
 Dell PowerEdge T610) with two PowerVault MD1120 NAS's?
 
 But then I assume you were not running on that, were you.  You were
 running VMware on it, probably?
 
 
 -Andy



Re: [PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL

2012-07-05 Thread Reza Taheri
Hi Greg,
Yes, a single-instance benchmark is a natural fall-out from the TPC-V kit. Our 
coding team (4 people working directly on the benchmark with another 3-4 folks 
helping in various consulting capacities) is tasked with creating a multi-VM 
benchmark. The benchmark is still missing the critical Market Exchange Emulator 
function. Once that's done, it would be natural for someone else in the TPC to 
take our working prototype and simplify it for a single-system, TPC-E (not V) 
reference kit. The conversion is not technically difficult, but releasing kits 
is a new path for the TPC, and will take some work.

Cheers,
Reza

 -Original Message-
 From: Greg Smith [mailto:g...@2ndquadrant.com]
 Sent: Thursday, July 05, 2012 6:25 PM
 To: Reza Taheri
 Cc: pgsql-performance@postgresql.org; Andy Bond (ab...@redhat.com);
 Greg Kopczynski; Jignesh Shah
 Subject: Re: [PERFORM] Introducing the TPC-V benchmark, and its
 relationship to PostgreSQL
 
 On 07/03/2012 07:08 PM, Reza Taheri wrote:
  TPC-V is a new benchmark under development for virtualized databases.
  A TPC-V configuration has:
 
  - multiple virtual machines running a mix of DSS, OLTP, and business
  logic apps
 
  - VMs running with throughputs ranging from 10% to 40% of the total
  system ..
 
 I think this would be a lot more interesting to the traditional, dedicated
 hardware part of the PostgreSQL community if there was a clear way to run
 this with only a single active machine too.  If it's possible for us to use 
 this to
 compare instances of PostgreSQL on dedicated hardware, too, that is
 enormously more valuable to people with larger installations.  It might be
 helpful to VMWare as well.  Being able to say "this VM install gets X% of the
 performance of a bare-metal install" answers a question I get asked all the
 time--when people want to decide between dedicated and virtual setups.
 
 The PostgreSQL community could use a benchmark like this for its own
 performance regression testing too.  A lot of that work is going to happen
 on dedicated machines.
 
   After waving our hands through a number
  of small differences between the platforms, we have calculated a CPU
  cost of around 3.2ms/transaction for the published MS SQL results,
  versus a measurement of 8.6ms/transaction for PostgreSQL. (TPC
  benchmarks are typically pushed to full CPU utilization. One removes
  all bottlenecks in storage, networking, etc., to achieve the 100% CPU
 usage.
  So CPU cost/tran is the final decider of performance.) So we need to
  cut the CPU cost of transactions in half to make publications with
  PostgreSQL comparable to commercial databases.
 
 I appreciate that getting close to parity here is valuable.  This situation 
 is so
 synthetic though--removing other bottlenecks and looking at CPU timing--
 that it's hard to get too excited about optimizing for it.  There's a lot of
 things in PostgreSQL that we know are slower than commercial databases
 because they're optimized for flexibility (the way operators are
 implemented is the best example) or for long-term code maintenance.
 Microsoft doesn't care if they have terribly ugly code that runs faster,
 because no one sees that code.  PostgreSQL does care.
 
 The fairer measure is one based on total system cost.  What I've found is
 that a fair number of people note PostgreSQL's low-level code isn't quite as
 fast as some of the less flexible alternatives--hard-coding operators is
 surely cheaper than looking them up each time--but the license cost savings
 more than pay for bigger hardware to offset that.  I wish I had any customer
 whose database was CPU bound; that would be an awesome world to live
 in.
 
 Anyway, guessing at causes here is premature speculation. When there's
 some code for the test kit published, at that point discussing the particulars
 of why it's not running well will get interesting.
 
 --
 Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Reza Taheri
Just to be clear, we have a number of people from different companies working 
on the kit. This is not a VMware project, it is a TPC project. But I hear you 
regarding coming in from the cold and asking for a major db engine feature. I 
know that I have caused a lot of rolling eyes. Believe me, I have had the same 
(no, worse!) reaction from every one of the commercial database companies in 
response to similar requests over the past 25 years.

We have our skin in the game, and as long as the community values the benchmark 
and wants to support us, we will figure out the details as we go forward.

Thanks,
Reza

 -Original Message-
 From: Greg Smith [mailto:g...@2ndquadrant.com]
 Sent: Thursday, July 05, 2012 6:42 PM
 To: Reza Taheri
 Cc: pgsql-performance@postgresql.org; Andy Bond (ab...@redhat.com);
 Greg Kopczynski; Jignesh Shah
 Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V
 performance
 
 On 07/03/2012 07:13 PM, Reza Taheri wrote:
  Is the PGSQL community willing to invest in a feature that a) has been
  requested by many others already; and b) can make a huge difference in
  a benchmark that can lend substantial credibility to PGSQL performance?
 
 Larger PostgreSQL features usually get built because companies sponsor
 their development, they pass review as both useful and correct, and then get
 committed.  Asking the community to invest in a new feature isn't quite the
 right concept.  Yes, everyone would like one of the smaller index
 representations.  I'm sure we can find reviewers willing to look at such a
 feature and committers who would also be interested enough to commit it,
 on a volunteer basis.  But a feature this size isn't going to spring to life
 based just on volunteer work.  The most useful questions would be "who would
 be capable of writing that feature?" and "how can we get them sponsored
 to focus on it?"  I can't tell from your comments yet what role(s) in that
 process VMWare wants to take on internally, and which it's looking for help
 with.  The job of convincing people it's a useful feature isn't necessary--we
 know that's true.
 
 --
 Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



[PERFORM] The overall experience of TPC-V benchmark team with PostgreSQL

2012-07-05 Thread Reza Taheri
Hello PGSQL fans,
Looking back at my posts the past couple of days and the replies that I've got, 
I realized that I have failed to make one point clear: we are very pleased with 
what we have seen from PostgreSQL so far. Let me explain. At this point of 
developing or porting a benchmark on a new DBMS, the team usually deals with 
stability, scalability, or fundamental performance issues. Our fear was that 
working with an open source DBMS, we'd experience more issues than usual. But 
we got the kit running transactions on PGSQL quickly, and after some early 
tests, I decided to try the kit on a larger testbed (two other folks are the 
developers of the benchmark code; I design, run, and analyze the experiments). 
I have the benchmark running on a 300,000-customer database on a 16-CPU system, 
unusual for this early in the prototyping phase. People who developed TPC-E 
(the father of our benchmark) did their prototyping on commercial databases 
with much smaller databases on smaller systems. On this large testbed, PGSQL 
has been working like a champ, and performance is what I would call decent. Put 
another way, I have been pleasantly surprised by the throughput I am getting 
out of the system: we saturate a 16-way with no visible signs of contention 
when we reduce the database size.

We are developing a reference kit. People are not obligated to use it to 
publish official results. They can use it to kick the tires, then go to one of 
the commercial DBMS vendors and ask for their kit for an official TPC-V 
publication. Even if that's all that people do with the reference kit, our team 
has achieved the goal that the TPC set for us. What I am trying to do is see if 
we can take this to the point that people use PGSQL to publish official results 
and use it in competitive situations. It looks possible, so I'd love to see it 
happen.

Again, overall, our experience with PGSQL has been positive, even in terms of 
performance.

Thanks,
Reza



Re: [PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL

2012-07-05 Thread Reza Taheri
Yes, I hear you. TPC's usual mode of operation has been to release details 
after the benchmark is complete. But TPC does have a policy clause that allows 
publication of draft specifications to get public feedback before the benchmark 
is complete. Our 2012 TPC TC paper will have a lot of the high level details. 
We need to see if we can use the draft clause to also release beta versions of 
code.

Thanks,
Reza

 -Original Message-
 From: Craig Ringer [mailto:ring...@ringerc.id.au]
 Sent: Thursday, July 05, 2012 8:41 PM
 To: Reza Taheri
 Cc: Greg Smith; pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Introducing the TPC-V benchmark, and its
 relationship to PostgreSQL
 
 On 07/06/2012 11:22 AM, Reza Taheri wrote:
  Hi Greg,
  Yes, a single-instance benchmark is a natural fall-out from the TPC-V kit.
 Our coding team (4 people working directly on the benchmark with another
 3-4 folks helping in various consulting capacities) is tasked with creating a
 multi-VM benchmark. The benchmark is still missing the critical Market
 Exchange Emulator function. Once that's done, it would be natural for
 someone else in the TPC to take our working prototype and simplify it for a
 single-system, TPC-E (not V) reference kit. The conversion is not technically
 difficult, but releasing kits is a new path for the TPC, and will take some
 work.
 
 Please consider releasing sample versions early in the process, especially as
 such releases are new to the TPC. Giving others the opportunity to
 contribute different skill sets and experiences before everything is locked
 into the final configuration is important, especially when trying new things.
 
 --
 Craig Ringer



Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Reza Taheri
Hi Craig,
I used the tool at depesz.com extensively during our early prototyping. It 
helped uncover ~10 problems that we solved by fixing issues in the code, adding 
or changing indexes, etc. Right now, I believe all our query plans look like 
what I would expect.

Yes, you are right, I did miss the link to the index-only scans. From what I 
can tell, it will do exactly what we want, but only as long as the index has 
all the columns in the query. I don't know what percentage of our queries have 
this property. But it does help.
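
As an illustration -- the table and column names below are invented for the 
example, not taken from our actual schema -- a query qualifies for an 
index-only scan when one index covers every column it touches:

    -- hypothetical covering index: all three referenced columns are in it,
    -- so 9.2 can answer the query without visiting the heap (provided the
    -- visibility map is reasonably current, i.e. the table was vacuumed)
    CREATE INDEX trade_cust_covering ON trade (cust_id, trade_dts, trade_price);
    EXPLAIN SELECT trade_dts, trade_price FROM trade WHERE cust_id = 4711;
    -- on 9.2 this should show "Index Only Scan using trade_cust_covering"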

The two main kit developers are out this week. We'll put our heads together 
next week to see what version to use when I switch to a larger testbed I am 
preparing.

Thanks,
Reza

From: Craig Ringer [mailto:ring...@ringerc.id.au]
Sent: Thursday, July 05, 2012 5:46 PM
To: Reza Taheri
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

First: Please do try 9.2 beta if you're upgrading from 8.4. It'll be out as a 
final release soon enough, and index only scans may make a big difference for 
the problem you're currently having.

Looking at your configuration I have a few comments, but it's worth noting that 
I don't work with hardware at that scale, and I'm more used to tuning I/O 
bottlenecked systems with onboard storage rather than CPU-bottlenecked ones on 
big SANs. Hopefully now that you've posted your configuration and setup there 
might be interest from others.

If you're able to post an EXPLAIN ANALYZE or two for a query you feel is slow 
that certainly won't hurt. Using http://explain.depesz.com/ saves you the 
hassle of dealing with word-wrapping when posting them, btw.
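
For example -- the statement below is a stand-in, not one of the actual 
benchmark transactions:

    -- run the suspect statement under EXPLAIN ANALYZE and post the full output
    EXPLAIN ANALYZE
    SELECT t.trade_id, t.trade_dts
      FROM trade t
     WHERE t.cust_id = 4711
     ORDER BY t.trade_dts DESC
     LIMIT 20;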

As for your config:

I notice that your autovacuum settings are at their defaults. With heavy UPDATE 
/ DELETE load this'll tend to lead to table and index bloat, so the DB has to 
scan more useless data to get what it needs. It also means table stats won't be 
maintained as well, potentially leading to poor planner decisions. The 
following fairly scary query can help identify bloat, as the database server 
doesn't currently have anything much built in to help you spot such issues:

   http://wiki.postgresql.org/wiki/Show_database_bloat
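
A lighter-weight first pass, assuming the stats collector is enabled (it is by 
default), is to look at the dead-tuple counters:

    -- tables with the most dead rows are the likeliest bloat suspects
    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
      FROM pg_stat_user_tables
     ORDER BY n_dead_tup DESC
     LIMIT 10;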

It might be helpful to set effective_cache_size and effective_io_concurrency so 
Pg has more idea of the scale of your hardware. The defaults are very 
conservative - it's supposed to be easy for people to use for simple things 
without melting their systems, and it's expected that anyone doing bigger work 
will tune the database.

http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html
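
As a sketch only - the right numbers depend on what else shares the host - 
values in this neighbourhood would suit a 40GB VM better than the defaults:

    -- planner's estimate of memory available for caching
    -- (OS page cache plus shared_buffers); illustrative for a 40GB VM
    SET effective_cache_size = '30GB';
    -- concurrent I/Os the storage can service; SSD-backed LUNs tolerate far
    -- more than the default of 1 (currently only affects bitmap heap scans)
    SET effective_io_concurrency = 8;

SET is handy for experimenting within a session; the same names go into 
postgresql.conf to make them permanent.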

It looks like you've already tweaked many of the critical points for big 
installs - your checkpoint_segments, wal_buffers, shared_buffers, etc. I lack 
the big hardware experience to know if they're appropriate, but they're not the 
extremely conservative defaults, which is a start.

Your random_page_cost and seq_page_cost are probably dead wrong for a SAN with 
RAM and SSD cache in front of fast disks. Their defaults are for local uncached 
spinning HDD media where seeks are expensive. The typical advice on such 
hardware is to set them to something more like seq_page_cost = 0.1 and 
random_page_cost = 0.15 - i.e. cheaper relative to the CPU cost, and with 
random I/O only a little more expensive than sequential I/O. What's right for 
your situation varies a bit based on DB size vs hardware size, etc; Greg 
discusses this more in his book.
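
Both can be tried per-session before touching postgresql.conf, so comparing 
plans is cheap:

    -- lower page costs to reflect cached/SSD-backed storage
    SET seq_page_cost = 0.1;
    SET random_page_cost = 0.15;
    -- now re-run EXPLAIN ANALYZE on the slow statements and compare plans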

What isolation level do your transactions use? This is significant because of 
the move to true serializable isolation with predicate locking in 9.1; it made 
serializable transactions a bit slower in some circumstances in exchange for 
much stronger correctness guarantees. The READ COMMITTED default was unchanged.
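
Checking is cheap (these statements work on 8.4 as well):

    -- what new transactions get by default
    SHOW default_transaction_isolation;
    -- or pin a level explicitly for a single transaction
    BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
    -- ... benchmark statements here ...
    COMMIT;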



It also looks like you might not have seen the second part of my earlier reply:

while PostgreSQL doesn't support covering indexes or clustered indexes at this 
point, 9.2 has added support for index-only scans, which are a half-way point 
of sorts. See:

  http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-em.html
  http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
  
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9337a21f98ac4ce850bb4145acf47ca27

If at all possible please see how your test is affected by this PostgreSQL 9.2 
enhancement. It should make a big difference, and if it doesn't it's important 
to know why.

(CC'd Robert Haas)



As for the "invest" question - that's really a matter for EnterpriseDB, Command 
Prompt, Red Hat, and the other backers who employ people to work on the 
DB. Consider asking on pgsql-hackers, too; if nothing else you'll get a good 
explanation of the current state and progress toward clustered indexes.

Some links that may be useful to you are:

  http://wiki.postgresql.org/wiki/Todo
  Things that it'd be good to support/implement at some point. Surprisingly, 
covering/clustered

Re: [PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL

2012-07-04 Thread Reza Taheri
Thanks for the reply, Craig. As far as publishing a draft goes, we are planning 
to do something along those lines.

For the schema and the queries, we are pretty much taking those wholesale from 
TPC-E, whose specification is public 
(http://www.tpc.org/tpce/spec/v1.12.0/TPCE-v1.12.0.pdf). The high-level 
differences with TPC-E are detailed in the 2010 and 2012 TPC TC papers I 
mentioned. We will stick closely to the TPC-E schema and queries. Anything new 
means a long specification writing process, which we are trying to avoid. We 
want to get this benchmark out there quickly.

I am not an expert in licensing. What I can tell you is that the kit will be 
available to anyone to download and use with a simple EULA based on existing 
TPC EULAs (although TPC hasn't had a complete end-to-end kit before, it has 
published partial code modules for its benchmarks). We broached the idea of 
open sourcing the kit, but it didn't pan out. The people on the subcommittee 
represent their companies, and different companies have different rules when 
their employees contribute to open source code.  Satisfying the armies of 
lawyers would have been impossible. So the kit won't be open source, but 
readily available for use. It will probably be similar to the licensing for 
SPEC benchmarks if you are familiar with them.

I'll pick up Greg's book. We had been focusing on functionality, but our focus 
will shift to performance soon. To be blunt, the team is very experienced in 
benchmarks and in database performance, but most of us are new to PGSQL.

Thanks,
Reza 

 -Original Message-
 From: Craig Ringer [mailto:ring...@ringerc.id.au]
 Sent: Tuesday, July 03, 2012 10:19 PM
 To: pgsql-performance@postgresql.org
 Cc: Reza Taheri; Andy Bond (ab...@redhat.com); Greg Kopczynski; Jignesh
 Shah; Greg Smith; Dave Page
 Subject: Re: [PERFORM] Introducing the TPC-V benchmark, and its
 relationship to PostgreSQL
 
 On 07/04/2012 07:08 AM, Reza Taheri wrote:
 
  ... so the subcommittee moved forward with developing its own
  reference kit. The reference kit has been developed to run on
  PostgreSQL, and we are focusing our development efforts and testing on
  PostgreSQL.
 That's a very positive step. The TPC seems to me to have a pretty poor
 reputation among open source database users and vendors. I think that's
 largely because the schema and tools are typically very closed and
 restrictively licensed, though the prohibition against publishing benchmarks
 by big commercial vendors doesn't help.
 
 This sounds like a promising change. The TPC benchmarks are really good for
 load-testing and regression testing, so having one that's directly PostgreSQL
 friendly will be a big plus, especially if it is appropriately licensed.
 
 The opportunity to audit the schema, queries, and test setup before the
 tool is finalized would certainly be appealing. What can you publish in draft
 form now?
 
 What license terms does the TPC plan to release the schema, queries, and
 data for TPC-V under?
 
 I've cc'd Greg Smith and Dave Page, both of whom I suspect will be
 interested in this development but could easily miss your message. If you
 haven't read Greg's book "PostgreSQL High Performance" it's probably a good
 idea to do so.
 
 --
 Craig Ringer




[PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL

2012-07-03 Thread Reza Taheri
 a measurement of 8.6ms/transaction for PostgreSQL. (TPC 
benchmarks are typically pushed to full CPU utilization. One removes all 
bottlenecks in storage, networking, etc., to achieve the 100% CPU usage. So CPU 
cost/tran is the final decider of performance.) So we need to cut the CPU cost 
of transactions in half to make publications with PostgreSQL comparable to 
commercial databases. It is OK to be slower than MS SQL or Oracle. The 
benchmark running PostgreSQL can still be used to compare the performance of 
servers, processors, and especially, hypervisors under a demanding database 
workload. But the slower we are, the less credible we are.
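
To make the arithmetic concrete (the throughput figure below is illustrative, 
chosen only to reproduce the quoted 8.6ms; it is not a measured result):

    \text{CPU cost/tran} = \frac{\text{busy CPUs}}{\text{throughput}}
                         = \frac{16 \times 100\%}{1860\ \text{tps}}
                         \approx 8.6\ \text{ms per transaction}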

Sorry for the long post. I will follow up with specific questions next.

Thanks,
Reza Taheri