[PERFORM] PostgreSQL and ODBC handles
Summary: I am facing a contention problem with ODBC on the client side. strace and perf top show we are serializing over what appear to be accesses to the ODBC statement handle. Contention goes down if I use multiple processes instead of multiple threads within a process. Also, all the threads report the same connection handle number and the same statement handle number. Is there a way to force different connection and statement handles? I have asked this question on the ODBC mailing list, and they suggested it could be something in the PostgreSQL driver.

Details: Running the TPCx-V benchmark, we hit a performance bottleneck as the load increases. We have plenty of CPU and disk resources available in our driver VM, client VM, and database backend VM (all with high vCPU counts) on a dedicated server. When we increase the number of threads of execution, throughput not only fails to go up, it actually degrades. I am running with 80 threads in one process; when I divide these threads into 5 processes, performance nearly doubles. So the problem is not in the database backend. Each thread has its own database connection and its own statement handle.
Looking more closely at the client, this is what I see in the strace output when everything flows through one process:

17:52:52.762491 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000102>
17:52:52.762635 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL) = 0 <0.000664>
17:52:52.763540 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.27>
17:52:52.763616 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1) = 1 <0.76>
17:52:52.763738 futex(0x7fae463a9f00, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.16>
17:52:52.763793 futex(0x7fae463a9f00, FUTEX_WAKE_PRIVATE, 1) = 1 <0.10>
17:52:52.763867 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.38>
17:52:52.763982 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1) = 1 <0.37>
17:52:52.764078 futex(0x7fae1820, FUTEX_WAKE_PRIVATE, 1) = 0 <0.10>
17:52:52.764182 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.30>
17:52:52.764264 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1) = 1 <0.75>
17:52:52.764401 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1) = 1 <0.14>
17:52:52.764455 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.11>
17:52:52.764507 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1) = 1 <0.25>
17:52:52.764579 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1) = 1 <0.10>
17:52:52.764821 sendto(227, "\x51\x00\x00\x00\x0b\x43\x4f\x4d\x4d\x49\x54\x00", 12, MSG_NOSIGNAL, NULL, 0) = 12 <0.29>
17:52:52.764911 recvfrom(227, 0x7fae18058760, 4096, 16384, 0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000107>
17:52:52.765065 poll([{fd=227, events=POLLIN}], 1, 4294967295) = 1 ([{fd=227, revents=POLLIN}]) <0.17>
17:52:52.765185 recvfrom(227, "\x43\x00\x00\x00\x0b\x43\x4f\x4d\x4d\x49\x54\x00\x5a\x00\x00\x00\x05\x49", 4096, MSG_NOSIGNAL, NULL, NULL) = 18 <0.18>
17:52:52.765258 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL) = 0 <0.000470>
17:52:52.765764 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.52>
17:52:52.765908 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1) = 1 <0.73>
17:52:52.766045 futex(0x7fae351c5100, FUTEX_WAKE_PRIVATE, 1) = 1 <0.40>
17:52:52.766246 futex(0x7fae351c5100, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.26>

And perf top shows:

 9.89%  [kernel]          [k] _raw_spin_unlock_irqrestore
 4.86%  [kernel]          [k] finish_task_switch
 3.53%  [kernel]          [k] _raw_spin_lock
 3.00%  libodbc.so.2.0.0  [.] __check_stmt_from_dbc_v
 2.80%  [kernel]          [k] __do_softirq
 2.43%  psqlodbcw.so      [.] 0x0003b146
 1.95%  libodbc.so.2.0.0
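For what it's worth, __check_stmt_from_dbc_v in the perf profile is part of unixODBC's handle-validation code, which walks the handle lists under a mutex. If the driver itself is thread-safe, unixODBC's per-driver `Threading` attribute in odbcinst.ini can be lowered so the driver manager stops serializing calls. This is a hedged suggestion, not something confirmed in the thread; the section name and driver path below are illustrative of a typical psqlODBC install:

```ini
; /etc/odbcinst.ini -- illustrative; section name and driver path
; depend on the local install
[PostgreSQL]
Description = PostgreSQL ODBC driver
Driver      = /usr/lib64/psqlodbcw.so
; unixODBC protection level: 3 = serialize everything,
; 0 = no driver-manager locking -- only safe if the driver is
; fully thread-safe
Threading   = 0
```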
[PERFORM] Announcing the public availability of the TPCx-V prototype
Hello PGSQL performance community,

You might remember me from these past postings:

* In 2012, we announced that the TPC was using PostgreSQL in a benchmark for virtualized databases. Unlike the traditional TPC Enterprise benchmarks, which are defined in paper specifications, this Express benchmark comes with a complete end-to-end benchmarking kit, which, at this point in time, runs on Linux VMs (we have tested it with PGSQL 9.3 on RHEL 7.1) on the x86 architecture. On that occasion, I was looking for a PGSQL feature similar to MS SQL Server's "clustered indexes" or Oracle's "index-organized tables". (I was told about "index-only scans", which helped, but not as much as clustered indexes or IOTs.)

* In 2013, I asked about a performance hit during checkpoints, which was quickly diagnosed as the well-known dirty_background_bytes problem.

* Last year, I asked about the high rate of transaction aborts due to serialization failures. It turned out that even dbt5, which implements TPC-E, was running into the same issue, and simply resubmitting the transaction was an acceptable solution.

We are done with the benchmark development, and it has entered the TPC's "Public Review" phase. In parallel, the TPC council has approved the public release of a prototype of the benchmark kit to gather more experimental data and to speed up the approval of the benchmark. The Benchmark Specification, the Benchmark Kit, and the User's Guide are available from tpc.org/tpcx-v. In addition to these components of the benchmark standard, the subcommittee members have developed two other tools to help benchmarkers get a fast start: a downloadable VM in the form of an ovf-format VM template that contains a complete benchmarking environment with all the software already installed and pre-configured; and a PowerCLI script that, in the VMware vSphere environment, allows you to quickly clone a VM template into a large number of benchmark VMs.

The review period is open until Thursday, October 15th.
The subcommittee intends to resolve the comments from the Formal Review by November 12th, and bring forward a motion to the Council to approve the Benchmark Standard. We would love to get feedback from the PGSQL community. The subcommittee is taking feedback via FogBugz at www.tpc.org/bugdb under the project "TPC-Virt".

Anticipating some of the questions/comments, here are some thoughts:

- We wish we had a more representative workload, multiple kits for multiple hardware architectures, etc. Those are questions for another day. Having an end-to-end database benchmarking kit for x86 virtualization with a workload derived from TPC-E is a pretty good first step.

- You don't have to run the full multi-VM configuration of TPCx-V if you are just playing with the kit and not intending to have an audited result. You can run the benchmark on a single database on a single VM (or even a native, un-virtualized server!). This would be very similar to a simple TPC-E config. If we can get feedback based on a single-VM config, we will still be grateful.

- Having said that, there are enough subtle differences between the TPC-E schema and the TPCx-V schema that running TPCx-V on a single VM doesn't exactly take you to TPC-E.

Thanks,
Reza Taheri, for the TPCx-V subcommittee
Re: [PERFORM] High rate of transaction failure with the Serializable Isolation Level
An update: following the recommendations on this list, I ran a number of experiments:

- I ran with all foreign keys deleted. There was a 4% drop in the rate of deadlocks per transaction, which went from 0.32 to 0.31. So we still have pretty much the same failure rate. One interesting side effect was that throughput went up by 9%, so maintaining (a lot of) foreign-key constraints costs us a lot.

- I ran with max_pred_locks_per_transaction as high as 26,400. No difference.

- I rebuilt the database with fillfactor=15 for all the tables and indexes that are involved in the transactions that fail. This was to see if the problem is PGSQL upgrading row-level locks to page-level locks. With a low fillfactor, the chance of two transactions landing on the same page is low: we have around 9 rows per data page instead of the original ~60 (the index pages are more tightly packed). I ran a range of thread counts from 5 to 60 for the threads that issue transactions. The failure rate per transaction dropped to around half for the thread count of 5, but that's misleading, since with a fillfactor of 15 our database size went up by around 6X, reducing the effectiveness of the PGSQL and OS disk caches and resulting in a throughput of around half of what we used to see. So the reduced failure rate is just a result of fewer threads competing for resources. When I run with enough threads to max out the system with fillfactor=15 or 100, I get the same failure rates.

- In case folks hadn't noticed, Ryan Johnson is getting very similar failure rates with dbt-5. So this isn't a case of our having made a silly mistake in our coding of the app or the stored procedures.

The above experiments were the easy ones.
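For anyone who wants to repeat the fillfactor experiment above, the rebuild can be expressed with PostgreSQL storage parameters like these (the table and its columns follow the TPC-E SETTLEMENT table for illustration only; they are not the actual TPCx-V DDL):

```sql
-- Leave ~85% of each heap page empty so that concurrent transactions
-- rarely touch the same page (~9 rows per page instead of ~60).
CREATE TABLE settlement (
    se_t_id        bigint PRIMARY KEY,
    se_cash_type   varchar(40),
    se_amt         numeric(10,2)
) WITH (fillfactor = 15);

-- Index fillfactor is set separately and controls leaf-page packing.
CREATE INDEX se_amt_idx ON settlement (se_amt) WITH (fillfactor = 15);

-- An existing table can be changed, but the new setting only affects
-- future inserts; a rewrite (e.g. VACUUM FULL) re-packs old pages.
ALTER TABLE settlement SET (fillfactor = 15);
```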
I am now working on rewriting the app code and the 6 stored procedures to see if I can execute the whole transaction in a single stored procedure.
Thanks,
Reza

-----Original Message-----
From: Craig Ringer [mailto:cr...@2ndquadrant.com]
Sent: Sunday, July 27, 2014 8:58 PM
To: Reza Taheri; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] High rate of transaction failure with the Serializable Isolation Level

On 07/26/2014 02:58 AM, Reza Taheri wrote:
> Hi Craig,
>> According to the attached SQL, each frame is a separate phase in the operation and performs many different operations. There's a *lot* going on here, so identifying possible interdependencies isn't something I can do in a ten minute skim read over my morning coffee.
> You didn't think I was going to bug you all with a trivial problem, did you? :-) :-)

One can hope, but usually in vain...

> Yes, I am going to have to take an axe to the code and see what pops out. Just to put this in perspective, the transaction flow and its statements are borrowed verbatim from the TPC-E benchmark. There have been dozens of TPC-E disclosures with MS SQL Server, and there are Oracle and DB2 kits that, although not used in public disclosures for various non-technical reasons, are used internally by the DB and server companies. These 3 products, and perhaps more, were used extensively in the prototyping phase of TPC-E. So, my hope is that if there is a previously unidentified interdependency between transactions as you point out, it will be due to a mistake we made in coding this for PGSQL. Otherwise, we will have a hard time convincing all the council member companies that we need to change the schema or the business logic to make the kit work with PGSQL.

Hopefully so. Personally I think it's moderately likely that PostgreSQL's much stricter enforcement of serializable isolation is detecting anomalies that other products do not, so it's potentially preventing errors.
It would be nice to have the ability to tune this; sometimes there are anomalies you wish to ignore or permit. At present it is an all-or-nothing affair - no predicate locking (REPEATABLE READ isolation) or strict predicate locking (SERIALIZABLE isolation).

I recommend running some of the examples in the SERIALIZABLE documentation on other products. If they don't fail where they do in Pg, then the other products either have less strict (and arguably therefore less correct) serializable isolation enforcement, or they rely on blocking predicate locks. In the latter case it should be easy to tell, because statements will block on locks where no ordinary row or table level lock could be taken.

If you do run comparative tests I would be quite interested in seeing the results.

--
Craig Ringer    http://www.2ndquadrant.com
PostgreSQL Development, 24x7 Support, Training Services

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
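Craig's suggestion can be tried with the classic write-skew example patterned after the one in the PostgreSQL SSI documentation (a sketch; the table and values are from that example, not from the TPC-E schema). Under PostgreSQL's SSI, the second COMMIT fails with SQLSTATE 40001; a product relying on blocking predicate locks would instead stall at one of the earlier statements:

```sql
CREATE TABLE mytab (class int, value int);
INSERT INTO mytab VALUES (1, 10), (1, 20), (2, 100), (2, 200);

-- Session A:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT SUM(value) FROM mytab WHERE class = 1;

-- Session B:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT SUM(value) FROM mytab WHERE class = 2;

-- Session A inserts a row based on the sum it read:
INSERT INTO mytab VALUES (2, 30);
COMMIT;

-- Session B inserts a row based on the sum it read:
INSERT INTO mytab VALUES (1, 300);
COMMIT;  -- fails: could not serialize access (SQLSTATE 40001),
         -- since no serial order of A and B produces this result
```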
[PERFORM] Re: High rate of transaction failure with the Serializable Isolation Level
Hi Ryan,
I just noticed that the mail alias manager has stalled the post below because of the attachment size. But you should have gotten it directly. If anyone else is interested in a copy, let me know, and I will forward it.
Thanks,
Reza

-----Original Message-----
From: Reza Taheri
Sent: Monday, July 28, 2014 8:57 PM
To: 'Ryan Johnson'
Cc: pgsql-performance@postgresql.org
Subject: RE: High rate of transaction failure with the Serializable Isolation Level

Hi Ryan,
We presented a paper at the TPCTC of last year's VLDB (attached). It described the architecture of the kit and some of the tuning. Another tuning change was setting /proc/sys/vm/dirty_background_bytes to a small value (like 1000) on very-large-memory machines, which was a problem I brought up on this same mailing list a while ago and got great advice on. Also, make sure you do a SQLFreeStmt(stmt, SQL_DROP) at the end of transactions, not SQL_CLOSE. Let me know if you have any questions about the paper.
Thanks,
Reza

-----Original Message-----
From: Ryan Johnson [mailto:ryan.john...@cs.utoronto.ca]
Sent: Saturday, July 26, 2014 4:51 PM
To: Reza Taheri
Cc: pgsql-performance@postgresql.org
Subject: Re: High rate of transaction failure with the Serializable Isolation Level

That does sound pretty similar, modulo the raw performance difference. I have no idea how many MEE threads there were; it was just a quick run with exactly zero tuning, so I use whatever dbt5 does out of the box. Actually, though, if you have any general tuning tips for TPC-E I'd be interested to learn them (PM if that's off topic for this discussion).
Regards,
Ryan

On 26/07/2014 7:33 PM, Reza Taheri wrote:
> Hi Ryan,
> Thanks a lot for sharing this. When I run with 12 CE threads and 3-5 MEE threads (how many MEE threads do you have?) @ 80-90 tps, I get something in the 20-30% range of trade-result transactions rolled back, depending on how I count. E.g., in a 5.5-minute run with 3 MEE threads, I saw 87.5 tps. There were 29200 successful trade-result transactions.
> Of these, 5800 were rolled back, some more than once, for a total of 8450 rollbacks. So I'd say your results and ours tell similar stories!
> Thanks,
> Reza

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Ryan Johnson
Sent: Saturday, July 26, 2014 2:06 PM
To: Reza Taheri
Cc: pgsql-performance@postgresql.org
Subject: Re: High rate of transaction failure with the Serializable Isolation Level

Dredging through some old run logs, 12 dbt-5 clients gave the following when everything was run under SSI (fully serializable, even the transactions that allow repeatable read isolation). Not sure how that translates to your results. Abort rates were admittedly rather high, though perhaps lower than what you report.

Transaction          %      Average : 90th %   Total  Rollbacks      %   Warning  Invalid
-----------------  ------  ----------------   -----  ---------  ------  -------  -------
Trade Result        5.568    0.022 : 0.056     2118        417  19.69%        0       91
Broker Volume       5.097    0.009 : 0.014     1557          0   0.00%        0        0
Customer Position  13.530    0.016 : 0.034     4134          1   0.02%        0        0
Market Feed         0.547    0.033 : 0.065      212         45  21.23%        0       69
Market Watch       18.604    0.031 : 0.061     5683          0   0.00%        0        0
Security Detail    14.462    0.015 : 0.020     4418          0   0.00%        0        0
Trade Lookup        8.325    0.059 : 0.146     2543          0   0.00%      432        0
Trade Order         9.110    0.006 : 0.008     3227        444  13.76%        0        0
Trade Status       19.795    0.030 : 0.046     6047          0   0.00%        0        0
Trade Update        1.990    0.064 : 0.145      608          0   0.00%      432        0
Data Maintenance      N/A    0.012 : 0.012        1          0   0.00%        0        0

28.35 trade-result transactions per second (trtps)

Regards,
Ryan

On 26/07/2014 3:55 PM, Reza Taheri wrote:
> Hi Ryan,
> That's a very good point. We are looking at dbt5. One question: what throughput rate, and how many threads of execution did you use for dbt5? The failure rates I reported were at ~120 tps with 15 trade-result threads.
> Thanks,
> Reza

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Ryan Johnson
Sent: Friday, July 25, 2014 2:36 PM
To: pgsql-performance@postgresql.org
Subject: Re: High rate of transaction failure with the Serializable Isolation Level

On 25/07/2014 2:58 PM, Reza Taheri wrote:
> Hi Craig,
> According to the attached
[PERFORM] Re: High rate of transaction failure with the Serializable Isolation Level
Hi Ryan,
That's a very good point. We are looking at dbt5. One question: what throughput rate, and how many threads of execution did you use for dbt5? The failure rates I reported were at ~120 tps with 15 trade-result threads.
Thanks,
Reza

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Ryan Johnson
Sent: Friday, July 25, 2014 2:36 PM
To: pgsql-performance@postgresql.org
Subject: Re: High rate of transaction failure with the Serializable Isolation Level

On 25/07/2014 2:58 PM, Reza Taheri wrote:
> Hi Craig,
>> According to the attached SQL, each frame is a separate phase in the operation and performs many different operations. There's a *lot* going on here, so identifying possible interdependencies isn't something I can do in a ten minute skim read over my morning coffee.
> You didn't think I was going to bug you all with a trivial problem, did you? :-) :-)
> Yes, I am going to have to take an axe to the code and see what pops out. Just to put this in perspective, the transaction flow and its statements are borrowed verbatim from the TPC-E benchmark. There have been dozens of TPC-E disclosures with MS SQL Server, and there are Oracle and DB2 kits that, although not used in public disclosures for various non-technical reasons, are used internally by the DB and server companies. These 3 products, and perhaps more, were used extensively in the prototyping phase of TPC-E. So, my hope is that if there is a previously unidentified interdependency between transactions as you point out, it will be due to a mistake we made in coding this for PGSQL. Otherwise, we will have a hard time convincing all the council member companies that we need to change the schema or the business logic to make the kit work with PGSQL. Just pointing out my uphill battle!!

You might compare against dbt-5 [1], just to see if the same problem occurs. I didn't notice such high abort rates when I ran that workload a few weeks ago.
Just make sure to use the latest commit, because the released version has fatal bugs.

[1] https://github.com/petergeoghegan/dbt5

Ryan
[PERFORM] Re: High rate of transaction failure with the Serializable Isolation Level
Hi Ryan,
Thanks a lot for sharing this. When I run with 12 CE threads and 3-5 MEE threads (how many MEE threads do you have?) @ 80-90 tps, I get something in the 20-30% range of trade-result transactions rolled back, depending on how I count. E.g., in a 5.5-minute run with 3 MEE threads, I saw 87.5 tps. There were 29200 successful trade-result transactions. Of these, 5800 were rolled back, some more than once, for a total of 8450 rollbacks. So I'd say your results and ours tell similar stories!
Thanks,
Reza

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Ryan Johnson
Sent: Saturday, July 26, 2014 2:06 PM
To: Reza Taheri
Cc: pgsql-performance@postgresql.org
Subject: Re: High rate of transaction failure with the Serializable Isolation Level

Dredging through some old run logs, 12 dbt-5 clients gave the following when everything was run under SSI (fully serializable, even the transactions that allow repeatable read isolation). Not sure how that translates to your results. Abort rates were admittedly rather high, though perhaps lower than what you report.

Transaction          %      Average : 90th %   Total  Rollbacks      %   Warning  Invalid
-----------------  ------  ----------------   -----  ---------  ------  -------  -------
Trade Result        5.568    0.022 : 0.056     2118        417  19.69%        0       91
Broker Volume       5.097    0.009 : 0.014     1557          0   0.00%        0        0
Customer Position  13.530    0.016 : 0.034     4134          1   0.02%        0        0
Market Feed         0.547    0.033 : 0.065      212         45  21.23%        0       69
Market Watch       18.604    0.031 : 0.061     5683          0   0.00%        0        0
Security Detail    14.462    0.015 : 0.020     4418          0   0.00%        0        0
Trade Lookup        8.325    0.059 : 0.146     2543          0   0.00%      432        0
Trade Order         9.110    0.006 : 0.008     3227        444  13.76%        0        0
Trade Status       19.795    0.030 : 0.046     6047          0   0.00%        0        0
Trade Update        1.990    0.064 : 0.145      608          0   0.00%      432        0
Data Maintenance      N/A    0.012 : 0.012        1          0   0.00%        0        0

28.35 trade-result transactions per second (trtps)

Regards,
Ryan

On 26/07/2014 3:55 PM, Reza Taheri wrote:
> Hi Ryan,
> That's a very good point. We are looking at dbt5.
> One question: what throughput rate, and how many threads of execution did you use for dbt5? The failure rates I reported were at ~120 tps with 15 trade-result threads.
> Thanks,
> Reza

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Ryan Johnson
Sent: Friday, July 25, 2014 2:36 PM
To: pgsql-performance@postgresql.org
Subject: Re: High rate of transaction failure with the Serializable Isolation Level

On 25/07/2014 2:58 PM, Reza Taheri wrote:
> Hi Craig,
>> According to the attached SQL, each frame is a separate phase in the operation and performs many different operations. There's a *lot* going on here, so identifying possible interdependencies isn't something I can do in a ten minute skim read over my morning coffee.
> You didn't think I was going to bug you all with a trivial problem, did you? :-) :-)
> Yes, I am going to have to take an axe to the code and see what pops out. Just to put this in perspective, the transaction flow and its statements are borrowed verbatim from the TPC-E benchmark. There have been dozens of TPC-E disclosures with MS SQL Server, and there are Oracle and DB2 kits that, although not used in public disclosures for various non-technical reasons, are used internally by the DB and server companies. These 3 products, and perhaps more, were used extensively in the prototyping phase of TPC-E. So, my hope is that if there is a previously unidentified interdependency between transactions as you point out, it will be due to a mistake we made in coding this for PGSQL. Otherwise, we will have a hard time convincing all the council member companies that we need to change the schema or the business logic to make the kit work with PGSQL. Just pointing out my uphill battle!!

You might compare against dbt-5 [1], just to see if the same problem occurs. I didn't notice such high abort rates when I ran that workload a few weeks ago.
Just make sure to use the latest commit, because the released version has fatal bugs.

[1] https://github.com/petergeoghegan/dbt5

Ryan
Re: [PERFORM] High rate of transaction failure with the Serializable Isolation Level
Hi Craig,

> According to the attached SQL, each frame is a separate phase in the operation and performs many different operations. There's a *lot* going on here, so identifying possible interdependencies isn't something I can do in a ten minute skim read over my morning coffee.

You didn't think I was going to bug you all with a trivial problem, did you? :-) :-) Yes, I am going to have to take an axe to the code and see what pops out.

Just to put this in perspective, the transaction flow and its statements are borrowed verbatim from the TPC-E benchmark. There have been dozens of TPC-E disclosures with MS SQL Server, and there are Oracle and DB2 kits that, although not used in public disclosures for various non-technical reasons, are used internally by the DB and server companies. These 3 products, and perhaps more, were used extensively in the prototyping phase of TPC-E. So, my hope is that if there is a previously unidentified interdependency between transactions as you point out, it will be due to a mistake we made in coding this for PGSQL. Otherwise, we will have a hard time convincing all the council member companies that we need to change the schema or the business logic to make the kit work with PGSQL. Just pointing out my uphill battle!!

> If there are foreign keys to it from other tables, they count too.

Yes, we have a lot of foreign keys. I dropped them all a few weeks ago with no impact. But when I start the axing process, they will be one of the first to go.

Thanks,
Reza

-----Original Message-----
From: Craig Ringer [mailto:cr...@2ndquadrant.com]
Sent: Thursday, July 24, 2014 6:30 PM
To: Reza Taheri; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] High rate of transaction failure with the Serializable Isolation Level

On 07/25/2014 03:50 AM, Reza Taheri wrote:
> Hi Craig,
>> It's not just that statement that is relevant. Is that statement run standalone, or as part of a larger transaction?
> Yes, the size of the transaction seems to matter here.
> It is a complex transaction (attached). Each frame is one stored procedure, and the 6 frames are called one after the other with no pause. After frame 6 returns, we call SQLTransact(..., ..., SQL_COMMIT). Below is the failure rate of the various frames:
>
>   112 tid 18883: SQL Failed: DoTradeResultFrame3
>   102 tid 18883: SQL Failed: DoTradeResultFrame4
> 18188 tid 18883: SQL Failed: DoTradeResultFrame5
>  8566 tid 18883: SQL Failed: DoTradeResultFrame6
>  4492 tid 18883: ERROR: TradeResultDB: commit failed
>
> So, no failures in frames 1 and 2, and then the failure rate grows as we approach the end of the transaction.

According to the attached SQL, each frame is a separate phase in the operation and performs many different operations. There's a *lot* going on here, so identifying possible interdependencies isn't something I can do in a ten minute skim read over my morning coffee.

I think the most useful thing to do here is to start cutting and simplifying the case, trying to boil it down to the smallest thing that still causes the problem. That'll likely either find a previously unidentified interdependency between transactions or, if you're unlucky, a Pg bug. Given the complexity of the operations there I'd be very surprised if it wasn't the former.

>> If the INSERTing transaction previously queried for a key that was created by a concurrent transaction this can occur, as there is no serial execution order of the transactions that could produce the same result.
> As far as the inserts, your point is well-taken. But in this case, I have eliminated the transactions that query or otherwise manipulate the SETTLEMENT table. The only access to it is the single insert in this transaction.

If there are foreign keys to it from other tables, they count too.
--
Craig Ringer    http://www.2ndquadrant.com
PostgreSQL Development, 24x7 Support, Training Services
[PERFORM] Re: High rate of transaction failure with the Serializable Isolation Level
We are hoping the spec will get wrapped up in the next 6 months, but industry standard councils move very slowly! However, if there is interest in getting involved and helping, the TPC might be receptive to earlier access.

BTW, just to let folks know how large of an undertaking this has been, this is an estimate of the lines of code developed so far:

- 700 lines of run-time shell scripts
- 700 lines of build-time shell scripts
- 3.5K lines of DDL and DML
- 4K lines of C code to test the DML
- 22K lines of C, C++, and Java code in the benchmark driver
- 45K lines of C++ code in VGen

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Josh Berkus
Sent: Friday, July 25, 2014 2:47 PM
To: pgsql-performance@postgresql.org
Subject: Re: High rate of transaction failure with the Serializable Isolation Level

On 07/23/2014 06:18 PM, Reza Taheri wrote:
> [By way of introduction, we are a TPC subcommittee that is developing a benchmark with cloud-like characteristics for virtualized databases. The end-to-end benchmarking kit will be publicly available, and will run on PGSQL]

Awesome! Any idea when it will be available? Our community could really use some updated benchmark tooling ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com/
Re: [PERFORM] High rate of transaction failure with the Serializable Isolation Level
Hi Kevin,
Thanks for the reply.

> As already pointed out by Craig, statements don't have serialization failures; transactions do. In some cases a transaction may become doomed to fail by the action of a concurrent transaction, but the actual failure cannot occur until the next statement is run on the connection with the doomed transaction; it may have nothing to do with the statement itself.

That's an interesting concept. I suppose I could test it by moving statements around to see what happens.

> Note that I have seen reports of cases where max_pred_locks_per_transaction needed to be set to 20x the default to reduce serialization failures to an acceptable level.

I was running with the following two parameters set to 640; I then raised them to 6400, and saw no difference:

max_locks_per_transaction = 6400
max_pred_locks_per_transaction = 6400

Thanks,
Reza

-----Original Message-----
From: Kevin Grittner [mailto:kgri...@ymail.com]
Sent: Thursday, July 24, 2014 7:03 AM
To: Reza Taheri; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] High rate of transaction failure with the Serializable Isolation Level

Reza Taheri <rtah...@vmware.com> wrote:

> I am running into very high failure rates when I run with the Serializable Isolation Level. I have simplified our configuration to a single database with a constant workload, a TPC-E workload if you will, to focus on this problem. We are running with PGSQL 9.2.4.

I don't remember any bug fixes that would be directly related to what you describe in the last 15 months, but it might be better to do any testing with fixes for known bugs:

http://www.postgresql.org/support/versioning/

> When we raise the Trade-Result transaction to SQL_TXN_SERIALIZABLE, we face a storm of conflicts.
Out of 37,342 Trade-Result transactions, 15,707 hit an error, and have to be rolled back and retried one or more times. The total failure count (due to many transactions failing more than once) is 31,388. What is unusual is that the majority of the failures occur in a statement that should not have any isolation conflicts.

As already pointed out by Craig, statements don't have serialization failures; transactions do. In some cases a transaction may become doomed to fail by the action of a concurrent transaction, but the actual failure cannot occur until the next statement is run on the connection with the doomed transaction; it may have nothing to do with the statement itself.

If you want to understand the theory of how SERIALIZABLE transactions are implemented in PostgreSQL, these links may help:

http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/storage/lmgr/README-SSI;hb=master
http://wiki.postgresql.org/wiki/Serializable

For a more practical set of examples about the differences in using REPEATABLE READ and SERIALIZABLE transaction isolation levels in PostgreSQL, see:
http://wiki.postgresql.org/wiki/SSI

If you are just interested in reducing the number of serialization failures, see the suggestions near the end of this section of the documentation:

http://www.postgresql.org/docs/9.2/interactive/transaction-iso.html#XACT-SERIALIZABLE

Any of these items (or perhaps a combination of them) may ameliorate the problem. Note that I have seen reports of cases where max_pred_locks_per_transaction needed to be set to 20x the default to reduce serialization failures to an acceptable level.
[PERFORM] High rate of transaction failure with the Serializable Isolation Level
Hello PGSQL performance community,

[By way of introduction, we are a TPC subcommittee that is developing a benchmark with cloud-like characteristics for virtualized databases. The end-to-end benchmarking kit will be publicly available, and will run on PGSQL]

I am running into very high failure rates when I run with the Serializable Isolation Level. I have simplified our configuration to a single database with a constant workload, a TPC-E workload if you will, to focus on this problem. We are running with PGSQL 9.2.4, ODBC 2.2.14 (as well as 2.3.3pre, which didn't help), RHEL 6.4, and a 6-way VM with 96GB of memory on a 4-socket Westmere server.

With our 9 transactions running with a mix of SQL_TXN_READ_COMMITTED and SQL_TXN_REPEATABLE_READ, we get less than 1% deadlocks, all of which occur because each row in one table, BROKER, may be read or written by multiple transactions at the same time. So, there are legitimate conflicts, which we deal with using an exponential backoff algorithm that sleeps for 10ms/30ms/90ms/etc.

When we raise the Trade-Result transaction to SQL_TXN_SERIALIZABLE, we face a storm of conflicts. Out of 37,342 Trade-Result transactions, 15,707 hit an error, and have to be rolled back and retried one or more times. The total failure count (due to many transactions failing more than once) is 31,388. What is unusual is that the majority of the failures occur in a statement that should not have any isolation conflicts. About 17K of the failures are from the statement below:

2014-07-23 11:27:15 PDT 26085 ERROR: could not serialize access due to read/write dependencies among transactions
2014-07-23 11:27:15 PDT 26085 DETAIL: Reason code: Canceled on identification as a pivot, during write.
2014-07-23 11:27:15 PDT 26085 HINT: The transaction might succeed if retried.
2014-07-23 11:27:15 PDT 26085 CONTEXT: SQL statement "update TRADE set T_COMM = comm_amount, T_DTS = trade_dts, T_ST_ID = st_completed_id, T_TRADE_PRICE = trade_price where T_ID = trade_id" PL/pgSQL function traderesultframe5(ident_t,value_t,character,timestamp without time zone,trade_t,s_price_t) line 15 at SQL statement

This doesn't make sense since at any given time, only one transaction might possibly be accessing the row that is being updated. There should be no conflicts if we have row-level locking/isolation.

The second most common conflict happens 7.6K times in the statement below:

2014-07-23 11:27:23 PDT 26039 ERROR: could not serialize access due to read/write dependencies among transactions
2014-07-23 11:27:23 PDT 26039 DETAIL: Reason code: Canceled on identification as a pivot, during conflict in checking.
2014-07-23 11:27:23 PDT 26039 HINT: The transaction might succeed if retried.
2014-07-23 11:27:23 PDT 26039 CONTEXT: SQL statement "insert into SETTLEMENT (SE_T_ID, SE_CASH_TYPE, SE_CASH_DUE_DATE, SE_AMT) values (trade_id, cash_type, due_date, se_amount)" PL/pgSQL function traderesultframe6(ident_t,timestamp without time zone,character varying,value_t,timestamp without time zone,trade_t,smallint,s_qty_t,character) line 23 at SQL statement

I don't understand why an insert would hit a serialization conflict.

We also have 4.5K conflicts when we try to commit:

2014-07-23 11:27:23 PDT 26037 ERROR: could not serialize access due to read/write dependencies among transactions
2014-07-23 11:27:23 PDT 26037 DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
2014-07-23 11:27:23 PDT 26037 HINT: The transaction might succeed if retried.
2014-07-23 11:27:23 PDT 26037 STATEMENT: COMMIT

Does PGSQL raise locks to page level when we run with SQL_TXN_SERIALIZABLE? Are there any knobs I can play with to alleviate this? FWIW, the same transactions on MS SQL Server see almost no conflicts.

Thanks,
Reza
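The retry-with-exponential-backoff scheme described above (sleep 10ms/30ms/90ms/etc. and resubmit the rolled-back transaction) can be sketched roughly as follows. This is a hypothetical Python sketch, not the benchmark kit's actual code: the real kit drives ODBC, where the condition to retry on is SQLSTATE 40001 (serialization_failure) or 40P01 (deadlock_detected); the exception class and function names here are invented for illustration.

```python
import time

class SerializationFailure(Exception):
    """Stands in for SQLSTATE 40001/40P01 reported through the database driver."""

def run_with_retry(txn, max_attempts=6, base_delay=0.010):
    """Run `txn` (a callable that executes one complete database transaction).

    On a serialization failure the transaction has already been rolled back
    by the server, so we sleep and resubmit, tripling the delay each time:
    10ms, 30ms, 90ms, ...  After `max_attempts` failures, give up."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # exhausted retries; surface the failure to the caller
            time.sleep(delay)
            delay *= 3  # exponential backoff
```

A caller would wrap each TPC-E transaction (e.g. Trade-Result) in `run_with_retry`, so that the "storm of conflicts" costs latency rather than failed transactions.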
Re: [PERFORM] PGSQL, checkpoints, and file system syncs
Try setting the vm.dirty_bytes sysctl. Something like 256MB might be a good starting point. This comes up fairly often, see e.g.: http://www.postgresql.org/message-id/flat/27C32FD4-0142-44FE-8488-9f366dc75...@mr-paradox.net

- Heikki

Thanks, Heikki. That sounds like my problem alright. I will play with these parameters right away, and will report back.

Cheers,
Reza

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
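Heikki's suggestion can be applied along these lines. A sketch only: the 256MB value is his suggested starting point, while the vm.dirty_background_bytes value is an assumption added here to illustrate the companion knob (background writeback should kick in well before the hard limit). Both require root.

```
# Apply immediately (lost on reboot). Setting vm.dirty_bytes makes the
# kernel force writeback once this much dirty data accumulates, instead
# of the default percentage-of-RAM thresholds that allow huge bursts.
sysctl -w vm.dirty_bytes=268435456            # 256MB hard limit
sysctl -w vm.dirty_background_bytes=67108864  # 64MB; start background writeback early

# Persist across reboots
cat >> /etc/sysctl.conf <<'EOF'
vm.dirty_bytes = 268435456
vm.dirty_background_bytes = 67108864
EOF
```

Note that setting the `_bytes` variants automatically disables the corresponding `vm.dirty_ratio` / `vm.dirty_background_ratio` percentage knobs.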
Re: [PERFORM] The need for clustered indexes to boost TPC-V performance
Hi Merlin, We are moving up to a larger testbed, and are planning to use 9.2. But the results will not be comparable to our 8.4 results due to differences in hardware. But that comparison is a useful one. I'll try for a quick test on the new hardware with 8.4 before moving to 9.2. Thanks, Reza

-Original Message-
From: Merlin Moncure [mailto:mmonc...@gmail.com]
Sent: Tuesday, July 10, 2012 12:06 PM
To: Reza Taheri
Cc: Greg Smith; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

On Thu, Jul 5, 2012 at 10:33 PM, Reza Taheri rtah...@vmware.com wrote:

Just to be clear, we have a number of people from different companies working on the kit. This is not a VMware project, it is a TPC project. But I hear you regarding coming in from the cold and asking for a major db engine feature. I know that I have caused a lot of rolling eyes. Believe me, I have had the same (no, worse!) reaction from every one of the commercial database companies in response to similar requests over the past 25 years.

No rolling of eyes from me. Clustered indexes work and if your table access mainly hits the table through that index you'll see enormous reductions in i/o. Index only scans naturally are a related optimization in the same vein. Denying that is just silly. BTW, putting postgres through a standard non trivial benchmark suite over reasonable hardware, reporting results, identifying bottlenecks, etc. is incredibly useful. Please keep it up, and don't be afraid to ask for help here. (one thing I'd love to see is side by side results comparing 8.4 to 9.1 to 9.2).

merlin
Re: [PERFORM] The need for clustered indexes to boost TPC-V performance
Hi Daniel, Yes, it sounds like GIT will take us half the way there by getting rid of much of the index I/O if we cluster the tables. We can set the fillfactor parameter to keep tables sorted after updates. I am not sure what impact inserts will have since the primary key keeps growing with new inserts, so perhaps the table will maintain the cluster order and the benefits of GIT for new rows, too. GIT won't save CPU cycles the way a clustered/integrated index would, and actually adds to the CPU cost since the data page has to be searched for the desired tuple. Thanks, Reza -Original Message- From: Daniel Farina [mailto:dan...@heroku.com] Sent: Wednesday, July 04, 2012 6:40 AM To: Craig Ringer Cc: Reza Taheri; pgsql-performance@postgresql.org; Robert Haas Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V performance On Tue, Jul 3, 2012 at 10:43 PM, Craig Ringer ring...@ringerc.id.au wrote: On 07/04/2012 07:13 AM, Reza Taheri wrote: Following the earlier email introducing the TPC-V benchmark, and that we are developing an industry standard benchmarking kit for TPC-V using PostgreSQL, here is a specific performance issue we have run into. Which version of PostgreSQL are you using? How has it been tuned beyond the defaults - autovacuum settings, shared_buffers, effective_cache_size, WAL settings, etc? How much RAM is on the blade? What OS and version are on the blade? Comparing the table sizes, we are close to 2X larger (more on this in a later note). But the index size is what stands out. Our overall index usage (again, after accounting for different numbers of rows) is 4.8X times larger. 35% of our I/Os are to the index space. I am guessing that the 4.8X ballooning has something to do with this, and that in itself explains a lot about our high I/O rate, as well as higher CPU/tran cycles compared to MS SQL (we are 2.5-3 times slower). This is making me wonder about bloat issues and whether proper vacuuming is being done. 
If the visibility map and free space map aren't maintained by proper vacuum operation everything gets messy, fast.

Well, MS SQL used a clustered index for CT, i.e., the data is held in the leaf pages of the index B-Tree. The data and index are in one data structure. Once you lookup the index, you also have the data at zero additional cost. [snip] Is the PGSQL community willing to invest in a feature that a) has been requested by many others already; and b) can make a huge difference in a benchmark that can lend substantial credibility to PGSQL performance?

While PostgreSQL doesn't support covering indexes or clustered indexes at this point, 9.2 has added support for index-only scans, which are a half-way point of sorts. See:

http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-em.html
http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9337a21f98ac4ce850bb4145acf47ca27

If at all possible please see how your test is affected by this PostgreSQL 9.2 enhancement. It should make a big difference, and if it doesn't it's important to know why. (CC'd Robert Haas) I'm not sure what the best option for getting a 9.2 beta build for Windows is.

As for the "invest" side - that's really a matter for EnterpriseDB, Command Prompt, Red Hat, and the other backers who're employing people to work on the DB. Consider asking on pgsql-hackers, too; if nothing else you'll get a good explanation of the current state and progress toward clustered indexes.

Some links that may be useful to you are:

http://wiki.postgresql.org/wiki/Todo

Things that it'd be good to support/implement at some point. Surprisingly, covering/clustered indexes aren't on there or at least aren't easily found. It's certainly a much-desired feature despite its apparent absence from the TODO. I think there is, deservingly, a lot of hesitation to implement a strictly ordered table construct.
A similar feature that didn't quite get finished -- but maybe can be beaten into shape -- is the grouped-index-tuple implementation: http://community.enterprisedb.com/git/ It is mentioned on the TODO page. It's under the category that is perhaps poorly syntactically overloaded in the word "cluster".

-- fdr
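For reference, the fillfactor-plus-CLUSTER approach Reza mentions earlier in this thread looks roughly like this in SQL; the table and index names are invented for illustration, and the fillfactor value is an arbitrary example:

```
-- Leave 30% free space in each heap page so updated tuples can stay on
-- the same page (HOT updates), which helps preserve the clustered order.
ALTER TABLE trade SET (fillfactor = 70);

-- Physically rewrite the table in index order. PostgreSQL does not
-- maintain this ordering automatically, so it must be repeated after
-- heavy churn, and it takes an exclusive lock while it runs.
CLUSTER trade USING trade_pkey;
ANALYZE trade;
```

Since new rows of a monotonically growing primary key land at the end of the heap anyway, inserts tend to preserve the clustering; it is updates that migrate tuples out of order, which is what the fillfactor headroom mitigates.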
Re: [PERFORM] The need for clustered indexes to boost TPC-V performance
Hi Robert, Yes, the same concept. Oracle's IOT feature is used often with TPC benchmarks. Thanks, Reza

-Original Message-
From: Robert Klemme [mailto:shortcut...@googlemail.com]
Sent: Thursday, July 05, 2012 5:30 AM
To: Reza Taheri
Cc: pgsql-performance@postgresql.org; Andy Bond (ab...@redhat.com); Greg Kopczynski; Jignesh Shah
Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

On Wed, Jul 4, 2012 at 1:13 AM, Reza Taheri rtah...@vmware.com wrote:

Checking online, the subject of clustered indexes for PostgreSQL comes up often. PGSQL does have a concept called clustered table, which means a table has been organized in the order of an index. This would help with sequential accesses to a table, but has nothing to do with this problem. PGSQL folks sometimes refer to what we want as integrated index.

Do I understand this correctly that we are speaking about the concept which is known under the term "index organized table" (IOT) in Oracle land? http://docs.oracle.com/cd/E11882_01/server.112/e25789/indexiot.htm#CBBJEBIH

Kind regards
robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
Re: [PERFORM] The need for clustered indexes to boost TPC-V performance
Hi Samuel, The SSDs were used as a cache for the spinning drives. Here is a 30-second iostat sample representative of the whole run:

avg-cpu:  %user  %nice %system %iowait  %steal  %idle
          24.87   0.00   12.54   62.39    0.00   0.20

Device:  rrqm/s  wrqm/s     r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sdd        0.00  137.37 3058.40 106.17 34691.60   974.13    22.54    15.75   4.98   0.32 100.00
sde        0.00  136.07 3063.37 107.70 35267.07   975.07    22.86    15.58   4.92   0.32 100.00
sdf        0.00  135.37 3064.23 109.53 35815.60   979.60    23.19    15.82   4.99   0.32 100.00
sdg        0.00  136.97 3066.57 116.67 35196.53  1014.53    22.75    15.87   4.99   0.31 100.00
sdi        0.00 2011.03    0.00  87.90     0.00  8395.73   191.03     0.13   1.45   1.42  12.51
sdk        0.00  136.63 3066.83 107.53 35805.07   976.67    23.17    16.01   5.04   0.32 100.00
sdm        0.00  138.50 3054.40 111.10 34674.27   998.40    22.54    15.52   4.91   0.32 100.00
sdj        0.00  136.73 3058.70 118.20 35227.20  1019.73    22.82    15.81   4.98   0.31 100.00
sdl        0.00  137.53 3044.97 109.33 34448.00   987.47    22.47    15.78   5.00   0.32 100.00

The data and index tablespaces were striped across the 8 LUNs, and saw an average 5ms response. We can beef up the storage to handle more I/Os so that our utilization doesn't stay below 40%, but that misses the point: we have an I/O rate twice that of the commercial database because they used clustered indexes. I provided more config details in an earlier email.

As for asking for development to game a benchmark, no one is asking for benchmark specials. The question of enhancements in response to benchmark needs is an age-old question. We can get into that, but it's really a different discussion. Let me just expose the flip side of it: are we willing to watch people use other databases to run benchmarks but feel content that no features were developed specifically in response to benchmark results?

I am trying to engage with the community. We can drown the mailing list with details. So I decided to open the discussion with the high level points, and we will give you all the details that you want as we move forward.
Thanks, Reza

From: Samuel Gendler [mailto:sgend...@ideasculptor.com]
Sent: Thursday, July 05, 2012 12:46 PM
To: Reza Taheri
Cc: Robert Klemme; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

On Thu, Jul 5, 2012 at 12:13 PM, Reza Taheri rtah...@vmware.com wrote:

Hi Robert, Yes, the same concept. Oracle's IOT feature is used often with TPC benchmarks.

Reza, it would be very helpful if you were to provide the list with a lot more information about your current software and hardware configuration before coming to the conclusion that the only possible way forward is with a significant architectural change to the db engine itself. Not only is it not at all clear that you are extracting maximum performance from your current hardware and software, but I doubt anyone is particularly interested in doing a bunch of development purely to game a benchmark. There has been significant discussion of the necessity and viability of the feature you are requesting in the past, so you should probably start where those discussions left off rather than starting the discussion all over again from the beginning. Of course, if vmware were to sponsor development of the feature in question, it probably wouldn't require nearly as much buy-in from the wider community.

Getting back to the current performance issues - I have little doubt that the MS SQL benchmark was set up and run by people who were intimately familiar with MS SQL performance tuning. You stated in your earlier email that your team doesn't have significant postgresql-specific experience, so it isn't necessarily surprising that your first attempt at tuning didn't get the results that you are looking for. You stated that you have 14 SSDs and 90 spinning drives, but you don't specify how they are combined and how the database is laid out on top of them. There is no mention of how much memory is available to the system.
We don't know how you've configured postgresql's memory allocation or how your config weights the relative costs of index lookups, sequential scans, etc. The guidelines for this mailing list include instructions for what information should be provided when asking about performance improvements. http://archives.postgresql.org/pgsql-performance/ Let's start by ascertaining how your benchmark results can be improved without engaging in a significant development effort on the db engine itself.
Re: [PERFORM] The need for clustered indexes to boost TPC-V performance
Well, I keep failing to send an email with an attachment. Do I need a moderator's approval? Yes, running on VMs and a lower bin processor. With the virtualization overhead, etc., I figure we would be running right around 2/3 of the Dell throughput if we were running the same DBMS. I sent the following message twice today with attachments (postgresql.conf, etc.), and it hasn't been posted yet. Here it is without an attachment.

From: Reza Taheri
Sent: Thursday, July 05, 2012 11:34 AM
To: 'Craig Ringer'
Cc: pgsql-performance@postgresql.org; Robert Haas
Subject: RE: [PERFORM] The need for clustered indexes to boost TPC-V performance

OK, some config details. We are using:

* Two blades of an HP BladeSystem c-Class c7000 with 2-socket Intel E5520 (Nehalem-EP) processors and 48GB of memory per blade
  o 8 cores, 16 threads per blade
  o 48GB of RAM per blade
* Storage was an EMC VNX5700 with 14 SSDs fronting 32 15K RPM drives
* The Tier B database VM was alone on a blade with 16 vCPUs, 40GB of memory, 4 virtual drives with various RAID levels
* The driver and Tier A VMs were on the second blade
  o So we set PGHOST on the client system to point to the server
* RHEL 6.1
* PostgreSQL 8.4
* unixODBC 2.3.2

We stuck with PGSQL 8.4 since it is the stock version shipped with RHEL 6. I am building a new, larger testbed, and will switch to PGSQL 9 with that. postgresql.conf is attached.

Thanks, Reza

-Original Message-
From: Andy Colson [mailto:a...@squeakycode.net]
Sent: Thursday, July 05, 2012 5:42 PM
To: Samuel Gendler
Cc: Reza Taheri; Robert Klemme; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

On 07/05/2012 03:52 PM, Samuel Gendler wrote: On Thu, Jul 5, 2012 at 1:37 PM, Reza Taheri rtah...@vmware.com wrote: I provided more config details in an earlier email.
I hate to disagree, but unless I didn't get a message sent to the list, you haven't provided any details about your postgresql config or otherwise adhered to the guidelines for starting a discussion of a performance problem around here. I just searched my mailbox and no email from you has any such details. Several people have asked for them, including myself. You say you will give any details we want, but this is at least the 3rd or 4th request for such details and they have not yet been forthcoming.

Reza, I went back and looked myself. I see no specs on OS, or hardware unless you mean this: http://bit.ly/QeWXhE. This was run on a similar server, and the database size is close to ours. You're running on windows then? Server is 96Gig ram, 8 cores, (dell poweredge T610). with two powervault MD1120 NAS's? But then I assume you were not running on that, were you. You were running vmware on it, probably?

-Andy
Re: [PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL
Hi Greg, Yes, a single-instance benchmark is a natural fall-out from the TPC-V kit. Our coding team (4 people working directly on the benchmark with another 3-4 folks helping in various consulting capacities) is tasked with creating a multi-VM benchmark. The benchmark is still missing the critical Market Exchange Emulator function. Once that's done, it would be natural for someone else in the TPC to take our working prototype and simplify it for a single-system, TPC-E (not V) reference kit. The conversion is not technically difficult, but releasing kits is a new path for the TPC, and will take some work. Cheers, Reza -Original Message- From: Greg Smith [mailto:g...@2ndquadrant.com] Sent: Thursday, July 05, 2012 6:25 PM To: Reza Taheri Cc: pgsql-performance@postgresql.org; Andy Bond (ab...@redhat.com); Greg Kopczynski; Jignesh Shah Subject: Re: [PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL On 07/03/2012 07:08 PM, Reza Taheri wrote: TPC-V is a new benchmark under development for virtualized databases. A TPC-V configuration has: - multiple virtual machines running a mix of DSS, OLTP, and business logic apps - VMs running with throughputs ranging from 10% to 40% of the total system .. I think this would be a lot more interesting to the traditional, dedicated hardware part of the PostgreSQL community if there was a clear way to run this with only a single active machine too. If it's possible for us to use this to compare instances of PostgreSQL on dedicated hardware, too, that is enormously more valuable to people with larger installations. It might be helpful to VMWare as well. Being able to say this VM install gets X% of the performance of a bare-metal install answers a question I get asked all the time--when people want to decide between dedicated and virtual setups. The PostgreSQL community could use a benchmark like this for its own performance regression testing too. A lot of that work is going to happen on a dedicated machines. 
After waving our hands through a number of small differences between the platforms, we have calculated a CPU cost of around 3.2ms/transaction for the published MS SQL results, versus a measurement of 8.6ms/transaction for PostgreSQL. (TPC benchmarks are typically pushed to full CPU utilization. One removes all bottlenecks in storage, networking, etc., to achieve the 100% CPU usage. So CPU cost/tran is the final decider of performance.) So we need to cut the CPU cost of transactions in half to make publications with PostgreSQL comparable to commercial databases. I appreciate that getting close to parity here is valuable. This situation is so synthetic though--removing other bottlenecks and looking at CPU timing-- that it's hard to get too excited about optimizing for it. There's a lot of things in PostgreSQL that we know are slower than commercial databases because they're optimized for flexibility (the way operators are implemented is the best example) or for long-term code maintenance. Microsoft doesn't care if they have terribly ugly code that runs faster, because no one sees that code. PostgreSQL does care. The measure that's more fair is a system cost based ones. What I've found is that a fair number of people note PostgreSQL's low-level code isn't quite as fast as some of the less flexible alternatives--hard coding operators is surely cheaper than looking them up each time--but the license cost savings more than pays for bigger hardware to offset that. I wish I had any customer whose database was CPU bound, that would be an awesome world to live in. Anyway, guessing at causes here is premature speculation. When there's some code for the test kit published, at that point discussing the particulars of why it's not running well will get interesting. 
--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.com
Re: [PERFORM] The need for clustered indexes to boost TPC-V performance
Just to be clear, we have a number of people from different companies working on the kit. This is not a VMware project, it is a TPC project. But I hear you regarding coming in from the cold and asking for a major db engine feature. I know that I have caused a lot of rolling eyes. Believe me, I have had the same (no, worse!) reaction from every one of the commercial database companies in response to similar requests over the past 25 years. We have our skin in the game, and as long as the community values the benchmark and wants to support us, we will figure out the details as we go forward. Thanks, Reza

-Original Message-
From: Greg Smith [mailto:g...@2ndquadrant.com]
Sent: Thursday, July 05, 2012 6:42 PM
To: Reza Taheri
Cc: pgsql-performance@postgresql.org; Andy Bond (ab...@redhat.com); Greg Kopczynski; Jignesh Shah
Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

On 07/03/2012 07:13 PM, Reza Taheri wrote:

Is the PGSQL community willing to invest in a feature that a) has been requested by many others already; and b) can make a huge difference in a benchmark that can lend substantial credibility to PGSQL performance?

Larger PostgreSQL features usually get built because companies sponsor their development, they pass review as both useful and correct, and then get committed. Asking the community to "invest" in a new feature isn't quite the right concept. Yes, everyone would like one of the smaller index representations. I'm sure we can find reviewers willing to look at such a feature and committers who would also be interested enough to commit it, on a volunteer basis. But a feature this size isn't going to spring to life based just on volunteer work. The most useful questions would be "who would be capable of writing that feature?" and "how can we get them sponsored to focus on it?" I can't tell from your comments yet what role(s) in that process VMWare wants to take on internally, and which it's looking for help with.
The job of convincing people it's a useful feature isn't necessary--we know that's true.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.com
[PERFORM] The overall experience of TPC-V benchmark team with PostgreSQL
Hello PGSQL fans, Looking back at my posts the past couple of days and the replies that I've got, I realized that I have failed to make one point clear: we are very pleased with what we have seen from PostgreSQL so far. Let me explain. At this point of developing or porting a benchmark on a new DBMS, the team usually deals with stability, scalability, or fundamental performance issues. Our fear was that working with an open source DBMS, we'd experience more issues than usual. But we got the kit running transactions on PGSQL quickly, and after some early tests, I decided to try the kit on a larger testbed (two other folks are the developers of the benchmark code; I design, run, and analyze the experiments). I have the benchmark running on a 300,000-customer database on a 16-CPU system, unusual for this early in the prototyping phase. People who developed TPC-E (the father of our benchmark) did their prototyping on commercial databases with much smaller databases on smaller systems. On this large testbed, PGSQL has been working like a champ, and performance is what I would call decent. Put in other words, I have been pleasantly surprised by the throughput I am getting out of the system, saturating a 16-way with no visible signs of contention when we reduce the database size. We are developing a reference kit. People are not obligated to use it to publish official results. They can use it to kick the tires, then go to one of the commercial DBMS vendors and ask for their kit for an official TPC-V publication. Even if that's all that people do with the reference kit, our team has achieved the goal that the TPC set for us. What I am trying to do is see if we can take this to the point that people use PGSQL to publish official results and use it in competitive situations. It looks possible, so I'd love to see it happen. Again, overall, our experience with PGSQL has been positive, even in terms of performance. Thanks, Reza
Re: [PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL
Yes, I hear you. TPC's usual mode of operation has been to release details after the benchmark is complete. But TPC does have a policy clause that allows publication of draft specifications to get public feedback before the benchmark is complete. Our 2012 TPC TC paper will have a lot of the high level details. We need to see if we can use the draft clause to also release beta versions of code. Thanks, Reza

-Original Message-
From: Craig Ringer [mailto:ring...@ringerc.id.au]
Sent: Thursday, July 05, 2012 8:41 PM
To: Reza Taheri
Cc: Greg Smith; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL

On 07/06/2012 11:22 AM, Reza Taheri wrote: Hi Greg, Yes, a single-instance benchmark is a natural fall-out from the TPC-V kit. Our coding team (4 people working directly on the benchmark with another 3-4 folks helping in various consulting capacities) is tasked with creating a multi-VM benchmark. The benchmark is still missing the critical Market Exchange Emulator function. Once that's done, it would be natural for someone else in the TPC to take our working prototype and simplify it for a single-system, TPC-E (not V) reference kit. The conversion is not technically difficult, but releasing kits is a new path for the TPC, and will take some work.

Please consider releasing sample versions early in the process, especially as such releases are new to the TPC. Giving others the opportunity to contribute different skill sets and experiences before everything is locked in to the final configuration is important, especially when trying new things.

-- Craig Ringer
Re: [PERFORM] The need for clustered indexes to boost TPC-V performance
Hi Craig,

I used the tool at explain.depesz.com extensively during our early prototyping. It helped uncover ~10 problems that we solved by fixing issues in the code, adding or changing indexes, etc. Right now, I believe all our query plans look like what I would expect.

Yes, you are right, I did miss the link to the index-only scans. From what I can tell, it will do exactly what we want, but only as long as the index has all the columns in the query. I don't know what percentage of our queries have this property, but it does help. The two main kit developers are out this week. We'll put our heads together next week to see which version to use when I switch to the larger testbed I am preparing.

Thanks, Reza

From: Craig Ringer [mailto:ring...@ringerc.id.au]
Sent: Thursday, July 05, 2012 5:46 PM
To: Reza Taheri
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

First: Please do try the 9.2 beta if you're upgrading from 8.4. It'll be out as a final release soon enough, and index-only scans may make a big difference for the problem you're currently having.

Looking at your configuration I have a few comments, but it's worth noting that I don't work with hardware at that scale, and I'm more used to tuning I/O-bottlenecked systems with onboard storage rather than CPU-bottlenecked ones on big SANs. Hopefully, now that you've posted your configuration and setup, there will be interest from others. If you're able to post an EXPLAIN ANALYZE or two for a query you feel is slow, that certainly won't hurt. Using http://explain.depesz.com/ saves you the hassle of dealing with word-wrapping when posting them, btw.

As for your config: I notice that your autovacuum settings are at their defaults. With a heavy UPDATE/DELETE load this will tend to lead to table and index bloat, so the DB has to scan more useless data to get what it needs. It also means table stats won't be maintained as well, potentially leading to poor planner decisions.
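For an update-heavy benchmark workload, making autovacuum more aggressive than the defaults is the usual first step. A minimal postgresql.conf sketch follows; the specific values are illustrative assumptions for a large machine, not settings recommended anywhere in this thread:

```ini
# postgresql.conf -- illustrative autovacuum tuning for a heavy UPDATE/DELETE load.
# All values below are assumptions for discussion; tune to your own hardware.
autovacuum = on
autovacuum_max_workers = 6               # more concurrent workers on a big box (default 3)
autovacuum_naptime = 15s                 # check for work more often than the 1min default
autovacuum_vacuum_scale_factor = 0.05    # vacuum after ~5% of a table is dead, not 20%
autovacuum_analyze_scale_factor = 0.02   # keep planner statistics fresher (default 10%)
autovacuum_vacuum_cost_limit = 2000      # let each worker do more work per cycle
```

Reloading the server (`pg_ctl reload`) applies these without a restart; watching `pg_stat_user_tables.n_dead_tup` before and after is one way to check whether bloat is actually being kept in check.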
The following fairly scary query can help identify bloat, as the database server doesn't currently have much built in to help you spot such issues: http://wiki.postgresql.org/wiki/Show_database_bloat

It might be helpful to set effective_cache_size and effective_io_concurrency so Pg has more idea of the scale of your hardware. The defaults are very conservative: it's supposed to be easy for people to use for simple things without melting their systems, and it's expected that anyone doing bigger work will tune the database. http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html

It looks like you've already tweaked many of the critical points for big installs: checkpoint_segments, wal_buffers, shared_buffers, etc. I lack the big-hardware experience to know whether they're appropriate, but they're not the extremely conservative defaults, which is a start.

Your random_page_cost and seq_page_cost are probably dead wrong for a SAN with RAM and SSD cache in front of fast disks. Their defaults assume local, uncached spinning HDD media, where seeks are expensive. The typical advice on such hardware is to set them to something more like seq_page_cost = 0.1, random_page_cost = 0.15, i.e. cheaper relative to the CPU cost, and with random I/O only a little more expensive than sequential I/O. What's right for your situation varies a bit based on DB size vs. hardware size, etc.; Greg discusses this more in his book.

What isolation level do your transactions use? This is significant because of the move to true serializable isolation with predicate locking in 9.1; it made serializable transactions a bit slower in some circumstances in exchange for much stronger correctness guarantees. The READ COMMITTED default was unchanged.
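The cost settings above can be tried per-session before committing them to postgresql.conf, which keeps experiments cheap. A hedged sketch, using the values from the message; the effective_cache_size figure and the query placeholder are hypothetical and would need to be substituted for your own:

```sql
-- Try the SAN-friendly planner costs for this session only, then re-plan a slow query.
SET seq_page_cost = 0.1;
SET random_page_cost = 0.15;
SET effective_cache_size = '48GB';  -- hypothetical; size to OS cache + shared_buffers

-- Substitute one of the benchmark's slow queries here and compare the plan:
EXPLAIN ANALYZE SELECT ...;

-- And to check which isolation level the benchmark's transactions actually run at:
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- the unchanged default
SHOW transaction_isolation;
COMMIT;
```

If the session-level settings produce better plans, they can then be promoted to postgresql.conf for all connections.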
It also looks like you might not have seen the second part of my earlier reply: while PostgreSQL doesn't support covering indexes or clustered indexes at this point, 9.2 adds support for index-only scans, which are a half-way point of sorts. See:

http://rhaas.blogspot.com.au/2011/10/index-only-scans-weve-got-em.html
http://rhaas.blogspot.com.au/2010/11/index-only-scans.html
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2822fb9337a21f98ac4ce850bb4145acf47ca27

If at all possible, please see how your test is affected by this PostgreSQL 9.2 enhancement. It should make a big difference, and if it doesn't, it's important to know why. (CC'd Robert Haas.)

As for the investment side - that's really a matter for EnterpriseDB, Command Prompt, Red Hat, and the other backers who're employing people to work on the DB. Consider asking on pgsql-hackers, too; if nothing else you'll get a good explanation of the current state and progress toward clustered indexes.

Some links that may be useful to you:

http://wiki.postgresql.org/wiki/Todo - things that it'd be good to support/implement at some point. Surprisingly, covering/clustered
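To illustrate the half-way point index-only scans provide: on 9.2 a query can be answered from the index alone only when the index contains every column the query references and the table's visibility map is reasonably current. A sketch under those assumptions; the table and column names here are invented for illustration, not part of the TPC-V schema:

```sql
-- Hypothetical example: an index that "covers" the first query below.
CREATE TABLE trade (t_id bigint, t_ca_id bigint, t_price numeric);
CREATE INDEX trade_ca_price ON trade (t_ca_id, t_price);
VACUUM ANALYZE trade;  -- builds the visibility map, required for index-only scans

-- On 9.2 this can plan as an "Index Only Scan": both columns are in the index.
EXPLAIN SELECT t_ca_id, t_price FROM trade WHERE t_ca_id = 42;

-- Referencing a column that is NOT in the index forces heap visits again.
EXPLAIN SELECT t_ca_id, t_id FROM trade WHERE t_ca_id = 42;
```

This is why the fraction of benchmark queries whose columns all fit in an index determines how much of the 9.2 benefit the kit would actually see.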
Re: [PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL
Thanks for the reply, Craig.

As far as publishing a draft, we are planning to do something along those lines. For the schema and the queries, we are pretty much taking those wholesale from TPC-E, whose specification is public (http://www.tpc.org/tpce/spec/v1.12.0/TPCE-v1.12.0.pdf). The high-level differences from TPC-E are detailed in the 2010 and 2012 TPC TC papers I mentioned. We will stick closely to the TPC-E schema and queries; anything new means a long specification-writing process, which we are trying to avoid. We want to get this benchmark out there quickly.

I am not an expert in licensing. What I can tell you is that the kit will be available for anyone to download and use under a simple EULA based on existing TPC EULAs (although the TPC hasn't had a complete end-to-end kit before, it has published partial code modules for its benchmarks). We broached the idea of open-sourcing the kit, but it didn't pan out. The people on the subcommittee represent their companies, and different companies have different rules about their employees contributing to open source code. Satisfying the armies of lawyers would have been impossible. So the kit won't be open source, but it will be readily available for use. The licensing will probably be similar to that of the SPEC benchmarks, if you are familiar with them.

I'll pick up Greg's book. We had been focusing on functionality, but our focus will shift to performance soon. To be blunt, the team is very experienced in benchmarks and in database performance, but most of us are new to PGSQL.

Thanks, Reza

-----Original Message-----
From: Craig Ringer [mailto:ring...@ringerc.id.au]
Sent: Tuesday, July 03, 2012 10:19 PM
To: pgsql-performance@postgresql.org
Cc: Reza Taheri; Andy Bond (ab...@redhat.com); Greg Kopczynski; Jignesh Shah; Greg Smith; Dave Page
Subject: Re: [PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL

On 07/04/2012 07:08 AM, Reza Taheri wrote:
> ... so the subcommittee moved forward with developing its own reference kit.
> The reference kit has been developed to run on PostgreSQL, and we are focusing our development efforts and testing on PostgreSQL.

That's a very positive step. The TPC seems to me to have a pretty poor reputation among open source database users and vendors. I think that's largely because the schema and tools are typically very closed and restrictively licensed, though the big commercial vendors' prohibitions against publishing benchmarks don't help. This sounds like a promising change.

The TPC benchmarks are really good for load testing and regression testing, so having one that's directly PostgreSQL-friendly will be a big plus, especially if it is appropriately licensed. The opportunity to audit the schema, queries, and test setup before the tool is finalized would certainly be appealing. What can you publish in draft form now? What license terms does the TPC plan to release the schema, queries, and data for TPC-V under?

I've CC'd Greg Smith and Dave Page, both of whom I suspect will be interested in this development but could easily miss your message. If you haven't read Greg's book, PostgreSQL 9.0 High Performance, it's probably a good idea to do so.

-- Craig Ringer
[PERFORM] Introducing the TPC-V benchmark, and its relationship to PostgreSQL
a measurement of 8.6 ms/transaction for PostgreSQL. (TPC benchmarks are typically pushed to full CPU utilization: one removes all bottlenecks in storage, networking, etc., to reach 100% CPU usage, so CPU cost per transaction is the final arbiter of performance.) So we need to cut the CPU cost of transactions in half to make publications with PostgreSQL comparable to those with commercial databases.

It is OK to be slower than MS SQL Server or Oracle. The benchmark running on PostgreSQL can still be used to compare the performance of servers, processors, and especially hypervisors under a demanding database workload. But the slower we are, the less credible we are.

Sorry for the long post. I will follow up with specific questions next.

Thanks, Reza Taheri