Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
On 04/28/2017 01:34 AM, Andres Freund wrote: On 2017-04-28 01:29:14 +0200, Tomas Vondra wrote: I can confirm this observation. I bought the Intel 750 NVMe SSD last year, the device has 1GB DDR3 cache on it (power-loss protected), can do ~1GB/s of sustained O_DIRECT sequential writes. But when running pgbench, I can't push more than ~300MB/s of WAL to it, no matter what I do because of WALWriteLock. Hm, interesting. Even if you up wal_buffers to 128MB, use synchronous_commit = off, and play with wal_writer_delay/flush_after? I think I've tried things like that, but let me do some proper testing. I'll report the numbers in a few days. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
On 2017-04-28 01:29:14 +0200, Tomas Vondra wrote: > I can confirm this observation. I bought the Intel 750 NVMe SSD last year, > the device has 1GB DDR3 cache on it (power-loss protected), can do ~1GB/s of > sustained O_DIRECT sequential writes. But when running pgbench, I can't push > more than ~300MB/s of WAL to it, no matter what I do because of > WALWriteLock. Hm, interesting. Even if you up wal_buffers to 128MB, use synchronous_commit = off, and play with wal_writer_delay/flush_after? - Andres -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
On 04/27/2017 07:35 PM, Andres Freund wrote: On 2017-04-27 10:29:48 -0700, Joshua D. Drake wrote: On 04/27/2017 09:34 AM, Andres Freund wrote: On 2017-04-27 09:31:34 -0700, Joshua D. Drake wrote: On 04/27/2017 08:59 AM, Andres Freund wrote: I would agree it isn't yet a widespread issue. I'm not yet sure about that actually. I suspect a large percentage of people with such workloads aren't lingering lots on the lists. That would probably be true. I was thinking of it more as the "most new users are in the cloud" and the "cloud" is going to be rare that a cloud user is going to be able to hit that level of writes. (at least not without spending LOTS of money) You can get pretty decent NVMe SSD drives on serveral cloud providers these days, without immediately bancrupting you. Sure, it's instance storage, but with a decent replication and archival setup, that's not necessarily an issue. It's not that hard to get to the point where postgres can't keep up with storage, at least for some workloads. I can confirm this observation. I bought the Intel 750 NVMe SSD last year, the device has 1GB DDR3 cache on it (power-loss protected), can do ~1GB/s of sustained O_DIRECT sequential writes. But when running pgbench, I can't push more than ~300MB/s of WAL to it, no matter what I do because of WALWriteLock. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
On 04/27/2017 06:34 PM, Andres Freund wrote: On 2017-04-27 09:31:34 -0700, Joshua D. Drake wrote: On 04/27/2017 08:59 AM, Andres Freund wrote: Ok, based on the, few, answers I've got so far, my experience is indeed skewed. A number of the PG users I interacted with over the last couple years had WAL write ranges somewhere in the range of 500MB/s to 2.2GB/s (max I'veseen). At that point WAL insertion became a major bottleneck, even if storage was more than fast enough to keep up. To address these we'd need some changes, but the feedback so far suggest that it's not yet a widespread issue... I would agree it isn't yet a widespread issue. I'm not yet sure about that actually. I suspect a large percentage of people with such workloads aren't lingering lots on the lists. To a certain extent, this is a self-fulfilling prophecy. If you know you'll have such a busy system, you probably do some research and testing first, before choosing the database. If we don't perform well enough, you pick something else. Which removes the data point. Obviously, there are systems that start small and get busier and busier over time. And those are the ones we see. cheers -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
Hi, On 04/25/2017 06:17 AM, Andres Freund wrote: Hi, I've lately seen more and more installations where the generation of write-ahead-log (WAL) is one of the primary bottlenecks. I'm curious whether that's primarily a "sampling error" of mine, or whether that's indeed more common. I see those cases too. To some degree it's a sampling bias. People generally don't call us to look at the 99% of systems that perform fine, so we tend to see the the 1% of systems under pressure. That doesn't make that observation irrelevant, though. Those demanding systems are one of the things that pushes us forward. > The primary reason I'm curious is that I'm pondering a few potential optimizations, and would like to have some guidance which are more and which are less important. I think any optimization you do will improve at least some of those busy systems. Questions (answer as many you can comfortably answer): - How many MB/s, segments/s do you see on busier servers? That depends on the cause (see the next point). - What generates the bulk of WAL on your servers (9.5+ can use pg_xlogdump --stats to compute that)? a) systems doing large batches - bulk loads/updates/deletes, one or few sessions doing a lot - easily high hundreds of MB/s (on a separate device) b) OLTP systems doing a lot of tiny/small transactions - many concurrent sessions - often end up much more limited by WAL, due to locking etc. - often the trouble is random updates all over the place, causing amplification due to FPI (PK on UUID is a great way to cause this unnecessarily even on apps with naturally tiny working set) - Are you seeing WAL writes being a bottleneck?OA On the write-intensive systems, yes. Often the CPUs are waiting for WAL I/O to complete during COMMIT. - What kind of backup methods are you using and is the WAL volume a problem? The large and busy systems can easily produce so much WAL, that the basebackup is not the largest part of the backup. That is somewhat solvable by using other means of obtaining the basebackup snapshot (e.g. by taking some sort of filesystem / SAN / ... snapshot). That reduces the amount of WAL needed to make the basebackup consistent, but it doesn't solve the WAL archiving issue. - What kind of replication are you using and is the WAL volume a problem? Generally streaming replication, and yes, the amount of WAL may be an issue, partially because the standby is a single-process thing. And as it has to process something generated by N sessions on the primary, that can't end well. Interestingly enough, FPIs can actually make it way faster, because the standby does not need to read the data from disk during recovery. - What are your settings for wal_compression, max_wal_size (9.5+) / checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers? I'd say the usual practice is to tune for timed checkpoints, say 30+ minutes apart (or more). wal_compression is typically 'off' (i.e. the default value). - Could you quickly describe your workload? Pretty much a little bit of everything, depending on the customer. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
On 2017-04-27 10:29:48 -0700, Joshua D. Drake wrote: > On 04/27/2017 09:34 AM, Andres Freund wrote: > > On 2017-04-27 09:31:34 -0700, Joshua D. Drake wrote: > > > On 04/27/2017 08:59 AM, Andres Freund wrote: > > > > > > > I would agree it isn't yet a widespread issue. > > > > I'm not yet sure about that actually. I suspect a large percentage of > > people with such workloads aren't lingering lots on the lists. > > That would probably be true. I was thinking of it more as the "most new > users are in the cloud" and the "cloud" is going to be rare that a cloud > user is going to be able to hit that level of writes. (at least not without > spending LOTS of money) You can get pretty decent NVMe SSD drives on serveral cloud providers these days, without immediately bancrupting you. Sure, it's instance storage, but with a decent replication and archival setup, that's not necessarily an issue. It's not that hard to get to the point where postgres can't keep up with storage, at least for some workloads. - Andres -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
On 04/27/2017 09:34 AM, Andres Freund wrote: On 2017-04-27 09:31:34 -0700, Joshua D. Drake wrote: On 04/27/2017 08:59 AM, Andres Freund wrote: I would agree it isn't yet a widespread issue. I'm not yet sure about that actually. I suspect a large percentage of people with such workloads aren't lingering lots on the lists. That would probably be true. I was thinking of it more as the "most new users are in the cloud" and the "cloud" is going to be rare that a cloud user is going to be able to hit that level of writes. (at least not without spending LOTS of money) The only people that are likely going to see this are going to be on bare metal. We should definitely plan on that issue for say 11. "plan on that issue" - heh. We're talking about major engineering projects here ;) Sorry, wasn't trying to make light of the effort. :D I do have a question though, where you have seen this issue is it with synchronous_commit on or off? Both. Whether that matters or not really depends on the workload. If you have bulk writes, it doesn't really matter much. Sure, o.k. Thanks, Andres - Andres -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them. Unless otherwise stated, opinions are my own. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
On 2017-04-27 09:31:34 -0700, Joshua D. Drake wrote: > On 04/27/2017 08:59 AM, Andres Freund wrote: > > > > > Ok, based on the, few, answers I've got so far, my experience is indeed > > skewed. A number of the PG users I interacted with over the last couple > > years had WAL write ranges somewhere in the range of 500MB/s to 2.2GB/s > > (max I'veseen). At that point WAL insertion became a major bottleneck, > > even if storage was more than fast enough to keep up. To address these > > we'd need some changes, but the feedback so far suggest that it's not > > yet a widespread issue... > > I would agree it isn't yet a widespread issue. I'm not yet sure about that actually. I suspect a large percentage of people with such workloads aren't lingering lots on the lists. > The only people that are likely going to see this are going to be on bare > metal. We should definitely plan on that issue for say 11. "plan on that issue" - heh. We're talking about major engineering projects here ;) > I do have a question though, where you have seen this issue is it with > synchronous_commit on or off? Both. Whether that matters or not really depends on the workload. If you have bulk writes, it doesn't really matter much. - Andres -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
On 04/27/2017 08:59 AM, Andres Freund wrote: Ok, based on the, few, answers I've got so far, my experience is indeed skewed. A number of the PG users I interacted with over the last couple years had WAL write ranges somewhere in the range of 500MB/s to 2.2GB/s (max I'veseen). At that point WAL insertion became a major bottleneck, even if storage was more than fast enough to keep up. To address these we'd need some changes, but the feedback so far suggest that it's not yet a widespread issue... I would agree it isn't yet a widespread issue. The only people that are likely going to see this are going to be on bare metal. We should definitely plan on that issue for say 11. I do have a question though, where you have seen this issue is it with synchronous_commit on or off? Thanks, JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them. Unless otherwise stated, opinions are my own. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
Hi, On 2017-04-24 21:17:43 -0700, Andres Freund wrote: > I've lately seen more and more installations where the generation of > write-ahead-log (WAL) is one of the primary bottlenecks. I'm curious > whether that's primarily a "sampling error" of mine, or whether that's > indeed more common. > > The primary reason I'm curious is that I'm pondering a few potential > optimizations, and would like to have some guidance which are more and > which are less important. > > Questions (answer as many you can comfortably answer): > - How many MB/s, segments/s do you see on busier servers? > - What generates the bulk of WAL on your servers (9.5+ can use > pg_xlogdump --stats to compute that)? > - Are you seeing WAL writes being a bottleneck?OA > - What kind of backup methods are you using and is the WAL volume a > problem? > - What kind of replication are you using and is the WAL volume a > problem? > - What are your settings for wal_compression, max_wal_size (9.5+) / > checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers? > - Could you quickly describe your workload? Ok, based on the, few, answers I've got so far, my experience is indeed skewed. A number of the PG users I interacted with over the last couple years had WAL write ranges somewhere in the range of 500MB/s to 2.2GB/s (max I'veseen). At that point WAL insertion became a major bottleneck, even if storage was more than fast enough to keep up. To address these we'd need some changes, but the feedback so far suggest that it's not yet a widespread issue... - Andres -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
On Tue, Apr 25, 2017 at 1:17 AM, Andres Freundwrote: > Questions (answer as many you can comfortably answer): > - How many MB/s, segments/s do you see on busier servers? ~20MB/s with FPW compression, with peaks of ~35MB/s. Writes become the bottleneck without compression and it tops at about 40-50MB/s, WAL archiving cannot keep up beyond that point. > - What generates the bulk of WAL on your servers (9.5+ can use > pg_xlogdump --stats to compute that)? Type N (%) Record size (%) FPI size (%)Combined size (%) - --- --- --- --- - --- XLOG 0 ( 0.00) 0 ( 0.00)0 ( 0.00)0 ( 0.00) Transaction 30 ( 0.00) 960 ( 0.00)0 ( 0.00) 960 ( 0.00) Storage0 ( 0.00) 0 ( 0.00)0 ( 0.00)0 ( 0.00) CLOG 0 ( 0.00) 0 ( 0.00)0 ( 0.00)0 ( 0.00) Database 0 ( 0.00) 0 ( 0.00)0 ( 0.00)0 ( 0.00) Tablespace 0 ( 0.00) 0 ( 0.00)0 ( 0.00)0 ( 0.00) MultiXact110 ( 0.01) 7456 ( 0.02)0 ( 0.00) 7456 ( 0.00) RelMap 0 ( 0.00) 0 ( 0.00)0 ( 0.00)0 ( 0.00) Standby2 ( 0.00) 368 ( 0.00)0 ( 0.00) 368 ( 0.00) Heap2 2521 ( 0.22) 78752 ( 0.24) 4656133 ( 2.82) 4734885 ( 2.39) Heap 539419 ( 46.52) 15646903 ( 47.14) 98720258 ( 59.87)114367161 ( 57.73) Btree 606573 ( 52.31) 15872182 ( 47.82) 57514798 ( 34.88) 73386980 ( 37.05) Hash 0 ( 0.00) 0 ( 0.00)0 ( 0.00)0 ( 0.00) Gin 2866 ( 0.25) 134330 ( 0.40) 4012251 ( 2.43) 4146581 ( 2.09) Gist 0 ( 0.00) 0 ( 0.00)0 ( 0.00)0 ( 0.00) Sequence7970 ( 0.69) 1450540 ( 4.37)0 ( 0.00) 1450540 ( 0.73) SPGist 0 ( 0.00) 0 ( 0.00)0 ( 0.00)0 ( 0.00) BRIN 0 ( 0.00) 0 ( 0.00)0 ( 0.00)0 ( 0.00) CommitTs 0 ( 0.00) 0 ( 0.00)0 ( 0.00)0 ( 0.00) ReplicationOrigin 0 ( 0.00) 0 ( 0.00)0 ( 0.00)0 ( 0.00) Total1159491 33191491 [16.76%]164903440 [83.24%]198094931 [100%] > - Are you seeing WAL writes being a bottleneck?OA Sometimes, more so without FPW compression > - What kind of backup methods are you using and is the WAL volume a > problem? > - What kind of replication are you using and is the WAL volume a > problem? Streaming to hot standby + WAL archiving, delayed standby as backup and PITR. Backups are regular filesystem-level snapshots of the delayed standby (with postgres down to get consistent snapshots). WAL volume getting full during periods where the hot standby lags behind (or when we have to stop it to create consistent snapshots) are an issue indeed, and we've had to provision significant storage to be able to absorb those peaks (1TB of WAL) We bundle WAL segments into groups of 256 segments for archiving and recovery to minimize the impact of TCP slow start. We further gzip segments before transfer with pigz, and we use mostly rsync (with a wrapper script that takes care of durability and error handling) to move segments around. Getting the archive/recovery scripts to handle the load hasn't been trivial. > - What are your settings for wal_compression, max_wal_size (9.5+) / > checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers? wal_compression = on max_wal_size = 12GB min_wal_size = 2GB checkpoint_timeout = 30min
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
Hi Andres. > 25 апр. 2017 г., в 7:17, Andres Freundнаписал(а): > > Hi, > > I've lately seen more and more installations where the generation of > write-ahead-log (WAL) is one of the primary bottlenecks. I'm curious > whether that's primarily a "sampling error" of mine, or whether that's > indeed more common. > > The primary reason I'm curious is that I'm pondering a few potential > optimizations, and would like to have some guidance which are more and > which are less important. > > Questions (answer as many you can comfortably answer): > - How many MB/s, segments/s do you see on busier servers? Nearly one WAL (16 MB) per second most of the time and 3 WALs per second in the beginning of checkpoint (due to full_page_writes). > - What generates the bulk of WAL on your servers (9.5+ can use > pg_xlogdump --stats to compute that)? Here is the output from a couple of our masters (and that is actually two hours before peak load): $ pg_xlogdump --stats 000100012B280089 000100012B300088 | fgrep -v 0.00 Type N (%) Record size (%) FPI size (%)Combined size (%) - --- --- --- ---- --- Heap2 55820638 ( 21.31) 1730485085 ( 22.27) 1385795249 ( 13.28) 3116280334 ( 17.12) Heap74366993 ( 28.39) 2288644932 ( 29.46) 5880717650 ( 56.34) 8169362582 ( 44.87) Btree 84655827 ( 32.32) 2243526276 ( 28.88) 3170518879 ( 30.38) 5414045155 ( 29.74) Total 2619337907769663301 [42.67%] 10437031778 [57.33%] 18206695079 [100%] $ $ pg_xlogdump --stats 0001D17F00A5 0001D1910004 | fgrep -v 0.00 Type N (%) Record size (%) FPI size (%)Combined size (%) - --- --- --- ---- --- Heap2 13676881 ( 18.95)422289539 ( 19.97) 15319927851 ( 25.63) 15742217390 ( 25.44) Heap22284283 ( 30.88)715293050 ( 33.83) 17119265188 ( 28.64) 17834558238 ( 28.82) Btree 27640155 ( 38.30)725674896 ( 34.32) 19244109632 ( 32.19) 19969784528 ( 32.27) Gin 6580760 ( 9.12)172246586 ( 8.15) 8091332009 ( 13.54) 8263578595 ( 13.35) Total 721729832114133847 [3.42%] 59774634680 [96.58%] 61888768527 [100%] $ > - Are you seeing WAL writes being a bottleneck?OA We do sometimes see WALWriteLock in pg_stat_activity.wait_event, but not too often. > - What kind of backup methods are you using and is the WAL volume a > problem? We use fork of barman project. In most cases that’s not a problem. > - What kind of replication are you using and is the WAL volume a > problem? Physical streaming replication. We used to have problems with network bandwidth (1 Gbit/s was consumed by transferring WAL to two replicas and one archive) but that became better after 1. upgrading to 9.5 and turning wal_compression on, 2. changing archive command to doing parallel compression and sending WALs to archive, 3. increasing checkpoint_timeout. > - What are your settings for wal_compression, max_wal_size (9.5+) / > checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers? xdb301e/postgres M # SELECT name, current_setting(name) FROM pg_settings WHERE name IN ('max_wal_size', 'checkpoint_timeout', 'wal_compression', 'wal_buffers'); name| current_setting +- checkpoint_timeout | 1h max_wal_size | 128GB wal_buffers| 16MB wal_compression| on (4 rows) Time: 0.938 ms xdb301e/postgres M # > - Could you quickly describe your workload? OLTP workload with 80% reads and 20% writes. > > Feel free to add any information you think is pertinent ;) Well, we actually workarounded issues with WAL write rate by increasing checkpoint_timeout to maximum possible (in 9.6 it can be even more). The downside of this change is recovery time. Thanks postgres
[PERFORM] Questionaire: Common WAL write rates on busy servers.
Hi, I've lately seen more and more installations where the generation of write-ahead-log (WAL) is one of the primary bottlenecks. I'm curious whether that's primarily a "sampling error" of mine, or whether that's indeed more common. The primary reason I'm curious is that I'm pondering a few potential optimizations, and would like to have some guidance which are more and which are less important. Questions (answer as many you can comfortably answer): - How many MB/s, segments/s do you see on busier servers? - What generates the bulk of WAL on your servers (9.5+ can use pg_xlogdump --stats to compute that)? - Are you seeing WAL writes being a bottleneck?OA - What kind of backup methods are you using and is the WAL volume a problem? - What kind of replication are you using and is the WAL volume a problem? - What are your settings for wal_compression, max_wal_size (9.5+) / checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers? - Could you quickly describe your workload? Feel free to add any information you think is pertinent ;) Greetings, Andres Freund -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance