Re: [PERFORM] Slow execution of SET ROLE, SET search_path and RESET ROLE
On 2017-11-07 18:48:14 +0100, Ulf Lohbrügge wrote:
> Hi,
>
> 2017-11-07 16:11 GMT+01:00 Andres Freund <and...@anarazel.de>:
>> Hi,
>>
>> On 2017-11-07 11:11:36 +0100, Ulf Lohbrügge wrote:
>>> I'm using PostgreSQL 9.5.9 on Debian and experience slow execution of
>>> some basic SET statements.
>>>
>>> I created about 1600 roles and use that setup for a multi tenancy
>>> application:
>>
>> Hm. How often do you drop/create these roles? How many other
>> roles/groups is one role a member of?
>
> I create between 10-40 roles per day.

Could you VACUUM (VERBOSE, FREEZE) that table and report the output? Do you ever delete roles?

>> Can you manually reproduce the problem? What times do you get if you
>> manually run the statement?
>
> Unfortunately not. Every time I manually execute "SET ROLE ..." the
> statement is pretty fast. I created a simple SQL file that contains the
> following statements:
>
> --snip--
> SET ROLE tenant382;
> SET ROLE tenant1337;
> SET ROLE tenant2;
> -- repeat the lines above 100k times
> --snap--
>
> When I execute those statements via 'time psql < set-roles.sql', the call
> lasts 138.7 seconds. So 300k "SET ROLE" statements result in 0.46ms per
> call on average.

And most of that is going to be roundtrip time. Hm. Could it be that you're just seeing the delays when pgbouncer establishes new pooling connections, and you're attributing that to SET ROLE in your app?

Greetings,

Andres Freund

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
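The ad-hoc benchmark described above can be reproduced with a short shell script. This is a sketch, not the original poster's exact file: the tenant role names and the repetition count are taken from the quoted snippet, and the final timing step requires a running cluster where those roles exist, so it is left as a comment.

```shell
# Sketch: generate 100k repetitions of the three SET ROLE statements
# (300k statements total), as described in the quoted message.
for i in $(seq 1 100000); do
    printf 'SET ROLE tenant382;\nSET ROLE tenant1337;\nSET ROLE tenant2;\n'
done > set-roles.sql

wc -l < set-roles.sql    # 300000 statements, one per line

# Against a real cluster (the tenant roles must exist), the timing
# step from the message would be:
# time psql -d mydb -f set-roles.sql >/dev/null
```

Dividing the wall-clock time by the statement count, as done above (138.7 s / 300k = 0.46 ms), measures mostly client/server round trips, which is exactly Andres' point.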
Re: [PERFORM] Slow execution of SET ROLE, SET search_path and RESET ROLE
Hi,

On 2017-11-07 11:11:36 +0100, Ulf Lohbrügge wrote:
> I'm using PostgreSQL 9.5.9 on Debian and experience slow execution of some
> basic SET statements.
>
> I created about 1600 roles and use that setup for a multi tenancy
> application:

Hm. How often do you drop/create these roles? How many other roles/groups is one role a member of?

> My application solely uses the role 'admin' to connect to the database.
> When performing sql statements for a specific tenant (e.g. tenant1337), a
> connection with user 'admin' is established and the following commands are
> executed:
>
> SET ROLE 1337;
> SET search_path = tenant1337;
>
> Then the application uses that connection to perform various statements in
> the database.

Just to be sure: You realize bad application code could escape from that, right?

> My application is a web service that approximately executes some hundred
> statements per second.
>
> I set "log_min_duration_statement = 200ms" and I get about 2k to 4k lines
> per day with statements like "SET ROLE", "SET search_path ..." and "RESET
> ROLE":
>
> --snip--
> 2017-11-07 09:44:30 CET [27760]: [3-1] user=admin,db=mydb LOG: duration:
> 901.591 ms execute : SET ROLE "tenant762"
> 2017-11-07 09:44:30 CET [27659]: [4-1] user=admin,db=mydb LOG: duration:
> 1803.971 ms execute : SET ROLE "tenant392"

That is weird.

> Besides those peaks in statement duration, my application performs well
> (i.e. has acceptable response times) most of the time.
>
> Is there anything I can do to improve performance here?
> Any help is greatly appreciated!

Can you manually reproduce the problem? What times do you get if you manually run the statement?

Greetings,

Andres Freund
Re: [PERFORM] query of partitioned object doesnt use index in qa
On September 15, 2017 1:42:23 PM PDT, Tom Lane wrote:
> One thing you could consider doing about this is creating an index
> on (body ->> 'SID'::text), which would prompt ANALYZE to gather
> statistics about that expression. Even if the index weren't actually
> used in the plan, this might improve the estimates and the resulting
> planning choices enough to make it worth maintaining such an index.

I'm wondering if we should extend the new CREATE STATISTICS framework to be able to do that without requiring an index. I.e. allow expressions, and add a new type of stats that just corresponds to what normal columns have. We could even create that implicitly for expression indexes, but allow dropping it if the overhead isn't worth it.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
On 2017-04-28 01:29:14 +0200, Tomas Vondra wrote:
> I can confirm this observation. I bought the Intel 750 NVMe SSD last year;
> the device has 1GB DDR3 cache on it (power-loss protected) and can do
> ~1GB/s of sustained O_DIRECT sequential writes. But when running pgbench,
> I can't push more than ~300MB/s of WAL to it, no matter what I do, because
> of WALWriteLock.

Hm, interesting. Even if you up wal_buffers to 128MB, use synchronous_commit = off, and play with wal_writer_delay/flush_after?

- Andres
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
On 2017-04-27 10:29:48 -0700, Joshua D. Drake wrote:
> On 04/27/2017 09:34 AM, Andres Freund wrote:
>> On 2017-04-27 09:31:34 -0700, Joshua D. Drake wrote:
>>> On 04/27/2017 08:59 AM, Andres Freund wrote:
>>>
>>> I would agree it isn't yet a widespread issue.
>>
>> I'm not yet sure about that actually. I suspect a large percentage of
>> people with such workloads aren't lingering much on the lists.
>
> That would probably be true. I was thinking of it more as "most new users
> are in the cloud", and it is going to be rare that a cloud user is going
> to be able to hit that level of writes (at least not without spending
> LOTS of money).

You can get pretty decent NVMe SSD drives from several cloud providers these days, without them immediately bankrupting you. Sure, it's instance storage, but with a decent replication and archival setup, that's not necessarily an issue.

It's not that hard to get to the point where postgres can't keep up with storage, at least for some workloads.

- Andres
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
On 2017-04-27 09:31:34 -0700, Joshua D. Drake wrote:
> On 04/27/2017 08:59 AM, Andres Freund wrote:
>> Ok, based on the, few, answers I've got so far, my experience is indeed
>> skewed. A number of the PG users I interacted with over the last couple
>> of years had WAL write rates somewhere in the range of 500MB/s to
>> 2.2GB/s (the max I've seen). At that point WAL insertion became a major
>> bottleneck, even if storage was more than fast enough to keep up. To
>> address these we'd need some changes, but the feedback so far suggests
>> that it's not yet a widespread issue...
>
> I would agree it isn't yet a widespread issue.

I'm not yet sure about that actually. I suspect a large percentage of people with such workloads aren't lingering much on the lists.

> The only people that are likely going to see this are going to be on bare
> metal. We should definitely plan on that issue for, say, 11.

"plan on that issue" - heh. We're talking about major engineering projects here ;)

> I do have a question though: where you have seen this issue, is it with
> synchronous_commit on or off?

Both. Whether that matters or not really depends on the workload. If you have bulk writes, it doesn't really matter much.

- Andres
Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.
Hi,

On 2017-04-24 21:17:43 -0700, Andres Freund wrote:
> I've lately seen more and more installations where the generation of
> write-ahead log (WAL) is one of the primary bottlenecks. I'm curious
> whether that's primarily a "sampling error" of mine, or whether that's
> indeed more common.
>
> The primary reason I'm curious is that I'm pondering a few potential
> optimizations, and would like to have some guidance which are more and
> which are less important.
>
> Questions (answer as many as you can comfortably answer):
> - How many MB/s, segments/s do you see on busier servers?
> - What generates the bulk of WAL on your servers (9.5+ can use
>   pg_xlogdump --stats to compute that)?
> - Are you seeing WAL writes being a bottleneck?
> - What kind of backup methods are you using and is the WAL volume a
>   problem?
> - What kind of replication are you using and is the WAL volume a
>   problem?
> - What are your settings for wal_compression, max_wal_size (9.5+) /
>   checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers?
> - Could you quickly describe your workload?

Ok, based on the, few, answers I've got so far, my experience is indeed skewed. A number of the PG users I interacted with over the last couple of years had WAL write rates somewhere in the range of 500MB/s to 2.2GB/s (the max I've seen). At that point WAL insertion became a major bottleneck, even if storage was more than fast enough to keep up. To address these we'd need some changes, but the feedback so far suggests that it's not yet a widespread issue...

- Andres
[PERFORM] Questionaire: Common WAL write rates on busy servers.
Hi,

I've lately seen more and more installations where the generation of write-ahead log (WAL) is one of the primary bottlenecks. I'm curious whether that's primarily a "sampling error" of mine, or whether that's indeed more common.

The primary reason I'm curious is that I'm pondering a few potential optimizations, and would like to have some guidance which are more and which are less important.

Questions (answer as many as you can comfortably answer):
- How many MB/s, segments/s do you see on busier servers?
- What generates the bulk of WAL on your servers (9.5+ can use
  pg_xlogdump --stats to compute that)?
- Are you seeing WAL writes being a bottleneck?
- What kind of backup methods are you using and is the WAL volume a
  problem?
- What kind of replication are you using and is the WAL volume a
  problem?
- What are your settings for wal_compression, max_wal_size (9.5+) /
  checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers?
- Could you quickly describe your workload?

Feel free to add any information you think is pertinent ;)

Greetings,

Andres Freund
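For the first question, a segments/s figure converts to MB/s directly, since a WAL segment is 16MB by default. A quick sketch of the arithmetic (the 32 segments/s input is a made-up figure for illustration):

```shell
# WAL segments are 16MB each by default (fixed at build/initdb time).
SEGMENTS_PER_SEC=32                       # hypothetical observed rate
echo $(( SEGMENTS_PER_SEC * 16 )) MB/s    # prints: 512 MB/s
```

The same conversion works in reverse when judging whether a reported MB/s number implies an unusually high segment turnover.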
Re: [PERFORM] pgsql-performance issue
Hi,

On 2016-08-20 08:38:43 +0000, debasis.mohar...@ipathsolutions.co.in wrote:
> I have a PostgreSQL 9.5 instance running on a Windows 8 machine with 4GB
> of RAM. This server is mainly used for inserting/updating large amounts
> of data via copy/insert/update commands, and seldom for running select
> queries.
>
> Here are the relevant configuration parameters I changed:
>
> max_connections = 100
> shared_buffers = 512MB
> effective_cache_size = 3GB
> work_mem = 12233kB
> maintenance_work_mem = 256MB
> min_wal_size = 1GB
> max_wal_size = 2GB
> checkpoint_completion_target = 0.7
> wal_buffers = 16MB
> default_statistics_target = 100
>
> After setting these in postgresql.conf, I run a select query that fetches
> a large number of records (29000). It takes 10.3 seconds in PostgreSQL,
> but the same query takes 2 seconds in MSSQL.
>
> So my question is how to improve the performance in PostgreSQL.

Please provide the output of

  EXPLAIN (ANALYZE, BUFFERS) yourquery;

and your query. Then we'll possibly be able to help you - atm we don't have enough information.

Regards,

Andres
Re: [HACKERS] [PERFORM] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6
On 2016-07-04 16:30:51 +0300, Vladimir Borodin wrote:
>> On 13 June 2016, at 21:58, Vladimir Borodin <r...@simply.name> wrote:
>>
>>> On 13 June 2016, at 00:51, Andres Freund <and...@anarazel.de> wrote:
>>>
>>> Hi Vladimir,
>>>
>>> Thanks for these reports.
>>>
>>> On 2016-06-13 00:42:19 +0300, Vladimir Borodin wrote:
>>>> perf report -g -i pg9?_all.data >/tmp/pg9?_perf_report.txt
>>>
>>> Any chance you could redo the reports with --no-children
>>> --call-graph=fractal added? The mode that includes child overheads
>>> unfortunately makes the output hard to interpret/compare.
>>
>> Of course. Not sure if that is important, but I upgraded perf for that
>> (because the --no-children option was introduced in ~3.16), so perf
>> record and perf report were done with different perf versions.
>>
>> Also I've done the same test on the same host (RHEL 6) but with a 4.6
>> kernel/perf and writing perf data to /dev/shm so as not to lose events.
>> The perf report output is also attached, but the important thing is
>> that the regression is not as significant:

FWIW, you can instead use -F 300 or something to reduce the sampling frequency.

>> root@pgload05g ~ # uname -r
>> 4.6.0-1.el6.elrepo.x86_64
>> root@pgload05g ~ # cat /proc/sys/kernel/sched_autogroup_enabled
>> 1
>> root@pgload05g ~ # /tmp/run.sh
>> RHEL 6  9.4  71634  0.893
>> RHEL 6  9.5  54005  1.185
>> RHEL 6  9.6  65550  0.976
>> root@pgload05g ~ # echo 0 >/proc/sys/kernel/sched_autogroup_enabled
>> root@pgload05g ~ # /tmp/run.sh
>> RHEL 6  9.4  73041  0.876
>> RHEL 6  9.5  60105  1.065
>> RHEL 6  9.6  67984  0.941
>> root@pgload05g ~ #
>
> Andres, is there any chance that you would find time to look at those
> results? Are they actually useful?

I don't really see anything suspicious in the profile. This looks more like a kernel scheduler issue than a postgres bottleneck. It seems that somehow using nonblocking IO (started in 9.5) causes scheduling issues when pgbouncer is also local.

Could you do

  perf stat -ddd -a sleep 10

or something during both runs? I suspect that the context switch ratios will be quite different.

Andres
Re: [HACKERS] [PERFORM] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6
On 2016-06-13 21:58:30 +0300, Vladimir Borodin wrote:
>> On 13 June 2016, at 00:51, Andres Freund <and...@anarazel.de> wrote:
>>
>> Hi Vladimir,
>>
>> Thanks for these reports.
>>
>> On 2016-06-13 00:42:19 +0300, Vladimir Borodin wrote:
>>> perf report -g -i pg9?_all.data >/tmp/pg9?_perf_report.txt
>>
>> Any chance you could redo the reports with --no-children
>> --call-graph=fractal added? The mode that includes child overheads
>> unfortunately makes the output hard to interpret/compare.
>
> Of course. Not sure if that is important, but I upgraded perf for that
> (because the --no-children option was introduced in ~3.16), so perf
> record and perf report were done with different perf versions.
>
> Also I've done the same test on the same host (RHEL 6) but with a 4.6
> kernel/perf and writing perf data to /dev/shm so as not to lose events.
> The perf report output is also attached, but the important thing is that
> the regression is not as significant:
>
> root@pgload05g ~ # uname -r
> 4.6.0-1.el6.elrepo.x86_64
> root@pgload05g ~ # cat /proc/sys/kernel/sched_autogroup_enabled
> 1
> root@pgload05g ~ # /tmp/run.sh
> RHEL 6  9.4  71634  0.893
> RHEL 6  9.5  54005  1.185
> RHEL 6  9.6  65550  0.976
> root@pgload05g ~ # echo 0 >/proc/sys/kernel/sched_autogroup_enabled
> root@pgload05g ~ # /tmp/run.sh
> RHEL 6  9.4  73041  0.876
> RHEL 6  9.5  60105  1.065
> RHEL 6  9.6  67984  0.941
> root@pgload05g ~ #

Hm. Have you measured how large the slowdown is if you connect via TCP to pgbouncer, but have pgbouncer connect to postgres via unix sockets?

Andres
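A minimal sketch of the suggested topology - pgbouncer listening on TCP while talking to postgres over a unix socket. The database name, socket directory, port, and the trust auth setting are all assumptions for illustration; a real setup needs proper auth and paths:

```shell
# Hypothetical pgbouncer.ini: clients connect via TCP; pgbouncer
# connects to the backend via unix socket (in pgbouncer, a host value
# starting with '/' is treated as a unix socket directory).
cat > /tmp/pgbouncer-unix.ini <<'EOF'
[databases]
mydb = host=/var/run/postgresql dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = trust
EOF

# Then (assuming pgbouncer is installed):
# pgbouncer -d /tmp/pgbouncer-unix.ini
```

Comparing pgbench through this setup against an all-TCP setup would isolate which hop the regression is sensitive to.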
Re: [PERFORM] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6
Hi Vladimir,

Thanks for these reports.

On 2016-06-13 00:42:19 +0300, Vladimir Borodin wrote:
> perf report -g -i pg9?_all.data >/tmp/pg9?_perf_report.txt

Any chance you could redo the reports with --no-children --call-graph=fractal added? The mode that includes child overheads unfortunately makes the output hard to interpret/compare.

> The results from pg9?_perf_report.txt are attached. Note that in all
> cases some events were lost, i.e.:
>
> root@pgload05g ~ # perf report -g -i pg94_all.data >/tmp/pg94_perf_report.txt
> Failed to open [vsyscall], continuing without symbols
> Warning:
> Processed 537137 events and lost 7846 chunks!

You can reduce the overhead by reducing the sampling frequency, e.g. by specifying -F 300.

Greetings,

Andres Freund
Re: [PERFORM] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6
Hi,

On 2016-06-02 14:18:26 +0300, Антон Бушмелев wrote:
> UP. Repeated the tests on a local VM.. the results are discouraging:
>
> OS        PG     TPS         AVG latency
> Centos 7  9.5.3  23.711023   168.421
> Centos 7  9.5.3  26.609271   150.188
> Centos 7  9.5.3  25.220044   158.416
> Centos 7  9.5.3  25.598977   156.047
> Centos 7  9.4.8  278.572191  14.077
> Centos 7  9.4.8  247.237755  16.177
> Centos 7  9.4.8  240.007524  16.276
> Centos 7  9.4.8  237.862238  16.596

Could you provide profiles on 9.4 and 9.5? Which kernel did you have enabled? Is /proc/sys/kernel/sched_autogroup_enabled 1 or 0?

Regards,

Andres
Re: [PERFORM] Queries intermittently slow
On 2016-01-07 13:34:51 -0500, Tom Lane wrote:
> It's fairly well established that the implementation of transparent
> huge pages in Linux kernels from the 2.6-or-so era sucks, and you're
> best off turning it off if you care about consistency of performance.

I think the feature wasn't introduced in the original 2.6 kernels (3.2 or so?), but Red Hat had backported it to their 2.6.32 kernel.

> I am not sure whether modern kernels have improved this area.

I think the problem has largely been solved around 3.16. Around 4.1 I could still reproduce problems, but the regressions were only in the sub-10% range in my test workload.

Andres
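For anyone wanting to check this on their own systems, a sketch follows. The path shown is the common sysfs location; some older Red Hat kernels use /sys/kernel/mm/redhat_transparent_hugepage instead, and changing the setting requires root:

```shell
# Inspect the current THP setting; the bracketed value is the active one.
THP=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$THP" ]; then
    cat "$THP"          # e.g. "always madvise [never]"
else
    echo "THP interface not present on this system"
fi

# To disable (as root; typically also done from an init script so it
# survives reboots):
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
```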
Re: [PERFORM] Proposal for unlogged tables
On 2016-01-04 19:12:22 +0200, Mark Zealey wrote:
> If there was a command to flush a specific unlogged table to disk, it
> would work around all these issues, no? Perhaps if you marked the table
> as read only at the same time, it would flush it to disk and ensure no
> more data could be written to it, e.g. ALTER TABLE ... SET READ ONLY on
> an unlogged table would flush + not truncate after crash. In our case
> this would be great as we want to use these as permanent tables for
> speed; but after an initial data dump we don't change the data again, so
> we could just do this at the end of the import process.

It's more complex than that; even unmodified tables need to be processed by vacuum every now and then (xid wraparound handling). It's probably possible to work around such things, but it's not a line or ten.

Andres
Re: [PERFORM] Proposal for unlogged tables
On 2016-01-04 16:38:40 +0200, Mark Zealey wrote:
> I don't know how the internals work, but unlogged tables are definitely
> flushed to disk and persist through normal server restarts. It is just,
> according to the docs, that if the server ever has an unclean shutdown
> the tables are truncated even if they have not been updated in a year. I
> can't understand why it has to be like this, and it seems that it would
> be much nicer to not automatically truncate if it doesn't have to.

Pages containing data of unlogged tables aren't ever flushed to disk unless
a) a shutdown checkpoint is performed,
b) a buffer containing data from an unlogged table is used for something else, or
c) the database being copied is the source of a CREATE DATABASE ... TEMPLATE.

Hence, if there's an unclean shutdown, there's absolutely no guarantee about the on-disk state of unlogged tables. Even if they haven't been modified in ages - there could have been many, many dirty pages in shared buffers when crashing.

Always flushing dirty pages of unlogged tables at checkpoint would greatly increase the overhead for memory-resident, write-heavy workloads that use unlogged tables.

Andres
Re: [PERFORM] One long transaction or multiple short transactions?
On 2015-10-17 10:26:01 -0500, Jim Nasby wrote:
> Except inserts *do* take a lot of locks, just not user-level locks.
> Operations like finding a page to insert into, seeing if that page is in
> shared buffers, loading the page into shared buffers, modifying a shared
> buffer, getting the relation extension lock if you need to add a new
> page. Then there's a whole pile of additional locking you could be
> looking at for inserting into any indexes.
>
> Now, most of the locks I described up there are transaction-aware

Missing *not*?
Re: [PERFORM] Having some problems with concurrent COPY commands
On 2015-10-13 07:14:01 -0700, Shaun Thomas wrote:
> On Tue, Oct 13, 2015 at 2:32 AM, Heikki Linnakangas wrote:
>> 80% of the CPU time is spent in the b-tree comparison function.
>
> In the logs, my duration per COPY command increases from about 1400ms
> for one process to about 3800ms when I have four running concurrently.
> That's really my only data point, unfortunately. Strace isn't super
> helpful because it just says 70-ish% of the time is wait4, but that's
> not significantly different from the results using one process.

Please run a profile. Compile postgres with

  CFLAGS='-O2 -fno-omit-frame-pointer'

as an argument to configure. That'll allow us to get a hierarchical profile.

Then first do a plain cpu profile:

  perf record -g -a sleep 5
  perf report > somefile

Then let's look at lock contention:

  perf record -g -a -e syscalls:sys_enter_semop sleep 5
  perf report > somefile

and send the results.

Thanks,

Andres
Re: [PERFORM] Having some problems with concurrent COPY commands
Hi,

On 2015-10-12 13:17:53 -0500, Shaun Thomas wrote:
> It would appear I'm running into whatever issue the xloginsert_slots
> patch tried to address, but not much discussion exists afterwards.

That patch is merged; it's just that the number of slots is hardcoded. You can recompile postgres with a different value by changing

  #define NUM_XLOGINSERT_LOCKS  8

in xlog.c. A restart is enough afterwards.

> Is there any other way to get normal parallel COPY performance, or is
> that just currently impossible?
>
> I also know 9.5 underwent a lot of locking improvements, so it might
> not be relevant. I just haven't gotten a chance to repeat my tests
> with 9.5 just yet.

Hard to say anything substantive without further information. Any chance you could provide profiles of such a run? If yes, I can help you with instructions. I'm just too lazy to write them up if not.

Greetings,

Andres Freund
Re: [PERFORM] Multi processor server overloads occationally with system process while running postgresql-9.4
On 2015-10-03 01:39:33 -0700, ajaykbs wrote:
> It is observed that at some times during moderate load the CPU usage
> goes up to 400% and the users are not able to complete the queries in
> expected time. But the load is contributed by some system process only.
> The average number of connections is normally 50.

This email is nearly impossible to read. But it sounds a bit like you need to disable transparent hugepages and/or zone_reclaim_mode.

Greetings,

Andres Freund
Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On 2015-07-08 23:38:38 -0400, Tom Lane wrote:
> and...@anarazel.de (Andres Freund) writes:
>> On 2015-07-08 15:38:24 -0700, Craig James wrote:
>>> From my admittedly naive point of view, it's hard to see why any of
>>> this matters. I have functions that do purely CPU-intensive
>>> mathematical calculations ... you could imagine something like
>>> is_prime(N) that determines if N is a prime number. I have eight
>>> clients that connect to eight backends. Each client issues an SQL
>>> command like, select is_prime(N) where N is a simple number.
>>
>> I mostly replied to Merlin's general point (additionally in the context
>> of plpgsql). But I have a hard time seeing that postgres would be the
>> bottleneck for an is_prime() function (or something with similar
>> characteristics) that's written in C where the average runtime is more
>> than, say, a couple thousand cycles. I'd like to see a profile of that.
>
> But that was not the case that Graeme was complaining about.

No, Craig was complaining about that case...

> One of my Salesforce colleagues has been looking into ways that we could
> decide to skip the per-statement snapshot acquisition even in volatile
> functions, if we could be sure that a particular statement isn't going
> to do anything that would need a snapshot.

Yea, I actually commented about that on IRC as well. I was thinking about actually continuing to get a snapshot, but marking it as 'complete on usage', i.e. only calling GetSnapshotData() when the snapshot is used to decide about visibility. We probably can't do that in the toplevel visibility case because it'll probably have noticeable semantic effects, but ISTM it should be doable for the volatile function using SPI case.
Re: [PERFORM] Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On 2015-07-09 10:30:35 +0000, Graeme B. Bell wrote:
>> Well, that requires reviewing the source code of the run script and
>> such.
>
> No, of course it doesn't. It appears that you didn't look at the repo or
> read my previous mail before you wrote this.

FFS, I *ran* some of the tests and reported on results. With you in CC. What I mean is that I don't just run random code from some random github repository.

> I do not wish to antagonise you either, so please go and look at the
> repo before you write the next reply.

Over and out.
Re: [PERFORM] Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On 2015-07-08 11:13:04 +0000, Graeme B. Bell wrote:
> I'm guessing you are maybe pressed for time at the moment, because I
> already clearly included this on the last email, as well as the links to
> the alternative benchmarks with the same problem I referred to in both
> of my last emails, which are also trivial to drop into pgbench
> (cut/paste).

You realize that you want something here, not Merlin, right?

> e.g. did you see these parts of my previous email:
>
> "To clear up the issue I built a little test harness around your comment
> below. http://github.com/gbb/t"

Well, that requires reviewing the source code of the run script and such.

I think we shouldn't discuss this on two threads (-performance, -bugs); that makes it hard to follow. Given Tom's more detailed answer, I think the -bugs thread already contains more pertinent information.
Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On 2015-07-08 13:46:53 -0500, Merlin Moncure wrote:
> On Wed, Jul 8, 2015 at 12:48 PM, Craig James <cja...@emolecules.com> wrote:
>> On Tue, Jul 7, 2015 at 10:31 PM, Joshua D. Drake <j...@commandprompt.com> wrote:
>>> Using Apache Fast-CGI, you are going to fork a process for each
>>> instance of the function being executed and that in turn will use all
>>> CPUs up to the max available resource. With PostgreSQL, that isn't
>>> going to happen unless you are running (at least) 8 functions across 8
>>> connections.
>>
>> Well, right, which is why I mentioned "even with dozens of clients".
>> Shouldn't that scale to at least all of the CPUs in use if the function
>> is CPU intensive (which it is)?
>
> only in the absence of inter-process locking and cache line bouncing.

And additionally memory bandwidth (shared between everything, even in the NUMA case), cross-socket/bus bandwidth (absolutely performance critical in multi-socket configurations), and cache capacity (shared between cores, and sometimes even sockets!).
Re: [PERFORM] Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
On 2015-07-08 15:38:24 -0700, Craig James wrote:
> From my admittedly naive point of view, it's hard to see why any of this
> matters. I have functions that do purely CPU-intensive mathematical
> calculations ... you could imagine something like is_prime(N) that
> determines if N is a prime number. I have eight clients that connect to
> eight backends. Each client issues an SQL command like, select
> is_prime(N) where N is a simple number.

I mostly replied to Merlin's general point (additionally in the context of plpgsql). But I have a hard time seeing that postgres would be the bottleneck for an is_prime() function (or something with similar characteristics) that's written in C where the average runtime is more than, say, a couple thousand cycles. I'd like to see a profile of that.

Andres
Re: [PERFORM] 9.5alpha1 vs 9.4
Hi,

On 2015-07-05 13:10:51 +0200, Mkrtchyan, Tigran wrote:
> today I have updated my test system to 9.5alpha1. Most of the operations
> are ok, except delete. I get ~1000 times slower!
>
> 255.88 | 566.11 | 452 | DELETE FROM t_inodes WHERE ipnfsid=$1 AND inlink = ?

That certainly should not be the case. Could you show the query plan for this statement in both versions? Any chance that there's a parameter type mismatch for $1?

Greetings,

Andres Freund
Re: [PERFORM] PostgreSQL disk fragmentation causes performance problems on Windows
On 2015-05-21 11:54:40 -0700, Josh Berkus wrote: This has been talked about as a feature, but would require major work on PostgreSQL to make it possible. You'd be looking at several months of effort by a really good hacker, and then a whole bunch of performance testing. If you have the budget for this, then please let's talk about it because right now nobody is working on it. I think this is overestimating the required effort quite a bit. While not trivial, it's also not that complex to make this work. Greetings, Andres Freund
Re: [PERFORM] PostgreSQL disk fragmentation causes performance problems on Windows
On 2015-04-29 10:06:39 +0200, Andres Freund wrote: Hi, On 2015-04-23 19:47:06 +0000, Jan Gunnar Dyrset wrote: I am using PostgreSQL to log data in my application. A number of rows are added periodically, but there are no updates or deletes. There are several applications that log to different databases. This causes terrible disk fragmentation which again causes performance degradation when retrieving data from the databases. The table files are getting more than 5 fragments over time (max table size about 1 GB). The problem seems to be that PostgreSQL grows the database with only the room it needs for the new data each time it is added. Because several applications are adding data to different databases, the additions are never contiguous. Which OS and filesystem is this done on? Because many halfway modern systems, like e.g. ext4 and xfs, implement this in the background as 'delayed allocation'. Oh, it's in the subject. Stupid me, sorry for that. I'd consider testing how much better this behaves under a different operating system, as a shorter term relief. Greetings, Andres Freund
Re: [PERFORM] PostgreSQL disk fragmentation causes performance problems on Windows
Hi, On 2015-04-23 19:47:06 +0000, Jan Gunnar Dyrset wrote: I am using PostgreSQL to log data in my application. A number of rows are added periodically, but there are no updates or deletes. There are several applications that log to different databases. This causes terrible disk fragmentation which again causes performance degradation when retrieving data from the databases. The table files are getting more than 5 fragments over time (max table size about 1 GB). The problem seems to be that PostgreSQL grows the database with only the room it needs for the new data each time it is added. Because several applications are adding data to different databases, the additions are never contiguous. Which OS and filesystem is this done on? Because many halfway modern systems, like e.g. ext4 and xfs, implement this in the background as 'delayed allocation'. Is it possible that e.g. you're checkpointing very frequently - which includes fsyncing dirty files - and that that causes delayed allocation not to work? How often did you checkpoint? How did you measure the fragmentation? Using filefrag? If so, could you perhaps send its output? I think that preallocating lumps of a given, configurable size, say 4 MB, for the tables would remove this problem. The max number of fragments on a 1 GB file would then be 250, which is no problem. Is this possible to configure in PostgreSQL? If not, how difficult is it to implement in the database? It's not impossible, but there are complexities because a) extension happens under a sometimes contended lock, and doing more there will have possible negative scalability implications. We need to restructure the logging first to make that more realistic. b) postgres also tries to truncate files, and we need to make sure that happens only in the right circumstances. Greetings, Andres Freund
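The fragment arithmetic in the report can be checked quickly (illustrative only; real fragmentation depends on the filesystem's allocator, and the mail's "250" is a round figure for what works out to 256):

```python
def worst_case_fragments(table_bytes: int, extent_bytes: int) -> int:
    """Upper bound on file fragments if every allocation lands in a
    different place on disk (one fragment per extent)."""
    return -(-table_bytes // extent_bytes)  # ceiling division

GB = 1024 ** 3
MB = 1024 ** 2
BLOCK = 8192  # postgres default block size; tables grow block-at-a-time

# Growing 8 kB at a time, a 1 GB table can in the worst case end up with
# one fragment per block; 4 MB preallocation would cap it at 256.
print(worst_case_fragments(GB, BLOCK))   # 131072
print(worst_case_fragments(GB, 4 * MB))  # 256
```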
Re: [PERFORM] MusicBrainz postgres performance issues
On 2015-03-15 13:08:13 +0100, Robert Kaye wrote: On Mar 15, 2015, at 12:41 PM, Andreas Kretschmer akretsch...@spamfence.net wrote: just a wild guess: raid-controller BBU faulty We don’t have a BBU in this server, but at least we have redundant power supplies. In any case, how would a faulty battery possibly cause this? Many controllers disable write-back caching when the battery is dead. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [PERFORM] MusicBrainz postgres performance issues
On 2015-03-15 13:07:25 +0100, Robert Kaye wrote: On Mar 15, 2015, at 12:13 PM, Josh Krupka jkru...@gmail.com wrote: It sounds like you've hit the postgres basics, what about some of the linux check list items? what does free -m show on your db server? total used free shared buffers cached Mem: 48295 31673 16622 0 5 12670 -/+ buffers/cache: 18997 29298 Swap: 22852 2382 20470 Could you post /proc/meminfo instead? That gives a fair bit more information. Also: * What hardware is this running on? * Why do you need 500 connections (that are nearly all used) when you have a pgbouncer in front of the database? That's not going to be efficient. * Do you have any data tracking the state connections are in? I.e. whether they're idle or not? The connections graph you linked doesn't give that information. * You're apparently not graphing CPU usage. How busy are the CPUs? How much time is spent in the kernel (i.e. system)? * Consider installing perf (linux-utils-$something) and doing a systemwide profile. 3.2 isn't the greatest kernel around, efficiency wise. At some point you might want to upgrade to something newer. I've seen remarkable differences around this. You really should upgrade postgres to a newer major version one of these days. Especially 9.2 can give you a remarkable improvement in performance with many connections in a read mostly workload. Greetings, Andres Freund
Re: [PERFORM] MusicBrainz postgres performance issues
On 2015-03-15 11:09:34 -0600, Scott Marlowe wrote: shared_mem of 12G is almost always too large. I'd drop it down to ~1G or so. I think that's outdated wisdom, i.e. not generally true. I've now seen a significant number of systems where a larger shared_buffers can help quite massively. The primary case where it can, in my experience, go bad is write-mostly databases where every buffer acquisition has to write out dirty data while holding locks. Especially during relation extension that's bad. A new enough kernel, a sane filesystem (i.e. not ext3) and sane checkpoint configuration take care of most of the other disadvantages. Greetings, Andres Freund
Re: [PERFORM] MusicBrainz postgres performance issues
On 2015-03-15 20:42:51 +0300, Ilya Kosmodemiansky wrote: On Sun, Mar 15, 2015 at 8:20 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-03-15 11:09:34 -0600, Scott Marlowe wrote: shared_mem of 12G is almost always too large. I'd drop it down to ~1G or so. I think that's outdated wisdom, i.e. not generally true. Quite agreed. With the note that a properly configured controller with BBU is needed. That imo doesn't really have anything to do with it. The primary benefit of a BBU with writeback caching is accelerating (near-)synchronous writes. Like the WAL. But, besides influencing the default for wal_buffers, a larger shared_buffers doesn't change the amount of synchronous writes. Greetings, Andres Freund
Re: [PERFORM] MusicBrainz postgres performance issues
(please quote properly) On 2015-03-15 19:55:23 -0400, mich...@sqlexec.com wrote: Why is 500 connections insane. We got 32 CPU with 96GB and 3000 max connections, and we are doing fine, even when hitting our max concurrent connection peaks around 4500. At a previous site, we were using 2000 max connections on 24 CPU and 64GB RAM, with about 1500 max concurrent connections. So I wouldn't be too hasty in saying more than 500 is asking for trouble. Just as long as you got your kernel resources set high enough to sustain it (SHMMAX, SHMALL, SEMMNI, and ulimits), and RAM for work_mem. It may work acceptably in some scenarios, but it can lead to significant problems. Several things in postgres scale linearly (from the algorithmic point of view; often CPU characteristics like cache sizes make it worse) with max_connections, most notably acquiring a snapshot. It usually works ok enough if you don't have a high number of queries per second, but if you do, you can run into horrible contention problems. Absurdly enough that matters *more* on bigger machines with several sockets. It's especially bad on 4+ socket systems. The other aspect is that such a high number of full connections usually just isn't helpful for throughput. Not even the most massive NUMA systems (~256 hardware threads is the realistic max atm IIRC) can process 4.5k queries at the same time. It'll often be much more efficient if all connections above a certain number aren't allocated a full postgres backend, with all its overhead, but use a much more lightweight pooler connection. Greetings, Andres Freund
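The snapshot-acquisition point can be sketched with a toy model (pure illustration, not postgres code; GetSnapshotData() is the real counterpart):

```python
import random

def take_snapshot(proc_xids):
    """Toy model of MVCC snapshot acquisition: scan every backend slot
    and collect the in-progress xids.  The cost is O(len(proc_xids)),
    i.e. O(max_connections), no matter how many slots are actually busy."""
    in_progress = []
    for xid in proc_xids:       # one pass over *all* slots
        if xid is not None:     # slot belongs to a running transaction
            in_progress.append(xid)
    return sorted(in_progress)

random.seed(1)
# 3000 slots, only 50 active: the scan still touches all 3000, and every
# snapshot-taking query pays that cost.
slots = [None] * 3000
for i in random.sample(range(3000), 50):
    slots[i] = 1000 + i
snap = take_snapshot(slots)
print(len(slots), len(snap))
```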
Re: [PERFORM] MusicBrainz postgres performance issues
On 2015-03-15 20:54:51 +0300, Ilya Kosmodemiansky wrote: On Sun, Mar 15, 2015 at 8:46 PM, Andres Freund and...@2ndquadrant.com wrote: That imo doesn't really have anything to do with it. The primary benefit of a BBU with writeback caching is accelerating (near-)synchronous writes. Like the WAL. My point was, that having no proper raid controller (today bbu surely needed for the controller to be a proper one) + heavy writes of any kind, it is absolutely impossible to live with large shared_buffers and without io problems. And my point is that you're mostly wrong. What a raid controller's writeback cache usefully accelerates is synchronous writes. I.e. writes that the application waits for. Usually raid controllers don't have much chance to reorder the queued writes (i.e. turning neighboring writes into one larger sequential write). What they do excel at is making synchronous writes to disk return faster because the data is only written to the controller's memory, not to actual storage. They're also good at avoiding actual writes to disk when the *same* page is written to multiple times in a short amount of time. In postgres, writes for data that goes through shared_buffers are usually asynchronous. We write them to the OS's page cache when a page is needed for other contents, is undirtied by the bgwriter, or written out during a checkpoint; but do *not* wait for the write to hit the disk. The controller's write back cache doesn't hugely help here, because it doesn't make *that* much of a difference whether the dirty data stays in the kernel's page cache or in the controller. In contrast to that, writes to the WAL are often more or less synchronous. We actually wait (via fdatasync()/fsync() syscalls) for writes to hit disk in a bunch of scenarios, most commonly when committing a transaction.
Unless synchronous_commit = off, every COMMIT in a transaction that wrote data implies an fdatasync() of a file in pg_xlog (well, we do optimize that in some conditions, but let's leave that out for now). Additionally, if there are many smaller/interleaved transactions, we will write()/fdatasync() out the same 8kb WAL page repeatedly. Every time a transaction commits (and some other things), the page that commit record is on will be flushed. As the WAL records for insertions, updates, deletes, commits are frequently much smaller than 8kb, that will often happen 20-100 times for the same page in OLTP scenarios with narrow rows. That's why synchronous_commit = off can be such a huge win for OLTP write workloads without a writeback cache - synchronous writes are turned into asynchronous writes, and repetitive writes to the same page are avoided. It also explains why synchronous_commit = off has much less of an effect for bulk write workloads: As there are no synchronous disk writes due to WAL flushes at commit time (there's only very few commits), synchronous commit doesn't have much of an effect. That said, there's a couple reasons why you're not completely wrong: Historically, when using ext3 with data=ordered and some other filesystems, synchronous writes to one file forced *all* other previously dirtied data to also be flushed. That means that if you have pg_xlog and the normal relation files on the same filesystem, the synchronous writes to the WAL will not only have to write out the new WAL (often not that much data), but also all the other dirty data. The OS will often be unable to do efficient write combining in that case, because a) there's not yet that much data there, b) postgres doesn't order writes during checkpoints. That means that WAL writes will suddenly have to write out much more data = COMMITs are slow. That's where the suggestion to keep pg_xlog on a separate partition largely comes from.
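The "20-100 times for the same page" figure follows from simple arithmetic; here is a back-of-envelope sketch (the 150-byte combined record size is an assumption for illustration, not a measured value):

```python
WAL_PAGE = 8192     # WAL block size (default XLOG_BLCKSZ)
record_size = 150   # assumed rough size of a narrow-row change + commit
                    # record pair; real sizes vary with the row

# With synchronous_commit = on, every commit flushes the page its commit
# record lands on, so the same 8 kB WAL page can get fdatasync()ed once
# per commit that fits on it:
flushes_per_page = WAL_PAGE // record_size
print(flushes_per_page)
```

With smaller records the count approaches the upper end of the quoted 20-100 range; with fatter rows it drops toward the lower end.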
Writes going through shared_buffers are sometimes indirectly turned into synchronous writes (from the OS's perspective at least, which means they'll be done at a higher priority). That happens when the checkpointer fsync()s all the files at the end of a checkpoint. When things are going well and checkpoints are executed infrequently and completely guided by time (i.e. triggered solely by checkpoint_timeout, and not checkpoint_segments) that's usually not too bad. You'll see a relatively small latency spike for transactions. Unfortunately the ext* filesystems have an implementation problem here, which can make this problem much worse: The way writes are prioritized during an fsync() can stall out concurrent synchronous reads/writes pretty badly. That's much less of a problem with e.g. xfs. Which is why I'd atm not suggest ext4 for write intensive applications. The other situation where this can lead to big problems is if your checkpoints aren't scheduled by time (you can recognize that by enabling log_checkpoints and checking a) that time is the trigger, b) they're actually happening in an interval consistent with checkpoint_timeout). If the relation files are not written out in a smoothed out fashion (configured
Re: [PERFORM] MusicBrainz postgres performance issues
On 2015-03-15 12:25:07 -0600, Scott Marlowe wrote: Here's the problem with a large shared_buffers on a machine that's getting pushed into swap. It starts to swap BUFFERs. Once buffers start getting swapped you're not just losing performance, that huge shared_buffers is now working against you because what you THINK are buffers in RAM to make things faster are in fact blocks on a hard drive being swapped in and out during reads. It's the exact opposite of fast. :) IMNSHO that's tackling things from the wrong end. If 12GB of shared buffers drive your 48GB dedicated OLTP postgres server into swapping out actively used pages, the problem isn't the 12GB of shared buffers, but that you require so much memory for other things. That needs to be fixed. But! We haven't even established that swapping is an actual problem here. The ~2GB of swapped out memory could just as well be the java raid controller management monstrosity or something similar. Those pages won't ever be used and thus can better be used to buffer IO. You can check what's actually swapped out using: grep ^VmSwap /proc/[0-9]*/status|grep -v '0 kB' For swapping to be actually harmful you need to have pages that are regularly swapped in. vmstat will tell. In a concurrent OLTP workload (~450 established connections do suggest that) with a fair amount of data keeping the hot data set in shared_buffers can significantly reduce problems. Constantly searching for victim buffers isn't a nice thing, and that will happen if your most frequently used data doesn't fit into s_b. On the other hand, if your data set is so large that even the hottest part doesn't fit into memory (perhaps because there's no hottest part as there's no locality at all), a smaller shared buffers can make things more efficient, because the search for replacement buffers is cheaper with a smaller shared buffers setting. 
Greetings, Andres Freund
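The grep one-liner above can be mirrored in a small helper, shown here against a made-up /proc/<pid>/status sample (the field format matches proc(5); the process values are invented):

```python
import re

def vmswap_kb(status_text: str):
    """Extract the VmSwap value (in kB) from a /proc/<pid>/status dump,
    mirroring: grep ^VmSwap /proc/[0-9]*/status | grep -v '0 kB'"""
    m = re.search(r"^VmSwap:\s+(\d+) kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None

# Hypothetical status dump for a process that has ~200 MB swapped out --
# e.g. the "java raid controller management monstrosity" in the mail.
sample = "Name:\tjava\nVmRSS:\t  812344 kB\nVmSwap:\t  204800 kB\n"
print(vmswap_kb(sample))
```

As the mail stresses, a non-zero VmSwap is only harmful if those pages are regularly swapped back in; vmstat's si/so columns tell.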
Re: [PERFORM] Performance bug in prepared statement binding in 9.2?
On 2014-11-10 10:48:24 -0800, Josh Berkus wrote: On 12/31/2013 09:55 AM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: Tom, There's an abbreviated version of this argument in the comments in my proposed patch at http://www.postgresql.org/message-id/11927.1384199...@sss.pgh.pa.us What I'm hoping will happen next is that the complainants will hot-patch that and see if it fixes their problems. We can't really determine what to do without that information. Unfortunately, the original reporter of this issue will not be available for testing for 2-3 weeks, and I haven't been able to devise a synthetic test which clearly shows the issue. Ping? I've been waiting on committing that patch pending some real-world testing. It'd be nice to resolve this question before we ship 9.3.3, which I'm supposing will be sometime in January ... Did this patch ever make it in? Or did it hang waiting for verification? src/tools/git_changelog is your friend. Author: Tom Lane t...@sss.pgh.pa.us Branch: master Release: REL9_4_BR [fccebe421] 2014-02-25 16:04:06 -0500 Branch: REL9_3_STABLE Release: REL9_3_4 [4162a55c7] 2014-02-25 16:04:09 -0500 Branch: REL9_2_STABLE Release: REL9_2_8 [00283cae1] 2014-02-25 16:04:12 -0500 Branch: REL9_1_STABLE Release: REL9_1_13 [3e2db4c80] 2014-02-25 16:04:16 -0500 Branch: REL9_0_STABLE Release: REL9_0_17 [1e0fb6a2c] 2014-02-25 16:04:20 -0500 Use SnapshotDirty rather than an active snapshot to probe index endpoints. Greetings, Andres Freund
Re: [PERFORM] Postgres slave not catching up (on 9.2)
Hi, On 2014-11-08 13:11:25 +0000, Ruben Domingo Gaspar Aparicio wrote: Hello, I have built up a hot-standby between a master running 9.2.4 and a slave running 9.2.9. I did the initial copy using pg_basebackup. My recovery.conf looks like: standby_mode = 'on' primary_conninfo = 'host=p port= user=replicator' trigger_file = 'failover.now' restore_command = 'test -f /ORA/dbs02/PUPDBTST/archive/%f && ln -fns /ORA/dbs02/PUPDBTST/archive/%f %p' archive_cleanup_command = '/usr/local/pgsql/postgresql-9.2.9/bin/pg_archivecleanup /ORA/dbs03/PUPDBTST/data %r' The slave (I don't have control on the master) is using 2 NFS file systems, one for WALs and another one for the data, on Netapp controllers: dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs (rw,remount,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,timeo=600) dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs (rw,remount,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,timeo=600) The streaming is working perfectly: /usr/local/pgsql/postgresql-9.2.9/bin/psql -U admin -h X -p X -d puppetdb -c select pid,usesysid, usename, application_name, client_addr, state, sent_location,write_location,replay_location,sync_state from pg_stat_replication; pid | usesysid | usename | application_name | client_addr | state | sent_location | write_location | replay_location | sync_state +--++--+-+---+---++-+ 117659 |16384 | replicator | walreceiver | 10.16.7.137 | streaming | AA74/DD630978 | AA74/DD630978 | A977/F84F0BE0 | async (1 row) But the lag is increasing constantly; it looks like the replay cannot cope: I have a couple of questions: 1) Is the standby actually used for querying? Is it possible that replay frequently conflicts with active queries? As you don't have hot_standby_feedback enabled that seems quite possible. 2) Is the startup process on the standby CPU or IO bound?
3) Does the workload involve loads of temporary tables or generally transactions locking lots of tables exclusively in one transaction? Greetings, Andres Freund
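To quantify the lag from the pg_stat_replication numbers quoted above, the text LSNs can be converted to absolute byte positions (high and low 32-bit halves, both hex; on newer servers pg_xlog_location_diff() does this for you). A minimal sketch:

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN like 'AA74/DD630978' to an absolute
    byte position: 'high/low', each a 32-bit hex number."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

sent   = lsn_to_bytes("AA74/DD630978")  # sent_location from the mail
replay = lsn_to_bytes("A977/F84F0BE0")  # replay_location
lag_bytes = sent - replay
print(lag_bytes / 1024**3)  # replay lag in GiB -- substantial here
```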
Re: [PERFORM] Turn off Hyperthreading! WAS: 60 core performance with 9.3
On 2014-08-21 14:02:26 -0700, Josh Berkus wrote: On 08/20/2014 07:40 PM, Bruce Momjian wrote: Not sure how you can make such a blanket statement when so many people have tested and shown the benefits of hyper-threading. Actually, I don't know that anyone has posted the benefits of HT. Link? There's definitely cases where it can help. But it's highly workload *and* hardware dependent. OS is RHEL with 2.6.32-431.3.1.el6.x86_64. I've emailed a kernel hacker who works at Intel for comment; for one thing, I'm wondering if the older kernel version is a problem for a system like this. I'm not sure if it has been backported by redhat, but there definitely have been significant improvements in SMT aware scheduling after vanilla 2.6.32. Greetings, Andres Freund
Re: [PERFORM] query against pg_locks leads to large memory alloc
On 2014-08-18 14:36:52 -0700, Dave Owens wrote: On Mon, Aug 18, 2014 at 2:21 PM, Matheus de Oliveira matioli.math...@gmail.com wrote: Do you really need such large values? What is your max_connections value? max_connections = 450 ...we have found that we run out of shared memory when max_pred_locks_per_transaction is less than 30k. What was the precise error message when that happened? Greetings, Andres Freund
Re: [PERFORM] 60 core performance with 9.3
On 2014-07-11 12:40:15 +1200, Mark Kirkwood wrote: On 01/07/14 22:13, Andres Freund wrote: On 2014-07-01 21:48:35 +1200, Mark Kirkwood wrote: - cherry picking the last 5 commits into 9.4 branch and building a package from that and retesting: Clients | 9.4 tps 60 cores (rwlock) +-- 6 | 70189 12 | 128894 24 | 233542 48 | 422754 96 | 590796 192 | 630672 Wow - that is more like it! Andres that is some nice work, we definitely owe you some beers for that :-) I am aware that I need to retest with an unpatched 9.4 src - as it is not clear from this data how much is due to Andres's patches and how much to the steady stream of 9.4 development. I'll post an update on that later, but figured this was interesting enough to note for now. Cool. That's what I like (and expect) to see :). I don't think unpatched 9.4 will show significantly different results than 9.3, but it'd be good to validate that. If you do so, could you post the results in the -hackers thread I just CCed you on? That'll help the work to get into 9.5. So we seem to have nailed read only performance. Going back and revisiting read write performance finds: Postgres 9.4 beta rwlock patch pgbench scale = 2000 max_connections = 200; shared_buffers = 10GB; maintenance_work_mem = 1GB; effective_io_concurrency = 10; wal_buffers = 32MB; checkpoint_segments = 192; checkpoint_completion_target = 0.8; clients | tps (32 cores) | tps -++- 6| 8313 | 8175 12 | 11012 | 14409 24 | 16151 | 17191 48 | 21153 | 23122 96 | 21977 | 22308 192 | 22917 | 23109 On that scale - that's bigger than shared_buffers IIRC - I'd not expect the patch to make much of a difference. 
kernel.sched_autogroup_enabled=0 kernel.sched_migration_cost_ns=500 net.core.somaxconn=1024 /sys/kernel/mm/transparent_hugepage/enabled [never] Full report http://paste.ubuntu.com/886/ # 8.82% postgres [kernel.kallsyms][k] _raw_spin_lock_irqsave | --- _raw_spin_lock_irqsave | |--75.69%-- pagevec_lru_move_fn | __lru_cache_add | lru_cache_add | putback_lru_page | migrate_pages | migrate_misplaced_page | do_numa_page | handle_mm_fault | __do_page_fault | do_page_fault | page_fault So, the majority of the time is spent in numa page migration. Can you disable numa_balancing? I'm not sure if your kernel version does that at runtime or whether you need to reboot. The kernel.numa_balancing sysctl might work. Otherwise you probably need to boot with numa_balancing=0. It'd also be worthwhile to test this with numactl --interleave. Greetings, Andres Freund
Re: [PERFORM] Postgres Replaying WAL slowly
On 2014-07-01 15:20:37 -0400, Tom Lane wrote: Jeff Frost j...@pgexperts.com writes: On Jun 30, 2014, at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote: Did you check whether the locks were all on temp tables of the ON COMMIT DROP persuasion? And indeed it did catch up overnight and the lag increased shortly after a correlating spike in AccessExclusiveLocks that were generated by temp table creation with on commit drop. OK, so we have a pretty clear idea of where the problem is now. It seems like there are three, not mutually exclusive, ways we might address this: 1. Local revisions inside StandbyReleaseLocks to make it perform better in the presence of many locks. This would only be likely to improve matters much if there's a fixable O(N^2) algorithmic issue; but there might well be one. It sounded like Andres had taken a preliminary look at #1 and found a possible avenue for improvement, which I'd encourage him to pursue. I don't have the resources to do this right now, but yes, I think we can relatively easily get rid of the O(num_locks * num_subtransactions) behaviour. 2. Avoid WAL-logging AccessExclusiveLocks associated with temp tables, on the grounds that no standby should be touching them. I'm not entirely sure that that argument is bulletproof though; in particular, even though a standby couldn't access the table's data, it's possible that it would be interested in seeing consistent catalog entries. Hm. We definitely perform checks surprisingly late for those. It's possible to do SELECT * FROM pg_temp_nn.whatever; without an error if there's no rows or if the rest of the plan doesn't access that table. The check prohibiting access is only in bufmgr.c... So yea, I don't think we can do this for at least 9.4. And there it'll still be hard. 3. Avoid WAL-logging AccessExclusiveLocks associated with new-in-transaction tables, temp or not, on the grounds that no standby could even see such tables until they're committed.
We could go a bit further and not take out any locks on a new-in-transaction table in the first place, on the grounds that other transactions on the master can't see 'em either. For both #2 and the conservative version of #3, the main implementation problem would be whether the lock WAL-logging code has cheap access to the necessary information. I suspect it doesn't. Not trivially. It's logged directly in LockAcquireExtended(). We could add the information into locktags as there's unused fields for relation locktags, but brrr. The radical version of #3 might be pretty easy to do, at least to the extent of removing locks taken out during CREATE TABLE. I suspect there are some assertions or other consistency checks that would get unhappy if we manipulate relations without locks, though, so those would have to be taught about the exception. Also, we sometimes forget new-in-transaction status during relcache flush events; it's not clear if that would be a problem for this. I think that hole is actually pluggable in newer releases - at least there's no code around that assumes rd_createSubid now is persistent, even across cache resets. But I think more importantly it's probably quite possible to hit a similar problem without ON COMMIT DROP relations. Say DISCARD TEMP inside a transaction (with several subxacts) or so? So we probably really should fix the bad scaling. Greetings, Andres Freund
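The O(num_locks * num_subtransactions) behaviour discussed here can be illustrated with a toy model (illustrative Python only, not the actual StandbyReleaseLocks code; the comparison counts just make the scaling visible):

```python
def release_locks_quadratic(held_locks, released_xids):
    """Naive pattern: for every xid being released, rescan the whole
    standby lock list -- O(num_locks * num_subtransactions)."""
    comparisons = 0
    remaining = list(held_locks)
    for xid in released_xids:
        kept = []
        for lock_xid, relid in remaining:
            comparisons += 1
            if lock_xid != xid:
                kept.append((lock_xid, relid))
        remaining = kept
    return remaining, comparisons

def release_locks_hashed(held_locks, released_xids):
    """Same result via set membership: one pass, O(num_locks)."""
    comparisons = 0
    released = set(released_xids)
    kept = []
    for lock_xid, relid in held_locks:
        comparisons += 1
        if lock_xid not in released:
            kept.append((lock_xid, relid))
    return kept, comparisons

locks = [(xid, 1000 + xid) for xid in range(500)]  # 500 standby locks
subxacts = list(range(0, 500, 2))                  # 250 releasing subxacts
slow = release_locks_quadratic(locks, subxacts)
fast = release_locks_hashed(locks, subxacts)
print(slow[1], fast[1])  # the naive variant does far more comparisons
```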
Re: [PERFORM] 60 core performance with 9.3
On 2014-07-01 21:48:35 +1200, Mark Kirkwood wrote: On 27/06/14 21:19, Andres Freund wrote: On 2014-06-27 14:28:20 +1200, Mark Kirkwood wrote: My feeling is spinlock or similar, 'perf top' shows kernel find_busiest_group kernel _raw_spin_lock as the top time users. Those don't tell that much by themselves, could you do a hierarchical profile? I.e. perf record -ga? That'll at least give the callers for kernel level stuff. For more information compile postgres with -fno-omit-frame-pointer. Unfortunately this did not help - had lots of unknown symbols from postgres in the profile - I'm guessing the Ubuntu postgresql-9.3 package needs either the -dev package or to be rebuilt with the enable profile option (debug and no-omit-frame-pointer seem to be there already). You need to install the -dbg package. My bet is you'll see s_lock high in the profile, called mainly from the procarray and buffer mapping lwlocks. Test: pgbench Options: scale 500 read only Os: Ubuntu 14.04 Pg: 9.3.4 Pg Options: max_connections = 200 Just as an experiment I'd suggest increasing max_connections by one and two and quickly retesting - there's some cacheline alignment issues that aren't fixed yet that happen to vanish with some max_connections settings. shared_buffers = 10GB maintenance_work_mem = 1GB effective_io_concurrency = 10 wal_buffers = 32MB checkpoint_segments = 192 checkpoint_completion_target = 0.8 Results Clients | 9.3 tps 32 cores | 9.3 tps 60 cores +--+- 6 | 70400 | 71028 12 | 98918 | 129140 24 | 230345 | 240631 48 | 324042 | 409510 96 | 346929 | 120464 192 | 312621 | 92663 So we have anti scaling with 60 cores as we increase the client connections. Ouch! A level of urgency led to trying out Andres's 'rwlock' 9.4 branch [1] - cherry picking the last 5 commits into 9.4 branch and building a package from that and retesting: Clients | 9.4 tps 60 cores (rwlock) +-- 6 | 70189 12 | 128894 24 | 233542 48 | 422754 96 | 590796 192 | 630672 Wow - that is more like it! 
Andres that is some nice work, we definitely owe you some beers for that :-) I am aware that I need to retest with an unpatched 9.4 src - as it is not clear from this data how much is due to Andres's patches and how much to the steady stream of 9.4 development. I'll post an update on that later, but figured this was interesting enough to note for now. Cool. That's what I like (and expect) to see :). I don't think unpatched 9.4 will show significantly different results than 9.3, but it'd be good to validate that. If you do so, could you post the results in the -hackers thread I just CCed you on? That'll help the work to get into 9.5. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgres Replaying WAL slowly
On 2014-06-30 19:14:24 +0300, Heikki Linnakangas wrote: On 06/30/2014 05:46 PM, Soni M wrote: Here's what 'perf top' said on the streaming replica:

Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
95.97% postgres [.] 0x002210f3

Ok, so it's stuck doing something... Can you get a build with debug symbols installed, so that we could see the function name? My guess is it's a spinlock, probably xlogctl->info_lck via RecoveryInProgress(). Unfortunately inline assembler doesn't always seem to show up correctly in profiles... What worked for me was to build with -fno-omit-frame-pointer - that normally shows the callers, even if it can't generate a proper symbol name. Soni: Do you use Hot Standby? Are there connections active while you have that problem? Any other processes with high cpu load? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [PERFORM] Postgres Replaying WAL slowly
On 2014-06-30 11:34:52 -0700, Jeff Frost wrote: On Jun 30, 2014, at 10:29 AM, Soni M diptat...@gmail.com wrote: It is 96.62% postgres [.] StandbyReleaseLocks as Jeff said. It runs for quite a long time, more than 5 minutes I think. I also use hot standby. We have 4 streaming replicas; some of them have active connections, some do not. This issue has lasted more than 4 days. On one of the standbys, the postgres process above is the only process consuming a high cpu load. Compiling with -fno-omit-frame-pointer doesn't yield much more info: You'd need to do perf record -ga instead of perf record -a to see additional information. But: 76.24% postgres [.] StandbyReleaseLocks already is quite helpful. What are you doing on that system? Is there anything requiring large amounts of access exclusive locks on the primary? Possibly large amounts of temporary relations? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [PERFORM] Postgres Replaying WAL slowly
On 2014-06-30 12:57:56 -0700, Jeff Frost wrote: On Jun 30, 2014, at 12:54 PM, Matheus de Oliveira matioli.math...@gmail.com wrote: On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost j...@pgexperts.com wrote: And if you go fishing in pg_class for any of the oids, you don't find anything: That is probably because you are connected to the wrong database. Once you connect to the database of interest, you don't even need to query pg_class, just cast the relation attribute to regclass:

SELECT relation::regclass, ...
FROM pg_locks
WHERE database = (SELECT oid FROM pg_database WHERE datname = current_database());

Yah, i thought about that too, but verified I am in the correct DB. Just for clarity's sake: So these are probably relations created in uncommitted transactions. Possibly ON COMMIT DROP temp tables? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [PERFORM] Postgres Replaying WAL slowly
On 2014-06-30 19:04:20 -0400, Tom Lane wrote: Jeff Frost j...@pgexperts.com writes: So it seems like we have a candidate explanation. I'm a bit surprised that StandbyReleaseLocks would get this slow if there are only a dozen AccessExclusiveLocks in place at any one time, though. Perhaps that was a low point and there are often many more? Since we turned on the monitoring for that, we had a peak of 13,550 AccessExclusiveLocks. Any chance the workload also uses lots of subtransactions? Ah ... that's more like a number I can believe something would have trouble coping with. Did you see a noticeable slowdown with this? Now that we've seen that number, of course it's possible there was an even higher peak occurring when you saw the trouble. Perhaps there's an O(N^2) behavior in StandbyReleaseLocks, or maybe it just takes awhile to handle that many locks. I don't think there's a O(n^2) in StandbyReleaseLocks() itself, but in combination with StandbyReleaseLockTree() it looks possibly bad. The latter will call StandbyReleaseLocks() for every xid/subxid, and each of the StandbyReleaseLocks() will then trawl the entire RecoveryLockList... It'd probably be better to implement ReleaseLocksTree() by sorting the subxid list and bsearch that while iterating RecoveryLockList. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
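[Editor's sketch] The sort-and-bsearch idea proposed above for ReleaseLocksTree() can be illustrated with a toy model. The sketch below is hedged Python, not the actual StandbyReleaseLockTree()/StandbyReleaseLocks() C code; the lock-list representation and function names are invented, but it shows why one scan per xid/subxid is O(m*n) while a single pass with a binary search over sorted subxids is O((m+n) log m).

```python
from bisect import bisect_left

def release_locks_naive(recovery_lock_list, xids):
    # Models StandbyReleaseLockTree() calling StandbyReleaseLocks() once per
    # xid/subxid, each call trawling the entire RecoveryLockList: O(m * n).
    remaining = list(recovery_lock_list)
    for xid in xids:
        remaining = [entry for entry in remaining if entry[0] != xid]
    return remaining

def release_locks_sorted(recovery_lock_list, xids):
    # The suggested alternative: sort the subxid list once, then make a single
    # pass over the lock list, binary-searching each entry's xid.
    sorted_xids = sorted(xids)
    def held(xid):
        i = bisect_left(sorted_xids, xid)
        return i < len(sorted_xids) and sorted_xids[i] == xid
    return [entry for entry in recovery_lock_list if not held(entry[0])]
```

With 13,550 AccessExclusiveLocks and many subtransactions, the difference between the two shapes is exactly the kind of slowdown reported in this thread.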
Re: [PERFORM] 60 core performance with 9.3
On 2014-06-27 14:28:20 +1200, Mark Kirkwood wrote: My feeling is spinlock or similar, 'perf top' shows kernel find_busiest_group kernel _raw_spin_lock as the top time users. Those don't tell that much by themselves, could you do a hierarchical profile? I.e. perf record -ga? That'll at least give the callers for kernel level stuff. For more information compile postgres with -fno-omit-frame-pointer. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Sudden crazy high CPU usage
On 2014-03-31 19:16:58 +0200, Niels Kristian Schjødt wrote: Yes, I could install "perf", though I'm not familiar with it. What would I do? :-) As root:

perf record -a sleep 5
perf report > my-nice-perf-report.txt

And then send the my-nice-perf-report.txt file. Locally it's much nicer to see the output using perf report without redirecting into a file; you'll get an interactive UI. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [PERFORM] Parallel Select query performance and shared buffers
On 2013-12-05 11:15:20 +0200, Metin Doslu wrote: - When we increased NUM_BUFFER_PARTITIONS to 1024, this problem is disappeared for 8 core machines and come back with 16 core machines on Amazon EC2. Would it be related with PostgreSQL locking mechanism? If we build with -DLWLOCK_STATS to print locking stats from PostgreSQL, we see tons of contention with default value of NUM_BUFFER_PARTITIONS which is 16: Is your workload bigger than RAM? I think a good bit of the contention you're seeing in that listing is populating shared_buffers - and might actually vanish once you're halfway cached. From what I've seen so far the bigger problem than contention in the lwlocks itself, is the spinlock protecting the lwlocks... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Parallel Select query performance and shared buffers
On 2013-12-05 11:33:29 +0200, Metin Doslu wrote: Is your workload bigger than RAM? RAM is bigger than workload (more than a couple of times). I think a good bit of the contention you're seeing in that listing is populating shared_buffers - and might actually vanish once you're halfway cached. From what I've seen so far the bigger problem than contention in the lwlocks itself, is the spinlock protecting the lwlocks... Could you clarify a bit what you mean by "halfway cached"? Well, your stats showed a) fairly low lock counts overall b) a high percentage of exclusive locks. a) indicates the system wasn't running long. b) tells me there were lots of changes to the buffer mapping - which basically only happens if a buffer is placed in or removed from shared_buffers. If your shared_buffers is big enough to contain most of the data you shouldn't see many exclusive locks in comparison to the number of shared locks. And "spinlock protecting the lwlocks"? Every LWLock has an internal spinlock to protect its state. So whenever somebody does a LWLockAcquire()/Release(), even if only in shared mode, we currently acquire that spinlock, manipulate the LWLock's state, and release the spinlock again. In lots of workloads that internal spinlock is the contention point, not the length over which the lwlock is held. Especially when they are mostly held in shared mode. Makes sense? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
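[Editor's sketch] The "internal spinlock" point above is easiest to see with a toy model. This is hedged Python, not PostgreSQL's lwlock.c: a mutex stands in for the spinlock, and the class and method names are invented. The thing to notice is that even a *shared* acquire must take the same internal lock to bump the shared-holder count, so under concurrency every acquisition - shared or exclusive - serializes on that one spinlock's cacheline.

```python
import threading

class ToyLWLock:
    """Toy model of a pre-9.5-style LWLock: all state changes, including
    shared acquisition, go through one internal spinlock (here a mutex)."""
    def __init__(self):
        self._spin = threading.Lock()   # stand-in for the internal spinlock
        self.exclusive = False
        self.shared_count = 0

    def acquire_shared(self):
        while True:
            with self._spin:            # every shared acquire hits this lock
                if not self.exclusive:
                    self.shared_count += 1
                    return

    def release_shared(self):
        with self._spin:                # ...and so does every release
            self.shared_count -= 1

    def acquire_exclusive(self):
        while True:
            with self._spin:
                if not self.exclusive and self.shared_count == 0:
                    self.exclusive = True
                    return

    def release_exclusive(self):
        with self._spin:
            self.exclusive = False
```

Even when the lwlock is "only" held shared for a few instructions, the acquire/release traffic on `_spin` is what contends - which is what the profile's s_lock time reflects.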
Re: [PERFORM] [HACKERS] Parallel Select query performance and shared buffers
On 2013-12-05 17:46:44 +0200, Metin Doslu wrote: I tried your patches on next link. As you suspect I didn't see any improvements. I tested it on PostgreSQL 9.2 Stable. You tested the correct branch, right? Which commit does git rev-parse HEAD show? But generally, as long as your profile hides all the important information behind the hypervisor's cost, you're going to have a hard time analyzing the problems. You really should try to reproduce the problems on native hardware (as similar to the host hardware as possible), to get accurate data. On CPU bound workloads that information is often transportable to the virtual world. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [HACKERS] Parallel Select query performance and shared buffers
On 2013-12-04 14:27:10 -0200, Claudio Freire wrote: On Wed, Dec 4, 2013 at 9:19 AM, Metin Doslu me...@citusdata.com wrote: Here are the results of vmstat 1 while running 8 parallel TPC-H Simple (#6) queries: Although there is no need for I/O, wa fluctuates between 0 and 1.

procs -----------memory----------- --swap-- ---io--- ---system---- ------cpu------
 r  b swpd     free  buff    cache  si  so  bi  bo    in     cs us sy  id wa st
 0  0    0 30093568 84892 38723896   0   0   0   0    22     14  0  0 100  0  0
 8  1    0 30043056 84892 38723896   0   0   0   0 27080  52708 16 14  70  0  0
 8  1    0 30006600 84892 38723896   0   0   0   0 44952 118286 43 44  12  1  0
 8  0    0 29986264 84900 38723896   0   0   0  20 28043  95934 49 42   8  1  0
 7  0    0 29991976 84900 38723896   0   0   0   0  8308  73641 52 42   6  0  0
 0  0    0 30091828 84900 38723896   0   0   0   0  3996  30978 23 24  53  0  0
 0  0    0 30091968 84900 38723896   0   0   0   0    17     23  0  0 100  0  0

Notice the huge %sy. My bet is on transparent hugepage defragmentation. Alternatively it's scheduler overhead, due to superfluous context switches around the buffer mapping locks. I'd strongly suggest doing a "perf record -g -a <wait a bit, ctrl-c>; perf report" run to check what's eating up the time. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [PERFORM] [HACKERS] Parallel Select query performance and shared buffers
On 2013-12-04 18:43:35 +0200, Metin Doslu wrote: I'd strongly suggest doing a "perf record -g -a <wait a bit, ctrl-c>; perf report" run to check what's eating up the time. Here is one example:

+ 38.87% swapper  [kernel.kallsyms] [k] hypercall_page
+  9.32% postgres [kernel.kallsyms] [k] hypercall_page
+  6.80% postgres [kernel.kallsyms] [k] xen_set_pte_at

All that time is spent in your virtualization solution. One thing to try is to look on the host system, sometimes profiles there can be more meaningful. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [PERFORM] [HACKERS] Parallel Select query performance and shared buffers
On 2013-12-04 16:00:40 -0200, Claudio Freire wrote: On Wed, Dec 4, 2013 at 1:54 PM, Andres Freund and...@2ndquadrant.com wrote: All that time is spent in your virtualization solution. One thing to try is to look on the host system, sometimes profiles there can be more meaningful. You cannot profile the host on EC2. Didn't follow the thread from the start. So, this is EC2? Have you checked, with a recent enough version of top or whatever, how much time is reported as stolen? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [HACKERS] Parallel Select query performance and shared buffers
On 2013-12-04 20:19:55 +0200, Metin Doslu wrote: - When we increased NUM_BUFFER_PARTITIONS to 1024, this problem is disappeared for 8 core machines and come back with 16 core machines on Amazon EC2. Would it be related with PostgreSQL locking mechanism? You could try my lwlock-scalability improvement patches - for some workloads here, the improvements have been rather noticeable. Which version are you testing? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Planner performance extremely affected by an hanging transaction (20-30 times)?
On 2013-09-27 13:57:02 -0700, Josh Berkus wrote: Andres, Jeff, As far as I can tell, the only downside of doing that is that, since hint bits might be set later, it is possible some dirty pages will get written unhinted and then re-dirtied by the hint bit setting, when more aggressive setting would have only one combined dirty write instead. But that seems rather hypothetical, and if it really is a problem we should probably tackle it directly rather than by barring other optimizations. I am - as evidenced - too tired to think about this properly, but I think you might be right here. Any thoughts on a fix for this we could get into 9.2.5? I don't see much chance to apply anything like this in a backbranch. Changing IO patterns in a noticeable way in a minor release is just asking for trouble. Also, this really isn't going to fix the issue discussed here - this was just about the additional ProcArrayLock contention. I don't think it would change anything dramatical in your case. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.
On 2013-08-27 12:17:55 -0500, Merlin Moncure wrote: On Tue, Aug 27, 2013 at 10:55 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-08-27 09:57:38 -0500, Merlin Moncure wrote:

+ bool
+ RecoveryMightBeInProgress(void)
+ {
+     /*
+      * We check shared state each time only until we leave recovery mode. We
+      * can't re-enter recovery, so there's no need to keep checking after the
+      * shared variable has once been seen false.
+      */
+     if (!LocalRecoveryInProgress)
+         return false;
+     else
+     {
+         /* use volatile pointer to prevent code rearrangement */
+         volatile XLogCtlData *xlogctl = XLogCtl;
+
+         /* Intentionally query xlogctl without spinlocking! */
+         LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;
+
+         return LocalRecoveryInProgress;
+     }
+ }

I don't think it's acceptable to *set* LocalRecoveryInProgress here. That should only be done in the normal routine. Quite right -- that was a major error -- you could bypass the initialization call to the xlog with some bad luck. I've seen this in profiles since, so I'd appreciate pushing this forward. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [PERFORM] Planner performance extremely affected by an hanging transaction (20-30 times)?
On 2013-09-25 00:06:06 -0700, Jeff Janes wrote: On 09/20/2013 03:01 PM, Jeff Janes wrote: 3) Even worse, asking if a given transaction has finished yet can be a serious point of system-wide contention, because it takes the ProcArrayLock, once per row which needs to be checked. So you have 20 processes all fighting over the ProcArrayLock, each doing so 1000 times per query. That should be gone in master, we don't use SnapshotNow anymore which had those TransactionIdIsInProgress() calls you're probably referring to. The lookups discussed in this thread now use the statement's snapshot. And all those have their own copy of the currently running transactions. Why do we need a procarraylock for this? Seems like the solution would be not to take a lock at all; the information on transaction commit is in the clog, after all. More clog accesses would hardly improve the situation. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Planner performance extremely affected by an hanging transaction (20-30 times)?
On 2013-09-25 11:17:51 -0700, Jeff Janes wrote: On Wed, Sep 25, 2013 at 10:53 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-25 00:06:06 -0700, Jeff Janes wrote: On 09/20/2013 03:01 PM, Jeff Janes wrote: 3) Even worse, asking if a given transaction has finished yet can be a serious point of system-wide contention, because it takes the ProcArrayLock, once per row which needs to be checked. So you have 20 processes all fighting over the ProcArrayLock, each doing so 1000 times per query. That should be gone in master, we don't use SnapshotNow anymore which had those TransactionIdIsInProgress() calls you're probably referring to. The lookups discussed in this thread now use the statement's snapshot. And all those have their own copy of the currently running transactions. See HeapTupleSatisfiesMVCC, near line 943 of tqual.c:

else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
    return false;
else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
    SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED, HeapTupleHeaderGetXmin(tuple));

Hm, sorry, misremembered things a bit there. If we guarded that check by moving up line 961 to before 943:

if (XidInMVCCSnapshot(HeapTupleHeaderGetXmin(tuple), snapshot))
    return false;    /* treat as still in progress */

Then we could avoid the contention, as that check only refers to local memory. That wouldn't be correct afaics - the current TransactionIdIsInProgress callsite is only reached when no HEAP_XMIN_COMMITTED was set. So you would have to duplicate it. As far as I can tell, the only downside of doing that is that, since hint bits might be set later, it is possible some dirty pages will get written unhinted and then re-dirtied by the hint bit setting, when more aggressive setting would have only one combined dirty write instead. But that seems rather hypothetical, and if it really is a problem we should probably tackle it directly rather than by barring other optimizations. 
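[Editor's sketch] The reordering being discussed can be modeled in miniature. This hedged Python sketch is not the real tqual.c visibility code - plain sets stand in for the snapshot, the ProcArray, and the clog, and the function name is invented - but it shows why checking the backend-local snapshot first avoids shared-memory contention for the common case.

```python
def xmin_passes(xmin, snapshot_xip, running_xids, committed_xids):
    # Step 1: XidInMVCCSnapshot() -- consults only backend-local memory,
    # so no lock is needed.
    if xmin in snapshot_xip:
        return False              # treat as still in progress for this snapshot
    # Step 2: TransactionIdIsInProgress() -- shared state, can contend on
    # ProcArrayLock; after the reordering it is only reached for xids
    # outside the local snapshot.
    if xmin in running_xids:
        return False
    # Step 3: TransactionIdDidCommit() -- clog lookup; on success the real
    # code would also set the HEAP_XMIN_COMMITTED hint bit.
    return xmin in committed_xids
```

The caveat raised above still applies: in the real code the in-progress check only runs when HEAP_XMIN_COMMITTED is unset, so a correct patch has to duplicate rather than simply move the snapshot check.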
I am - as evidenced - too tired to think about this properly, but I think you might be right here. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.
Hi, On 2013-09-17 17:55:01 +0600, Дмитрий Дегтярёв wrote: We have not been able to reproduce this problem on a test server, and we do not dare to use this patch on production servers. In the course of studying the problem we have identified that many queries execute several times slower on the slave. On the master, heap_hot_search_buffer executes 100 cycles for a query; on the slave, the same query with the same plan executes 2000 cycles in heap_hot_search_buffer. Also, we were able to reproduce the problem on the master and detect s_lock contention during the slow queries. What you describe is normally an indication that you have too many longrunning transactions around preventing hot pruning from working. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
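[Editor's sketch] The link between long-running transactions, HOT pruning, and the 100-vs-2000 cycle counts can be shown with a toy model. This is hedged Python, not heap_hot_search_buffer itself; the function name is invented. The point is only the shape of the cost: when pruning works, dead tuple versions are removed and the HOT chain is short; when a long-running transaction holds back the prune horizon, every retained dead version must be walked before the live one is found.

```python
def hot_search_cycles(chain_live_flags):
    # chain_live_flags: one bool per tuple version in a HOT chain, in chain
    # order; True marks the live version. Returns the number of versions
    # visited before the live one is reached.
    steps = 0
    for live in chain_live_flags:
        steps += 1
        if live:
            return steps
    return steps  # no live version found: whole chain walked
```

With pruning working, `hot_search_cycles([True])` is 1; with 1999 unprunable dead versions ahead of the live one, it is 2000 - the same 20x difference reported between master and slave.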
Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.
On 2013-09-17 08:18:54 -0500, Merlin Moncure wrote: On Tue, Sep 17, 2013 at 6:59 AM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2013-09-17 17:55:01 +0600, Дмитрий Дегтярёв wrote: We have not been able to reproduce this problem on a test server, and we do not dare to use this patch on production servers. In the course of studying the problem we have identified that many queries execute several times slower on the slave. On the master, heap_hot_search_buffer executes 100 cycles for a query; on the slave, the same query with the same plan executes 2000 cycles in heap_hot_search_buffer. Also, we were able to reproduce the problem on the master and detect s_lock contention during the slow queries. What you describe is normally an indication that you have too many longrunning transactions around preventing hot pruning from working. Do you think it's worth submitting the lock avoidance patch for formal review? You mean the bufmgr.c thing? Generally I think that that code needs a good bit of scalability work - there's a whole thread about it somewhere. But TBH the theories you've voiced about the issues you've seen haven't convinced me so far. If you can manage to prove it has a benefit in some case that's reproducible - why not go ahead? Quick question: Do you happen to have pg_locks output from back then around? We've recently found servers going into somewhat similar slowdowns because they exhausted the fastpath locks which made lwlocks far more expensive and made s_lock go up very high in the profile. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.
On 2013-09-17 08:32:30 -0500, Merlin Moncure wrote: On Tue, Sep 17, 2013 at 8:24 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-17 08:18:54 -0500, Merlin Moncure wrote: Do you think it's worth submitting the lock avoidance patch for formal review? You mean the bufmgr.c thing? Generally I think that that code needs a good bit of scalability work - there's a whole thread about it somewhere. But TBH the theories you've voiced about the issues you've seen haven't convinced me so far. Er, no (but I share your skepticism -- my challenge right now is to demonstrate measurable benefit which so far I've been unable to do). I was talking about the patch on *this* thread which bypasses the s_lock in RecoveryInProgress() :-). Ah, yes. Sorry, confused issues ;). Yes, I think that'd make sense. Quick question: Do you happen to have pg_locks output from back then around? We've recently found servers going into somewhat similar slowdowns because they exhausted the fastpath locks which made lwlocks far more expensive and made s_lock go up very high in the profile. I do. Unfortunately I don't have profile info. Not sure how useful it is -- I'll send it off-list. Great. The primary thing I'd like to know is whether there are lots of non-fastpath locks... If you ever get into the situation I mistakenly referred to again, I'd strongly suggest recompiling postgres with -fno-omit-frame-pointer. That makes hierarchical profiles actually useful which can help tremendously with diagnosing issues like this... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
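[Editor's sketch] The fastpath-exhaustion mechanism mentioned above can be modeled roughly. This hedged Python sketch is not PostgreSQL's lock.c; the class names are invented, and the behavior is simplified. The grounded part: since 9.2, each backend has a small fixed array of fastpath slots (FP_LOCK_SLOTS_PER_BACKEND, 16 in the real code) for weak relation locks; once those are used up, further lock acquisitions must go through the shared lock table, whose partition lwlocks then show up as s_lock contention in profiles.

```python
FASTPATH_SLOTS_PER_BACKEND = 16  # FP_LOCK_SLOTS_PER_BACKEND in PostgreSQL

class Backend:
    def __init__(self):
        self.fastpath_slots = set()

    def lock_relation(self, rel_oid, shared_lock_table):
        # A weak relation lock takes a per-backend fastpath slot while one is
        # free; after the 16 slots fill up, every further acquisition falls
        # back to the shared lock table (partitioned, lwlock-protected).
        if len(self.fastpath_slots) < FASTPATH_SLOTS_PER_BACKEND:
            self.fastpath_slots.add(rel_oid)
            return "fastpath"
        shared_lock_table.add((id(self), rel_oid))
        return "shared"
```

A query touching many relations (lots of partitions, say) blows past the 16 slots quickly, which is why checking pg_locks for a high proportion of non-fastpath locks is a good diagnostic.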
Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.
On 2013-09-17 08:40:23 -0500, Merlin Moncure wrote: If you ever get into the situation I mistakenly referred to again, I'd strongly suggest recompling postgres with -fno-omit-frame-pointer. That makes hierarchical profiles actually useful which can help tremendously with diagnosing issues like this... We may get an opportunity to do that. I'm curious enough about the THP compaction issues that Kevin mentioned to to maybe consider cranking buffers again. If I do that, it will be with strict instructions to the site operators to catch a profile before taking further action. The THP issues should be very clearly diagnosable because a good part of the time will be spent in the kernel. Lots of spinlocking there, but the function names are easily discernible from pg's code. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Intermittent hangs with 9.2
On 2013-09-11 07:43:35 -0500, Merlin Moncure wrote: I've been seeing a strange issue with our Postgres install for about a year now, and I was hoping someone might be able to help point me at the cause. At what seem like fairly random intervals Postgres will become unresponsive to the 3 application nodes it services. These periods tend to last for 10 - 15 minutes before everything rights itself and the system goes back to normal. During these periods the server will report a spike in the outbound bandwidth (from about 1 Mb/s to about 5 Mb/s most recently), a huge spike in context switches / interrupts (normal peaks are around 2k/8k respectively, and during these periods they've gone to 15k/22k), and a load average of 100+. CPU usage stays relatively low, but it's all system time reported; user time goes to zero. It doesn't seem to be disk related since we're running with a shared_buffers setting of 24G, which will fit just about our entire database into memory, and the IO transactions reported by the server, as well as the disk reads reported by Postgres, stay consistently low. We've recently started tracking how long statements take to execute, and we're seeing some really odd numbers. A simple delete by primary key, for example, from a table that contains about 280,000 rows, reportedly took 18h59m46.900s. An update by primary key in that same table was reported as 7d 17h 58m 30.415s. That table is frequently accessed, but obviously those numbers don't seem reasonable at all. Some other changes we've made to postgresql.conf:

synchronous_commit = off
maintenance_work_mem = 1GB
wal_level = hot_standby
wal_buffers = 16MB
max_wal_senders = 10
wal_keep_segments = 5000
checkpoint_segments = 128
checkpoint_timeout = 30min
checkpoint_completion_target = 0.9
max_connections = 500

The server is a Dell Poweredge R900 with 4 Xeon E7430 processors, 48GB of RAM, running CentOS 6.3. 
So far we've tried disabling Transparent Huge Pages after I found a number of resources online that indicated similar interrupt/context switch issues, but it hasn't resolved the problem. I managed to catch it happening once and run a perf which showed:

+ 41.40% 48154 postmaster 0x347ba9  f 0x347ba9
+  9.55% 10956 postmaster 0x2dc820  f set_config_option
+  8.64%  9946 postmaster 0x5a3d4   f writeListPage
+  5.75%  6609 postmaster 0x5a2b0   f ginHeapTupleFastCollect
+  2.68%  3084 postmaster 0x192483  f build_implied_join_equality
+  2.61%  2990 postmaster 0x187a55  f build_paths_for_OR
+  1.86%  2131 postmaster 0x794aa   f get_collation_oid
+  1.56%  1822 postmaster 0x5a67e   f ginHeapTupleFastInsert
+  1.53%  1766 postmaster 0x1929bc  f distribute_qual_to_rels
+  1.33%  1558 postmaster 0x249671  f cmp_numerics

I'm not sure what 0x347ba9 represents, or why it's an address rather than a method name. Try converting it to something more meaningful with addr2line, that often has more success. That's about the sum of it. Any help would be greatly appreciated and if you want any more information about our setup, please feel free to ask. Reducing shared_buffers to around 2GB will probably make the problem go away. That profile doesn't really look like how one of the problems you are referring to would look. Based on the profile I'd guess it's possible that you're seeing problems with GIN's fastupdate mechanism. Try ALTER INDEX whatever SET (FASTUPDATE = OFF); VACUUM whatever's_table for all gin indexes. It's curious that set_config_option is so high in the profile... Any chance you could recompile postgres with -fno-omit-frame-pointer in CFLAGS? That would allow you to use perf -g. The performance price of that usually is below 1% for postgres. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance bug in prepared statement binding in 9.2?
On 2013-09-11 15:06:23 -0400, Andrew Dunstan wrote: On 09/11/2013 02:35 PM, Josh Berkus wrote: All, We've confirmed that this issue is caused by having long-running idle transactions on the server. When we disabled their queueing system (which prodiced hour-long idle txns), the progressive slowness went away. Why that should affect 9.X far more strongly than 8.4, I'm not sure about. Does that mean that 8.4 was unsafe, or that this is something which *could* be fixed in later versions? I'm also confused as to why this would affect BIND time rather than EXECUTE time. One thing that this made me wonder is why we don't have transaction_timeout, or maybe transaction_idle_timeout. Because it's harder than it sounds, at least if you want to support idle-in-transactions. Note that we do not support pg_cancel_backend() for those yet... Also, I think it might lead to papering over actual issues with applications leaving transactions open. I don't really see a valid reason for an application needing cancelling of long idle transactions. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance bug in prepared statement binding in 9.2?
On 2013-09-11 11:35:45 -0700, Josh Berkus wrote: All, We've confirmed that this issue is caused by having long-running idle transactions on the server. When we disabled their queueing system (which produced hour-long idle txns), the progressive slowness went away. Why that should affect 9.X far more strongly than 8.4, I'm not sure about. Does that mean that 8.4 was unsafe, or that this is something which *could* be fixed in later versions? The explanation is in http://archives.postgresql.org/message-id/20130910132133.GJ1024477%40alap2.anarazel.de The referenced commit introduced a planner feature. Funnily enough, you seem to have been the trigger for its introduction ;) I'm also confused as to why this would affect BIND time rather than EXECUTE time. Because we're doing the histogram checks during planning and not during execution. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance bug in prepared statement binding in 9.2?
On 2013-09-09 20:38:09 -0400, Andrew Dunstan wrote: On 08/01/2013 03:20 PM, Jeff Janes wrote: On Thu, Aug 1, 2013 at 10:58 AM, Josh Berkus j...@agliodbs.com wrote: Amit, All: So we just retested this on 9.3b2. The performance is the same as 9.1 and 9.2; that is, progressively worse as the test cycles go on, and unacceptably slow compared to 8.4. Some issue introduced in 9.1 is causing BINDs to get progressively slower as the PARSEs BINDs get run repeatedly. Per earlier on this thread, that can bloat to 200X time required for a BIND, and it's definitely PostgreSQL-side. I'm trying to produce a test case which doesn't involve the user's application. However, hints on other things to analyze would be keen. Does it seem to be all CPU time (it is hard to imagine what else it would be, but...) Could you use oprofile or perf or gprof to get a profile of the backend during a run? That should quickly narrow it down to which C function has the problem. Did you test 9.0 as well? This has been tested back to 9.0. What we have found is that the problem disappears if the database has come in via dump/restore, but is present if it is the result of pg_upgrade. There are some long-running transactions also running alongside this - we are currently planning a test where those are not present. We're also looking at constructing a self-contained test case. Here is some perf output from the bad case: + 14.67% postgres [.] heap_hot_search_buffer + 11.45% postgres [.] LWLockAcquire + 8.39% postgres [.] LWLockRelease + 6.60% postgres [.] _bt_checkkeys + 6.39% postgres [.] PinBuffer + 5.96% postgres [.] hash_search_with_hash_value + 5.43% postgres [.] hash_any + 5.14% postgres [.] UnpinBuffer + 3.43% postgres [.] ReadBuffer_common + 2.34% postgres [.] index_fetch_heap + 2.04% postgres [.] heap_page_prune_opt + 2.00% libc-2.15.so [.] 0x8041b + 1.94% postgres [.] _bt_next + 1.83% postgres [.] btgettuple + 1.76% postgres [.] index_getnext_tid + 1.70% postgres [.] 
LockBuffer + 1.54% postgres [.] ReadBufferExtended + 1.25% postgres [.] FunctionCall2Coll + 1.14% postgres [.] HeapTupleSatisfiesNow + 1.09% postgres [.] ReleaseAndReadBuffer + 0.94% postgres [.] ResourceOwnerForgetBuffer + 0.81% postgres [.] _bt_saveitem + 0.80% postgres [.] _bt_readpage + 0.79% [kernel.kallsyms] [k] 0x81170861 + 0.64% postgres [.] CheckForSerializableConflictOut + 0.60% postgres [.] ResourceOwnerEnlargeBuffers + 0.59% postgres [.] BufTableLookup After a second look at this, I very tentatively guess that you'll see get_actual_variable_range() as the entry point here. Which would explain why you're seeing this during PARSE. But there still is the question why we never actually seem to prune... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance bug in prepared statement binding in 9.2?
Hi, On 2013-09-09 20:38:09 -0400, Andrew Dunstan wrote: On 08/01/2013 03:20 PM, Jeff Janes wrote: On Thu, Aug 1, 2013 at 10:58 AM, Josh Berkus j...@agliodbs.com wrote: Amit, All: So we just retested this on 9.3b2. The performance is the same as 9.1 and 9.2; that is, progressively worse as the test cycles go on, and unacceptably slow compared to 8.4. Some issue introduced in 9.1 is causing BINDs to get progressively slower as the PARSEs BINDs get run repeatedly. Per earlier on this thread, that can bloat to 200X time required for a BIND, and it's definitely PostgreSQL-side. I'm trying to produce a test case which doesn't involve the user's application. However, hints on other things to analyze would be keen. Does it seem to be all CPU time (it is hard to imagine what else it would be, but...) Could you use oprofile or perf or gprof to get a profile of the backend during a run? That should quickly narrow it down to which C function has the problem. Did you test 9.0 as well? This has been tested back to 9.0. What we have found is that the problem disappears if the database has come in via dump/restore, but is present if it is the result of pg_upgrade. There are some long-running transactions also running alongside this - we are currently planning a test where those are not present. We're also looking at constructing a self-contained test case. Here is some perf output from the bad case: + 14.67% postgres [.] heap_hot_search_buffer + 11.45% postgres [.] LWLockAcquire + 8.39% postgres [.] LWLockRelease + 6.60% postgres [.] _bt_checkkeys + 6.39% postgres [.] PinBuffer + 5.96% postgres [.] hash_search_with_hash_value + 5.43% postgres [.] hash_any + 5.14% postgres [.] UnpinBuffer + 3.43% postgres [.] ReadBuffer_common + 2.34% postgres [.] index_fetch_heap + 2.04% postgres [.] heap_page_prune_opt A backtrace for this would be useful. Alternatively you could recompile postgres using -fno-omit-frame-pointer in CFLAGS and use perf record -g. 
Any chance you have older prepared xacts, older sessions or something like that around? I'd expect heap_prune* to be present in workloads that spend significant time in heap_hot_search_buffer... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance bug in prepared statement binding in 9.2?
On 2013-09-10 08:45:33 -0400, Andrew Dunstan wrote: On 09/10/2013 08:20 AM, Andres Freund wrote: A backtrace for this would be useful. Alternatively you could recompile postgres using -fno-omit-frame-pointer in CFLAGS and use perf record -g. It's using a custom build, so this should be doable. Great. Any chance you have older prepared xacts, older sessions or something like that around? I'd expect heap_prune* to be present in workloads that spend significant time in heap_hot_search_buffer... Not sure about prepared transactions. There are certainly probably old prepared statements around, and long-running transactions alongside this one. Ok, long-running transactions will do the trick. I quickly checked, and doing an index lookup for min/max histogram lookups was added *after* 8.4, which would explain why you're not seeing the issue there (c.f. 40608e7f949fb7e4025c0ddd5be01939adc79eec). It getting slower and slower during a test run would be explained by the additional tuple versions amassing which cannot be marked dead because of older transactions around. I guess those are also part of the test? If I interpret things correctly you're using serializable? I guess there is no chance to use repeatable read instead? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance bug in prepared statement binding in 9.2?
On 2013-09-10 15:21:33 +0200, Andres Freund wrote: If I interpret things correctly you're using serializable? I guess there is no chance to use repeatable read instead? Err, that wouldn't help much. Read committed. That lets PGXACT->xmin advance these days and thus might help to reduce the impact of the long-running transactions. Otherwise you will have to shorten those... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.
On 2013-08-27 09:57:38 -0500, Merlin Moncure wrote:

+ bool
+ RecoveryMightBeInProgress(void)
+ {
+     /*
+      * We check shared state each time only until we leave recovery mode. We
+      * can't re-enter recovery, so there's no need to keep checking after the
+      * shared variable has once been seen false.
+      */
+     if (!LocalRecoveryInProgress)
+         return false;
+     else
+     {
+         /* use volatile pointer to prevent code rearrangement */
+         volatile XLogCtlData *xlogctl = XLogCtl;
+
+         /* Intentionally query xlogctl without spinlocking! */
+         LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;
+
+         return LocalRecoveryInProgress;
+     }
+ }

I don't think it's acceptable to *set* LocalRecoveryInProgress here. That should only be done in the normal routine. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Partitions not Working as Expected
On 2013-06-27 14:42:26 -0400, Tom Lane wrote: Shaun Thomas stho...@optionshouse.com writes: On 06/27/2013 12:42 PM, Dave Johansen wrote: Or what about something like DATE_TRUNC('DAY', now())? Or would that run into the same optimization/planner problems as CURRENT_DATE? Same issue. This seems to work, though I'm not entirely sure of the implications: UPDATE pg_proc SET provolatile = 'i' WHERE proname = 'date_in'; That will break things: CURRENT_DATE will then be equivalent to just writing today's date as a literal. It's conceivable that it wouldn't break any scenario that you personally care about, if you never use CURRENT_DATE in any view, rule, column default expression, or cached plan; but it seems mighty risky from here. I don't see any very good solution to your problem within the current approach to partitioning, which is basically theorem-proving. That proof engine has no concept of time passing, let alone the sort of detailed knowledge of the semantics of this particular function that would allow it to conclude: if CURRENT_DATE > '2013-06-20' is true now, it will always be so in the future as well. Couldn't we at least significantly improve on the status quo by detecting we're currently planning a query that's only going to be executed once (because it's directly executed or because we're planning a one-time plan for specific parameters) and inline stable functions before doing the theorem proving? Maybe I am missing something here? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Setting vacuum_freeze_min_age really low
On 2013-05-13 13:21:54 -0400, Robert Haas wrote: On Sun, May 12, 2013 at 8:50 AM, Andres Freund and...@2ndquadrant.com wrote: [ a response that I entirely agree with ] +1 to all that. It's maybe worth noting that it's probably fairly uncommon for vacuum to read a page and not dirty it, because if the page is all-visible, we won't read it. But only if 50(?)+ pages are marked all-visible in one go, otherwise AFAIR we won't skip them. And we don't skip them at all during full table vacuums. And if it's not all-visible, and there's nothing else interesting to do with it, we'll probably make it all-visible, which will dirty it. It can happen, if for example we vacuum a page with no dead tuples while the inserting transaction is still running, or committed but not yet all-visible. Of course, in those cases we won't be able to freeze, either. IIRC the actual values below which we freeze are always computed relative to GetOldestXmin() (and have to be, otherwise rows would suddenly appear visible). In many, many environments that's lagging behind quite a bit. Long-running user transactions, pg_dump, hot_standby_feedback, vacuum_defer_cleanup_age... Also, even if the *whole* page isn't all-visible because e.g. there just was another row inserted, we still freeze individual rows. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Setting vacuum_freeze_min_age really low
Hi Josh, On 2013-05-11 16:28:32 -0700, Josh Berkus wrote: That, and Tom's concern about forensics, which I understand to be the larger sticking point. I don't buy the idea that we should cause regular recurring performance issues for all of our users in order to aid diagnosing the kind of issues which happen 1% of the time to 2% of our users. Well. For one, you haven't proven that the changed setting actually improves performance. So the comparison isn't really valid. We will still need full table vacuums to be able to change relfrozenxids. Also, the small percentages are the cases where the shit really hit the fan. Making sure you have at least some chance of a) diagnosing the issue b) recovering data is a pretty good thing. So, if the table's age is less than vacuum_freeze_table_age, we'll only scan pages not already marked all-visible. Regardless of vfma, we probably won't freeze much. Right, but the pages which were dirtied *anyway* will get frozen. I think you're missing the fact that we don't necessarily dirty pages just because vacuum visits them. In a mostly-insert workload it's not uncommon that vacuum doesn't change anything. In many scenarios the first time vacuum visits a page it cannot yet be marked all-visible, so we will visit it again soon after anyway. And after that there will be regular full table vacuums. It will also often enough lead to a page being frozen repeatedly, which causes unnecessary IO and WAL traffic. If a page contains tuples from several transactions it's not unlikely that some tuples are older and some are newer than vfma. That scenario isn't unlikely because of two scenarios: Nobody has yet explained to me where this extra WAL and IO traffic would come from. vfma only takes effect if the page is being vacuumed *anyway*. There's multiple points here: a) we don't necessarily write/dirty anything if vacuum doesn't find anything to do b) freezing tuples requires a xlog_heap_freeze wal record to be emitted. 
If we don't freeze, we don't need to emit it. And if the page is being vacuumed anyway, the page is being rewritten anyway, and it doesn't matter how many changes we make on that page, except as far as CPU time is concerned. As far as IO is concerned, an 8K page is an 8K page. No? Sure, *if* we write out the page, it doesn't matter at all whether we changed one byte or all of them. Unless it also requires extra xlog records to be emitted. The only time I can imagine this resulting in extra IO is if vacuum is regularly visiting pages which don't have any other work to do, but do have tuples which could be frozen if vfma was lowered. I would tend to think that this would be a tiny minority of pages, but testing may be the only way to answer that. INSERT-only workloads produce exactly that. When a page contains freezable items, as determined by freeze_min_age, and we are doing a full table scan we won't skip buffers that we can't lock for cleanup. Instead we will wait and then lock them for cleanup. So I think this would rather noticeably impact the speed of vacuum (since it waits more often) and concurrency (since we lock more buffers than before, even if they are actively used). Well, that behavior sounds like something we should maybe fix, regardless of whether we're lowering the default vfma or not. Well, that's easier said than done ;) I wonder if we couldn't do the actual freezing - not the dead tuple deletion - without a cleanup lock, just with an exclusive lock? I think I have said that before, but anyway: I think as long as we need to regularly walk the whole relation for correctness there isn't much hope of getting this into an acceptable state. If we would track the oldest xid in a page in a 'freeze map' we could make much of this more efficient and way more scalable to bigger data volumes. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Setting vacuum_freeze_min_age really low
Hi, On 2013-03-25 13:31:17 -0700, Josh Berkus wrote: In the past, setting vacuum_freeze_min_age (vfma) really low (say to 1 or 5) would have caused lots of extra writing work due to dirtying extra pages for freezing. This has been our stated reason to keep vfma high, despite the obvious advantage of freezing tuples while they're still in the cache. With the visibility map, though, vfma should only be dirtying pages which vacuum is already visiting because there's dirty tuples on the page. That is, pages which vacuum will probably dirty anyway, freezing or not. (This is assuming one has applied the 9.2.3 update.) Given that, it seems like the cost of lowering vfma *should* be marginal. The only extra work done by a lower vfma should be: 1. extra cpu time to put in the frozen XIDs on vacuumed pages, and 2. dirtying the minority of pages which vacuum decided to scan, but not write to. It will also often enough lead to a page being frozen repeatedly, which causes unnecessary IO and WAL traffic. If a page contains tuples from several transactions it's not unlikely that some tuples are older and some are newer than vfma. That scenario isn't unlikely because of two scenarios: - INSERT/UPDATE reusing space on older pages where tuples have been deleted. - When a backend extends a relation, that page is *not* known to have free space to other backends. Until vacuum comes along for the first time only this backend will use its space. Given that busy clusters frequently burn loads of xids per second it is not uncommon to have a wide range of xids on such a page. And are there other costs I'm not thinking of? I think (but am not 100% sure right now) it would have another rather big cost: When a page contains freezable items, as determined by freeze_min_age, and we are doing a full table scan we won't skip buffers that we can't lock for cleanup. Instead we will wait and then lock them for cleanup. 
So I think this would rather noticeably impact the speed of vacuum (since it waits more often) and concurrency (since we lock more buffers than before, even if they are actively used). Makes sense? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
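As a practical aside to the freezing discussion above: how far each table lags on freezing can be checked from the catalogs. A minimal sketch, using only stock pg_class columns:

```sql
-- Tables with the oldest relfrozenxid, i.e. those closest to needing
-- a full-table (anti-wraparound) vacuum.
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 20;
```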
Re: [PERFORM] Setting vacuum_freeze_min_age really low
On 2013-05-09 12:09:04 -0400, Robert Haas wrote: On Mon, Mar 25, 2013 at 4:31 PM, Josh Berkus j...@agliodbs.com wrote: In the past, setting vacuum_freeze_min_age (vfma) really low (say to 1 or 5) would have caused lots of extra writing work due to dirtying extra pages for freezing. This has been our stated reason to keep vfma high, despite the obvious advantage of freezing tuples while they're still in the cache. That, and Tom's concern about forensics, which I understand to be the larger sticking point. FWIW I found having sensible xmin/xmax repeatedly really useful for debugging problems. Most problems don't get noticed within minutes, so losing evidence that fast is bad. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 9.2.1 index-only scans : abnormal heap fetches after VACUUM FULL
On 2012-11-29 17:20:01 +0530, Pavan Deolasee wrote: On Thu, Nov 29, 2012 at 5:03 PM, Guillaume Cottenceau g...@mnc.ch wrote: Hello, I am toying around with 9.2.1, trying to measure/determine how index-only scans can improve our performance. A small script which is attached to this mail shows that as long as the table has been VACUUM FULL'd, there is an unusually high amount of heap fetches. It is strange that the visibilitymap_test predicate fails in these situations, is the visibility map somehow trashed in this situation? It should not, or at least the documentation[1] should state it (my understanding is that vacuum full does *more* than vacuum, but nothing less) (note to usual anti vacuum full trollers: I know you hate vacuum full). I don't find it very surprising given that VACUUM FULL is now implemented as a CLUSTER command which rewrites the entire heap, thus invalidating all the visibility map info whatsoever. Me neither. Now whether CLUSTER or VACUUM FULL can recreate the visibility map with all bits set to visible, that's an entirely different question. I don't think it can, but then I haven't thought through this completely. It can't set everything to visible as it also copies RECENTLY_DEAD tuples and tuples which are not yet visible to other transactions, but it should be relatively easy to keep enough information about whether it can set the current page to all-visible. At least for the data in the main relation, the toast tables are a different matter. Just tracking whether the page in rewriteheap.c's state->rs_buffer contains only tuples that are clearly visible according to the xmin horizon seems to be enough. The current effect of resetting the VM has the disadvantage of making the next autovacuum basically a full table vacuum without any benefits... 
Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 9.2.1 index-only scans : abnormal heap fetches after VACUUM FULL
On 2012-11-29 17:59:39 +0530, Pavan Deolasee wrote: On Thu, Nov 29, 2012 at 5:42 PM, Andres Freund and...@2ndquadrant.com wrote: On 2012-11-29 17:20:01 +0530, Pavan Deolasee wrote: Now can CLUSTER or VACUUM FULL recreate the visibility map with all bits set to visible, thats an entirely different question. I don't think it can, but then I haven't thought through this completely. It can't set everything to visible as it also copies RECENTLY_DEAD tuples and tuples which are not yet visible to other transactions, but it should be relatively easy to keep enough information about whether it can set the current page to all visible. Yeah, that looks fairly easy to have. Thinking about it more, now that we have the ability to skip WAL for the case when a table is created and populated in the same transaction, we could also set the visibility map bits for such a table (if we are not doing that already). That should be fairly safe too. I don't think the latter would be safe at all. Every repeatable read transaction that started before the table creation would see that table's content based on the visibilitymap instead of seeing it as empty. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Poor performance using CTE
On 2012-11-21 10:21:16 -0500, Andrew Dunstan wrote: On 11/21/2012 09:59 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: If we're going to do it can we please come up with something more intuitive and much, much more documented than OFFSET 0? And if/when we do this we'll need to have big red warnings all over the release notes, since a lot of people I know will need to do some extensive remediation before moving to such a release. The probability that we would actually *remove* that behavior of OFFSET 0 is not distinguishable from zero. I'm not terribly excited about having an alternate syntax to specify an optimization fence, but even if we do create such a thing, there's no need to break the historical usage. I wasn't talking about removing it. My point was that if the optimization fence around CTEs is removed a lot of people will need to rework apps where they have used them for that purpose. And I continue to think that spelling it OFFSET 0 is horribly obscure. +1 WITH foo AS (SELECT ...) (barrier=on|off)? 9.3 introduces the syntax, defaulting to on 9.4 switches the default to off. Greetings, Andres Freund -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
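For readers unfamiliar with the idiom being debated: OFFSET 0 in a subquery is the historical spelling of an optimization fence, which keeps the planner from flattening the subquery into the outer query. A sketch with made-up table and column names:

```sql
-- Without the fence, the planner may pull the subquery up and
-- evaluate the predicates in whatever order it estimates is cheapest:
SELECT * FROM (SELECT * FROM t WHERE expensive_fn(x)) s WHERE s.y = 42;

-- With OFFSET 0, the subquery is planned as written:
SELECT * FROM (SELECT * FROM t WHERE expensive_fn(x) OFFSET 0) s WHERE s.y = 42;
```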
Re: [PERFORM] Poor performance using CTE
On 2012-11-21 13:16:25 -0300, Claudio Freire wrote: On Wed, Nov 21, 2012 at 12:32 PM, Andres Freund and...@anarazel.de wrote: +1 WITH foo AS (SELECT ...) (barrier=on|off)? 9.3 introduces the syntax, defaulting to on 9.4 switches the default to off. Why syntax? What about a guc? collapse_cte_limit? Because there are very good reasons to want the current behaviour. A guc is a global either/or so I don't see it helping much. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Poor performance using CTE
On 2012-11-21 13:32:45 -0300, Claudio Freire wrote: On Wed, Nov 21, 2012 at 1:24 PM, Andres Freund and...@2ndquadrant.com wrote: On 2012-11-21 13:16:25 -0300, Claudio Freire wrote: On Wed, Nov 21, 2012 at 12:32 PM, Andres Freund and...@anarazel.de wrote: +1 WITH foo AS (SELECT ...) (barrier=on|off)? 9.3 introduces the syntax, defaulting to on 9.4 switches the default to off. Why syntax? What about a guc? collapse_cte_limit? Because there are very good reasons to want the current behaviour. A guc is a global either/or so I don't see it helping much. set collapse_cte_limit=8; with blah as (blah) select blah; Not global at all. Not very manageable though. And it doesn't help if you need both in one query, which isn't actually that unlikely. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Increasing work_mem and shared_buffers on Postgres 9.2 significantly slows down queries
On 2012-10-30 14:08:56 -0500, Petr Praus wrote: select count(*) from contest c left outer join contestparticipant cp on c.id=cp.contestId left outer join teammember tm on tm.contestparticipantid=cp.id left outer join staffmember sm on cp.id=sm.contestparticipantid left outer join person p on p.id=cp.personid left outer join personinfo pi on pi.id=cp.personinfoid where pi.lastname like '%b%' or pi.firstname like '%a%'; Btw, not really related to the question, but the way you use left joins here doesn't really make sense and does lead to inferior plans. As you restrict on 'pi', the rightmost table in a chain of left joins, there is no point in all those left joins. I would guess the overall plan is better if you use straight joins. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
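A sketch of the rewrite being suggested: because the WHERE clause on pi discards NULL-extended rows, the joins on the path to pi behave as inner joins anyway and can be written as such, giving the planner more freedom to reorder (the tm/sm/p joins are kept as left joins here in case their matches really are optional):

```sql
select count(*)
from contest c
join contestparticipant cp on c.id = cp.contestid
left outer join teammember tm on tm.contestparticipantid = cp.id
left outer join staffmember sm on cp.id = sm.contestparticipantid
left outer join person p on p.id = cp.personid
join personinfo pi on pi.id = cp.personinfoid
where pi.lastname like '%b%' or pi.firstname like '%a%';
```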
Re: [PERFORM] how to avoid deadlock on masive update with multiples delete
On Friday, October 05, 2012 05:31:43 PM Tom Lane wrote: Maciek Sakrejda m.sakre...@gmail.com writes: Presumably something like this?: maciek=# CREATE TABLE test AS SELECT g, random() FROM generate_series(1,1000) g; CREATE maciek=# EXPLAIN DELETE FROM test USING (SELECT g FROM test ORDER BY ctid) x where x.g = test.g; There's no guarantee that the planner won't re-sort the rows coming from the sub-select, unfortunately. More often than not you can prevent the planner from doing that by putting an OFFSET 0 in the query. Not 100%, but better than nothing. We really need ORDER BY for DML. Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how to avoid deadlock on masive update with multiples delete
On Friday, October 05, 2012 05:46:05 PM Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On Friday, October 05, 2012 05:31:43 PM Tom Lane wrote: There's no guarantee that the planner won't re-sort the rows coming from the sub-select, unfortunately. More often than not you can prevent the planner from doing that by putting an OFFSET 0 in the query. Not 100%, but better than nothing. No, that will accomplish exactly nothing. The ORDER BY is already an optimization fence. Yea, sorry. I was thinking of a related problem/solution. We really need ORDER BY for DML. Meh. That's outside the SQL standard (not only outside the letter of the standard, but foreign to its very conceptual model) and I don't think the problem really comes up that often. Back when I mostly did consulting/development on client code it came up about once a week. I might have a warped view though, because that's the kind of thing you would ask a consultant about... Having said all that, are we sure this is even a deletion-order problem? I was wondering about deadlocks from foreign key references, for instance. Absolutely not sure, no. Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
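In the absence of ORDER BY for DML, a common workaround for this class of deadlock is to lock the target rows in a deterministic order first and only then modify them. A sketch, reusing the hypothetical test table from the thread (the predicate is made up):

```sql
BEGIN;
-- Take the row locks in a stable order, so two concurrent sessions
-- cannot acquire them in conflicting orders.
SELECT g FROM test WHERE g % 10 = 0 ORDER BY g FOR UPDATE;
-- Now the deletes cannot deadlock against another session doing the same.
DELETE FROM test WHERE g % 10 = 0;
COMMIT;
```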
Re: [PERFORM] Postgres becoming slow, only full vacuum fixes it
On Monday, September 24, 2012 02:21:09 PM Julien Cigar wrote: 5) synchronous_commit = off should only be used if you have a battery-backed write cache. Huh? Are you possibly confusing this with full_page_writes? Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgres becoming slow, only full vacuum fixes it
On Monday, September 24, 2012 02:53:59 PM Julien Cigar wrote: On 09/24/2012 14:34, Andres Freund wrote: On Monday, September 24, 2012 02:21:09 PM Julien Cigar wrote: 5) synchronous_commit = off should only be used if you have a battery-backed write cache. Huh? Are you possibly confusing this with full_page_writes? indeed...! sorry for that (note that you still have a (very) small chance of losing data with synchronous_commit = off if your server crashes between two commit chunks) Sure, you have a chance of losing the last few transactions, but you won't corrupt anything. That's the entire point of the setting ;) Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [PERFORM] transactions start time
Hi, On Wednesday, July 25, 2012 04:56:20 PM Tom Lane wrote: Aleksei Arefjev aleksei.aref...@nordicgaming.com writes: On 24 July 2012 20:21, Richard Huxton d...@archonet.com wrote: I'm not sure if I'm reading this right, but are there more than 48 million BEGINs that took 0s each (presumably rounded down) and then a handful taking about 0.8s? I'm wondering exactly where/how the duration was measured. If it was at a client, maybe the apparent delay had something to do with network glitches? It seems suspicious that all the outliers are around 0.8s. It would be useful to look to see if there's any comparable pattern for statements other than BEGIN. As Richard says, a BEGIN by itself ought to take negligible time. He also asked earlier on the IRC channel, and I got the idea that the problem could be explained by pgbouncer in transaction pooling mode waiting for a free backend connection. Aleksei confirmed that they use pgbouncer in that configuration, so that might be it. Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [PERFORM] Massive I/O spikes during checkpoint
On Tuesday, July 10, 2012 08:14:00 AM Maxim Boguk wrote: On Tue, Jul 10, 2012 at 4:03 PM, David Kerr d...@mr-paradox.net wrote: On Jul 9, 2012, at 10:51 PM, Maxim Boguk wrote: But what appears to be happening is that all of the data is being written out at the end of the checkpoint. This happens at every checkpoint while the system is under load. I get the feeling that this isn't the correct behavior and I've done something wrong. It's not the actual checkpoint. It's the fsync after the checkpoint which creates the write spikes hurting the server. You should set the sysctls vm.dirty_background_bytes and vm.dirty_bytes to reasonably low values. So use bla_bytes instead of bla_ratio? Yes, because on a 256GB server echo 10 > /proc/sys/vm/dirty_ratio is equivalent to 26GB dirty_bytes, and echo 5 > /proc/sys/vm/dirty_background_ratio is equivalent to 13GB dirty_background_bytes. Those are really huge values. So the kernel doesn't start writing any pages out in the background before it has at least 13GB of dirty pages in kernel memory. And at the end of the checkpoint the kernel tries to flush all dirty pages to disk. That's not entirely true. The kernel will also write out pages which haven't been written to for dirty_expire_centisecs. But yes, adjusting dirty_* is definitely a good idea. Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
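The dirty_*_bytes tuning suggested above could look like this. The absolute values are illustrative only, not recommendations from the thread; sensible settings depend on the RAID cache and disk throughput:

```shell
# Start background writeback at ~256MB of dirty pages and force
# synchronous writeback at ~1GB, instead of percentage-of-RAM defaults
# that grow to tens of GB on large-memory machines.
sysctl -w vm.dirty_background_bytes=268435456   # 256MB
sysctl -w vm.dirty_bytes=1073741824             # 1GB

# To persist across reboots, the equivalent /etc/sysctl.conf lines:
#   vm.dirty_background_bytes = 268435456
#   vm.dirty_bytes = 1073741824
```

Note that setting the *_bytes variants automatically disables the corresponding *_ratio settings, and vice versa.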
Re: [PERFORM] Massive I/O spikes during checkpoint
On Tuesday, July 10, 2012 03:36:35 PM Jeff Janes wrote: On Tue, Jul 10, 2012 at 5:44 AM, Andres Freund and...@2ndquadrant.com wrote: On Tuesday, July 10, 2012 08:14:00 AM Maxim Boguk wrote: So the kernel doesn't start writing any pages out in the background before it has at least 13GB of dirty pages in kernel memory. And at the end of the checkpoint the kernel tries to flush all dirty pages to disk. That's not entirely true. The kernel will also write out pages which haven't been written to for dirty_expire_centisecs. There seem to be many situations in which it totally fails to do that. Totally as in dirty pages sitting around without any IO activity? Or just not aggressive enough? Currently it's a bit hard to speculate about it all without specifying the kernel, because there have been massive rewrites of all that stuff in several kernel versions in the last two years... Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On Friday, July 06, 2012 01:38:56 PM Daniel Farina wrote: Well, I don't know of a mechanism besides slow file system truncation time that would explain why DELETE would be significantly faster. There is no filesystem truncation happening. The heap and the indexes get mapped into a new file. Otherwise rollback would be pretty hard to implement. I guess the biggest cost in a bigger cluster is dropping the buffers that were formerly mapped to that relation (DropRelFileNodeBuffers). Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
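For the use case in the subject line (resetting many small test tables), the trade-off above can be sketched as follows. Table and sequence names are hypothetical, and which variant wins depends on table size and shared_buffers, so measure both:

```sql
-- DELETE + sequence reset: can be cheaper for tiny tables, because
-- TRUNCATE maps the heap and indexes to new files and scans
-- shared_buffers to drop the old pages (DropRelFileNodeBuffers).
DELETE FROM some_small_table;
ALTER SEQUENCE some_small_table_id_seq RESTART WITH 1;

-- TRUNCATE does both in one statement and also reclaims disk space,
-- which usually wins for larger tables:
TRUNCATE some_small_table RESTART IDENTITY;
```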
Re: [PERFORM] Maximum number of sequences that can be created
On Tuesday, May 15, 2012 08:29:11 AM Віталій Тимчишин wrote: 2012/5/13 Robert Klemme shortcut...@googlemail.com On Sun, May 13, 2012 at 10:12 AM, Віталій Тимчишин tiv...@gmail.com wrote: 2012/5/11 Robert Klemme shortcut...@googlemail.com On the contrary: what would be the /advantage/ of being able to create millions of sequences? What's the use case? We are using sequences as statistics counters - they produce almost no performance impact and we can tolerate its non-transactional nature. I can imagine someone who wants to have a sequence per user or other relation row. I can almost see the point. But my natural choice in that case would be a table with two columns. Would that actually be so much less efficient? Of course you'd have fully transactional behavior and thus locking. We've had concurrency problems with the table solution (a counter that is updated by many concurrent queries), so we traded transactionality for speed. We are actually using this data to graph pretty graphs in nagios, so it's quite OK. But we have only ~10 sequences, not millions :) I would rather suggest going with a summing table if you need to do something like that: sequence_id | value 1 | 3434334 1 | 1 1 | -1 1 | 1 1 | 1 ... You then can get the current value with SELECT SUM(value) WHERE sequence_id = 1. For garbage collection you can delete those values and insert the newly summed up value again. That solution won't ever block if done right. Andres
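A minimal sketch of the summing table described above, with hypothetical table and column names (the garbage-collection step uses a data-modifying CTE, available from PostgreSQL 9.1 on):

```sql
-- Each increment is an INSERT, so concurrent updaters never contend
-- on a single counter row.
CREATE TABLE counter_deltas (
    sequence_id integer NOT NULL,
    value       bigint  NOT NULL
);
CREATE INDEX ON counter_deltas (sequence_id);

-- Increment counter 1, concurrently and without blocking:
INSERT INTO counter_deltas (sequence_id, value) VALUES (1, 1);

-- Read the current value:
SELECT COALESCE(SUM(value), 0) FROM counter_deltas
WHERE sequence_id = 1;

-- Periodic garbage collection: collapse all deltas into one row.
WITH deleted AS (
    DELETE FROM counter_deltas WHERE sequence_id = 1 RETURNING value
)
INSERT INTO counter_deltas (sequence_id, value)
SELECT 1, COALESCE(SUM(value), 0) FROM deleted;
```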
Re: [PERFORM] Fwd: [HACKERS] client performance v.s. server statistics
Hi, On Wednesday, February 15, 2012 11:19:00 AM Zhou Han wrote: I have tried a unix domain socket and the performance is similar to a TCP socket. It is a MIPS architecture, so memory copy to/from the kernel can occupy much time, and apparently using a unix domain socket is no different from TCP in terms of memory copies. But it is still unbelievable for the ten-fold gap between the client side statistics and the server side statistics. So I want to know what operations exactly are involved in the server side statistics in EXPLAIN ANALYZE. I may check the code later on when I get time. My guess is that the time difference you're seeing is actually the planning time. The timing shown at the end of EXPLAIN ANALYZE is just the execution, not the planning time. You can use \timing on in psql to let it display timing information that includes planning. What's the query? For the query itself, it was just for performance comparison. There are other index based queries, which are of course much faster, but still result in a similar ten-fold time gap between client side and server side statistics. I am thinking of a client interface that doesn't involve the kernel, is there such an option, or do I have to develop one from scratch? It's unlikely that's possible in a sensible amount of time. But I don't think that's your problem anyway. Andres
Re: [PERFORM] Fwd: [HACKERS] client performance v.s. server statistics
On Wednesday, February 15, 2012 12:33:13 PM Han Zhou wrote: Hi, To be more specific, I list my calculation here: The timing shown in psql may include: plan + execution + copying to the result set in the backend (does this step exist?) + transferring data to the client via socket. Correct. Then I want to know what the time shown in pg_stat_statements and EXPLAIN ANALYZE covers in terms of the above mentioned parts. And why is the gap almost 10 times (100 ms v.s. 1 second)? As a comparison, transferring the same amount of data over a unix domain socket should cost only a very small fraction of this (almost negligible), according to my other performance tests. Yea, you proved my quick theory wrong. And I don't think the plan time plays an important role here in EXPLAIN ANALYZE, because the command itself costs similar time to the Total runtime as shown in psql (timing on), which means the plan is too simple to take any significant part of the time in this case. Sounds like it. It would be interesting to see the time difference between: COPY (SELECT * FROM blub) TO '/tmp/somefile'; COPY (SELECT * FROM blub) TO '/tmp/somefile' BINARY; EXPLAIN ANALYZE SELECT * FROM blub; Andres
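The comparison Andres suggests can be run from psql like this (file paths are illustrative; COPY to a server-side file needs superuser, so \copy is shown as the unprivileged alternative):

```sql
\timing on

-- Query execution + writing the result server-side, text and binary:
COPY (SELECT * FROM blub) TO '/tmp/somefile';
COPY (SELECT * FROM blub) TO '/tmp/somefile' (FORMAT binary);

-- Server-side execution time only, no result transfer to the client:
EXPLAIN ANALYZE SELECT * FROM blub;

-- Client-side equivalent of the COPY, without superuser:
-- \copy (SELECT * FROM blub) TO '/tmp/somefile'
```

Comparing the COPY timings against the EXPLAIN ANALYZE timing separates result-set construction and transfer costs from pure execution.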
Re: [PERFORM] STRICT SQL functions never inline
On Tuesday, November 08, 2011 15:29:03 Josh Berkus wrote: Folks, After having some production issues, I did some testing and it seems that any SQL function declared STRICT will never inline. As a result, it won't work with either indexes (on the underlying predicate) or partitioning. This seems like a horrible gotcha for our users. At the very least I'd like to document it (in CREATE FUNCTION, presumably), but it would be better to fix it. Thoughts? I am all for documenting it somewhere. There were lots of people hit by it in the past - e.g. the postgis folks. It's not so easy to fix, though. The problem is that straight inlining would change the behaviour, because suddenly the expression might not return NULL anymore even though one of the parameters is NULL. Or it might even cause more problems because the content wasn't prepared to handle NULLs. It would be possible to inline a CASE WHEN $1 IS NULL OR $2 IS NULL THEN NULL ELSE orig_expression END, but that would be useful in far fewer cases, since it won't help much in most, and it actually might hurt performance in some. Andres
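A sketch of the gotcha, with hypothetical function and table names. The two functions have identical bodies; only the STRICT marker differs, and it is what blocks inlining and thereby the use of the expression index:

```sql
-- NOT inlined into calling queries because it is declared STRICT, so a
-- predicate using it cannot match an expression index on lower(name).
CREATE FUNCTION fold_strict(text) RETURNS text AS
$$ SELECT lower($1) $$ LANGUAGE sql IMMUTABLE STRICT;

-- Same body without STRICT: this one can be inlined, so
-- WHERE fold_inline(name) = 'x' becomes WHERE lower(name) = 'x'.
CREATE FUNCTION fold_inline(text) RETURNS text AS
$$ SELECT lower($1) $$ LANGUAGE sql IMMUTABLE;

CREATE TABLE people (name text);
CREATE INDEX ON people (lower(name));

EXPLAIN SELECT * FROM people WHERE fold_inline(name) = 'x';
-- can use the lower(name) index (given enough rows)
EXPLAIN SELECT * FROM people WHERE fold_strict(name) = 'x';
-- sequential scan, filtering through the opaque function call
```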
Re: [PERFORM] Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?
On Wednesday 02 Nov 2011 16:13:09 Robert Haas wrote: On Wed, Nov 2, 2011 at 10:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: Jay Levitt jay.lev...@gmail.com writes: So you can see where I'm going. I know if I break everything into elegant, composable functions, it'll continue to perform poorly. If I write one big hairy query, it'll perform great but it will be difficult to maintain, and it will be inelegant and a kitten will die. My tools are CTEs, subqueries, aliases, SQL functions, PL/pgSQL functions, and views (and other tools?) What optimizations do each of those prevent? plpgsql functions are black boxes to the optimizer. If you can express your functions as single SQL commands, using SQL-language functions is usually a better bet than plpgsql. CTEs are also treated as optimization fences; this is not so much an optimizer limitation as to keep the semantics sane when the CTE contains a writable query. I wonder if we need to rethink, though. We've gotten a number of reports of problems that were caused by single-use CTEs not being equivalent - in terms of performance - to a non-CTE formulation of the same idea. It seems necessary for CTEs to behave this way when the subquery modifies data, and there are certainly situations where it could be desirable otherwise, but I'm starting to think that we shouldn't do it that way by default. Perhaps we could let people say something like WITH x AS FENCE (...) when they want the fencing behavior, and otherwise assume they don't (but give it to them anyway if there's a data-modifying operation in there). +1. I avoid writing CTEs in many cases where they would be very useful, just for that reason. I don't even think some future inlining necessarily has to be restricted to one-use cases only... +1 for making the fencing behaviour explicit as well. Currently there is no real explicit method to specify this that is guaranteed to be future proof (WITH, OFFSET 0)...
Andres
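The fencing behaviour under discussion can be observed directly (table name hypothetical; this reflects the planner of the era, where every CTE was a fence):

```sql
-- The CTE is an optimization fence: the predicate on id is NOT pushed
-- down into the scan of big_table, so the whole table may be read.
WITH x AS (SELECT * FROM big_table)
SELECT * FROM x WHERE id = 42;

-- The equivalent subquery is flattened, so the predicate can use an
-- index on big_table.id.
SELECT * FROM (SELECT * FROM big_table) AS x WHERE id = 42;

-- OFFSET 0 turns the subquery back into a fence.
SELECT * FROM (SELECT * FROM big_table OFFSET 0) AS x WHERE id = 42;
```

There is no `AS FENCE` keyword; it is only a proposal in this thread, which is why the CTE itself and `OFFSET 0` are the only (implicit) fencing mechanisms mentioned.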
Re: [PERFORM] Slow cursor
Hi, On Wednesday 26 Oct 2011 14:43:08 Cezariusz Marek wrote: Is there any known problem with slow cursors in PostgreSQL 8.4.5? I have the following query, which is slow (on my database it takes 11 seconds to execute), and probably should be rewritten, but that doesn't matter here. The problem is that in a cursor, each fetch takes much longer (even a few minutes!), while only the first one should be slow. Am I doing something wrong? Does the problem persist if you play around with cursor_tuple_fraction? Andres
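cursor_tuple_fraction can be changed per session; a sketch of what to try (query and cursor names hypothetical):

```sql
-- The default is 0.1: the planner optimizes cursors for retrieving
-- only the first ~10% of rows, which can pick a fast-start plan whose
-- later fetches are slow. 1.0 makes cursor plans match plain SELECTs.
SET cursor_tuple_fraction = 1.0;

BEGIN;
DECLARE c CURSOR FOR SELECT * FROM some_slow_query;
FETCH 100 FROM c;   -- subsequent fetches should now cost similar amounts
CLOSE c;
COMMIT;
```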
Re: [PERFORM] Rather large LA
On Monday 05 Sep 2011 22:23:32 Scott Marlowe wrote: On Mon, Sep 5, 2011 at 11:24 AM, Andres Freund and...@anarazel.de wrote: On Monday, September 05, 2011 14:57:43 Richard Shaw wrote: Autovacuum has been disabled and set to run manually via cron during a quiet period and fsync has recently been turned off to gauge any real world performance increase, there is battery backup on the raid card providing some level of resilience. That doesn't help you against a failure due to fsync() off as the BBU can only protect data that actually has been written to disk. Without fsync=on no guarantee about that exists. Further, if you've got a BBU cache on the RAID card the gains from fsync=off will be low / nonexistent. That's not necessarily true. If you have a mixed load of many small writes and some parallel huge writes (especially in combination with big indexes) fsync=off still can give you quite big performance increases. Even in the presence of synchronous_commit=off. Andres
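The two settings being contrasted in this subthread, as illustrative postgresql.conf fragments (fsync=off is shown only to mark the distinction; it is not a production setting):

```
# Safe latency reduction: recently committed transactions may be lost
# on a crash, but the cluster stays consistent. Combines fine with a
# BBU raid cache.
synchronous_commit = off

# Unsafe: skips fsync() entirely. A crash can corrupt the cluster even
# with a battery-backed cache, because the cache only protects writes
# that actually reached the controller.
#fsync = off
```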
Re: [PERFORM] Rather large LA
On Monday, September 05, 2011 14:57:43 Richard Shaw wrote: Autovacuum has been disabled and set to run manually via cron during a quiet period and fsync has recently been turned off to gauge any real world performance increase, there is battery backup on the raid card providing some level of resilience. That doesn't help you against a failure due to fsync() off as the BBU can only protect data that actually has been written to disk. Without fsync=on no guarantee about that exists. Andres
Re: [PERFORM] setting configuration values inside a stored proc
Hi, On Friday, May 13, 2011 01:10:19 AM Samuel Gendler wrote: I've got a stored proc that constructs some aggregation queries as strings and then executes them. I'd like to be able to increase work_mem before running those queries. If I set a new value for work_mem within the stored proc prior to executing my query string, will that actually have an impact on the query, or is work_mem basically a constant once the outer statement that calls the stored proc has begun? I'd just test, but it will take hours for me to grab a copy of production data and import it into a new db host for testing. I've already started that process, but I'm betting I'll have an answer by the time it completes. It's just the difference between modifying the application which calls the procs (and doing a full software release in order to do so, or else waiting a month for the next release) vs modifying the procs themselves, which requires only a db update. I would suggest doing ALTER FUNCTION blub(blarg) SET work_mem = '512MB'; Andres
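Both approaches can be sketched like this (the function name and signature are hypothetical):

```sql
-- Attach the setting to the function itself: work_mem is raised for
-- the duration of each call and restored afterwards. This needs only
-- a db update, no application change.
ALTER FUNCTION run_aggregations(date) SET work_mem = '512MB';

-- Alternative, inside the PL/pgSQL body before EXECUTE-ing the query
-- strings; the third argument 'true' makes the change local to the
-- current transaction:
--   PERFORM set_config('work_mem', '512MB', true);
```

Either way the new value does take effect for queries executed afterwards within the proc; work_mem is consulted at execution time, not frozen when the outer statement starts.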