[PERFORM] fragmentation issue with ext4: e4defrag?
Hello,

Does anyone have experience using defragmentation tools on Linux against tablespaces? We are facing fragmentation problems with Postgres instances holding a few TB of data (RAID 5). I/O throughput decreased from 300 MB/s to 160.

- We first moved some schemas to separate servers. After that we still have 150'000 tables in 1.5 TB.
- Now we are in the process of running VACUUM FULL on historical tables which are no longer written. This seems to improve the I/O considerably.

Our remaining issue is that the free space fragmentation is still suboptimal, so fragmentation will probably start again soon.

Would it make sense to use a tool like e4defrag (http://www.linux.org/threads/online-defragmentation.4121/) in order to defrag the free space? And how safe is it to use such a tool against a running Postgres instance?

many thanks,

Marc Mamin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
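For context, each PostgreSQL relation is stored as a series of 1 GB segment files under base/<database-oid>/<relfilenode> (then .1, .2, ...), so any file-level defragmentation or inspection tool would have to be pointed at every segment of every large table. A minimal sketch for collecting those paths (the directory layout is standard; the datadir and OID values in the usage note are placeholders):

```python
import os

def relation_segments(datadir, db_oid, relfilenode):
    """Return the on-disk segment files backing one relation, in order:
    <relfilenode>, <relfilenode>.1, <relfilenode>.2, ... (1 GB each by default)."""
    base = os.path.join(datadir, "base", str(db_oid))
    segments = []
    n = 0
    while True:
        name = str(relfilenode) if n == 0 else "%s.%d" % (relfilenode, n)
        path = os.path.join(base, name)
        if not os.path.exists(path):
            break
        segments.append(path)
        n += 1
    return segments
```

In practice one would look up the database OID in pg_database and the relfilenode in pg_class, then run something like filefrag -v (from e2fsprogs) on each returned segment to see its extent count before and after defragmenting.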
[PERFORM] Hash Join node sometimes slow
Hi, I'm in the process of attempting to tune some slow queries. I came across a scenario where I'm not entirely sure how I might figure out why a node is taking a while to process. I'm not concerned with the query itself; we are working to figure out how we can make it faster. But I was hoping someone might be able to provide some insight into why a hash join is sometimes slow.

For example, running EXPLAIN (ANALYZE, BUFFERS) with the query, 4/5 times we will see the following:

Hash Join  (cost=16385.76..103974.09 rows=523954 width=64) (actual time=532.634..4018.678 rows=258648 loops=1)
  Hash Cond: (p.a = c.c)
  Buffers: shared hit=4 read=29147, temp read=12943 written=12923
  ->  Seq Scan on p  (cost=0.00..38496.88 rows=1503188 width=60) (actual time=0.013..1388.205 rows=1503188 loops=1)
        Buffers: shared hit=1 read=23464
  ->  Hash  (cost=15382.47..15382.47 rows=57703 width=12) (actual time=527.237..527.237 rows=57789 loops=1)
        Buckets: 4096  Batches: 4  Memory Usage: 632kB
        Buffers: shared hit=3 read=5683, temp read=617 written=771

The other times, we will see something like this:

Hash Join  (cost=16385.76..103974.09 rows=523954 width=64) (actual time=587.277..15208.621 rows=258648 loops=1)
  Hash Cond: (p.a = c.c)
  Buffers: shared hit=26 read=29125, temp read=12943 written=12923
  ->  Seq Scan on p  (cost=0.00..38496.88 rows=1503188 width=60) (actual time=0.013..1525.608 rows=1503188 loops=1)
        Buffers: shared hit=22 read=23443
  ->  Hash  (cost=15382.47..15382.47 rows=57703 width=12) (actual time=581.638..581.638 rows=57789 loops=1)
        Buckets: 4096  Batches: 4  Memory Usage: 632kB
        Buffers: shared hit=4 read=5682, temp read=617 written=771

Does anyone have ideas on what might be causing the difference in timing for the hash join node?

Thanks
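One way to localize variance like this is to subtract each child's total time from the join's own total, which attributes the gap to work done in the join node itself (where the per-batch temp files are read and written) rather than in the scans. This is only a rough attribution, since child times overlap with parent work in pipelined execution, but with the numbers above:

```python
def join_self_time(join_total_ms, child_totals_ms):
    """Approximate the time spent in the join node itself by removing
    the time already attributed to its input nodes (all values in ms)."""
    return join_total_ms - sum(child_totals_ms)

# Fast run: Hash Join total minus Seq Scan and Hash child totals.
fast = join_self_time(4018.678, [1388.205, 527.237])    # ~2103 ms
# Slow run: same children, nearly identical child times.
slow = join_self_time(15208.621, [1525.608, 581.638])   # ~13101 ms
print(round(slow - fast))  # ~11 s of extra time, all inside the join node
```

The child node times barely differ between runs, so essentially the whole 11-second gap is in the join node's own work.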
Re: [PERFORM] Hash Join node sometimes slow
Dave Roberge <drobe...@bluetarp.com> writes:
> For example, running EXPLAIN (ANALYZE, BUFFERS) with the query, 4/5 times
> we will see the following:
>
> Hash Join  (cost=16385.76..103974.09 rows=523954 width=64) (actual time=532.634..4018.678 rows=258648 loops=1)
>   Hash Cond: (p.a = c.c)
>   Buffers: shared hit=4 read=29147, temp read=12943 written=12923
>   ->  Seq Scan on p  (cost=0.00..38496.88 rows=1503188 width=60) (actual time=0.013..1388.205 rows=1503188 loops=1)
>         Buffers: shared hit=1 read=23464
>   ->  Hash  (cost=15382.47..15382.47 rows=57703 width=12) (actual time=527.237..527.237 rows=57789 loops=1)
>         Buckets: 4096  Batches: 4  Memory Usage: 632kB
>         Buffers: shared hit=3 read=5683, temp read=617 written=771
>
> The other times, we will see something like this:
>
> Hash Join  (cost=16385.76..103974.09 rows=523954 width=64) (actual time=587.277..15208.621 rows=258648 loops=1)
>   Hash Cond: (p.a = c.c)
>   Buffers: shared hit=26 read=29125, temp read=12943 written=12923
>   ->  Seq Scan on p  (cost=0.00..38496.88 rows=1503188 width=60) (actual time=0.013..1525.608 rows=1503188 loops=1)
>         Buffers: shared hit=22 read=23443
>   ->  Hash  (cost=15382.47..15382.47 rows=57703 width=12) (actual time=581.638..581.638 rows=57789 loops=1)
>         Buckets: 4096  Batches: 4  Memory Usage: 632kB
>         Buffers: shared hit=4 read=5682, temp read=617 written=771
>
> Does anyone have ideas on what might be causing the difference in timing
> for the hash join node?

I'd bet on the extra time being in I/O for the per-batch temp files, since it's hard to see what else would be different if the data were identical in each run. Maybe the kernel is under memory pressure and is dropping the file data from in-memory disk cache. Or maybe it's going to disk all the time but the slow runs face more I/O congestion.

Personally, for a problem of this size I'd increase work_mem enough so that you don't get multiple batches in the first place.
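The plan's numbers fit that advice: at 632kB Memory Usage per batch and 4 batches, the inner side is roughly 2.5MB, which overflows a small work_mem. A simplified model of the batching decision (PostgreSQL actually sizes this in ExecChooseHashTableSize with per-tuple and bucket overheads; this sketch only captures the power-of-two batch growth, and the ~44 bytes per in-memory tuple is an assumed figure chosen to match the observed 4 batches, not taken from the plan):

```python
def estimated_batches(inner_bytes, work_mem_bytes):
    """Smallest power of two such that each batch's share of the
    inner relation fits within work_mem (simplified model)."""
    nbatch = 1
    while inner_bytes / nbatch > work_mem_bytes:
        nbatch *= 2
    return nbatch

inner = 57789 * 44  # rows x assumed ~44 bytes/tuple incl. overhead -> ~2.5 MB
print(estimated_batches(inner, 1 * 1024**2))  # work_mem = 1MB: 4 batches
print(estimated_batches(inner, 4 * 1024**2))  # work_mem = 4MB: 1 batch, no temp files
```

With a single batch, the temp read/written buffer counts in the plan would drop to zero and the run-to-run variance from temp-file I/O should disappear with them.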
regards, tom lane
Re: [PERFORM] fragmentation issue with ext4: e4defrag?
Marc Mamin <m.ma...@intershop.de> wrote:

> I/O throughput decreased from 300MB/s to 160.

I don't have any experience with ext4 defrag tools, but just wanted to point out that the difference in performance you cite above is about the same as the difference between accessing data on the outer (and usually first-filled) tracks of a disk drive and the inner tracks. One of the reasons performance falls as a drive fills is that the OS is compelled to use slower and slower portions of the disk. Part of the benefit you are seeing might be due to freeing fast tracks and data being relocated there.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
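Kevin's point can be quantified: at constant rotational speed and roughly constant linear bit density, sequential throughput scales with track circumference, i.e. linearly with radius. A sketch with assumed, purely illustrative platter radii (not measured from any particular drive):

```python
def zone_throughput(outer_mb_s, radius_mm, outer_radius_mm):
    """At fixed RPM and bit density, sequential MB/s scales
    linearly with the radius of the track being read."""
    return outer_mb_s * (radius_mm / outer_radius_mm)

# Illustrative 3.5" drive geometry: usable band from ~24mm to ~46mm radius.
print(zone_throughput(300, 46, 46))  # outermost tracks: 300 MB/s
print(zone_throughput(300, 24, 46))  # innermost tracks: ~157 MB/s
```

That is close to the reported 300 -> 160 MB/s drop, so fill level alone could plausibly account for much of the difference, independent of fragmentation.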
Re: [PERFORM] Postgres Replaying WAL slowly
On 2014-07-01 15:20:37 -0400, Tom Lane wrote:
> Jeff Frost <j...@pgexperts.com> writes:
>> On Jun 30, 2014, at 4:04 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Did you check whether the locks were all on temp tables of the
>>> ON COMMIT DROP persuasion?
>>
>> And indeed it did catch up overnight and the lag increased shortly after
>> a correlating spike in AccessExclusiveLocks that were generated by temp
>> table creation with on commit drop.
>
> OK, so we have a pretty clear idea of where the problem is now. It seems
> like there are three, not mutually exclusive, ways we might address this:
>
> 1. Local revisions inside StandbyReleaseLocks to make it perform better in
> the presence of many locks. This would only be likely to improve matters
> much if there's a fixable O(N^2) algorithmic issue; but there might well
> be one.
>
> It sounded like Andres had taken a preliminary look at #1 and found a
> possible avenue for improvement, which I'd encourage him to pursue.

I don't have the resources to do this right now, but yes, I think we can relatively easily get rid of the O(num_locks * num_subtransactions) behaviour.

> 2. Avoid WAL-logging AccessExclusiveLocks associated with temp tables, on
> the grounds that no standby should be touching them. I'm not entirely
> sure that that argument is bulletproof though; in particular, even though
> a standby couldn't access the table's data, it's possible that it would be
> interested in seeing consistent catalog entries.

Hm. We definitely perform those checks surprisingly late. It's possible to do SELECT * FROM pg_temp_nn.whatever; without an error if there are no rows, or if the rest of the plan doesn't access that table. The check prohibiting access is only in bufmgr.c... So yeah, I don't think we can do this for at least 9.4. And even there it'll still be hard.

> 3. Avoid WAL-logging AccessExclusiveLocks associated with
> new-in-transaction tables, temp or not, on the grounds that no standby
> could even see such tables until they're committed. We could go a bit
> further and not take out any locks on a new-in-transaction table in the
> first place, on the grounds that other transactions on the master can't
> see 'em either.
>
> For both #2 and the conservative version of #3, the main implementation
> problem would be whether the lock WAL-logging code has cheap access to the
> necessary information. I suspect it doesn't.

Not trivially. It's logged directly in LockAcquireExtended(). We could add the information into locktags, as there are unused fields for relation locktags, but brrr.

> The radical version of #3 might be pretty easy to do, at least to the
> extent of removing locks taken out during CREATE TABLE. I suspect there
> are some assertions or other consistency checks that would get unhappy if
> we manipulate relations without locks, though, so those would have to be
> taught about the exception. Also, we sometimes forget new-in-transaction
> status during relcache flush events; it's not clear if that would be a
> problem for this.

I think that hole is actually pluggable in newer releases - at least there's no code around that assumes rd_createSubid now is persistent, even across cache resets.

But I think more importantly it's probably quite possible to hit a similar problem without ON COMMIT DROP relations. Say DISCARD TEMP inside a transaction (with several subxacts) or so? So we probably really should fix the bad scaling.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
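The O(num_locks * num_subtransactions) behaviour is easy to picture: if releasing each finished subtransaction's locks means rescanning the standby's entire lock list, the costs multiply. A toy sketch (not the actual StandbyReleaseLocks code, just an illustration of the two access patterns) contrasting the full scan with an index keyed by owning xid:

```python
from collections import defaultdict

def release_by_scan(lock_list, finished_xids):
    """O(num_locks * num_xids): rescan every held lock once per finished xid."""
    for xid in finished_xids:
        lock_list[:] = [(owner, rel) for (owner, rel) in lock_list if owner != xid]

def release_by_index(locks_by_xid, finished_xids):
    """O(num_xids + locks released): locks are bucketed by owning xid,
    so each release is a single dictionary removal."""
    for xid in finished_xids:
        locks_by_xid.pop(xid, None)

# Three subtransactions, each holding two AccessExclusiveLocks.
locks = [(xid, rel) for xid in (101, 102, 103) for rel in ("t1", "t2")]
release_by_scan(locks, [101, 102])
print(locks)  # only xid 103's locks remain

by_xid = defaultdict(list)
for xid in (101, 102, 103):
    by_xid[xid] = ["t1", "t2"]
release_by_index(by_xid, [101, 102])
print(dict(by_xid))  # {103: ['t1', 't2']}
```

Both produce the same end state; the difference only matters when a transaction with many subxacts releases locks from a list holding tens of thousands of entries, which is exactly the replay spike reported in this thread.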
Re: [PERFORM] Postgres Replaying WAL slowly
Andres Freund <and...@2ndquadrant.com> writes:
> On 2014-07-01 15:20:37 -0400, Tom Lane wrote:
>> It seems like there are three, not mutually exclusive, ways we might
>> address this:
>
> But I think more importantly it's probably quite possible to hit a similar
> problem without ON COMMIT DROP relations. Say DISCARD TEMP inside a
> transaction (with several subxacts) or so? So we probably really should
> fix the bad scaling.

Well, my thought was that these approaches would address somewhat different sets of use-cases, and we might well want to do more than one. Even if StandbyReleaseLocks were zero-cost, not emitting the WAL in the first place is surely considerably cheaper yet.

regards, tom lane