[PERFORM] fragmentation issue with ext4: e4defrag?

2014-07-02 Thread Marc Mamin
Hello,

Does anyone have experience using defragmentation tools on Linux against
tablespaces?

We are facing fragmentation problems with Postgres instances holding a few TB
of data (RAID 5).

I/O throughput has decreased from 300 MB/s to 160 MB/s.


- We first moved some schemas to separate servers.
  After that we still had 150,000 tables in 1.5 TB.

- Now we are running VACUUM FULL on historical tables that are no longer
  written to.
  This seems to improve I/O considerably; a minimal sketch follows below.
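
A minimal sketch of what we run per table (the schema and table names
are placeholders):

    -- Rewrites the table into fresh, contiguous files and returns the
    -- freed space to the filesystem; note it takes an exclusive lock.
    VACUUM FULL history.old_events;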
  
Our remaining issue is that the free-space fragmentation is still suboptimal,
so fragmentation will probably set in again soon.

Would it make sense to use a tool like e4defrag
(http://www.linux.org/threads/online-defragmentation.4121/)
to defragment the free space?
And how safe is it to run such a tool against a live Postgres instance?
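
For context, a hedged sketch of how a relation could be mapped to its
underlying file, so that e4defrag could be pointed at single files or at
a whole tablespace directory (the table name is again a placeholder):

    -- Path of the table's main data file, relative to the data directory:
    SELECT pg_relation_filepath('history.old_events');
    -- e.g. base/16384/24576; e4defrag would then be run against that file
    -- under $PGDATA, or against the tablespace directory itself.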



many thanks,

Marc Mamin




[PERFORM] Hash Join node sometimes slow

2014-07-02 Thread Dave Roberge
Hi, I'm in the process of attempting to tune some slow queries. I came across
a scenario where I'm not entirely sure how I might figure out why a node is
taking a while to process. I'm not concerned with the query itself; we are
working to figure out how we can make it faster. But I was hoping someone
might be able to provide some insight into why a hash join is sometimes slow.
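
For reference, the plans below were gathered with an invocation of roughly
this shape (the real query is larger; only the aliases and join condition
are taken from the plans):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM p JOIN c ON p.a = c.c;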

For example, running EXPLAIN (ANALYZE, BUFFERS) on the query, four times out
of five we see the following:

->  Hash Join  (cost=16385.76..103974.09 rows=523954 width=64) (actual time=532.634..4018.678 rows=258648 loops=1)
      Hash Cond: (p.a = c.c)
      Buffers: shared hit=4 read=29147, temp read=12943 written=12923
      ->  Seq Scan on p  (cost=0.00..38496.88 rows=1503188 width=60) (actual time=0.013..1388.205 rows=1503188 loops=1)
            Buffers: shared hit=1 read=23464
      ->  Hash  (cost=15382.47..15382.47 rows=57703 width=12) (actual time=527.237..527.237 rows=57789 loops=1)
            Buckets: 4096  Batches: 4  Memory Usage: 632kB
            Buffers: shared hit=3 read=5683, temp read=617 written=771

The other times, we will see something like this:

->  Hash Join  (cost=16385.76..103974.09 rows=523954 width=64) (actual time=587.277..15208.621 rows=258648 loops=1)
      Hash Cond: (p.a = c.c)
      Buffers: shared hit=26 read=29125, temp read=12943 written=12923
      ->  Seq Scan on p  (cost=0.00..38496.88 rows=1503188 width=60) (actual time=0.013..1525.608 rows=1503188 loops=1)
            Buffers: shared hit=22 read=23443
      ->  Hash  (cost=15382.47..15382.47 rows=57703 width=12) (actual time=581.638..581.638 rows=57789 loops=1)
            Buckets: 4096  Batches: 4  Memory Usage: 632kB
            Buffers: shared hit=4 read=5682, temp read=617 written=771

Does anyone have ideas on what might be causing the difference in timing for 
the hash join node?

Thanks





Re: [PERFORM] Hash Join node sometimes slow

2014-07-02 Thread Tom Lane
Dave Roberge drobe...@bluetarp.com writes:
 For example, running EXPLAIN (ANALYZE, BUFFERS) on the query, four times out
 of five we see the following:

 ->  Hash Join  (cost=16385.76..103974.09 rows=523954 width=64) (actual time=532.634..4018.678 rows=258648 loops=1)
       Hash Cond: (p.a = c.c)
       Buffers: shared hit=4 read=29147, temp read=12943 written=12923
       ->  Seq Scan on p  (cost=0.00..38496.88 rows=1503188 width=60) (actual time=0.013..1388.205 rows=1503188 loops=1)
             Buffers: shared hit=1 read=23464
       ->  Hash  (cost=15382.47..15382.47 rows=57703 width=12) (actual time=527.237..527.237 rows=57789 loops=1)
             Buckets: 4096  Batches: 4  Memory Usage: 632kB
             Buffers: shared hit=3 read=5683, temp read=617 written=771

 The other times, we will see something like this:

 ->  Hash Join  (cost=16385.76..103974.09 rows=523954 width=64) (actual time=587.277..15208.621 rows=258648 loops=1)
       Hash Cond: (p.a = c.c)
       Buffers: shared hit=26 read=29125, temp read=12943 written=12923
       ->  Seq Scan on p  (cost=0.00..38496.88 rows=1503188 width=60) (actual time=0.013..1525.608 rows=1503188 loops=1)
             Buffers: shared hit=22 read=23443
       ->  Hash  (cost=15382.47..15382.47 rows=57703 width=12) (actual time=581.638..581.638 rows=57789 loops=1)
             Buckets: 4096  Batches: 4  Memory Usage: 632kB
             Buffers: shared hit=4 read=5682, temp read=617 written=771

 Does anyone have ideas on what might be causing the difference in timing for 
 the hash join node?

I'd bet on the extra time being in I/O for the per-batch temp files,
since it's hard to see what else would be different if the data were
identical in each run.  Maybe the kernel is under memory pressure and
is dropping the file data from in-memory disk cache.  Or maybe it's
going to disk all the time but the slow runs face more I/O congestion.

Personally, for a problem of this size I'd increase work_mem enough
so you don't get multiple batches in the first place.
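
A minimal sketch of that (the value is illustrative: with 4 batches at
~632kB each, the whole hash table is only a few MB):

    -- Session-local bump so the ~57k-row hash fits in a single batch:
    SET work_mem = '8MB';
    -- Re-run EXPLAIN (ANALYZE, BUFFERS) and check that the Hash node
    -- reports "Batches: 1" and the temp read/written counters disappear.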

regards, tom lane




Re: [PERFORM] fragmentation issue with ext4: e4defrag?

2014-07-02 Thread Kevin Grittner
Marc Mamin m.ma...@intershop.de wrote:

 I/O throughput has decreased from 300 MB/s to 160 MB/s.

I don't have any experience with ext4 defrag tools, but just wanted
to point out that the difference in performance you cite above is
about the same as the difference between accessing data on the
outer (and usually first-filled) tracks of a disk drive and the
inner tracks.  At a constant rotation speed the outer tracks pass
roughly twice as many sectors per revolution, so sequential
throughput there can be close to double.  One of the reasons
performance falls as a drive fills is that the OS is compelled to
use slower and slower portions of the disk.  Part of the benefit
you are seeing might be due to freeing fast tracks and data being
relocated there.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PERFORM] Postgres Replaying WAL slowly

2014-07-02 Thread Andres Freund
On 2014-07-01 15:20:37 -0400, Tom Lane wrote:
 Jeff Frost j...@pgexperts.com writes:
  On Jun 30, 2014, at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Did you check whether the locks were all on temp tables of the
  ON COMMIT DROP persuasion?
 
  And indeed it did catch up overnight and the lag increased shortly after a 
  correlating spike in AccessExclusiveLocks that were generated by temp table 
  creation with on commit drop.
 
 OK, so we have a pretty clear idea of where the problem is now.
 
 It seems like there are three, not mutually exclusive, ways we might
 address this:
 
 1. Local revisions inside StandbyReleaseLocks to make it perform better in
 the presence of many locks.  This would only be likely to improve matters
 much if there's a fixable O(N^2) algorithmic issue; but there might well
 be one.

 It sounded like Andres had taken a preliminary look at #1 and found a
 possible avenue for improvement, which I'd encourage him to pursue.
 

I don't have the resources to do this right now, but yes, I think we can
relatively easily get rid of the O(num_locks * num_subtransactions)
behaviour.

 2. Avoid WAL-logging AccessExclusiveLocks associated with temp tables, on
 the grounds that no standby should be touching them.  I'm not entirely
 sure that that argument is bulletproof though; in particular, even though
 a standby couldn't access the table's data, it's possible that it would be
 interested in seeing consistent catalog entries.

Hm. We definitely perform those checks surprisingly late. It's
possible to do SELECT * FROM pg_temp_nn.whatever; without an error if
there are no rows or if the rest of the plan doesn't access that
table. The check prohibiting access is only in bufmgr.c...
So yeah, I don't think we can do this for 9.4 at least, and even then
it'll still be hard.

 3. Avoid WAL-logging AccessExclusiveLocks associated with
 new-in-transaction tables, temp or not, on the grounds that no standby
 could even see such tables until they're committed.  We could go a bit
 further and not take out any locks on a new-in-transaction table in the
 first place, on the grounds that other transactions on the master can't
 see 'em either.
 
 For both #2 and the conservative version of #3, the main implementation
 problem would be whether the lock WAL-logging code has cheap access to
 the necessary information.  I suspect it doesn't.

Not trivially. It's logged directly in LockAcquireExtended(). We could
add the information into locktags, as there are unused fields in relation
locktags, but brrr.

 The radical version of #3 might be pretty easy to do, at least to the
 extent of removing locks taken out during CREATE TABLE.  I suspect there
 are some assertions or other consistency checks that would get unhappy if
 we manipulate relations without locks, though, so those would have to be
 taught about the exception.

 Also, we sometimes forget new-in-transaction
 status during relcache flush events; it's not clear if that would be a
 problem for this.

I think that hole is actually pluggable in newer releases - at least
there's now no code around that assumes rd_createSubid is persistent,
even across cache resets.

But I think more importantly it's probably quite possible to hit a
similar problem without ON COMMIT DROP relations. Say, DISCARD TEMP
inside a transaction (with several subxacts) or so? So we probably
really should fix the bad scaling.
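
A hedged sketch of that sort of workload (purely illustrative; it would
be run on a primary with a hot standby attached):

    BEGIN;
    SAVEPOINT s1;
    CREATE TEMP TABLE t1 (x int);  -- each CREATE takes an
    SAVEPOINT s2;                  -- AccessExclusiveLock that is
    CREATE TEMP TABLE t2 (x int);  -- WAL-logged for hot standby
    DISCARD TEMP;                  -- drops all session temp tables
    COMMIT;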

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Postgres Replaying WAL slowly

2014-07-02 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-07-01 15:20:37 -0400, Tom Lane wrote:
 It seems like there are three, not mutually exclusive, ways we might
 address this:

 But I think more importantly it's probably quite possible to hit a
 similar problem without ON COMMIT DROP relations. Say, DISCARD TEMP
 inside a transaction (with several subxacts) or so? So we probably
 really should fix the bad scaling.

Well, my thought was that these approaches would address somewhat
different sets of use-cases, and we might well want to do more than one.
Even if StandbyReleaseLocks were zero-cost, not emitting the WAL in the
first place is surely considerably cheaper yet.

regards, tom lane

