Hi Jeff and all,

On 23/05/15 22:13, Jeff Janes wrote:
> Are you sure it is the read IO that causes the problem?

Yes. Trouble is here that we are talking about a 361 GB table

                                   List of relations
 Schema |            Name             |   Type   |  Owner   |    Size    |
Description
--------+-----------------------------+----------+----------+------------+-------------
 public | *redacted*_y2015m04         | table    | postgres | 361 GB     |

and while we have

        shared_buffers = 325GB
        huge_pages = on

this is not the only table of this size (total db size ist 1.8tb) and more
current data got written to *redacted*_y2015m05 (the manually-partitioned table
for may), so most of the m04 data would have got evicted from the cache when
this issue surfaced initially.

There is one application pushing data (bulk inserts) and we have transaction
rates for this app in a log. The moment the vacuum started, these rates dropped.
Unfortunately I cannot present helpful log excerpts here as the autovacuum never
finished so far (because the admin killed the db), so we have zero logging about
past autovac events.

At the moment, the application is shut down and the machine is only running the
vacs:

query_start      | 2015-05-22 19:33:52.44334+02
waiting          | f
query            | autovacuum: VACUUM public.*redacted*_y2015m04 (to prevent
wraparound)
query_start      | 2015-05-22 19:34:02.46004+02
waiting          | f
query            | autovacuum: VACUUM ANALYZE public.*redacted*_y2015m05 (to
prevent wraparound)

so we know that any io must be caused by the vacs:

shell# uptime
 13:33:33 up 1 day, 18:01,  2 users,  load average: 5.75, 12.71, 8.43
shell# zpool iostat
                    capacity     operations    bandwidth
pool             alloc   free   read  write   read  write
---------------  -----  -----  -----  -----  -----  -----
tank1             358G  6.90T    872     55  15.1M  3.08M


Again, we know IO capacity is insufficient, the pool is on 2 magnetic disks only
atm, so an avg read rate of 872 IOPS averaged over 42 hours is not even bad.

> I don't know happened to that, but there is another patch waiting for review 
> and
> testing:
> 
> https://commitfest.postgresql.org/5/221/

This is really interesting, thank you very much for the pointer.

Cheers, Nils


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to