Re: [GENERAL] Efficiently Triggering Autovacuum Analyze?

2015-12-31 Thread Jim Nasby

On 12/30/15 1:31 PM, Joe Conway wrote:

> On 12/30/2015 11:09 AM, Cory Tucker wrote:
> With this scenario you can expect an autoanalyze every 5 million rows
> and autovacuum every 10 million. In my experience (and based on your
> description, yours as well) this is not often enough. Not only that,
> when it does run it runs longer than you would like, causing an I/O hit
> while it does.
>
> You probably should tune this table specifically, e.g.


Another option is to explicitly ANALYZE and then SELECT from the table after
you're done inserting into it. The advantage is that this doesn't tie up an
autovacuum worker, and you can ensure that the newly added tuples get
properly hinted.


You can run the ANALYZE immediately after your insert finishes. The 
reason to do that is to get up-to-date statistics for other queries to 
use. That can be particularly important if the new rows have values 
significantly outside what was in the table before. That's common with 
things like sequence IDs and timestamp data.
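
For example, a minimal sketch (assuming the "foo" table from the original
post, run in the same job right after the bulk load commits):

-- refresh planner statistics now instead of waiting for autovacuum
ANALYZE foo;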



The SELECT is a bit trickier; you want to ensure that there is no
transaction still running in the database that's older than the
transaction that added all the new data. You can check that by comparing
the xmin field of one of your new rows with
txid_snapshot_xmin(txid_current_snapshot()). Note that because of
wraparound you can't do a simple numeric comparison; once the xid counter
wraps, a txid of 3 is actually newer than one close to 2^32.
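
A rough sketch of that check (again assuming the example "foo" table; xmin is
a 32-bit xid while txid_snapshot_xmin() returns an epoch-extended bigint, so
this naive version reduces it modulo 2^32 and inherits the wraparound caveat
above):

-- xmin of the most recently inserted row vs. the oldest xid that any
-- still-open transaction can see
SELECT xmin::text::bigint AS newest_row_xid,
       txid_snapshot_xmin(txid_current_snapshot()) % 4294967296
         AS oldest_xid_still_visible
FROM foo
ORDER BY id DESC
LIMIT 1;

If oldest_xid_still_visible has moved past the row's xid, nothing older than
the loading transaction is still running and the hinting SELECT is safe.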


The whole point of this SELECT is to get the new tuples hinted while the 
pages are still hot in cache. If you don't do that, the next query that 
reads the tuple will have to set the hints, which also dirties the page. 
VACUUM does that too, but there's really no point in having vacuum run 
through the entire table just to set hints on less than 1% of it.
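
A sketch of that hinting pass, sticking with the same example table (12345
stands in for whatever account_id was just loaded):

-- touch only the freshly inserted rows so their hint bits get set while
-- the pages are still hot in cache
SELECT count(*) FROM foo WHERE account_id = 12345;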

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [GENERAL] Efficiently Triggering Autovacuum Analyze?

2015-12-30 Thread Tom Lane
Cory Tucker  writes:
> This table is almost always queried using a combination of (account_id,
> record_id) and is generally pretty fast.  However, under certain loads, the
> query becomes slower and slower as time goes on.  The workload that causes
> this to happen is when data for a new account_id is being inserted into the
> table.  This will happen in rapid succession and may insert millions of
> rows over the course of several hours.

Are those insertions happening in one enormous transaction, or even just
a few very large ones?

> The pattern that I notice when this happens is that the CPU on the DB will
> be pegged much higher than usual, and the query to look up records for the
> (account_id, record_id) combo will steadily rise from <1ms to more than 2
> or 3 seconds over time.

I'm suspicious that this is not autovacuum's fault but reflects the cost
of checking uncommitted tuples to see if they've become committed yet.
If so, there may be little you can do about it except break the insertion
into smaller transactions ... which might or might not be all right from
a data consistency standpoint.
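
A minimal sketch of what that batching might look like (hypothetical staging
table and batch column; how the split is driven depends entirely on the
loading code):

BEGIN;
INSERT INTO foo (record_id, account_id)
SELECT record_id, account_id FROM staging_foo WHERE batch_no = 1;
COMMIT;
-- then batch_no = 2, 3, ... each in its own transaction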

regards, tom lane




Re: [GENERAL] Efficiently Triggering Autovacuum Analyze?

2015-12-30 Thread Cory Tucker
On Wed, Dec 30, 2015 at 11:20 AM Tom Lane  wrote:

> Cory Tucker  writes:
> > This table is almost always queried using a combination of (account_id,
> > record_id) and is generally pretty fast.  However, under certain loads,
> > the query becomes slower and slower as time goes on.  The workload that
> > causes this to happen is when data for a new account_id is being inserted
> > into the table.  This will happen in rapid succession and may insert
> > millions of rows over the course of several hours.
>
> Are those insertions happening in one enormous transaction, or even just
> a few very large ones?
>

No, one transaction per row insert.


>
> > The pattern that I notice when this happens is that the CPU on the DB
> > will be pegged much higher than usual, and the query to look up records
> > for the (account_id, record_id) combo will steadily rise from <1ms to
> > more than 2 or 3 seconds over time.
>
> I'm suspicious that this is not autovacuum's fault but reflects the cost
> of checking uncommitted tuples to see if they've become committed yet.
> If so, there may be little you can do about it except break the insertion
> into smaller transactions ... which might or might not be all right from
> a data consistency standpoint.
>
> regards, tom lane
>


[GENERAL] Efficiently Triggering Autovacuum Analyze?

2015-12-30 Thread Cory Tucker
We have a performance problem accessing one of our tables, I think because
the statistics are out of date.  The table is fairly large, on the order of
100M rows or so.
The general structure of the table is as follows:

   Column   |  Type  |                     Modifiers
------------+--------+---------------------------------------------------
 id         | bigint | not null default nextval('foo_id_seq'::regclass)
 record_id  | text   |
 account_id | bigint | not null

With indexes:
    "foo_pkey" PRIMARY KEY, btree (id)
    "uq_account_id_record_id" UNIQUE CONSTRAINT, btree (account_id, record_id)


This table is almost always queried using a combination of (account_id,
record_id) and is generally pretty fast.  However, under certain loads, the
query becomes slower and slower as time goes on.  The workload that causes
this to happen is when data for a new account_id is being inserted into the
table.  This will happen in rapid succession and may insert millions of
rows over the course of several hours.

The pattern that I notice when this happens is that the CPU on the DB will be
pegged much higher than usual, and the query to look up records for the
(account_id, record_id) combo will steadily rise from <1ms to more than 2
or 3 seconds over time.

The fix I have employed to restore the speed of the query after I notice it
is happening is to manually issue a VACUUM ANALYZE on the table.  After the
analyze is done, the query returns to its normal speed.

I am looking for suggestions for how to tune, or perhaps automatically
detect this pattern, so that I don't have to manually intervene whenever
this happens.

Here are my autovacuum settings:

                name                 | setting | unit
-------------------------------------+---------+------
 autovacuum                          | on      |
 autovacuum_analyze_scale_factor     | 0.05    |
 autovacuum_analyze_threshold        | 50      |
 autovacuum_freeze_max_age           | 2       |
 autovacuum_max_workers              | 3       |
 autovacuum_multixact_freeze_max_age | 4       |
 autovacuum_naptime                  | 30      | s
 autovacuum_vacuum_cost_delay        | 20      | ms
 autovacuum_vacuum_cost_limit        | -1      |
 autovacuum_vacuum_scale_factor      | 0.1     |
 autovacuum_vacuum_threshold         | 50      |
 autovacuum_work_mem                 | -1      | kB

We're using PostgreSQL 9.4.4 (RDS).


Re: [GENERAL] Efficiently Triggering Autovacuum Analyze?

2015-12-30 Thread Joe Conway
On 12/30/2015 11:09 AM, Cory Tucker wrote:
> We have a performance problem accessing one of our tables, I think
> because the statistics are out of date.  The table is fairly large, on
> the order of 100M rows or so. 

> The fix I have employed to restore the speed of the query after I notice
> it is happening is to manually issue a VACUUM ANALYZE on the table. 
> After the analyze is done, the query returns to its normal speed.

>  autovacuum_analyze_scale_factor | 0.05  |
>  autovacuum_analyze_threshold| 50|
>  autovacuum_vacuum_scale_factor  | 0.1   |
>  autovacuum_vacuum_threshold | 50|

With this scenario you can expect an autoanalyze every 5 million rows
and autovacuum every 10 million. In my experience (and based on your
description, yours as well) this is not often enough. Not only that,
when it does run it runs longer than you would like, causing an I/O hit
while it does.
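
(Those figures come from PostgreSQL's trigger formula, threshold +
scale_factor * reltuples: 50 + 0.05 * 100M ≈ 5M changed rows before an
autoanalyze, and 50 + 0.1 * 100M ≈ 10M dead rows before an autovacuum.)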

You probably should tune this table specifically, e.g.

ALTER TABLE foo SET (autovacuum_vacuum_threshold=100000,
                     autovacuum_vacuum_scale_factor=0);
ALTER TABLE foo SET (autovacuum_analyze_threshold=100000,
                     autovacuum_analyze_scale_factor=0);

That will cause autovacuum and autoanalyze to run every 100k records
changed (pick your own number here, but I have used this very
successfully in the past). This way not only will the table stay well
vacuumed and analyzed, but each run will also finish quickly and have
minimal impact.

HTH,

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


