Hi,

On 06/05/15 14:10, Naoya Anzai wrote:
Thank you for quick feedback, and I'm sorry for slow response.
All of your opinions were very helpful for me.

I have confirmed Greg's Idea "Timing events".
http://www.postgresql.org/message-id/509300f7.5000...@2ndquadrant.com

Greg said at first,
"Parsing log files for commonly needed performance data is no fun."
Yes, I completely agree with him.
>
That looks a nice idea but I don't know why this idea has
not been commited yet. Anybody knows?

Most likely lack of time, I guess.


I have reworked my idea since I heard dear hacker's opinions.

====================
pg_stat_vacuum view
====================

I understand it is not good to simply add more counters in
pg_stat_*_tables. For now, I'd like to suggest an extension
which can confirm vacuum statistics like pg_stat_statements.

I don't see how you want to collect the necessary information from an
extension? pg_stat_statements get most of the stats from BufferUsage structure, but vacuum keeps all this internal, AFAIK.

So it'd be necessary to make this somehow public - either by creating something like BufferUsage with all the vacuum stats, or perhaps a set of callbacks (either a single log_relation_vacuum or different callbacks for tables and indexs).

IMHO the callbacks are a better idea - for example because it naturally handles database-wide vacuum. The global structure makes this difficult, because you'll only see data for all the vacuumed objects (or it'd have to track per-object stats internally, somehow).

VACUUM is a most important feature in PostgreSQL, but a
special view for vacuum does not exist. Don't you think
the fact is inconvenience? At least, I am disgruntled with
that we need to parse pg_log for tune VACUUM.

+1

My first design of pg_stat_vacuum view is following.
(There are two views.)

pg_stat_vacuum_table
---------------
dbid
schemaname
relid
relname
elapsed
page_removed
page_remain
page_skipped
tuple_removed
tuple_remain
tuple_notremovable
buffer_hit
buffer_miss
buffer_dirty
avg_read
avg_write
vm_count
vac_start
vac_end
is_autovacuum

pg_stat_vacuum_index
---------------
dbid
shemaname
relid
indexrelid
indexname
elapsed
num_index_tuples
num_pages
tuples_removed
pages_deleted
pages_free
is_autovacuum

At present, I think memory design of pg_stat_statements can
divert into this feature.And I think this module needs to
prepare following parameters like pg_stat_statements.

I'm not really sure about this.

Firstly, the very fist response from TL in this thread was that adding per-table counters is not a particularly good idea, as it'll bloat the statistics files. It's true you're not adding the data into the main stats files, but you effectively establish a new 'vertical partition' with one record per table/index. It might be worth the overhead, if it really brings useful functionality (especially if it's opt-in feature, like pg_stat_statements).

Secondly, the main issue of this design IMHO is that it only tracks the very last vacuum run (or do I understand it wrong?). That means even if you snapshot the pg_stat_vacuum views, you'll not know how many vacuums executed in between (and the more frequently you snapshot that, the greater the overhead). The other stats counters have the same issue, but the snapshotting works a bit better because the counters are cumulative (so you can easily do deltas etc.). But that's not the case here - certainly not with the timestamps, for example.

I don't think the vacuum start/end timestamps are particularly interesting, TBH - we already have them in pg_stat_all_tables anyway, including the vacuum_count etc. So I'd propose dropping the timestamps, possibly replacing them with a single 'elapsed time', and making all the counters cumulative (so that you can do snapshots and deltas).

I'm also wondering whether this should track the vacuum costs (because that determines how aggressive the vacuum is, and how much work will be done in a particular time), if it was anti-wraparound vacuum, if there was also ANALYZE performed, if the autovacuum was interrupted because of user activity, etc.

pg_stat_vacuum.max(integer)
pg_stat_vacuum.save(boolean)
pg_stat_vacuum.excluded_dbnames(text)
pg_stat_vacuum.excluded_schemas(text)
pg_stat_vacuum.min_duration(integer)
... and so on.

To implement this feature, I have to collect each vacuum-stats
every lazy_vacuum_* and I need to embed a hook function point
where needed. (probably last point of lazy_vacuum_rel).
Do you hesitate to add the hook only for this function?

Aha! So you plan to use the callbacks.


Similar feature has been already provided by pg_statsinfo package.
But it is a full-stack package for PG-stats and it needs to
redesign pg_log and design a repository database for introduce.
And it is not a core-extension for PostgreSQL.
(I don't intend to hate pg_statsinfo,
  I think this package is a very convinient tool)

Everyone will be able to do more easily tuning of VACUUM.
That's all I want.

I'm still wondering whether these stats will really make the tuning any easier. What I do right now is looking at pg_stat_all_tables.n_deat_tup and if it exceeds some threshold, it's a sign that vacuum may need a bit of tuning. Sometimes it really requires tuning vacuum itself, but more often than not it's due to something else (a large bulk delete, autovacuum getting stuck on another table, ...). I don't see how the new stats would make this any easier.

Can you give some examples on how the new stats might be used (and where the current stats are insufficient)? What use cases do you imagine for those stats?

It might help differentiate the autovacuum activity from the rest of the system (e.g. there's a lot of I/O going on - how much of that is coming from autovacuum workers?). This would however require a more fine-grained reporting, because often the vacuums run for a very long time, especially on very large tables (which is exactly the case when this might be handy) - I just had a VACUUM that ran for 12 hours. These jobs should report the stats incrementally, not just once at the very end, because that makes it rather useless IMNSHO.


--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to