(2013/11/19 12:03), Peter Geoghegan wrote:
On Mon, Nov 18, 2013 at 6:12 PM, KONDO Mitsumasa
<kondo.mitsum...@lab.ntt.co.jp> wrote:
I read the discussion between Itagaki-san and Mr Cerdic. He said that raw
values should be kept simple. However, were his changes really simple? I
cannot follow his sense of aesthetics, nor yours, I'm afraid :-(

It's too complicated. Do you know how to tune PostgreSQL from the local_*
and temp_* information?
At least, I think most users cannot tune anything from it, and only part
of it is likely to be useful.

All of those costs are cumulative aggregates. If we didn't aggregate
them, then the user couldn't possibly determine them on their own, to
any approximation. That's the difference. If you think the local_* and
temp_* aren't very useful, I'm inclined to agree, but it's too late to
do anything about that now.
I regret the past decision about Itagaki-san's patch, even if improving it now may not be possible. Still, we can change it if we have a logical reason to do so.

No. It is not only for geek tools and people with pre-packaged solutions at
big companies, but also for common DBA tools.

I don't think that the tool needs to be expensive. If selecting from
the pg_stat_statements view every 1-3 seconds is too expensive for
such a tool, we can have a discussion about being smarter, because
there certainly are ways to optimize it.
Now I can understand why you say my patch is heavy! Your monitoring method is radically heavy. In general, we fetch the pg_stat_statements view every 1-5 minutes, because monitoring SQL must not be heavier than the main application SQL, and when we measure real performance, the monitoring SQL's own cost must not distort the measurement. I still cannot understand why you want to collect such drastically detailed per-statement performance data. I only want to know the max, avg, and stddev of each statement. They are enough, because we can use them to find and improve the problem statements.
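The max/avg/stddev that I want can all be maintained with O(1) state per statement, for example with Welford's online algorithm. A minimal sketch (illustrative only; this is not the actual pg_stat_statements counters struct):

```python
# Sketch: per-statement running aggregates (count, mean, stddev, min, max)
# maintained with O(1) state per entry, using Welford's online algorithm.
import math

class StatementStats:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0            # sum of squared deviations from the mean
        self.min_time = math.inf
        self.max_time = -math.inf

    def add(self, elapsed):
        """Fold one execution time into the running aggregates."""
        self.n += 1
        delta = elapsed - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (elapsed - self.mean)
        self.min_time = min(self.min_time, elapsed)
        self.max_time = max(self.max_time, elapsed)

    def stddev(self):
        # Population standard deviation of the times seen so far.
        return math.sqrt(self.m2 / self.n) if self.n > 0 else 0.0

s = StatementStats()
for t in (1.0, 2.0, 4.0, 1.0):
    s.add(t)
print(s.mean, s.stddev(), s.min_time, s.max_time)
# -> 2.0 1.224744871391589 1.0 4.0
```

The update is a handful of arithmetic operations per execution, which is the kind of cost I had in mind for the counters struct.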

Regarding your min/max patch: I'm opposed to adding even more to the
spinlock-protected counters struct, so that we can get an exact answer
to a question where an approximate answer would very likely be good
enough. And as Itagaki-san said 4 years ago, who is to say that what
you've done here for buffers (or equally, what you've done in your
min/max patch) is more interesting than the same thing but for another
cost? The point of having what you've removed from the
pg_stat_statements docs about calculating averages is that it is an
example that can be generalized from. I certainly think there should
be better tooling to make displaying costs over time easier, or
characterizing the distribution, but unfortunately this isn't it.

Something like pg_stat_statements is allowed to be approximate. That's
considered an appropriate trade-off. Most obviously, today there can
be hash table collisions, and some of the entries can therefore be
plain wrong. Previously, I put the probability of 1 collision in the
hash table at about 1% when pg_stat_statements.max is set to 10,000.
So if your min/max patch was "implemented in userspace", and an
outlier is lost in the noise with just one second of activity, I'm not
terribly concerned about that. It's a trade-off, and if you don't
think it's the correct one, well then I'm afraid that's just where you
and I differ. As I've said many times, if you want to have a
discussion about making aggressive snapshotting of the
pg_stat_statements view more practical, I think that would be very
useful.
To summarize your comment: your patch is lower cost than my proposed patch, because mine has a long-lock problem while yours does not. Is that right? If so, I can understand your point in theory, but I am not sure it holds without real practice or a benchmark. So we need benchmark tests of both my patch and yours. I will try it.

By the way, MySQL and Oracle, which are very popular databases, have these
statistics. I am afraid your position might discourage people who want to
migrate from those databases, and might make those databases even more
popular.

I think that there should be better tooling built on top of
pg_stat_statements. I don't know what Oracle does, but I'm pretty sure
that MySQL has nothing like pg_stat_statements. Please correct me if
I'm mistaken.
I attended db tech showcase 2013, held in Japan last week. An Oracle speaker introduced the performance schema and related features in MySQL 5.6. Here is his slide deck. It is mostly in Japanese, but please look at it from page 31 onward; the examples are written in SQL.
http://www.slideshare.net/yoyamasaki/20131110-tuning-onmysql56

MySQL 5.6 exposes sum_time, min_time, avg_time, max_time, and sum_lock_time per statement. I think this is useful for understanding the statements we execute.

As I said on the min/max thread, if we're not
doing enough to help people who would like to build such a tool, we
should discuss how we can do better.

Could you tell me how to get min/max statistics at low cost?

See my previous comments on the other thread about making
pg_stat_statements only return changed entries, and only sending the
query text once.
OK.
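As I understand it, the userspace approach would look roughly like this: a monitoring tool snapshots the cumulative counters periodically and works with per-interval deltas. A sketch (the snapshots below are hard-coded dicts standing in for rows of queryid -> (calls, total_time); a real tool would SELECT them from pg_stat_statements, and "q1"/"q2" are hypothetical query IDs):

```python
# Sketch of the "implemented in userspace" approach: take two cumulative
# snapshots of (calls, total_time) per query and derive per-interval stats.
# Note that avg can always be recovered this way, but per-call min/max
# inside an interval cannot be recovered from cumulative totals alone.

def interval_stats(prev, curr):
    """Per-interval (calls, average time) from two cumulative snapshots."""
    out = {}
    for qid, (calls, total) in curr.items():
        p_calls, p_total = prev.get(qid, (0, 0.0))
        d_calls = calls - p_calls
        if d_calls > 0:
            out[qid] = (d_calls, (total - p_total) / d_calls)
    return out

snap1 = {"q1": (100, 50.0), "q2": (10, 200.0)}
snap2 = {"q1": (160, 80.0), "q2": (11, 230.0)}
print(interval_stats(snap1, snap2))
# -> {'q1': (60, 0.5), 'q2': (1, 30.0)}
```

Here the 30 ms outlier on q2 is visible only because the snapshot interval was short enough that it was the interval's only call; with my usual 1-5 minute interval it would be averaged away, which is exactly my worry.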

To repeat myself: if we want drastically detailed statistics, we have to
create a separate statistics tool, and your patch could become that tool.
pg_stat_statements itself, however, should stay lightweight.

This is incomprehensible. As with the cumulative aggregate costs, how
is a consumer of pg_stat_statements possibly going to get the full
query text from anywhere else? It *has* to come from
pg_stat_statements directly.
I think we need to settle this with a real test.
I will read your patch and run benchmarks with each of our patches.
However, I will be traveling on business from tomorrow until this weekend.
I'm very sorry, but please wait a little longer.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
