On 1 October 2012 08:57, Magnus Hagander <mag...@hagander.net> wrote:
> I know there was discussions about it earlier, and it wasn't done with
> an argument of it not being stable between releases (IIRC). I think we
> can live with that drawback, assuming of course that we document this
> properly.

Well, I'll point out once again that the argument about its stability is invalid, because we serialise the entries to disk. If a point release changes the representation of the query tree such that the hash values won't match, then we have no recourse but to bump pg_stat_statements version number, and invalidate all existing entries. I credit our users with the intelligence to not jump to any rash conclusions about the hash if directly exposed, such as assuming that it has any particular degree of stability with respect to the queries that are fingerprinted, or any degree greater than the self-evident bare minimum.

I'm pretty sure that the "stability among point releases in the face of potential minor changes to query tree representation" thing was something that I imagined as a reason for the proposal being rejected, when I tried to read between the lines of a flat rejection. Perhaps I should have asked for clarification on that point. Now that I think about it, I'm pretty sure that the need to bump catversion, as a result of any change in the way dumping the query tree struct into stored rules needs to happen, will preclude that problem in practice.

> I've now run into multiple customer installations where it would be
> very useful to have. The usecase is mainly storing snapshots of the
> pg_stat_statements output over time and analyzing those. Weird things
> happen for example when the query text is the same, but the hash is
> different (which can happen for example when a table is dropped and
> recreated). And even without that, in order to do anything useful with
> it, you end up hashing the query text anyway - so using the already
> existing hash would be easier and more useful.

Yes, these are all arguments that I'm familiar with :-).

I've always thought of pg_stat_statements as a low-level statistical view that people would naturally want to store snapshots of for analysis, in much the same way as many do now with things like pg_stat_bgwriter using tools like Munin. Who wouldn't want to know what queries were running a half an hour ago when the database server seemed slower than usual, for example? Such tools should naturally have access to the same "candidate key" for entries, rather than adding a subtle impedance mismatch by using the string.

That reminds me - when are you writing the pg_stat_statements Postgres plugin for Munin?

I was disappointed that my proposal was shot down, despite the fact that I independently raised it on list at least twice, and pushed as hard as I felt that I could at the time.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services