Hello All, I am trying to revive the discussion about exposing queryid in pg_stat_statements.
I did find the same request posted on hackers @ http://www.postgresql.org/message-id/CABUevExLnb6xJwS=8rTaLOfDOS-tFm09==z2m_vz5hhfkgw...@mail.gmail.com and http://www.postgresql.org/message-id/cacn56+nlmtwhg8eqqqnyzqe2q0negjokmgfiusk_aohw627...@mail.gmail.com From the discussions I concluded 1. The main use case for exposing queryid, is it being a better substitute to hashing the query text of a pg_stat_statements snapshot, to make a candidate key. Problems occur when hash value should be different even if query text is same. For example when a table referred in a query is dropped and recreated or when the query text is same on different schemas and schema name is not included in query text. 2. Exposing queryid was proposed earlier but was not accepted. The main reason was that queryid could be unstable as well. Since queryid was derived from hashing query tree and query tree could undergo changes between minor PostgreSQL releases, meant the queryid for same query could be different between releases, resulting in incorrect statement statistics collection. 3. Another problem is to distinguish between queries whose statistics are continuously maintained and queries which are intermittent, whose statistics might be silent reset, without the reporting tool being wiser. 4. A solution to avoid misrepresentation of intermittent queries as consistent queries would be to assign a unique number to each new row and once that row is discarded, the unique number cannot be reused. The drawbacks here is possible collision of unique values generated. 5. A patch implementing solution for identifying intermittent query is @ https://github.com/fdr/postgres/branches/error-prop-pg_stat_statements-v2. The solution avoids using a counter, and achieves the same result by the property that intermittent queries accumulate errors due to eviction from hashtable while consistent queries do not. Error accumulation would be the parameter by which a reporting tool can figure out if there was eviction of queries between snapshots. 6. To address the problem of unstable queryid generated from query tree, it was proposed to eliminate any possible misunderstanding that queryid will remain the same between releases, by xoring the hash from query tree with statistics sessionid. This also helps in all cases where the statistics file is reset like crash recovery,recovery mode, ensuring a new hash value for reset statistics. To avoid increasing the chance of collision, a longer session key and padding the queryid can be done to complete the XOR. Implementation of this is @ https://github.com/fdr/postgres/branches/pg_stat_statements-identification-v3 7. The patch pg_stat_statements-identification-v3 was returned with feedback for more documentation in commitfest 2013-01. Questions: 1. Is there a plan to re-introduce this patch? The code seems to be documented. 2. There was mention of further testing of error propagation using hooks. Could this be elaborated? 3. There was a use case that exposing queryid could be used to aggregate statistics across WAL based replication clusters. But now that queryid is derived from statistics session id, which is randomly generated, this use case is still not addressed. Is this correct? Regards Sameer