Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries

Fabien COELHO Wed, 21 Dec 2016 00:30:07 -0800


Hello Robert & Kyotaro,

I'm a little doubtful about your proposed fix. However, I haven'tstudied the code, so I don't know what other approach might be better.


That is indeed the question!

The key point is that the parser parses several statements from a string,but currently there is no clue about where the queries was found in thestring. Only a parser may know about what is being parsed so can generatethe location information. Now the possible solutions I see are:


 - the string is split per query before parsing, but this requires at
   least a lexer... it looks pretty uneffective to have another lexing
   phase involved, even if an existing lexer is reused.

 - the parser processes one query at a time and keep the "remaining
   unparse string" in some state that can be queried to check up to
   where it proceeded, but then the result must be stored somewhere
   and it would make sense that it would be in the statement anyway,
   just the management of the location information would be outside
   the parser. Also that would add the cost of relaunching the parser,
   not sure how bad or insignificant this is. This is (2) below.

 - the parser still generates a List<Node> but keep track of the location
   of statements doing so, somehow... propably as I outlined.

The query string information can be at some way of pointing in the initial
string, or the substring itself that could be extracted at some point.

I initially suggested the former because this is already what the parserdoes for some nodes, and because it seems simpler to do so.


Extracting the string instead would suggest that the location of tokens

within this statement are relative to this string rather than the initialone, but that involves a lot more changes and it is easier to misssomething doing so.

That will work and doable, but the more fundamental issue here
seems to be that post_parse_analyze_hook or other similar hooks
are called with a Query incosistent with query_debug_string.


Sure.

It is very conveniently used but the name seems to me to be suggestingthat such usage is out of purpose. I'm not sure, though.
A similar behavior is shown in error messages but this harms far
less than the case of pg_stat_statements.

ERROR:  column "b" does not exist
LINE 1: ...elect * from t where a = 1; select * from t where b = 2; com...
                                                            ^

Ah, I wrote this piece of code a long time ago:-) The location is relativeto the full string, see comment above, changing that would involve much

more changes, and I'm not sure whether it is desirable.

Also, think of:

SELECT * FROM foo; DROP TABLE foo; SELECT * FROM foo;

Maybe the context to know which "SELECT * FROM foo" generates an errorshould be kept.

1. Let all of the parse node have location in
 debug_query_string. (The original proposal)


Yep.

2. Let the parser return once for each statement (like psql
  parser)

I'm not sure it does... "\;" generates ";" in the output and the psqllexer keeps on lexing.

and corresponding query string is stored in a
  varialble other than debug_query_string.

I think that would involve many changes because of the way postgres iswritten, the list is expected and returned by quite a few functions.Moreover query rewriting may generate several queries out of one anyway,so the list would be kept.

So I'm rather still in favor of my initial proposal, that is extend theexisting location information to statements, not only some tokens.


--
Fabien.


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries

Reply via email to