Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
Yes. I'll try to put together a patch and submit it to the next CF. Here it is. I'll add this to the next CF. Oops... better without a stupid overflow. Shame on me! -- Fabien.diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out index 3573c19..5a1b227 100644 --- a/contrib/pg_stat_statements/expected/pg_stat_statements.out +++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out @@ -1,21 +1,346 @@ CREATE EXTENSION pg_stat_statements; -CREATE TABLE test (a int, b char(20)); --- test the basic functionality of pg_stat_statements +-- +-- +-- simple and compound statements +-- +SET pg_stat_statements.track_utility = FALSE; SELECT pg_stat_statements_reset(); pg_stat_statements_reset -- (1 row) +SELECT 1 AS "int"; + int +- + 1 +(1 row) + +SELECT 'hello' + -- multiline + AS "text"; + text +--- + hello +(1 row) + +SELECT 'world' AS "text"; + text +--- + world +(1 row) + +-- transaction +BEGIN; +SELECT 1 AS "int"; + int +- + 1 +(1 row) + +SELECT 'hello' AS "text"; + text +--- + hello +(1 row) + +COMMIT; +-- compound transaction +BEGIN \; +SELECT 2.0 AS "float" \; +SELECT 'world' AS "text" \; +COMMIT; +-- compound with empty statements and spurious leading spacing +\;\; SELECT 3 + 3 \;\;\; SELECT ' ' || ' !' \;\; SELECT 1 + 4 \;; + ?column? +-- +5 +(1 row) + +-- non ;-terminated statements +SELECT 1 + 1 + 1 AS "add" \gset +SELECT :add + 1 + 1 AS "add" \; +SELECT :add + 1 + 1 AS "add" \gset +-- set operator +SELECT 1 AS i UNION SELECT 2 ORDER BY i; + i +--- + 1 + 2 +(2 rows) + +-- cte +WITH t(f) AS ( + VALUES (1.0), (2.0) +) + SELECT f FROM t ORDER BY f; + f +- + 1.0 + 2.0 +(2 rows) + +SELECT query, calls, rows FROM pg_stat_statements ORDER BY rows, query; + query | calls | rows +-+---+-- + SELECT ? || ? | 1 |1 + SELECT ? AS "float" | 1 |1 + SELECT pg_stat_statements_reset() | 1 |1 + SELECT ? + ?| 2 |2 + SELECT ? AS "int" | 2 |2 + SELECT ? AS i UNION SELECT ? ORDER BY i | 1 |2 + WITH t(f) AS ( +| 1 |2 + VALUES (?), (?) +| | + ) +| | + SELECT f FROM t ORDER BY f| | + SELECT ? + ? + ? AS "add" | 3 |3 + SELECT ? +| 4 |4 ++| | + AS "text" | | +(9 rows) + +-- +-- +-- CRUD: INSERT SELECT UPDATE DELETE on test table +-- +SELECT pg_stat_statements_reset(); + pg_stat_statements_reset +-- + +(1 row) + +-- utility "create table" must not show +CREATE TABLE test (a int, b char(20)); INSERT INTO test VALUES(generate_series(1, 10), 'aaa'); -UPDATE test SET b = 'bbb' WHERE a > 5; -SELECT query, calls, rows from pg_stat_statements ORDER BY rows; - query| calls | rows -+---+-- - SELECT pg_stat_statements_reset(); | 1 |1 - UPDATE test SET b = ? WHERE a > ?; | 1 |5 - INSERT INTO test VALUES(generate_series(?, ?), ?); | 1 | 10 +UPDATE test SET b = 'bbb' WHERE a > 7; +DELETE FROM test WHERE a > 9; +-- explicit transaction +BEGIN; +UPDATE test SET b = '111' WHERE a = 1 ; +COMMIT; +BEGIN \; +UPDATE test SET b = '222' WHERE a = 2 \; +COMMIT ; +UPDATE test SET b = '333' WHERE a = 3 \; +UPDATE test SET b = '444' WHERE a = 4 ; +BEGIN \; +UPDATE test SET b = '555' WHERE a = 5 \; +UPDATE test SET b = '666' WHERE a = 6 \; +COMMIT ; +-- SELECT with constants +SELECT * FROM test WHERE a > 5 ORDER BY a ; + a | b +---+-- + 6 | 666 + 7 | aaa + 8 | bbb + 9 | bbb +(4 rows) + +SELECT * + FROM test + WHERE a > 9 + ORDER BY a ; + a | b +---+--- +(0 rows) + +-- SELECT without constants +SELECT * FROM test ORDER BY a; + a | b +---+-- + 1 | 111 + 2 | 222 + 3 | 333 + 4 | 444 + 5 | 555 + 6 | 666 + 7 | aaa + 8 | bbb + 9 | bbb +(9 rows) + +SELECT query, calls, rows FROM pg_stat_statements ORDER BY rows, query; + query | calls | rows +---+---+-- + DELETE FROM test WHERE a > ? | 1 |1 + SELECT pg_stat_statements_reset() | 1 |1 + UPDATE test
Re: [HACKERS] Cluster wide option to control symbol case folding
On Saturday, December 24, 2016 10:49 PM Tom Lane [mailto:t...@sss.pgh.pa.us] wrote: > No. This has been looked into repeatedly in the past, and we simply don't want to deal with it. Fair enough. We will not pursue the issue then. That is why I asked. Ian Lewis (www.mstarlabs.com) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cluster wide option to control symbol case folding
On December 24, 2016 9:52 PM Craig Ringer [mailto:craig.rin...@2ndquadrant.com] wrote: > Personally I can see such an option being ok as an initdb-time setting or at CREATE DATABASE time. Case folding can know the current db from global context. > > It'd have to be fast though. Very fast. That seems reasonable. In fact, it is nicer to configure at the database level, rather than at the cluster level. I did not know it was possible to tell the database from global context. Since a connection only allows access to a single database, it makes sense that could be possible. A check of a single global configuration variable to switch between one of three handler paths would be very small compared with the character-by-character checks currently performed by downcase_identifier(), though, of course, the extra check would not be free. However, it would likely be faster to setup up the processing to call through a global pointer to one of three handler functions. On most Intel processors, at least, that extra pointer indirection costs little to nothing. The pointer could be set up during database connect (I do not know what I am talking about here, but there must be such a process somewhere). Presumably, the handler pointer would have to go into the global database descriptor whatever that is. Or, if you allow use of global objects for storing information about database scope run-time configuration, it could just be a function pointer stored with the handlers. The database initialization processing could call a setup function when it runs to select the correct handling for its configuration. The default would be the current downcase_identifier() handling. Does this seem like an approach that would meet your "Very fast" requirement? Ian Lewis (www.mstarlabs.com)
Re: [HACKERS] Cluster wide option to control symbol case folding
"Lewis, Ian \(Microstar Laboratories\)"writes: > Is there any chance that the PostgreSQL developers would accept a new > cluster wide configuration option to control how the system handles > symbol case folding? No. This has been looked into repeatedly in the past, and we simply don't want to deal with it. Quite aside from the impact on the server (which would be extensive), it would break every nontrivial application, and force them all to try to deal with each possible folding behavior. The more behaviors there are, the worse that gets. Pretty soon, every application is double-quoting every identifier out of sheer paranoia. Which is exactly the behavior you say you'd rather avoid --- but not only have you not avoided it, you've forced the entire Postgres ecosystem into it. The closest precedent that ever actually got into the server was the transaction-autocommit option that existed for awhile circa PG 7.3. Once we realized just how much complexity we were forcing on every application, we took that out again. If somehow we forgot that episode and agreed to take a case-folding behavioral change, I'm pretty sure the same dynamics would play out again. There's a lot of material on this in the archives. The latest substantive discussion I can find, which includes links to several previous investigations, is here: https://www.postgresql.org/message-id/flat/200807081925.40467.peter_e%40gmx.net regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cluster wide option to control symbol case folding
On 25 Dec. 2016 10:30 am, "Lewis, Ian (Microstar Laboratories)" < ile...@mstarlabs.com> wrote: Is there any chance that the PostgreSQL developers would accept a new cluster wide configuration option to control how the system handles symbol case folding? Probably not as a GUC (configuration option, like in PostgreSQL.conf). There's a somewhat established principle that GUCs should not change query semantics. Personally I can see such an option being ok as an initdb-time setting or at CREATE DATABASE time. Case folding can know the current db from global context. It'd have to be fast though. Very fast.
[HACKERS] Cluster wide option to control symbol case folding
Is there any chance that the PostgreSQL developers would accept a new cluster wide configuration option to control how the system handles symbol case folding? Currently PostgreSQL folds all un-quoted symbols to lower case. We would like to add a global configuration option with a name like symbol_casefold with settings to allow folding symbols to lc - lower case (default) uc - upper case none - no case folding USE CASE 1 The option we actually want for our own use is "none". Currently we have several large bodies of code that work with an SQL Anywhere backend. This server preserves case and - by a configuration option - performs case insensitive lookup on all symbols. We are moving to PostgreSQL for internal applications. We are also considering using PostgreSQL for the storage backend in a product. We use camel case for symbol names in all of our application and backend code (largely written in C++, Object Pascal, and Python). In a language like Pascal that ignores case, we still maintain consistent use of case in symbol names for the sake of human readers of the code. Where we make a connection from application code to a storage backend we would like to use exactly the same symbol name - including case - for the symbol in the backend as we use in the corresponding symbol in application code. For example, if we intend to read a value into a variable called FirstName we would like the corresponding field in the database to also be FirstName. The main reason we want this exact match is so that a human reader sees exactly the same thing in the two places. This makes it easier to see the connection between the two bodies of code. However, in places we also automate the connection between client symbols and server symbols, and in such a case it is useful, though certainly not necessary, to have an exact match. While we do not do so at present, where we use a database purely through code, we can likely add double quotes around all symbols in our generated SQL, which means we can have an exact match as we want under PostgreSQL as it currently behaves. However, in many cases we also access the same database backend through third party tools and ad hoc queries. If we double quote all of our symbols, which is what we have tried to do in our initial tests, then all such tools and manually written queries must also double quote all symbols. We use several tools and libraries that do not appear to have any way to properly quote symbols when they are obtained automatically from the schema. So, for example, if our reporting tool retrieves a field called FirstName from the database schema, it happily uses that name in its internally generated SQL. And, on PostgreSQL, of course this fails because FirstName in script is firstname at the server, which is not defined if we have double quoted the field name in the table definition. Since we do not have the code, we cannot work around this in any easy way. For ad hoc queries we can double quote all symbol references. But, this makes the queries noticeably harder for a person to read. And, I do not believe this is just a matter of "getting used to it". The quotes clutter the script, and that clutter makes the script fundamentally harder to understand. Maybe this is small, but anything that makes comprehension harder is a bad thing. Queries can be hard enough to understand without extra syntactic clutter. So, these considerations leave us the option of never quoting symbols when using PostgreSQL. This works everywhere we have tried it. But, it is pretty unattractive from the point of view of looking at the symbol names in the backend if they are to match exactly the symbol names we use in code. Using a different naming convention than we use everywhere else in our code (underscore separated all lower case symbol names, say) is not appealing either. We are very consistent in our symbol name handling in our code, and breaking our conventions in some relatively large section of our code is very unattractive. In addition, in many places our tools and code use schema supplied field names to form column titles in a table or in a caption on an edit box (etc.). The mixed case names are much nicer for this purpose than the folded names. For our purposes, these are our arguments for wanting control of how the server folds case. USE CASE 2 Even though we have no use for it, I have included the option "uc" because, in trying to determine whether PostgreSQL could support our desired behavior, I found a fairly large number of people who are coming from a different backend, such as Oracle (from what I see on the internet - no personal experience), that case fold similarly to PostgreSQL. However, instead of folding to lower case, it appears a number of other database servers fold to upper case. This leaves people who are moving from these other database systems with problems in their own code that they have to patch up to be able to make the port
Re: [HACKERS] pg_stat_activity.waiting_start
> This is not an easy problem. See our most recent discussion at > https://www.postgresql.org/message-id/flat/31856.1400021891%40sss.pgh.pa.us Thanks for the small test program. I tested it on my MacBook Pro and gettimeofday() was way faster than time(). The clock_gettime() used by the patch clock_gettime_1.patch in the mailing thread from Hari Babu apparently doesn't work on OS X. Instead, I tested the OS X specific mach_absolute_time() which was the fastest: gcc -Wall -O2 -o time-timing-calls -DUSE_MACH_ABSOLUTE_TIME time-timing-calls.c time ./time-timing-calls real 0m16.806s user 0m16.781s sys 0m0.012s gcc -Wall -O2 -o time-timing-calls -DUSE_GETTIMEOFDAY time-timing-calls.c time ./time-timing-calls real 0m35.466s user 0m35.062s sys 0m0.393s Code: #ifdef __MACH__ #include #endif #ifdef USE_MACH_ABSOLUTE_TIME uint64_t tv; tv = mach_absolute_time(); #endif > I'm prepared to consider an argument that wait timing might have weaker > requirements than EXPLAIN ANALYZE (which certainly needs to measure short > durations) but you didn't actually make that argument. I can see why timing overload is a problem in EXPLAIN ANALYZE and at other places, and that would of course be a great thing to fix. However, I'm not sure I fully understand how it can be that much of a problem in pgstat_report_wait_start()? As far as I can tell from reading the source code, it only appears pgstat_report_wait_start() is only entered when a process is waiting? Is it not likely the time spent waiting will vastly exceed the amount of extra time for the gettimeofday() call? Is it really a typical real-life scenario that processes can be waiting extremely often for extremely short periods of time, where the timing overhead would be significant? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_activity.waiting_start
Joel Jacobsonwrites: > On Sat, Dec 24, 2016 at 9:00 AM, Tom Lane wrote: >> The difficulty with that is it'd require a gettimeofday() call for >> every wait start. > I don't think we need the microsecond resolution provided by > gettimeofday() via GetCurrentTimestamp() > It would be enough to know which second the waiting started, so we > could use time(). For some use-cases, perhaps ... > gettimeofday() takes 42 cycles. > time() only takes 3 cycles. [1] ... on some platforms, perhaps, back in 2011. My own quick testing flatly contradicts that. On Linux (RHEL6) x86_64, I see gettimeofday() taking about 40ns (or circa 100 CPU cycles @ 2.4GHz), time() only a shade better at 38ns. On macOS x86_64 2.7GHz, I see gettimeofday() taking about 37ns, time() way behind at 178ns. I think we'd need at least an order of magnitude cheaper to consider putting timing calls into spinlock or lwlock paths, and that's just not available at all, let alone portably. This is not an easy problem. See our most recent discussion at https://www.postgresql.org/message-id/flat/31856.1400021891%40sss.pgh.pa.us I'm prepared to consider an argument that wait timing might have weaker requirements than EXPLAIN ANALYZE (which certainly needs to measure short durations) but you didn't actually make that argument. regards, tom lane Linux: $ gcc -Wall -O2 -o time-timing-calls time-timing-calls.c $ time ./time-timing-calls real0m37.678s user0m37.685s sys 0m0.002s $ gcc -Wall -O2 -o time-timing-calls -DUSE_GT time-timing-calls.c $ time ./time-timing-calls real0m39.964s user0m39.971s sys 0m0.001s macOS: $ gcc -Wall -O2 -o time-timing-calls time-timing-calls.c $ time ./time-timing-calls real2m58.683s user2m58.515s sys 0m0.131s $ gcc -Wall -O2 -o time-timing-calls -DUSE_GT time-timing-calls.c $ time ./time-timing-calls real0m37.004s user0m36.993s sys 0m0.006s #include #include #include int main(int argc, char **argv) { int i; for (i=0; i<10; i++) { #ifdef USE_GT struct timeval tv; gettimeofday(, NULL); #else time_t tv; tv = time(NULL); #endif } return 0; } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Index-only scan
Extremely sorry for the inconvenience caused, please find the attached file for the latest version of the patch. On Sat, Dec 24, 2016 at 1:41 AM, Robert Haaswrote: > On Fri, Dec 23, 2016 at 3:03 PM, Tom Lane wrote: > > Or in words of one syllable: please do not use nabble to post to the > > community mailing lists. > > Many of those words have two syllables, and one has four. > > Anyhow, I think three castigating emails from committers in a span of > 62 minutes is probably enough for the OP to get the point, unless > someone else REALLY feels the need to pile on. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Regards, Rafia Sabih EnterpriseDB: http://www.enterprisedb.com/ parallel_index_only_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: session server side variables
2016-12-24 12:19 GMT+01:00 Fabien COELHO: > > Hello Pavel, > > Hmmm. Switching role within a transaction. I never did need that... but >>> that is a use case. >>> >> >> Any application with security definer functions - depends on different >> communities - it is used sometimes strongly. >> > > Hmmm. So I understand that you would like to do something like: > > - call a secure function which sets a session variable with restricted > permissions > - do some things which cannot access or change the variable > - call another secure function which can access, update, remove the > variable... > > Probably we have different expectation from variables. I don't expect so >> variable can be changed by any rollback. >> > > Indeed, it seems that we do not have the same expectations. > > What is use case for transactional variables? I miss any experience - I >> wrote lot plpgsql lines and newer would it. >> > > Here are two use cases, which are neither good nor bad, but that I have in > mind when I'm argumenting. > > (1) First use case I'm thinking of is software update, with persistent > transactional variables, eg: > > -- let assume we have application_version = 1 > BEGIN; >-- lock things up >-- update application schema and data to version 2 >-- set application_version = 2 >-- unlock things > COMMIT; > > I would not want the application_version to remain at 2 if the COMMIT > fails, obviously. This is usually implemented with a one-row table, but > some kind of variable syntax could be quite elegant. For this use case, a > variable should be persistant, it does not it to be efficient, it should > have permissions and should be transactional. > > > (2) Second use case I'm thinking of is some kind of large batch management. > > -- variable batch_1_is_done = false > BEGIN; > -- try to do large batch 1... > -- set batch_1_is_done = true > COMMIT; > -- then test whether it worked, do some cleanup if not... > -- there are some discussions to get some \if in psql... > > For this second example, I would not like batch_is_done to be true if the > commit failed, but I do not think that any permissions would be useful, and > it would be fine if it is just accessible from a session only. > On server side you can use PLpgSQL and handling exception. On client side you can use technique used in MSSQL, where variables are not transactional too. BEGIN -- servar state 1 statement; -- servar state 2 statement; COMMIT We should to introduce client side session variable :STATUS \if eq(:STATUS, 'ok') ... > > > When I remove ACID, and allow only one value - then the implementation can >> be simple and fast - some next step can be support of expandable types. >> Sure - anybody can use temporary tables now and in future. But it is slow >> - >> more now, because we doesn't support global temporary tables. But ACID >> needs lot of CPU times, needs possible VACUUM, ... >> > > Yep, but if you need persistant and transactional then probably you can > accept less performant... > When you accept less performance, then you can use temporary tables. You can easy wrap it by few polymorphic functions. > No ACID variables are simple to implement, simple to directly accessible >> from any PL (although I am thinking about better support in 2nd phase for >> PLpgSQL). >> > > ACID may be simple to implement with some kind of underlying table, or > maybe a row in a table. How efficient it could be is another question, but > then if the feature does not allow some use cases, and it not so > interesting to have it. That is why I think that it is worth discussing > "silly" semantics and syntax. > > The namespace issue is unclear to me. Would a variable name clash with a > table name? It should if you want to be able write "SELECT stuff FROM > variablename", which may or may not be a good idea. It is based on history and experience - one fundamental issue of languages for stored procedures is a conflict of variables and SQL identifiers. When variables are based on pg_class, there are not possibility to any new conflict. More I can use a security related to schema - It is partial coverage of package variables. Regards Pavel > > -- > Fabien. >
Re: [HACKERS] proposal: session server side variables
Hello Pavel, Hmmm. Switching role within a transaction. I never did need that... but that is a use case. Any application with security definer functions - depends on different communities - it is used sometimes strongly. Hmmm. So I understand that you would like to do something like: - call a secure function which sets a session variable with restricted permissions - do some things which cannot access or change the variable - call another secure function which can access, update, remove the variable... Probably we have different expectation from variables. I don't expect so variable can be changed by any rollback. Indeed, it seems that we do not have the same expectations. What is use case for transactional variables? I miss any experience - I wrote lot plpgsql lines and newer would it. Here are two use cases, which are neither good nor bad, but that I have in mind when I'm argumenting. (1) First use case I'm thinking of is software update, with persistent transactional variables, eg: -- let assume we have application_version = 1 BEGIN; -- lock things up -- update application schema and data to version 2 -- set application_version = 2 -- unlock things COMMIT; I would not want the application_version to remain at 2 if the COMMIT fails, obviously. This is usually implemented with a one-row table, but some kind of variable syntax could be quite elegant. For this use case, a variable should be persistant, it does not it to be efficient, it should have permissions and should be transactional. (2) Second use case I'm thinking of is some kind of large batch management. -- variable batch_1_is_done = false BEGIN; -- try to do large batch 1... -- set batch_1_is_done = true COMMIT; -- then test whether it worked, do some cleanup if not... -- there are some discussions to get some \if in psql... For this second example, I would not like batch_is_done to be true if the commit failed, but I do not think that any permissions would be useful, and it would be fine if it is just accessible from a session only. When I remove ACID, and allow only one value - then the implementation can be simple and fast - some next step can be support of expandable types. Sure - anybody can use temporary tables now and in future. But it is slow - more now, because we doesn't support global temporary tables. But ACID needs lot of CPU times, needs possible VACUUM, ... Yep, but if you need persistant and transactional then probably you can accept less performant... No ACID variables are simple to implement, simple to directly accessible from any PL (although I am thinking about better support in 2nd phase for PLpgSQL). ACID may be simple to implement with some kind of underlying table, or maybe a row in a table. How efficient it could be is another question, but then if the feature does not allow some use cases, and it not so interesting to have it. That is why I think that it is worth discussing "silly" semantics and syntax. The namespace issue is unclear to me. Would a variable name clash with a table name? It should if you want to be able write "SELECT stuff FROM variablename", which may or may not be a good idea. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compiler warning
On Dec 24, 2016 01:21, "Bruce Momjian"wrote: I am seeing this compiler warning in the 9.4 branch: 9.4: basebackup.c:1284:6: warning: variable 'wait_result' set but not used [-Wunused-but-set-variable] This is on Debian Jessie with gcc version 4.9.2. It is from this commit: commit f6508827afe76b2c3735a9ce073620e708d60c79 Hi! This was already reported by Dean back on the thread on - committers, including one question still to be investigated. I plan to get back to it when I get back from Christmas holidays. /Magnus