Re: [HACKERS] WIP patch: add (PRE|POST)PROCESSOR options to COPY
From: Tom Lane [mailto:t...@sss.pgh.pa.us]

> I wrote:
>> Etsuro Fujita <fujita.ets...@lab.ntt.co.jp> writes:
>>> I have a question.  I think it would be also better to extend the syntax
>>> for the SQL COPY command in the same way, ie,
>>> COPY foo from '/home/pgsql/decompress.sh /home/pgsql/foo.csv.gz |' with format 'csv'
>> Yeah, sure --- that case is already superuser-only, so why not give it the
>> option of being a popen instead of just fopen.
> BTW, one thought that comes to mind is that such an operation is extremely
> likely to fail under environments such as SELinux.  That's not necessarily
> a reason not to do it, but we should be wary of promising that it will work
> everywhere.  Probably a documentation note about this would be enough.

OK, I'll revise the patch.  Thank you for your advice!

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
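The proposal amounts to overloading COPY's filename argument: a trailing '|' would make the server popen() the string as a shell command instead of fopen()ing it as a file. A minimal sketch of the intended usage (the decompression command and paths are illustrative, and this is the patch's proposed syntax, not a released feature):

```sql
-- Server-side COPY is already superuser-only; with the patch, the trailing
-- '|' marks the source as a command to run via popen() rather than a file.
COPY foo FROM '/usr/bin/gzip -dc /home/pgsql/foo.csv.gz |' WITH (FORMAT csv);
```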
Re: [HACKERS] [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed
On Tue, Sep 18, 2012 at 07:22:39PM -0400, Bruce Momjian wrote:
> Based on the fact that sql_features exists in the information_schema
> schema, I don't think the 'sql_features' table itself is being processed
> by pg_upgrade; rather, I think its TOAST table, because it has a high OID,
> is being processed because it is in the pg_toast schema.  This is causing
> the mismatch between the old and new clusters.  I am thinking this query
> needs to be split apart into a UNION where the second part handles TOAST
> tables and looks at the schema of the _owner_ of the TOAST table.  Needs
> to be backpatched too.

OK, I am at a conference now, so I will not be able to write up a patch
until perhaps next week.  You can drop the information schema in the old
database and pg_upgrade should run fine.  I will test your failure once I
create a patch.

One good thing is that pg_upgrade detected there was a bug in the code and
threw an error, rather than producing an inaccurate dump.

--
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
Re: [HACKERS] proposal - assign result of query to psql variable
On Fri, Aug 10, 2012 at 3:21 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> there is a new version of this patch
> * cleaned var list parser
> * new regress tests
> * support FETCH_COUNT > 0

Here are my review comments.

Submission
==========
The patch is formatted in context diff style, and it could be applied
cleanly against latest master.  This patch includes documentation and tests,
but IMO they need some enhancement.

Usability
=========
This patch provides a new psql command \gset which sends the content of the
query buffer to the server, and stores the result of the query into psql
variables.  The name \gset is a mixture of \g, which sends the result to a
file or pipe, and \set, which sets a variable to some value, so it would
sound natural to psql users.

Feature test
============
Compile completed without warning.  Regression tests for \gset passed, but
I have some comments on them.

- Other regression tests have a comment "-- ERROR" just after queries which
  should fail.  It would be nice to follow this manner.
- Typo "to few" in expected file and source file.
- How about adding a test for \gset with no variable list to "should fail"?
- Is it intentional that \gset can set special variables such as AUTOCOMMIT
  and HOST?  I don't see any downside to this behavior, because \set also
  can do that, but it is not documented nor tested at all.

Document
========
- Adding some description of the \gset command, especially about the
  limitation of the variable list, seems necessary.
- In addition to the meta-command section, the "Advanced features" section
  mentions how to set psql's variables, so we would need some mention there
  too.
- The term "target list" might not be familiar to users, since it appears
  mostly in sections describing PG internals.  I think that the feature
  described in the section "Retrieving Query Results" in the ECPG document
  is similar to this feature.
  http://www.postgresql.org/docs/devel/static/ecpg-variables.html

Coding
======
The code follows our coding conventions.  Here are comments on the coding.

- Some typos found in comments; please see the attached patch.
- There is a code path which doesn't print an error message even if libpq
  reported an error (PGRES_BAD_RESPONSE, PGRES_NONFATAL_ERROR,
  PGRES_FATAL_ERROR) in StoreQueryResult.  Is this intentional?  FYI, ecpg
  prints a "bad response" message for those errors.

Although I'll look at the code more closely later, for now I marked the
patch "Waiting on Author" for the comments above.

Regards,
--
Shigeru HANADA

diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 0e9b408..a76b84d 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -832,7 +832,7 @@ PrintQueryResults(PGresult *results)
  *
  * Note: Utility function for use by SendQuery() only.
  *
- * Returns true if the query executed sucessfully, false otherwise.
+ * Returns true if the query executed successfully, false otherwise.
  */
 static bool
 StoreQueryResult(PGresult *result)
@@ -865,7 +865,7 @@ StoreQueryResult(PGresult *result)
 	{
 		if (!iter)
 		{
-			psql_error("to few target variables\n");
+			psql_error("too few target variables\n");
 			success = false;
 			break;
 		}
@@ -902,7 +902,7 @@ StoreQueryResult(PGresult *result)

 		case PGRES_COPY_OUT:
 		case PGRES_COPY_IN:
-			psql_error("COPY isnot supported by \\gset command\n");
+			psql_error("COPY is not supported by \\gset command\n");
 			success = false;
 			break;
@@ -1797,7 +1797,7 @@ expand_tilde(char **filename)

 /*
- * Add name of internal variable to query targer list
+ * Add name of internal variable to query target list
  *
  */
 TargetList

diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index 90ab9bd..54fa490 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -15,7 +15,7 @@ select 10, 20, 'Hello World' -- should fail
 \gset ,,
 \gset ,
-to few target variables
+too few target variables
 \gset ,,,
 too many target variables
 -- should be ok
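From the regression-test fragment above, the patch's \gset appears to take a comma-separated list of target variables, one per output column. A hypothetical session with the patched psql (the variable names are illustrative, and this is the patch's proposed syntax, not the behavior of any released psql):

```
SELECT 10, 20, 'Hello World'
\gset x,y,msg
-- x, y and msg would now be psql variables usable as :x, :y and :msg
\echo :x :y :msg
```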
Re: [HACKERS] Reduce palloc's in numeric operations.
On 14.09.2012 11:25, Kyotaro HORIGUCHI wrote:
> Hello, I propose reducing the number of palloc's in numeric operations.
>
> The numeric operations are slow by nature, but usually it is not a problem
> for on-disk operations.  Although, the slowdown is more pronounced for
> in-memory operations.  I inspected them and found some very short-term
> pallocs.  These palloc's are used as temporary storage for the digits of
> unpacked numerics.  The formats of numeric digits in packed and unpacked
> forms are the same, so we can eliminate a portion of the palloc's by using
> the digits of the packed numeric in place to make the unpacked one.  In
> this patch, I added a new function set_var_from_num_nocopy() to do this,
> and make use of it for operands which won't be modified.

Have to be careful to really not modify the operands.  numeric_out() and
numeric_out_sci() are wrong; they call get_str_from_var(), which modifies
the argument.  Same with numeric_expr(): it passes the argument to
numericvar_to_double_no_overflow(), which passes it to get_str_from_var().
numericvar_to_int8() also modifies its argument, so all the functions that
use that, directly or indirectly, must make a copy.

Perhaps get_str_from_var(), and the other functions that currently scribble
on their arguments, should be modified to not do so.  They could easily make
a copy of the argument within the function.  Then the callers could safely
use set_var_from_num_nocopy().  The performance would be the same, with the
same number of pallocs, but you would get rid of the surprising
argument-modifying behavior of those functions.

> The performance gain seems quite moderate.  'SELECT SUM(numeric_column)
> FROM on_memory_table' for ten million rows of about 8-digit numerics runs
> for 3480 ms against the original 3930 ms, an 11% gain.  'SELECT
> SUM(int_column) FROM on_memory_table' needed 1570 ms.  Similarly, about an
> 8% gain for numerics of about 30 - 50 digits.  Performance of avg(numeric)
> showed no gain, in contrast.  Do you think this is worth doing?

Yes, I think this is worthwhile.
I'm seeing an even bigger gain with smaller numerics.  I created a table
with this:

CREATE TABLE numtest AS SELECT a::numeric AS col FROM generate_series(1, 1000) a;

And repeated this query with \timing:

SELECT SUM(col) FROM numtest;

The execution time of that query fell from about 5300 ms to 4300 ms, ie.
about 20%.

- Heikki
Re: [HACKERS] ToDo: allow to get a number of processed rows by COPY statement
On 16.08.2012 14:43, Pavel Stehule wrote:
> Hello
>
> here is updated patch

The patch seems to be truncated; it ends with:

*** a/src/test/regress/input/copy.source
--- b/src/test/regress/input/copy.source
***************
*** 106,108 ****
  this is just a line full of junk that would error out if parsed
--- 106,112 ----
  \.
  copy copytest3 to stdout csv header;
+
+ -- copy should to return processed rows
+ do $$
+

I believe the code changes are OK, but the regression test changes are
missing.  Can you resend the full patch, please?

- Heikki
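The truncated hunk adds a DO block to the test; presumably, with the patch, COPY's processed-row count becomes visible to PL/pgSQL through GET DIAGNOSTICS. A sketch of what such a test might look like (the table and file names are illustrative, and this is a guess at the missing test body, not the actual patch content):

```sql
DO $$
DECLARE
    n bigint;
BEGIN
    -- with the patch, COPY reports its processed-row count like other commands
    COPY copytest3 FROM '/tmp/copytest3.csv' WITH (FORMAT csv);
    GET DIAGNOSTICS n = ROW_COUNT;
    RAISE NOTICE 'COPY processed % rows', n;
END
$$;
```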
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
florian.schoppm...@emc.com (Florian Schoppmann) writes:
> In PostgreSQL 9.1 and 9.2 (possibly also in earlier versions), the query
>
> --8<--
> WITH source AS ( SELECT i FROM generate_series(1,10) AS i )
> SELECT i
> FROM source, ( SELECT count(*) AS _n FROM source ) AS _stats
> WHERE random() < 5::DOUBLE PRECISION/_n;
> --8<--
>
> [ doesn't do what you think it should ]

I can't get excited about this.  Any time you put a volatile function into
WHERE, you're playing with fire.  The docs warn against it:
http://www.postgresql.org/docs/9.2/static/sql-expressions.html#SYNTAX-EXPRESS-EVAL

To do what you want, I'd suggest wrapping the join into a sub-select with an
OFFSET 0 clause, which will serve as an optimization fence that prevents the
random() call from being pushed down.

			regards, tom lane
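Tom's suggested rewrite might look like the following sketch; the intent, assuming I read the advice right, is that the fenced sub-select keeps the planner from pushing the random() comparison down into the _stats subquery, so it is evaluated once per joined row:

```sql
WITH source AS ( SELECT i FROM generate_series(1,10) AS i )
SELECT i FROM (
    SELECT i, _n
    FROM source, ( SELECT count(*) AS _n FROM source ) AS _stats
    OFFSET 0  -- optimization fence
) AS sub
WHERE random() < 5::DOUBLE PRECISION/_n;
```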
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
On Wed, Sep 19, 2012 at 9:30 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> florian.schoppm...@emc.com (Florian Schoppmann) writes:
>> In PostgreSQL 9.1 and 9.2 (possibly also in earlier versions), the query
>>
>> --8<--
>> WITH source AS ( SELECT i FROM generate_series(1,10) AS i )
>> SELECT i
>> FROM source, ( SELECT count(*) AS _n FROM source ) AS _stats
>> WHERE random() < 5::DOUBLE PRECISION/_n;
>> --8<--
>>
>> [ doesn't do what you think it should ]
>
> I can't get excited about this.  Any time you put a volatile function into
> WHERE, you're playing with fire.  The docs warn against it:
> http://www.postgresql.org/docs/9.2/static/sql-expressions.html#SYNTAX-EXPRESS-EVAL
>
> To do what you want, I'd suggest wrapping the join into a sub-select with
> an OFFSET 0 clause, which will serve as an optimization fence that
> prevents the random() call from being pushed down.

I like the more standard CTE approach to optimization fencing where it works:

postgres=# WITH source AS ( SELECT i, random() r FROM generate_series(1,10) AS i )
SELECT i
FROM source, ( SELECT count(*) AS _n FROM source ) AS _stats
WHERE r < 5::DOUBLE PRECISION/_n;

merlin
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
On Wed, Sep 19, 2012 at 10:30 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> florian.schoppm...@emc.com (Florian Schoppmann) writes:
>> In PostgreSQL 9.1 and 9.2 (possibly also in earlier versions), the query
>>
>> --8<--
>> WITH source AS ( SELECT i FROM generate_series(1,10) AS i )
>> SELECT i
>> FROM source, ( SELECT count(*) AS _n FROM source ) AS _stats
>> WHERE random() < 5::DOUBLE PRECISION/_n;
>> --8<--
>>
>> [ doesn't do what you think it should ]
>
> I can't get excited about this.  Any time you put a volatile function into
> WHERE, you're playing with fire.  The docs warn against it:
> http://www.postgresql.org/docs/9.2/static/sql-expressions.html#SYNTAX-EXPRESS-EVAL
>
> To do what you want, I'd suggest wrapping the join into a sub-select with
> an OFFSET 0 clause, which will serve as an optimization fence that
> prevents the random() call from being pushed down.

You've repeatedly objected to complaints on pgsql-performance on the grounds
that WITH is an optimization fence.  It seems awfully inconsistent to turn
around and say, oh, sometimes it's not a fence after all.  It seems that
users may not rely on WITH either to do the optimizations necessary to have
good performance or to fail to do optimizations that lead to wrong results.
Ouch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
Robert Haas <robertmh...@gmail.com> writes:
> On Wed, Sep 19, 2012 at 10:30 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> To do what you want, I'd suggest wrapping the join into a sub-select with
>> an OFFSET 0 clause, which will serve as an optimization fence that
>> prevents the random() call from being pushed down.
> You've repeatedly objected to complaints on pgsql-performance on the
> grounds that WITH is an optimization fence.  It seems awfully inconsistent
> to turn around and say, oh, sometimes it's not a fence after all.

Huh?  The join in question is not inside a WITH.  If it were, that would
work too, as noted by Merlin.

			regards, tom lane
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
On Wed, Sep 19, 2012 at 12:34 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Wed, Sep 19, 2012 at 10:30 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> To do what you want, I'd suggest wrapping the join into a sub-select
>>> with an OFFSET 0 clause, which will serve as an optimization fence that
>>> prevents the random() call from being pushed down.
>> You've repeatedly objected to complaints on pgsql-performance on the
>> grounds that WITH is an optimization fence.  It seems awfully
>> inconsistent to turn around and say, oh, sometimes it's not a fence
>> after all.
> Huh?  The join in question is not inside a WITH.  If it were, that would
> work too, as noted by Merlin.

Oh, hmm.  I see now: the problem isn't that random() is being pushed into
the WITH, it's that it's being pushed into the join.  Sorry, I should have
read that more carefully.  It still seems like awfully weird behavior.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] Removal of AcceptInvalidationMessages broke things
I looked into bug #7557, which demonstrates a case where a session fails to
notice a just-committed change in table permissions.  This is pretty
obviously due to a failure to read the sinval message notifying other
backends of the pg_class.relacl update.  Some digging in the git history
says it got broken here:

commit 4240e429d0c2d889d0cda23c618f94e12c13ade7
Author: Robert Haas <rh...@postgresql.org>
Date:   Fri Jul 8 22:19:30 2011 -0400

    Try to acquire relation locks in RangeVarGetRelid.

    In the previous coding, we would look up a relation in RangeVarGetRelid,
    lock the resulting OID, and then AcceptInvalidationMessages().  While
    this was sufficient to ensure that we noticed any changes to the
    relation definition before building the relcache entry, it didn't
    handle the possibility that the name we looked up no longer referenced
    the same OID.  This was particularly problematic in the case where a
    table had been dropped and recreated: we'd latch on to the entry for
    the old relation and fail later on.  Now, we acquire the relation lock
    inside RangeVarGetRelid, and retry the name lookup if we notice that
    invalidation messages have been processed meanwhile.  Many operations
    that would previously have failed with an error in the presence of
    concurrent DDL will now succeed.

because that commit removed this bit from relation_openrv:

-	/*
-	 * Check for shared-cache-inval messages before trying to open the
-	 * relation.  This is needed to cover the case where the name identifies a
-	 * rel that has been dropped and recreated since the start of our
-	 * transaction: if we don't flush the old syscache entry then we'll latch
-	 * onto that entry and suffer an error when we do RelationIdGetRelation.
-	 * Note that relation_open does not need to do this, since a relation's
-	 * OID never changes.
-	 *
-	 * We skip this if asked for NoLock, on the assumption that the caller has
-	 * already ensured some appropriate lock is held.
-	 */
-	if (lockmode != NoLock)
-		AcceptInvalidationMessages();

and there are no other places where a transaction will do
AcceptInvalidationMessages if it thinks it already holds lock on the tables
it's working with.

Since we have only a few hours before 9.2.1 is due to wrap, my inclination
for a band-aid fix is to put back that code.  There might be some more
elegant answer, but we haven't got time to find it now.  Comments?

			regards, tom lane
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
Robert Haas <robertmh...@gmail.com> writes:
> It still seems like awfully weird behavior.

Why?  The WHERE condition relates only to the output of the _stats subquery,
so why shouldn't it be evaluated there, rather than after the join?

In a green field I might agree that we should de-optimize such cases, but
the problem with doing so is that it would totally destroy performance for
cases in which a user has defined a function that's actually stable or
immutable but they forgot to mark it so.  If VOLATILE weren't the default
marking, such a change wouldn't be so problematic ... but it is.  Given that
the behavior has been like this since the late stone age, I'm not inclined
to change it.

			regards, tom lane
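The mislabeling scenario Tom describes is easy to fall into, since CREATE FUNCTION defaults to VOLATILE; the function below is illustrative:

```sql
-- Really immutable, but defaults to VOLATILE because no volatility marking
-- was given; de-optimizing all VOLATILE functions would force this to be
-- re-evaluated for every row, with no caching, pre-evaluation, or pushdown.
CREATE FUNCTION add_one(x int) RETURNS int
    AS $$ SELECT x + 1 $$ LANGUAGE sql;

-- The declaration such a function should have had:
CREATE FUNCTION add_one_imm(x int) RETURNS int
    AS $$ SELECT x + 1 $$ LANGUAGE sql IMMUTABLE;
```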
Re: [HACKERS] Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
BTW, what should our advice be for recovering from corruption due to this
bug?  As far as the btree and GIN problems go, we can tell people that
REINDEX will fix it.  And in 9.1, you don't really need to worry about the
visibility map being bad.  But what do you do in 9.2, if you have a bad
visibility map?  Is there any fix short of VACUUM FULL?

			regards, tom lane
Re: [HACKERS] Removal of AcceptInvalidationMessages broke things
On Wed, Sep 19, 2012 at 01:17:17PM -0400, Tom Lane wrote:
> I looked into bug #7557, which demonstrates a case where a session fails
> to notice a just-committed change in table permissions.
>
> -	/*
> -	 * Check for shared-cache-inval messages before trying to open the
> -	 * relation.  This is needed to cover the case where the name identifies a
> -	 * rel that has been dropped and recreated since the start of our
> -	 * transaction: if we don't flush the old syscache entry then we'll latch
> -	 * onto that entry and suffer an error when we do RelationIdGetRelation.
> -	 * Note that relation_open does not need to do this, since a relation's
> -	 * OID never changes.
> -	 *
> -	 * We skip this if asked for NoLock, on the assumption that the caller has
> -	 * already ensured some appropriate lock is held.
> -	 */
> -	if (lockmode != NoLock)
> -		AcceptInvalidationMessages();
>
> and there are no other places where a transaction will do
> AcceptInvalidationMessages if it thinks it already holds lock on the
> tables it's working with.
>
> Since we have only a few hours before 9.2.1 is due to wrap, my inclination
> for a band-aid fix is to put back that code.  There might be some more
> elegant answer, but we haven't got time to find it now.

Sounds fine for now.  I suspect the better change would be to make
AcceptInvalidationMessages() unconditional in LockRelationOid() and friends.
There's no reason to desire recent ACLs only when opening by name.
Re: [HACKERS] Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
On 19 September 2012 18:47, Tom Lane <t...@sss.pgh.pa.us> wrote:
> BTW, what should our advice be for recovering from corruption due to this
> bug?  As far as the btree and GIN problems go, we can tell people that
> REINDEX will fix it.  And in 9.1, you don't really need to worry about the
> visibility map being bad.  But what do you do in 9.2, if you have a bad
> visibility map?  Is there any fix short of VACUUM FULL?

SET vacuum_freeze_table_age = 0;
VACUUM;

I'm writing the full notes out now.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removal of AcceptInvalidationMessages broke things
Noah Misch <n...@leadboat.com> writes:
> On Wed, Sep 19, 2012 at 01:17:17PM -0400, Tom Lane wrote:
>> Since we have only a few hours before 9.2.1 is due to wrap, my
>> inclination for a band-aid fix is to put back that code.  There might be
>> some more elegant answer, but we haven't got time to find it now.
> Sounds fine for now.  I suspect the better change would be to make
> AcceptInvalidationMessages() unconditional in LockRelationOid() and
> friends.  There's no reason to desire recent ACLs only when opening by
> name.

I think it's enough for now because the first access to a relation in a
statement is always a name-based lookup from the parser.  Were that not
sufficient, we'd have had complaints before.

The core problem really is that GRANT/REVOKE don't take any object-level
lock on what they're changing.  A real fix might require sprinkling
AcceptInvalidationMessages calls into aclchk.c, but I'm unsure of the
performance costs of that.  Anyway, today is not the time to design
something better.

			regards, tom lane
Re: [HACKERS] Removal of AcceptInvalidationMessages broke things
On Wed, Sep 19, 2012 at 2:17 PM, Noah Misch <n...@leadboat.com> wrote:
> Sounds fine for now.  I suspect the better change would be to make
> AcceptInvalidationMessages() unconditional in LockRelationOid() and
> friends.  There's no reason to desire recent ACLs only when opening by
> name.

I agree, on both counts.  I think I failed to realize when doing that
refactoring that I was reducing the number of cases where
AcceptInvalidationMessages() would actually be called.  AFAICS, the only
reason why we have such complicated rules for calling that function is that
it's traditionally been expensive.  But that should be much less true now
due to improvements in 9.2 (cf commit
b4fbe392f8ff6ff1a66b488eb7197eef9e1770a4).  So we can probably afford to be
a little less byzantine about the way we do this now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
On Wed, Sep 19, 2012 at 1:26 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> It still seems like awfully weird behavior.
> Why?  The WHERE condition relates only to the output of the _stats
> subquery, so why shouldn't it be evaluated there, rather than after the
> join?

Because my mental model (and apparently that of the OP) is that the WHERE
clause gets evaluated separately for each row.  Obviously in many cases that
can be optimized without changing the results, but not in this case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> It still seems like awfully weird behavior.
> Why?  The WHERE condition relates only to the output of the _stats
> subquery, so why shouldn't it be evaluated there, rather than after the
> join?

In another thread, Tom Lane <t...@sss.pgh.pa.us> wrote:
> It's easier to understand why this is if you realize that SQL has a very
> clear model of a pipeline of query execution.  Conceptually, what happens
> is:
>
> 1. Form the cartesian product of the tables listed in FROM (ie, all
>    combinations of rows).
> 2. Apply the WHERE condition to each row from 1, and drop rows that don't
>    pass it.

People expect that the results will be consistent with this model, even if
the implementation is optimized under the covers.  I think correct semantics
should trump performance here.

-Kevin
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
On Wed, Sep 19, 2012 at 02:39:12PM -0500, Kevin Grittner wrote:
> Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmh...@gmail.com> writes:
>>> It still seems like awfully weird behavior.
>> Why?  The WHERE condition relates only to the output of the _stats
>> subquery, so why shouldn't it be evaluated there, rather than after the
>> join?
>
> In another thread, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> It's easier to understand why this is if you realize that SQL has a very
>> clear model of a pipeline of query execution.  Conceptually, what
>> happens is:
>> 1. Form the cartesian product of the tables listed in FROM (ie, all
>>    combinations of rows).
>> 2. Apply the WHERE condition to each row from 1, and drop rows that
>>    don't pass it.
>
> People expect that the results will be consistent with this model, even
> if the implementation is optimized under the covers.  I think correct
> semantics should trump performance here.
>
> -Kevin

It seems like this is what happens here, except that the function is
evaluated once for the WHERE and not once per row.  Both of these meet the
criterion for 2 above, and Tom's earlier comments both hold.

Regards,
Ken
[HACKERS] CTE optimization fence on the todo list?
I would like to have the option of disabling the CTE optimization fence for
certain CTEs and/or queries.  Can that be added to the official todo list?
If not, why not?

I would find the option beneficial because large, complicated queries are
often a lot clearer, simpler, and easier to read with CTEs than the
equivalent query without CTEs.  In some cases, the query with CTEs is also
faster because of the optimization fence.  But in other cases, the fence
makes it a lot slower.  In the latter cases, you are left with a choice
between ugly and slow.  If there were some method to disable the
optimization fence for certain CTEs or entire queries, then it would be
possible to have the best of both worlds.

I apologize if this has already been covered before.  I could only find two
earlier discussions on this topic:

http://archives.postgresql.org/pgsql-performance/2011-10/msg00208.php
http://archives.postgresql.org/pgsql-performance/2011-11/msg00015.php

In the latter, I counted four people who are in support of the general idea:
Robert Haas, Andres Freund, Gavin Flower, Justin Pitts.  However, I'm sure
there are a lot of conflicting ideas on how exactly to go about it, such as
whether to enable or disable it by default, the specific syntax to use,
backwards compatibility, future-proofing, etc.

One good reason to reject it would be if it can't be done with SQL standard
syntax and would require some sort of PG-specific hint or GUC variable for
the query planner.  If so, then I understand that it's opposed for all the
same reasons that hints are opposed in general.

Another good reason to reject it might be because the only way to disable
the CTE fence is to disable it by default.  If that were the case, then I
would imagine that it would break backwards compatibility, especially in the
case of writable CTEs that currently depend on the fence for their current
functionality.  If there is no way to palatably enable it by default but
allow certain CTEs or certain queries to disable it, then I don't see any
way around that problem.

A third reason I can imagine is that the only desirable solution (e.g. the
one without additional non-standard keywords or session GUC variables) is
effectively impossible.  For example, if it requires that the query planner
determine definitively whether a CTE is read-only or not, that may be a
bridge too far.

A fourth possible reason is that the core team feels that CTEs do not
improve readability, or that any such readability benefits are not worth the
effort to support the option.  Personally, I feel that the queries which
could most benefit from the readability of CTEs are precisely the same ones
that could most often benefit from the performance increase of disabling the
fence (particularly if it could be done on a per-CTE basis rather than for
the whole query at once).

Of course the real reason could be something else entirely, hence this post.
Thanks in advance for your feedback.

-- DB
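A minimal illustration of the trade-off described above (the table, index, and predicate are hypothetical): the CTE form reads better but fences the predicate out of the scan, while the sub-select form lets the planner push it down to an index:

```sql
-- Fenced: the CTE is evaluated in full, then filtered.
WITH t AS (
    SELECT * FROM big_table
)
SELECT * FROM t WHERE id = 42;

-- Not fenced: the planner can push id = 42 into the scan and use an index.
SELECT * FROM (SELECT * FROM big_table) AS t WHERE id = 42;
```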
Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)
On 12 September 2012 04:30, Amit Kapila <amit.kap...@huawei.com> wrote:
> On Tuesday, September 11, 2012 9:09 PM Alvaro Herrera wrote:
>> Excerpts from Boszormenyi Zoltan's message of vie jun 29 09:11:23 -0400 2012:
>>> We have some use cases for this patch, when can you post a new version?
>>> I would test and review it.
>> What use cases do you have in mind?
> Wouldn't it be helpful for some features like parallel query in future?

Trying to solve that is what delayed this patch, so the scope of this needs
to be permanent daemons rather than dynamically spawned worker tasks.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
k...@rice.edu <k...@rice.edu> wrote:
> On Wed, Sep 19, 2012 at 02:39:12PM -0500, Kevin Grittner wrote:
>> In another thread, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> 2. Apply the WHERE condition to each row from 1, and drop rows that
>>>    don't pass it.
>> People expect that the results will be consistent with this model, even
>> if the implementation is optimized under the covers.  I think correct
>> semantics should trump performance here.
> It seems like this is what happens here except that the function is
> evaluated once for the WHERE and not once per row.  Both of these meet
> the criterion for 2 above and Tom's earlier comments both hold.

There really needs to be some way to specify that when an expression is
evaluated for each row in a set, a function used within that expression is
not optimized away for some rows.  Fortunately we have a way:

http://www.postgresql.org/docs/9.2/interactive/sql-createfunction.html

| VOLATILE indicates that the function value can change even within
| a single table scan, so no optimizations can be made.  Relatively
| few database functions are volatile in this sense; some examples
| are random(), [...]

The behavior in the OP's query would certainly be sane if the function were
not VOLATILE; as it is, I have a hard time seeing this as anything but a
bug.  There is a workaround, if you don't mind ugly:

CREATE FUNCTION random_really_i_mean_it(dummy int)
  RETURNS double precision
  LANGUAGE plpgsql
  VOLATILE
AS $$
BEGIN
  -- no need to reference dummy parameter
  RETURN random();
END;
$$;

WITH source AS ( SELECT i FROM generate_series(1,10) AS i )
SELECT i
FROM source, ( SELECT count(*) AS _n FROM source ) AS _stats
WHERE random_really_i_mean_it(i) < 5::DOUBLE PRECISION/_n;

-Kevin
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
Kevin Grittner kevin.gritt...@wicourts.gov wrote: There is a workaround, if you don't mind ugly: Or, better:

WITH source AS (
  SELECT i, random() AS r FROM generate_series(1,10) AS i
)
SELECT i
  FROM source,
       (SELECT count(*) AS _n FROM source) AS _stats
 WHERE r < 5::DOUBLE PRECISION/_n;

-Kevin
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
-Original Message- There really needs to be some way to specify that when an expression is evaluated for each row in a set, a function used within that expression is not optimized away for some rows. Fortunately we have a way: http://www.postgresql.org/docs/9.2/interactive/sql-createfunction.html | VOLATILE indicates that the function value can change even within a | single table scan, so no optimizations can be made. Relatively few | database functions are volatile in this sense; some examples are | random(), [...] The behavior in the OP's query would certainly be sane if the function were not VOLATILE; as it is, I have a hard time seeing this as anything but a bug. What are the arguments against adding a 4th identifier - call it PER_ROW for this argument? The main reason VOLATILE is broken is that it is the default and in order to minimize beginner's penalty it is not treated as such in some situations. The new one could behave just like VOLATILE but would never be optimized away and would always evaluate once for each row in its context. Then the question is whether you write a new random() function or break backwards compatibility and alter the existing version. David J. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
David Johnston pol...@yahoo.com wrote: | VOLATILE indicates that the function value can change even | within a single table scan, so no optimizations can be made. | Relatively few database functions are volatile in this sense; | some examples are random(), [...] What are the arguments against adding a 4th identifier - call it PER_ROW for this argument? The main reason VOLATILE is broken is that it is the default and in order to minimize beginner's penalty it is not treated as such in some situations. The new one could behave just like VOLATILE but would never be optimized away and would always evaluate once for each row in its context. So how would you document that? It sounds like the proposed level would behave exactly as the VOLATILE level is currently documented to behave; so I guess we could shift the documentation of VOLATILE to PER_ROW (or whatever). How would you then describe the behavior of VOLATILE? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
-Original Message-

| VOLATILE indicates that the function value can change even within a
| single table scan, so no optimizations can be made.
| Relatively few database functions are volatile in this sense; some
| examples are random(), [...]

What are the arguments against adding a 4th identifier - call it PER_ROW for this argument? The main reason VOLATILE is broken is that it is the default, and in order to minimize the beginner's penalty it is not treated as such in some situations. The new one could behave just like VOLATILE but would never be optimized away and would always evaluate once for each row in its context. So how would you document that? It sounds like the proposed level would behave exactly as the VOLATILE level is currently documented to behave; so I guess we could shift the documentation of VOLATILE to PER_ROW (or whatever). How would you then describe the behavior of VOLATILE? I'm not sure, but however we describe it, we might as well make the change now regardless of whether another level is added. The main distinguishing characteristic is that VOLATILE is not guaranteed to evaluate once-per-row if it is not dependent upon particular values within a given row. VOLATILE: A volatile function used in an ORDER BY or WHERE clause without referencing any columns from the query itself (i.e., no parameters or all constants) will be evaluated a single time and the result treated as a constant (i.e., all rows will have identical values) for that part of the query. PER_ROW: A per_row function will be evaluated once for every row that is visible to the function and will be treated as a virtual column of said relation, with each cell having its own value as a result of the function call. Using random() as an example of the two possible behaviors should further clarify the differences quite nicely. Quick pass - hopefully, a) this inspires someone else, and b) this is the correct understanding in the first place. David J.
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
David Johnston pol...@yahoo.com wrote: VOLATILE: A Volatile function used in an ORDER BY or WHERE clause without referencing any columns from the query itself (i.e., no parameters or all constants) will be evaluated a single time and the result treated as a constant (i.e., all rows will have identical values) for that part of the query. I hope you're wrong about the ORDER BY part of that. A quick test confirms that it works in ORDER BY, at least for some cases. If there are any exceptions to that, I would sure like to know about it -- and really soon. select * from generate_series(1, 1) s(n) order by random() limit 10; -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
-Original Message- From: Kevin Grittner [mailto:kevin.gritt...@wicourts.gov] Sent: Wednesday, September 19, 2012 5:51 PM To: k...@rice.edu; David Johnston Cc: 'Florian Schoppmann'; 'Robert Haas'; pgsql-hackers@postgresql.org; 'Tom Lane' Subject: RE: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause? David Johnston pol...@yahoo.com wrote: VOLATILE: A Volatile function used in an ORDER BY or WHERE clause without referencing any columns from the query itself (i.e., no parameters or all constants) will be evaluated a single time and the result treated as a constant (i.e., all rows will have identical values) for that part of the query. I hope you're wrong about the ORDER BY part of that. A quick test confirms that it works in ORDER BY, at least for some cases. If there are any exceptions to that, I would sure like to know about it -- and really soon. select * from generate_series(1, 1) s(n) order by random() limit 10; -Kevin I'd rather have someone who knows the code assert one way or the other; I tossed it in there because I thought I've seen people complain that random() doesn't work as expected with ORDER BY but that may just be faulty memory. It may or may not depend on whether LIMIT/OFFSET are involved...? Used in the SELECT-list it gets evaluated for each row and I guess the ORDER BY could have that behavior as well (I would expect it to anyway), so is it strictly limited to WHERE clause evaluation that this discrepancy manifests? David J. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix bufmgr so CHECKPOINT_END_OF_RECOVERY behaves as a shutdown c
On Tuesday, September 18, 2012 04:18:01 AM Robert Haas wrote: Maybe what we should do is - if this is an end-of-recovery checkpoint - *assert* that the BM_PERMANENT bit is set on every buffer we find. That would provide a useful cross-check that we don't have a bug similar to the one Jeff already fixed in any other code path. I haven't looked into the details, but can't a new unlogged relation be created since the last checkpoint and thus have pages in s_b? Data changes to unlogged relations are not WAL-logged, so there's no reason for recovery to ever read them. Even if such a reason existed, there wouldn't be anything to read, because the backing files are unlinked before WAL replay begins. Back then I thought that resetting the relation by copying the init fork might use the buffer cache. It doesn't atm... Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?
Tom Lane t...@sss.pgh.pa.us wrote: florian.schoppm...@emc.com (Florian Schoppmann) writes: [VOLATILE function in WHERE clause *does* get optimized] I can't get excited about this. Any time you put a volatile function into WHERE, you're playing with fire. The docs warn against it: http://www.postgresql.org/docs/9.2/static/sql-expressions.html#SYNTAX-EXPRESS-EVAL All this section tells me is that one cannot rely on the evaluation order in expressions, and that side effects are dangerous in WHERE and HAVING clauses. I do not read in this section that VOLATILE functions are unsafe per se. After all, a VOLATILE function is not required to have side effects (suppose, e.g., somebody implemented a true random number generator). However, http://www.postgresql.org/docs/9.2/interactive/sql-createfunction.html is actually very clear in its wording:

| VOLATILE indicates that the function value can change even within a
| single table scan, so no optimizations can be made.

I therefore tend to see the behavior as a bug. I concede though that there is also good reason to keep the current behavior (VOLATILE is the default for UDFs, etc.). But then I think the documentation needs to be changed, and it has to be made explicit where the optimizer may change semantics and what, on the other hand, is defined behavior. E.g., users need to know whether the following rewrite causes defined or undefined behavior. Does it give correct results by accident, or are correct results guaranteed?

--8<--
WITH source AS (
  SELECT i FROM generate_series(1,10) AS i
)
SELECT i
  FROM source AS _stats
 WHERE random() < 5::DOUBLE PRECISION / (SELECT count(*) FROM source);
--8<--

To do what you want, I'd suggest wrapping the join into a sub-select with an OFFSET 0 clause, which will serve as an optimization fence that prevents the random() call from being pushed down.
My interpretation so far is that VOLATILE functions in a WHERE clause can always be avoided: E.g., move the condition to the SELECT list and embed in an outer query that then filters on the condition column. Or is my assumption wrong, and the optimizer could theoretically interfere even here? Florian -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] newline conversion in SQL command strings
I have received a number of bug reports about plsh choking on Windows-style line endings. The problem is that the user uses some Windows-based tool or other to execute an SQL command like this:

CREATE FUNCTION foo() RETURNS somethingCRLF
LANGUAGE plshCRLF
AS $$CRLF
#!/bin/shCRLF
CRLF
do somethingCRLF
do somethingCRLF
$$;CRLF

which (apparently, I don't have Windows handy) creates a function with the prosrc body of 'CRLF #!/bin/shCRLF CRLF do somethingCRLF do somethingCRLF '. But executing this fails, because Unix shells reject CR characters in inappropriate places as syntax errors. I don't know how to handle that. It would be unfortunate to have the behavior of a function depend on the kind of client used to create or modify it. I suppose the other more mainstream PLs don't expose that problem so much because the interpreters can handle either kind of line ending. But there could still be functional differences if, for example, a line ending is embedded in a string constant. Ideas?
Re: [HACKERS] 64-bit API for large object
I checked this patch. It can be applied onto the latest master branch without any problems. My comments are below. 2012/9/11 Tatsuo Ishii is...@postgresql.org: Ok, here is the patch to implement a 64-bit API for large objects, to allow the use of up to 4TB large objects (or 16TB if BLCKSZ is changed to 32KB). The patch is based on Jeremy Drake's patch posted on September 23, 2005 (http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php) and reasonably updated/edited to adapt to PostgreSQL 9.3 by Nozomi Anzai for the backend part and Yugo Nagata for the rest (including the documentation patch). Here are the changes made in the patch: 1) Frontend lo_* libpq functions (fe-lobj.c) (Yugo Nagata) lo_initialize() gathers the OIDs of the backend's 64-bit large object handling functions, namely lo_lseek64, lo_tell64, lo_truncate64. If a client calls the lo_*64 functions and the backend does not support them, the lo_*64 functions return an error to the caller. There might be an argument that calls to lo_*64 functions could automatically be redirected to the older 32-bit API. I don't know whether this is worth the trouble though. I think it should definitely return an error code when the user tries to use the lo_*64 functions against a v9.2 or older backend, because a fallback to the 32-bit API can raise unexpected errors if the application intends to seek into the area beyond 2GB. Currently lo_initialize() throws an error if one of the OIDs is not available. I doubt we should do the same for the 64-bit functions, since this would make 9.3 libpq unable to access large objects stored in pre-9.2 PostgreSQL servers. It seems to me the way to handle the situation is to split the pre-9.2 and post-9.3 cases using a condition on conn->sversion >= 90300. To pass a 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr is a pointer to the 64-bit integer, and the actual data is placed somewhere else.
There might be another way: add a new member to union u to store a 64-bit integer:

typedef struct
{
    int len;
    int isint;
    union
    {
        int    *ptr;      /* can't use void (dec compiler barfs) */
        int     integer;
        int64   bigint;   /* 64-bit integer */
    } u;
} PQArgBlock;

I'm a little bit worried about this way because PQArgBlock is a public interface. I'm inclined to add a new field to the union; that seems to me the straightforward approach. For example, the manner in lo_seek64() seems confusing to me. It sets isint to 1 even though a pointer is actually delivered:

+ argv[1].isint = 1;
+ argv[1].len = 8;
+ argv[1].u.ptr = (int *) len;

Also we add a new type pg_int64:

#ifndef NO_PG_INT64
#define HAVE_PG_INT64 1
typedef long long int pg_int64;
#endif

in postgres_ext.h per suggestion from Tom Lane: http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php I'm uncertain about the context of this discussion. Does it matter if we include stdint.h and use int64_t instead of the self-defined data type? 2) Backend lo_* functions (be-fsstubs.c) (Nozomi Anzai) Add lo_lseek64, lo_tell64, lo_truncate64 so that they can handle 64-bit seek positions and data lengths. loread64 and lowrite64 are not added, because if a program tries to read/write more than 2GB at once, it would be a sign that the program needs to be re-designed anyway. I think that is reasonable. 3) Backend inv_api.c functions (Nozomi Anzai) No need to add new functions. Just extend them to handle 64-bit data. BTW, what will happen if an older 32-bit libpq accesses large objects over 2GB? lo_read and lo_write: they can read or write lobjs using the 32-bit API as long as the requested read/write data length is smaller than 2GB. So I think we can safely allow them to access over-2GB lobjs. lo_lseek: again, as long as the requested offset is smaller than 2GB, there would be no problem. lo_tell: if the current seek position is beyond 2GB, it returns an error.
Even though iteration of lo_lseek() may move the offset to 4TB, that also makes it impossible to use lo_tell() to obtain the current offset, so I think it is reasonable behavior. However, the error code is not an appropriate one:

+ if (INT_MAX < offset)
+ {
+     ereport(ERROR,
+             (errcode(ERRCODE_UNDEFINED_OBJECT),
+              errmsg("invalid large-object descriptor: %d", fd)));
+     PG_RETURN_INT32(-1);
+ }

According to the manpage of lseek(2):

    EOVERFLOW  The resulting file offset cannot be represented in an off_t.

Please add a new error code such as ERRCODE_BLOB_OFFSET_OVERFLOW. 4) src/test/examples/testlo64.c added as a 64-bit API example (Yugo Nagata) Comments and suggestions are welcome. Miscellaneous comments are below. A regression test would be helpful. Even though there is no need to actually create a 4TB large object, it would be helpful to write some chunks around the design boundary. Could
Re: [v9.3] Extra Daemons (Re: [HACKERS] elegant and effective way for running jobs inside a database)
On Thursday, September 20, 2012 1:44 AM Simon Riggs wrote: On 12 September 2012 04:30, Amit Kapila amit.kap...@huawei.com wrote: On Tuesday, September 11, 2012 9:09 PM Alvaro Herrera wrote: Excerpts from Boszormenyi Zoltan's message of vie jun 29 09:11:23 -0400 2012: We have some use cases for this patch, when can you post a new version? I would test and review it. What use cases do you have in mind? Wouldn't it be helpful for some features like parallel query in future? Trying to solve that is what delayed this patch, so the scope of this needs to be permanent daemons rather than dynamically spawned worker tasks. Why can't worker tasks also be permanent, controlled through configuration? What I mean to say is that if a user needs parallel operations, he can configure max_worker_tasks and that many worker tasks will get created. Without such a parameter, we might not be sure whether such daemons will be of use to database users who don't need any background ops. The dynamism comes in when we need to allocate such daemons for a particular operation (query): the operation might need a certain number of worker tasks, but no such task is available, and at that point it needs to be decided whether to spawn a new task or change the parallelism of the operation so that it can be executed with the available number of worker tasks. Although I understand and agree that such permanent daemons will be useful for use cases other than parallel operations, my thinking is that permanent daemons can also be useful for parallel ops. So even though it is currently being developed for certain use cases, the overall idea can be enhanced to cover parallel ops as well. With Regards, Amit Kapila.