Re: [HACKERS] [rfc] overhauling pgstat.stat
Sent from my iPad On 04-Sep-2013, at 10:54, Satoshi Nagayasu wrote: > Hi, > > (2013/09/04 12:52), Atri Sharma wrote: >> On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu wrote: >>> Hi, >>> >>> I'm considering overhauling pgstat.stat, and would like to know how many >>> people are interested in this topic. >>> >>> As you may know, this file could be hundreds of MB in size, because >>> pgstat.stat holds all access statistics in each database, and it needs >>> to read/write an entire pgstat.stat frequently. >>> >>> As a result, pgstat.stat often generates massive I/O operations, >>> particularly when having a large number of tables in the database. >>> >>> To support multi-tenancy or just a large number of tables (up to 10k >>> tables in a single database), I think pgstat.stat needs to be overhauled. >>> >>> I think using a heap and btree in pgstat.stat would be preferable, to reduce >>> read/write and to allow updating access statistics for specific tables >>> in the pgstat.stat file. >>> >>> Is this good for us? >> >> Hi, >> >> Nice thought. I/O reduction in pgstat can be really helpful. >> >> I am trying to think of our aim here. Would we be looking to split >> pgstat per table, so that the I/O write happens for only a portion of >> pgstat? Or reduce the I/O in general? > > I prefer the latter. > > Under the current implementation, DBAs need to split a single database > into many smaller databases, taking the access locality of the > tables into account. That's difficult, and the locality could change in future. > > And splitting the statistics data into many files (per table, > for example) would cause another performance issue when > collecting/showing statistics at once. Just my guess though. > > So, I'm looking for a new way to reduce I/O for the statistics data > in general. > > Regards, > >> >> If the latter, how would using a BTree help us? I would rather go for a >> range tree or something. But again, I may be completely wrong. 
>> >> Please elaborate a bit more on the solution we are trying to >> achieve. It seems really interesting. >> >> Regards, >> >> Atri >> >> > > Right, thanks. How would using a heap and BTree help here? Are we looking at a priority queue which supports the main storage system of the stats? Regards, Atri -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [rfc] overhauling pgstat.stat
Hi, (2013/09/04 12:52), Atri Sharma wrote: On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu wrote: Hi, I'm considering overhauling pgstat.stat, and would like to know how many people are interested in this topic. As you may know, this file could be hundreds of MB in size, because pgstat.stat holds all access statistics in each database, and it needs to read/write an entire pgstat.stat frequently. As a result, pgstat.stat often generates massive I/O operations, particularly when having a large number of tables in the database. To support multi-tenancy or just a large number of tables (up to 10k tables in a single database), I think pgstat.stat needs to be overhauled. I think using a heap and btree in pgstat.stat would be preferable, to reduce read/write and to allow updating access statistics for specific tables in the pgstat.stat file. Is this good for us? Hi, Nice thought. I/O reduction in pgstat can be really helpful. I am trying to think of our aim here. Would we be looking to split pgstat per table, so that the I/O write happens for only a portion of pgstat? Or reduce the I/O in general? I prefer the latter. Under the current implementation, DBAs need to split a single database into many smaller databases, taking the access locality of the tables into account. That's difficult, and the locality could change in future. And splitting the statistics data into many files (per table, for example) would cause another performance issue when collecting/showing statistics at once. Just my guess though. So, I'm looking for a new way to reduce I/O for the statistics data in general. Regards, If the latter, how would using a BTree help us? I would rather go for a range tree or something. But again, I may be completely wrong. Please elaborate a bit more on the solution we are trying to achieve. It seems really interesting. Regards, Atri -- Satoshi Nagayasu Uptime Technologies, LLC. 
http://www.uptime.jp
Re: [HACKERS] [rfc] overhauling pgstat.stat
Hi, (2013/09/04 13:07), Alvaro Herrera wrote: Satoshi Nagayasu wrote: As you may know, this file could be hundreds of MB in size, because pgstat.stat holds all access statistics in each database, and it needs to read/write an entire pgstat.stat frequently. As a result, pgstat.stat often generates massive I/O operations, particularly when having a large number of tables in the database. We already changed it: > > commit 187492b6c2e8cafc5b39063ca3b67846e8155d24 > Author: Alvaro Herrera > Date: Mon Feb 18 17:56:08 2013 -0300 > > Split pgstat file in smaller pieces Thanks for the comments. I forgot to mention that. Yes, we have already split the single pgstat.stat file into several pieces. However, we still need to read/write a large amount of statistics data when we have a large number of tables in a single database or multiple databases being accessed. Right? I think the issue here is that it is necessary to write/read statistics data even if it's not actually changed. So, I'm wondering how we can minimize read/write operations on these statistics data files by using a heap and btree. Regards, commit 187492b6c2e8cafc5b39063ca3b67846e8155d24 Author: Alvaro Herrera Date: Mon Feb 18 17:56:08 2013 -0300 Split pgstat file in smaller pieces We now write one file per database and one global file, instead of having the whole thing in a single huge file. This reduces the I/O that must be done when partial data is required -- which is all the time, because each process only needs information on its own database anyway. Also, the autovacuum launcher does not need data about tables and functions in each database; having the global stats for all DBs is enough. Catalog version bumped because we have a new subdir under PGDATA. Author: Tomas Vondra. Some rework by Álvaro Testing by Jeff Janes Other discussion by Heikki Linnakangas, Tom Lane. I don't oppose further tweaking, of course, but I wonder if you are considering these changes. -- Satoshi Nagayasu Uptime Technologies, LLC. 
http://www.uptime.jp
Re: [HACKERS] The PostgreSQL License requires "LICENSE" file?
(2013/09/04 2:34), Josh Berkus wrote: > I've checked the PostgreSQL License, and had a question. http://opensource.org/licenses/PostgreSQL >> >> I don't know who opensource.org are, but their idea of "the PostgreSQL >> license" doesn't appear to match the actual license text. > > The OSI, which is the organization which defines whether licenses are > open source or not. We requested that they list The PostgreSQL License > on their site after Red Hat determined that our license had sufficient > textual differences from the BSD License to be a different license. > > I've requested that the spurious "file name" reference be removed. > Thank you for checking it so soon. I understand that "the PostgreSQL License" does not require any specific file name, and that a missing "LICENSE" file is not a problem. Sorry for the stupid question. regards, Tomonari Katsumata
[HACKERS] [tiny doc fix] statistics are not retained across immediate shutdown
Hi, The following page says that statistics are kept across server restarts: http://www.postgresql.org/docs/current/static/monitoring-stats.html "When the server shuts down, a permanent copy of the statistics data is stored in the global subdirectory, so that statistics can be retained across server restarts." However, statistics are not retained after an immediate shutdown (pg_ctl stop -mi). You may say "pg_ctl stop -mi is not a shutdown but an abort, so the sentence is not wrong", but it's an "immediate shutdown" and one mode of shutdown. I propose a tiny fix to clarify this. Please find the attached patch. I'd like this to be back-patched to at least 9.2. Thanks. Regards, Takayuki Tsunakawa Attachment: stats_reset_in_recovery.patch
[HACKERS] Is it necessary to rewrite table while increasing the scale of datatype numeric?
Hi, Hackers! I find that it takes a long time when I increase the scale of a numeric datatype. By checking the code, I found that's because it needs to rewrite that table's file. After checking that table's data file, I found that only the n_header parameter changed, and the data in that numeric field never changed. So I think it's not necessary to rewrite the table's file in this case. If anyone has more ideas about this, please come and talk about it! Best Regards! Yours, Wang Shuo HighGo Software Co.,Ltd. September 4, 2013
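For anyone wanting to reproduce the behavior described above, a minimal sketch (the table and column names are made up for illustration; the comments reflect my understanding of the ALTER TABLE typmod handling at the time, not text from the original mail):

```sql
CREATE TABLE t (v numeric(10,2));
INSERT INTO t SELECT generate_series(1, 1000000);

-- Increasing only the precision (scale unchanged) does not need to touch
-- the stored values, and recent releases can skip the rewrite for it:
ALTER TABLE t ALTER COLUMN v TYPE numeric(16,2);

-- Increasing the scale forces a full table rewrite, which is the slow
-- case being questioned here:
ALTER TABLE t ALTER COLUMN v TYPE numeric(16,4);
```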
Re: [HACKERS] [rfc] overhauling pgstat.stat
Satoshi Nagayasu wrote: > As you may know, this file could be hundreds of MB in size, because > pgstat.stat holds all access statistics in each database, and it needs > to read/write an entire pgstat.stat frequently. > > As a result, pgstat.stat often generates massive I/O operations, > particularly when having a large number of tables in the database. We already changed it: commit 187492b6c2e8cafc5b39063ca3b67846e8155d24 Author: Alvaro Herrera Date: Mon Feb 18 17:56:08 2013 -0300 Split pgstat file in smaller pieces We now write one file per database and one global file, instead of having the whole thing in a single huge file. This reduces the I/O that must be done when partial data is required -- which is all the time, because each process only needs information on its own database anyway. Also, the autovacuum launcher does not need data about tables and functions in each database; having the global stats for all DBs is enough. Catalog version bumped because we have a new subdir under PGDATA. Author: Tomas Vondra. Some rework by Álvaro Testing by Jeff Janes Other discussion by Heikki Linnakangas, Tom Lane. I don't oppose further tweaking, of course, but I wonder if you are considering these changes. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [rfc] overhauling pgstat.stat
On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu wrote: > Hi, > > I'm considering overhauling pgstat.stat, and would like to know how many > people are interested in this topic. > > As you may know, this file could be hundreds of MB in size, because > pgstat.stat holds all access statistics in each database, and it needs > to read/write an entire pgstat.stat frequently. > > As a result, pgstat.stat often generates massive I/O operations, > particularly when having a large number of tables in the database. > > To support multi-tenancy or just a large number of tables (up to 10k > tables in a single database), I think pgstat.stat needs to be overhauled. > > I think using a heap and btree in pgstat.stat would be preferable, to reduce > read/write and to allow updating access statistics for specific tables > in the pgstat.stat file. > > Is this good for us? Hi, Nice thought. I/O reduction in pgstat can be really helpful. I am trying to think of our aim here. Would we be looking to split pgstat per table, so that the I/O write happens for only a portion of pgstat? Or reduce the I/O in general? If the latter, how would using a BTree help us? I would rather go for a range tree or something. But again, I may be completely wrong. Please elaborate a bit more on the solution we are trying to achieve. It seems really interesting. Regards, Atri -- Regards, Atri l'apprenant
Re: [HACKERS] strange IS NULL behaviour
On Tue, Sep 3, 2013 at 09:44:33PM -0500, Merlin Moncure wrote: > It gets worse and worse. The IS NULL operator is already pretty much > special cased -- in just about all other cases concerning rowtypes (for > example coalesce), null-containing rowtypes are *not* considered to be > null, as the container itself has a null bit independent of its > elements, which is expressly contrary to the SQL standard. This is > tragic; postgres's way of doing it (except with IS NULL) makes an > awful lot of sense to me but here we are. I think before making any > behavior changes at all, we need to ask ourselves: > > 1. Are we willing to break compatibility in order to move to spec > compliant behavior? > 2. and if so, what mountains do we have to cross to get there? > > Your proposed change (implementation details aside) seems ok in the > sense that it doesn't seem to have a lot of obvious side effects but > the elephant in the room is #1; if the answer is 'no', then I'd say > the best course of action is to let things be. Yes, these are the right questions. If we leave things unchanged, can we document the behavior we currently have? What I _think_ we have now is that IS NULL in queries checks two levels deep for nulls, but checks only one level deep for PL/pgSQL and constraints. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] strange IS NULL behaviour
On Tue, Sep 3, 2013 at 10:27:38PM -0400, Tom Lane wrote: > I wrote: > > And I will say once more that a patch that affects only the behavior of > > eval_const_expressions can be rejected on its face. That code has to be > > kept in sync with the behavior of execQual.c, not just whacked around by > > itself. And then there are the NOT NULL constraint cases to worry about. > > Hmm ... actually, it's already not in sync, because: > > regression=# create table tt (x int); > CREATE TABLE > regression=# insert into tt values(null); > INSERT 0 1 > regression=# select row(x) from tt; > row > - > () > (1 row) > > regression=# select row(row(x)) from tt; > row > > ("()") > (1 row) > > regression=# select row(row(row(x))) from tt; > row > -- > ("(""()"")") > (1 row) Uh, I see the same output you show for a NULL constant: SELECT ROW(NULL); row - () SELECT ROW(ROW(NULL)); row ("()") SELECT ROW(ROW(ROW(NULL))); row -- ("(""()"")") -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] strange IS NULL behaviour
On Tue, Sep 3, 2013 at 09:43:14PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > This has made me adjust my goal and change it so SELECT ROW(NULL) IS > > NULL returns true, and any further nesting returns false. > > AFAICS, the only good argument for breaking backwards compatibility here > is if you can convince people that the new behavior is more conformant to > the SQL spec. Where's the chapter and verse that argues for this > interpretation? The SQL03 standard in section 8.7, table 14, says "degree 1: null" and "degree > 1: all null". Does that mean they are considering nested rows as degree > 1, or is that the number of values in the row? A footnote says: For all R, "R IS NOT NULL" has the same result as "NOT R IS NULL" if and only if R is of degree 1. which seems to support the idea that degree is the number of values, meaning they don't discuss nesting. > And I will say once more that a patch that affects only the behavior of > eval_const_expressions can be rejected on its face. That code has to be > kept in sync with the behavior of execQual.c, not just whacked around by > itself. And then there are the NOT NULL constraint cases to worry about. I thought execQual.c was already not recursing so I didn't see a need to change that. I could not figure out how to test a NOT NULL constraint for nesting. What is driving my research here is that our current behavior is really not documentable. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
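For reference, the three constant cases under discussion can be tried directly in psql; the outputs in the comments are the then-current behavior as reported elsewhere in this thread, not something guaranteed by the standard:

```sql
SELECT ROW(NULL) IS NULL;            -- true
SELECT ROW(ROW(NULL)) IS NULL;       -- true
SELECT ROW(ROW(ROW(NULL))) IS NULL;  -- false
```

It is exactly this "two levels true, three levels false" pattern that Bruce argues cannot be justified or documented.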
Re: [HACKERS] strange IS NULL behaviour
On Tue, Sep 3, 2013 at 8:32 PM, Bruce Momjian wrote: > On Fri, Jul 5, 2013 at 10:21:19AM -0400, Bruce Momjian wrote: >> On Thu, Jul 4, 2013 at 04:29:20PM -0400, Tom Lane wrote: >> > Bruce Momjian writes: >> > > I developed the attached patch which properly recurses into ROW() >> > > records checking for NULLs; you can see it returns the right answer in >> > > all cases (and constant folds too): >> > >> > My recollection of the previous discussion is that we didn't have >> > consensus on what the "right" behavior is, so I'm not sure you can just >> > assert that this patch is right. In any case this is only touching the >> > tip of the iceberg. If we intend that rows of nulls should be null, >> > then we have got issues with, for example, NOT NULL column constraint >> > checks, which don't have any such recursion built into them. I think >> > the same is true for plpgsql variable NOT NULL restrictions, and there >> > are probably some other places. >> >> Well we have three cases: >> >> 1 SELECT ROW(NULL) IS NULL; >> 2 SELECT ROW(ROW(NULL)) IS NULL; >> 3 SELECT ROW(ROW(ROW(NULL))) IS NULL; >> >> I think we could have them all return false, or all true, or the first >> one true, and the rest false. What I don't think we can justify is 1 >> and 2 as true, and 3 false. > > I have done some more research in this, and was able to verify Tom's > concern that PL/pgSQL's IS NULL doesn't recurse into ROW expressions: > > DO LANGUAGE plpgsql $$ > DECLARE > r RECORD; > BEGIN > > SELECT NULL INTO r; > IF (r IS NULL) > THEN RAISE NOTICE 'true'; > ELSE RAISE NOTICE 'false'; > END IF; > END; > $$; > NOTICE: true > DO > > In this test, SELECT NULL (which internally would produce SELECT > ROW(NULL)), returns TRUE, while SELECT ROW(NULL) and further nesting > returns false. > > This has made me adjust my goal and change it so SELECT ROW(NULL) IS > NULL returns true, and any further nesting returns false. It gets worse and worse. 
The IS NULL operator is already pretty much special cased -- in just about all other cases concerning rowtypes (for example coalesce), null-containing rowtypes are *not* considered to be null, as the container itself has a null bit independent of its elements, which is expressly contrary to the SQL standard. This is tragic; postgres's way of doing it (except with IS NULL) makes an awful lot of sense to me but here we are. I think before making any behavior changes at all, we need to ask ourselves: 1. Are we willing to break compatibility in order to move to spec compliant behavior? 2. and if so, what mountains do we have to cross to get there? Your proposed change (implementation details aside) seems ok in the sense that it doesn't seem to have a lot of obvious side effects but the elephant in the room is #1; if the answer is 'no', then I'd say the best course of action is to let things be. merlin
[HACKERS] getting rid of maintainer-check
The maintainer-check target never really caught on, I think. Most people don't run it, and that in turn annoys those who do. Also, it doesn't provide much functionality. I propose that we get rid of it and roll the functionality into the regular build. Specifically: - Running duplicate_oids during the regular build was already discussed elsewhere recently. There are some details to be resolved there, but it's doable. - Checking for tabs in SGML files can be run during the regular documentation build without problems. - The NLS checks can also be run during the regular NLS-enabled build. That's it. Any concerns?
Re: [HACKERS] strange IS NULL behaviour
I wrote: > And I will say once more that a patch that affects only the behavior of > eval_const_expressions can be rejected on its face. That code has to be > kept in sync with the behavior of execQual.c, not just whacked around by > itself. And then there are the NOT NULL constraint cases to worry about. Hmm ... actually, it's already not in sync, because: regression=# create table tt (x int); CREATE TABLE regression=# insert into tt values(null); INSERT 0 1 regression=# select row(x) from tt; row - () (1 row) regression=# select row(row(x)) from tt; row ("()") (1 row) regression=# select row(row(row(x))) from tt; row -- ("(""()"")") (1 row) There's certainly no excuse for this behaving differently from the cases with a simple constant NULL. So I'm a bit inclined to say that we should rip out the special case in eval_const_expressions, not make it even less self-consistent. It's possible to argue that existing applications won't be too sensitive to the behavior of the constant cases, but they surely must be depending on the behavior in the non-constant cases. regards, tom lane
Re: [HACKERS] strange IS NULL behaviour
Bruce Momjian writes: > This has made me adjust my goal and change it so SELECT ROW(NULL) IS > NULL returns true, and any further nesting returns false. AFAICS, the only good argument for breaking backwards compatibility here is if you can convince people that the new behavior is more conformant to the SQL spec. Where's the chapter and verse that argues for this interpretation? And I will say once more that a patch that affects only the behavior of eval_const_expressions can be rejected on its face. That code has to be kept in sync with the behavior of execQual.c, not just whacked around by itself. And then there are the NOT NULL constraint cases to worry about. regards, tom lane
Re: [HACKERS] strange IS NULL behaviour
On Fri, Jul 5, 2013 at 10:21:19AM -0400, Bruce Momjian wrote: > On Thu, Jul 4, 2013 at 04:29:20PM -0400, Tom Lane wrote: > > Bruce Momjian writes: > > > I developed the attached patch which properly recurses into ROW() > > > records checking for NULLs; you can see it returns the right answer in > > > all cases (and constant folds too): > > > > My recollection of the previous discussion is that we didn't have > > consensus on what the "right" behavior is, so I'm not sure you can just > > assert that this patch is right. In any case this is only touching the > > tip of the iceberg. If we intend that rows of nulls should be null, > > then we have got issues with, for example, NOT NULL column constraint > > checks, which don't have any such recursion built into them. I think > > the same is true for plpgsql variable NOT NULL restrictions, and there > > are probably some other places. > > Well we have three cases: > > 1 SELECT ROW(NULL) IS NULL; > 2 SELECT ROW(ROW(NULL)) IS NULL; > 3 SELECT ROW(ROW(ROW(NULL))) IS NULL; > > I think we could have them all return false, or all true, or the first > one true, and the rest false. What I don't think we can justify is 1 > and 2 as true, and 3 false. I have done some more research in this, and was able to verify Tom's concern that PL/pgSQL's IS NULL doesn't recurse into ROW expressions: DO LANGUAGE plpgsql $$ DECLARE r RECORD; BEGIN SELECT NULL INTO r; IF (r IS NULL) THEN RAISE NOTICE 'true'; ELSE RAISE NOTICE 'false'; END IF; END; $$; NOTICE: true DO In this test, SELECT NULL (which internally would produce SELECT ROW(NULL)), returns TRUE, while SELECT ROW(NULL) and further nesting returns false. This has made me adjust my goal and change it so SELECT ROW(NULL) IS NULL returns true, and any further nesting returns false. Attached is a patch which accomplishes this, and a documentation update. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. 
+ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml new file mode 100644 index 00f8ffb..51bfe31 *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 351,357 for both tests. This definition conforms to the SQL standard, and is a change from the inconsistent behavior exhibited by PostgreSQL ! versions prior to 8.2. --- 351,359 for both tests. This definition conforms to the SQL standard, and is a change from the inconsistent behavior exhibited by PostgreSQL ! versions prior to 8.2. Row expressions inside row ! expressions are processed as non-null, even if the sub-row contains only ! nulls. diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c new file mode 100644 index 76c032c..ce7047f *** a/src/backend/optimizer/util/clauses.c --- b/src/backend/optimizer/util/clauses.c *** eval_const_expressions_mutator(Node *nod *** 3100,3109 return makeBoolConst(false, false); continue; } newntest = makeNode(NullTest); newntest->arg = (Expr *) relem; newntest->nulltesttype = ntest->nulltesttype; ! newntest->argisrow = type_is_rowtype(exprType(relem)); newargs = lappend(newargs, newntest); } /* If all the inputs were constants, result is TRUE */ --- 3100,3124 return makeBoolConst(false, false); continue; } + /* Do we have a ROW() expression in the row? */ + if (type_is_rowtype(exprType(relem))) + { + /* + * ROW values in ROW values are not null, even if + * those sub-ROW values contain only nulls. This + * makes the code match PL/pgSQL and constraint + * IS NULL checks which don't probe into sub-ROWs + * for NULL values. + */ + if (ntest->nulltesttype == IS_NULL) + return makeBoolConst(false, false); + continue; + } + newntest = makeNode(NullTest); newntest->arg = (Expr *) relem; newntest->nulltesttype = ntest->nulltesttype; ! 
newntest->argisrow = false; newargs = lappend(newargs, newntest); } /* If all the inputs were constants, result is TRUE */
[HACKERS] [rfc] overhauling pgstat.stat
Hi, I'm considering overhauling pgstat.stat, and would like to know how many people are interested in this topic. As you may know, this file could be hundreds of MB in size, because pgstat.stat holds all access statistics in each database, and it needs to read/write an entire pgstat.stat frequently. As a result, pgstat.stat often generates massive I/O operations, particularly when having a large number of tables in the database. To support multi-tenancy or just a large number of tables (up to 10k tables in a single database), I think pgstat.stat needs to be overhauled. I think using a heap and btree in pgstat.stat would be preferable, to reduce read/write and to allow updating access statistics for specific tables in the pgstat.stat file. Is this good for us? Any comments or suggestions? Regards, -- Satoshi Nagayasu Uptime Technologies, LLC. http://www.uptime.jp
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
>> 1) Addition of new string data types NATIONAL CHARACTER and NATIONAL >> CHARACTER VARYING. >> These types differ from the char/varchar data types in one important >> respect: NATIONAL string types always have UTF8 encoding, >> independent of the database encoding. > I don't like the approach of adding a new data type for this. The encoding used for a text field should be an implementation detail, not something that's exposed to users at the schema level. A separate data type makes an nvarchar field behave slightly differently from text, for example when it's passed to and from functions. It will also require drivers and client applications to know about it. Hi, my task is implementing ANSI NATIONAL character string types as part of PostgreSQL core. And the requirement to "require drivers and client applications to know about it" is the reason why it could not be done as an add-on (these new types should have a fixed OID for most drivers, from my experience). Implementing them as a UTF8 data type is the first step, which allows having NATIONAL characters with an encoding different from the database encoding (and might even support multiple encodings for common string types in the future). >> 1) Full set of string functions and operators for NATIONAL types (we >> could not use generic text functions because they assume that the >> strings will have the database encoding). >> Now only a basic set is implemented. >> 2) Need to implement some way to define a default collation for NATIONAL >> types. >> 3) Need to implement some way to input UTF8 characters into NATIONAL types >> via SQL (there is a serious open problem... it will be described later >> in the text). > Yeah, all of these issues stem from the fact that the NATIONAL types are separate from text. > I think we should take a completely different approach to this. Two alternatives spring to mind: > 1. Implement a new encoding. The new encoding would be some variant of UTF-8 that encodes languages like Russian more efficiently. Then just use that in the whole database. Something like SCSU (http://www.unicode.org/reports/tr6/) should do the trick, although I'm not sure if SCSU can be used as a server encoding. A lot of code relies on the fact that a server encoding must have the high bit set in all bytes that are part of a multi-byte character. That's why SJIS, for example, can only be used as a client encoding. But surely you could come up with some subset or variant of SCSU which satisfies that requirement. > 2. Compress the column. Simply do "ALTER TABLE foo ALTER COLUMN bar SET STORAGE MAIN". That will make Postgres compress that field. That might not be very efficient for compressing short Cyrillic text encoded in UTF-8 today, but that could be improved. There has been discussion on supporting more compression algorithms in the past, and one such algorithm could again be something like SCSU. Both of these approaches require a dump/restore of the whole database, which is not always an option. Implementing a UTF8 NATIONAL character as a new data type provides the option to use pg_upgrade to the latest version and have the required functionality without prolonged downtime. PS: is it possible to reserve some narrow type OID range in PostgreSQL core for future use? Kind Regards, Maksym
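The second alternative quoted above (compressing the column) can be tried on an existing installation; the table and column names here are Heikki's placeholders, not real objects:

```sql
-- Force inline, compressed storage for the column instead of the default
-- EXTENDED (compress, then move out-of-line as a last resort):
ALTER TABLE foo ALTER COLUMN bar SET STORAGE MAIN;

-- Verify the storage mode ('m' = MAIN):
SELECT attstorage FROM pg_attribute
WHERE attrelid = 'foo'::regclass AND attname = 'bar';
```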
[HACKERS] operator precedence issues
On Tue, Sep 3, 2013 at 9:13 AM, Tom Lane wrote: > Andres Freund writes: >> On 2013-09-03 08:59:53 -0500, Merlin Moncure wrote: >>> While playing around with Andres's trick, I noticed that it works but >>> will not match against operators taking "any", although those will >>> match with explicit schema declaration (FWICT it goes through the >>> search_path trying to explicitly match an int/int operator, then goes >>> through it again matching "any"). That's pretty weird: > >> Not surprising. We look for the best match for an operator and >> explicitly matching types will be that. If there were no operator(int, >> int) your anyelement variant should get called. > > Yeah, this has exactly nothing to do with operator precedence. > Precedence is about which operator binds tighter in cases like "A+B*C". That all makes perfect sense -- thanks guys. For posterity, Andres's trick worked and did end up saving me some coding after all -- in my case I have to eval() some externally generated fairly complex expressions in SQL (via pl/pgsql EXECUTE) in the context of a much larger query. My de-parsing code ended up having bugs and it was much easier to tip-toe around the search_path (via SET LOCAL) and force the modified operator. merlin
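The search_path trick described above can be sketched as follows; all of the schema, function, and operator names here are hypothetical, invented for illustration:

```sql
-- Shadow the built-in int4 "+" with a custom operator in its own schema.
CREATE SCHEMA intercept;

CREATE FUNCTION intercept.int4_add_traced(a int, b int) RETURNS int
LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'adding % and %', a, b;
    -- Qualify the built-in operator explicitly, so this function does not
    -- recurse into intercept.+ under the modified search_path.
    RETURN a OPERATOR(pg_catalog.+) b;
END;
$$;

CREATE OPERATOR intercept.+ (
    LEFTARG = int, RIGHTARG = int,
    PROCEDURE = intercept.int4_add_traced
);

BEGIN;
-- Listing pg_catalog explicitly AFTER intercept lets intercept.+ win;
-- SET LOCAL confines the change to this transaction.
SET LOCAL search_path = intercept, pg_catalog, public;
SELECT 1 + 2;   -- resolves to intercept.+ and emits the NOTICE
COMMIT;
```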
Re: [HACKERS] WAL CPU overhead/optimization (was Master-slave visibility order)
On 2013-09-03 23:25:01 +0200, Hannu Krosing wrote: > On 09/03/2013 05:27 PM, Robert Haas wrote: > > On Thu, Aug 29, 2013 at 8:22 PM, Ants Aasma wrote: > >> I might give it a shot later this cycle as I have familiarized myself > >> with the problem domain anyway. I understand the appeal of staying > >> with what we have, but this would cap the speedup at 4x and has large > >> caveats with the extra lookup tables. A 28x speedup might be worth the > >> extra effort. > > I agree. However, on the flip side, a bird in the hand is worth two > > in the bush. A patch that just does the same thing faster is likely > > to be less controversial than changing the algorithm, and does not > > preclude changing the algorithm later. > > Has anybody looked at the hardware support for CRC32 added in recent i5/i7 CPUs? > > http://www.strchr.com/crc32_popcnt > > From the numbers there it is probably still slower than xxhash, but it might > be worth doing as an interim optimisation. Yes. There are two problems: For one, they use a different polynomial (Castagnoli, AFAIR), so the results aren't compatible with the ones we compute so far. For another, you need runtime detection of whether the instruction is available, and you need some compiler-specific things to use it. Neither is too hard to solve, but I think that amount of work is better spent using a different hash. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v5
On 2013-09-03 15:56:15 -0400, Robert Haas wrote: > On Tue, Sep 3, 2013 at 12:57 PM, Andres Freund wrote: > >> And why is every 15 seconds good enough? > > > > Waiting 15s to become consistent instead of checkpoint_timeout seems to > > be ok to me and to be a good tradeoff between overhead and waiting. We > > can certainly discuss other values or making it configurable. The latter > > seemed to be unnecessary to me, but I don't have a problem > > implementing it. I just don't want to document it :P > > I don't think it particularly needs to be configurable, but I wonder > if we can't be a bit smarter about when we do it. For example, > suppose we logged it every 15 s but only until we log a non-overflowed > snapshot. There are actually more benefits than just overflowed snapshots (pruning of the known-xids machinery, exclusive lock cleanup). > I realize that the overhead of a WAL record every 15 > seconds is fairly small, but the load on some systems is all but > nonexistent. It would be nice not to wake up the HD unnecessarily. The patch as-is only writes if there has been WAL written since the last time it logged a running_xacts. I think it's not worth building more smarts than that? > >> The WAL writer is supposed to call XLogBackgroundFlush() every time > >> WalWriterDelay expires. Yeah, it can hibernate, but if it's > >> hibernating, then we should respect that decision for this WAL record > >> type also. > > > > Why should we respect it? > > Because I don't see any reason to believe that this WAL record is any > more important or urgent than any other WAL record that might get > logged. I can't follow the logic behind that statement. Just about all WAL records are either pretty immediately flushed afterwards or are done in the context of a transaction where we flush (or do an XLogSetAsyncXactLSN) at transaction commit. XLogBackgroundFlush() won't necessarily flush the running_xacts record.
Unless you've set the async xact lsn it will only flush complete blocks. So what can happen (I've seen it more than once in testing; it took me a while to debug) is that a checkpoint is started in a busy period but nothing happens after it finishes. Since the checkpoint-triggered running_xacts record is logged *before* we do the smgr flush, it is still overflowed. Then, after activity has died down, the bgwriter issues the running_xacts record, but it's filling a block and thus never gets flushed. To me the alternatives are to do an XLogSetAsyncXactLSN() or an XLogFlush(). The latter is more aggressive and can block for a measurable amount of time, which is why I don't want to do it in the bgwriter. > >> >> I understand why logical replication needs to connect to a database, > >> >> but I don't understand why any other walsender would need to connect > >> >> to a database. > >> > > >> > Well, logical replication actually streams out data using the walsender, > >> > so that's the reason why I want to add it there. But there have been > >> > cases in the past where we wanted to do stuff in the walsender that need > >> > database access, but we couldn't do so because you cannot connect to > >> > one. > > > >> Could you be more specific? > > > > I only remember 3959.1349384...@sss.pgh.pa.us but I think it has come up > > before. > > It seems we need some more design there. Perhaps entering replication > mode could be triggered by writing either dbname=replication or > replication=yes. But then, do the replication commands simply become > SQL commands? I've certainly seen hackers use them that way. And I > can imagine that being a sensible approach, but this patch seems like > it's only covering a fairly small fraction of what really ought to be > a single commit. Yes. I think you're right that we need more input/design here. I've previously started threads about it, but nobody replied :(.
The problem with using dbname=replication as a trigger for anything is that we actually allow databases to be created with that name. Perhaps that was a design mistake. I wondered about turning replication from a boolean into something like off|0, on|1, database. dbname= only gets used in the latter variant. That would be compatible with previous versions and would even support using old tools (since all of them seem to do replication=1). > But then, do the replication commands simply become > SQL commands? I've certainly seen hackers use them that way. I don't think it's a good idea at this point to make them plain SQL. There is more infrastructure (signal handlers, permissions, different timeouts) & memory required for walsenders, so using plain SQL there seems beyond the scope of this. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backup throttling
On 2013-09-03 12:56:53 -0400, Alvaro Herrera wrote: > Antonin Houska wrote: > > > + > > +Suffixes k (kilobytes) and m > > +(megabytes) are accepted. For example: 10m > > + > > "m" is for meters, or milli. Please use "M" here. Shouldn't it be MB? Consistent with GUCs? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
On Tue, Sep 3, 2013 at 07:47:22PM +0200, Magnus Hagander wrote: > > On Sep 3, 2013 6:23 PM, "Josh Berkus" wrote: > > > > Peter, > > > > > > > I wonder, is anyone really running in production with full_page_writes > > > off? > > > > We're doing so with clients on Joyent (ZFS). And also for "ephemeral" > > postgres instances -- ones where we've also turned off fsync. > > > > It's also quite reasonable to assume that future filesystems will also > > be able to make better torn page guarantees. > > > > So it should remain a setting. I see no reason why it should be a > > SIGHUP setting, though, if there's any code maintenance required to > > support it -- frankly, I only change this setting at first system > > deployment. > > > > This matches my experience as well - people certainly use it, but don't change > it during runtime. > > Not having looked at the code, but doesn't pg_start_backup use at least parts > of the same code path? That one will definitely still be able to modify it at > runtime Yes, this was already mentioned in the thread, but a few folks missed it. In summary, no changes needed here. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL CPU overhead/optimization (was Master-slave visibility order)
On 09/03/2013 05:27 PM, Robert Haas wrote: > On Thu, Aug 29, 2013 at 8:22 PM, Ants Aasma wrote: >> I might give it a shot later this cycle as I have familiarized myself >> with the problem domain anyway. I understand the appeal of staying >> with what we have, but this would cap the speedup at 4x and has large >> caveats with the extra lookup tables. A 28x speedup might be worth the >> extra effort. > I agree. However, on the flip side, a bird in the hand is worth two > in the bush. A patch that just does the same thing faster is likely > to be less controversial than changing the algorithm, and does not > preclude changing the algorithm later. > Has anybody looked at the hardware support for CRC32 added in recent i5/i7 CPUs? http://www.strchr.com/crc32_popcnt From the numbers there it is probably still slower than xxhash, but it might be worth doing as an interim optimisation. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backup throttling
On 09/03/2013 06:56 PM, Alvaro Herrera wrote: >> +/* >> + * Only the following suffixes are allowed. It's not >> too useful to >> + * restrict the rate to gigabytes: such a rate will >> probably bring >> + * significant impact on the master anyway, so the >> throttling >> + * won't help much. >> + */ >> +case 'g': >> +factor <<= 10; > > I don't understand why you allow a 'g' here, given the comment above ... > but in any case it should be G. > This reflects my hesitation about whether GB should be accepted as a unit or not. I'll probably remove this suffix. (Will fix the other findings too.) Thanks, Tony -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v5
On Tue, Sep 3, 2013 at 12:57 PM, Andres Freund wrote: >> To my way of thinking, it seems as though we ought to always begin >> replay at a checkpoint, so the standby ought always to see one of >> these records immediately. Obviously that's not good enough, but why >> not? > > We always see one after the checkpoint (well, actually before the > checkpoint record, but ...), correct. The problem is just that reading a > single xact_running record doesn't automatically make you consistent. If > there's a single suboverflowed transaction running on the primary when > the xl_running_xacts is logged we won't be able to switch to > consistent. Check procarray.c:ProcArrayApplyRecoveryInfo() for some fun > and some optimizations. > Since the only place where we currently have the information to > potentially become consistent is ProcArrayApplyRecoveryInfo() we will > have to wait checkpoint_timeout time till we get consistent. Which > sucks as there are good arguments to set that to 1h. > That especially sucks as you lose consistency every time you restart the > standby... Right, OK. >> And why is every 15 seconds good enough? > > Waiting 15s to become consistent instead of checkpoint_timeout seems to > be ok to me and to be a good tradeoff between overhead and waiting. We > can certainly discuss other values or making it configurable. The latter > seemed to be unnecessary to me, but I don't have a problem > implementing it. I just don't want to document it :P I don't think it particularly needs to be configurable, but I wonder if we can't be a bit smarter about when we do it. For example, suppose we logged it every 15 s but only until we log a non-overflowed snapshot. I realize that the overhead of a WAL record every 15 seconds is fairly small, but the load on some systems is all but nonexistent. It would be nice not to wake up the HD unnecessarily. >> The WAL writer is supposed to call XLogBackgroundFlush() every time >> WalWriterDelay expires.
Yeah, it can hibernate, but if it's >> hibernating, then we should respect that decision for this WAL record >> type also. > > Why should we respect it? Because I don't see any reason to believe that this WAL record is any more important or urgent than any other WAL record that might get logged. >> >> I understand why logical replication needs to connect to a database, >> >> but I don't understand why any other walsender would need to connect >> >> to a database. >> > >> > Well, logical replication actually streams out data using the walsender, >> > so that's the reason why I want to add it there. But there have been >> > cases in the past where we wanted to do stuff in the walsender that need >> > database access, but we couldn't do so because you cannot connect to >> > one. > >> Could you be more specific? > > I only remember 3959.1349384...@sss.pgh.pa.us but I think it has come up > before. It seems we need some more design there. Perhaps entering replication mode could be triggered by writing either dbname=replication or replication=yes. But then, do the replication commands simply become SQL commands? I've certainly seen hackers use them that way. And I can imagine that being a sensible approach, but this patch seems like it's only covering a fairly small fraction of what really ought to be a single commit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backup throttling
Antonin Houska wrote: > + > +Suffixes k (kilobytes) and m > +(megabytes) are accepted. For example: 10m > + "m" is for meters, or milli. Please use "M" here. > +static uint32 > +parse_max_rate(char *src) > +{ > + int factor; > + char *after_num; > + int64 result; > + int errno_copy; > + > + result = strtol(src, &after_num, 0); > + errno_copy = errno; > + if (src == after_num) > + { > + fprintf(stderr, _("%s: transfer rate %s is not a valid integer > value\n"), progname, src); > + exit(1); > + } Please add quotes to the invalid value. > + > + /* > + * Evaluate (optional) suffix. > + * > + * after_num should now be right behind the numeric value. > + */ > + factor = 1; > + switch (tolower(*after_num)) > + { > + /* > + * Only the following suffixes are allowed. It's not > too useful to > + * restrict the rate to gigabytes: such a rate will > probably bring > + * significant impact on the master anyway, so the > throttling > + * won't help much. > + */ > + case 'g': > + factor <<= 10; I don't understand why you allow a 'g' here, given the comment above ... but in any case it should be G. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
On Mon, Sep 2, 2013 at 10:10 PM, Jeff Davis wrote: > There is a significant amount of code supporting the changing of > full_page_writes while the server is running, including an > XLOG_FPW_CHANGE wal record. But I don't see the use-case; surely almost > nobody changes this on a running server, because either your filesystem > guarantees atomic page writes for you or not. Although this is true, the administrator's estimate of whether that guarantee is or is not provided might not be as consistent as the hardware behavior itself. I am generally of the feeling that having to restart the server to change a setting sucks, and while it surely sucks less for this setting than some, mostly because few people change this setting in the first place, I'm still not convinced that this is moving in the right direction. > I'd like to submit a patch to just make it into a PGC_POSTMASTER and > remove the code to support changing it. > > Then, I intend to write another patch to make the full-page writes for > checksums honor the full_page_writes setting. That will be easier to > write once it's a PGC_POSTMASTER. I don't think I know exactly what this means. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.3 doc fix] clarification of Solaris versions
Robert Haas wrote: > The patch file turned out to be sorta garbled. I'm not sure if a > broken version of diff was used to generate this or whether MauMau > hand-edited it after the fact, but the number of lines that were > indicated in the control lines didn't match the actual hunks, and > patch threw up. So it took me 20 minutes to do what should have taken > 5, but now it's done. If this ever happens to you in the future, I suggest giving recountdiff a look (in the patchutils package). It is supposed to fix offsets and line counts of hand-edited diffs. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Further XLogInsert scaling tweaking
Bruce Momjian wrote: > On Mon, Sep 2, 2013 at 10:14:03AM +0300, Heikki Linnakangas wrote: > > diff --git a/src/backend/access/transam/xlog.c > > b/src/backend/access/transam/xlog.c > > index 39c58d0..28e62ea 100644 > > - XLogInsertSlotPadded *insertSlots; > > + /* > > + * Make sure the above heavily-contended spinlock and byte positions are > > + * on their own cache line. In particular, the RedoRecPtr and full page > > + * write variables below should be on a different cache line. They are > > + * read on every WAL insertion, but updated rarely, and we don't want > > + * those reads to steal the cache line containing Curr/PrevBytePos. > > + */ > > + char pad[128]; > > Do we adjust for cache line lengths anywhere else? PGPROC? Should it > be a global define? We have LWLockPadded in lwlock.c; it only adjusts the size of the struct to be a power of 2. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE FUNCTION .. SET vs. pg_dump
Robert Haas writes: > On Sun, Sep 1, 2013 at 10:36 AM, Stefan Kaltenbrunner > wrote: >> attached is a rough patch that does exactly that, if we are worried >> about an api change we could simple do a ProcessGUCArrayNotice() in the >> backbranches if that approach is actually sane. > This patch has some definite coding-style issues, but those should be > easy to fix. The bigger thing I worry about is whether distributing > the decision as to what elevel ought to be used here all over the code > base is indeed sane. Perhaps that ship has already sailed, though. I don't particularly care for this approach, for that reason and others. One such issue is that it would result in duplicate messages, because functioncmds.c will already have issued warnings thanks to the GUCArrayAdd calls it makes. (Those calls are made with context PGC_S_TEST, which is why they only throw warnings not errors, cf validate_option_array_item.) But here's the big-picture consideration: the reason that we call ProcessGUCArray in this place in ProcedureCreate is that we are trying to set up the correct GUC environment for the function validator. As an example, if the SET clause for a SQL-language function includes a setting for search_path, we had better adopt that search path while checking the function body, or we will come to incorrect conclusions. If we use this patch, we'd be saying that we're okay with failing to apply the SET clause before calling the validator, any time the requested GUC value happens to be invalid. That's not terribly comfortable: in general we'd be assuming that validators' results don't depend on the GUC environment, which is clearly false. Now we can probably get away with that in the context of check_function_bodies = false, because the validator shouldn't be doing much of anything then (although a few of them check some things anyway...). 
But if we're going to assume that the validator doesn't depend on GUC environment when check_function_bodies = false, why apply the SET clause at all? So I think a more reasonable fix is just to skip the ProcessGUCArray call altogether when check_function_bodies = false, and to document that validators mustn't assume anything about the GUC environment when check_function_bodies = false. If no objections, I'll go fix it that way. BTW, I notice the various comments about PGC_S_TEST context are out of date, because they don't mention that it is used for CREATE FUNCTION SET cases ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE FUNCTION .. SET vs. pg_dump
* Tom Lane (t...@sss.pgh.pa.us) wrote: > So I think a more reasonable fix is just to skip the ProcessGUCArray > call altogether when check_function_bodies = false, and to document > that validators mustn't assume anything about the GUC environment > when check_function_bodies = false. > > If no objections, I'll go fix it that way. Works for me. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
On Tue, Sep 3, 2013 at 9:23 AM, Josh Berkus wrote: > So it should remain a setting. Sure. I wasn't suggesting deprecating it or anything. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The PostgreSQL License requires "LICENSE" file?
>>> I've checked the PostgreSQL License, >>> and had a question. >>> http://opensource.org/licenses/PostgreSQL > > I don't know who opensource.org are, but their idea of "the PostgreSQL > license" doesn't appear to match the actual license text. The OSI is the organization that determines whether licenses are open source or not. We requested that they list The PostgreSQL License on their site after Red Hat determined that our license had sufficient textual differences from the BSD License to be a different license. I've requested that the spurious "file name" reference be removed. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
On Sep 3, 2013 6:23 PM, "Josh Berkus" wrote: > > Peter, > > > > I wonder, is anyone really running in production with full_page_writes > > off? > > We're doing so with clients on Joyent (ZFS). And also for "ephemeral" > postgres instances -- ones where we've also turned off fsync. > > It's also quite reasonable to assume that future filesystems will also > be able to make better torn page guarantees. > > So it should remain a setting. I see no reason why it should be a > SIGHUP setting, though, if there's any code maintenance required to > support it -- frankly, I only change this setting at first system > deployment. > This matches my experience as well - people certainly use it, but don't change it during runtime. Not having looked at the code, but doesn't pg_start_backup use at least parts of the same code path? That one will definitely still be able to modify it at runtime /Magnus
Re: [HACKERS] The PostgreSQL License requires "LICENSE" file?
Robert Haas writes: > On Tue, Sep 3, 2013 at 12:16 PM, Tomonari Katsumata > wrote: >> PostgreSQL is released under "the PostgreSQL License". >> http://www.postgresql.org/about/licence/ The actual license terms can be found in the COPYRIGHT file in our distribution. There is nothing in there mentioning a LICENSE file. >> I've checked the PostgreSQL License, >> and had a question. >> http://opensource.org/licenses/PostgreSQL I don't know who opensource.org are, but their idea of "the PostgreSQL license" doesn't appear to match the actual license text. The real question here is why our website is linking to theirs. Surely we should not be setting things up in such a way that people might think that opensource.org defines the authoritative version of the text. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v5
On 2013-09-03 12:22:22 -0400, Robert Haas wrote: > On Tue, Sep 3, 2013 at 12:05 PM, Andres Freund wrote: > >> 1. I think more comments are needed here to explain why we need this. > >> I don't know if the comments should go into the functions modified by > >> this patch or in some other location, but I don't find what's here now > >> adequate for understanding. > > > > Hm. What information are you actually missing? I guess the > > XLogSetAsyncXactLSN() needs a bit more context based on your question, > > what else? > > Not sure if it makes sense to explain in detail why it helps us to get > > into a consistent state faster? > Well, we must have had some idea in mind when the original Hot Standby > patch went in that doing this once per checkpoint was good enough. > Now we think we need it every 15 seconds, but not more or less often. > So, why the change of heart? I think the primary reason for that was that it was a pretty complicated patchset and we needed to start somewhere. By now we do have reports of standbys taking their time to get consistent. > To my way of thinking, it seems as though we ought to always begin > replay at a checkpoint, so the standby ought always to see one of > these records immediately. Obviously that's not good enough, but why > not? We always see one after the checkpoint (well, actually before the checkpoint record, but ...), correct. The problem is just that reading a single xact_running record doesn't automatically make you consistent. If there's a single suboverflowed transaction running on the primary when the xl_running_xacts is logged we won't be able to switch to consistent. Check procarray.c:ProcArrayApplyRecoveryInfo() for some fun and some optimizations. Since the only place where we currently have the information to potentially become consistent is ProcArrayApplyRecoveryInfo() we will have to wait checkpoint_timeout time till we get consistent. Which sucks as there are good arguments to set that to 1h.
That especially sucks as you lose consistency every time you restart the standby... > And why is every 15 seconds good enough? Waiting 15s to become consistent instead of checkpoint_timeout seems to be ok to me and to be a good tradeoff between overhead and waiting. We can certainly discuss other values or making it configurable. The latter seemed to be unnecessary to me, but I don't have a problem implementing it. I just don't want to document it :P > >> 3. Why does LogCurrentRunningXacts() need to call > >> XLogSetAsyncXactLSN()? Hopefully any WAL record is going to get > >> sync'd in a reasonably timely fashion; I can't see off-hand why this > >> one should need special handling. > > > > No, we don't force writing out wal records in a timely fashion if > > there's no pressure in wal_buffers, basically only on commits and > > various XLogFlush()es. It doesn't make much of a difference if the > > entire system is busy, but if it's not the wal writer will sleep. The > > alternative would be to XLogFlush() the record, but that would actually > > block, which isn't really what we want/need. > The WAL writer is supposed to call XLogBackgroundFlush() every time > WalWriterDelay expires. Yeah, it can hibernate, but if it's > hibernating, then we should respect that decision for this WAL record > type also. Why should we respect it? There is work to be done and the wal writer has no way of knowing that without us telling it? Normally we rely on commit records and XLogFlush()es to trigger the wal writer. Alternatively we can start a transaction and set synchronous_commit = off, but that seems like a complication to me. > >> I understand why logical replication needs to connect to a database, > >> but I don't understand why any other walsender would need to connect > >> to a database. > > > > Well, logical replication actually streams out data using the walsender, > > so that's the reason why I want to add it there.
But there have been > > cases in the past where we wanted to do stuff in the walsender that need > > database access, but we couldn't do so because you cannot connect to > > one. > Could you be more specific? I only remember 3959.1349384...@sss.pgh.pa.us but I think it has come up before. > >> Absent a clear use case for such a thing, I don't > >> think we should allow it. Ignorant suggestion: perhaps the database > >> name could be stored in the logical replication slot. > > > > The problem is that you need to InitPostgres() with a database. You > > cannot do that again, after connecting with an empty database which we > > do in a plain walsender. > > Are you saying that the logical replication slot can't be read before > calling InitPostgres()? The slot can be read just fine, but we won't know that we should do that. Walsender accepts commands via PostgresMain()'s mainloop which has done a InitPostgres(dbname) before. Which we need to do because we need the environment it sets up. The database is stored in the slots btw (as oid, not a
Re: [HACKERS] CREATE FUNCTION .. SET vs. pg_dump
On 09/03/2013 06:15 PM, Robert Haas wrote: > On Sun, Sep 1, 2013 at 10:36 AM, Stefan Kaltenbrunner > wrote: >>> It would seem that a simple solution would be to add an elevel argument >>> to ProcessGUCArray and then call it with NOTICE in the case that >>> check_function_bodies is true. None of the contrib modules call >>> ProcessGUCArray, but should we worry that some external module does? >> >> attached is a rough patch that does exactly that, if we are worried >> about an api change we could simple do a ProcessGUCArrayNotice() in the >> backbranches if that approach is actually sane. > > This patch has some definite coding-style issues, but those should be > easy to fix. The bigger thing I worry about is whether distributing > the decision as to what elevel ought to be used here all over the code > base is indeed sane. Perhaps that ship has already sailed, though. I can certainly fix the coding style thing up - but it was declared as "rough" mostly because I'm not entirely sure the way this is going is actually the right way to attack this... This whole stuff seems to be a bit messy and bolted on in some ways. There is ProcessGUCArray(), but also set_config_option() and its external "wrapper" SetConfigOption() - the division of labour between the caller deciding on what it wants vs what the function does with some combination of elevel and source internally is not very consistent at best. I also note that a lot of places are actually calling set_config_option() directly, so maybe there is an opportunity to unify here as well. Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The PostgreSQL License requires "LICENSE" file?
On 09/03/2013 11:16 AM, Tomonari Katsumata wrote:
> It seems the contents of the "COPYRIGHT" file are all required by the
> license. So I am confused about why PostgreSQL has a "COPYRIGHT" file
> instead of "LICENSE". Does anybody know the reason? And is this a problem?

The first paragraph on the OSI page isn't part of the license itself, but rather is general advice from OSI. I'm not clear on why they specified a particular filename, since they don't do that for other licenses; I'll ask.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] CREATE FUNCTION .. SET vs. pg_dump
On Sun, Sep 1, 2013 at 10:36 AM, Stefan Kaltenbrunner wrote:
>> It would seem that a simple solution would be to add an elevel argument
>> to ProcessGUCArray and then call it with NOTICE in the case that
>> check_function_bodies is true. None of the contrib modules call
>> ProcessGUCArray, but should we worry that some external module does?
>
> attached is a rough patch that does exactly that, if we are worried
> about an api change we could simply do a ProcessGUCArrayNotice() in the
> back branches if that approach is actually sane.

This patch has some definite coding-style issues, but those should be easy to fix. The bigger thing I worry about is whether distributing the decision as to what elevel ought to be used here all over the code base is indeed sane. Perhaps that ship has already sailed, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] The PostgreSQL License requires "LICENSE" file?
On Tue, Sep 3, 2013 at 12:16 PM, Tomonari Katsumata wrote:
> Hi,
>
> PostgreSQL is released under "the PostgreSQL License".
> http://www.postgresql.org/about/licence/
>
> I've checked the PostgreSQL License and had a question.
> http://opensource.org/licenses/PostgreSQL
>
> It seems that the license requires a "LICENSE" file in the target product:
> [quote]
>> Then put the license into a file called "LICENSE" in
>> your software distribution.
>
> But I couldn't find a "LICENSE" file in the current PostgreSQL source tree;
> instead I found a "COPYRIGHT" file.
>
> It seems the contents of the "COPYRIGHT" file are all required by the
> license. So I am confused about why PostgreSQL has a "COPYRIGHT" file
> instead of "LICENSE". Does anybody know the reason? And is this a problem?

I don't think it matters very much what the file is called, as long as it's in a file that can be found easily by people who want to know what the license is. I imagine that either LICENSE or COPYRIGHT would suffice for that purpose.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Freezing without write I/O
On Mon, Sep 2, 2013 at 3:16 PM, Jeff Davis wrote: > On Fri, 2013-08-30 at 20:34 +0200, Andres Freund wrote: >> I have a quick question: The reason I'd asked about the status of the >> patch was that I was thinking about the state of the "forensic freezing" >> patch. After a quick look at your proposal, we still need to freeze in >> some situations (old & new data on the same page basically), so I'd say >> it still makes sense to apply the forensic freezing patch, right? >> >> Differing Opinions? > > The Freeze Forensically patch is nice because (if I understand it > correctly) it allows us to freeze at the same time as we mark > PD_ALL_VISIBLE, which avoids the potential extra page write. The patch itself doesn't actually make that change, but it removes one major objection to such a change. > But that's > not such a big advantage if we don't ordinarily have to write again for > freezing, anyway. That was my thought as well. > However, there are still some cases where we'd be able to preserve the > forensic information. If nothing else, that might help debug this patch, > if necessary. There might also be cases where we can freeze more eagerly > to avoid the case where very old (but unfrozen) and very new tuples mix > on the same page. Perhaps Robert has some thoughts here, as well. I basically agree. I think if we adopt Heikki's patch forensic freezing becomes much less important, but we might find there's still a reason to do it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] The PostgreSQL License requires "LICENSE" file?
Hi,

PostgreSQL is released under "the PostgreSQL License".
http://www.postgresql.org/about/licence/

I've checked the PostgreSQL License and had a question.
http://opensource.org/licenses/PostgreSQL

It seems that the license requires a "LICENSE" file in the target product:

[quote]
> Then put the license into a file called "LICENSE" in
> your software distribution.

But I couldn't find a "LICENSE" file in the current PostgreSQL source tree; instead I found a "COPYRIGHT" file.

It seems the contents of the "COPYRIGHT" file are all required by the license. So I am confused about why PostgreSQL has a "COPYRIGHT" file instead of "LICENSE". Does anybody know the reason? And is this a problem?

regards,
- Tomonari Katsumata
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
Peter, > I wonder, is anyone really running in production with full_page_writes > off? We're doing so with clients on Joyent (ZFS). And also for "ephemeral" postgres instances -- ones where we've also turned off fsync. It's also quite reasonable to assume that future filesystems will also be able to make better torn page guarantees. So it should remain a setting. I see no reason why it should be a SIGHUP setting, though, if there's any code maintenance required to support it -- frankly, I only change this setting at first system deployment. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v5
On Tue, Sep 3, 2013 at 12:05 PM, Andres Freund wrote: >> 1. I think more comments are needed here to explain why we need this. >> I don't know if the comments should go into the functions modified by >> this patch or in some other location, but I don't find what's here now >> adequate for understanding. > > Hm. What information are you actually missing? I guess the > XLogSetAsyncXactLSN() needs a bit more context based on your question, > what else? > Not sure if it makes sense to explain in detail why it helps us to get > into a consistent state faster? Well, we must have had some idea in mind when the original Hot Standby patch went in that doing this once per checkpoint was good enough. Now we think we need it every 15 seconds, but not more or less often. So, why the change of heart? To my way of thinking, it seems as though we ought to always begin replay at a checkpoint, so the standby ought always to see one of these records immediately. Obviously that's not good enough, but why not? And why is every 15 seconds good enough? >> 3. Why does LogCurrentRunningXacts() need to call >> XLogSetAsyncXactLSN()? Hopefully any WAL record is going to get >> sync'd in a reasonably timely fashion; I can't see off-hand why this >> one should need special handling. > > No, we don't force writing out wal records in a timely fashion if > there's no pressure in wal_buffers, basically only on commits and > various XLogFlush()es. It doesn't make much of a difference if the > entire system is busy, but if it's not the wal writer will sleep. The > alternative would be to XLogFlush() the record, but that would actually > block, which isn't really what we want/need. The WAL writer is supposed to call XLogBackgroundFlush() every time WalWriterDelay expires. Yeah, it can hibernate, but if it's hibernating, then we should respect that decision for this WAL record type also. 
>> > 0003 wal_decoding: Allow walsender's to connect to a specific database >> > * biggest problem is how to specify the connection we connect >> > to. Currently with the patch walsender connects to a database if it's >> > not named "replication" (via dbname). Perhaps it's better to invent a >> > replication_dbname parameter? > >> I understand why logical replication needs to connect to a database, >> but I don't understand why any other walsender would need to connect >> to a database. > > Well, logical replication actually streams out data using the walsender, > so that's the reason why I want to add it there. But there have been > cases in the past where we wanted to do stuff in the walsender that need > database access, but we couldn't do so because you cannot connect to > one. Could you be more specific? >> Absent a clear use case for such a thing, I don't >> think we should allow it. Ignorant suggestion: perhaps the database >> name could be stored in the logical replication slot. > > The problem is that you need to InitPostgres() with a database. You > cannot do that again, after connecting with an empty database which we > do in a plain walsender. Are you saying that the logical replication slot can't be read before calling InitPostgres()? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v5
On 2013-09-03 11:40:57 -0400, Robert Haas wrote: > On Fri, Aug 30, 2013 at 11:19 AM, Andres Freund > wrote: > > 0005 wal_decoding: Log xl_running_xact's at a higher frequency than > > checkpoints are done > > * benefits hot standby startup > > Review: > > 1. I think more comments are needed here to explain why we need this. > I don't know if the comments should go into the functions modified by > this patch or in some other location, but I don't find what's here now > adequate for understanding. Hm. What information are you actually missing? I guess the XLogSetAsyncXactLSN() needs a bit more context based on your question, what else? Not sure if it makes sense to explain in detail why it helps us to get into a consistent state faster? > 2. I think the variable naming could be better. If nothing else, I'd > spell out "snapshot" rather than abbreviating it to "snap". I'd also > add comments explaining what each of those variables does. Ok. > And why > isn't log_snap_interval_ms a #define rather than a variable? (Don't > even talk to me about using gdb on a running instance. If you're even > thinking about that, this needs to be a GUC.) Ugh. It certainly doesn't have anything to do with wanting to change it on a running system using gdb. Brrr. I think I wanted it to be a constant variable but forgot the const. I personally prefer 'static const' to #define's if its legal C, but I guess the project's style differs, so I'll change that. > 3. Why does LogCurrentRunningXacts() need to call > XLogSetAsyncXactLSN()? Hopefully any WAL record is going to get > sync'd in a reasonably timely fashion; I can't see off-hand why this > one should need special handling. No, we don't force writing out wal records in a timely fashion if there's no pressure in wal_buffers, basically only on commits and various XLogFlush()es. It doesn't make much of a difference if the entire system is busy, but if it's not the wal writer will sleep. 
The alternative would be to XLogFlush() the record, but that would actually block, which isn't really what we want/need. > > 0003 wal_decoding: Allow walsender's to connect to a specific database > > * biggest problem is how to specify the connection we connect > > to. Currently with the patch walsender connects to a database if it's > > not named "replication" (via dbname). Perhaps it's better to invent a > > replication_dbname parameter? > I understand why logical replication needs to connect to a database, > but I don't understand why any other walsender would need to connect > to a database. Well, logical replication actually streams out data using the walsender, so that's the reason why I want to add it there. But there have been cases in the past where we wanted to do stuff in the walsender that need database access, but we couldn't do so because you cannot connect to one. > Absent a clear use case for such a thing, I don't > think we should allow it. Ignorant suggestion: perhaps the database > name could be stored in the logical replication slot. The problem is that you need to InitPostgres() with a database. You cannot do that again, after connecting with an empty database which we do in a plain walsender. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add database to PGXACT / per database vacuuming
On Fri, Aug 30, 2013 at 2:29 PM, Andres Freund wrote:
>> I don't know how big an impact adding the database oid would have, on the
>> case that the PGPROC/PGXACT split was done in the first place. In the worst
>> case it will make taking a snapshot 1/3 slower under contention. That needs
>> to be tested.
>
> Yes, definitely. I am basically wondering whether somebody has/sees
> fundamental problems with it that would make it pointless to investigate.

I expect there will be a measurable performance degradation, though I'm willing to be proven wrong. I think the question is whether we get enough bang for the buck out of it to eat that. It seems quite likely that users with many databases will come out ahead, as such systems seem likely to be shared hosting environments where the machine is lightly loaded most of the time anyway, but where cross-database interactions cause headaches. But many users have One Big Database, and AFAICS this is just overhead for them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [9.3 doc fix] clarification of Solaris versions
On Fri, Aug 30, 2013 at 5:01 AM, Bjorn Munch wrote:
> On 29/08 21.17, MauMau wrote:
>>
>> Thanks. I believe PostgreSQL runs successfully on Solaris 10 and later,
>> because the binaries are published on the community site:
>>
>> http://ftp.postgresql.org/pub/binary/v9.3beta2/solaris/
>
> Sorry, I didn't notice this thread earlier. Yes, I am building those
> binaries and I have had no problems building PostgreSQL and running
> the regression tests on Solaris 11. It would be quite unusual for
> something to work on Solaris 10 but not 11, but of course it could
> happen. There may have been a case or two which I now can't remember.

OK, patch committed and back-patched to 9.3. The patch file turned out to be sorta garbled. I'm not sure if a broken version of diff was used to generate it or whether MauMau hand-edited it after the fact, but the number of lines indicated in the control lines didn't match the actual hunks, and patch threw up. So it took me 20 minutes to do what should have taken 5, but now it's done.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] WAL CPU overhead/optimization (was Master-slave visibility order)
On Thu, Aug 29, 2013 at 8:22 PM, Ants Aasma wrote: > I might give it a shot later this cycle as I have familiarized my self > with the problem domain anyway. I understand the appeal of staying > with what we have, but this would cap the speedup at 4x and has large > caveats with the extra lookup tables. A 28x speedup might be worth the > extra effort. I agree. However, on the flip side, a bird in the hand is worth two in the bush. A patch that just does the same thing faster is likely to be less controversial than changing the algorithm, and does not preclude changing the algorithm later. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
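To make the trade-off above concrete: the "stay with what we have, capped at ~4x, with extra lookup tables" option refers to table-driven slice-by-N CRC computation, which consumes several input bytes per step instead of one. The sketch below is purely illustrative (it is not PostgreSQL's WAL CRC code, and it uses the CRC-32C polynomial only as an example); it shows a byte-at-a-time baseline next to a slicing-by-4 variant built from the extra tables the thread mentions as a caveat:

```python
# Illustrative sketch only: same-algorithm CRC speedup via extra lookup
# tables ("slicing-by-4"), using CRC-32C as an example polynomial.

POLY = 0x82F63B78  # CRC-32C (Castagnoli), reflected representation

def _make_byte_table():
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table

T0 = _make_byte_table()

def crc32c_bytewise(data: bytes) -> int:
    # baseline: one table lookup per input byte
    crc = 0xFFFFFFFF
    for b in data:
        crc = T0[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

# Extra tables: T[k][n] is T0[n] advanced through k zero bytes.
T = [T0]
for k in range(1, 4):
    T.append([T0[v & 0xFF] ^ (v >> 8) for v in T[k - 1]])

def crc32c_slice4(data: bytes) -> int:
    # slicing-by-4: consume 4 bytes per step with 4 independent lookups
    crc = 0xFFFFFFFF
    i = 0
    while i + 4 <= len(data):
        crc ^= int.from_bytes(data[i:i + 4], "little")
        crc = (T[3][crc & 0xFF] ^ T[2][(crc >> 8) & 0xFF] ^
               T[1][(crc >> 16) & 0xFF] ^ T[0][(crc >> 24) & 0xFF])
        i += 4
    for b in data[i:]:  # handle any tail bytes one at a time
        crc = T0[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

# Both variants agree, and match the standard CRC-32C check value.
assert crc32c_bytewise(b"123456789") == 0xE3069283
assert crc32c_slice4(b"123456789") == 0xE3069283
```

The larger (~28x) speedup discussed would come from changing the algorithm itself (e.g. hardware-assisted CRC), which is exactly the "two birds in the bush" option Robert contrasts with the incremental table-based win.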
Re: [HACKERS] logical changeset generation v5
On Fri, Aug 30, 2013 at 11:19 AM, Andres Freund wrote: > 0005 wal_decoding: Log xl_running_xact's at a higher frequency than > checkpoints are done > * benefits hot standby startup Review: 1. I think more comments are needed here to explain why we need this. I don't know if the comments should go into the functions modified by this patch or in some other location, but I don't find what's here now adequate for understanding. 2. I think the variable naming could be better. If nothing else, I'd spell out "snapshot" rather than abbreviating it to "snap". I'd also add comments explaining what each of those variables does. And why isn't log_snap_interval_ms a #define rather than a variable? (Don't even talk to me about using gdb on a running instance. If you're even thinking about that, this needs to be a GUC.) 3. Why does LogCurrentRunningXacts() need to call XLogSetAsyncXactLSN()? Hopefully any WAL record is going to get sync'd in a reasonably timely fashion; I can't see off-hand why this one should need special handling. > 0003 wal_decoding: Allow walsender's to connect to a specific database > * biggest problem is how to specify the connection we connect > to. Currently with the patch walsender connects to a database if it's > not named "replication" (via dbname). Perhaps it's better to invent a > replication_dbname parameter? I understand why logical replication needs to connect to a database, but I don't understand why any other walsender would need to connect to a database. Absent a clear use case for such a thing, I don't think we should allow it. Ignorant suggestion: perhaps the database name could be stored in the logical replication slot. > 0006 wal_decoding: copydir: move fsync_fname to fd.[c.h] and make it public > * Pretty trivial and boring. Seems fine. 
> 0007 wal_decoding: Add information about a tables primary key to struct > RelationData > * Could be used in the matview refresh code I think you and Kevin should discuss whether this is actually the right way to do this. ISTM that if logical replication and materialized views end up selecting different approaches to this problem, everybody loses. > 0002 wal_decoding: Introduce InvalidCommandId and declare that to be the new > maximum for CommandCounterIncrement I'm still unconvinced we want this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bugfix and new feature for PGXS
On 09/03/2013 04:02 AM, Cédric Villemain wrote:
> Simple one, attached. I didn't document USE_VPATH, not sure how to
> explain that clearly.
>
> Just a reminder that the doc is written and is waiting to be committed.
> There is also an issue spotted by Christoph with the installdirs
> prerequisite; the attached patch fixes that. Also, the bugfixes were
> supposed to be backported. The behavior is not changed, nothing new,
> just fixing problems for some kinds of extension builds. (However,
> USE_VPATH is new and is not needed in <9.3)

Darn, I knew I had forgotten something. Mea maxima culpa. Well, the 9.3 tarballs have been cut, unfortunately, but I'll get to this as soon as I can.

cheers

andrew
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
* Peter Geoghegan (p...@heroku.com) wrote: > I wonder, is anyone really running in production with full_page_writes > off? The simple answer to this is "yes, definitely." I don't know if they change it on-the-fly, but I doubt it. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Hstore: Query speedups with Gin index
Thanks for the feedback everyone. I've attached the patch that we are now running in production to service our hstore include queries. We rebuilt the index to account for the on-disk incompatibility. I've submitted the patch to commitfest here: https://commitfest.postgresql.org/action/patch_view?id=1203 Michael: I don't have a formal benchmark, but several of our worst queries went from 10-20 seconds per query down to 50-400 ms. These are numbers we've seen when testing real production queries against our production dataset with real world access patterns. Oleg: Thanks for your thoughts on this change. As for the spgist / gin work you're doing, is there anything you need help with or are you still in the research phase? I'd love to help get something more robust merged into mainline if you think there's collaborative work to be done (even if it's only user testing). Thanks, Blake On Wed, Aug 28, 2013 at 12:40 PM, Andres Freund wrote: > On 2013-08-28 13:31:22 -0400, Bruce Momjian wrote: > > On Sun, Aug 25, 2013 at 10:11:50PM -0400, Tom Lane wrote: > > > Michael Paquier writes: > > > > On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith > wrote: > > > >> The combined entry is used to support "contains (@>)" queries, and > the key > > > >> only item is used to support "key contains (?)" queries. This > change seems > > > >> to help especially with hstore keys that have high cardinalities. > Downsides > > > >> of this change is that it requires an index rebuild, and the index > will be > > > >> larger in size. > > > > > > > Index rebuild would be a problem only for minor releases, > > > > > > That's completely false; people have expected major releases to be > > > on-disk-compatible for several years now. While there probably will be > > > future releases in which we are willing to break storage compatibility, > > > a contrib module doesn't get to dictate that. 
> > > > > > What might be a practical solution, especially if this isn't always a > > > win (which seems likely given the index-bloat risk), is to make hstore > > > offer two different GIN index opclasses, one that works the traditional > > > way and one that works this way. > > > > > > Another thing that needs to be taken into account here is Oleg and > > > Teodor's in-progress work on extending hstore: > > > https://www.pgcon.org/2013/schedule/events/518.en.html > > > I'm not sure if this patch would conflict with that at all, but it > > > needs to be considered. > > > > We can disallow in-place upgrades for clusters that use certain contrib > > modules --- we have done that in the past. > > But that really cannot be acceptable for hstore. The probably most > widely used extension there is. > > Greetings, > > Andres Freund > > -- > Andres Freund http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services > -- Blake Smith http://blakesmith.me @blakesmith hstore_gin_speedup.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] operator precedence issues
Andres Freund writes: > On 2013-09-03 08:59:53 -0500, Merlin Moncure wrote: >> While playing around with Andres's trick, I noticed that it works but >> will not match against operators taking "any" although those will >> match with explicit schema declaration (FWICT it goes through the >> search_path trying to explicitly match int/int operator then goes >> again matches "any"). That's pretty weird: > Not surprising. We look for the best match for an operator and > explicitly matching types will be that. If there were no operator(int, > int) your anyelement variant should get called. Yeah, this has exactly nothing to do with operator precedence. Precedence is about which operator binds tighter in cases like "A+B*C". regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] operator precedence issues
On 2013-09-03 08:59:53 -0500, Merlin Moncure wrote:
> While playing around with Andres's trick, I noticed that it works but
> will not match against operators taking "any", although those will
> match with an explicit schema declaration (FWICT it goes through the
> search_path trying to explicitly match the int/int operator, then goes
> through again and matches "any"). That's pretty weird:

Not surprising. We look for the best match for an operator, and explicitly matching types will be that. If there were no operator(int, int), your anyelement variant would get called.

> Ideally though you could specify operator precedence in the operator
> name itself in such a way that bison picks it up. I don't know
> if that's possible, since so many operator names have been given out
> without any thought to reserving characters for precedence, or if it
> would be worth the extra parsing time even if you could do it.
> Overriding stock operator behaviors is a really dodgy practice, with
> the limited but important exception of handling certain classes of
> mathematical errors.

I have to say, even for those it seems like its primary advantage is making the code harder to read, but YMMV.

Greetings,

Andres Freund

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] operator precedence issues
On Fri, Aug 30, 2013 at 5:48 PM, Andres Freund wrote: > Hi, > > On 2013-08-30 17:35:04 -0500, Merlin Moncure wrote: >> "When a schema-qualified operator name is used in the OPERATOR syntax, >> as for example in: >> SELECT 3 OPERATOR(pg_catalog.+) 4; >> the OPERATOR construct is taken to have the default precedence shown >> in Table 4-2 for "any other" operator. This is true no matter which >> specific operator appears inside OPERATOR()." >> >> That rule seems intentionally designed to make it impossible to to >> override mathematical behaviors. Mainly curious -- was that >> intentional? > > You can change your search_path to include your schema before an > explicitly listed pg_catalog afair. Not nice, but should work... hurk -- wish I had known that last week, but that's a nifty trick! It satisfies my particular problem (safe division) since in this case the problem is handled 'in function' and I can temporarily hack the search_path. Interestingly, if you do this the database doesn't match Ideally though you could specify operator precedence in the operator name itself though in such a way that bison pick it up. I don't know if that's possible since so many operator names have been given out without any thought to reserving characters for precedence, or if it would be worth the extra parsing time even if you could do it. Overriding stock operator behaviors is a really dodgy practice with the limited but important exception of handling certain classes of mathematical errors. While playing around with Andres's trick, I noticed that it works but will not match against operators taking "any" although those will match with explicit schema declaration (FWICT it goes through the search_path trying to explicitly match int/int operator then goes again matches "any"). 
That's pretty weird:

postgres=# CREATE OR REPLACE FUNCTION SafeDiv(
postgres(#   anyelement,
postgres(#   anyelement) RETURNS anyelement AS
postgres-# $$
postgres$#   SELECT $1 OPERATOR(pg_catalog./) NULLIF($2, 0);
postgres$# $$ LANGUAGE SQL;
CREATE FUNCTION
postgres=# set search_path to safediv, pg_catalog, public;
SET
postgres=# CREATE OPERATOR safediv./
postgres-# (
postgres(#   PROCEDURE = SafeDiv,
postgres(#   LEFTARG = anyelement,
postgres(#   RIGHTARG = anyelement,
postgres(#   COMMUTATOR = /
postgres(# );
CREATE OPERATOR
postgres=# select 1/0;
ERROR:  division by zero
postgres=# select 1 operator(safediv./) 0;
 ?column?
----------

(1 row)

postgres=# CREATE OR REPLACE FUNCTION SafeDiv(
postgres(#   int4,
postgres(#   int4) RETURNS int4 AS
postgres-# $$
postgres$#   SELECT $1 OPERATOR(pg_catalog./) NULLIF($2, 0);
postgres$# $$ LANGUAGE SQL;
CREATE FUNCTION
postgres=# CREATE OPERATOR safediv./
postgres-# (
postgres(#   PROCEDURE = SafeDiv,
postgres(#   LEFTARG = int4,
postgres(#   RIGHTARG = int4,
postgres(#   COMMUTATOR = /
postgres(# );
CREATE OPERATOR
postgres=# select 1/0;
 ?column?
----------

(1 row)

merlin
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
Heikki Linnakangas writes: > On 03.09.2013 05:28, Boguk, Maksym wrote: >> Target usage: ability to store UTF8 national characters in some >> selected fields inside a single-byte encoded database. > I think we should take a completely different approach to this. Two > alternatives spring to mind: > 1. Implement a new encoding. The new encoding would be some variant of > UTF-8 that encodes languages like Russian more efficiently. +1. I'm not sure that SCSU satisfies the requirement (which I read as that Russian text should be pretty much 1 byte/character). But surely we could devise a variant that does. For instance, it could look like koi8r (or any other single-byte encoding of your choice) with one byte value, say 255, reserved as a prefix. 255 means that a UTF8 character follows. The main complication here is that you don't want to allow more than one way to represent a character --- else you break text hashing, for instance. So you'd have to take care that you never emit the 255+UTF8 representation for a character that can be represented in the single-byte encoding. In particular, you'd never encode ASCII that way, and thus this would satisfy the all-multibyte-chars-must-have-all-high-bits-set rule. Ideally we could make a variant like this for each supported single-byte encoding, and thus you could optimize a database for "mostly but not entirely LATIN1 text", etc. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
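Tom's sketch can be made concrete with a small, purely illustrative model (this is not PostgreSQL code, and KOI8-R plus byte 255 is just the example he gives): a single-byte KOI8-R encoding where 0xFF is reserved as a prefix meaning "a UTF-8 sequence follows". To keep the representation unique, the one KOI8-R character that occupies byte 0xFF ('Ъ') is itself forced through the escape path, so the encoder never emits two encodings for the same character:

```python
# Hedged sketch of the hybrid encoding described above: KOI8-R with
# byte 0xFF reserved as an escape prefix introducing a UTF-8 sequence.

PREFIX = 0xFF  # in plain KOI8-R this byte is 'Ъ', so that character is escaped

def encode_hybrid(text: str) -> bytes:
    out = bytearray()
    for ch in text:
        try:
            b = ch.encode("koi8_r")
        except UnicodeEncodeError:
            b = None
        if b is not None and b[0] != PREFIX:
            out += b                   # single-byte fast path
        else:
            out.append(PREFIX)         # escape: prefix + UTF-8 bytes
            out += ch.encode("utf-8")
    return bytes(out)

def decode_hybrid(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        if data[i] == PREFIX:
            lead = data[i + 1]         # UTF-8 sequence length from lead byte
            n = 1 if lead < 0x80 else 2 if lead < 0xE0 else 3 if lead < 0xF0 else 4
            out.append(data[i + 1:i + 1 + n].decode("utf-8"))
            i += 1 + n
        else:
            out.append(bytes([data[i]]).decode("koi8_r"))
            i += 1
    return "".join(out)

# Russian text stays at one byte per character; other scripts still work.
assert len(encode_hybrid("Привет")) == 6
assert decode_hybrid(encode_hybrid("Привет, мир! Ъ 漢")) == "Привет, мир! Ъ 漢"
```

The uniqueness concern Tom raises shows up as the `b[0] != PREFIX` check: without it, 'Ъ' would have two valid representations, which would break text hashing. ASCII is never escaped, so the all-multibyte-chars-have-high-bits-set rule is preserved.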
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On 2013-09-02 21:59:42 -0700, Peter Geoghegan wrote:
>> I can't really agree with this part. First off, a heap_insert()
>> certainly isn't a lightweight operation. There are several things that can
>> take extended time:
>> * toasting of rows (writing a gigabyte worth of data...)
>> * extension of the heap/toast table
>> * writeout of a dirty victim buffer if the page isn't in s_b
>> * reading in the target page if the page isn't in s_b
>> * waiting for an exclusive lock on the target pages
>
> These are all valid concerns. I'll need to better assess just how bad
> this can get (hopefully with help from others). But some of this stuff
> can surely be done separately for my case. For example, surely we
> could make TOASTing occur after index insertion proper when locks are
> released, without too much work.

I don't think those are that easy to solve, but I'd like to be surprised. E.g. the toasting cannot really happen after the main table insert, since the inserted row needs to refer to the toast id. And you don't even know the final size of a row without doing the toasting, since how much it will get compressed depends on the amount of free space on the current target buffer.

>> I'd expect that with the current implementation strategy if you
>> e.g. made pgbench use INSERT .. ON DUPLICATE IGNORE without ever
>> actually hitting duplicates, you'd see a very, very noticeable hit in
>> scalability. I wouldn't expect a single session to get much slower, but
>> trying a dozen or two should show a very noticeable dip.
>
> I do not accept the premise of your questions regarding the patch's
> performance, which appears to be that it's fair to expect the same
> level of performance of INSERT ... ON DUPLICATE IGNORE as it is to
> expect of regular INSERT. No scheme is going to get that, including
> your scheme, which necessitates WAL-logging everything related to
> index tuple insertion twice for each unique index.
We certainly didn't > require SSI to perform as well as the current REPEATABLE READ, and > certainly not as well as READ COMMITTED, and for good reasons. I think it's ok that INSERT .. IGNORE itself is noticeably slower than a normal INSERT. What I am concerned about is it negatively impacting overall scalability. > I'm not going to bother benchmarking the patch myself until I give it > another whack. There is some really obvious low-hanging performance > fruit. Obviously by this I mean doing as little as possible when > holding the exclusive lock. Sounds fair. > > I am pretty sure it would be better. We know that nbtree works pretty > > damn well under concurrency. And the "promise" logic wouldn't change > > anything of that. > It would have multiple extra steps. Including WAL-logging of the > initial insertion, and clean-up for the IGNORE case. And WAL-logging > of the cleanup. Instead of doing exactly nothing extra in the IGNORE > case, which, under my proposal, has an additional duration of the > exclusive lock being held of zero (instead of the ereport() call we > just immediately release the lock and report the duplicate). The > IGNORE case matters too, and there is going to be considerably more > exclusive locking for that case with your proposal, before we even > talk about other overhead. What I mean is that none of the individual proposed steps require changing the locking semantics/granularity. I.e. we will never hold locks on more pages at once than we do for a normal insert. Yes, one page will get locked twice, but for shorter than a normal insert (all you need to do is to rewrite the target of the item) and without having any other pages locked. > > Furthermore, I am pretty sure that locking an index page *and* related > > heap pages exclusively at the same time will be a deadlock > > nightmare. Even in the case of only having a single unique index. 
> It's seems like what you're saying here is "I don't know, this > extended index locking thing seems pretty sketchy to me". Which is a > perfectly reasonable thing to say in the absence of better analysis > from either of us - it is sketchy. Yes, that's basically what I am saying. I'd really like some input from others on this. Perhaps you could write up an email/thread just about the locking semantics after the next version of the patch (since that presumably will change the semantics)? > I have taken a certain amount of reassurance from the fact that I was > not able to observe any deadlocks when pushing the patch hard - > perhaps that's naive. I mean hammering a table with inserts using > pgbench for 3 hours, with 64 INSERT...DUPLICATE KEY IGNORE sessions. > PK values were over a range of values from 115 million. This was > with shared_buffers set to 512MB and assertions enabled. It stood up > without any deadlocks or other manifestations of bugs. Granted, I did > that before fixing the bug you've highlighted (but after you pointed > it out), and so I haven't tested the multi-index case in the same way. I can readil
Re: [HACKERS] Further XLogInsert scaling tweaking
On Mon, Sep 2, 2013 at 10:32 PM, Bruce Momjian wrote: > On Mon, Sep 2, 2013 at 10:14:03AM +0300, Heikki Linnakangas wrote: >> diff --git a/src/backend/access/transam/xlog.c >> b/src/backend/access/transam/xlog.c >> index 39c58d0..28e62ea 100644 >> --- a/src/backend/access/transam/xlog.c >> +++ b/src/backend/access/transam/xlog.c >> @@ -428,8 +428,14 @@ typedef struct XLogCtlInsert >> uint64 CurrBytePos; >> uint64 PrevBytePos; >> >> - /* insertion slots, see above for details */ >> - XLogInsertSlotPadded *insertSlots; >> + /* >> + * Make sure the above heavily-contended spinlock and byte positions are >> + * on their own cache line. In particular, the RedoRecPtr and full page >> + * write variables below should be on a different cache line. They are >> + * read on every WAL insertion, but updated rarely, and we don't want >> + * those reads to steal the cache line containing Curr/PrevBytePos. >> + */ >> + char pad[128]; > > Do we adjust for cache line lengths anywhere else? PGPROC? Should it > be a global define? +1 -- that is, I think it should be. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
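The padding idea being discussed can be sketched in isolation. This is an illustrative sketch, not PostgreSQL source: the struct and field names below are invented, and the 128-byte figure simply mirrors the patch's hard-coded pad, chosen to cover the common 64- and 128-byte line sizes.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

#define CACHE_LINE_SIZE 128

/*
 * Put the heavily contended byte positions on a cache line of their own,
 * so reads of rarely-updated neighbours cannot steal that line.  The
 * union forces the hot fields to occupy exactly one full line.
 */
typedef union PaddedInsertPos
{
	struct
	{
		uint64_t	curr_byte_pos;	/* hot: updated on every insertion */
		uint64_t	prev_byte_pos;
	}			s;
	char		pad[CACHE_LINE_SIZE];	/* forces sizeof == one line */
} PaddedInsertPos;

typedef struct InsertCtl
{
	PaddedInsertPos hot;		/* the contended line */
	uint64_t	redo_rec_ptr;	/* read on every insert, written rarely;
								 * guaranteed to start on the next line */
} InsertCtl;
```

This is also the pattern behind Bruce's question: PGPROC-style structures face the same false-sharing issue, which is the argument for a global define rather than a literal 128 in each place.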
Re: [HACKERS] Backup throttling
Hi, On 2013-09-03 14:35:18 +0200, Antonin Houska wrote: > + /* > + * THROTTLING_SAMPLE_MIN / MAX_RATE_LOWER (in seconds) should be the > + * longest possible time to sleep. > + */ > + pg_usleep((long) sleep); > + else > + > + /* > + * The actual transfer rate is below the limit. Negative value would > + * distort the adjustment of throttled_last. > + */ > + sleep = 0; > + > + /* > + * Only the whole multiples of throttling_sample processed. The rest will > + * be done during the next call of this function. > + */ > + throttling_counter %= throttling_sample; > + /* Once the (possible) sleep ends, new period starts. */ > + throttled_last += elapsed + sleep; > +} It's probably better to use latches for the waiting, those have properly defined interruption semantics. Whether pg_usleep will be interrupted is platform dependent... Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
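The arithmetic under review can be restated as two pure functions. This is a hypothetical distillation of the patch's logic, not the patch itself (the function names are invented; the patch does this inline in throttle()): every throttling_sample bytes, compare the minimum time that chunk should have taken at maxrate against the time actually elapsed, and sleep off the positive difference.

```c
#include <stdint.h>
#include <assert.h>

#define USECS_PER_SEC 1000000

/* Minimum microseconds that `sample` bytes may take at `maxrate` bytes/s. */
int64_t
elapsed_min_unit(uint32_t sample, uint32_t maxrate)
{
	/* maxrate is bytes/second, so the parenthesized term is usec/byte */
	return (int64_t) (sample * ((double) USECS_PER_SEC / maxrate));
}

/*
 * Microseconds to sleep, given the time actually elapsed since the last
 * check.  A negative difference (transfer already below the limit) is
 * clamped to zero so it cannot distort the next period's bookkeeping of
 * throttled_last.
 */
int64_t
throttle_sleep(int64_t min_unit, int64_t elapsed)
{
	int64_t		sleep = min_unit - elapsed;

	return (sleep > 0) ? sleep : 0;
}
```

For example, at the lower rate bound (32768 B/s) with the minimum 32768-byte sample, each sample must take at least one full second, which is also the longest sleep the code should ever request.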
Re: [HACKERS] Backup throttling
On 07/24/2013 09:20 AM, Antonin Houska wrote: > Hello, > the purpose of this patch is to limit impact of pg_backup on running > server. Attached is a new version. Server-side implementation this time. Antonin Houska (Tony) diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index eb0c1d6..f58eb60 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -189,6 +189,22 @@ PostgreSQL documentation + -r + --max-rate + + +The maximum amount of data transferred from server per second. +The purpose is to limit impact of pg_basebackup +on a running master server while transfering data directory. + + +Suffixes k (kilobytes) and m +(megabytes) are accepted. For example: 10m + + + + + -R --write-recovery-conf diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index ba8d173..08e4f26 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -33,6 +33,7 @@ #include "utils/builtins.h" #include "utils/elog.h" #include "utils/ps_status.h" +#include "utils/timestamp.h" #include "pgtar.h" typedef struct @@ -42,6 +43,7 @@ typedef struct bool fastcheckpoint; bool nowait; bool includewal; + uint32 maxrate; } basebackup_options; @@ -59,6 +61,7 @@ static void perform_base_backup(basebackup_options *opt, DIR *tblspcdir); static void parse_basebackup_options(List *options, basebackup_options *opt); static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli); static int compareWalFileNames(const void *a, const void *b); +static void throttle(size_t increment); /* Was the backup currently in-progress initiated in recovery mode? */ static bool backup_started_in_recovery = false; @@ -68,6 +71,41 @@ static bool backup_started_in_recovery = false; */ #define TAR_SEND_SIZE 32768 + +/* + * The maximum amount of data per second - bounds of the user input. 
+ * + * If the maximum should be increased to more than 4 GB, uint64 must + * be introduced for the related variables. However such high values have + * little to do with throttling. + */ +#define MAX_RATE_LOWER 32768 +#define MAX_RATE_UPPER (1024 << 20) + +/* + * Transfer rate is only measured when this number of bytes has been sent. + * (Too frequent checks would impose too high CPU overhead.) + * + * The default value is used unless it'd result in too frequent checks. + */ +#define THROTTLING_SAMPLE_MIN 32768 + +/* The maximum number of checks per second. */ +#define THROTTLING_MAX_FREQUENCY 256 + +/* The actual value, transfer of which may cause sleep. */ +static uint32 throttling_sample; + +/* Amount of data already transfered but not yet throttled. */ +static int32 throttling_counter; + +/* The minimum time required to transfer throttling_sample bytes. */ +static int64 elapsed_min_unit; + +/* The last check of the transfer rate. */ +static int64 throttled_last; + + typedef struct { char *oid; @@ -171,6 +209,31 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir) /* Send tablespace header */ SendBackupHeader(tablespaces); + if (opt->maxrate > 0) + { + throttling_sample = opt->maxrate / THROTTLING_MAX_FREQUENCY; + + /* Don't measure too small pieces of data. */ + if (throttling_sample < THROTTLING_SAMPLE_MIN) +throttling_sample = THROTTLING_SAMPLE_MIN; + + /* + * opt->maxrate is bytes per seconds. Thus the expression in + * brackets is microseconds per byte. + */ + elapsed_min_unit = throttling_sample * +((double) USECS_PER_SEC / opt->maxrate); + + /* Enable throttling. */ + throttling_counter = 0; + + /* The 'real data' starts now (header was ignored). */ + throttled_last = GetCurrentIntegerTimestamp(); + } + else + /* Disable throttling. 
*/ + throttling_counter = -1; + /* Send off our tablespaces one by one */ foreach(lc, tablespaces) { @@ -468,6 +531,7 @@ parse_basebackup_options(List *options, basebackup_options *opt) bool o_fast = false; bool o_nowait = false; bool o_wal = false; + bool o_maxrate = false; MemSet(opt, 0, sizeof(*opt)); foreach(lopt, options) @@ -519,6 +583,29 @@ parse_basebackup_options(List *options, basebackup_options *opt) opt->includewal = true; o_wal = true; } + else if (strcmp(defel->defname, "maxrate") == 0) + { + long maxrate; + + if (o_maxrate) +ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("duplicate option \"%s\"", defel->defname))); + maxrate = intVal(defel->arg); + + opt->maxrate = (uint32) maxrate; + if (opt->maxrate > 0 && + (opt->maxrate < MAX_RATE_LOWER || opt->maxrate > MAX_RATE_UPPER)) + { +ereport(ERROR, + (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("transfer rate %u bytes per second is out of range", + opt->maxrate), + errhint("The accepte
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
On 2013-09-02 19:10:57 -0700, Jeff Davis wrote: > There is a significant amount of code supporting the changing of > full_page_writes while the server is running, including an > XLOG_FPW_CHANGE wal record. But I don't see the use-case; surely almost > nobody changes this on a running server, because either your filesystem > guarantees atomic page writes or it doesn't. > > I'd like to submit a patch to just make it into a PGC_POSTMASTER and > remove the code to support changing it. I've wondered about that before, but always decided that we wouldn't gain all that much by making it PGC_POSTMASTER: the effective value of full_page_writes still needs to be changeable at runtime, because we need to force-enable full page writes during a pg_start/stop_backup(). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On Tue, Sep 3, 2013 at 2:11 AM, Peter Geoghegan wrote: > and it would be totally unacceptable if that meant that lots of people > blocked on the page lock that an upserter held while it tried to > acquire locks on tuples By which I mean: it seems shaky that I'd then be assuming that to lock all of a row's unique index values was to lock the row itself, and so that there was no need to worry about blocking on acquiring an actual row lock while holding all those exclusive/write buffer locks. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On Tue, Sep 3, 2013 at 12:52 AM, Craig Ringer wrote: > That'd certainly be the ideal, but as you say is far from simple, and > IIRC prior discussions here have suggested the SSI / predicate locking > stuff won't help. I think that by far the strongest argument for what Andres suggested (that has, incidentally, been suggested by a number of others), which is to insert what he termed promise index tuples before a heap tuple, is that there is no sensible way to lock the tuples while holding an exclusive buffer lock on an index. It could take basically any amount of time to acquire those locks (see GetTupleForTrigger() to get a rough idea of how that ought to work), and it would be totally unacceptable if that meant that lots of people blocked on the page lock that an upserter held while it tried to acquire locks on tuples (we could only unlock the buffer lock/locked value when the rows were locked). So while I think what I've done here is probably workable for the ON DUPLICATE KEY IGNORE case, perhaps I'm being short-sighted in focusing on that. I'm surprised Andres didn't call me out on this himself, actually. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bugfix and new feature for PGXS
> Simple one, attached. > I didn't document USE_VPATH, not sure how to explain that clearly. Just a reminder that the doc is written and is waiting to be committed. There is also an issue spotted by Christoph with the installdirs prerequisite; the attached patch fixes that. Also, the bugfixes were supposed to be backported. The behavior is not changed, nothing new, just fixing problems for some kinds of extension builds. (However, USE_VPATH is new and is not needed in <9.3) -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation>From fb9d5ed99397b40e7fc33224fa31cd5c54c7b5d2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Villemain?= Date: Tue, 3 Sep 2013 09:55:05 +0200 Subject: [PATCH] Add installdirs to PGXS order-only-prerequisites prerequisites are not ordered but installdirs is required by others. This patch is the simplest one, another solution being to remove completely installdirs prerequisite. Spotted by Christoph Berg with plr build. --- src/makefiles/pgxs.mk | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk index 8618aa1..ac12f7d 100644 --- a/src/makefiles/pgxs.mk +++ b/src/makefiles/pgxs.mk @@ -124,7 +124,7 @@ all: all-lib endif # MODULE_big -install: all installdirs installcontrol installdata installdatatsearch installdocs installscripts +install: all installcontrol installdata installdatatsearch installdocs installscripts | installdirs ifdef MODULES $(INSTALL_SHLIB) $(addsuffix $(DLSUFFIX), $(MODULES)) '$(DESTDIR)$(pkglibdir)/' endif # MODULES -- 1.8.4.rc3 signature.asc Description: This is a digitally signed message part.
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On 09/03/2013 06:18 AM, Peter Geoghegan wrote: > On Mon, Sep 2, 2013 at 6:25 AM, Craig Ringer wrote: >> It'll be yet another way for people to get upsert wrong, of course. >> They'll use a wCTE with RETURNING REJECTS to do an UPDATE of the rejects >> w/o locking the table against writes first. Documenting this pitfall >> should be enough, though. > > My preferred solution is to actually provide a variant to lock the > rows implicated in the would-be unique constraint violation. Obviously > that's a harder problem to solve. That'd certainly be the ideal, but as you say is far from simple, and IIRC prior discussions here have suggested the SSI / predicate locking stuff won't help. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
On 03.09.2013 05:28, Boguk, Maksym wrote: Target usage: ability to store UTF8 national characters in some selected fields inside a single-byte encoded database. For example, if I have a ru-RU.koi8r encoded database with mostly Russian text inside, it would be nice to be able to store Japanese text in one field without converting the whole database to UTF8 (converting such a database to UTF8 could easily almost double the database size, even if only one field in the whole database uses any symbols outside the ru-RU.koi8r encoding). Ok. What has been done: 1) Addition of new string data types NATIONAL CHARACTER and NATIONAL CHARACTER VARYING. These types differ from the char/varchar data types in one important respect: NATIONAL string types always have UTF8 encoding (independent of the database encoding). I don't like the approach of adding a new data type for this. The encoding used for a text field should be an implementation detail, not something that's exposed to users at the schema level. A separate data type makes an nvarchar field behave slightly differently from text, for example when it's passed to and from functions. It will also require drivers and client applications to know about it. What needs to be done: 1) A full set of string functions and operators for NATIONAL types (we could not use generic text functions because they assume that the strings have the database encoding). Only a basic set is implemented now. 2) Need to implement some way to define a default collation for NATIONAL types. 3) Need to implement some way to input UTF8 characters into NATIONAL types via SQL (there is a serious open problem... it will be described later in the text). Yeah, all of these issues stem from the fact that the NATIONAL types are separate from text. I think we should take a completely different approach to this. Two alternatives spring to mind: 1. Implement a new encoding. The new encoding would be some variant of UTF-8 that encodes languages like Russian more efficiently.
Then just use that in the whole database. Something like SCSU (http://www.unicode.org/reports/tr6/) should do the trick, although I'm not sure if SCSU can be used as a server-encoding. A lot of code relies on the fact that a server encoding must have the high bit set in all bytes that are part of a multi-byte character. That's why SJIS for example can only be used as a client-encoding. But surely you could come up with some subset or variant of SCSU which satisfies that requirement. 2. Compress the column. Simply do "ALTER TABLE foo ALTER COLUMN bar SET STORAGE MAIN". That will make Postgres compress that field. That might not be very efficient for compressing short Cyrillic text encoded in UTF-8 today, but that could be improved. There has been discussion on supporting more compression algorithms in the past, and one such algorithm could be again something like SCSU. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME
On 2013-09-03 08:15, David Johnston replied: Jeff Davis-8 wrote: Is there any semantic difference between marking a constraint as DISABLED and simply dropping it? Or does it just make it easier to re-add it later? David Johnston wrote: I cannot answer the question, but if there is none then the main concern I'd have is capturing "meta-information" about WHY such a constraint has been disabled instead of dropped. Dropping/rebuilding and disabling/enabling a constraint have no fundamental difference and achieve the same purpose; what I did is just more convenient for the user. Recording the disabled constraints is easier than recording all the constraints. What's more, a lot of people have asked about turning off constraints, and SQL:2008 supports this. So I think it's necessary in some ways. I guess this whole feature extends from the trigger disable feature that already exists. Given we have that one, adding this seems symmetrical... I cannot really see using either feature on a production system (if following best practices) but I can imagine where they could both be helpful during development. Note that with this usage pattern the meta-information about "why" becomes considerably less important. David J. Wang Shuo HighGo Software Co.,Ltd. September 3, 2013 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers