Re: [HACKERS] printing table in asciidoc with psql
2014-10-29 12:23 GMT+01:00 Szymon Guz <mabew...@gmail.com>:

>> On 17 October 2014 09:01, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>>
>> Hi Szymon
>>
>> I found a small bug - it doesn't escape | well
>>
>>   postgres=# select * from mytab ;
>>     a   | numeric_b |     c
>>   ------+-----------+------------
>>    Ahoj |        10 | 2014-10-17
>>    Hello|        20 | 2014-10-18
>>    Hi   |        30 | 2014-10-19
>>    aaa| |           | 2014-10-17
>>   (4 rows)
>>
>> result
>>
>>   [options="header",cols="literal,literal,literal",frame="all",grid="all"]
>>   |====
>>   ^| +++a+++ ^| +++numeric_b+++ ^| +++c+++
>>   | Ahoj | 10 | 2014-10-17
>>   | Hello | 20 | 2014-10-18
>>   | Hi | 30 | 2014-10-19
>>   | aaa| | | 2014-10-17
>>   |====
>>
>> Next, I tested it with asciidoc and asciidoctor, and I have a problem
>> with asciidoctor - it doesn't respect aligning .. so numbers are aligned
>> to the left instead of to the right.
>>
>> When you use the option "header", then the +++ formatting is useless.
>
> Hi Pavel,
> thanks for the remarks. I've attached another version of the patch. It
> works a little better now, including escaping | and asciidoctor
> alignment support.

it is fixed. Thank you.

I fixed formatting - please, recheck it. I don't see any issue - it
should be ready for committer.

Regards

Pavel

> thanks,
> Szymon

commit f8ffd152aaa78ac1c96f808761680b8e593528a2
Author: Pavel Stehule <pavel.steh...@gooddata.com>
Date:   Thu Oct 30 09:02:59 2014 +0100

    fix formatting

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index e7fcc73..cd64b88 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2092,8 +2092,8 @@ lo_import 152801
           <literal>aligned</literal>, <literal>wrapped</literal>,
           <literal>html</literal>, <literal>latex</literal> (uses <literal>tabular</literal>),
-          <literal>latex-longtable</literal>, or
-          <literal>troff-ms</literal>.
+          <literal>latex-longtable</literal>,
+          <literal>troff-ms</literal>, or <literal>asciidoc</literal>.
           Unique abbreviations are allowed.  (That would mean one letter
           is enough.)
         </para>
@@ -2120,7 +2120,8 @@ lo_import 152801
       <para>
           The <literal>html</>, <literal>latex</>,
-          <literal>latex-longtable</literal>, and <literal>troff-ms</>
+          <literal>latex-longtable</literal>, <literal>troff-ms</>,
+          and <literal>asciidoc</>
           formats put out tables that are intended to be included in
           documents using the respective mark-up language. They are not
           complete documents! This might not be
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 26089352..e00e47b 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -2248,6 +2248,9 @@ _align2string(enum printFormat in)
 		case PRINT_TROFF_MS:
 			return "troff-ms";
 			break;
+		case PRINT_ASCIIDOC:
+			return "asciidoc";
+			break;
 	}
 	return "unknown";
 }
@@ -2321,9 +2324,11 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 		popt->topt.format = PRINT_LATEX_LONGTABLE;
 	else if (pg_strncasecmp("troff-ms", value, vallen) == 0)
 		popt->topt.format = PRINT_TROFF_MS;
+	else if (pg_strncasecmp("asciidoc", value, vallen) == 0)
+		popt->topt.format = PRINT_ASCIIDOC;
 	else
 	{
-		psql_error("\\pset: allowed formats are unaligned, aligned, wrapped, html, latex, troff-ms\n");
+		psql_error("\\pset: allowed formats are unaligned, aligned, wrapped, html, latex, troff-ms, asciidoc\n");
 		return false;
 	}
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index ae5fe88..b14b313 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -351,7 +351,7 @@ helpVariables(unsigned short int pager)
 	fprintf(output, _("  expanded (or x)    toggle expanded output\n"));
 	fprintf(output, _("  fieldsep           field separator for unaligned output (default '|')\n"));
 	fprintf(output, _("  fieldsep_zero      set field separator in unaligned mode to zero\n"));
-	fprintf(output, _("  format             set output format [unaligned, aligned, wrapped, html, latex, ..]\n"));
+	fprintf(output, _("  format             set output format [unaligned, aligned, wrapped, html, latex, asciidoc, ..]\n"));
 	fprintf(output, _("  footer             enable or disable display of the table footer [on, off]\n"));
 	fprintf(output, _("  linestyle          set the border line drawing style [ascii, old-ascii, unicode]\n"));
 	fprintf(output, _("  null               set the string to be printed in place of a null value\n"));
diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c
index 3b3c3b7..ae98ff9 100644
--- a/src/bin/psql/print.c
+++ b/src/bin/psql/print.c
@@ -2475,6 +2475,216 @@ print_troff_ms_vertical(const printTableContent *cont, FILE *fout)
 	}
 }

+/*************************/
+/* ASCIIDOC              */
+/*************************/
+
+static void
+asciidoc_escaped_print(const char *in, FILE *fout)
+{
+	const char *p;
+
+	for (p = in; *p; p++)
+	{
+		switch (*p)
+		{
+			case '|':
+				fputs("\\|", fout);
+				break;
+			default:
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
Thanks for your input, Jim!

On Wed, Oct 29, 2014 at 7:59 AM, Jim Nasby <jim.na...@bluetreble.com> wrote:

> Patch applies against current HEAD and builds, but I'm getting 37 failed
> tests (mostly parallel, but also misc and WITH; results attached). Is
> that expected?

This is caused by the recent commit 7b1c2a0 (that I actually participated
in reviewing :p), because of a missing inclusion of ruleutils.h in
index.c.

> The "mark the concurrent index" bit is rather confusing; it sounds like
> it's referring to the new index instead of the old. Now that I've read
> the code I understand what's going on here between the concurrent index
> *entry* and the filenode swap, but I don't think the docs make this
> sufficiently clear to users.
>
> How about something like this: "The following steps occur in a concurrent
> index build, each in a separate transaction. Note that if there are
> multiple indexes to be rebuilt, then each step loops through all the
> indexes we're rebuilding, using a separate transaction for each one.
> 1. [blah]"

Definitely a good idea! I took your text and made it more precise, listing
the actions done for each step and the pg_index flags switched, using
<orderedlist> to make the list of steps described in a separate paragraph
more exhaustive for the user. At the same time I reworked the docs,
removing a part that was somewhat duplicated about dealing with the
constraints having invalid index entries and how to drop them.

> + * index_concurrent_create
> + *
> + * Create an index based on the given one that will be used for concurrent
> + * operations. The index is inserted into catalogs and needs to be built later
> + * on. This is called during concurrent index processing. The heap relation
> + * on which is based the index needs to be closed by the caller.
>
> Last bit presumably should be "on which the index is based".

What about "Create a concurrent index based on the definition of the one
provided by the caller"?

> + /* Build the list of column names, necessary for index_create */
>
> Instead of all this work wouldn't it be easier to create a version of
> index_create/ConstructTupleDescriptor that will use the IndexInfo for the
> old index? ISTM index_concurrent_create() is doing a heck of a lot of
> work to marshal data into one form just to have it get marshaled yet
> again. Worst case, if we do have to play this game, there should be a
> stand-alone function to get the columns/expressions for an existing
> index; you're duplicating a lot of code from pg_get_indexdef_worker().

Yes, this definitely sucks, and the approach of creating a function to get
all the column names is not productive either. Then let's define an
additional argument in index_create to pass a potential TupleDesc instead
of this whole wart. I noticed as well that we need to actually reset
attcacheoff to be able to use a fresh version of the tuple descriptor of
the old index. I added a small API for this purpose in tupdesc.h called
ResetTupleDescCache. Would it make sense instead to extend
CreateTupleDescCopyConstr or CreateTupleDescCopy with a boolean flag?

> index_concurrent_swap(): Perhaps it'd be better to create
> index_concurrent_swap_setup() and index_concurrent_swap_cleanup() and
> refactor the duplicated code out... the actual function would then
> become:

This sentence is not finished :) IMO, index_concurrent_swap looks good as
is, taking as arguments the index and its concurrent entry, and swapping
their relfilenodes after taking an AccessExclusiveLock that will be held
until the end of this transaction.

> ReindexRelationConcurrently()
>
> + * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be
> + * either an index or a table. If a table is specified, each step of REINDEX
> + * CONCURRENTLY is done in parallel with all the table's indexes as well as
> + * its dependent toast indexes.
>
> This comment is a bit misleading; we're not actually doing anything in
> parallel, right? AFAICT index_concurrent_build is going to block while
> each index is built the first time.

Yes, "parallel" may be misleading. What is meant here is that each step of
the process is done one by one on all the valid indexes a table may have.

> + * relkind is an index, this index itself will be rebuilt. The locks taken
> + * parent relations and involved indexes are kept until this transaction
> + * is committed to protect against schema changes that might occur until
> + * the session lock is taken on each relation.
>
> This comment is a bit unclear to me... at minimum I think it should be
> "* taken on parent relations" instead of "* taken parent relations", but
> I think it needs to elaborate on why/when we're also taking session-level
> locks.

Hum, done as follows:

@@ -896,9 +896,11 @@ ReindexRelationConcurrently(Oid relationOid)
  * If the relkind of given relation Oid is a table, all its valid indexes
  * will be rebuilt, including its associated toast table indexes. If
  * relkind is an index, this index itself will be rebuilt. The locks
Re: [HACKERS] WITH CHECK and Column-Level Privileges
On 29 October 2014 13:04, Stephen Frost <sfr...@snowman.net> wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Wed, Oct 29, 2014 at 8:16 AM, Stephen Frost <sfr...@snowman.net> wrote:
>>> suggestions. If the user does not have table-level SELECT rights, then
>>> for the "Failing row contains" case they'll get:
>>>
>>>   Failing row contains (col1, col2, col3) = (1, 2, 3).
>>>
>>> Or, if they have no access to any columns:
>>>
>>>   Failing row contains () = ()
>>
>> I haven't looked at the code, but that sounds nice, except that if they
>> have no access to any columns, I'd leave the message out altogether
>> instead of emitting it with no useful content.
>
> Alright, I can change things around to make that happen without too much
> trouble.

Yes, skim-reading the patch, it looks like a good approach to me, and also
+1 for not emitting anything if no values are visible.

I fear, however, that for updatable views, in the most common case, the
user won't have any permissions on the underlying table, and so the error
detail will always be omitted. I wonder if one way we can improve upon
that is to include the RTE's modifiedCols set in the set of columns the
user can see, since for those columns what we would be reporting are the
new user-supplied values, and so there is no information leakage.

Regards,
Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] printing table in asciidoc with psql
On 30 October 2014 09:04, Pavel Stehule <pavel.steh...@gmail.com> wrote:

> [... earlier review exchange trimmed; see the first message in this
> thread above ...]
>
> it is fixed. Thank you.
>
> I fixed formatting - please, recheck it. I don't see any issue - it
> should be ready for committer.
>
> Regards
>
> Pavel

Hi Pavel,
thanks for the review and reformatting. It looks much better after the
reformatting.

thanks,
Szymon
Re: [HACKERS] Index scan optimization
On Mon, Oct 27, 2014 at 10:38 PM, Rajeev rastogi
<rajeev.rast...@huawei.com> wrote:
> On 26 October 2014 10:42, Haribabu Kommi wrote:
>> I have a question regarding the setting of the "key flags matched". Only
>> the first key was set as matched, even if we have multiple index
>> conditions. Is there any reason behind that?
>
> Actually the keysCount can be more than one only if the key strategy is
> BTEqualStrategyNumber (i.e. the "=" condition), but our optimization is
> only for the "<" or "<=" conditions. So adding code to set the flag for
> all keys is of no use. Let me know if I am missing anything here?

Thanks. I understood the point.

>> If any volatile function is present in the index condition, the index
>> scan itself is not chosen. Is there any need of handling the same,
>> similar to NULLs?
>
> Do you mean to say that we should add special handling for volatile
> functions? If yes, then as you said, since the index scan itself is not
> selected for such functions, I feel we don't need to add anything for
> that.

I also have the same opinion. I marked the patch as ready for committer.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] group locking: incomplete patch, just for discussion
On 30 October 2014 04:24, Amit Kapila <amit.kapil...@gmail.com> wrote:

>> Locking the toast table of any main tables we access seems easily done.
>> Though perhaps we should make weak locking of the toast table presumed.
>> Do we have cases where the toast table can be accessed when the main
>> table is not also strong locked first?
>
> I think it is possible to have a strong lock on a toast table before the
> main table:
>
>   Cluster pg_toast.toast_table_name using toast_index;
>   Vacuum Full pg_toast.toast_table_name;
>   Reindex Index pg_toast.toast_index_name;
>   ..
>
> Now if we take the lock on the toast table in the main task, it will
> block some of these operations before they actually need to be blocked.

Very strange. Those commands are not common operations; I doubt many
people even know they exist. All of those commands would block SELECTs on
the main table, so there is no significant benefit in that behaviour. In
fact it would be more sensible to lock the toast table earlier.

>> As for locking the enums table or collation table, that's simple stuff
>> also. They're just AccessShareLocks.
>
> Yeah, taking AccessShareLocks is not a problem; however, the main problem
> is that they block any other operation on those tables which requires an
> AccessExclusiveLock (or any other strong lock), before it is actually
> required to block any such operation.

What operations are those? How frequently do they occur? I can't see any
need for major maintenance operations on small catalog tables.

>> I can't imagine we'll be able to presume that all functions of every
>> kind are parallel-safe.
>
> Yeah, that's right, so we need to either block such functions from being
> executed in a parallel worker or make arrangements so that they can be
> safely executed; I think for the first version blocking might be better.

I think that also. That way we won't need complex locking.

> Now, I think we can find ways to avoid all such things by hacking the
> code such that conflicting operations can be banned/blocked; however,
> eventually we need some kind of group locking to avoid deadlocks which
> can arise due to operations in parallel workers. So it seems to me it is
> not a bad idea to have a strong infrastructure first for the things (like
> group locking) which we think are required for any non-trivial useful
> parallel operation, without putting too many restrictions on users of
> the feature.

We are all agreed we need to block some functions from executing in
parallel mode. Why don't we start by making a list of the functions that
might cause problems in parallel workers, and why? That way we might find
easier ways of doing things.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] initdb -S and tablespaces
At 2014-09-29 11:54:10 +0200, and...@2ndquadrant.com wrote:
>
> On 2014-09-29 14:09:01 +0530, Abhijit Menon-Sen wrote:
>> I just noticed that initdb -S ("Safely write all database files to disk
>> and exit") does (only) the following in perform_fsync:
>>
>>     pre_sync_fname(pdir, true);
>>     walkdir(pg_data, pre_sync_fname);
>>
>>     fsync_fname(pdir, true);
>>     walkdir(pg_data, fsync_fname);
>>
>> walkdir() reads the directory and calls itself recursively for S_ISDIR
>> entries, or calls the function for S_ISREG entries… which means it
>> doesn't follow links. Which means it doesn't fsync the contents of
>> tablespaces.
>
> Which means at least pg_upgrade is unsafe right now...
> c.f. 630cd14426dc1daf85163ad417f3a224eb4ac7b0.

Here's a proposed patch to initdb to make initdb -S fsync everything under
pg_tblspc. It introduces a new function that calls walkdir on every entry
under pg_tblspc.

This is only one approach: I could have also changed walkdir to follow
links, but that would have required a bunch of #ifdefs for Windows
(because it doesn't have symlinks), and I guessed a separate function with
two calls might be easier to patch into back branches.

I've tested this patch under various conditions on Linux, but it could use
some testing on Windows.
-- Abhijit

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index dc1f1df..4fdc9eb 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -218,6 +218,7 @@ static char **filter_lines_with_token(char **lines, const char *token);
 static char **readfile(const char *path);
 static void writefile(char *path, char **lines);
 static void walkdir(char *path, void (*action) (char *fname, bool isdir));
+static void walktblspc_links(char *path, void (*action) (char *fname, bool isdir));
 static void pre_sync_fname(char *fname, bool isdir);
 static void fsync_fname(char *fname, bool isdir);
 static FILE *popen_check(const char *command, const char *mode);
@@ -588,6 +589,53 @@ walkdir(char *path, void (*action) (char *fname, bool isdir))
 }
 
 /*
+ * walktblspc_links: call walkdir on each entry under the given
+ * pg_tblspc directory, or do nothing if pg_tblspc doesn't exist.
+ */
+static void
+walktblspc_links(char *path, void (*action) (char *fname, bool isdir))
+{
+	DIR		   *dir;
+	struct dirent *direntry;
+	char		subpath[MAXPGPATH];
+
+	dir = opendir(path);
+	if (dir == NULL)
+	{
+		if (errno == ENOENT)
+			return;
+		fprintf(stderr, _("%s: could not open directory \"%s\": %s\n"),
+				progname, path, strerror(errno));
+		exit_nicely();
+	}
+
+	while (errno = 0, (direntry = readdir(dir)) != NULL)
+	{
+		if (strcmp(direntry->d_name, ".") == 0 ||
+			strcmp(direntry->d_name, "..") == 0)
+			continue;
+
+		snprintf(subpath, MAXPGPATH, "%s/%s", path, direntry->d_name);
+
+		walkdir(subpath, action);
+	}
+
+	if (errno)
+	{
+		fprintf(stderr, _("%s: could not read directory \"%s\": %s\n"),
+				progname, path, strerror(errno));
+		exit_nicely();
+	}
+
+	if (closedir(dir))
+	{
+		fprintf(stderr, _("%s: could not close directory \"%s\": %s\n"),
+				progname, path, strerror(errno));
+		exit_nicely();
+	}
+}
+
+/*
  * Hint to the OS that it should get ready to fsync() this file.
  */
 static void
@@ -2377,6 +2425,7 @@ static void
 perform_fsync(void)
 {
 	char		pdir[MAXPGPATH];
+	char		pg_tblspc[MAXPGPATH];
 
 	fputs(_("syncing data to disk ... "), stdout);
 	fflush(stdout);
@@ -2398,6 +2447,11 @@ perform_fsync(void)
 	/* then recursively through the directory */
 	walkdir(pg_data, pre_sync_fname);
 
+	/* now do the same thing for everything under pg_tblspc */
+	snprintf(pg_tblspc, MAXPGPATH, "%s/pg_tblspc", pg_data);
+	walktblspc_links(pg_tblspc, pre_sync_fname);
+
 	/*
 	 * Now, do the fsync()s in the same order.
 	 */
@@ -2408,6 +2462,8 @@ perform_fsync(void)
 	/* then recursively through the directory */
 	walkdir(pg_data, fsync_fname);
 
+	walktblspc_links(pg_tblspc, fsync_fname);
+
 	check_ok();
 }
Re: [HACKERS] WIP: multivariate statistics / proof of concept
On Thu, Oct 30, 2014 at 12:48 AM, Tomas Vondra <t...@fuzzy.cz> wrote:

> On 29 October 2014, 12:31, Petr Jelinek wrote:
>>
>>> I've not really gotten around to looking at the patch yet, but I'm also
>>> wondering if it would be simple to include allowing functional
>>> statistics too. The pg_mv_statistic name seems to indicate multiple
>>> columns, but how about stats on date(datetime_column), or perhaps any
>>> non-volatile function? This would help to solve the problem highlighted
>>> here:
>>> http://www.postgresql.org/message-id/CAApHDvp2vH=7O-gp-zAf7aWy+A-WHWVg7h3Vc6=5pf9uf34...@mail.gmail.com
>>> Without giving it too much thought, perhaps any expression that can be
>>> indexed should be allowed to have stats? Would that be really difficult
>>> to implement in comparison to what you've already done with the patch
>>> so far?
>>
>> I would not over-complicate requirements for the first version of this;
>> I think it's already complicated enough.
>
> My thoughts, exactly. I'm not willing to put more features into the
> initial version of the patch. Actually, I'm thinking about ripping out
> some experimental features (particularly hashed MCV and associative
> rules).

That's fair, but I didn't really mean to imply that you should go work on
that too and that it should be part of this patch. I was thinking more
along the lines that I don't really agree with the table name for the new
stats, and that at some later date someone will want to add expression
stats, so we'd probably better come up with a design that would be
friendly towards that. At this time I can only think that the name of the
table might not suit expression stats well; I'd hate to see someone have
to invent a third table to support these when we could likely come up
with something that could be extended later and still make sense both
today and in the future.

I was just looking at how expression indexes are stored in pg_index, and
I see that if it's an expression index, the expression is stored in the
indexprs column, which is of type pg_node_tree. So quite possibly at some
point in the future the new stats table could just have an extra column
added, and for today we'd just need to come up with a future-proof
name... Perhaps pg_statistic_ext or pg_statisticx, and name functions and
source files something along those lines instead?

Regards

David Rowley
Re: [HACKERS] WIP: multivariate statistics / proof of concept
On Thu, Oct 30, 2014 at 12:21 AM, Tomas Vondra <t...@fuzzy.cz> wrote:

> On 29 October 2014, 10:41, David Rowley wrote:
>> I'm quite interested in reviewing your work on this, but it appears that
>> some of your changes are not C89:
>>
>>   src\backend\commands\analyze.c(3774): error C2057: expected constant expression [D:\Postgres\a\postgres.vcxproj]
>>   src\backend\commands\analyze.c(3774): error C2466: cannot allocate an array of constant size 0 [D:\Postgres\a\postgres.vcxproj]
>>   src\backend\commands\analyze.c(3774): error C2133: 'indexes' : unknown size [D:\Postgres\a\postgres.vcxproj]
>>   src\backend\commands\analyze.c(4302): error C2057: expected constant expression [D:\Postgres\a\postgres.vcxproj]
>>   src\backend\commands\analyze.c(4302): error C2466: cannot allocate an array of constant size 0 [D:\Postgres\a\postgres.vcxproj]
>>   src\backend\commands\analyze.c(4302): error C2133: 'ndistincts' : unknown size [D:\Postgres\a\postgres.vcxproj]
>>   src\backend\commands\analyze.c(4775): error C2057: expected constant expression [D:\Postgres\a\postgres.vcxproj]
>>   src\backend\commands\analyze.c(4775): error C2466: cannot allocate an array of constant size 0 [D:\Postgres\a\postgres.vcxproj]
>>   src\backend\commands\analyze.c(4775): error C2133: 'keys' : unknown size [D:\Postgres\a\postgres.vcxproj]
>
> I'll look into that. The thing is I don't have access to MSVC, so it's a
> bit difficult to spot / fix those issues :-(

It should be a pretty simple fix; just use the files and line numbers from
the above. It's just a problem that in those 3 places you're declaring an
array of a variable size, which is not allowed in C89. The thing to do
instead would just be to palloc() the size you need and then pfree() it
when you're done.

Regards

David Rowley
Re: [HACKERS] printing table in asciidoc with psql
2014-10-30 9:30 GMT+01:00 Szymon Guz <mabew...@gmail.com>:

> On 30 October 2014 09:04, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>
> [... earlier review exchange trimmed; see the first message in this
> thread above ...]
>
> Hi Pavel,
> thanks for the review and reformatting. It looks much better after the
> reformatting.

ok so

1. There are no objections against the proposed and implemented feature.
   This patch contains just the implementation of the asciidoc format and
   nothing else. It has zero impact on current code.

2. There are no problems with patching and compilation. All current
   regression tests passed.

3. The patch contains docs and a small set of regression tests.

4. I tested the output against asciidoc and asciidoctor; I didn't find
   any problems.

This patch is ready for committer.

Thank you for the patch

Regards

Pavel

> thanks,
> Szymon
Re: [HACKERS] alter user/role CURRENT_USER
Hello,

At Tue, 28 Oct 2014 09:05:20 -0400, Stephen Frost <sfr...@snowman.net>
wrote in <20141028130520.gl28...@tamriel.snowman.net>:

>> As well, the originally proposed RoleId_or_curruser suffers from the
>> same issue. I'm going to go out on a limb here, but is it not possible
>> to consider current_user, etc. reserved in the same sense that we do
>> with PUBLIC and NONE, and disallow users/roles from being created as
>> them?
>
> Maybe we could in some future release, but we can't for back-branches,
> so I'm afraid we're going to have to figure out how to fix this to work
> regardless.

Zero-length identifiers are rejected in the scanner, so a RoleId cannot
be a valid pointer to a zero-length string. (NULL is used as PUBLIC in
auth_ident.)

| postgres=# create role "";
| ERROR:  zero-length delimited identifier at or near """"
| postgres=# create role U&"\00";
| ERROR:  invalid Unicode escape value at or near "\00"

As a dirty and quick hack, we can use some integer value prefixed by a
single zero byte to represent special roles such as CURRENT_USER.

| RoleId_or_curruser: RoleId		{ $$ = $1; }
|			| CURRENT_USER	{ $$ = "\x00\x01"; };

| Oid
| ResolveRoleId(const char *name, bool missing_ok)
| {
|	Oid		roleid;
|
|	if (name[0] == 0 && name[1] == 1)
|		roleid = GetUserId();

This is ugly but needs no additional struct member or special logic.
(Macros could make them look better.)

>> Taking a step back, I'm still not sure I understand the need for this
>> feature or the use case. It seems to have started as a potential fix
>> to an inconsistency between ALTER USER and ALTER ROLE syntax (which I
>> think I could see some value in). However, I think it has been taken
>> beyond just resolving the inconsistency and started to cross over into
>> feature creep. Is the intent simply to resolve inconsistencies between
>> what is now an alias of another command? Or is it to add new
>> functionality? I think the original proposal needs to be revisited and
>> more time needs to be spent defining the scope and purpose of this
>> patch.
>
> I agree that we should probably separate the concerns here. Personally,
> I like the idea of being able to say CURRENT_USER in utility commands
> to refer to the current user where a role would normally be expected,
> as I could see it simplifying things for some applications, but that's
> a new feature and independent of making role-vs-user cases more
> consistent.

I agree that role-vs-user compatibility is another problem. In a sense,
the CURRENT_USER problem is also a separate problem, but simply spreading
the current current_user mechanism (which cannot allow using the words
literally even if double-quoted) out to other commands is a kind of
pandemic, so it should be fixed before making current_user usable in
other commands.

From a view of compatibility (specification stability), fixing this
problem could break someone's application if they use current_user in the
meaning of CURRENT_USER intentionally, but I think that is unlikely.

As another problem, in the definition of grantee, there is the following
comment:

| /* This hack lets us avoid reserving PUBLIC as a keyword */
| if (strcmp($1, "public") == 0)

Please let me know the reason, if any, to avoid registering a new keyword
while making the word unusable as a literal identifier. PUBLIC and NONE
ought to be usable as identifiers, but in reality they are unusable
because they are handled as keywords in literal form. These would be
fixed by making them real keywords. So if there's no particular reason, I
will register new keywords PUBLIC and NONE as another patch.

# However, I don't think that a considerable number of people want to do
# that..

On the other hand, pg_* as schema names, and operators, are simply cases
of special names which cannot be identifiers in the first place, so they
should stay as they are, I think.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] WIP: multivariate statistics / proof of concept
Dne 30 Říjen 2014, 10:17, David Rowley napsal(a): On Thu, Oct 30, 2014 at 12:48 AM, Tomas Vondra t...@fuzzy.cz wrote: Dne 29 Říjen 2014, 12:31, Petr Jelinek napsal(a): I've not really gotten around to looking at the patch yet, but I'm also wondering if it would be simple include allowing functional statistics too. The pg_mv_statistic name seems to indicate multi columns, but how about stats on date(datetime_column), or perhaps any non-volatile function. This would help to solve the problem highlighted here http://www.postgresql.org/message-id/CAApHDvp2vH=7O-gp-zAf7aWy+A-WHWVg7h3Vc6=5pf9uf34...@mail.gmail.com . Without giving it too much thought, perhaps any expression that can be indexed should be allowed to have stats? Would that be really difficult to implement in comparison to what you've already done with the patch so far? I would not over-complicate requirements for the first version of this, I think it's already complicated enough. My thoughts, exactly. I'm not willing to put more features into the initial version of the patch. Actually, I'm thinking about ripping out some experimental features (particularly hashed MCV and associative rules). That's fair, but I didn't really mean to imply that you should go work on that too and that it should be part of this patch.. I was thinking more along the lines of that I don't really agree with the table name for the new stats and that at some later date someone will want to add expression stats and we'd probably better come up design that would be friendly towards that. At this time I can only think that the name of the table might not suit well to expression stats, I'd hate to see someone have to invent a 3rd table to support these when we could likely come up with something that could be extended later and still make sense both today and in the future. 
I was just looking at how expression indexes are stored in pg_index and I see that if it's an expression index, the expression is stored in the indexprs column, which is of type pg_node_tree, so quite possibly at some point in the future the new stats table could just have an extra column added, and for today, we'd just need to come up with a future-proof name... Perhaps pg_statistic_ext or pg_statisticx, and name functions and source files something along those lines instead? Ah, OK. I don't think the catalog name pg_mv_statistic is somehow inappropriate for this purpose, though. IMHO multivariate does not mean only columns or no expressions; it simply describes that the approximated density function has multiple input variables, be it attributes or expressions. But maybe there's a better name. Tomas
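The motivation behind multivariate statistics can be shown with a tiny simulation (a hedged sketch, not the patch's actual estimator): when two columns are correlated, the planner's per-column independence assumption multiplies marginal selectivities and badly misestimates the joint selectivity.

```python
# Why multivariate statistics: with correlated columns, multiplying
# per-column (univariate) selectivities misestimates the joint selectivity.
rows = [(i % 10, i % 10) for i in range(1000)]  # column b perfectly correlated with a

def selectivity(pred):
    return sum(1 for r in rows if pred(r)) / len(rows)

sel_a = selectivity(lambda r: r[0] == 3)                 # marginal selectivity of a = 3
sel_b = selectivity(lambda r: r[1] == 3)                 # marginal selectivity of b = 3
independent_estimate = sel_a * sel_b                     # what univariate stats predict
actual = selectivity(lambda r: r[0] == 3 and r[1] == 3)  # true joint selectivity

print(independent_estimate, actual)  # ~0.01 (estimate) vs 0.1 (actual): 10x off
```

A multivariate MCV list or histogram over (a, b) would capture the dependency and return the joint frequency directly.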
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
(2014/10/09 11:49), Etsuro Fujita wrote: (2014/10/08 22:51), Fujii Masao wrote: On Wed, Sep 24, 2014 at 11:10 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Fujii Masao wrote: On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it be easy-to-use to have only one parameter, PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the default value when running the CREATE INDEX command? So what about introducing pending_list_cleanup_size also as a GUC? That is, users can set the threshold by using either the reloption or the GUC. Yes, I think having both a GUC and a reloption makes sense -- the GUC applies to all indexes, and can be tweaked for individual indexes using the reloption. OK, I'd vote for your idea of having both the GUC and the reloption. So, I think the patch needs to be updated. Fujii-san, what plan do you have about the patch? Please see the attached patch. In this patch, I introduced the GUC parameter, pending_list_cleanup_size. I chose 4MB as the default value of the parameter. But do you have any better idea about that default value? It seems reasonable to me that the GUC has the same default value as work_mem. So, +1 for the default value of 4MB. BTW, I moved the CommitFest entry of this patch to next CF 2014-10. OK, I'll review the patch in the CF. Thank you for updating the patch! Here are my review comments. * The patch applies cleanly and make and make check run successfully. I think that the patch is mostly good.
* In src/backend/utils/misc/guc.c: + { + {"pending_list_cleanup_size", PGC_USERSET, CLIENT_CONN_STATEMENT, + gettext_noop("Sets the maximum size of the pending list for GIN index."), + NULL, + GUC_UNIT_KB + }, + &pending_list_cleanup_size, + 4096, 0, MAX_KILOBYTES, + NULL, NULL, NULL + }, ISTM it'd be better to use RESOURCES_MEM, not CLIENT_CONN_STATEMENT. No? Also, why not set min to 64, not to 0, in accordance with that of work_mem? * In src/backend/utils/misc/postgresql.conf.sample: Likewise, why not put this variable in the section of RESOURCE USAGE, not in that of CLIENT CONNECTION DEFAULTS. * In src/backend/access/common/reloptions.c: + { + { + "pending_list_cleanup_size", + "Maximum size of the pending list for this GIN index, in kilobytes.", + RELOPT_KIND_GIN + }, + -1, 0, MAX_KILOBYTES + }, As in guc.c, why not set min to 64, not to 0? * In src/include/access/gin.h: /* GUC parameter */ extern PGDLLIMPORT int GinFuzzySearchLimit; + extern int pending_list_cleanup_size; The comment should be "GUC parameters". * In src/backend/access/gin/ginfast.c: + /* GUC parameter */ + int pending_list_cleanup_size = 0; Do we need to substitute 0 for pending_list_cleanup_size? * In doc/src/sgml/config.sgml: + <varlistentry id="guc-pending-list-cleanup-size" xreflabel="pending_list_cleanup_size"> + <term><varname>pending_list_cleanup_size</varname> (<type>integer</type>) As in postgresql.conf.sample, ISTM it'd be better to explain this variable in the section of Resource Consumption (maybe in Memory), not in that of Client Connection Defaults. * In doc/src/sgml/gin.sgml: ! <literal>pending_list_cleanup_size</literal>. To avoid fluctuations in observed ISTM it'd be better to use <varname> for pending_list_cleanup_size, not <literal>, here. * In doc/src/sgml/ref/create_index.sgml: + <term><literal>PENDING_LIST_CLEANUP_SIZE</literal></term> IMHO, it seems to me better for this variable to be in lowercase in accordance with the GUC version.
Also, I think that the words "GIN indexes accept a different parameter:" in the section of Index Storage Parameters in this reference page would be better as "GIN indexes accept different parameters:". Sorry for the delay in reviewing the patch. Best regards, Etsuro Fujita
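The reloption under review bounds how much a GIN index's fast-update pending list may grow before it is flushed into the main index structure. A minimal sketch of that batching idea (a toy class for illustration only, not PostgreSQL's ginfast.c): inserts append cheaply to a pending list, and once the list exceeds the configured cleanup size the accumulated entries are merged in bulk.

```python
# Sketch of GIN fast-update batching: entries accumulate in a pending
# list and are merged into the main index once the list exceeds a
# configured cleanup threshold (cf. pending_list_cleanup_size, in kB).
class ToyGinIndex:
    def __init__(self, cleanup_size_kb=4096, entry_size_kb=1):
        self.cleanup_size_kb = cleanup_size_kb
        self.entry_size_kb = entry_size_kb
        self.pending = []   # cheap, unsorted, append-only list
        self.main = {}      # key -> posting list (the "real" index)
        self.cleanups = 0

    def insert(self, key, tid):
        self.pending.append((key, tid))
        if len(self.pending) * self.entry_size_kb > self.cleanup_size_kb:
            self._cleanup_pending_list()

    def _cleanup_pending_list(self):
        # One bulk merge is cheaper than updating posting lists per insert.
        for key, tid in self.pending:
            self.main.setdefault(key, []).append(tid)
        self.pending.clear()
        self.cleanups += 1

idx = ToyGinIndex(cleanup_size_kb=4, entry_size_kb=1)
for tid in range(10):
    idx.insert("word", tid)
# The flush fires whenever a 5th pending entry arrives, so after 10
# inserts two cleanups have run and nothing is left pending.
assert idx.cleanups == 2 and idx.pending == []
```

The trade-off the GUC tunes is visible here: a larger threshold makes inserts cheaper on average but makes each eventual cleanup (and searches, which must also scan the pending list) more expensive.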
Re: [HACKERS] Lockless StrategyGetBuffer() clock sweep
On 2014-10-30 10:23:56 +0530, Amit Kapila wrote: I have a feeling that this might also have some regression at higher loads (like scale_factor = 5000, shared_buffers = 8GB, client_count = 128, 256) for similar reasons as the bgreclaimer patch; that is, although both reduce contention around the spinlock, they move contention somewhere else. I have yet to take data before concluding anything (I am just waiting for your other patch (wait free LW_SHARED) to be committed). I have a hard time seeing how this could be. In the uncontended case the number of cachelines touched and the number of atomic operations is exactly the same. In the contended case the new implementation does far fewer atomic ops - and doesn't do spinning. What's your theory? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
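The core of the lockless approach replaces the spinlock-protected advance of the clock hand with an atomic fetch-and-add; each backend derives its candidate buffer from the counter modulo the number of buffers. A rough sketch of just that counter trick (Python's itertools.count, which advances atomically under CPython's GIL, standing in for a real 64-bit atomic counter; this is not the actual freelist.c code):

```python
# Sketch of a lockless clock sweep: the clock hand is a monotonically
# increasing counter; each caller takes a ticket with fetch-and-add and
# maps it onto the buffer array with a modulo. No spinlock, no spinning.
import itertools

NBUFFERS = 8
_ticket = itertools.count()  # stand-in for an atomic fetch-and-add counter

def clock_sweep_next():
    return next(_ticket) % NBUFFERS

# Each call gets a distinct ticket, so the hand advances exactly once per
# call and wraps cleanly around the (toy) buffer pool.
seen = [clock_sweep_next() for _ in range(2 * NBUFFERS)]
assert seen == list(range(NBUFFERS)) * 2
```

This also illustrates Andres's point about the uncontended case: one atomic increment per victim candidate, the same cost as the spinlocked version but with no possibility of spinning behind another backend.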
Re: [HACKERS] foreign data wrapper option manipulation during Create foreign table time?
On Wednesday 29 October 2014 at 14:16:23, Demai Ni wrote: Robert and Ronan, many thanks for your response. I realized there is no clean way/API for it; maybe hacking the ptree can do the trick.. :-) I will also take a look at IMPORT FOREIGN SCHEMA. However, for this requirement, I still need the user to input a filename or file folder, and I'd like to process the file(s) at create-foreign-table time, and save the processed result somewhere like the ftoptions column in pg_foreign_table. Maybe there is some other way I can save the processed result and make it accessible at query time? You can pass options to IMPORT FOREIGN SCHEMA. So, you could maybe implement it so that the user can do that: IMPORT FOREIGN SCHEMA public FROM SERVER file_server INTO local_schema OPTIONS (filename '/path/to/the/file'); Your FDW would then issue several CREATE FOREIGN TABLE statements, with all the necessary options. Or even, to allow importing multiple files at once: IMPORT FOREIGN SCHEMA public FROM SERVER file_server INTO local_schema OPTIONS (directory '/path/to/the/file_dir/'); Demai On Wed, Oct 29, 2014 at 10:01 AM, Ronan Dunklau ronan.dunk...@dalibo.com wrote: On Wednesday 29 October 2014 at 12:49:12, Robert Haas wrote: On Tue, Oct 28, 2014 at 5:26 PM, Demai Ni nid...@gmail.com wrote: I am looking for a couple of pointers here about FDWs, and how to change the option values at CREATE TABLE time. I am using postgres-xc-1.2.1 right now. For example, it contains file_fdw, whose create-table-stmt looks like: CREATE FOREIGN TABLE t1() SERVER file_server OPTIONS(format 'text',filename 'testing.txt'); I would like to replace the 'testing.txt' with an absolute path like '/user/testor1/testing.txt', and make sure the new value is saved in pg_foreign_table; the file_fdw_validator is used to validate the options, but is there a way to replace the optionValue here? And get the new value stored in pg_foreign_table?
Thanks. BTW, in my real use case, I am trying to interpret an HDFS file and would need to save some hostname/port information in the option value, which is not necessarily specified by the user. I don't think there's going to be a clean way to do this. The intention of the system is that the user-provided options are simply stored, not that the FDW author is going to use the options list to store their own bits and pieces. I would do that during the IMPORT FOREIGN SCHEMA statement. That way, the user doesn't have to specify those options: they would be generated at IMPORT time, and the user could change them later if really needed. -- Ronan Dunklau http://dalibo.com - http://dalibo.org
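Ronan's suggestion above — expand user-supplied options into fully specified per-table DDL at IMPORT FOREIGN SCHEMA time — can be sketched as follows. This is a hypothetical helper, not the file_fdw implementation: real FDWs do this in C via the ImportForeignSchema callback, and column-list derivation (empty `()` here) is elided.

```python
# Sketch of an IMPORT FOREIGN SCHEMA handler for a file-based FDW: the
# user supplies a directory option, and the FDW generates one CREATE
# FOREIGN TABLE per file, baking the absolute path (and any other
# derived options) into each statement's OPTIONS clause, where it is
# then stored in pg_foreign_table.ftoptions.
import os

def import_foreign_schema(server, local_schema, directory, files):
    stmts = []
    for fname in sorted(files):
        table = os.path.splitext(fname)[0]
        path = os.path.join(directory, fname)
        stmts.append(
            f"CREATE FOREIGN TABLE {local_schema}.{table} () "
            f"SERVER {server} OPTIONS (format 'text', filename '{path}');"
        )
    return stmts

stmts = import_foreign_schema("file_server", "local_schema",
                              "/data/imports", ["t1.txt", "t2.txt"])
assert "filename '/data/imports/t1.txt'" in stmts[0]
```

The design point is that the derived values (absolute paths, hostname/port, etc.) become ordinary stored options the user can later inspect or ALTER, rather than hidden state the validator smuggled in.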
[HACKERS] Converting an expression of one data type to another
Dear hackers, I'm doing a job converting an expression of one data type to another. In SQL Server, there are two functions to do this job: 1. CAST ( expression AS data_type [ ( length ) ] ) 2. CONVERT ( data_type [ ( length ) ] , expression ) However, in PostgreSQL, there's only the CAST ( expression AS data_type [ ( length ) ] ) function. I have tried the following two ways of implementing the CONVERT ( data_type [ ( length ) ] , expression ) function, but both failed. 1. CREATE FUNCTION. The function's arguments can only be expressions, not a data_type. 2. Modifying gram.y. The CONVERT ( data_type [ ( length ) ] , expression ) syntax is in grammar conflict with PostgreSQL's own convert(data, src_encoding_name, dest_encoding_name) function, and PostgreSQL's own convert(data, src_encoding_name, dest_encoding_name) function can then no longer be used. I wonder whether there's a better way to solve this problem. Any help will be appreciated. Best Regards, Rohtodeveloper
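One workaround that avoids touching gram.y at all is rewriting the T-SQL form into standard CAST syntax before the query reaches PostgreSQL. A rough client-side sketch (a deliberately naive regex rewrite, hypothetical and not safe against nested parentheses, commas inside the expression, or CONVERT appearing inside string literals):

```python
# Rewrite SQL Server's CONVERT(data_type[(length)], expression) into the
# standard CAST(expression AS data_type[(length)]) that PostgreSQL accepts.
# Naive sketch: assumes the expression contains no commas or parentheses.
import re

_CONVERT = re.compile(
    r"CONVERT\s*\(\s*([A-Za-z ]+(?:\(\s*\d+\s*\))?)\s*,\s*([^(),]+)\)",
    re.IGNORECASE,
)

def rewrite_convert(sql):
    return _CONVERT.sub(
        lambda m: f"CAST({m.group(2).strip()} AS {m.group(1).strip()})", sql
    )

assert rewrite_convert("SELECT CONVERT(varchar(10), col1) FROM t") == \
       "SELECT CAST(col1 AS varchar(10)) FROM t"
```

Because the rewrite happens outside the server, PostgreSQL's own convert(data, src_encoding_name, dest_encoding_name) function stays fully usable, side-stepping the grammar conflict described above. A robust version would need a real tokenizer rather than a regex.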
Re: [HACKERS] Lockless StrategyGetBuffer() clock sweep
On Wed, Oct 29, 2014 at 3:09 PM, Andres Freund and...@2ndquadrant.com wrote: But if it is, then how about adding a flag that is 4 bytes wide or less alongside bgwriterLatch, and just checking the flag instead of checking bgwriterLatch itself? Yea, that'd be nicer. I didn't do it because it made the patch slightly more invasive... The complexity really is only needed because we're not guaranteed that 64bit reads are atomic. Although we actually can be sure, because there's no platform with nonatomic intptr_t reads - so 64bit platforms actually *do* have atomic 64bit reads/writes. So if we just have an integer 'setBgwriterLatch' somewhere we're good. We don't even need to take a lock to set it. Afaics the worst that can happen is that several processes set the latch... OK, that design is fine with me. Actually, the same approach would work for INT_ACCESS_ONCE. So I propose this approach instead: Add a new atomic flag to StrategyControl, useSlowPath. If StrategyGetBuffer() finds useSlowPath set, call StrategyGetBufferSlow(), which will acquire the spinlock, check whether bgwriterLatch is set and/or the freelist is non-empty, and return a buffer ID if able to allocate from the freelist; otherwise -1. If StrategyGetBufferSlow() returns -1, or we decide not to call it in the first place because useSlowPath isn't set, then fall through to your clock-sweep logic. We can set useSlowPath at startup and whenever we put buffers on the free list. I don't like the idea of conflating the slowpath for the bgwriter latch with the freelist slowpath. And I don't really see what that'd buy us over what's in the patch right now? Well, I was thinking that with a single flag check we could skip two paths that, as of today, are both uncommonly taken. Moving all that logic off into a separate function might help the compiler optimize for the fast case, too.
I wonder if we could make a trylock for spinlocks work - then we could look at the freelist if the lock is free and just go to the clock cycle otherwise. My guess is that that'd be quite performant. IIRC it's just the spinlock semaphore fallback that doesn't know how to do trylock... I could go for that. I don't think the semaphore thing is really a problem. If the spinlock semaphore implementation indeed can't support that, then just have it fall back to waiting for the lock and always returning true. Semaphores are so slow that it's not going to make any difference to performance. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
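The trylock idea discussed above — peek at the freelist only when its spinlock is free, otherwise fall straight through to the clock sweep — maps directly onto a non-blocking acquire. A sketch with Python's threading lock (`acquire(blocking=False)` playing the role of a spinlock trylock; the buffer structures are toy stand-ins, not bufmgr code):

```python
# Trylock pattern for buffer allocation: consult the freelist only if its
# lock can be taken without waiting; otherwise fall back to the clock
# sweep rather than spinning behind other backends.
import threading

freelist_lock = threading.Lock()
freelist = [7, 3]          # toy free buffer ids

def clock_sweep():         # toy stand-in for the clock-sweep victim search
    return 0

def get_buffer():
    if freelist_lock.acquire(blocking=False):  # "trylock": never waits
        try:
            if freelist:
                return freelist.pop()
        finally:
            freelist_lock.release()
    return clock_sweep()   # lock busy, or freelist empty: run the sweep

assert get_buffer() == 3       # uncontended: freelist hit
with freelist_lock:            # simulate another backend holding the lock
    assert get_buffer() == 0   # trylock fails, clock-sweep path taken
```

Robert's fallback suggestion fits this shape too: a semaphore-based "trylock" that actually blocks and always returns True still produces correct (just occasionally slower) behavior.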
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-21 12:40:56 +0530, Amit Kapila wrote: While doing performance tests, I noticed a hang at higher client counts with the patch. I checked the call stacks for a few of the processes, and they are as below: #0 0x008010933e54 in .semop () from /lib64/libc.so.6 #1 0x10286e48 in .PGSemaphoreLock () #2 0x102f68bc in .LWLockAcquire () #3 0x102d1ca0 in .ReadBuffer_common () #4 0x102d2ae0 in .ReadBufferExtended () #5 0x100a57d8 in ._bt_getbuf () #6 0x100a6210 in ._bt_getroot () #7 0x100aa910 in ._bt_search () #8 0x100ab494 in ._bt_first () #9 0x100a8e84 in .btgettuple () .. #0 0x008010933e54 in .semop () from /lib64/libc.so.6 #1 0x10286e48 in .PGSemaphoreLock () #2 0x102f68bc in .LWLockAcquire () #3 0x102d1ca0 in .ReadBuffer_common () #4 0x102d2ae0 in .ReadBufferExtended () #5 0x100a57d8 in ._bt_getbuf () #6 0x100a6210 in ._bt_getroot () #7 0x100aa910 in ._bt_search () #8 0x100ab494 in ._bt_first () ... The test configuration is as below: Test env - Power - 7 (hydra) scale_factor - 3000 shared_buffers - 8GB test mode - pgbench read only test execution - ./pgbench -c 128 -j 128 -T 1800 -S -M prepared postgres I ran it for half an hour, but it hadn't come out even after ~2 hours. It doesn't get reproduced every time; currently I am able to reproduce it and the m/c is in the same state. If you want any info, let me know (unfortunately the binaries are in release mode, so we might not get enough information). Hm. What commit did you apply the series on top of? I managed to reproduce a hang, but it was just something that heikki had already fixed... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Thu, Oct 30, 2014 at 7:30 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/10/09 11:49), Etsuro Fujita wrote: (2014/10/08 22:51), Fujii Masao wrote: On Wed, Sep 24, 2014 at 11:10 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Fujii Masao wrote: On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting. Wouldn't it be easy-to-use to have only one parameter, PENDING_LIST_CLEANUP_SIZE? How about setting PENDING_LIST_CLEANUP_SIZE to work_mem as the default value when running the CREATE INDEX command? So what about introducing pending_list_cleanup_size also as a GUC? That is, users can set the threshold by using either the reloption or the GUC. Yes, I think having both a GUC and a reloption makes sense -- the GUC applies to all indexes, and can be tweaked for individual indexes using the reloption. OK, I'd vote for your idea of having both the GUC and the reloption. So, I think the patch needs to be updated. Fujii-san, what plan do you have about the patch? Please see the attached patch. In this patch, I introduced the GUC parameter, pending_list_cleanup_size. I chose 4MB as the default value of the parameter. But do you have any better idea about that default value? It seems reasonable to me that the GUC has the same default value as work_mem. So, +1 for the default value of 4MB. BTW, I moved the CommitFest entry of this patch to next CF 2014-10. OK, I'll review the patch in the CF. Thank you for updating the patch! Here are my review comments. * The patch applies cleanly and make and make check run successfully. I think that the patch is mostly good. Thanks! Attached is the updated version of the patch.
* In src/backend/utils/misc/guc.c: + { + {"pending_list_cleanup_size", PGC_USERSET, CLIENT_CONN_STATEMENT, + gettext_noop("Sets the maximum size of the pending list for GIN index."), + NULL, + GUC_UNIT_KB + }, + &pending_list_cleanup_size, + 4096, 0, MAX_KILOBYTES, + NULL, NULL, NULL + }, ISTM it'd be better to use RESOURCES_MEM, not CLIENT_CONN_STATEMENT. No? Yes, if the pending list always exists in memory. But it doesn't, IIUC. Thoughts? Also, why not set min to 64, not to 0, in accordance with that of work_mem? I'm OK to use 64. But I just chose 0 because I could not think of any reasonable reason why 64kB is suitable as the minimum size of the pending list. IOW, I have no idea about whether it's reasonable to use the min value of work_mem as the min size of the pending list. * In src/backend/utils/misc/postgresql.conf.sample: Likewise, why not put this variable in the section of RESOURCE USAGE, not in that of CLIENT CONNECTION DEFAULTS. Same as above. * In src/backend/access/common/reloptions.c: + { + { + "pending_list_cleanup_size", + "Maximum size of the pending list for this GIN index, in kilobytes.", + RELOPT_KIND_GIN + }, + -1, 0, MAX_KILOBYTES + }, As in guc.c, why not set min to 64, not to 0? Same as above. * In src/include/access/gin.h: /* GUC parameter */ extern PGDLLIMPORT int GinFuzzySearchLimit; + extern int pending_list_cleanup_size; The comment should be "GUC parameters". Yes, fixed. * In src/backend/access/gin/ginfast.c: + /* GUC parameter */ + int pending_list_cleanup_size = 0; Do we need to substitute 0 for pending_list_cleanup_size? Same as above. * In doc/src/sgml/config.sgml: + <varlistentry id="guc-pending-list-cleanup-size" xreflabel="pending_list_cleanup_size"> + <term><varname>pending_list_cleanup_size</varname> (<type>integer</type>) As in postgresql.conf.sample, ISTM it'd be better to explain this variable in the section of Resource Consumption (maybe in Memory), not in that of Client Connection Defaults. Same as above. * In doc/src/sgml/gin.sgml: !
<literal>pending_list_cleanup_size</literal>. To avoid fluctuations in observed ISTM it'd be better to use <varname> for pending_list_cleanup_size, not <literal>, here. Yes, fixed. * In doc/src/sgml/ref/create_index.sgml: + <term><literal>PENDING_LIST_CLEANUP_SIZE</literal></term> IMHO, it seems to me better for this variable to be in lowercase in accordance with the GUC version. Using lowercase only for pending_list_cleanup_size and uppercase for other options looks strange to me. What about using lowercase for all the storage options? I changed the document in that way. Also, I think that the words "GIN indexes accept a different parameter:" in the section of Index Storage Parameters in this reference page would be "GIN indexes accept different parameters:". Yes,
Re: [HACKERS] Review of Refactoring code for sync node detection
Thanks for your review! (No worries about creating a new thread, I don't mind as this is not a huge patch. I think you could have downloaded the mbox from postgresql.org, btw.) On Thu, Oct 30, 2014 at 8:24 AM, Jim Nasby j...@nasby.net wrote: SyncRepGetSynchronousNode(): IMHO, the top comment should mention that if there are multiple standbys of the same priority it returns the first one. OK. Corrected. This switches from using a single if() with multiple conditions &&'d together to a bunch of if() continues. I don't know if those will perform the same, and AFAIK this is pretty performance critical. Well, we could still use the old notation with a single if(). That's not very complicated to change. <grammarZealot>In the comments, "Process" should be "Proceed".</grammarZealot> Noted. Thanks for correcting my Frenglish. pg_stat_get_wal_senders(): The original version only loops through walsnds[] one time; the new code loops twice, both times while holding SyncRepLock (shared). This will block transaction commit if syncrep is enabled. That's not great, but presumably this function isn't going to be called terribly often, so maybe it's OK. (On a side note, perhaps we should add a warning to the documentation for pg_stat_replication warning not to hammer it...) Hm. For code readability I did not really want to do so, but let's do this: let's add a new argument in SyncRepGetSynchronousNode to retrieve the priority of each WAL sender and do a single pass through the array so that there is no performance difference. Updated patch attached. Regards, -- Michael From 1a02c982de0aeb2bd7dcf0bbf34605017619a417 Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Thu, 30 Oct 2014 22:01:10 +0900 Subject: [PATCH] Refactor code to detect synchronous node in WAL sender array This patch is made to remove code duplication in walsender.c and syncrep.c in order to detect what is the node with the lowest strictly-positive priority, facilitating maintenance of this code.
--- src/backend/replication/syncrep.c | 99 +++-- src/backend/replication/walsender.c | 36 +++--- src/include/replication/syncrep.h | 1 + 3 files changed, 80 insertions(+), 56 deletions(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..70c799b 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -5,7 +5,7 @@
  * Synchronous replication is new as of PostgreSQL 9.1.
  *
  * If requested, transaction commits wait until their commit LSN is
- * acknowledged by the sync standby.
+ * acknowledged by the synchronous standby.
  *
  * This module contains the code for waiting and release of backends.
  * All code in this module executes on the primary. The core streaming
@@ -357,6 +357,70 @@ SyncRepInitConfig(void)
 	}
 }
 
+/*
+ * Obtain position of synchronous standby in the array referencing all
+ * the WAL senders, or -1 if no synchronous node can be found. The caller
+ * of this function should take a lock on SyncRepLock. If there are
+ * multiple nodes with the same lowest priority value, the first node
+ * found is selected.
+ * node_priority is a preallocated array of size max_wal_senders that
+ * can be used to retrieve the priority of each WAL sender. Its
+ * inclusion in this function has the advantage of limiting the scan
+ * of the WAL sender array to one pass, limiting the number of cycles
+ * SyncRepLock is taken.
+ */
+int
+SyncRepGetSynchronousNode(int *node_priority)
+{
+	int		sync_node = -1;
+	int		priority = 0;
+	int		i;
+
+	/* Scan WAL senders and find synchronous node if any */
+	for (i = 0; i < max_wal_senders; i++)
+	{
+		/* Use volatile pointer to prevent code rearrangement */
+		volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+		/* First get the priority of this WAL sender */
+		if (node_priority)
+			node_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
+				0 : walsnd->sync_standby_priority;
+
+		/*
+		 * Proceed immediately to next WAL sender if one of those
+		 * conditions is satisfied:
+		 * - Not active, with PID equal to 0.
+		 * - Not streaming.
+		 * - Priority conditions not satisfied. If priority is not 0 it
+		 *   means that one potential synchronous node has already been
+		 *   found. The node selected needs as well to have the lowest
+		 *   priority in the list of WAL senders.
+		 * - Stream flush position is invalid.
+		 */
+		if (walsnd->pid == 0 ||
+			walsnd->state != WALSNDSTATE_STREAMING ||
+			walsnd->sync_standby_priority == 0 ||
+			(priority != 0 &&
+			 priority <= walsnd->sync_standby_priority) ||
+			XLogRecPtrIsInvalid(walsnd->flush))
+			continue;
+
+		/*
+		 * We have a potential synchronous candidate, choose it as the
+		 * synchronous node for the time being before going through the
+		 * other nodes listed in the WAL sender array.
+		 */
+		sync_node = i;
+
+		/* Update priority to current value of WAL sender */
+		priority = walsnd->sync_standby_priority;
+	}
Re: [HACKERS] Converting an expression of one data type to another
I think you're looking for the pgsql-general mailing list. This list is for PostgreSQL extensions and core database engine software development. On 10/30/2014 07:44 PM, rohtodeveloper wrote: Dear hackers, I'm doing a job converting an expression of one data type to another. In SQL Server, there are two functions to do this job. 1. CAST ( expression AS data_type [ ( length ) ] ) 2. CONVERT ( data_type [ ( length ) ] , expression ) However, in PostgreSQL, there's only the CAST ( expression AS data_type [ ( length ) ] ) function. I have tried the following two ways of implementing the CONVERT ( data_type [ ( length ) ] , expression ) function, but both failed. 1. CREATE FUNCTION. The function's arguments can only be expressions, not a data_type. 2. Modifying gram.y. The CONVERT ( data_type [ ( length ) ] , expression ) syntax is in grammar conflict with PostgreSQL's own convert(data, src_encoding_name, dest_encoding_name) function, and PostgreSQL's own convert(data, src_encoding_name, dest_encoding_name) function can then no longer be used. I wonder whether there's a better way to solve this problem. Any help will be appreciated. Best Regards, Rohtodeveloper -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Thu, Oct 30, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-21 12:40:56 +0530, Amit Kapila wrote: While doing performance tests, I noticed a hang at higher client counts with the patch. I checked the call stacks for a few of the processes, and they are as below: #0 0x008010933e54 in .semop () from /lib64/libc.so.6 #1 0x10286e48 in .PGSemaphoreLock () #2 0x102f68bc in .LWLockAcquire () #3 0x102d1ca0 in .ReadBuffer_common () #4 0x102d2ae0 in .ReadBufferExtended () #5 0x100a57d8 in ._bt_getbuf () #6 0x100a6210 in ._bt_getroot () #7 0x100aa910 in ._bt_search () #8 0x100ab494 in ._bt_first () #9 0x100a8e84 in .btgettuple () .. #0 0x008010933e54 in .semop () from /lib64/libc.so.6 #1 0x10286e48 in .PGSemaphoreLock () #2 0x102f68bc in .LWLockAcquire () #3 0x102d1ca0 in .ReadBuffer_common () #4 0x102d2ae0 in .ReadBufferExtended () #5 0x100a57d8 in ._bt_getbuf () #6 0x100a6210 in ._bt_getroot () #7 0x100aa910 in ._bt_search () #8 0x100ab494 in ._bt_first () ... The test configuration is as below: Test env - Power - 7 (hydra) scale_factor - 3000 shared_buffers - 8GB test mode - pgbench read only test execution - ./pgbench -c 128 -j 128 -T 1800 -S -M prepared postgres I ran it for half an hour, but it hadn't come out even after ~2 hours. It doesn't get reproduced every time; currently I am able to reproduce it and the m/c is in the same state. If you want any info, let me know (unfortunately the binaries are in release mode, so we might not get enough information). Hm. What commit did you apply the series on top of? I managed to reproduce a hang, but it was just something that heikki had already fixed... commit 494affbd900d1c90de17414a575af1a085c3e37a Author: Noah Misch n...@leadboat.com Date: Sun Oct 12 23:33:37 2014 -0400 And, I think you are saying that heikki's commit e0d97d has fixed this issue; in that case, shall I check once by including that fix? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Escaping from blocked send() reprised.
On 10/03/2014 06:29 PM, Heikki Linnakangas wrote: On 10/03/2014 05:26 PM, Andres Freund wrote: On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote: On 09/28/2014 01:54 AM, Andres Freund wrote: 0003 Sinval/notify processing got simplified further. There really isn't any need for DisableNotifyInterrupt/DisableCatchupInterrupt anymore. Also begin_client_read/client_read_ended don't make much sense anymore. Instead introduce ProcessClientReadInterrupt (which wants a better name). There's also a very WIP 0004 that allows secure_read/write to be interrupted when ProcDiePending is set. All of that happens via the latch mechanism; nothing happens inside signal handlers. So I do think it's quite an improvement over what's been discussed in this thread. But it (and the other approaches) do noticeably increase the likelihood of clients not getting the error message if the client isn't actually dead. The likelihood of write() being blocked *temporarily* due to normal bandwidth constraints is quite high when you consider COPY FROM and similar. Right now we'll wait till we can put the error message into the socket, afaics. 1-3 need some serious comment work, but I think the approach is basically sound. I'm much, much less sure about allowing send() to be interrupted. Yeah, 1-3 seem sane. I think 3 also needs a careful look. Have you looked through it? While imo much less complex than before, there are some complex interactions in the touched code. And we have terrible coverage of both catchup interrupts and notify stuff... I only looked at the .patch, I didn't apply it, so I didn't look at the context much. But I don't see any fundamental problem with it. I would like to have a closer look before it's committed, though. About patch 3: Looking closer, this design still looks OK to me. As you said yourself, the comments need some work (e.g. step 5 in the top comment in async.c needs updating). And then there are a couple of FIXME and XXX comments that need to be addressed.
The comment on PGPROC.procLatch in storage/proc.h says just this: Latch procLatch; /* generic latch for process */ This needs a lot more explaining. It's now used by signal handlers to interrupt a read or write to the socket; that should be mentioned. What else is it used for? (for waking up a backend in synchronous replication, at least) What are the rules on when to arm it and when to reset it? Would it be more clear to use a separate, backend-private, latch for the signals? I guess that won't work, though, because sometimes we need to wait for a wakeup from a different process or from a signal at the same time (SyncRepWaitForLSN() in particular). Not without adding a variant of WaitLatch that can wait on two latches simultaneously, anyway. The assumption in secure_raw_read that MyProc exists is pretty surprising. I understand why it's that way, and there's a comment in PostgresMain explaining why the socket cannot be put into non-blocking mode earlier, but it's still a bit whacky. Not sure what to do about that. - Heikki
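Heikki's objection above — a backend-private latch would require waiting on two latches at once — is essentially a multiplexing problem. A sketch of what "wait on two latches simultaneously" would look like, using self-pipe style toy latches and select() on POSIX (Python stand-ins only; PostgreSQL's latch implementation differs, and WaitLatch itself takes a single latch):

```python
# Sketch of waiting on two latches at once: each toy latch is a self-pipe
# (write end safe to poke from a signal handler), and one select() call
# watches both read ends, reporting which latch(es) were set.
import os
import select

class ToyLatch:
    def __init__(self):
        self.rfd, self.wfd = os.pipe()
    def set(self):                  # async-signal-safe: just a write()
        os.write(self.wfd, b"x")
    def reset(self):
        os.read(self.rfd, 1)

def wait_two_latches(a, b, timeout=1.0):
    ready, _, _ = select.select([a.rfd, b.rfd], [], [], timeout)
    return [latch for latch in (a, b) if latch.rfd in ready]

proc_latch, private_latch = ToyLatch(), ToyLatch()
private_latch.set()                 # e.g. a signal handler fires
woken = wait_two_latches(proc_latch, private_latch)
assert woken == [private_latch]
```

This illustrates why the thread instead keeps a single shared procLatch: one wakeup channel avoids both the extra file descriptors and a two-latch WaitLatch variant.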
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-30 18:54:57 +0530, Amit Kapila wrote: On Thu, Oct 30, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-21 12:40:56 +0530, Amit Kapila wrote: I ran it for half an hour, but it hadn't come out even after ~2 hours. It doesn't get reproduced every time; currently I am able to reproduce it and the m/c is in the same state. If you want any info, let me know (unfortunately the binaries are in release mode, so we might not get enough information). Hm. What commit did you apply the series on top of? I managed to reproduce a hang, but it was just something that heikki had already fixed... commit 494affbd900d1c90de17414a575af1a085c3e37a Author: Noah Misch n...@leadboat.com Date: Sun Oct 12 23:33:37 2014 -0400 And, I think you are saying that heikki's commit e0d97d has fixed this issue; in that case, shall I check once by including that fix? Well, the hang I was able to reproduce was originally hanging because of that. I saw a lot of content locks waiting as well, but the origin seems to have a backend waiting for an xloginsert. The way I could trigger it quite fast was by first running a read/write pgbench and then switching to a readonly one. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Thu, Oct 30, 2014 at 6:58 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-30 18:54:57 +0530, Amit Kapila wrote: On Thu, Oct 30, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com wrote: Hm. What commit did you apply the series on top of? I managed to reproduce a hang, but it was just something that Heikki had already fixed... commit 494affbd900d1c90de17414a575af1a085c3e37a Author: Noah Misch n...@leadboat.com Date: Sun Oct 12 23:33:37 2014 -0400 And, I think you are saying that Heikki's commit e0d97d has fixed this issue; in that case I will check once by including that fix? Well, the hang I was able to reproduce was originally hanging because of that. I saw a lot of content locks waiting as well, but the origin seems to have been a backend waiting for an xloginsert. The way I could trigger it quite fast was by first running a read/write pgbench and then switching to a readonly one. So what exactly do you mean by 'switch to'? Is it that both read-write and readonly pgbench were running together, or that after the read-write run finished, running a read-only pgbench reproduces it? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Escaping from blocked send() reprised.
On 2014-10-30 15:27:13 +0200, Heikki Linnakangas wrote: The comment on PGPROC.procLatch in storage/proc.h says just this: Latch procLatch; /* generic latch for process */ This needs a lot more explaining. It's now used by signal handlers to interrupt a read or write to the socket; that should be mentioned. What else is it used for? (for waking up a backend in synchronous replication, at least) What are the rules on when to arm it and when to reset it? Hm. I agree it could use expanded commentary, but I'm unsure if that much detail is really warranted. Any such documentation seems to be almost guaranteed to be out of date. As evidenced by the fact that there's none to date... Would it be more clear to use a separate, backend-private, latch, for the signals? I guess that won't work, though, because sometimes we need to wait for a wakeup from a different process or from a signal at the same time (SyncRepWaitForLSN() in particular). Not without adding a variant of WaitLatch that can wait on two latches simultaneously, anyway. I wondered the same, but I don't really see what it'd buy us during normal running. It seems like it'd just make code more complex without leading to relevantly fewer wakeups. The assumption in secure_raw_read that MyProc exists is pretty surprising. I understand why it's that way, and there's a comment in PostgresMain explaining why the socket cannot be put into non-blocking mode earlier, but it's still a bit whacky. Not sure what to do about that. It makes me quite unhappy too. I looked at what it'd take to make the proc latch available earlier, but it wasn't pretty. I wondered whether we could use an 'early proc latch' in MyProcLatch that is used until we're fully attached to shared memory. Then, when attaching to shared memory, we'd set the old latch once and reset MyProcLatch to the shared memory one. But that's pretty ugly.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-30 19:05:06 +0530, Amit Kapila wrote: On Thu, Oct 30, 2014 at 6:58 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-30 18:54:57 +0530, Amit Kapila wrote: On Thu, Oct 30, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com wrote: Hm. What commit did you apply the series on top of? I managed to reproduce a hang, but it was just something that Heikki had already fixed... commit 494affbd900d1c90de17414a575af1a085c3e37a Author: Noah Misch n...@leadboat.com Date: Sun Oct 12 23:33:37 2014 -0400 And, I think you are saying that Heikki's commit e0d97d has fixed this issue; in that case I will check once by including that fix? Well, the hang I was able to reproduce was originally hanging because of that. I saw a lot of content locks waiting as well, but the origin seems to have been a backend waiting for an xloginsert. The way I could trigger it quite fast was by first running a read/write pgbench and then switching to a readonly one. So what exactly do you mean by 'switch to'? Is it that both read-write and readonly pgbench were running together, or that after the read-write run finished, running a read-only pgbench reproduces it? I don't think that matters all that much. In this case I first had a read-write one (accidentally, by leaving off -S), and then aborted it and ran a readonly pgbench. That turned out to trigger it relatively fast. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] CREATE IF NOT EXISTS INDEX
On Tue, Oct 7, 2014 at 2:42 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Mon, Oct 6, 2014 at 11:13 AM, Marti Raudsepp ma...@juffo.org wrote: On Mon, Oct 6, 2014 at 4:27 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: create_index_if_not_exists_v7.patch Looks good to me. Marking ready for committer. Thanks. The patch looks good to me except for the following minor comments. + termliteralIF NOT EXISTS/literal/term It's better to place this after the paragraph on CONCURRENTLY, for consistency with the syntax. +Do not throw an error if the index already exists. I think that this should be "Do not throw an error if a relation with the same name already exists." +Index name is required when IF NOT EXISTS is specified. IF NOT EXISTS should be enclosed in a literal tag. @@ -60,7 +60,8 @@ extern Oid index_create(Relation heapRelation, bool allow_system_table_mods, bool skip_build, bool concurrent, - bool is_internal); + bool is_internal, + bool if_not_exists); You forgot to add a comment for the if_not_exists argument at the top of the index_create function. + bool if_not_exists; /* just do nothing if index already exists */ You forgot to add the trailing ? to the above comment. There are similar comments in parsenodes.h, and they have such a ?. Regards, -- Fujii Masao
[HACKERS] Change in HEAP_NEWPAGE logging makes diagnosis harder
Hi, I've just once more looked at the WAL stream and was briefly confused about all the XLOG_FPI records. Since 54685338e3, log_newpage/log_newpage_buffer and XLogSaveBufferForHint() use the same WAL record. I personally find that a bad idea because they're used in quite different situations. Can we use different IDs again? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] pg_basebackup vs. Windows and tablespaces
On Fri, Sep 12, 2014 at 6:12 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Sep 12, 2014 at 5:07 PM, Dilip kumar dilip.ku...@huawei.com wrote: On 12 September 2014 14:34, Amit Kapila Wrote Please find the updated patch to include those documentation changes. Looks fine, Moved to Ready for committer. Thanks a lot for the review. This patch has been in the Ready for Committer stage for more than 1.5 months. I believe this is important functionality; without it, the tar format of pg_basebackup is not usable on Windows. I feel this will add value to the pg_basebackup utility, and moreover the need and design have been agreed upon on the list before development. Can any committer please have a look at this patch? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TAP test breakage on MacOS X
Noah Misch n...@leadboat.com writes: Ick; I concur with your judgment on those aspects of the IPC::Cmd design. Thanks for investigating. So, surviving options include: 1. Require IPC::Run. 2. Write our own run() that reports the raw exit code. 3. Distill the raw exit code from the IPC::Cmd::run() error string. 4. Pass IPC::Run::run_forked() a subroutine that execs an argument list. Any others worth noting? From a user-inconvenience standpoint, I think requiring IPC::Run is not horrid, so long as we add an --enable-something configure option to drive whether the TAP tests are run or skipped. However, it has to work with whatever version of IPC::Run the platform's Perl installation comes up with, and in that regard I'm a bit worried. I'm sitting here watching a trial run on prairiedog (OSX 10.4.11, perl 5.8.6), and there seems to be something extremely wrong from a performance standpoint. For example: t/001_pg_controldata.pl .. 1..13 ok 1 - pg_controldata --help exit code 0 ok 2 - pg_controldata --help goes to stdout ok 3 - pg_controldata --help nothing to stderr ok 4 - pg_controldata --version exit code 0 ok 5 - pg_controldata --version goes to stdout ok 6 - pg_controldata --version nothing to stderr ok 7 - pg_controldata with invalid option nonzero exit code ok 8 - pg_controldata with invalid option prints error message ok 9 - pg_controldata without arguments fails ok 10 - pg_controldata with nonexistent directory fails ok 11 - pg_controldata /Users/tgl/pgsql/src/bin/pg_controldata/tmp_testOtBj/data exit code 0 ok 12 - pg_controldata /Users/tgl/pgsql/src/bin/pg_controldata/tmp_testOtBj/data no stderr ok 13 - pg_controldata produces output: matches ok All tests successful. Files=1, Tests=13, 32 wallclock secs ( 0.23 usr 0.05 sys + 21.97 cusr 7.82 csys = 30.07 CPU) Result: PASS Yup, you read that right, it took 32 seconds to run those dozen utterly trivial tests. 
As far as I could tell by eyeball, pretty much all of the time went into test 11, which is odd since it seems not significantly different from the others. I think there's something wacky about IPC::Run on this platform. regards, tom lane
Re: [HACKERS] TAP test breakage on MacOS X
On 2014-10-30 01:57:15 -0400, Noah Misch wrote: On Wed, Oct 29, 2014 at 08:14:07PM -0400, Peter Eisentraut wrote: On 10/28/14 9:09 PM, Peter Eisentraut wrote: I have looked into IPC::Cmd, but the documentation keeps telling me that to do anything interesting I have to have IPC::Run anyway. I'll give it a try, though. I tried this, but I'm not optimistic about it. While parts of IPC::Cmd are actually a bit nicer than IPC::Run, other parts are weird. For example, with most packages and functions in Perl that run a command, you can pass the command as a string or as a list (or array reference). The latter is preferred because it avoids issues with quoting, word splitting, spaces, etc. In IPC::Run, I can use the run function in the latter way, but I cannot use the run_forked function like that, and I need that one to get the exit code of a command. It's possible to work around that, but I'm getting the feeling that this is not very well designed. Ick; I concur with your judgment on those aspects of the IPC::Cmd design. Thanks for investigating. So, surviving options include: 1. Require IPC::Run. 2. Write our own run() that reports the raw exit code. 3. Distill the raw exit code from the IPC::Cmd::run() error string. 4. Pass IPC::Run::run_forked() a subroutine that execs an argument list. Any others worth noting? 5. Include a copy of IPC::Run in the repository till it's common enough. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] WAL format and API changes (9.5)
On 2014-10-29 10:24:20 +0200, Heikki Linnakangas wrote: On 10/06/2014 02:29 PM, Andres Freund wrote: On 2014-10-06 14:19:39 +0300, Heikki Linnakangas wrote: Barring objections, I'll commit this, and then continue benchmarking the second patch with the WAL format and API changes. I'd like to have a look at it beforehand. Ping? Here's a rebased patch. I'd like to proceed with this. Doing so. I've not yet really looked, but on a quick readthrough XLogInsertRecData() staying in xlog.c doesn't make me happy... Ok.. Can you elaborate? To me the split between xloginsert.c doing some of the record assembly, and xlog.c doing the lower level part of the assembly, is just wrong. And the fact that the already complicated logic to retry after detecting missing fpws is now split across two files seems to confirm that. What happens right now is that XLogInsert() (with some helper routines) assembles the record. Then it hands that off to XLogInsertRecData(). Which sometimes returns InvalidXLogRecPtr, and returns back to XLogInsert(), which reverses *some* of its work and then retries. Brr. Similarly, the 'page_writes_omitted' logic doesn't make me particularly happy. Previously we retried when there actually was a page affected by the different RedoRecPtr. Now we do it as soon as our copy of RedoRecPtr is out of date? Won't that quite often spuriously trigger retries? Am I missing something? Arguably this doesn't happen often enough to matter, but it's still something that we should explicitly discuss. The implementation of the split seems to change the meaning of TRACE_POSTGRESQL_XLOG_INSERT - it's now counting retries. I don't particularly care about those tracepoints, but I see no simplification due to changing its meaning. Aside: In C11 function local statics become a bit more expensive - they essentially require atomics and/or spinlocks on the first few calls. Next thing: The patch doesn't actually compile. It misses an #include of storage/proc.h for MyPgXact.
I'm not a big fan of the naming for the new split. We have * XLogInsert() which is the external interface * XLogRecordAssemble() which builds the rdata chain that will be inserted * XLogInsertRecData() which actually inserts the data into the xl buffers. To me that's pretty inconsistent. I'm also somewhat confused about the new split of the headers. I'm not averse to splitting them, but it's getting a bit hard to understand: * xlog.h - generic mishmash * xlogdefs.h - Three typedefs and some #defines * xlog_fn.h - SQL functions * xlog_internal.h - Pretty random collection of stuff * xlogutils.h - Utilities for replaying WAL records * xlogreader.h - generic XLog reading facility * xloginsert.h - Functions for generating WAL records * xlogrecord.h - Definitions for the WAL record format. Isn't that a bit too much? Pretty much the only files that have a recognizable separation to me are xlog_fn.h and xlogreader.h. You've removed the "NB: this routine feels free to scribble on the XLogRecData structs, though not on the data they reference. This is OK since the XLogRecData structs are always just temporaries in the calling code." comment, but we still do, no? While not this patch's blame, I see that currently we finish the critical section in XLogInsert/XLogInsertRecData pretty early: END_CRIT_SECTION(); /* * Update shared LogwrtRqst.Write, if we crossed page boundary. */ if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ) { SpinLockAcquire(&XLogCtl->info_lck); /* advance global request to include new block(s) */ if (XLogCtl->LogwrtRqst.Write < EndPos) XLogCtl->LogwrtRqst.Write = EndPos; /* update local result copy while I have the chance */ LogwrtResult = XLogCtl->LogwrtResult; SpinLockRelease(&XLogCtl->info_lck); } I don't think this is particularly critical, but it still looks wrong to me. Hm. I think that's what I see for now.
Greetings, Andres Freund
Re: [HACKERS] TAP test breakage on MacOS X
I wrote: Yup, you read that right, it took 32 seconds to run those dozen utterly trivial tests. As far as I could tell by eyeball, pretty much all of the time went into test 11, which is odd since it seems not significantly different from the others. I think there's something wacky about IPC::Run on this platform. Hm ... on my RHEL6 machine, it takes about 2.5 seconds to run the pg_controldata TAP tests, and again it looks like practically all of that is going into test 11. Given the speed differential between the two machines, the time taken by prairiedog is roughly in line with that. So the real question seems to be why is IPC::Run's performance so inconsistent? Is there something I'm not understanding that would cause test 11 to require much more time than the others? regards, tom lane
[HACKERS] infinite loop in _bt_getstackbuf
A colleague at EnterpriseDB today ran into a situation on PostgreSQL 9.3.5 where the server went into an infinite loop while attempting a VACUUM FREEZE; it couldn't escape _bt_getstackbuf(), and it couldn't be killed with ^C. I think we should add a check for interrupts into that loop somewhere; and possibly make some attempt to notice if we've been iterating for longer than, say, the lifetime of the universe until now. The fundamental structure of that function is an infinite loop. We break out of that loop when BTEntrySame(item, stack->bts_btentry) or P_RIGHTMOST(opaque), and I'm sure that it's correct to think that, in theory, one of those things will eventually happen. But the index could be corrupted, most obviously by having a page where opaque->btpo_next points back to the current block number. If that happens, you need an immediate shutdown (or some clever gdb hackery) to terminate the VACUUM. That's unfortunate and unnecessary. It also looks like something we can fix, at a minimum by adding a CHECK_FOR_INTERRUPTS() at the top of that loop, or in some function that it calls, like _bt_getbuf(), so that if it goes into an infinite loop, it can at least be killed. We could also consider adding a check at the bottom of the loop, just before setting blkno = opaque->btpo_next, that those values are unequal. If they are equal, elog(). Clearly it's possible to have a cycle of length > 1, and such a check wouldn't catch that, but it might still be worth checking for the trivial case. Or, we could try to put an upper bound on the number of iterations that are reasonable and error out if we exceed that value. That might be tricky, though; it's not obvious to me that there's any comfortably small upper bound. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] WAL format and API changes (9.5)
On 10/30/2014 06:02 PM, Andres Freund wrote: On 2014-10-29 10:24:20 +0200, Heikki Linnakangas wrote: On 10/06/2014 02:29 PM, Andres Freund wrote: I've not yet really looked, but on a quick readthrough XLogInsertRecData() staying in xlog.c doesn't make me happy... Ok.. Can you elaborate? To me the split between xloginsert.c doing some of the record assembly, and xlog.c doing the lower level part of the assembly is just wrong. Really? To me, that feels like a natural split. xloginsert.c is responsible for constructing the final WAL record, with the backup blocks and all. It doesn't access any shared memory (related to WAL; it does look at Buffers, to determine what to back up). The function in xlog.c inserts the assembled record to the WAL, as is. It handles the locking and WAL buffer management related to that. What would you suggest? I don't think putting everything in one XLogInsert function, like we have today, is better. Note that the second patch makes xloginsert.c a lot more complicated. And it leads to things like the already complicated logic to retry after detecting missing fpws is now split across two files seems to confirm that. What happens right now is that XLogInsert() (with some helper routines) assembles the record. Then hands that off to XLogInsertRecData(). Which sometimes returns InvalidXLogRecPtr, and returns back to XLogInsert(), which reverses *some* of its work and then retries. Brr. Modifying the chain that was already constructed is not pretty, but it's not this patch's fault. I don't think splitting the functions makes that any worse. (BTW, the follow-up patch to change the WAL format fixes that; the per-buffer data is kept separately from the main data chain). Similary the 'page_writes_omitted' logic doesn't make me particularly happy. Previously we retried when there actually was a page affected by the different RedoRecPtr. Now we do it as soon as our copy of RedoRecPtr is out of date? Won't that quite often spuriously trigger retries? 
Am I missing something? Arguably this doesn't happen often enough to matter, but it's still something that we should explicitly discuss. The situation where it leads to a spurious retry is when the constructed WAL record skipped the FPW of a buffer, and RedoRecPtr was updated, but the buffer still doesn't need an FPW according to the updated RedoRecPtr. That only happens if the same buffer was updated (by another backend) after the new checkpoint began. I believe that's extremely rare. We could make that more fine-grained, though. Instead of passing a boolean 'fpw_sensitive' flag to XLogInsertRecData, pass the lowest LSN among the pages whose FPW was skipped. If that's less than the new RedoRecPtr, then a retry is needed. That would eliminate the spurious retries. The implementation of the split seems to change the meaning of TRACE_POSTGRESQL_XLOG_INSERT - it's now counting retries. I don't particularly care about those tracepoints, but I see no simplification due to changing its meaning. I wasn't sure what to do about that. I don't use tracepoints, and I don't know what others use them for. I'm not a big fan of the naming for the new split. We have * XLogInsert() which is the external interface * XLogRecordAssemble() which builds the rdata chain that will be inserted * XLogInsertRecData() which actually inserts the data into the xl buffers. To me that's pretty inconsistent. Got a better suggestion? I wanted to keep the name XLogInsert() unchanged, to avoid code churn, and because it's still a good name for that. XLogRecordAssemble is pretty descriptive for what it does, although XLogRecord implies that it constructs an XLogRecord struct. It does fill in that too, but the main purpose is to build the XLogRecData chain. Perhaps XLogAssembleRecord() would be better. I'm not very happy with XLogInsertRecData myself. XLogInsertRecord? I'm also somewhat confused about the new split of the headers.
I'm not averse to splitting them, but it's getting a bit hard to understand: * xlog.h - generic mishmash * xlogdefs.h - Three typedefs and some #defines * xlog_fn.h - SQL functions * xlog_internal.h - Pretty random collection of stuff * xlogutils.h - Utilities for replaying WAL records * xlogreader.h - generic XLog reading facility * xloginsert.h - Functions for generating WAL records * xlogrecord.h - Definitions for the WAL record format. Isn't that a bit too much? Pretty much the only files that have a recognizable separation to me are xlog_fn.h and xlogreader.h. This is a matter of taste, I guess. I find xlog.h, xlogdefs.h and xlog_internal.h fairly random, while the rest have a clear function. xlogutils.h is needed by redo functions, and xloginsert.h is needed for generating WAL. Note that the second patch adds more stuff to xloginsert.h and xlogrecord.h. You've removed the "NB: this routine feels free to scribble on the XLogRecData structs, though not on the data
Re: [HACKERS] BRIN indexes - TRAP: BadArgument
+ acronymBRIN/acronym indexes can satisfy queries via the bitmap + scanning facility, and will return all tuples in all pages within The bitmap scanning facility? Does this mean a bitmap index scan? Or something novel to BRIN? I think this could be clearer. + This enables them to work as very fast sequential scan helpers to avoid + scanning blocks that are known not to contain matching tuples. Hmm, but they don't actually do anything about sequential scans per se, right? I'd say something like: Because a BRIN index is very small, scanning the index adds little overhead compared to a sequential scan, but may avoid scanning large parts of the table that are known not to contain matching tuples. + depend on the operator class selected for the data type. The operator class is selected for the index, not the data type. + The size of the block range is determined at index creation time with + the literalpages_per_range/ storage parameter. + The smaller the number, the larger the index becomes (because of the need to + store more index entries), but at the same time the summary data stored can + be more precise and more data blocks can be skipped during an index scan. I would insert a sentence something like this: The number of index entries will be equal to the size of the relation in pages divided by the selected value for pages_per_range. Therefore, the smaller the number At least, I would insert that if it's actually true. My point is that I think the effect of pages_per_range could be made more clear. + The core productnamePostgreSQL/productname distribution includes + the acronymBRIN/acronym operator classes shown in + xref linkend=gin-builtin-opclasses-table. Shouldn't that say brin, not gin? + requiring the access method implementer only to implement the semantics The naming of the reverse range map seems a little weird. It seems like most operations go through it, so it feels more like the forward direction. Maybe I'm misunderstanding.
(I doubt it's worth renaming it at this point either way, but I thought I'd mention it.) + errmsg(unlogged BRIN indexes are not supported))); Why not? Shouldn't be particularly hard, I wouldn't think. I'm pretty sure you need to create a pageinspect--1.3.sql, not just update the 1.2 file. Because that's in 9.4, and this won't be. I'm pretty excited about this feature. I think it's going to be very good for PostgreSQL. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] alter user/role CURRENT_USER
Kyotaro, Zero-length identifiers are rejected in the scanner, so RoleId cannot be a valid pointer to a zero-length string. (NULL is used as PUBLIC in auth_ident) | postgres=# create role ; | ERROR: zero-length delimited identifier at or near | postgres=# create role U\00; | ERROR: invalid Unicode escape value at or near \00 Err... what? I'm not sure what you are getting at here? Why would we ever have/want a zero-length identifier for a RoleId? Seems to me that anything requiring a RoleId should be explicit. As a dirty and quick hack we can use some integer values prefixed by a single zero byte to represent special roles such as CURRENT_USER. | RoleId_or_curruser: RoleId{ $$ = $1; } | | CURRENT_USER { $$ = \x00\x01;}; | Oid ResolveRoleId(const char *name, bool missing_ok) | { | Oid roleid; | | if (name[0] == 0 && name[1] == 1) | roleid = GetUserId(); This is ugly but needs no additional struct member or special logic. (Macros could make them look better.) Yeah, that's pretty ugly. I think Alvaro's recommendation of having the production return a node with a name or flag is the better approach. | /* This hack lets us avoid reserving PUBLIC as a keyword*/ | if (strcmp($1, public) == 0) Please let me know the reason, if any, to avoid registering a new keyword and thereby making the word unusable as a literal identifier. FWIW, I found the following in the archives: http://www.postgresql.org/message-id/15516.1038718...@sss.pgh.pa.us Now this is from 2002 and it appears it wasn't necessary to change at the time, but I haven't yet found anything else related (it's a big archive). Though, as I understand it, PUBLIC is now non-reserved as of SQL:2011, which might make a compelling argument to leave it as is? -Adam -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com
Re: [HACKERS] infinite loop in _bt_getstackbuf
Robert Haas wrote: A colleague at EnterpriseDB today ran into a situation on PostgreSQL 9.3.5 where the server went into an infinite loop while attempting a VACUUM FREEZE; it couldn't escape _bt_getstackbuf(), and it couldn't be killed with ^C. I think we should add a check for interrupts into that loop somewhere; Our design principle in this area is that all loops should have CHECK_FOR_INTERRUPTS() calls somewhere, so that even if data is horribly corrupted you can get out of it. (Trivial loops where the exit condition cannot possibly fail don't need to apply --- surely we don't need to cope with hardware that makes i+1 go back to i-1 or whatever.) Therefore I don't think you need to argue very hard in order to add more interrupt checks if you see a loop that's missing them. For example, Andres was saying to me just this morning that GetMultiXactIdMembers is lacking one such check, and I was considering pushing a patch to add it without any discussion. and possibly make some attempt to notice if we've been iterating for longer than, say, the lifetime of the universe until now. This I'm not so sure about. Adding extra logic in all nontrivial loops to detect whether they have been running for too long is likely to cause too much overhead. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Faster relevance ranking didn't make it into PostgreSQL 9.4
This is slightly off topic, but please bear with me. I came across this post: http://pauleveritt.wordpress.com/2014/10/29/faster-relevance-ranking-didnt-make-it-into-postgresql-9-4/ I was curious about it, so I checked several commitfest pages and searched the mailing lists, but I wasn't able to locate the relevant discussion. Can someone point me to it? -- Arthur Silva
Re: [HACKERS] proposal: CREATE DATABASE vs. (partial) CHECKPOINT
On Wed, Oct 29, 2014 at 2:36 PM, Tomas Vondra t...@fuzzy.cz wrote: I would tend not to worry too much about this case. I'm skeptical that there are a lot of people using large template databases. But if there are, or if some particular one of those people hits this problem, then they can raise checkpoint_segments to avoid it. The reverse problem, which you are encountering, cannot be fixed by adjusting settings. That however solves only the checkpoint, not the doubled amount of I/O due to writing both the files and WAL, no? But maybe that's OK. I mean, it's not unimaginable that it's going to hurt somebody, but the current situation is pretty bad too. You don't have to be the world's foremost PostgreSQL performance expert to know that extra checkpoints are really bad for performance. Write volume is of course also a problem, but I bet there are a lot more people using small template databases (where the write volume isn't really an issue because, as Heikki points out, the checkpoint wastes half a segment anyway, but the checkpoint may very well be an issue) than large ones (where either could be an issue). Also, all this is a concern only with 'wal_level != minimal', but ISTM 'on wal_level=minimal it's fast' is a rather poor argument. I think the argument here is more that there's no such thing as a free lunch. If someone wants to come up with a way to make this work without WAL-logging every block -AND- without triggering a checkpoint, great. If that works well, perhaps we can apply it to other cases like ALTER TABLE .. SET TABLESPACE which are currently quite painful and which seem to me to have more or less the same problem. But I don't really know why we should expect that such a solution exists at all or is easy to engineer correctly. All we're doing here is applying the same solution that's been found to be robust in other situations to some old, crufty code that isn't.
(This reminds me, yet again, that it would be really nice to have something smarter than checkpoint_segments. If there is little WAL activity between one checkpoint and the next, we should reduce the number of segments we're keeping around to free up disk space and ensure that we're recycling a file new enough that it's likely to still be in cache. Recycling files long-since evicted from cache is poor. But then we should also let the number of WAL files ratchet back up if the system again becomes busy. Isn't this more or less what Heikki's soft-WAL-limit patch did? Why did we reject that, again?) What about simply reusing the files in a different way? Instead of looping through the files in a round robin manner, couldn't we just use the most recently used file, instead of going all the way back to the first one? This won't free the disk space, but IMHO that's not a problem because no one is going to use that space anyway (as it would be a risk once all the segments will be used again). Hmm, interesting idea. I have no idea whether that would work out to a win or not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: CREATE DATABASE vs. (partial) CHECKPOINT
On 2014-10-30 14:51:54 -0400, Robert Haas wrote: On Wed, Oct 29, 2014 at 2:36 PM, Tomas Vondra t...@fuzzy.cz wrote: I would tend not to worry too much about this case. I'm skeptical that there are a lot of people using large template databases. But if there are, or if some particular one of those people hits this problem, then they can raise checkpoint_segments to avoid it. The reverse problem, which you are encountering, cannot be fixed by adjusting settings. That however solves only the checkpoint, not the double amount of I/O due to writing both the files and WAL, no? But maybe that's OK. I mean, it's not unimaginable that it's going to hurt somebody, but the current situation is pretty bad too. You don't have to be the world's foremost PostgreSQL performance expert to know that extra checkpoints are really bad for performance. Write volume is of course also a problem, but I bet there are a lot more people using small template databases (where the write volume isn't really an issue, because as Heikki points out the checkpoint wastes half a segment anyway, but the checkpoint may very well be an issue) than large ones (where either could be an issue). Agreed. The current behaviour is a pretty ugly hack that just failed to fail recently. I actually think we should *always* use the new code and not add a separate wal_level=minimal branch. Maintaining this twice just isn't worth the effort. minimal is used *far* less these days. Greetings, Andres Freund
Re: [HACKERS] Faster relevance ranking didn't make it into PostgreSQL 9.4
Arthur Silva wrote: This is slightly off topic but please bear with me. I came across this post: http://pauleveritt.wordpress.com/2014/10/29/faster-relevance-ranking-didnt-make-it-into-postgresql-9-4/ I was curious about it so I checked several commit fest pages and searched the mailing lists but I wasn't able to locate the relevant discussion. Can someone point me to it? Maybe this one? http://www.postgresql.org/message-id/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=gon-...@mail.gmail.com -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] alter user/role CURRENT_USER
* Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote: | RoleId_or_curruser: RoleId{ $$ = $1; } | | CURRENT_USER { $$ = \x00\x01;}; [...] This is ugly but needs no additional struct member or special logic. (Macros could make them look better.) Yeah, that's pretty ugly. I think Alvaro's recommendation of having the production return a node with a name or flag is the better approach. That's more than just 'ugly', in my view. I don't think there's any reason to avoid making this into a node with a field that can be set to indicate it's something special if we're going to support this. The other idea which comes to mind is- could we try to actually resolve what the 'right' answer is here, instead of setting a special value and then having to detect and fix it later? Perhaps have an Oid+Rolename structure and then fill in the Oid w/ GetUserId(), if we're called with CURRENT_USER, and otherwise just populate the Rolename field and have code later which fills in the Oid if it's InvalidOid. Please let me know the reason, if any, to avoid registering a new keyword, making the word unusable as a literal identifier. We really don't want to introduce new keywords without very good reason, and adding to the list of can't be used even if quoted is all but completely forbidden. Thanks, Stephen
Re: [HACKERS] proposal: CREATE DATABASE vs. (partial) CHECKPOINT
On 10/30/2014 08:51 PM, Robert Haas wrote: On Wed, Oct 29, 2014 at 2:36 PM, Tomas Vondra t...@fuzzy.cz wrote: I would tend not to worry too much about this case. I'm skeptical that there are a lot of people using large template databases. But if there are, or if some particular one of those people hits this problem, then they can raise checkpoint_segments to avoid it. The reverse problem, which you are encountering, cannot be fixed by adjusting settings. That however solves only the checkpoint, not the double amount of I/O due to writing both the files and WAL, no? But maybe that's OK. I mean, it's not unimaginable that it's going to hurt somebody, but the current situation is pretty bad too. You don't have to be the world's foremost PostgreSQL performance expert to know that extra checkpoints are really bad for performance. Write volume is of course also a problem, but I bet there are a lot more people using small template databases (where the write volume isn't really an issue, because as Heikki points out the checkpoint wastes half a segment anyway, but the checkpoint may very well be an issue) than large ones (where either could be an issue). Nitpick: I didn't say that a checkpoint wastes half a segment. An xlog switch does, but a checkpoint doesn't automatically cause an xlog switch. But I agree with the sentiment in general. - Heikki
Re: [HACKERS] proposal: CREATE DATABASE vs. (partial) CHECKPOINT
On 10/30/2014 08:56 PM, Andres Freund wrote: I actually think we should *always* use the new code and not add a separate wal_level=minimal branch. Maintaining this twice just isn't worth the effort. minimal is used *far* less these days. I wouldn't go that far. Doing the wal_level=minimal optimization should be pretty straightforward. Note that it would be implemented more like CREATE INDEX et al with wal_level=minimal, not the way CREATE DATABASE currently works. It would not involve any extra checkpoints. - Heikki
Re: [HACKERS] WAL format and API changes (9.5)
Hi, On 2014-09-15 15:41:22 +0300, Heikki Linnakangas wrote: The second patch contains the interesting changes. Heikki's pushed the newest version of this to the git tree. Some things I noticed while reading the patch: * potential mismerge: +++ b/src/bin/pg_basebackup/pg_receivexlog.c @@ -408,7 +408,7 @@ main(int argc, char **argv) } } - while ((c = getopt_long(argc, argv, "D:d:h:p:U:s:S:nF:wWv", + while ((c = getopt_long(argc, argv, "D:d:h:p:U:s:nF:wWv", long_options, &option_index)) != -1) { switch (c) * Can't you just have REGBUF_WILL_INIT include REGBUF_NO_IMAGE? Then you don't need to test both. * Is it really a good idea to separate XLogRegisterBufData() from XLogRegisterBuffer() the way you've done it? If we ever actually get a record with a large number of blocks touched this essentially is O(touched_buffers*data_entries). * At least in Assert mode XLogRegisterBuffer() should ensure we're not registering the same buffer twice. It's a pretty easy mistake to make for some kinds of actions (think UPDATE), and the consequences will be far from obvious afaics. * There's lots of functions like XLogRecHasBlockRef() that dig through the wal record. A common pattern is something like: if (XLogRecHasBlockRef(record, 1)) XLogRecGetBlockTag(record, 1, NULL, NULL, oldblk); else oldblk = newblk; I think doing that repeatedly is quite a bad idea. We should parse the record once and then use it in a sensible format. Not do it in pieces, over and over again. It's not like we ignore backup blocks - so doing this lazily doesn't make sense to me. Especially as ValidXLogRecord() *already* has parsed the whole damn thing once. * Maybe XLogRegisterData() and XLogRegisterBufData() should accept void* instead of char*? Right now there's lots of pointless casts in callers needed that don't add any safety. The one additional line that's then needed in these functions seems well worth it. * There's a couple record types (e.g. 
XLOG_SMGR_TRUNCATE) that only refer to the relation, but not to the block number. These still log their rnode manually. Shouldn't we somehow deal with those in a similar way explicit block references are dealt with? * You've added an assert in RestoreBackupBlockContents() that ensures the page isn't new. That wasn't there before, instead there was a special case for it. I can't immediately think how that previously could happen, but I do wonder why it was there. * The following comment in +RestoreBackupBlock probably is two lines too early + /* Found it, apply the update */ + if (!(bkpb->fork_flags & BKPBLOCK_HAS_IMAGE)) + return InvalidBuffer; This new InvalidBuffer case probably could use a comment in either XLogReadBufferForRedoExtended or here. * Most of the commentary above RestoreBackupBlock() probably doesn't belong there anymore. * Maybe XLogRecordBlockData.fork_flags should be renamed? Maybe just into flags_fork? Right now it makes at least me think that it's fork specific flags, which really isn't the actual meaning. * I wonder if it wouldn't be worthwhile, for the benefit of the FPI compression patch, to keep the bkp block data after *all* the headers. That'd make it easier to just compress the data. * +InitXLogRecordConstruct() seems like a leftover of the xlogconstruct.c naming. I'd just go for InitXLogInsert() * Hm. At least WriteMZeroPageXlogRec (and probably the same for all the other slru stuff) doesn't include a reference to the page. Isn't that bad? Shouldn't we make XLogRegisterBlock() usable for that case? Otherwise I fail to see how pg_rewind like tools can sanely deal with this? * Why did you duplicate the identical cases in btree_desc()? I guess due to rebasing ontop the xlogdump stats series? * I haven't actually looked at the xlogdump output so I might be completely wrong, but won't the output of e.g. updates be harder to read now because it's not clear which of the references old/new buffers are? Not that it matters that much. 
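For readers following along, a caller of the registration API under review looks roughly like this (a sketch assembled from the function names discussed in this thread, using heap insert as the illustration; it is not a runnable standalone example, since these are PostgreSQL-internal functions):

```c
XLogRecPtr  recptr;

XLogBeginInsert();

/* main record data, not tied to any particular block */
XLogRegisterData((char *) &xlrec, SizeOfHeapInsert);

/* register block reference 0, then attach per-block data to it */
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
XLogRegisterBufData(0, (char *) heaptup->t_data, heaptup->t_len);

recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INSERT);
PageSetLSN(page, recptr);
```

The `(char *)` casts here are exactly the noise Andres suggests eliminating by having XLogRegisterData() and XLogRegisterBufData() accept `void *`.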
* s/recdataend/recdata_end/? The former is somewhat hard to parse for me. * I think heap_xlog_update is buggy for wal_level=logical because it computes the length of the tuple using tuplen = recdataend - recdata; But the old primary key/old tuple value might be stored there as well. Afaics that code has to continue to use xl_heap_header_len. * It looks to me like insert wal logging could just use REGBUF_KEEP_DATA to get rid of: + /* +* The new tuple is normally stored as buffer 0's data. But if +* XLOG_HEAP_CONTAINS_NEW_TUPLE flag is set, it's part of the main +* data, after the xl_heap_insert struct. +*/ + if (xlrec->flags & XLOG_HEAP_CONTAINS_NEW_TUPLE) + { + data = XLogRecGetData(record) + SizeOfHeapInsert; + datalen = record->xl_len - SizeOfHeapInsert; + } + else + data
Re: [HACKERS] infinite loop in _bt_getstackbuf
Alvaro Herrera alvhe...@2ndquadrant.com writes: Robert Haas wrote: A colleague at EnterpriseDB today ran into a situation on PostgreSQL 9.3.5 where the server went into an infinite loop while attempting a VACUUM FREEZE; it couldn't escape _bt_getstackbuf(), and it couldn't be killed with ^C. I think we should add a check for interrupts into that loop somewhere; Our design principle in this area is that all loops should have CHECK_FOR_INTERRUPTS() calls somewhere, so that even if data is horribly corrupted you can get out of it. FWIW, I concur with Alvaro that adding a CHECK_FOR_INTERRUPTS() needn't require much discussion. Given the lack of prior complaints about this loop, I'm not sure I see the need to work harder than that; corruption of this sort must be quite rare. regards, tom lane
Re: [HACKERS] alter user/role CURRENT_USER
Stephen Frost sfr...@snowman.net writes: The other idea which comes to mind is- could we try to actually resolve what the 'right' answer is here, instead of setting a special value and then having to detect and fix it later? No, absolutely not. Read the NOTES at the head of gram.y. Or if you need it spelled out: when we run the bison parser, we may not be inside a transaction at all, and even if we are, we aren't necessarily going to be seeing the same current user that will apply when the parsetree is ultimately executed. (Consider a query inside a plpgsql function, which might be called under multiple userids over the course of a session.) I think Alvaro's suggestion is a perfectly appropriate solution. regards, tom lane
Re: [HACKERS] alter user/role CURRENT_USER
Adam Brightwell adam.brightw...@crunchydatasolutions.com writes: FWIW, I found the following in the archives: http://www.postgresql.org/message-id/15516.1038718...@sss.pgh.pa.us Now this is from 2002 and it appears it wasn't necessary to change at the time, but I haven't yet found anything else related (it's a big archive). Though, as I understand it, PUBLIC is now non-reserved as of SQL:2011 which might make a compelling argument to leave it as is? The current spec does list PUBLIC as a non-reserved keyword, but it also says (5.4 Names and identifiers syntax rules) 20) No authorization identifier shall specify PUBLIC. which, oddly enough, seems to license us to handle PUBLIC the way we are doing. OTOH, it lists CURRENT_USER as a reserved word, suggesting that they don't think the same type of hack should be used for that. I'd be inclined to leave the grammar as such alone (ie CURRENT_USER is a keyword, PUBLIC isn't). Changing that has more downside than upside, and we do have justification in the spec for treating the two cases differently. However, I agree that we should fix the subsequent processing so that current_user is not confused with CURRENT_USER. regards, tom lane
Re: [HACKERS] TAP test breakage on MacOS X
I wrote: Yup, you read that right, it took 32 seconds to run those dozen utterly trivial tests. As far as I could tell by eyeball, pretty much all of the time went into test 11, which is odd since it seems not significantly different from the others. I think there's something wacky about IPC::Run on this platform. Oh, never mind: after digging into the test source I eventually realized that there's an initdb happening between tests 10 and 11, and that's what's eating the time. It might be a thought to print something to indicate that a time-consuming step is happening; but the lack of any feedback here isn't exactly IPC::Run's fault. Anyway, I can confirm Peter's statement that the current tests work even on quite old platforms, as long as you install IPC::Run. regards, tom lane
[HACKERS] SET TABLESPACE ... NOWAIT
Hi, while preparing an overview of features new in 9.4, I noticed that while we provide NOWAIT for the ALTER ... ALL IN TABLESPACE commands, we don't support that for the 'single object' case. Is that on purpose? I assume it makes sense, as with a single object you can't get stuck half-way through, but I don't see it mentioned in any of the discussions (both for the original patch in January and the summer reworks). Also, the current phrasing If the NOWAIT option is specified then the command will fail if it is unable to acquire all of the locks required immediately. seems a bit ambiguous to me. Maybe it's just me, but I wasn't sure if that means locks for all objects immediately, before actually starting to move them to the new tablespace, and I had to check the code to make sure I understood it correctly. But maybe that's just me and it's unambiguous to everyone else. Well, it would be really strange if it behaved differently ... Tomas
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On 10/30/14, 3:19 AM, Michael Paquier wrote: On Wed, Oct 29, 2014 at 7:59 AM, Jim Nasby jim.na...@bluetreble.com wrote: Patch applies against current HEAD and builds, but I'm getting 37 failed tests (mostly parallel, but also misc and WITH; results attached). Is that expected? This is caused by the recent commit 7b1c2a0 (that I actually participated in reviewing :p) because of a missing inclusion of ruleutils.h in index.c. Sorry, forgot to mention patch now passes make check cleanly. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On 10/30/14, 3:19 AM, Michael Paquier wrote: Thanks for your input, Jim! On Wed, Oct 29, 2014 at 7:59 AM, Jim Nasby jim.na...@bluetreble.com wrote: Patch applies against current HEAD and builds, but I'm getting 37 failed tests (mostly parallel, but also misc and WITH; results attached). Is that expected? This is caused by the recent commit 7b1c2a0 (that I actually participated in reviewing :p) because of a missing inclusion of ruleutils.h in index.c. The mark the concurrent index bit is rather confusing; it sounds like it's referring to the new index instead of the old. Now that I've read the code I understand what's going on here between the concurrent index *entry* and the filenode swap, but I don't think the docs make this sufficiently clear to users. How about something like this: The following steps occur in a concurrent index build, each in a separate transaction. Note that if there are multiple indexes to be rebuilt then each step loops through all the indexes we're rebuilding, using a separate transaction for each one. 1. [blah] Definitely a good idea! I took your text and made it more precise, listing the actions done for each step, the pg_index flags switched, using orderedlist to make the list of steps described in a separate paragraph more exhaustive for the user. At the same time I reworked the docs removing a part that was somewhat duplicated about dealing with the constraints having invalid index entries and how to drop them. Awesome! Just a few items here: + Then a second pass is performed to add tuples that were added while + the first pass build was running. One the validation of the index s/One the/Once the/ + * index_concurrent_create + * + * Create an index based on the given one that will be used for concurrent + * operations. The index is inserted into catalogs and needs to be built later + * on. This is called during concurrent index processing. 
The heap relation + * on which is based the index needs to be closed by the caller. Last bit presumably should be on which the index is based. What about Create a concurrent index based on the definition of the one provided by caller? That's good too, but my comment was on the last sentence, not the first. + /* Build the list of column names, necessary for index_create */ Instead of all this work wouldn't it be easier to create a version of index_create/ConstructTupleDescriptor that will use the IndexInfo for the old index? ISTM index_concurrent_create() is doing a heck of a lot of work to marshal data into one form just to have it get marshaled yet again. Worst case, if we do have to play this game, there should be a stand-alone function to get the columns/expressions for an existing index; you're duplicating a lot of code from pg_get_indexdef_worker(). Yes, this definitely sucks and the approach creating a function to get all the column names is not productive as well. Then let's define an additional argument in index_create to pass a potential TupleDesc instead of this whole wart. I noticed as well that we need to actually reset attcacheoff to be able to use a fresh version of the tuple descriptor of the old index. I added a small API for this purpose in tupdesc.h called ResetTupleDescCache. Would it make sense instead to extend CreateTupleDescCopyConstr or CreateTupleDescCopy with a boolean flag? Perhaps there'd be other places that would want to reset the stats, so I lean slightly in that direction. The comment at the beginning of index_create needs to be modified to mention tupdesc and especially that setting tupdesc overrides indexColNames. index_concurrent_swap(): Perhaps it'd be better to create index_concurrent_swap_setup() and index_concurrent_swap_cleanup() and refactor the duplicated code out... 
the actual function would then become: This sentence is not finished :) IMO, index_concurrent_swap looks good as is, taking as arguments the index and its concurrent entry, and swapping their relfilenode after taking AccessExclusiveLock that will be held until the end of this transaction. Fair enough. ReindexRelationConcurrently() + * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be + * either an index or a table. If a table is specified, each step of REINDEX + * CONCURRENTLY is done in parallel with all the table's indexes as well as + * its dependent toast indexes. This comment is a bit misleading; we're not actually doing anything in parallel, right? AFAICT index_concurrent_build is going to block while each index is built the first time. Yes, parallel may be misleading. What is meant here is that each step of the process is done one by one on all the valid indexes a table may have. New comment looks good. +* relkind is an index, this index itself will be rebuilt. The locks taken +* parent relations
Re: [HACKERS] TAP test breakage on MacOS X
On 10/30/2014 05:23 PM, Tom Lane wrote: I wrote: Yup, you read that right, it took 32 seconds to run those dozen utterly trivial tests. As far as I could tell by eyeball, pretty much all of the time went into test 11, which is odd since it seems not significantly different from the others. I think there's something wacky about IPC::Run on this platform. Oh, never mind: after digging into the test source I eventually realized that there's an initdb happening between tests 10 and 11, and that's what's eating the time. It might be a thought to print something to indicate that a time-consuming step is happening; but the lack of any feedback here isn't exactly IPC::Run's fault. Anyway, I can confirm Peter's statement that the current tests work even on quite old platforms, as long as you install IPC::Run. So, I'm a bit confused. Is the --enable-tap-tests config setting still on the table? cheers andrew
Re: [HACKERS] TAP test breakage on MacOS X
Andrew Dunstan and...@dunslane.net writes: On 10/30/2014 05:23 PM, Tom Lane wrote: Anyway, I can confirm Peter's statement that the current tests work even on quite old platforms, as long as you install IPC::Run. So, I'm a bit confused. Is the --enable-tap-tests config setting still on the table? I think it should be. You should not have to have either prove or IPC::Run (or, IIRC, even Perl) in order to do make check-world. regards, tom lane
Re: [HACKERS] Review of Refactoring code for sync node detection
On 10/30/14, 8:05 AM, Michael Paquier wrote: This switches from using a single if() with multiple conditions &&'d together to a bunch of if() continue's. I don't know if those will perform the same, and AFAIK this is pretty performance critical. Well, we could still use the old notation with a single if(). That's not too complicated to change. I actually prefer the multiple if's; it reads a LOT cleaner. I don't know what the compiler will do with it though. If we stick with this version I'd argue it makes more sense to just stick the sync_node = and priority = statements into the if block and ditch the continue. /nitpick -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
On Wed, Oct 29, 2014 at 4:58 PM, Andres Freund and...@2ndquadrant.com wrote: That's true. I don't know what to do about it. I'm somewhat inclined to think that, if this remains in contrib, it's OK to ignore those issues until such time as people complain about them, because anybody who dislikes the things that can be done with this extension doesn't have to install it. Also, the people complaining might have useful ideas about what a good fix would look like, which I currently don't. There's some push to move this into core, which I think is overkill, but if we do it then we'd better have a good solution to this problem. At the very least it needs to be clearly documented. Another solution would be to simply not give out PUBLIC rights, and restrict it to the owner/superusers unless somebody makes explicit grants. I favor combining those two. I don't think it's appropriate to put superuser() checks in the code, if that's what you are proposing. Forcing this to be super-user only is hitting a mouse with a battleship. If you're saying we should put REVOKE commands into the install script as we do for some other modules, like dblink, that makes sense to me. We could try to make connection limits apply to pg_background, and we could also check CONNECT permission when starting a background worker. Both of those things feel slightly odd because there's no actual server connection. There *might* be a connection to the user backend that started it, but it's sort of a virtual connection through shared memory, and the background process continues running unimpeded if it goes away, so there might be no actual connection at all. I think that'd not be bad. Looks like those checks happen in InitializeSessionUserId(), which is called from InitPostgres(), which is called from BackgroundWorkerInitializeConnection(). That makes me think we're already applying these checks. 
rhaas=# select * from pg_background_result(pg_background_launch('vacuum pg_enum')) as (x text); x VACUUM (1 row) rhaas=# alter role rhaas nologin; ALTER ROLE rhaas=# select * from pg_background_result(pg_background_launch('vacuum pg_enum')) as (x text); ERROR: role rhaas is not permitted to log in CONTEXT: background worker, pid 64311 rhaas=# alter role rhaas login; ALTER ROLE rhaas=# select * from pg_background_result(pg_background_launch('vacuum pg_enum')) as (x text); x VACUUM (1 row) Hm. I'm unconvinced. It looks almost trivial to fall back to the text based protocol. It's hard to argue with I'm unconvinced. What specifically about that argument do you think isn't valid? While I am sure the problem can be worked around, it isn't trivial. Right now, execute_sql_string() just requests binary format unconditionally. To do what you're talking about, we'd need to iterate through all of the types and figure out which ones have typsend/typreceive functions. If there's a convenience function that will do that for us, I don't see it, probably because I believe there are exactly zero situations where we do that kind of inference today. Then, the user backend has to save the format codes from the RowDescription message and decide whether to use text or binary. That just seems like a silly waste of code and cycles. I think this actually matters, too, because the question is what we're going to do with full-blown parallelism. Best would be to actually shuttle entire raw tuples between backends; second best, binary format; third best, text format or mixture of text and binary. I'm not sure what it's reasonable to try to get away with there, but I certainly think minimizing IPC costs is going to be an important goal. I didn't try shipping raw tuples here because we don't lock types while they're in use, and I'm worried that Bad Things Could Happen. But I'm sure somebody's going to care about the overhead of converting back and forth at some point. 
I don't see how that follows. The error context logic is there to make it clearer where an error originated from. It'll be really confusing if there are ERRORs jumping out of a block of code that has set an error context, without emitting that context. I don't think I was proposing that, but I think I may have gotten a little off-track here. See what you think of the attached, which seems to work. It does mean that if a security definer function starts a worker, and returns without detaching it or cleaning it up, the unprivileged user could then read the data back from that worker. That's more insidious than it may at first appear, because the security definer function could have been intending to read back the data before returning, and then a transaction abort happens. We could add a guard requiring that the data be read by the same effective user ID that started the worker, although that might also foreclose useful things people would otherwise be able to do with this. I think such a restriction would be a good idea
Re: [HACKERS] TAP test breakage on MacOS X
On 10/30/14, 4:55 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 10/30/2014 05:23 PM, Tom Lane wrote: Anyway, I can confirm Peter's statement that the current tests work even on quite old platforms, as long as you install IPC::Run. So, I'm a bit confused. Is the --enable-tap-tests config setting still on the table? I think it should be. You should not have to have either prove or IPC::Run (or, IIRC, even Perl) in order to do make check-world. Could configure detect if we have IPC::Run? ISTM it'd be nice to make this automatic instead of requiring it to be specifically enabled. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] SET TABLESPACE ... NOWAIT
Tomas Vondra wrote: Also, the current phrasing "If the NOWAIT option is specified then the command will fail if it is unable to acquire all of the locks required immediately." seems a bit ambiguous to me. Maybe it's just me, but I wasn't sure if that means locks for all objects immediately, before actually starting to move them to the new tablespace, and I had to check the code to be sure I understood it correctly. But maybe that's just me and it's unambiguous to everyone else. Well, it would be really strange if it behaved differently ... If you weren't sure that it meant locks for all objects [first]... what did you think it could have meant instead? NOWAIT semantics seem fairly straightforward: at no point do I want to wait to get a lock, so unless I can get all my locks right now, give up and let me try again when I am ready. Once you have all the locks, feel free to do your thing, because the locks you hold will prevent you from having to wait for someone else during your processing (aside from resource contention, anyway). Note from the ALTER TABLE documentation section: "All tables in the current database in a tablespace can be moved by using the ALL IN TABLESPACE form, which will lock all tables to be moved first and then move each one." This immediately precedes the comment about NOWAIT, so there is no need to reason things out... Is there inconsistency between the different object types with regard to the wording in the documentation? I may nitpick the wording itself, but at least for TABLE there is no ambiguity that I can see. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/SET-TABLESPACE-NOWAIT-tp5825098p5825107.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] TAP test breakage on MacOS X
Jim Nasby jim.na...@bluetreble.com writes: On 10/30/14, 4:55 PM, Tom Lane wrote: I think it should be. You should not have to have either prove or IPC::Run (or, IIRC, even Perl) in order to do make check-world. Could configure detect if we have IPC::Run? ISTM it'd be nice to make this automatic instead of requiring it to be specifically enabled. The general philosophy we have about features enabled by configure is exactly opposite to that: we do not for example look for Perl and then build or not build plperl dependent on that. If you want plperl, you tell us so, and then we either build it or fail because we can't. You could argue that test coverage is different from features of the completed package, but I don't really buy that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: CREATE DATABASE vs. (partial) CHECKPOINT
On 10/30/14, 2:13 PM, Heikki Linnakangas wrote: On 10/30/2014 08:56 PM, Andres Freund wrote: I actually think we should *always* use the new code and not add a separate wal_level=minimal branch. Maintaining this twice just isn't worth the effort. minimal is used *far* less these days. I wouldn't go that far. Doing the wal_level=minimal optimization should be pretty straightforward. Note that it would be implemented more like CREATE INDEX et al with wal_level=minimal, not the way CREATE DATABASE currently works. It would not involve any extra checkpoints. +1 At my previous job, we used createdb -T copy_from_production new_dev_database, because that was far faster than re-loading the raw SQL dump all the time. It'd be a shame to have that need to write the copied data 2x. IIRC that database was around 20MB. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TAP test breakage on MacOS X
On 10/30/14, 5:32 PM, Tom Lane wrote: Jim Nasby jim.na...@bluetreble.com writes: On 10/30/14, 4:55 PM, Tom Lane wrote: I think it should be. You should not have to have either prove or IPC::Run (or, IIRC, even Perl) in order to do make check-world. Could configure detect if we have IPC::Run? ISTM it'd be nice to make this automatic instead of requiring it to be specifically enabled. The general philosophy we have about features enabled by configure is exactly opposite to that: we do not for example look for Perl and then build or not build plperl dependent on that. If you want plperl, you tell us so, and then we either build it or fail because we can't. You could argue that test coverage is different from features of the completed package, but I don't really buy that. If our policy is that tests are there primarily for developers then I agree with you. If not, then would we be OK with make check being a no-op unless you'd configured with --enable-make-check? Making this something you have to enable will seriously limit the number of people running the TAP tests, simply because few will know to enable them. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_background (and more parallelism infrastructure patches)
Hi, On 2014-10-30 18:03:27 -0400, Robert Haas wrote: On Wed, Oct 29, 2014 at 4:58 PM, Andres Freund and...@2ndquadrant.com wrote: That's true. I don't know what to do about it. I'm somewhat inclined to think that, if this remains in contrib, it's OK to ignore those issues until such time as people complain about them, because anybody who dislikes the things that can be done with this extension doesn't have to install it. Also, the people complaining might have useful ideas about what a good fix would look like, which I currently don't. There's some push to move this into core, which I think is overkill, but if we do it then we'd better have a good solution to this problem. At the very least it needs to be clearly documented. Another solution would be to simply not give out PUBLIC rights, and restrict it to the owner/superusers unless somebody makes explicit grants. I favor combining those two. I don't think it's appropriate to put superuser() checks in the code, if that's what you are proposing. That's not at all what I'm thinking of... The superuser reference was only that explicit EXECUTE rights aren't required for superusers. Forcing this to be super-user only is hitting a mouse with a battleship. If you're saying we should put REVOKE commands into the install script as we do for some other modules, like dblink, that makes sense to me. But instead this. We could try to make connection limits apply to pg_background, and we could also check CONNECT permission when starting a background worker. Both of those things feel slightly odd because there's no actual server connection. There *might* be a connection to the user backend that started it, but it's sort of a virtual connection through shared memory, and the background process continues running unimpeded if it goes away, so there might be no actual connection at all. I think that'd not be bad. 
Looks like those checks happen in InitializeSessionUserId(), which is called from InitPostgres(), which is called from BackgroundWorkerInitializeConnection(). That makes me think we're already applying these checks. Oh, neat. Hm. I'm unconvinced. It looks almost trivial to fall back to the text-based protocol. It's hard to argue with "I'm unconvinced." What specifically about that argument do you think isn't valid? We don't normally just say "this is unsupported" for things that are perfectly valid. And types without binary send/recv functions imo are perfectly valid. You've made that kind of argument yourself more than once, no? It's really not an academic thing. Popular extensions like postgis don't provide send/recv for all their types. Same with a couple of our own contrib types (intarray, chkpass, seg, cube). I guess we need input from others on this point. While I am sure the problem can be worked around, it isn't trivial. Right now, execute_sql_string() just requests binary format unconditionally. To do what you're talking about, we'd need to iterate through all of the types and figure out which ones have typsend/typreceive functions. If there's a convenience function that will do that for us, I don't see it, probably because I believe there are exactly zero situations where we do that kind of inference today. Then, the user backend has to save the format codes from the RowDescription message and decide whether to use text or binary. That just seems like a silly waste of code and cycles. The current client/server protocol definitely can do it. You can specify per column whether you want binary/textual data. I'm pretty sure that at least the JDBC driver does that. It's also what I've decided to do for BDR's output plugin. Some builtin types will be transferred 'as is' if the architecture/version matches. Otherwise, if available and the version matches, send/recv will be used (excluding some corner case, but ..). 
Only if neither is the case does it fall back to the textual protocol. I think this actually matters, too, because the question is what we're going to do with full-blown parallelism. I think for parallelism it'll most definitely not be acceptable to not support types without send/recv. So building some convenience infrastructure here imo is going to pay off. It's not like it's that hard to detect - just check OidIsValid(pg_type->typsend). The pg_type tuple has to be looked up anyway to find the send/recv functions... Best would be to actually shuttle entire raw tuples between backends; second best, binary format; third best, text format or a mixture of text and binary. I'm not sure what it's reasonable to try to get away with there, but I certainly think minimizing IPC costs is going to be an important goal. I didn't try to get away with shipping raw tuples here because we don't lock types while they're in use, and I'm worried that Bad Things Could Happen. Yea, because of type changes and such I went with using the 'as-is' format only for builtin types
Re: [HACKERS] TAP test breakage on MacOS X
Jim Nasby jim.na...@bluetreble.com writes: If our policy is that tests are there primarily for developers then I agree with you. If not, then would we be OK with make check being a no-op unless you'd configured with --enable-make-check? Making this something you have to enable will seriously limit the number of people running the TAP tests, simply because few will know to enable them. Well, TBH I have no problem with that at the moment, because as Robert has pointed out the current TAP tests are of close to zero value. The odds that they'll find anything in the hands of Joe Random User are far lower than the odds that they'll break Joe's build. At some point down the road that value judgement might (hopefully will) reverse, and then we could deal with it by making --enable-tap-tests the default. But even then there would be a place for --disable-tap-tests. The current situation, where the only way to disable the TAP tests is to not do make check-world, is utterly unacceptable given their low present usefulness and lack of proven portability. I opined before that we should rip those tests out of the 9.4 branch altogether. I'm willing to leave them there if we have an --enable switch controlling them, though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: CREATE DATABASE vs. (partial) CHECKPOINT
On 2014-10-30 18:06:02 -0500, Jim Nasby wrote: On 10/30/14, 2:13 PM, Heikki Linnakangas wrote: On 10/30/2014 08:56 PM, Andres Freund wrote: I actually think we should *always* use the new code and not add a separate wal_level=minimal branch. Maintaining this twice just isn't worth the effort. minimal is used *far* less these days. I wouldn't go that far. Doing the wal_level=minimal optimization should be pretty straightforward. Note that it would be implemented more like CREATE INDEX et al with wal_level=minimal, not the way CREATE DATABASE currently works. It would not involve any extra checkpoints. It's probably not that hard. I agree. Imo it's up to the person doing this conversion. We imo shouldn't require that person to develop both versions, but if they're interested in doing it: fine with me. At my previous job, we used createdb -T copy_from_production new_dev_database, because that was far faster than re-loading the raw SQL dump all the time. It'd be a shame to have that need to write the copied data 2x. IIRC that database was around 20MB. At that size not doing two immediate checkpoints will still be an order of magnitude or so bigger win. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TAP test breakage on MacOS X
On 2014-10-30 19:30:04 -0400, Tom Lane wrote: Jim Nasby jim.na...@bluetreble.com writes: If our policy is that tests are there primarily for developers then I agree with you. If not, then would we be OK with make check being a no-op unless you'd configured with --enable-make-check? Making this something you have to enable will seriously limit the number of people running the TAP tests, simply because few will know to enable them. Well, TBH I have no problem with that at the moment, because as Robert has pointed out the current TAP tests are of close to zero value. The odds that they'll find anything in the hands of Joe Random User are far lower than the odds that they'll break Joe's build. I already have a couple more ready once this has stabilized... I personally don't agree that they have no value at this point. At least the pg_basebackup tests test paths that are not executed *at all* otherwise and which are not trivial. To my knowledge it's the only thing in our tests that exercises walsender and wal_level != minimal. At some point down the road that value judgement might (hopefully will) reverse, and then we could deal with it by making --enable-tap-tests the default. But even then there would be a place for --disable-tap-tests. Which would be what exactly? The current situation, where the only way to disable the TAP tests is to not do make check-world, is utterly unacceptable given their low present usefulness and lack of proven portability. Agreed. I opined before that we should rip those tests out of the 9.4 branch altogether. I'm willing to leave them there if we have an --enable switch controlling them, though. Hm. I'm not convinced that that's the best way, but I'm not going fight hard. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TAP test breakage on MacOS X
Andres Freund and...@2ndquadrant.com writes: On 2014-10-30 19:30:04 -0400, Tom Lane wrote: At some point down the road that value judgement might (hopefully will) reverse, and then we could deal with it by making --enable-tap-tests the default. But even then there would be a place for --disable-tap-tests. Which would be what exactly? Well, for example, you don't have and don't want to install IPC::Run. Or you've tripped over some failure in the TAP tests (which might or might not be an actual bug) but would like to complete your build anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SET TABLESPACE ... NOWAIT
On 30.10.2014 23:19, David G Johnston wrote: Tomas Vondra wrote: Also, the current phrasing "If the NOWAIT option is specified then the command will fail if it is unable to acquire all of the locks required immediately." seems a bit ambiguous to me. Maybe it's just me, but I wasn't sure if that means locks for all objects immediately, before actually starting to move them to the new tablespace, and I had to check the code to be sure I understood it correctly. But maybe that's just me and it's unambiguous to everyone else. Well, it would be really strange if it behaved differently ... If you weren't sure that it meant locks for all objects [first]... what did you think it could have meant instead? NOWAIT semantics seem fairly straightforward: at no point do I want to wait to get a lock, so unless I can get all my locks right now, give up and let me try again when I am ready. Once you have all the locks, feel free to do your thing, because the locks you hold will prevent you from having to wait for someone else during your processing (aside from resource contention, anyway). Note from the ALTER TABLE documentation section: "All tables in the current database in a tablespace can be moved by using the ALL IN TABLESPACE form, which will lock all tables to be moved first and then move each one." This immediately precedes the comment about NOWAIT, so there is no need to reason things out... EINSUFFICIENTCAFFEINE I guess ... I was already in slightly confused mode because of something else, and apparently I skipped the sentence about locking all the tables first. So never mind, let's forget this ever happened, OK? Tomas
Re: [HACKERS] TAP test breakage on MacOS X
On 2014-10-30 19:53:54 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-10-30 19:30:04 -0400, Tom Lane wrote: At some point down the road that value judgement might (hopefully will) reverse, and then we could deal with it by making --enable-tap-tests the default. But even then there would be a place for --disable-tap-tests. Which would be what exactly? Well, for example, you don't have and don't want to install IPC::Run. Well, that's what the hypothetical configure test is for. I see little reason in this specific case to do anything more complicated than check for prove and IPC::Run in configure and use them if necessary. Or you've tripped over some failure in the TAP tests (which might or might not be an actual bug) but would like to complete your build anyway. Well. That's just the same with plain regression tests. But I guess adding --disable is essentially free, so whatever. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] how to handle missing prove
On 10/28/14 10:01 PM, Peter Eisentraut wrote: On 10/28/14 9:16 PM, Tom Lane wrote: ISTM that the project policy for external components like this has been don't rely on them unless user says to use them, in which case fail if they aren't present. So perhaps what we ought to have is a configure switch along the lines of --enable-tap-tests. If you don't specify it, prove_check expands to nothing. If you do specify it, we fail if we lack any of the expected support, both prove and whatever the agreed-on set of Perl modules is. That's also a good idea. Here is a patch. From 27b303938e4472c1a28f216978c847aed5750ced Mon Sep 17 00:00:00 2001 From: Peter Eisentraut pete...@gmx.net Date: Thu, 30 Oct 2014 20:08:45 -0400 Subject: [PATCH] Add configure --enable-tap-tests option --- configure | 41 - configure.in | 17 - doc/src/sgml/installation.sgml | 10 ++ src/Makefile.global.in | 8 4 files changed, 74 insertions(+), 2 deletions(-) diff --git a/configure b/configure index 1248b06..1b29be6 100755 --- a/configure +++ b/configure @@ -729,6 +729,7 @@ CPPFLAGS LDFLAGS CFLAGS CC +enable_tap_tests enable_dtrace DTRACEFLAGS DTRACE @@ -808,6 +809,7 @@ enable_debug enable_profiling enable_coverage enable_dtrace +enable_tap_tests with_blocksize with_segsize with_wal_blocksize @@ -1477,6 +1479,7 @@ Optional Features: --enable-profiling build with profiling enabled --enable-coverage build with coverage testing instrumentation --enable-dtrace build with DTrace support + --enable-tap-tests enable TAP tests (requires Perl) --enable-depend turn on automatic dependency tracking --enable-cassertenable assertion checks (for debugging) --disable-thread-safety disable thread-safety in client libraries @@ -3466,6 +3469,34 @@ fi # +# TAP tests +# + + +# Check whether --enable-tap-tests was given. +if test ${enable_tap_tests+set} = set; then : + enableval=$enable_tap_tests; + case $enableval in +yes) + : + ;; +no) + : + ;; +*) + as_fn_error $? 
"no argument expected for --enable-tap-tests option" "$LINENO" 5 + ;; + esac + +else + enable_tap_tests=no + +fi + + + + +# # Block size # { $as_echo "$as_me:${as_lineno-$LINENO}: checking for block size" >&5 @@ -14785,7 +14816,8 @@ done # # Check for test tools # -for ac_prog in prove +if test "$enable_tap_tests" = yes; then + for ac_prog in prove do # Extract the first word of $ac_prog, so it can be a program name with args. set dummy $ac_prog; ac_word=$2 @@ -14827,6 +14859,13 @@ fi test -n "$PROVE" && break done + if test -z "$PROVE"; then +as_fn_error $? "prove not found" "$LINENO" 5 + fi + if test -z "$PERL"; then +as_fn_error $? "Perl not found" "$LINENO" 5 + fi +fi # Thread testing diff --git a/configure.in index 0a3725f..fe6f735 100644 --- a/configure.in +++ b/configure.in @@ -229,6 +229,13 @@ AC_SUBST(DTRACEFLAGS)]) AC_SUBST(enable_dtrace) # +# TAP tests +# +PGAC_ARG_BOOL(enable, tap-tests, no, + [enable TAP tests (requires Perl)]) +AC_SUBST(enable_tap_tests) + +# # Block size # AC_MSG_CHECKING([for block size]) @@ -1876,7 +1883,15 @@ AC_CHECK_PROGS(OSX, [osx sgml2xml sx]) # # Check for test tools # -AC_CHECK_PROGS(PROVE, prove) +if test "$enable_tap_tests" = yes; then + AC_CHECK_PROGS(PROVE, prove) + if test -z "$PROVE"; then +AC_MSG_ERROR([prove not found]) + fi + if test -z "$PERL"; then +AC_MSG_ERROR([Perl not found]) + fi +fi # Thread testing diff --git a/doc/src/sgml/installation.sgml index 68931d2..adde5b3 100644 --- a/doc/src/sgml/installation.sgml +++ b/doc/src/sgml/installation.sgml @@ -1271,6 +1271,16 @@ <title>Configuration</title> </listitem> </varlistentry> + <varlistentry> + <term><option>--enable-tap-tests</option></term> + <listitem> +<para> + Enable tests using the Perl TAP tools. This requires a Perl + installation and the Perl module <literal>IPC::Run</literal>. + See <xref linkend="regress-tap"> for more information. 
+</para> + </listitem> + </varlistentry> </variablelist> </para> diff --git a/src/Makefile.global.in index b04d005..63ff50b 100644 --- a/src/Makefile.global.in +++ b/src/Makefile.global.in @@ -174,6 +174,7 @@ enable_nls = @enable_nls@ enable_debug = @enable_debug@ enable_dtrace = @enable_dtrace@ enable_coverage = @enable_coverage@ +enable_tap_tests = @enable_tap_tests@ enable_thread_safety = @enable_thread_safety@ python_enable_shared = @python_enable_shared@ @@ -310,6 +311,8 @@ define ld_library_path_var $(if $(filter $(PORTNAME),darwin),DYLD_LIBRARY_PATH,$(if $(filter $(PORTNAME),aix),LIBPATH,LD_LIBRARY_PATH)) endef +ifeq ($(enable_tap_tests),yes) + define prove_installcheck cd $(srcdir) TESTDIR='$(CURDIR)'
Re: [HACKERS] TAP test breakage on MacOS X
Andres Freund and...@2ndquadrant.com writes: On 2014-10-30 19:53:54 -0400, Tom Lane wrote: Well, for example, you don't have and don't want to install IPC::Run. Well, that's what the hypothetical configure test is for. I see little reason in this specific case to do anything more complicated than check for prove and IPC::Run in configure and use them if necessary. As I said upthread, that approach seems to me to be contrary to the project policy about how configure should behave. If you have selected (or, someday, defaulted to) --enable-tap-tests, configure should *fail* if you don't have the tools to run the tests. Not silently disable tests that we have decided are valuable. How exactly would that be different from silently omitting readline support if we don't find that library? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TAP test breakage on MacOS X
On Thu, Oct 30, 2014 at 04:54:25PM +0100, Andres Freund wrote: On 2014-10-30 01:57:15 -0400, Noah Misch wrote: On Wed, Oct 29, 2014 at 08:14:07PM -0400, Peter Eisentraut wrote: On 10/28/14 9:09 PM, Peter Eisentraut wrote: I have looked into IPC::Cmd, but the documentation keeps telling me that to do anything interesting I have to have IPC::Run anyway. I'll give it a try, though. I tried this, but I'm not optimistic about it. While parts of IPC::Cmd are actually a bit nicer than IPC::Run, other parts are weird. For example, with most packages and functions in Perl that run a command, you can pass the command as a string or as a list (or array reference). The latter is preferred because it avoids issues with quoting, word splitting, spaces, etc. In IPC::Run, I can use the run function in the latter way, but I cannot use the run_forked function like that, and I need that one to get the exit code of a command. It's possible to work around that, but I'm getting the feeling that this is not very well designed. Ick; I concur with your judgment on those aspects of the IPC::Cmd design. Thanks for investigating. So, surviving options include: 1. Require IPC::Run. 2. Write our own run() that reports the raw exit code. 3. Distill the raw exit code from the IPC::Cmd::run() error string. 4. Pass IPC::Run::run_forked() a subroutine that execs an argument list. Any others worth noting? 5. Include a copy of IPC::Run in the repository till it's common enough. True. I eliminated that one because the license of IPC::Run is, like Perl itself, GPL+Artistic. Right now, we presume that the entire PostgreSQL tarball is distributable under the PostgreSQL License. The benefit of bundling IPC::Run is not strong enough to justify muddying that. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TAP test breakage on MacOS X
On 2014-10-30 20:13:53 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-10-30 19:53:54 -0400, Tom Lane wrote: Well, for example, you don't have and don't want to install IPC::Run. Well, that's what the hypothetical configure test is for. I see little reason in this specific case to do anything more complicated than check for prove and IPC::Run in configure and use them if necessary. As I said upthread, that approach seems to me to be contrary to the project policy about how configure should behave. I don't think that holds much water. There's a fair amount of things that configure detects automatically. I don't think the comparison to plperl or such is meaningful - that's a runtime/install-time difference. These tests are not. We e.g. detect compiler support for certain features that result in possible speedups and/or better warnings. We detect whether bison is available... If you have selected (or, someday, defaulted to) --enable-tap-tests, configure should *fail* if you don't have the tools to run the tests. Not silently disable tests that we have decided are valuable. How exactly would that be different from silently omitting readline support if we don't find that library? Because it doesn't result in a user-visible regression? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Review of Refactoring code for sync node detection
On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby jim.na...@bluetreble.com wrote: If we stick with this version I'd argue it makes more sense to just stick the sync_node = and priority = statements into the if block and ditch the continue. /nitpick Let's go with the cleaner version then, I'd prefer code that can be read easily for this code path. Switching back is not much complicated either. -- Michael From 00d85f046d187056a85c6d6dc82e9a79674b132d Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Thu, 30 Oct 2014 22:01:10 +0900 Subject: [PATCH] Refactor code to detect synchronous node in WAL sender array This patch is made to remove code duplication in walsender.c and syncrep.c in order to detect what is the node with the lowest strictly-positive priority, facilitating maintenance of this code. --- src/backend/replication/syncrep.c | 101 ++-- src/backend/replication/walsender.c | 36 +++-- src/include/replication/syncrep.h | 1 + 3 files changed, 82 insertions(+), 56 deletions(-) diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c index aa54bfb..848c783 100644 --- a/src/backend/replication/syncrep.c +++ b/src/backend/replication/syncrep.c @@ -5,7 +5,7 @@ * Synchronous replication is new as of PostgreSQL 9.1. * * If requested, transaction commits wait until their commit LSN is - * acknowledged by the sync standby. + * acknowledged by the synchronous standby. * * This module contains the code for waiting and release of backends. * All code in this module executes on the primary. The core streaming @@ -357,6 +357,72 @@ SyncRepInitConfig(void) } } + +/* + * Obtain position of synchronous standby in the array referencing all + * the WAL senders, or -1 if no synchronous node can be found. The caller + * of this function should take a lock on SyncRepLock. If there are + * multiple nodes with the same lowest priority value, the first node + * found is selected. 
+ * sync_priority is a preallocated array of size max_wal_senders that + * can be used to retrieve the priority of each WAL sender. Its + * inclusion in this function has the advantage to limit the scan + * of the WAL sender array to one pass, limiting the amount of cycles + * SyncRepLock is taken. + */ +int +SyncRepGetSynchronousNode(int *node_priority) +{ + int sync_node = -1; + int priority = 0; + int i; + + /* Scan WAL senders and find synchronous node if any */ + for (i = 0; i < max_wal_senders; i++) + { + /* Use volatile pointer to prevent code rearrangement */ + volatile WalSnd *walsnd = &WalSndCtl->walsnds[i]; + + /* First get the priority of this WAL sender */ + if (node_priority) + node_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ? +0 : walsnd->sync_standby_priority; + + /* Proceed to next if not active */ + if (walsnd->pid == 0) + continue; + + /* Proceed to next if not streaming */ + if (walsnd->state != WALSNDSTATE_STREAMING) + continue; + + /* Proceed to next one if asynchronous */ + if (walsnd->sync_standby_priority == 0) + continue; + + /* Proceed to next one if priority conditions not satisfied */ + if (priority != 0 && + priority <= walsnd->sync_standby_priority) + continue; + + /* Proceed to next one if flush position is invalid */ + if (XLogRecPtrIsInvalid(walsnd->flush)) + continue; + + /* + * We have a potential synchronous candidate, choose it as the + * synchronous node for the time being before going through the + * other nodes listed in the WAL sender array. + */ + sync_node = i; + + /* Update priority to current value of WAL sender */ + priority = walsnd->sync_standby_priority; + } + + return sync_node; +} + /* * Update the LSNs on each queue based upon our latest state. This * implements a simple policy of first-valid-standby-releases-waiter. 
@@ -369,10 +435,9 @@ SyncRepReleaseWaiters(void) { volatile WalSndCtlData *walsndctl = WalSndCtl; volatile WalSnd *syncWalSnd = NULL; + int sync_node; int numwrite = 0; int numflush = 0; - int priority = 0; - int i; /* * If this WALSender is serving a standby that is not on the list of @@ -388,32 +453,14 @@ SyncRepReleaseWaiters(void) /* * We're a potential sync standby. Release waiters if we are the highest * priority standby. If there are multiple standbys with same priorities - * then we use the first mentioned standby. If you change this, also - * change pg_stat_get_wal_senders(). + * then we use the first mentioned standbys. */ LWLockAcquire(SyncRepLock, LW_EXCLUSIVE); + sync_node = SyncRepGetSynchronousNode(NULL); - for (i = 0; i < max_wal_senders; i++) - { - /* use volatile pointer to prevent code rearrangement */ - volatile WalSnd *walsnd = &walsndctl->walsnds[i]; - - if (walsnd->pid != 0 && - walsnd->state == WALSNDSTATE_STREAMING && - walsnd->sync_standby_priority > 0 && - (priority == 0 || - priority > walsnd->sync_standby_priority) && - !XLogRecPtrIsInvalid(walsnd->flush)) - {
Re: [HACKERS] TAP test breakage on MacOS X
Andres Freund and...@2ndquadrant.com writes: On 2014-10-30 20:13:53 -0400, Tom Lane wrote: As I said upthread, that approach seems to me to be contrary to the project policy about how configure should behave. I don't think that holds much water. There's a fair amount of things that configure detects automatically. I don't think the comparison to plperl or such is meaningful - that's a runtime/install time difference. These tests are not. Meh. Right now, it's easy to dismiss these tests as unimportant, figuring that they play little part in whether the completed build is reliable. But that may not always be true. If they do become a significant part of our test arsenal, silently omitting them will not be cool for configure to do. We think that it's okay to autoconfigure things like which semaphore technology to use in part because we believe we can test the results. I think if someone asks for make check-world, it should be pretty deterministic what that means. Historical note: I was not originally very much on board with the strict enable-what-you-want policy for configure behavior, but I got religion after working at Red Hat for awhile. Nondeterministic package build behaviors *suck*. Here's one example: https://bugzilla.redhat.com/show_bug.cgi?id=427063 If you have selected (or, someday, defaulted to) --enable-tap-tests, configure should *fail* if you don't have the tools to run the tests. Not silently disable tests that we have decided are valuable. How exactly would that be different from silently omitting readline support if we don't find that library? Because it doesn't result in a user visible regression? Uncaught bugs become user-visible regressions. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] how to handle missing prove
Peter Eisentraut pete...@gmx.net writes: On 10/28/14 10:01 PM, Peter Eisentraut wrote: On 10/28/14 9:16 PM, Tom Lane wrote: ISTM that the project policy for external components like this has been don't rely on them unless user says to use them, in which case fail if they aren't present. So perhaps what we ought to have is a configure switch along the lines of --enable-tap-tests. If you don't specify it, prove_check expands to nothing. If you do specify it, we fail if we lack any of the expected support, both prove and whatever the agreed-on set of Perl modules is. That's also a good idea. Here is a patch. Looks generally reasonable, but I thought you were planning to choose a different option name? One minor nitpick: perhaps the --help description of the option should read + --enable-tap-tests enable TAP tests (requires Perl and IPC::Run) because in practice it'll be much more likely that people will be missing IPC::Run than that they'll be missing Perl altogether. Also, shouldn't we have it fail rather than just skipping tests if IPC::Run is missing? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TAP test breakage on MacOS X
On 2014-10-30 21:24:04 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-10-30 21:03:43 -0400, Tom Lane wrote: Meh. Right now, it's easy to dismiss these tests as unimportant, figuring that they play little part in whether the completed build is reliable. But that may not always be true. If they do become a significant part of our test arsenal, silently omitting them will not be cool for configure to do. Well, I'm all for erroring out if somebody passed --enable-foo-tests and the prerequisites aren't there. What I *am* against is requiring an explicit flag to enable them because then they'll just not be run in enough environments. And that's what's much more likely to cause unnoticed bugs. Once they're at the point where they're actually likely to catch stuff of interest, I'll be all for enabling them by default. Great. We already are at that point due to the pg_basebackup tests. If we slightly extend it to also start up the newly made base backups we will have the first minimal automated test of recovery... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE IF NOT EXISTS INDEX
On Thu, Oct 30, 2014 at 12:11 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Oct 7, 2014 at 2:42 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Mon, Oct 6, 2014 at 11:13 AM, Marti Raudsepp ma...@juffo.org wrote: On Mon, Oct 6, 2014 at 4:27 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: create_index_if_not_exists_v7.patch Looks good to me. Marking ready for committer. Thanks. The patch looks good to me except for the following minor comments. + <term><literal>IF NOT EXISTS</literal></term> It's better to place this after the paragraph on CONCURRENTLY, for consistency with the syntax. Fixed. + Do not throw an error if the index already exists. I think that this should be Do not throw an error if a relation with the same name already exists. Fixed. + Index name is required when IF NOT EXISTS is specified. IF NOT EXISTS should be enclosed in a <literal> tag. Fixed. @@ -60,7 +60,8 @@ extern Oid index_create(Relation heapRelation, bool allow_system_table_mods, bool skip_build, bool concurrent, - bool is_internal); + bool is_internal, + bool if_not_exists); You forgot to add a comment for the if_not_exists argument at the top of the index_create function. Fixed. + bool if_not_exists; /* just do nothing if index already exists */ You forgot the trailing ? in the above comment. There are similar comments in parsenodes.h, and they have such a ?. Fixed. Thanks for your review! 
Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index 43df32f..0414c26 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation refsynopsisdiv synopsis -CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ replaceable class=parametername/replaceable ] ON replaceable class=parametertable_name/replaceable [ USING replaceable class=parametermethod/replaceable ] +CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] replaceable class=parametername/replaceable ] ON replaceable class=parametertable_name/replaceable [ USING replaceable class=parametermethod/replaceable ] ( { replaceable class=parametercolumn_name/replaceable | ( replaceable class=parameterexpression/replaceable ) } [ COLLATE replaceable class=parametercollation/replaceable ] [ replaceable class=parameteropclass/replaceable ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ) [ WITH ( replaceable class=PARAMETERstorage_parameter/replaceable = replaceable class=PARAMETERvalue/replaceable [, ... ] ) ] [ TABLESPACE replaceable class=parametertablespace_name/replaceable ] @@ -97,7 +97,6 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ replaceable class=parametername/ refsect1 titleParameters/title -variablelist varlistentry termliteralUNIQUE/literal/term listitem @@ -126,6 +125,19 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ replaceable class=parametername/ /listitem /varlistentry +variablelist + varlistentry + termliteralIF NOT EXISTS/literal/term + listitem + para +Do not throw an error if a relation with the same name already exists. +A notice is issued in this case. 
Note that there is no guarantee that +the existing index is anything like the one that would have been created. +Index name is required when literalIF NOT EXISTS/literal is specified. + /para + /listitem + /varlistentry + varlistentry termreplaceable class=parametername/replaceable/term listitem diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 01ed880..c886f74 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -674,6 +674,8 @@ UpdateIndexRelation(Oid indexoid, * will be marked invalid and the caller must take additional steps * to fix it up. * is_internal: if true, post creation hook for new index + * if_not_exists: if true, issue a notice instead an error if the index with + * the same name already exists * * Returns the OID of the created index. */ @@ -697,7 +699,8 @@ index_create(Relation heapRelation, bool allow_system_table_mods, bool skip_build, bool concurrent, - bool is_internal) + bool is_internal, + bool if_not_exists) { Oid heapRelationId = RelationGetRelid(heapRelation); Relation pg_class; @@ -773,10 +776,22 @@ index_create(Relation heapRelation, elog(ERROR, shared relations must be placed in pg_global tablespace); if (get_relname_relid(indexRelationName,
Re: [HACKERS] Add CREATE support to event triggers
On Thu, Oct 30, 2014 at 2:40 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Oct 28, 2014 at 6:00 AM, Andres Freund and...@2ndquadrant.com wrote: Uhm. Obviously we didn't have jsonb when I started this and we do have them now, so I could perhaps see about updating the patch to do things this way; but I'm not totally sold on that idea, as my ObjTree stuff is a lot easier to manage and the jsonb API is pretty ugly. I looked at this as well, and I think trying to do so would not result in readable code. That doesn't speak very well of jsonb. :-( Just did the same and I played a bit with the APIs. And I am getting the impression that the jsonb API is currently focused on the fact of deparsing and parsing Jsonb strings to/from containers but there is no real interface that allows to easily manipulate the containers where the values are located. So, what I think is missing is really a friendly interface to manipulate JsonbContainers directly, and I think that we are not far from it with something like this set, roughly: - Initialization of an empty container - Set of APIs to directly push a value to a container (boolean, array, null, string, numeric or other jsonb object) - Initialization of JsonbValue objects With this basic set of APIs patch 4 could for example use JsonbToCString to then convert the JSONB bucket back to a string it sends to client. Note as well that there is already findJsonbValueFromContainer present to get back a value in a container. In short, my point is: instead of re-creating the wheel like what this series of patch is trying to do with ObjTree, I think that it would be more fruitful to have a more solid in-place JSONB infrastructure that allows to directly manipulate JSONB objects. This feature as well as future extensions could benefit from that. Feel free to comment. Regards, -- Michael
Re: [HACKERS] TAP test breakage on MacOS X
On 10/30/2014 09:37 PM, Andres Freund wrote: On 2014-10-30 21:24:04 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-10-30 21:03:43 -0400, Tom Lane wrote: Meh. Right now, it's easy to dismiss these tests as unimportant, figuring that they play little part in whether the completed build is reliable. But that may not always be true. If they do become a significant part of our test arsenal, silently omitting them will not be cool for configure to do. Well, I'm all for erroring out if somebody passed --enable-foo-tests and the prerequisites aren't there. What I *am* against is requiring an explicit flag to enable them because then they'll just not be run in enough environments. And that's what's much more likely to cause unnoticed bugs. Once they're at the point where they're actually likely to catch stuff of interest, I'll be all for enabling them by default. Great. We already are at that point due to the pg_basebackup tests. If we slightly extend it to also start up the newly made base backups we will have the first minimal automated test of recovery... There are other issues. I am not going to enable this in the buildfarm until the check test can work from a single install. It's insane for the bin tests to take an order of magnitude longer than the main regression suite. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] snapshot too large error when initializing logical replication (9.4)
On 10/28/2014 01:27 PM, Andres Freund wrote: Hi, On 2014-10-25 18:09:36 -0400, Steve Singer wrote: I sometimes get the error snapshot too large from my logical replication walsender process in response to a CREATE_REPLICATION_SLOT. Yes. That's possible if 'too much' was going on until a consistent point was reached. I think we can just use a much larger size for the array if necessary. I've attached a patch for this. Could you try whether that helps? I don't have a testcase handy that reproduces the problem. This patch seems to fix things. I've done numerous runs of the tests I was doing before with your patch applied and don't seem to be having this issue anymore. This is in SnapBuildExportSnapshot in snapbuild.c: newxcnt is 212 at that point. I have max_connections = 200 and procArray->maxProcs = 212. Should we be testing newxcnt > GetMaxSnapshotXidCount() instead of newxcnt >= GetMaxSnapshotXidCount()? It actually looks correct to me now - newxcnt is used as an offset into an array of size GetMaxSnapshotXidCount(). Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TAP test breakage on MacOS X
Andrew Dunstan and...@dunslane.net writes: There are other issues. I am not going to enable this in the buildfarm until the check test can work from a single install. It's insane for the bin tests to take an order of magnitude longer than the main regression suite. I think the installs as such aren't the only reason for the sucky performance. We need to also reduce the number of initdb cycles incurred by the TAP tests. It's useless for example that the pg_controldata test creates its very own $PGDATA rather than sharing one with other tests. This line of thought implies that the tests will become less independent of each other, which will probably result in them being a bit harder to maintain. Still, we are paying an awful lot of cycles for not much, as things stand at the moment. A couple other random ideas for shaving cycles: * Use initdb -N (no fsync) where we do need to initdb. * We probably don't need a full install tree for these types of tests; it's tempting for instance to omit installing the include/ tree. That wouldn't save a large number of megabytes but it is a sizable number of files, so it might cut the install/rm -rf time noticeably. * In the same line, suppressing install of the timezone database file tree would possibly save a useful number of cycles. We do need to have that data for functionality, but buildfarm owners could be encouraged to use --with-system-tzdata to shave install cycles. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CINE in CREATE TABLE AS ... and CREATE MATERIALIZED VIEW ...
On Mon, Oct 27, 2014 at 4:15 AM, Rushabh Lathia rushabh.lat...@gmail.com wrote: Hi All, - Patch got applied cleanly. - Regression make check runs fine. - Patch covers the documentation changes. Here are a few comments: 1) What's the need for the following change: diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index bcec173..9fe6855 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -1005,12 +1005,6 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval) lock->tail = proc; lock->head = proc; - /* - * Set releaseOK, to make sure we get woken up as soon as the lock is - * released. - */ - lock->releaseOK = true; - /* Can release the mutex now */ SpinLockRelease(lock->mutex); It doesn't look related to this patch. Sorry... my mistake when diffing against master (more updated than my branch). Fixed. 2) diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c index ae5fe88..4d11952 100644 --- a/src/bin/psql/help.c +++ b/src/bin/psql/help.c @@ -247,7 +247,7 @@ slashUsage(unsigned short int pager) fprintf(output, _( \\f [STRING] show or set field separator for unaligned query output\n)); fprintf(output, _( \\H toggle HTML output mode (currently %s)\n), ON(pset.popt.topt.format == PRINT_HTML)); - fprintf(output, _( \\pset [NAME [VALUE]] set table output option\n + fprintf(output, _( \\pset [NAME [VALUE]] set table output option\n (NAME := {format|border|expanded|fieldsep|fieldsep_zero|footer|null|\n numericlocale|recordsep|recordsep_zero|tuples_only|title|tableattr|pager|\n unicode_border_linestyle|unicode_column_linestyle|unicode_header_linestyle})\n)); Why are the above changes required? Same as the previous mistake. Fixed. 3) This patch adds IF NOT EXISTS for CREATE TABLE AS and CREATE MATERIALIZED VIEW, but testcase coverage for CREATE MATERIALIZED VIEW is missing. Apart from this, the changes look good to me. Fixed. Thanks for your review. 
Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello diff --git a/doc/src/sgml/ref/create_materialized_view.sgml b/doc/src/sgml/ref/create_materialized_view.sgml index 2c73852..8a0fb4d 100644 --- a/doc/src/sgml/ref/create_materialized_view.sgml +++ b/doc/src/sgml/ref/create_materialized_view.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation refsynopsisdiv synopsis -CREATE MATERIALIZED VIEW replaceabletable_name/replaceable +CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] replaceabletable_name/replaceable [ (replaceablecolumn_name/replaceable [, ...] ) ] [ WITH ( replaceable class=PARAMETERstorage_parameter/replaceable [= replaceable class=PARAMETERvalue/replaceable] [, ... ] ) ] [ TABLESPACE replaceable class=PARAMETERtablespace_name/replaceable ] @@ -53,6 +53,18 @@ CREATE MATERIALIZED VIEW replaceabletable_name/replaceable refsect1 titleParameters/title + varlistentry +termliteralIF NOT EXISTS//term +listitem + para + Do not throw an error if a materialized view with the same name already + exists. A notice is issued in this case. Note that there is no guarantee + that the existing materialized view is anything like the one that would + have been created. 
+ /para +/listitem + /varlistentry + variablelist varlistentry termreplaceabletable_name/replaceable/term diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml index 60300ff..8e4ada7 100644 --- a/doc/src/sgml/ref/create_table_as.sgml +++ b/doc/src/sgml/ref/create_table_as.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation refsynopsisdiv synopsis -CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE replaceabletable_name/replaceable +CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] replaceabletable_name/replaceable [ (replaceablecolumn_name/replaceable [, ...] ) ] [ WITH ( replaceable class=PARAMETERstorage_parameter/replaceable [= replaceable class=PARAMETERvalue/replaceable] [, ... ] ) | WITH OIDS | WITHOUT OIDS ] [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ] @@ -91,6 +91,17 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE replaceable /varlistentry varlistentry +termliteralIF NOT EXISTS//term +listitem + para + Do not throw an error if a relation with the same name already exists. + A notice is issued in this case. Refer to xref linkend=sql-createtable + for details. + /para +
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On 10/29/14 10:48 AM, Robert Haas wrote: On Tue, Oct 28, 2014 at 8:29 PM, Peter Eisentraut pete...@gmx.net wrote: On 10/20/14 2:59 PM, Tom Lane wrote: My Salesforce colleague Thomas Fanghaenel observed that the TAP tests for pg_basebackup fail when run in a sufficiently deeply-nested directory tree. As for the test, we can do something like the attached to mark the test as TODO. What does this actually do? It doesn't appear that it's just disabling the test. It still runs the tests, but doesn't count the results in whether the suite passes. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TAP test breakage on MacOS X
Peter Eisentraut pete...@gmx.net writes: On 10/7/14 1:57 PM, Tom Lane wrote: Peter had a patch to eliminate the overhead of multiple subinstalls; not sure where that stands, but presumably it would address your issue. It will need some cleverness to sort out the parallel make issues that were brought up in the review thread. I took a quick look. I concur with Fabien that the dependency on MAKELEVEL seems pretty horrid: in particular, will that not break the ability to initiate make check from somewhere below the top directory? I wonder whether it could be solved by having code in the toplevel Makefile that actually makes the test install tree, and not as a .PHONY target but along the lines of tmp-install-stamp: rm -rf tmp_install # in case we failed during a previous attempt $(MKDIR_P) tmp_install/log $(MAKE) ... etc etc ... touch tmp-install-stamp and then also at top level, put tmp-install-stamp as a prerequisite for check-world, and then in every subdirectory that has a check target, add something like $(abs_top_builddir)/tmp-install-stamp: $(MAKE) -C $(abs_top_builddir) tmp-install-stamp check: $(abs_top_builddir)/tmp-install-stamp The way this solves the parallel make problem is that no matter where you invoke make check, the first thing it would have to do is create the tmp_install directory if it's not done already, before it can launch any parallel operations. Or at least I hope it'd work like that; I've not actually tried it. Possibly some of these rules could be kept in Makefile.global so as to avoid having to touch so many individual Makefiles. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] infinite loop in _bt_getstackbuf
On Thu, Oct 30, 2014 at 03:52:01PM -0400, Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Robert Haas wrote: A colleague at EnterpriseDB today ran into a situation on PostgreSQL 9.3.5 where the server went into an infinite loop while attempting a VACUUM FREEZE; it couldn't escape _bt_getstackbuf(), and it couldn't be killed with ^C. I think we should add a check for interrupts into that loop somewhere; Our design principle in this area is that all loops should have CHECK_FOR_INTERRUPTS() calls somewhere, so that even if data is horribly corrupted you can get out of it. FWIW, I concur with Alvaro that adding a CHECK_FOR_INTERRUPTS() needn't require much discussion. +1 Given the lack of prior complaints about this loop, I'm not sure I see the need to work harder than that; corruption of this sort must be quite rare. Looks like _bt_getstackbuf() is always called with some buffer lock held, so CHECK_FOR_INTERRUPTS() alone would not help: http://www.postgresql.org/message-id/flat/16519.1401395...@sss.pgh.pa.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] infinite loop in _bt_getstackbuf
Noah Misch n...@leadboat.com writes: Looks like _bt_getstackbuf() is always called with some buffer lock held, so CHECK_FOR_INTERRUPTS() alone would not help: http://www.postgresql.org/message-id/flat/16519.1401395...@sss.pgh.pa.us Oooh, good point. I never followed up on that idea, but we would have to in order to fix the case Robert is on about. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TAP test breakage on MacOS X
I wrote: I took a quick look. I concur with Fabien that the dependency on MAKELEVEL seems pretty horrid: in particular, will that not break the ability to initiate make check from somewhere below the top directory? Another use-case that seems to be broken both by Peter's patch and my proposed variant is configure ... make ... cd src/bin/something hack hack hack make make check ooops ... hack some more make make check Neither proposed approach would result in reinstalling the updated binary you just built into the common temp install tree, so the second make check step would just retest the old binary. The simplest solution that comes to mind is to define a temp-install target that is the same as make install except it installs the stuff built in the current directory into the temp install tree instead of the configured installation target tree. You'd have to remember to do make temp-install before a make check; but it would work, and it's not so different from make install then make installcheck, which is what you usually do for this use-case if you've got the tree configured to install into a throwaway installation location. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] group locking: incomplete patch, just for discussion
On Thu, Oct 30, 2014 at 2:28 PM, Simon Riggs si...@2ndquadrant.com wrote: On 30 October 2014 04:24, Amit Kapila amit.kapil...@gmail.com wrote: Locking the toast table of any main tables we access seems easily done. Though perhaps we should make weak locking of the toast table presumed. Do we have cases where the toast table can be accessed when the main table is not also strong locked first? I think it is possible to have a strong lock on toast table before main table. Cluster pg_toast.toast_table_name using toast_index; Vacuum Full pg_toast.toast_table_name; Reindex Index pg_toast.toast_index_name; .. Now if take the lock on toast table in main task, it will block some of the operations before actually they need to be blocked. Very strange. Those commands are not common operations? I doubt many people even know that exists. All of those commands would block SELECTs on the main table, so there is no significant benefit in that behaviour. In fact it would be more sensible to lock the toast table earlier. It might make some sense to lock the toast table earlier for this particular case, but I don't think in general it will be feasible to lock all the tables (including catalog tables) which might get used in the overall operation before starting parallel operation. I believe it will make doing parallel stuff difficult if we try to go via this route, as we need to always do this for any operation we want to make parallel. It will always have the risk that we might miss something and another thing is it might not be feasible in all kind of cases. 
If I understand correctly, you are suggesting that, to get the first version of parallel operation out earlier, we should avoid all that machinery (group locking, ...), restrict the kinds of cases in which CREATE INDEX can run in parallel, do some hackery in the CREATE INDEX path to avoid locking problems, and parallelize only the external sort (which might avoid the need for a shared memory allocator) - then call it a day and add the things needed for other parallel operations as and when they are required. I think this might be a good approach in itself if somebody wants to target an early release; however, in the short to medium term, when we want to provide non-trivial parallel operations, we will eventually have to solve those problems, and delaying will only make things worse. In short, I agree that there is merit in your idea w.r.t. getting the first version of parallel operation out early, but looking at the medium to long term, I feel that solving the group locking problem (or building the more general infrastructure Robert mentioned upthread) can yield better results, unless you feel it is not required for parallel operations at all. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TAP test breakage on MacOS X
On Thu, Oct 30, 2014 at 10:49:33PM -0400, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: There are other issues. I am not going to enable this in the buildfarm until the check test can work from a single install. It's insane for the bin tests to take an order of magnitude longer than the main regression suite. I think the installs as such aren't the only reason for the sucky performance. We need to also reduce the number of initdb cycles incurred by the TAP tests. It's useless for example that the pg_controldata test creates its very own $PGDATA rather than sharing one with other tests. This line of thought implies that the tests will become less independent of each other, which will probably result in them being a bit harder to maintain. Still, we are paying an awful lot of cycles for not much, as things stand at the moment. One could memoize initdb within the suite. Call it once per distinct command line, caching the resulting data directory. Copy the cached data directory for each test desiring one. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL format and API changes (9.5)
On Fri, Oct 31, 2014 at 2:52 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 10/30/2014 06:02 PM, Andres Freund wrote: On 2014-10-29 10:24:20 +0200, Heikki Linnakangas wrote: On 10/06/2014 02:29 PM, Andres Freund wrote: I've not yet really looked, but on a quick readthrough XLogInsertRecData() staying in xlog.c doesn't make me happy... Ok.. Can you elaborate? To me the split between xloginsert.c doing some of the record assembly, and xlog.c doing the lower level part of the assembly is just wrong. Really? To me, that feels like a natural split. xloginsert.c is responsible for constructing the final WAL record, with the backup blocks and all. It doesn't access any shared memory (related to WAL; it does look at Buffers, to determine what to back up). The function in xlog.c inserts the assembled record to the WAL, as is. It handles the locking and WAL buffer management related to that. FWIW, I tend to the same opinion here. What would you suggest? I don't think putting everything in one XLogInsert function, like we have today, is better. Note that the second patch makes xloginsert.c a lot more complicated. I recall some time ago seeing complaints that xlog.c is too complicated and should be refactored. Any effort in this area is a good thing IMO, and this split made sense when I went through the code. I'm not a big fan of the naming for the new split. We have * XLogInsert() which is the external interface * XLogRecordAssemble() which builds the rdata chain that will be inserted * XLogInsertRecData() which actually inserts the data into the xl buffers. To me that's pretty inconsistent. Got a better suggestion? I wanted to keep the name XLogInsert() unchanged, to avoid code churn, and because it's still a good name for that. XLogRecordAssemble is pretty descriptive for what it does, although XLogRecord implies that it construct an XLogRecord struct. It does fill in that too, but the main purpose is to build the XLogRecData chain. 
Perhaps XLogAssembleRecord() would be better. I'm not very happy with XLogInsertRecData myself. XLogInsertRecord? +1 for XLogInsertRecord. -- Michael
Re: [HACKERS] tracking commit timestamps
On Tue, Oct 28, 2014 at 9:25 PM, Simon Riggs si...@2ndquadrant.com wrote: On 13 October 2014 10:05, Petr Jelinek p...@2ndquadrant.com wrote: I worked bit on this patch to make it closer to committable state. Here is updated version that works with current HEAD for the October committfest. I've reviewed this and it looks good to me. Clean, follows existing code neatly, documented and tested. Please could you document a few things * ExtendCommitTS() works only because commit_ts_enabled can only be set at server start. We need that documented so somebody doesn't make it more easily enabled and break something. (Could we make it enabled at next checkpoint or similar?) * The SLRU tracks timestamps of both xids and subxids. We need to document that it does this because Subtrans SLRU is not persistent. If we made Subtrans persistent we might need to store only the top level xid's commitTS, but that's very useful for typical use cases and wouldn't save much time at commit. Hm. What is the performance impact of this feature using the latest version of this patch? I imagine that the penalty of the additional operations this feature introduces is not zero, particularly for loads with lots of short transactions. -- Michael