Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sun, Sep 29, 2013 at 10:25 AM, Sameer Thakur samthaku...@gmail.com wrote:
> Yes, I was. Just saw a warning when pg_stat_statements is loaded that valid
> values for pg_stat_statements.max are between 100 and 2147483647. Not sure
> why though.

I remember hacking that out for testing's sake. I can only justify it as a
guard against a foot-gun: it prevents someone from being stuck restarting the
database to get a reasonable number in there. Let's CC Peter; maybe he can
remember some thoughts about that.

Also, for onlookers, I have changed this patch around to do the date-oriented
stuff but want to look it over before stapling it up and sending it. If one
cannot wait, one can look at https://github.com/fdr/postgres/tree/queryid.
The squashed version of that history contains a reasonable patch, I think,
but a re-read often finds something for me, and I've only just completed it
yesterday.
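(For onlookers: the warning Sameer saw is the standard GUC range check. Below
is a minimal, illustrative sketch of how a module typically declares such a
limit via DefineCustomIntVariable - this is the usual pattern, not the actual
pg_stat_statements source; the variable name and boot value are invented.)

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/guc.h"
    #include <limits.h>

    PG_MODULE_MAGIC;

    static int	sketch_max;		/* illustrative variable name */

    void
    _PG_init(void)
    {
    	DefineCustomIntVariable("pg_stat_statements.max",
    							"Sets the maximum number of statements tracked.",
    							NULL,
    							&sketch_max,
    							1000,		/* illustrative boot value */
    							100,		/* min: the 100 in the warning */
    							INT_MAX,	/* max: 2147483647 */
    							PGC_POSTMASTER,
    							0,
    							NULL, NULL, NULL);
    }

Configuring a value outside [100, INT_MAX] would produce exactly that kind
of out-of-range warning at load time.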
Re: [HACKERS] Bugfix and new feature for PGXS
On Monday, September 30, 2013 at 00:10:09, Andrew Dunstan wrote:
> On 09/29/2013 07:09 PM, Andrew Dunstan wrote:
>> On 09/03/2013 04:04 AM, Cédric Villemain wrote:
>>> Simple one, attached. I didn't document USE_VPATH, not sure how to
>>> explain that clearly.
>>>
>>> Just a reminder that the doc is written and is waiting to be committed.
>>> There is also an issue spotted by Christoph with the installdirs
>>> prerequisite; the attached patch fixes that.
>>
>> I applied this one-line version of the patch, which seemed more elegant
>> than the later one supplied. I backpatched that and the rest of the VPATH
>> changes for extensions to 9.1, where we first provided for extensions, so
>> people can have a reasonably uniform build system for their extensions.
>
> I have reverted this in the 9.2 and 9.1 branches until I can work out why
> it's breaking the buildfarm. 9.3 seems to be fine.

Yes, please take the longer but better patch following the small one. There
is evidence that it fixes compilation by always building installdirs first.
Previously, installdirs might be built after copying files, so the copy would
fail. Christoph authored this good patch; I'm sorry not to have been clear
on that.

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

signature.asc (digitally signed message part)
Re: [HACKERS] appendStringInfo vs appendStringInfoString
On Sat, Sep 28, 2013 at 9:44 PM, David Rowley dgrowle...@gmail.com wrote:

I did some benchmarking earlier in the week for the patch, just committed,
that allows formatting in the log_line_prefix string. In version 0.4 of that
patch there was some performance regression, because I was doing
appendStringInfo(buf, "%*s", padding, variable) instead of
appendStringInfoString(buf, variable). This regression was fixed in a later
version of the patch by only using appendStringInfo when the padding was
non-zero. More details here:
http://www.postgresql.org/message-id/CAApHDvreSGYvtXJvqHcXZL8_tXiKKiFXhQyXgqtnQ5Yo=me...@mail.gmail.com

Today I started looking through the entire source tree for places where
appendStringInfo() could be replaced by appendStringInfoString(); I now have
a patch of around 100 KB which changes a large number of these instances.
Example:

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 3107f9c..d0dea4f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -6174,7 +6174,7 @@ flatten_set_variable_args(const char *name, List *args)
 			A_Const    *con;

 			if (l != list_head(args))
-				appendStringInfo(buf, ", ");
+				appendStringInfoString(buf, ", ");

I know many of these places are not really in performance-critical parts of
the code, so it's likely not too big a performance gain in most places,
though to be honest I didn't pay a great deal of attention to how hot the
code path might be when I was editing. I thought I would post here just to
test the water on this type of change. I personally don't think it makes the
code any less readable, but if performance is not going to be greatly
improved, then perhaps people would have objections to the code churn.

I didn't run any benchmarks on the core code, but I did pull out all the
stringinfo functions and write my own test application. I also tried a new
macro which could further improve performance when the string being appended
is a string constant, although I just wrote this to gather opinions about
the idea; it is not currently part of the patch.

In my benchmarks I appended an 8-char string constant 1000 times onto a
stringinfo, and did this 3000 times per test. The results were as follows:

1. appendStringInfoString        in 0.404000 sec
2. appendStringInfo with "%s"    in 1.118000 sec
3. appendStringInfo              in 1.30 sec
4. appendStringInfoStringConst   in 0.221000 sec

You can see that appendStringInfoString() is far faster than
appendStringInfo(); this will be down to appendStringInfo() using va_args,
whereas appendStringInfoString() just does a strlen() then memcpy(). Test 4
bypasses the strlen() by using sizeof(), which of course can only be used
with string constants.

Actual code:

1. appendStringInfoString(str, "test1234");
2. appendStringInfo(str, "%s", "test1234");
3. appendStringInfo(str, "test1234");
4. appendStringInfoStringConst(str, "test1234");

The macro for test 4 was as follows:

#define appendStringInfoStringConst(buf, s) appendBinaryStringInfo(buf, (s), sizeof(s)-1)

I should say at this point that I ran these benchmarks on Windows 7 64-bit,
though my tests for the log_line_prefix patch were all on Linux and saw a
similar slowdown of appendStringInfo() vs appendStringInfoString().

So let this be the thread to gather opinions on whether my 100 KB patch,
which replaces appendStringInfo with appendStringInfoString where possible,
would be welcomed by the community. Also, would using
appendStringInfoStringConst() be going one step too far?

Regards

David Rowley


I've attached the cleanup patch for this. This one just converts instances
of appendStringInfo into appendStringInfoString where appendStringInfo does
no formatting or just has the format "%s".

David Rowley

appendstringinfo_cleanup_v0.1.patch.gz (GNU Zip compressed data)
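(For readers wanting to see the three call styles side by side, here is a
self-contained sketch in backend C. The macro is the one quoted from David's
mail; the surrounding function and the timing harness are assumed, not part
of the patch.)

    #include "postgres.h"
    #include "lib/stringinfo.h"

    #define appendStringInfoStringConst(buf, s) \
    	appendBinaryStringInfo((buf), (s), sizeof(s) - 1)

    static void
    append_three_ways(void)
    {
    	StringInfoData buf;

    	initStringInfo(&buf);

    	/* slowest: full format-string parse plus va_args machinery */
    	appendStringInfo(&buf, "%s", "test1234");

    	/* faster: just strlen() followed by memcpy() */
    	appendStringInfoString(&buf, "test1234");

    	/* fastest: length computed at compile time via sizeof() */
    	appendStringInfoStringConst(&buf, "test1234");

    	pfree(buf.data);
    }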
Re: [HACKERS] review: psql and pset without any arguments
On 30/09/2013 05:43, Alvaro Herrera wrote:
> Gilles Darold wrote:
>
>> +	else if (strcmp(param, "numericlocale") == 0)
>> +	{
>> +		if (popt->topt.numericLocale)
>> +			puts(_("Locale-adjusted numeric output (numericlocale) is on."));
>> +		else
>> +			puts(_("Locale-adjusted numeric output (numericlocale) is off."));
>> +	}
>
> Please don't make the variable name part of the translatable message. I
> suggest using the following pattern:
>
> +	else if (strcmp(param, "numericlocale") == 0)
> +	{
> +		if (popt->topt.numericLocale)
> +			printf(_("Locale-adjusted numeric output (%s) is on."), "numericlocale");
> +		else
> +			printf(_("Locale-adjusted numeric output (%s) is off."), "numericlocale");
> +	}
>
> Otherwise it will be too easy for the translator to make the mistake of
> thinking that the variable name needs translation too.

That's right. Here is the patch, modified with just a little change to your
suggestion:

	if (popt->topt.numericLocale)
		printf(_("Locale-adjusted numeric output (%s) is on.\n"), param);
	else
		printf(_("Locale-adjusted numeric output (%s) is off.\n"), param);

Thanks

--
Gilles Darold
Administrateur de bases de données
http://dalibo.com - http://dalibo.org

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 574db5c..ddf7bba 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2272,13 +2272,10 @@ lo_import 152801
    </para>
   </tip>

-  <note>
   <para>
-   It is an error to call <command>\pset</command> without any
-   arguments. In the future this case might show the current status
-   of all printing options.
+   <command>\pset</command> without any arguments displays current status
+   of all printing options.
   </para>
-  </note>

  </listitem>
 </varlistentry>

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 10e9f64..abaee9b 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -68,6 +68,7 @@ static int	strip_lineno_from_funcdesc(char *func);
 static void minimal_error_message(PGresult *res);

 static void printSSLInfo(void);
+static bool printPsetInfo(const char *param, struct printQueryOpt *popt);

 #ifdef WIN32
 static void checkWin32Codepage(void);
@@ -1045,8 +1046,20 @@ exec_command(const char *cmd,

 		if (!opt0)
 		{
-			psql_error("\\%s: missing required argument\n", cmd);
-			success = false;
+			size_t i;
+			/* list all variables */
+			static const char *const my_list[] = {
+				"border", "columns", "expanded", "fieldsep",
+				"footer", "format", "linestyle", "null",
+				"numericlocale", "pager", "recordsep",
+				"tableattr", "title", "tuples_only",
+				NULL };
+			for (i = 0; my_list[i] != NULL; i++) {
+				printPsetInfo(my_list[i], &pset.popt);
+			}
+
+			success = true;
+
 		}
 		else
 			success = do_pset(opt0, opt1, &pset.popt, pset.quiet);
@@ -2275,8 +2288,6 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 			return false;
 		}

-		if (!quiet)
-			printf(_("Output format is %s.\n"), _align2string(popt->topt.format));
 	}

 	/* set table line style */
@@ -2296,9 +2307,6 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 			return false;
 		}

-		if (!quiet)
-			printf(_("Line style is %s.\n"),
-				   get_line_style(&popt->topt)->name);
 	}

 	/* set border style/width */
@@ -2307,8 +2315,6 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 		if (value)
 			popt->topt.border = atoi(value);

-		if (!quiet)
-			printf(_("Border style is %d.\n"), popt->topt.border);
 	}

 	/* set expanded/vertical mode */
@@ -2320,15 +2326,6 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 			popt->topt.expanded = ParseVariableBool(value);
 		else
 			popt->topt.expanded = !popt->topt.expanded;
-		if (!quiet)
-		{
-			if (popt->topt.expanded == 1)
-				printf(_("Expanded display is on.\n"));
-			else if (popt->topt.expanded == 2)
-				printf(_("Expanded display is used automatically.\n"));
-			else
-				printf(_("Expanded display is off.\n"));
-		}
 	}

 	/* locale-aware numeric output */
@@ -2338,13 +2335,6 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 			popt->topt.numericLocale = ParseVariableBool(value);
 		else
 			popt->topt.numericLocale = !popt->topt.numericLocale;
-		if (!quiet)
-		{
-			if (popt->topt.numericLocale)
-				puts(_("Showing locale-adjusted numeric output."));
-			else
-				puts(_("Locale-adjusted numeric output is off."));
-		}
 	}

 	/* null display */
@@ -2355,8 +2345,6 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 			free(popt->nullPrint);
 			popt->nullPrint = pg_strdup(value);
 		}
-		if (!quiet)
-			printf(_("Null display is \"%s\".\n"), popt->nullPrint ?
Re: [HACKERS] Compression of full-page-writes
(2013/09/30 13:55), Amit Kapila wrote:
> On Mon, Sep 30, 2013 at 10:04 AM, Fujii Masao masao.fu...@gmail.com wrote:
>> Yep, please! It's really helpful!

OK! I will test with a single-instance configuration and with a synchronous
replication configuration.

By the way, you posted a patch three years ago for a sync_file_range() WAL
writing method. I think it would also be good for performance. The reason:
reading sync_file_range() and fdatasync() in the latest Linux kernel code
(3.9.11), fdatasync() writes out the dirty buffers of the whole file,
whereas sync_file_range() writes out only the dirty buffers in the given
range. In more detail, these functions use the same function in the kernel
source code: fdatasync() is vfs_fsync_range(file, 0, LLONG_MAX, 1), and
sync_file_range() is vfs_fsync_range(file, offset, amount, 1). It is obvious
which is more efficient for WAL writing. You had better confirm it in the
Linux kernel's git; I think it will strengthen your conviction.
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/fs/sync.c?id=refs/tags/v3.11.2

> I think it will be useful if you can get the data for 1 and 2 threads
> (maybe with pgbench itself) as well, because the WAL reduction is almost
> sure, but the only thing is that it should not dip tps in some of the
> scenarios.

That's right. I also want to know how this patch behaves in an MD
environment, because MD is strong at sequential writes, which is just like
WAL writing.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
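(To make the kernel-interface difference concrete, here is an illustrative
Linux-only sketch - not PostgreSQL source. The flag combination is one
plausible choice for a durable range flush; it is an assumption, not taken
from any patch.)

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    /* Whole-file flush: in the kernel this becomes
     * vfs_fsync_range(file, 0, LLONG_MAX, 1). */
    static int
    flush_whole_file(int fd)
    {
    	return fdatasync(fd);
    }

    /* Range flush: in the kernel this becomes
     * vfs_fsync_range(file, offset, nbytes, 1) - only the dirty
     * buffers in [offset, offset + nbytes) are written out. */
    static int
    flush_range(int fd, off_t offset, off_t nbytes)
    {
    	return sync_file_range(fd, offset, nbytes,
    						   SYNC_FILE_RANGE_WAIT_BEFORE |
    						   SYNC_FILE_RANGE_WRITE |
    						   SYNC_FILE_RANGE_WAIT_AFTER);
    }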
Re: [HACKERS] proposal: lob conversion functionality
On 12.08.2013 21:08, Pavel Stehule wrote:
> 2013/8/10 Tom Lane t...@sss.pgh.pa.us:
>> Pavel Stehule pavel.steh...@gmail.com writes:
>>> I found so there are no simple API for working with LO from PL without
>>> access to file system.
>>
>> What? See lo_open(), loread(), lowrite(), etc.
>
> yes, so there are three problems with these functions:
>
> a) probably (I didn't find) undocumented

It's there, although it's a bit difficult to find by searching. See:
http://www.postgresql.org/docs/devel/static/lo-funcs.html

I don't actually agree with this phrase on that page: "The ones that are
actually useful to call via SQL commands are lo_creat, lo_create, lo_unlink,
lo_import, and lo_export". Calling lo_open, loread and lowrite seems equally
useful to me.

> b) design with lo handler is little bit PL/pgSQL unfriendly.

It's a bit awkward, I agree.

> c) probably there is a bug - it doesn't expect handling errors
>
> postgres=# select fbuilder.attachment_to_xml(0);
> WARNING:  Snapshot reference leak: Snapshot 0x978f6f0 still referenced
>  attachment_to_xml
> ───────────────────
>  [null]
> (1 row)

Yeah, that's a server-side bug. inv_open() registers the snapshot before
checking if the large object exists. If it doesn't, the already-registered
snapshot is not unregistered, hence the warning. I've committed the attached
fix for that bug.

- Heikki

From 357f7521384df34c697b3544115622520a6a0e9f Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakan...@iki.fi
Date: Mon, 30 Sep 2013 11:29:09 +0300
Subject: [PATCH 1/1] Fix snapshot leak if lo_open called on non-existent
 object.

lo_open registers the currently active snapshot, and checks if the large
object exists after that. Normally, snapshots registered by lo_open are
unregistered at end of transaction when the lo descriptor is closed, but if
we error out before the lo descriptor is added to the list of open
descriptors, it is leaked. Fix by moving the snapshot registration to after
checking if the large object exists.

Reported by Pavel Stehule. Backpatch to 8.4. The snapshot registration
system was introduced in 8.4, so prior versions are not affected (and not
supported, anyway).
---
 src/backend/storage/large_object/inv_api.c | 44 ++++++++++++++----------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/src/backend/storage/large_object/inv_api.c b/src/backend/storage/large_object/inv_api.c
index fb91571..d248743 100644
--- a/src/backend/storage/large_object/inv_api.c
+++ b/src/backend/storage/large_object/inv_api.c
@@ -240,29 +240,18 @@ LargeObjectDesc *
 inv_open(Oid lobjId, int flags, MemoryContext mcxt)
 {
 	LargeObjectDesc *retval;
-
-	retval = (LargeObjectDesc *) MemoryContextAlloc(mcxt,
-													sizeof(LargeObjectDesc));
-
-	retval->id = lobjId;
-	retval->subid = GetCurrentSubTransactionId();
-	retval->offset = 0;
+	Snapshot	snapshot = NULL;
+	int			descflags = 0;

 	if (flags & INV_WRITE)
 	{
-		retval->snapshot = NULL;	/* instantaneous MVCC snapshot */
-		retval->flags = IFS_WRLOCK | IFS_RDLOCK;
+		snapshot = NULL;	/* instantaneous MVCC snapshot */
+		descflags = IFS_WRLOCK | IFS_RDLOCK;
 	}
 	else if (flags & INV_READ)
 	{
-		/*
-		 * We must register the snapshot in TopTransaction's resowner, because
-		 * it must stay alive until the LO is closed rather than until the
-		 * current portal shuts down.
-		 */
-		retval->snapshot = RegisterSnapshotOnOwner(GetActiveSnapshot(),
-												   TopTransactionResourceOwner);
-		retval->flags = IFS_RDLOCK;
+		snapshot = GetActiveSnapshot();
+		descflags = IFS_RDLOCK;
 	}
 	else
 		ereport(ERROR,
@@ -271,11 +260,30 @@ inv_open(Oid lobjId, int flags, MemoryContext mcxt)
 						flags)));

 	/* Can't use LargeObjectExists here because we need to specify snapshot */
-	if (!myLargeObjectExists(lobjId, retval->snapshot))
+	if (!myLargeObjectExists(lobjId, snapshot))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_OBJECT),
 				 errmsg("large object %u does not exist", lobjId)));

+	/*
+	 * We must register the snapshot in TopTransaction's resowner, because
+	 * it must stay alive until the LO is closed rather than until the
+	 * current portal shuts down. Do this after checking that the LO exists,
+	 * to avoid leaking the snapshot if an error is thrown.
+	 */
+	if (snapshot)
+		snapshot = RegisterSnapshotOnOwner(snapshot,
+										   TopTransactionResourceOwner);
+
+	/* All set, create a descriptor */
+	retval = (LargeObjectDesc *) MemoryContextAlloc(mcxt,
+													sizeof(LargeObjectDesc));
+	retval->id = lobjId;
+	retval->subid = GetCurrentSubTransactionId();
+	retval->offset = 0;
+	retval->snapshot = snapshot;
+	retval->flags = descflags;
+
 	return retval;
 }
--
1.8.4.rc3
Re: [HACKERS] proposal: lob conversion functionality
2013/9/30 Heikki Linnakangas hlinnakan...@vmware.com:
> On 12.08.2013 21:08, Pavel Stehule wrote:
>> c) probably there is a bug - it doesn't expect handling errors
>
> [...]
>
> Yeah, that's a server-side bug. inv_open() registers the snapshot before
> checking if the large object exists. If it doesn't, the already-registered
> snapshot is not unregistered, hence the warning. I've committed the
> attached fix for that bug.

Nice. I'm afraid it is my bug. Thank you.

Pavel

> - Heikki
Re: [HACKERS] review: psql and pset without any arguments
Hi

2013/9/30 Gilles Darold gilles.dar...@dalibo.com:
> (...)
>
> That's right. Here is the patch, modified with just a little change to
> your suggestion:
>
>	if (popt->topt.numericLocale)
>		printf(_("Locale-adjusted numeric output (%s) is on.\n"), param);
>	else
>		printf(_("Locale-adjusted numeric output (%s) is off.\n"), param);

I'm a bit late to this thread, but I'd just like to say I find this a useful
feature which I've missed on the odd occasion.

BTW, there is a slight typo in the patch: "s" should be "is" in "Output
format (format) s aligned.":

+			printf(_("Output format (%s) s %s.\n"), param,
+				   _align2string(popt->topt.format));

Regards

Ian Barwick
Re: [HACKERS] Minmax indexes
On 27.09.2013 21:43, Greg Stark wrote:
> On Fri, Sep 27, 2013 at 7:22 PM, Jim Nasby j...@nasby.net wrote:
>> Yeah, we obviously kept things simpler when adding forks in order to get
>> the feature out the door. There's improvements that need to be made. But
>> IMHO that's not reason to automatically avoid forks; we need to consider
>> the cost of improving them vs what we gain by using them.
>
> I think this gives short shrift to the decision to introduce forks. If
> you go back to the discussion at the time, it was a topic of debate, and
> the argument which won the day is that interleaving different streams of
> data in one storage system is exactly what the file system is designed to
> do, and we would just be reinventing the wheel if we tried to do it
> ourselves.
>
> I think that makes a lot of sense for things like the fsm or vm, which
> grow indefinitely and are maintained by a different piece of code from
> the main heap. The tradeoff might be somewhat different for the pieces of
> a data structure like a bitmap index or gin index, where the code
> responsible for maintaining it is all the same.

There are quite a few cases where we have several streams of data, all
related to a single relation. We've solved them all in slightly different
ways:

1. TOAST. A separate heap relation with an accompanying b-tree index is
   created.
2. GIN. GIN contains a b-tree, and data pages (and some other kinds of
   pages too, IIRC). It would be natural to use the regular B-tree code for
   the B-tree, but instead it contains a completely separate
   implementation. All the different kinds of streams are stored in the
   main fork.
3. Free space map. Stored as a separate fork.
4. Visibility map. Stored as a separate fork.

And upcoming:

5. Minmax indexes, with the linearly-addressed range reverse map and
   variable-length index tuples.
6. Bitmap indexes. Like in GIN, there's a B-tree and the data pages
   containing the bitmaps.

A nice property of the VM and FSM forks currently is that they are just
auxiliary information to speed things up. You can safely remove them (when
the server is shut down), and the system will recreate them on the next
vacuum. It's not carved in stone that it has to be that way for all extra
forks, but it is today, and I like it.

I feel we need a new kind of a relation fork, something more heavy-weight
than the current forks, but not as heavy-weight as the way TOAST does it.

It would be nice if GIN and bitmap indexes could use the regular nbtree
code. Or any other index type - imagine a bitmap index using an SP-GiST
index instead of a B-tree! You could create a bitmap index for 2d points,
and use it to speed up operations like "overlap", for example. The nbtree
code expects the data to be in the main fork and uses the FSM fork too.
Maybe it could be abstracted, so that the regular b-tree could be used as
part of another index type. Same with other indexams. Perhaps relation
forks need to be made more flexible, allowing access methods to define what
forks exist.

IOW, let's not avoid using relation forks, let's make them better instead.

- Heikki
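(For reference, a sketch of how a fork is addressed from backend C today:
ReadBufferExtended() and the ForkNumber values are the real API, while the
wrapper function here is invented for illustration. The point is that a new
kind of fork would reuse this same buffer-manager path, just with another
fork number.)

    #include "postgres.h"
    #include "storage/bufmgr.h"
    #include "utils/rel.h"

    /* Read one visibility-map page through the ordinary buffer manager;
     * only the fork number differs from a main-fork (heap) access.
     * Other values: MAIN_FORKNUM, FSM_FORKNUM, INIT_FORKNUM. */
    static Buffer
    read_vm_page(Relation rel, BlockNumber blkno)
    {
    	return ReadBufferExtended(rel, VISIBILITYMAP_FORKNUM, blkno,
    							  RBM_NORMAL, NULL);
    }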
Re: [HACKERS] Minmax indexes
What would it take to abstract the minmax indexes to allow maintaining a
bounding box for points, instead of a plain min/max? Or for ranges. In other
words, why is this restricted to b-tree operators?

- Heikki
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
> Also, for onlookers, I have changed this patch around to do the
> date-oriented stuff but want to look it over before stapling it up and
> sending it. If one cannot wait, one can look at
> https://github.com/fdr/postgres/tree/queryid. The squashed-version of
> that history contains a reasonable patch I think, but a re-read often
> finds something for me and I've only just completed it yesterday.

I did the following:

1. Forked from fdr/postgres
2. Cloned branch queryid
3. Squashed 22899c802571a57cfaf0df38e6c5c366b5430c74 and
   d813096e29049667151a49fc5e5cf3d6bbe55702, picked
   be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5
4. Usual make / make install / create extension pg_stat_statements
   (pg_stat_statements.max=100)
5. select * from pg_stat_statements_reset(); select * from pgbench_tellers;

Result below:

-[ RECORD 1 ]-------+-------------------------------------------
userid              | 10
dbid                | 12900
session_start       | 2013-09-30 16:55:22.285113+05:30
introduced          | 1970-01-01 05:30:00+05:30
query               | select * from pg_stat_statements_reset();
query_id            | 2531907647060518039
calls               | 1
total_time          | 0
rows                | 1
shared_blks_hit     | 0
shared_blks_read    | 0
shared_blks_dirtied | 0
shared_blks_written | 0
local_blks_hit      | 0
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
temp_blks_read      | 0
temp_blks_written   | 0
blk_read_time       | 0
blk_write_time      | 0
-[ RECORD 2 ]-------+-------------------------------------------
userid              | 10
dbid                | 12900
session_start       | 2013-09-30 16:55:22.285113+05:30
introduced          | 1970-01-01 05:30:00+05:30
query               | select * from pgbench_tellers ;
query_id            | 7580333025384382649
calls               | 1
total_time          | 0
rows                | 10
shared_blks_hit     | 1
shared_blks_read    | 0
shared_blks_dirtied | 0
shared_blks_written | 0
local_blks_hit      | 0
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
temp_blks_read      | 0
temp_blks_written   | 0
blk_read_time       | 0
blk_write_time      | 0
(2 rows)

I understand session_start, and verified that it changes with each database
restart to reflect the current time. I am not sure why introduced keeps
showing the same 1970-01-01 05:30:00+05:30 value. I thought it reflected the
(most recent) time a query statement's statistics were added to the
hashtable. Is this a bug? Will continue to test and try to understand the
code.

regards
Sameer
Re: [HACKERS] [PATCH] pg_sleep(interval)
On 09/22/2013 02:17 PM, Fabien COELHO wrote:
> Hello,
>
>> There is no pg_sleep(text) function, and the cast is
>> unknown -> double precision. My mistake.
>
> As I understand it, pg_sleep('12') currently works and would not anymore
> once your patch is applied. That is the concern raised by Robert Haas.

That is correct.

> ISTM that providing pg_sleep(TEXT) cleanly resolves the
> upward-compatibility issue raised.

I don't like this idea at all. If we don't want to break compatibility for
callers that quote the value, I would rather the patch be rejected outright.

> That was just a suggestion, and I was trying to help. ISTM that if
> Robert's concern is not addressed one way or another, you will just get
> rejected on this basis.

Yes, I understand you are trying to help, and I appreciate it! My opinion,
and that of others as well from the original thread, is that this patch
should either go in as is and break that one case, or not go in at all. I'm
fine with either (although clearly I would prefer it went in, otherwise I
wouldn't have written the patch).

--
Vik
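(For the archives, a hedged C sketch of one way pg_sleep(interval) can be
implemented by delegating to the existing pg_sleep(float8) - the moral
equivalent of pg_sleep(extract(epoch from $1)). This mirrors the idea under
discussion but is not the patch itself; the function name is invented.)

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/builtins.h"
    #include "utils/timestamp.h"

    PG_FUNCTION_INFO_V1(pg_sleep_interval_sketch);

    Datum
    pg_sleep_interval_sketch(PG_FUNCTION_ARGS)
    {
    	Interval   *interval = PG_GETARG_INTERVAL_P(0);
    	Datum		seconds;

    	/* equivalent of SQL: extract(epoch from $1) */
    	seconds = DirectFunctionCall2(interval_part,
    								  CStringGetTextDatum("epoch"),
    								  IntervalPGetDatum(interval));

    	/* hand off to the existing double-precision pg_sleep() */
    	return DirectFunctionCall1(pg_sleep, seconds);
    }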
Re: [HACKERS] record identical operator - Review
Steve Singer st...@ssinger.info wrote:

> How about:
>
>   To support matching of rows which include elements without a default
>   B-tree operator class, the following operators are defined for
>   composite type comparison: <literal>*=</literal>,
>   <literal>*&lt;&gt;</literal>, <literal>*&lt;</literal>,
>   <literal>*&lt;=</literal>, <literal>*&gt;</literal>, and
>   <literal>*&gt;=</literal>. These operators compare the internal binary
>   representation of the two rows. Two rows might have a different binary
>   representation even though comparisons of the two rows with the
>   equality operator is true. The ordering of rows under these comparison
>   operators is deterministic but not otherwise meaningful. These
>   operators are used internally for materialized views and might be
>   useful for other specialized purposes such as replication, but are not
>   intended to be generally useful for writing queries.

I agree that's an improvement. Thanks!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [PERFORM] Cpu usage 100% on slave. s_lock problem.
On 9/27/13 3:00 PM, Merlin Moncure wrote:
> Attached is simplified patch that replaces the spinlock with a read
> barrier based on a suggestion made by Andres offlist.

This patch doesn't apply.
Re: [HACKERS] Re: Request for Patch Feedback: Lag Lead Window Functions Can Ignore Nulls
On 29.09.2013 23:32, Nicholas White wrote:
>> bms_add_member() is an accident waiting to happen
>
> I've attached a patch that makes it use repalloc as suggested

You'll have to zero out the extended portion. I tried to demonstrate that by
setting RANDOMIZE_ALLOCATED_MEMORY, but surprisingly the regression tests
still passed. I guess the regression suite doesn't use wide enough
bitmapsets to exercise that. But this causes an assertion failure, with
RANDOMIZE_ALLOCATED_MEMORY:

create table t (i int4);
select * from t as t1, t as t2, t as t3, t as t4, t as t5, t as t6, t as t7,
t as t8, t as t9, t as t10, t as t11, t as t12, t as t13, t as t14, t as
t15, t as t16, t as t17, t as t18, t as t19, t as t20, t as t21, t as t22,
t as t23, t as t24, t as t25, t as t26, t as t27, t as t28, t as t29, t as
t30, t as t31, t as t32, t as t33, t as t34, t as t35, t as t36, t as t37,
t as t38, t as t39, t as t40;

>> - is it OK to commit separately?
>
> I'll address the lag/lead patch comments in the next couple of days.
> Thanks

Yep, thanks. I committed the attached.

After thinking about this some more, I realized that bms_add_member() is
still sensitive to CurrentMemoryContext, if the 'a' argument is NULL. That's
probably OK for the lag/lead patch - I didn't check - but if we're going to
start relying on the fact that bms_add_member() no longer allocates a new
bms in CurrentMemoryContext, that needs to be documented. bitmapset.c
currently doesn't say a word about memory contexts.

So what needs to be done next is to document how the functions in
bitmapset.c work wrt. memory contexts. Then double-check that the behavior
of all the other "recycling" bms functions is consistent. (At least
bms_add_members() needs a similar change).

- Heikki

From ee01d848f39400c8524c66944ada6fde47894978 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakan...@iki.fi
Date: Mon, 30 Sep 2013 16:37:00 +0300
Subject: [PATCH 1/1] In bms_add_member(), use repalloc() if the bms needs to
 be enlarged.

Previously bms_add_member() would palloc a whole-new copy of the existing
set, copy the words, and pfree the old one. repalloc() is potentially much
faster, and more importantly, this is less surprising if
CurrentMemoryContext is not the same as the context the old set is in.
bms_add_member() still allocates a new bitmapset in CurrentMemoryContext if
NULL is passed as argument, but that is a lot less likely to induce bugs.

Nicholas White.
---
 src/backend/nodes/bitmapset.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index b18b7a5..540db16 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -632,21 +632,20 @@ bms_add_member(Bitmapset *a, int x)
 		return bms_make_singleton(x);
 	wordnum = WORDNUM(x);
 	bitnum = BITNUM(x);
+
+	/* enlarge the set if necessary */
 	if (wordnum >= a->nwords)
 	{
-		/* Slow path: make a larger set and union the input set into it */
-		Bitmapset  *result;
-		int			nwords;
+		int			oldnwords = a->nwords;
 		int			i;

-		result = bms_make_singleton(x);
-		nwords = a->nwords;
-		for (i = 0; i < nwords; i++)
-			result->words[i] |= a->words[i];
-		pfree(a);
-		return result;
+		a = (Bitmapset *) repalloc(a, BITMAPSET_SIZE(wordnum + 1));
+		a->nwords = wordnum + 1;
+		/* zero out the enlarged portion */
+		for (i = oldnwords; i < a->nwords; i++)
+			a->words[i] = 0;
 	}

-	/* Fast path: x fits in existing set */
+
 	a->words[wordnum] |= ((bitmapword) 1 << bitnum);
 	return a;
 }
--
1.8.4.rc3
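(A short illustrative sketch of the remaining context-sensitivity Heikki
describes - standard backend APIs, but the helper function is invented: with
the committed change, enlarging an existing set stays in the set's own
context, while the NULL-input case still allocates in CurrentMemoryContext,
so callers that care must switch contexts for the first member.)

    #include "postgres.h"
    #include "nodes/bitmapset.h"
    #include "utils/memutils.h"

    static Bitmapset *
    add_member_in_context(Bitmapset *bms, int x, MemoryContext target)
    {
    	if (bms == NULL)
    	{
    		/* first allocation lands in CurrentMemoryContext: switch it */
    		MemoryContext oldcxt = MemoryContextSwitchTo(target);

    		bms = bms_add_member(NULL, x);
    		MemoryContextSwitchTo(oldcxt);
    	}
    	else
    	{
    		/* repalloc() keeps the set in whatever context it lives in */
    		bms = bms_add_member(bms, x);
    	}
    	return bms;
    }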
[HACKERS] Compact commits and changeset extraction
Hi,

Changeset extraction only works in the context of a single database but has
to scan through xlog records from multiple databases. Most records are easy
to skip because they contain the database in the relfilenode or are just not
interesting for logical replication. The only exception is compact commits.

So we have some alternatives:

1) Don't do anything; in that case empty transactions will get replayed,
   since the changes themselves will get skipped.
2) Don't use compact commits if wal_level=logical.
3) Unify compact and non-compact commits, trying to make the normal one
   smaller.

For 3) I am thinking of using 'xinfo' to store whether we have the other
information or not. E.g. if there are subxacts in a compact commit, we
signal that by the flag XACT_COMMIT_CONTAINS_SUBXACTS and store the number
of subxacts after the xlog record. Similarly with relations, invalidation
messages and the database id. That should leave compact commits without any
subxacts at the former size, and those with subxacts at the former size + 4.
Normal commits would get smaller in many cases, since we don't store the
empty fields.

I personally think 3) is the best solution. Any other opinions?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
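(A rough, purely hypothetical sketch of what option 3's unified record could
look like; the struct, flag names beyond the one quoted above, and layout
comments are illustrative, not committed code.)

    #include "postgres.h"
    #include "datatype/timestamp.h"

    #define XACT_COMMIT_CONTAINS_SUBXACTS		(1U << 0)
    #define XACT_COMMIT_CONTAINS_RELFILENODES	(1U << 1)	/* assumed name */
    #define XACT_COMMIT_CONTAINS_INVALS			(1U << 2)	/* assumed name */
    #define XACT_COMMIT_CONTAINS_DBINFO			(1U << 3)	/* assumed name */

    typedef struct xl_xact_commit_unified
    {
    	TimestampTz xact_time;	/* commit timestamp, as today */
    	uint32		xinfo;		/* announces which optional blocks follow */

    	/*
    	 * Then, only if the corresponding flag bit is set in xinfo:
    	 *   int  nsubxacts;  TransactionId subxacts[nsubxacts];
    	 *   int  nrels;      RelFileNode   xnodes[nrels];
    	 *   int  nmsgs;      SharedInvalidationMessage msgs[nmsgs];
    	 *   Oid  dbId;       Oid tsId;
    	 */
    } xl_xact_commit_unified;

With this layout, a commit with no subxacts, rels, invalidations or dbinfo
stays at the compact size, and a commit with only subxacts pays just the
extra 4-byte count mentioned above.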
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Fri, Sep 27, 2013 at 8:36 PM, Peter Geoghegan p...@heroku.com wrote:
> On Fri, Sep 27, 2013 at 5:36 AM, Robert Haas robertmh...@gmail.com wrote:
>>> I don't have another idea either. In fact, I'd go so far as to say that
>>> doing any third thing that's better than those two to any reasonable
>>> person is obviously impossible. But I'd add that we simply cannot roll
>>> back at read committed, so we're just going to have to hold our
>>> collective noses and do strange things with visibility.
>>
>> I don't accept that as a general principle. We're writing the code; we
>> can make it behave any way we think best.
>
> I presume you're referring to the principle that we cannot throw
> serialization failures at read committed. I'd suggest that letting that
> happen would upset a lot of people, because it's so totally unprecedented.
> A large segment of our user base would just consider that to be Postgres
> randomly throwing errors, and would be totally dismissive of the need to
> do so, and not without some justification - no one else does the same. The
> reality is that the majority of our users don't even know what an
> isolation level is. I'm not just talking about people that use Postgres
> more casually, such as Heroku customers. I've personally talked to people
> who didn't even know what a transaction isolation level was, that were in
> a position where they really, really should have known.

Yes, it might not be a good idea. But I'm just saying, we get to decide.

>> I doubt that any change to HeapTupleSatisfiesMVCC() will be acceptable.
>> This feature needs to restrain itself to behavior changes that only
>> affect users of this feature, I think.
>
> I agree with the principle of what you're saying, but I'm not aware that
> those changes to HeapTupleSatisfiesMVCC() imply any behavioral changes for
> those not using the feature. Certainly, the standard regression tests and
> isolation tests still pass, for what it's worth. Having said that, I have
> not thought about it enough to be willing to actually defend that bit of
> code. Though I must admit that I am a little encouraged by the fact that
> it passes casual inspection.

Well, at a minimum, it's a performance worry. Those functions are *hot*.
Single branches do matter there.

>> I am starting to wonder if it's really necessary to have a blessed update
>> that can see the locked, not-otherwise-visible tuple.
>
> Doing that certainly has its disadvantages, both in terms of code
> complexity and in terms of being arbitrarily restrictive. We're going to
> have to allow the user to see the locked row after it's updated (the new
> row version that we create will naturally be visible to its creating
> xact) - is it really any worse that the user can see it before an update
> (or a delete)? The user could decide to effectively make the update change
> nothing, and see the same thing anyway. If we're not going to just error
> out over the invisible tuple, the user needs some way to interact with it.
> The details are negotiable.
>
> I get why you're averse to doing odd things to visibility - I was too. I
> just don't see that we have a choice if we want this feature to work
> acceptably with read committed. In addition, as it happens I just don't
> see that the general situation is made any worse by the fact that the user
> might be able to see the row before an update/delete. Isn't it also weird
> to update or delete something you cannot see? Couldn't EvalPlanQual() be
> said to be an MVCC violation on similar grounds? It also reaches into the
> future. Locking a row isn't really that distinct from updating it in terms
> of the code footprint, but also from a logical perspective.

Yes, EvalPlanQual() is definitely an MVCC violation.

>>> Realize you can generally only kill the heap tuple *before* you have the
>>> row lock, because otherwise a totally innocent non-HOT update (may not
>>> update any unique indexed columns at all) will deadlock with your
>>> session, which I don't think is defensible, and will probably happen
>>> often if allowed to (after all, this is upsert - users are going to want
>>> to update their locked rows!).
>>
>> I must be obtuse; I don't see why that would deadlock.
>
> If you don't see it, then you aren't being obtuse in asking for
> clarification. It's really easy to be wrong about this kind of thing.
>
> If the non-HOT update updates some random row, changing the key columns,
> it will lock that random row version. It will then proceed with value
> locking (i.e. inserting index tuples in the usual way, in this case with
> entirely new values). It might then block on one of the index tuples we,
> the upserter, have already inserted (these are our value locks under your
> scheme). Meanwhile, we (the upserter) might have *already* concluded that
> the *old* heap row that the regular updater is in the process of rendering
> invisible is to blame in respect of some other value in some later unique
> index, and that *it* must be locked. Deadlock. This seems very possible if
> the key values are somewhat correlated, which is
Re: [HACKERS] review: psql and pset without any arguments
Please remove the tabs from the SGML files.
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
On 9/28/13 3:05 AM, Amit Kapila wrote:
> Now as we have an agreement, I had updated patch for below left issues:

Regression tests are failing.
Re: [HACKERS] Re: Request for Patch Feedback: Lag Lead Window Functions Can Ignore Nulls
Nicholas White wrote:

>> But even if we did decide to switch memory contexts on every call, it
>> would still be much cleaner than this.
>
> I've removed all the bms_initialize code from the patch and am using this
> solution. As the partition memory is zero-initialised, I just store a
> Bitmapset pointer in the WinGetPartitionLocalMemory. The bms_add_member
> and bms_is_member functions behave sensibly for null-pointer inputs (they
> return a bms_make_singleton in the current memory context, and false,
> respectively). I've surrounded the calls to bms_make_singleton with a
> memory context switch (to the partition's context) so the Bitmapset stays
> in the partition's context.

Now that I look again, would GetMemoryChunkContext() be useful here?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
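(What that suggestion could look like, sketched below: GetMemoryChunkContext()
is the real function, while the wrapper and its arguments are invented for
illustration. Instead of tracking the partition's context separately, it is
recovered from the partition-local chunk itself.)

    #include "postgres.h"
    #include "nodes/bitmapset.h"
    #include "utils/memutils.h"

    static Bitmapset *
    mark_row_in_partition(void *partition_local_mem, Bitmapset *set, int row)
    {
    	/* the context WinGetPartitionLocalMemory() allocated from */
    	MemoryContext partcxt = GetMemoryChunkContext(partition_local_mem);
    	MemoryContext oldcxt = MemoryContextSwitchTo(partcxt);

    	set = bms_add_member(set, row);	/* a first call allocates here */
    	MemoryContextSwitchTo(oldcxt);
    	return set;
    }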
Re: [HACKERS] Minmax indexes
On Mon, Sep 30, 2013 at 02:17:39PM +0300, Heikki Linnakangas wrote:
> What would it take to abstract the minmax indexes to allow maintaining a
> bounding box for points, instead of a plain min/max? Or for ranges. In
> other words, why is this restricted to b-tree operators?

If I had to guess, I'd guess "first cut". I take it this also occurred to
you, and that you believe this approach pushes the more general case further
out than it would need to be. Am I close?

Cheers,
David.

--
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Minmax indexes
David Fetter wrote:
> On Mon, Sep 30, 2013 at 02:17:39PM +0300, Heikki Linnakangas wrote:
>> What would it take to abstract the minmax indexes to allow maintaining a
>> bounding box for points, instead of a plain min/max? Or for ranges. In
>> other words, why is this restricted to b-tree operators?
>
> If I had to guess, I'd guess "first cut".

Yeah, there were a few other simplifications in the design too, though I
admit allowing for multidimensional datatypes hadn't occurred to me (though
I will guess Simon did think about it and just didn't tell me, to avoid me
going overboard with stuff that would make the first version take forever).
I think we'd better add version numbers and such to the metapage to allow
for extensions and proper upgradability.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
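(Purely as an illustration of "version numbers and such in the metapage",
following the nbtree convention of a magic number plus a version field in
BTMetaPageData: every name and value below is invented, not from the patch.)

    #include "postgres.h"
    #include "storage/block.h"

    #define MINMAX_META_MAGIC	0x4D4D4958	/* arbitrary illustrative magic */
    #define MINMAX_VERSION		1

    typedef struct MinmaxMetaPageDataSketch
    {
    	uint32		mm_magic;		/* should contain MINMAX_META_MAGIC */
    	uint32		mm_version;		/* bumped on incompatible format changes */
    	BlockNumber mm_lastRevmapPage;	/* e.g. range reverse map location */
    } MinmaxMetaPageDataSketch;

On open, the code would check mm_magic and reject or upgrade pages whose
mm_version it doesn't understand, which is what makes later extensions safe.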
Re: [HACKERS] Wait free LW_SHARED acquisition
--On 27. September 2013 09:57:07 +0200 Andres Freund
and...@2ndquadrant.com wrote:

> Ok, was free:
> padding + 16 partitions:               tps = 147884.648416
> padding + 32 partitions:               tps = 141777.841125
> padding + 64 partitions:               tps = 141561.539790
> padding + 16 partitions + new lwlocks: tps = 601895.580903
> (yeha, still reproduces after some sleep!)

Hmm, out of interest, and since I have access to an (atm) free DL580 G7
(4x E7-4800, 10 cores each), I've tried your bench against this machine and
got this (best of three):

HEAD (default):
tps = 181738.607247 (including connections establishing)
tps = 182665.993063 (excluding connections establishing)

HEAD (padding + 16 partitions + your lwlocks patch applied):
tps = 269328.259833 (including connections establishing)
tps = 270685.666091 (excluding connections establishing)

So, still an improvement, but far away from what you got. Do you have some
other tweaks in your setup?

--
Thanks

Bernd
Re: [HACKERS] Wait free LW_SHARED acquisition
Hi,

On 2013-09-30 18:54:11 +0200, Bernd Helmle wrote:
> HEAD (default):
> tps = 181738.607247 (including connections establishing)
> tps = 182665.993063 (excluding connections establishing)
>
> HEAD (padding + 16 partitions + your lwlocks patch applied):
> tps = 269328.259833 (including connections establishing)
> tps = 270685.666091 (excluding connections establishing)
>
> So, still an improvement but far away from what you got. Do you have some
> other tweaks in your setup?

The only relevant setting changed was -c shared_buffers=1GB, no other
patches applied. At which scale did you pgbench -i? Your processors are a
different microarchitecture; I guess that could also explain some of the
difference. Any chance you could run a perf record -ag (after compiling with
-fno-omit-frame-pointer)?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
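(For onlookers, a conceptual sketch of the technique being benchmarked,
written in plain C11 atomics rather than the actual patch's primitives: a
shared acquisition becomes a single atomic fetch-add instead of a
spinlock-protected counter update, so concurrent readers no longer serialize
on one spinlock. All names and the exact state encoding are assumptions.)

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define EXCLUSIVE_BIT	((uint32_t) 1 << 31)

    typedef struct
    {
    	atomic_uint_fast32_t state;	/* high bit: exclusive; rest: share count */
    } lwlock_sketch;

    static bool
    shared_acquire(lwlock_sketch *lock)
    {
    	/* optimistically register as a shared holder, wait-free */
    	uint32_t	old = (uint32_t) atomic_fetch_add(&lock->state, 1);

    	if (old & EXCLUSIVE_BIT)
    	{
    		/* an exclusive holder got there first: undo and let caller sleep */
    		atomic_fetch_sub(&lock->state, 1);
    		return false;
    	}
    	return true;
    }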
Re: [HACKERS] Minmax indexes
On 30.09.2013 19:49, Alvaro Herrera wrote:
> David Fetter wrote:
>> On Mon, Sep 30, 2013 at 02:17:39PM +0300, Heikki Linnakangas wrote:
>>> What would it take to abstract the minmax indexes to allow maintaining
>>> a bounding box for points, instead of a plain min/max? Or for ranges.
>>> In other words, why is this restricted to b-tree operators?
>>
>> If I had to guess, I'd guess "first cut".
>
> Yeah, there were a few other simplifications in the design too, though I
> admit allowing for multidimensional datatypes hadn't occurred to me

You can almost create a bounding-box opclass in the current implementation,
by mapping the < operator to "contains" and > to "not contains". But there's
no support for creating a new, larger, bounding box on insert. It will just
replace the max with the new value if it's greater than, when it should
create a whole new value to store in the index that covers both the old and
the new values. (Or less than? I'm not sure which way those operators would
work..)

When you think of the general case, it's weird that the current
implementation requires storing both the min and the max. For a bounding
box, you store the bounding box that covers all heap tuples in the range.
If that corresponds to max, what does min mean?

In fact, even with regular b-tree operators, over integers for example, you
don't necessarily want to store both min and max. If you only ever perform
queries like WHERE col > ?, there's no need to track the min value. So to
make this really general, you should be able to create an index on only the
minimum or maximum. Or if you want both, you can store them as separate
index columns. Something like:

CREATE INDEX minindex ON table (col ASC);           -- For min
CREATE INDEX minindex ON table (col DESC);          -- For max
CREATE INDEX minindex ON table (col ASC, col DESC); -- For both

That said, in practice most people probably want to store both min and max.
Maybe it's a bit too finicky if we require people to write "col ASC, col
DESC" to get that. Some kind of a shorthand probably makes sense.

> (though I will guess Simon did think about it and just didn't tell me, to
> avoid me going overboard with stuff that would make the first version
> take forever).

Heh, and I ruined that great plan :-).

- Heikki
Re: [HACKERS] Compact commits and changeset extraction
On Mon, Sep 30, 2013 at 10:50 AM, Andres Freund and...@2ndquadrant.com wrote:
> Changeset extraction only works in the context of a single database but
> has to scan through xlog records from multiple databases. Most records
> are easy to skip because they contain the database in the relfilenode or
> are just not interesting for logical replication. The only exception is
> compact commits. So we have some alternatives:
>
> 1) Don't do anything; in that case empty transactions will get replayed,
>    since the changes themselves will get skipped.
> 2) Don't use compact commits if wal_level=logical.
> 3) Unify compact and non-compact commits, trying to make the normal one
>    smaller.
>
> For 3) I am thinking of using 'xinfo' to store whether we have the other
> information or not. E.g. if there are subxacts in a compact commit, we
> signal that by the flag XACT_COMMIT_CONTAINS_SUBXACTS and store the
> number of subxacts after the xlog record. Similarly with relations,
> invalidation messages and the database id. That should leave compact
> commits without any subxacts at the former size, and those with subxacts
> at the former size + 4. Normal commits would get smaller in many cases,
> since we don't store the empty fields.
>
> I personally think 3) is the best solution. Any other opinions?

What's wrong with #1?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sep 30, 2013 4:39 AM, Sameer Thakur samthaku...@gmail.com wrote:
>> Also, for onlookers, I have changed this patch around to do the
>> date-oriented stuff but want to look it over before stapling it up and
>> sending it. If one cannot wait, one can look at
>> https://github.com/fdr/postgres/tree/queryid. [...]
>
> I did the following
>
> 1. Forked from fdr/postgres
> 2. cloned branch queryid
> 3. squashed 22899c802571a57cfaf0df38e6c5c366b5430c74 and
>    d813096e29049667151a49fc5e5cf3d6bbe55702, picked
>    be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5
> 4. usual make/make install/create extension pg_stat_statements
>    (pg_stat_statements.max=100).
> 5. select * from pg_stat_statements_reset(); select * from
>    pgbench_tellers; [output as in the previous message]
>
> I understand session_start and verified that it changes with each
> database restart to reflect current time.

It should only restart when the statistics file cannot be loaded.

> I am not sure why introduced keeps showing the same
> 1970-01-01 05:30:00+05:30 value. I thought it reflected the (most recent)
> time query statements statistics is added to hashtable. Is this a bug?
> Will continue to test and try and understand the code.

Yes, a bug. There are a few calls to pgss_store, and I must be submitting a
zero value for the introduction time in one of those cases. Heh, I thought
that was fixed, but maybe I broke something. Like I said: preliminary.

At the earliest I can look at this Wednesday, but feel free to amend and
resubmit including my changes if you feel inclined and get to it first.
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
On Mon, Sep 30, 2013 at 9:07 PM, Peter Eisentraut pete...@gmx.net wrote:
> On 9/28/13 3:05 AM, Amit Kapila wrote:
>> Now as we have an agreement, I had updated patch for below left issues:
>
> Regression tests are failing.

Thanks for informing. I am sorry for not running the regression tests before
sending the patch. The reason for the failure was that the source for the
GUC in the new function validate_conf_option() was hardcoded to PGC_S_FILE,
which was okay for ALTER SYSTEM but not for the SET path. I have added a new
"source" parameter to this function, and the value of source is passed in
when it is called from the SET path.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

alter_system_v9.patch (binary attachment)
Re: [HACKERS] review: psql and pset without any arguments
On 30/09/2013 17:35, Peter Eisentraut wrote:
> Please remove the tabs from the SGML files.

Done. I've also fixed the typo reported by Ian. Here is the attached v4
patch. Thanks a lot for your review.

Regards,

--
Gilles Darold
Administrateur de bases de données
http://dalibo.com - http://dalibo.org

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 574db5c..98a4221 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2272,13 +2272,10 @@ lo_import 152801
    </para>
   </tip>

-  <note>
   <para>
-   It is an error to call <command>\pset</command> without any
-   arguments. In the future this case might show the current status
+   <command>\pset</command> without any arguments displays current status
    of all printing options.
   </para>
-  </note>

  </listitem>
 </varlistentry>

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 10e9f64..3fc0728 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -68,6 +68,7 @@ static int	strip_lineno_from_funcdesc(char *func);
 static void minimal_error_message(PGresult *res);

 static void printSSLInfo(void);
+static bool printPsetInfo(const char *param, struct printQueryOpt *popt);

 #ifdef WIN32
 static void checkWin32Codepage(void);
@@ -1045,8 +1046,20 @@ exec_command(const char *cmd,

 		if (!opt0)
 		{
-			psql_error("\\%s: missing required argument\n", cmd);
-			success = false;
+			size_t i;
+			/* list all variables */
+			static const char *const my_list[] = {
+				"border", "columns", "expanded", "fieldsep",
+				"footer", "format", "linestyle", "null",
+				"numericlocale", "pager", "recordsep",
+				"tableattr", "title", "tuples_only",
+				NULL };
+			for (i = 0; my_list[i] != NULL; i++) {
+				printPsetInfo(my_list[i], &pset.popt);
+			}
+
+			success = true;
+
 		}
 		else
 			success = do_pset(opt0, opt1, &pset.popt, pset.quiet);
@@ -2275,8 +2288,6 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 			return false;
 		}

-		if (!quiet)
-			printf(_("Output format is %s.\n"), _align2string(popt->topt.format));
 	}

 	/* set table line style */
@@ -2296,9 +2307,6 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 			return false;
 		}

-		if (!quiet)
-			printf(_("Line style is %s.\n"),
-				   get_line_style(&popt->topt)->name);
 	}

 	/* set border style/width */
@@ -2307,8 +2315,6 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 		if (value)
 			popt->topt.border = atoi(value);

-		if (!quiet)
-			printf(_("Border style is %d.\n"), popt->topt.border);
 	}

 	/* set expanded/vertical mode */
@@ -2320,15 +2326,6 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 			popt->topt.expanded = ParseVariableBool(value);
 		else
 			popt->topt.expanded = !popt->topt.expanded;
-		if (!quiet)
-		{
-			if (popt->topt.expanded == 1)
-				printf(_("Expanded display is on.\n"));
-			else if (popt->topt.expanded == 2)
-				printf(_("Expanded display is used automatically.\n"));
-			else
-				printf(_("Expanded display is off.\n"));
-		}
 	}

 	/* locale-aware numeric output */
@@ -2338,13 +2335,6 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 			popt->topt.numericLocale = ParseVariableBool(value);
 		else
 			popt->topt.numericLocale = !popt->topt.numericLocale;
-		if (!quiet)
-		{
-			if (popt->topt.numericLocale)
-				puts(_("Showing locale-adjusted numeric output."));
-			else
-				puts(_("Locale-adjusted numeric output is off."));
-		}
 	}

 	/* null display */
@@ -2355,8 +2345,6 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 			free(popt->nullPrint);
 			popt->nullPrint = pg_strdup(value);
 		}
-		if (!quiet)
-			printf(_("Null display is \"%s\".\n"), popt->nullPrint ?
-				   popt->nullPrint : "");
 	}

 	/* field separator for unaligned text */
@@ -2368,13 +2356,6 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 			popt->topt.fieldSep.separator = pg_strdup(value);
 			popt->topt.fieldSep.separator_zero = false;
 		}
-		if (!quiet)
-		{
-			if (popt->topt.fieldSep.separator_zero)
-				printf(_("Field separator is zero byte.\n"));
-			else
-				printf(_("Field separator is \"%s\".\n"), popt->topt.fieldSep.separator);
-		}
 	}

 	else if (strcmp(param, "fieldsep_zero") == 0)
@@ -2382,8 +2363,6 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 		free(popt->topt.fieldSep.separator);
 		popt->topt.fieldSep.separator = NULL;
 		popt->topt.fieldSep.separator_zero = true;
-		if (!quiet)
-			printf(_("Field separator is zero byte.\n"));
 	}

 	/* record separator for unaligned text */
@@ -2395,15 +2374,6 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 			popt->topt.recordSep.separator = pg_strdup(value);
 			popt->topt.recordSep.separator_zero = false;
 		}
-		if
Re: [HACKERS] pluggable compression support
I've been following this issue these last few months. Having the latest and
best compressors built in is a fashionable feature these days, and for good
reasons. I'm quite amazed that this issue is still considered a legal risk.

To put this in perspective, the *whole world* is using LZ4 by now. It's
integrated directly into the Linux kernel for ARM, which means every
smartphone on the planet will have this piece of software integrated right
at the factory. And that's not just Linux: HBase has it. TokuDB has it.
Delphix has it. And PostgreSQL is stuck with what, pglz?

How come any compressor which could put some competition to pglz is
systematically pushed out of the field on the grounds of unverifiable legal
risks? And why would pglz be much safer against the very same risks? From
what I can see, pglz is more complex, and therefore exposed to many more
patent risks, than simpler LZ alternatives. The debate seems overly biased
in favor of pglz.
Re: [HACKERS] Compact commits and changeset extraction
On 2013-09-30 14:22:22 -0400, Robert Haas wrote: On Mon, Sep 30, 2013 at 10:50 AM, Andres Freund and...@2ndquadrant.com wrote: Changeset extraction only works in the context of a single database but has to scan through xlog records from multiple databases. Most records are easy to skip because they contain the database in the relfilenode or are just not interesting for logical replication. The only exception is compact commits. So we have some alternatives:
1) Don't do anything; in that case empty transactions will get replayed, since the changes themselves will get skipped.
2) Don't use compact commits if wal_level=logical.
3) Unify compact and non-compact commits, trying to get the normal one smaller.
For 3) I am thinking of using 'xinfo' to store whether we have the other information or not. E.g. if there are subxacts in a compact commit we signal that by the flag 'XACT_COMMIT_CONTAINS_SUBXACTS' and store the number of subxacts after the xlog record. Similarly with relations, invalidation messages and the database id. That should leave compact commits without any subxacts at their former size, and those with subxacts at the former size + 4. Normal commits would get smaller in many cases since we don't store the empty fields. I personally think 3) is the best solution; any other opinions? What's wrong with #1? It seems confusing that a changeset stream in database #1 will contain commits (without corresponding changes) from database #2. Seems like a POLA violation to me. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
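For concreteness, a rough sketch of what the unified record in option 3 could look like. XACT_COMMIT_CONTAINS_SUBXACTS is the flag named above; the struct shape, the other flag names, and the typedef stand-ins are my assumptions, not the actual patch:

#include <stdint.h>

typedef uint32_t TransactionId;		/* stand-ins for the backend typedefs */
typedef int64_t TimestampTz;

#define XACT_COMMIT_CONTAINS_SUBXACTS	(1U << 0)	/* named in the mail above */
#define XACT_COMMIT_CONTAINS_RELS		(1U << 1)	/* hypothetical */
#define XACT_COMMIT_CONTAINS_INVALS		(1U << 2)	/* hypothetical */
#define XACT_COMMIT_CONTAINS_DBINFO		(1U << 3)	/* hypothetical */

typedef struct xl_xact_commit_unified
{
	TimestampTz	xact_time;	/* commit timestamp */
	uint32_t	xinfo;		/* which optional blocks follow the record */
	/*
	 * Followed, in flag order and only when the matching bit is set, by:
	 *   uint32 nsubxacts;  TransactionId subxacts[nsubxacts];
	 *   uint32 nrels;      dropped relfilenodes;
	 *   uint32 nmsgs;      invalidation messages;
	 *   Oid dbId; Oid tsId;
	 * A compact commit with no subxacts stays at its former size; one with
	 * subxacts grows by 4 bytes for the count, matching the mail above.
	 */
} xl_xact_commit_unified;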
Re: [HACKERS] logical changeset generation v6.1
Andres Freund and...@2ndquadrant.com wrote: Attached you can find an updated version of the series taking in some of the review comments. I don't know whether this is related to the previously-reported build problems, but when I apply each patch in turn, with make -j4 world and make check-world for each step, I die during compile of 0004.

make[4]: Entering directory `/home/kgrittn/pg/master/src/backend/access/transam'
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o xlog.o xlog.c -MMD -MP -MF .deps/xlog.Po
xlog.c:44:33: fatal error: replication/logical.h: No such file or directory
compilation terminated.
make[4]: *** [xlog.o] Error 1

I tried maintainer-clean and a new ./configure to see if that would get me past it; no joy. I haven't dug further, but if this is not a known issue I can poke around. If it is known -- how do I get past it? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch for fast gin cache performance improvement
Hi Etsuro, Sorry for the delay but I have been very busy with work. I have been away from postgres for a while, so I will need a little time to review the code and make sure I give you an informed response. I'll get back to you as soon as I am able. Thanks for understanding. Ian Link

Etsuro Fujita Friday, September 27, 2013 2:24 AM
I wrote: I had a look over this patch. I think this patch is interesting and very useful. Here are my review points: 8. I think there are no issues in this patch. However, I have one question: how does this patch work in the case where gin_fast_limit/fast_cache_size = 0? In this case, in my understanding, this patch inserts new entries into the pending list temporarily and immediately moves them to the main GIN data structure using ginInsertCleanup(). Am I right? If so, that is obviously inefficient. Sorry, there are incorrect expressions in that question. I mean gin_fast_limit > 0 and fast_cache_size = 0. Although I asked this question, I've reconsidered these parameters, and it seems that they not only make the code rather complex but are a little confusing to users. So I'd like to propose introducing only one parameter: fast_cache_size. Users that give weight to update performance for the fast update technique would set this parameter to a large value, while users that care not only about update performance but also about search performance would set it to a small value. What do you think about this? Thanks, Best regards, Etsuro Fujita

Etsuro Fujita Thursday, September 26, 2013 6:02 AM
Hi Ian, This patch contains a performance improvement for the fast gin cache. As you may know, the performance of the fast gin cache decreases with its size. Currently, the size of the fast gin cache is tied to work_mem. The size of work_mem can often be quite high. The large size of work_mem is inappropriate for the fast gin cache size. Therefore, we created a separate cache size called gin_fast_limit. This global variable controls the size of the fast gin cache, independently of work_mem. Currently, the default gin_fast_limit is set to 128kB. However, that value could need tweaking. 64kB may work better, but it's hard to say with only my single machine to test on. On my machine, this patch results in a nice speed up. Our test queries improve from about 0.9 ms to 0.030 ms. Please feel free to use the test case yourself: it should be attached. I can look into additional test cases (tsvectors) if anyone is interested. In addition to the global limit, we have provided a per-index limit: fast_cache_size. This per-index limit begins at -1, which means that it is disabled. If the user does not specify a per-index limit, the index will simply use the global limit. I had a look over this patch. I think this patch is interesting and very useful. Here are my review points:
1. Patch applies cleanly.
2. make, make install and make check are good.
3. I did a performance evaluation using your test queries with 64kB and 128kB of gin_fast_limit (or fast_cache_size), and saw that both values achieved performance gains over gin_fast_limit = '256MB'. 64kB worked better than 128kB: 64kB improved from 1.057 ms to 0.075 ms. Great!
4. In my understanding, a small value of gin_fast_limit/fast_cache_size leads to an increase in GIN search performance but a decrease in GIN update performance. Am I right? If so, I think the tradeoff should be noted in the documentation.
5. The following documents in Chapter 57. GIN Indexes need to be updated:
* 57.3.1. GIN Fast Update Technique
* 57.4. GIN Tips and Tricks
6. I would like to see the results for the additional test cases (tsvectors).
7. The commented-out elog() code should be removed.
8. I think there are no issues in this patch. However, I have one question: how does this patch work in the case where gin_fast_limit/fast_cache_size = 0? In this case, in my understanding, this patch inserts new entries into the pending list temporarily and immediately moves them to the main GIN data structure using ginInsertCleanup(). Am I right? If so, that is obviously inefficient. Sorry for the delay. Best regards, Etsuro Fujita

Ian Link Monday, June 17, 2013 9:42 PM
This patch contains a performance improvement for the fast gin cache. As you may know, the performance of the fast gin cache decreases with its size. Currently, the size of the fast gin cache is tied to work_mem. The size of work_mem can often be quite high. The large size of work_mem is inappropriate for the fast gin cache size. Therefore, we created a separate cache size called gin_fast_limit. This global variable controls the size of the fast gin cache, independently of work_mem. Currently, the default gin_fast_limit is set to 128kB. However, that value could need tweaking. 64kB may work better, but it's hard to say with only my single machine to test on. On my machine, this
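For reference, a sketch of how the two knobs would be used; the GUC and reloption names are from the patch as described above, but the exact syntax is my assumption:

-- global limit for the fast GIN pending-list cache
SET gin_fast_limit = '64kB';

-- per-index override; -1 (the default) means fall back to the global limit
CREATE INDEX words_idx ON docs USING gin (words) WITH (fast_cache_size = '64kB');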
Re: [HACKERS] Wait free LW_SHARED acquisition
--On 30. September 2013 19:00:06 +0200 Andres Freund and...@2ndquadrant.com wrote: HEAD (default): tps = 181738.607247 (including connections establishing) tps = 182665.993063 (excluding connections establishing) HEAD (padding + 16 partitions + your lwlocks patch applied): tps = 269328.259833 (including connections establishing) tps = 270685.666091 (excluding connections establishing) So, still an improvement but far away from what you got. Do you have some other tweaks in your setup? The only relevant setting changed was -c shared_buffers=1GB, no other patches applied. At which scale did you pgbench -i? I've used a scale factor of 10 (I recall you've mentioned using the same upthread...). Okay, I've used 2GB shared buffers; repeating with your setting I get a far more noticeable speedup:

tps = 346292.008580 (including connections establishing)
tps = 347997.073595 (excluding connections establishing)

Here's the perf output:

+   4.34%  207112  postgres  postgres           [.] AllocSetAlloc
+   4.07%  194476  postgres  libc-2.13.so       [.] 0x127b33
+   2.59%  123471  postgres  postgres           [.] SearchCatCache
+   2.49%  118974  pgbench   libc-2.13.so       [.] 0x11aaef
+   2.48%  118263  postgres  postgres           [.] GetSnapshotData
+   2.46%  117646  postgres  postgres           [.] base_yyparse
+   2.02%   96546  postgres  postgres           [.] MemoryContextAllocZeroAligned
+   1.58%   75326  postgres  postgres           [.] AllocSetFreeIndex
+   1.23%   58587  postgres  postgres           [.] hash_search_with_hash_value
+   1.01%   48391  postgres  postgres           [.] palloc
+   0.93%   44258  postgres  postgres           [.] LWLockAttemptLock
+   0.91%   43575  pgbench   libc-2.13.so       [.] free
+   0.89%   42484  postgres  postgres           [.] nocachegetattr
+   0.89%   42378  postgres  postgres           [.] core_yylex
+   0.88%   42001  postgres  postgres           [.] _bt_compare
+   0.84%   39997  postgres  postgres           [.] expression_tree_walker
+   0.76%   36533  postgres  postgres           [.] ScanKeywordLookup
+   0.74%   35515  pgbench   libc-2.13.so       [.] malloc
+   0.64%   30715  postgres  postgres           [.] LWLockRelease
+   0.56%   26779  postgres  postgres           [.] fmgr_isbuiltin
+   0.54%   25681  pgbench   [kernel.kallsyms]  [k] _spin_lock
+   0.48%   22836  postgres  postgres           [.] new_list
+   0.48%   22700  postgres  postgres           [.] hash_any
+   0.47%   22378  postgres  postgres           [.] FunctionCall2Coll
+   0.46%   22095  postgres  postgres           [.] pfree
+   0.44%   20929  postgres  postgres           [.] palloc0
+   0.43%   20592  postgres  postgres           [.] AllocSetFree
+   0.40%   19495  postgres  [unknown]          [.] 0x81cf2f
+   0.40%   19247  postgres  postgres           [.] hash_uint32
+   0.38%   18227  postgres  postgres           [.] PinBuffer
+   0.38%   18022  pgbench   [kernel.kallsyms]  [k] do_select

-- Thanks Bernd -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
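For anyone wanting to reproduce: the setup, as far as the thread states it, amounts to something like the lines below; the -c/-j/-T values are guesses on my part, not numbers from the thread:

$ pgbench -i -s 10 bench                  # scale factor 10, as discussed
$ postgres -c shared_buffers=1GB ...      # the one relevant non-default setting
$ pgbench -c 64 -j 64 -S -T 60 bench      # read-only run; client/thread counts hypothetical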
Re: [HACKERS] pgbench - exclude pthread_create() from connection start timing
On Thu, Sep 26, 2013 at 01:41:01PM +0200, Fabien COELHO wrote: I don't get it; why is taking the time just after pthread_create() more sane than taking it just before pthread_create()? Thread create time seems to be expensive as well, maybe up to 0.1 seconds under some conditions (?). Under --rate, this create delay means that throttling is lagging behind schedule by about that time, so all the first transactions are trying to catch up. threadRun() already initializes throttle_trigger with a fresh timestamp. Please detail how the problem remains despite that. [Establishing a connection is] typically far more expensive than pthread_create(). The patch for threaded pgbench made the decision to account for pthread_create() as though it were part of establishing the connection. You're proposing to not account for it at all. Both of those designs are reasonable to me, but I do not comprehend the benefit you anticipate from switching from one to the other. -j 800 vs -j 100: ISTM that if you create more threads, the time delay incurred is cumulative, so the strangeness of the result should worsen. Not in general; we do one INSTR_TIME_SET_CURRENT() per thread, just before calling pthread_create(). However, thread 0 is a special case; we set its start time first and actually start it last. Your observation of cumulative delay fits those facts. Yep, that must be thread 0 which has a very large delay. I think it is simpler that each thread records its start time when it has started, without exception. Initializing the thread-0 start time later, just before calling its threadRun(), should clear this anomaly without changing other aspects of the measurement. Always taking the thread start time when the thread is started does solve the issue as well, and it is homogeneous for all cases, so the solution I suggest seems reasonable and simple. To exercise the timing semantics before and after your proposed change, I added a sleep(1); before the pthread_create() call. Here are the results with and without -j, with and without pgbench-measurements-v5.patch:

$ echo 'select 1' > test.sql
# just the sleep(1) addition
$ env time pgbench -c4 -t1000 -S -n -f test.sql | grep tps
tps = 6784.410104 (including connections establishing)
tps = 7094.701854 (excluding connections establishing)
0.03user 0.07system 0:00.60elapsed 16%CPU (0avgtext+0avgdata 0maxresident)k
$ env time pgbench -j4 -c4 -t1000 -S -n -f test.sql | grep tps
tps = 1224.327010 (including connections establishing)
tps = 2274.160899 (excluding connections establishing)
0.02user 0.03system 0:03.27elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k

# w/ pgbench-measurements-v5.patch
$ env time pgbench -c4 -t1000 -S -n -f test.sql | grep tps
tps = 6792.393877 (including connections establishing)
tps = 7207.142278 (excluding connections establishing)
0.08user 0.06system 0:00.60elapsed 23%CPU (0avgtext+0avgdata 0maxresident)k
$ env time pgbench -j4 -c4 -t1000 -S -n -f test.sql | grep tps
tps = 1212.040409 (including connections establishing)
tps = 1214.728830 (excluding connections establishing)
0.09user 0.06system 0:03.31elapsed 4%CPU (0avgtext+0avgdata 0maxresident)k

Rather than, as I supposed before, excluding the cost of thread start entirely, pgbench-measurements-v5.patch has us count pthread_create() as part of the main runtime. I now see the cumulative delay you mentioned, but pgbench-measurements-v5.patch does not fix it. The problem is centered on the fact that pgbench.c:main() calculates a single total_time and models each thread as having run for that entire period.
If pthread_create() is slow, reality diverges considerably from that model; some threads start notably late, and other threads finish notably early. The threadRun() runtime intervals in the last test run above are actually something like this:

thread 1: 1.0s - 1.3s
thread 2: 2.0s - 2.3s
thread 3: 3.0s - 3.3s
thread 0: 3.0s - 3.3s

Current pgbench instead models every thread as having run 0.0s - 3.3s, hence the numbers reported. To make the numbers less surprising, we could axe the global total_time=end_time-start_time and instead accumulate total_time on a per-thread basis, just as we now accumulate conn_time on a per-thread basis. That ceases charging threads for time spent not-yet-running or already-finished, but it can add its own inaccuracy. Performance during a period in which some clients have yet to start is not interchangeable with performance when all clients are running. pthread_create() slowness would actually make the workload seem to perform better. An alternate strategy would be to synchronize the actual start of command issuance across threads. All threads would start and make their database connections, then signal readiness. Once the first thread notices that every other thread is ready, it would direct them to actually start issuing queries. This might minimize the result skew problem of the first strategy. A third strategy is to just add a comment and write
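A minimal standalone sketch of the second strategy, synchronizing the start of command issuance with a pthread barrier; this is illustrative, not pgbench's actual structure, and the printf calls stand in for connecting and running queries:

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static pthread_barrier_t start_barrier;

static void *
thread_run(void *arg)
{
	int			tid = (int) (long) arg;

	/* expensive per-thread setup (in pgbench: making connections) */
	printf("thread %d connected\n", tid);

	/* no thread issues queries until every thread finishes setup */
	pthread_barrier_wait(&start_barrier);

	printf("thread %d issuing queries\n", tid);	/* timed work starts here */
	return NULL;
}

int
main(void)
{
	pthread_t	threads[NTHREADS];
	int			i;

	pthread_barrier_init(&start_barrier, NULL, NTHREADS);
	for (i = 0; i < NTHREADS; i++)
		pthread_create(&threads[i], NULL, thread_run, (void *) (long) i);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(threads[i], NULL);
	pthread_barrier_destroy(&start_barrier);
	return 0;
}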
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Mon, Sep 30, 2013 at 8:32 AM, Robert Haas robertmh...@gmail.com wrote: I doubt that any change to HeapTupleSatisfiesMVCC() will be acceptable. This feature needs to restrain itself to behavior changes that only affect users of this feature, I think. I agree with the principle of what you're saying, but I'm not aware that those changes to HeapTupleSatisfiesMVCC() imply any behavioral changes for those not using the feature. Well, at a minimum, it's a performance worry. Those functions are *hot*. Single branches do matter there. Well, that certainly is a reasonable concern. Offhand, I suspect that branch prediction helps immensely. But even if it doesn't, couldn't it be the case that returning earlier there actually helps? Where we have a real xid (so TransactionIdIsCurrentTransactionId() must do more than a single test of a scalar variable), and the row is locked *only* (which is already very cheap to check - it's another scalar variable that we already test in a few other places in that function), isn't there on average a high chance that the tuple ought to be visible to our snapshot anyway? I am starting to wonder if it's really necessary to have a blessed update that can see the locked, not-otherwise-visible tuple. If we're not going to just error out over the invisible tuple the user needs some way to interact with it. The details are negotiable. I think that we will error out over an invisible tuple with higher isolation levels. Certainly, what we do there today instead of EvalPlanQual() looping is consistent with that behavior. Couldn't EvalPlanQual() be said to be an MVCC violation on similar grounds? It also reaches into the future. Locking a row isn't really that distinct from updating it in terms of the code footprint, but also from a logical perspective. Yes, EvalPlanQual() is definitely an MVCC violation. So I think that you can at least see why I'd consider that the two (my tweaks to HeapTupleSatisfiesMVCC() and EvalPlanQual()) are isomorphic. It just becomes the job of this new locking infrastructure to worry about the would-be invisibility of the locked tuple, and raise a serialization error accordingly at higher isolation levels. The important observation here is that an updater, in effect, locks both the old and new sets of values (for non-HOT updates). And as I've already noted, any practical value locking implementation isn't going to be able to prevent the update from immediately locking the old, because that doesn't touch an index. Hence, there is an irresolvable disconnect between value and row locking. This part I don't follow. "Locking the old"? What "irresolvable disconnect"? I mean, they're different things; I get *that*. Well, if you update a row, the old row version's values are locked, in the sense that any upserter interested in inserting the same values as the old version is going to have to block pending the outcome of the updating xact. The disconnect is that any attempt at a clever dance, to interplay value and row locking such that this definitely just works first time, seems totally futile - I'm emphasizing this because it's the obvious way to approach this basic problem. It turns out that it could only be done at great expense, in a way that would immediately be dismissed as totally outlandish. OK, I take your point, I think. The existing system already acquires value locks when a tuple lock is held, during an UPDATE, and we can't change that. Right.
I think that re-ordering is an important property of any design where we cannot bail out with serialization failures. I worry about the behavior being confusing and hard to understand in the presence of multiple unique indexes and reordering. Perhaps I simply don't understand the problem domain well-enough yet. It's only confusing if you are worried about what concurrent sessions do with respect to each other at this low level. In which case, just use a higher isolation level and pay the price. I'm not introducing any additional anomalies described and prohibited by the standard by doing this, and indeed the order of retrying in the event of a conflict today is totally undefined, so this line of thinking is not inconsistent with how things work today. Today, strictly speaking some unique constraint violations might be more appropriate as serialization failures. So with this new functionality, when used, they're going to be actual serialization failures where that's appropriate, where we'd otherwise go do something else other than error. Why burden read committed like that? (Actually, fwiw I suspect that currently the SSI guarantees *can* be violated with unique retry re-ordering, but that's a whole other story, and is pretty subtle). Let me come right out and say it: Yes, part of the reason that I'm taking this line is because it's convenient to my implementation from a number of different perspectives. But one of those perspectives is that it will help performance
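To ground the discussion, the kind of statement being debated, per the thread's subject line; the table and the follow-up UPDATE are illustrative, and the final syntax was still in flux:

INSERT INTO counters (key, val)
VALUES ('k1', 1)
ON DUPLICATE KEY LOCK FOR UPDATE;

-- if a conflicting row was found and locked rather than inserted,
-- the application follows up within the same transaction:
UPDATE counters SET val = val + 1 WHERE key = 'k1';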
Re: [HACKERS] dynamic shared memory
On Tue, Sep 24, 2013 at 08:58:36AM +0530, Amit Kapila wrote: On Tue, Sep 24, 2013 at 12:32 AM, Noah Misch n...@leadboat.com wrote: I don't know whether writing it as binary will help or hurt that situation. If nothing else, binary gives you one less variation to think about when studying the code. In that case, shouldn't all other places be consistent? One reason I had in mind for using the appropriate mode is that somebody reading the code could tomorrow come up with a question or a patch to use the correct mode, and then we will again be in the same situation. There are cases that must use binary I/O (table data files), cases that benefit notably from text I/O (log files, postgresql.conf), and cases where it doesn't matter too much (dsm state file, postmaster.pid). I don't see a need to make widespread changes to other call sites. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
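A tiny illustration of the practical difference between the two modes, not code from the patch:

#include <stdio.h>

int
main(void)
{
	FILE	   *fb = fopen("data.bin", "wb");	/* binary: bytes round-trip exactly */
	FILE	   *ft = fopen("server.log", "w");	/* text: "\n" becomes "\r\n" on Windows */

	if (fb) { fputc('\n', fb); fclose(fb); }	/* always writes 1 byte */
	if (ft) { fputc('\n', ft); fclose(ft); }	/* may write 2 bytes on Windows */
	return 0;
}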
Re: [HACKERS] [PERFORM] Cpu usage 100% on slave. s_lock problem.
On Sat, Sep 28, 2013 at 12:53 AM, Andres Freund and...@2ndquadrant.com wrote: What confuses me is that pg_read_barrier() is just a compiler barrier on x86[-64] in barrier.h. According to my knowledge it needs to be an lfence or the full barrier? The linked papers from Paul McKenney - which are a great read - seem to agree? x86 provides a pretty strong memory model; the only (relevant) allowed reordering is that stores may be delayed beyond subsequent loads. A simple and functionally accurate model of this is to ignore all caches and think of the system as a bunch of CPUs accessing the main memory through a shared bus, where each CPU has a small buffer that stores writes done by that CPU. Intel and AMD have performed such feats of black magic in the memory pipelines that this isn't a good performance model, but for correctness it is valid. The exceptions are a few unordered memory instructions that you have to specifically ask for, and non-writeback memory that concerns the kernel. (See section 8.2 in [1] for details.) The note by McKenney stating that lfence is required for a read barrier is for those unordered instructions. I don't think it's necessary in PostgreSQL. As for the specific patch being discussed here: the read barrier is in the wrong place and with the wrong comment, and the write side is assuming that SpinLockAcquire() is a write barrier, which it isn't documented to do at the moment. The correct way to think of this is that StartupXLOG() does a bunch of state modifications and then advertises the fact that it's done by setting xlogctl->SharedRecoveryInProgress = false; The state modifications should better be visible to anyone seeing that last write, so you need one write barrier between the state modifications and setting the flag. On the other side, after reading the flag as false in RecoveryInProgress() the state modified by StartupXLOG() may be investigated, and those loads better not happen before we read the flag. So we need a read barrier somewhere *after* reading the flag in RecoveryInProgress() and before reading the shared memory structures, and in theory a full barrier if we are going to be writing data. In practice x86 is covered thanks to its memory model, Power is covered thanks to the control dependency, and ARM would need a read barrier, but I don't think any real ARM CPU does speculative stores, as that would be insane. [1] http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.pdf Regards, Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
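Putting that prescription into a compilable sketch: the struct, field, and function names here are illustrative stand-ins, not the actual xlog.c code, and the barrier macros are defined as plain compiler fences, which per the above is all x86 needs (the real definitions live in storage/barrier.h):

#include <stdbool.h>

#define pg_write_barrier()	__asm__ __volatile__("" ::: "memory")
#define pg_read_barrier()	__asm__ __volatile__("" ::: "memory")

typedef struct XLogCtlSketch
{
	long		recovery_state;	/* stand-in for state set up by StartupXLOG() */
	volatile bool SharedRecoveryInProgress;
} XLogCtlSketch;

static XLogCtlSketch ctl = {0, true};

static void
startup_xlog_finish(void)		/* writer side, end of StartupXLOG() */
{
	ctl.recovery_state = 42;	/* ... state modifications ... */
	pg_write_barrier();			/* publish the state before the flag */
	ctl.SharedRecoveryInProgress = false;
}

static bool
recovery_in_progress(void)		/* reader side */
{
	if (!ctl.SharedRecoveryInProgress)
	{
		pg_read_barrier();		/* don't load shared state before the flag */
		/* now safe to inspect ctl.recovery_state */
		return false;
	}
	return true;
}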
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Mon, Sep 30, 2013 at 3:45 PM, Peter Geoghegan p...@heroku.com wrote: If you think it's a bit odd that we lock every value while the user essentially has one constraint in mind when writing their DML, consider: I should add to that list: 4) Locking all the values at once is necessary for the behavior of the locking to be well-defined -- I feel we need to know that some exact tuple is to blame (according to our well defined ordering for checking unique indexes for conflicts) for at least one instant in time. Given that we need to be the first to change the row without anything being altered to it, this ought to be sufficient. If you think it's bad that some other session can come in and insert a tuple that would have caused us to decide differently (before *our* transaction commits but *after* we've inserted), now you're into blaming the *wrong* tuple in the future, and I can't get excited about that - we always prefer a tuple normally visible to our snapshot, but if forced to (if there is none) we just throw a serialization failure (where appropriate). So for read committed you can have no *principled* beef with this, but for serializable you're going to naturally prefer the currently-visible tuple generally (that's the only correct behavior there that won't error - there *better* be something visible). Besides, the way the user tacitly has to use the feature with one particular constraint in mind kind of implies that this cannot happen... -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Freezing without write I/O
Just found this in my drafts folder. Sorry for the late response. On Fri, Sep 20, 2013 at 3:32 PM, Robert Haas robertmh...@gmail.com wrote: I am entirely unconvinced that we need this. Some people use acquire + release fences, some people use read + write fences, and there are all combinations (read-acquire, read-release, read-acquire-release, write-acquire, write-release, write-acquire-release, both-acquire, both-release, both-acquire-release). I think we're going to have enough trouble deciding between the primitives we already have, let alone with a whole mess of new ones. Mistakes will only manifest themselves on certain platforms, and in many cases the races are so tight that actual failures are very unlikely to be observed in regression testing. I have to retract my proposal to try to emulate C11 atomics in C89. I guess this goes to show why one shouldn't try to conjure up atomic APIs at 2AM. I forgot the fact that while acquire-release semantics are enough to ensure sequentially consistent behavior, the actual barriers need to be paired with specific atomic accesses to achieve that. It's not possible to use freestanding acquire/release barriers to implement this, nor would it be possible to include barriers in the atomic accesses themselves without causing significant pessimization. Sequential consistency (along with the causal transitivity and total store ordering that come with it) should be regarded as a goal. I'm not able to reason about concurrent programs without those guarantees, and I suspect no one else is either. Sequential consistency is guaranteed if atomic variables (including locks) are accessed with appropriate acquire and release semantics. We just need to use a hodge-podge of read/write/full barriers and volatile memory accesses to actually implement the semantics until some distant future date where we can start relying on compilers getting it right. I still think we should have a macro for the volatile memory accesses. As a rule, each one of those needs a memory barrier, and if we consolidate them, or optimize them out, the considerations why this is safe should be explained in a comment. Race-prone memory accesses should stick out like a sore thumb. Personally, I think the biggest change that would help here is to mandate that spinlock operations serve as compiler fences. That would eliminate the need for scads of volatile references all over the place. +1. The commits that you showed fixing issues in this area show quite well why this is a good idea. Regards, Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
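To make the spinlocks-as-compiler-fences idea concrete, a minimal sketch using GCC builtins; this is an illustration, not PostgreSQL's actual s_lock implementation:

typedef volatile int slock_t;

static inline void
spin_lock(slock_t *lock)
{
	/*
	 * __sync_lock_test_and_set is an acquire barrier and, like all GCC
	 * __sync builtins, a compiler fence: no shared value can stay cached
	 * in a register across this point.
	 */
	while (__sync_lock_test_and_set(lock, 1))
		;						/* spin */
}

static inline void
spin_unlock(slock_t *lock)
{
	__sync_lock_release(lock);	/* release barrier + compiler fence */
}

With acquire and release both acting as compiler fences, data protected by the lock needs no volatile qualification of its own, which is exactly the scads-of-volatile-references problem being described.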
Re: [HACKERS] Suggestion: Issue warning when calling SET TRANSACTION outside transaction block
On Sun, Sep 29, 2013 at 11:40:51AM +0530, Amit Kapila wrote: Shouldn't we do it for Set Constraints as well? Oh, very good point. I missed that one. Updated patch attached. I am glad you are seeing things I am not. :-)

1. The function set_config also needs similar functionality, else there will be an inconsistency: the SQL statement will give an error but the equivalent function set_config() will succeed.

SQL command:
postgres=# set local search_path='public';
ERROR:  SET LOCAL can only be used in transaction blocks

Function:
postgres=# select set_config('search_path', 'public', true);
 set_config
------------
 public
(1 row)

I looked at this but could not see how to easily pass the value of 'isTopLevel' down to the SELECT. All the other checks have isTopLevel passed down from the utility case statement.

2. I think we should update the documentation as well. For example, the documentation of LOCK TABLE (http://www.postgresql.org/docs/devel/static/sql-lock.html) clearly indicates that it will give an error if used outside a transaction block: "LOCK TABLE is useless outside a transaction block: the lock would remain held only to the completion of the statement. Therefore PostgreSQL reports an error if LOCK is used outside a transaction block. Use BEGIN and COMMIT (or ROLLBACK) to define a transaction block." The documentation of SET TRANSACTION (http://www.postgresql.org/docs/devel/static/sql-set-transaction.html) has the line below, which doesn't indicate that it will give an error if used outside a transaction block: "If SET TRANSACTION is executed without a prior START TRANSACTION or BEGIN, it will appear to have no effect, since the transaction will immediately end."

Yes, you are right. All cases I changed had mentions of the command having no effect; I have updated them to mention that an error is generated.

3.
void
ExecSetVariableStmt(VariableSetStmt *stmt, bool isTopLevel)
{
	..
	else if (strcmp(stmt->name, "TRANSACTION SNAPSHOT") == 0)
	{
		A_Const    *con = (A_Const *) linitial(stmt->args);

		RequireTransactionChain(isTopLevel, "SET TRANSACTION");
		if (stmt->is_local)
			ereport(ERROR,
					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
					 errmsg("SET LOCAL TRANSACTION SNAPSHOT is not implemented")));
		..
	}
	..
}

Wouldn't it be better if the call to RequireTransactionChain() were done after the if (stmt->is_local) check? Yes, good point. Done. Thanks much for your review. Updated patch attached. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/doc/src/sgml/ref/set.sgml b/doc/src/sgml/ref/set.sgml
new file mode 100644
index 21745db..d108dd4
*** a/doc/src/sgml/ref/set.sgml
--- b/doc/src/sgml/ref/set.sgml
*** SET [ SESSION | LOCAL ] TIME ZONE { rep
*** 110,119 ****
      <para>
       Specifies that the command takes effect for only the current
       transaction. After <command>COMMIT</> or <command>ROLLBACK</>,
!      the session-level setting takes effect again. Note that
!      <command>SET LOCAL</> will appear to have no effect if it is
!      executed outside a <command>BEGIN</> block, since the
!      transaction will end immediately.
      </para>
     </listitem>
    </varlistentry>
--- 110,118 ----
      <para>
       Specifies that the command takes effect for only the current
       transaction. After <command>COMMIT</> or <command>ROLLBACK</>,
!      the session-level setting takes effect again.
!      <productname>PostgreSQL</productname> reports an error if
!      <command>SET LOCAL</> is used outside a transaction block.
      </para>
     </listitem>
    </varlistentry>
diff --git a/doc/src/sgml/ref/set_constraints.sgml b/doc/src/sgml/ref/set_constraints.sgml
new file mode 100644
index 8098b7b..895a5fd
*** a/doc/src/sgml/ref/set_constraints.sgml
--- b/doc/src/sgml/ref/set_constraints.sgml
*** SET CONSTRAINTS { ALL | <replaceable cla
*** 102,108 ****
     current transaction. Thus, if you execute this command outside of a
     transaction block
     (<command>BEGIN</command>/<command>COMMIT</command> pair), it will
!    not appear to have any effect.
    </para>
--- 102,108 ----
     current transaction. Thus, if you execute this command outside of a
     transaction block
     (<command>BEGIN</command>/<command>COMMIT</command> pair), it will
!    generate an error.
    </para>
diff --git a/doc/src/sgml/ref/set_transaction.sgml b/doc/src/sgml/ref/set_transaction.sgml
new file mode 100644
index f060729..391464a
*** a/doc/src/sgml/ref/set_transaction.sgml
--- b/doc/src/sgml/ref/set_transaction.sgml
*** SET SESSION CHARACTERISTICS AS TRANSACTI
*** 184,192 ****
    <para>
     If <command>SET TRANSACTION</command> is executed without a prior
!    <command>START TRANSACTION</command> or <command>BEGIN</command>,
!    it will appear to have no effect, since the transaction will immediately
!    end.
    </para>
    <para>
---
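With the patch applied, the expected behavior is a sketch like the following; the error text follows the existing RequireTransactionChain() message style:

postgres=# SET CONSTRAINTS ALL IMMEDIATE;
ERROR:  SET CONSTRAINTS can only be used in transaction blocks
postgres=# BEGIN;
BEGIN
postgres=# SET CONSTRAINTS ALL IMMEDIATE;
SET CONSTRAINTS
postgres=# COMMIT;
COMMIT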
Re: [HACKERS] record identical operator - Review
On Fri, Sep 27, 2013 at 03:34:20PM -0700, Kevin Grittner wrote: The arguments for this patch are * We want the materialized view to return the same value as would be returned if the query were executed directly. This not only means that the values should be the same according to a datatype's = operator but that they should appear the same 'to the eyeball'. And to functions the user can run against the values. The example with the null bitmap for arrays being included when not necessary is something that isn't directly apparent to the eye, but queries which use pg_column_size would not get equal results. pg_column_size() is by definition internal details, so I am not worried about that not matching. * Supporting the materialized view refresh check via SQL makes a lot of sense but doing that requires exposing something via SQL * If we adequately document what we mean by record_image_identical and the operator we pick for this then users shouldn't be surprised at what they get if they use this We first need to document the existing record comparison operators. If they read the docs for comparing row constructors and expect that to be the behavior they get when they compare records, they will be surprised. Well, if they appear in \do, I am thinking they should be documented. This is a good summary. I think there are a few things that make this issue difficult to decide: 1. We have to use an operator to give the RMVC (REFRESH MATERIALIZED VIEW CONCURRENTLY) SPI query the ability to optimize this query. If we could do this with an SQL or C function, there would be less concern about the confusion caused by adding this capability. (a) We can't. (b) Why would that be less confusing? Because function names, especially long ones, are more clearly self-documenting than operators. Question: If we are comparing based on some primary key, why do we need this to optimize? Because comparing primary keys doesn't tell us whether the old and new values in the row all match. OK, but my question was about why we need a full set of operators rather than just equal, and maybe not equal. I thought you said we needed others, e.g. <, so we could do merge joins, but I thought we would just be doing comparisons after primary keys are joined, and if that is true, we could just use a function. Actually, I am now realizing you have to use the non-binary-level equals comparison on keys, then the binary-level equals on rows for this to work --- that's pretty confusing. Is that true? 3. Our type casting and operators are already complex, and adding another set of operators only compounds that. It cannot have any effect on any of the existing operators, so I'm not sure whether you are referring to the extra operators and functions, or something else. It does not, for example, introduce any risk of ambiguous operators. It makes our system more complex for the user to understand. One interesting approach would be to only allow the operator to be called from SPI queries. Why would that be a good idea? Because then it would be more of an internal operator. It would also be good to know about similar non-default entries in pg_opclass so we can understand the expected impact. A quick query (lacking schema information and schema qualification) shows what is there by default: OK, the unique list is:

       opcname
---------------------
 varchar_ops
 kd_point_ops
 cidr_ops
 text_pattern_ops
 varchar_pattern_ops
 bpchar_pattern_ops
(6 rows)

Do these all have operators defined too?
-- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
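To ground the distinction being debated, one concrete case where datatype equality and record-image identity disagree; the operator spelling *= here is a placeholder, since the patch's operator names were themselves under discussion:

SELECT ROW('1.0'::numeric) = ROW('1.00'::numeric);    -- true: numeric's = ignores display scale
SELECT ROW('1.0'::numeric) *= ROW('1.00'::numeric);   -- false: the binary images differ

A materialized view refreshed with only the first comparison would never replace a stored '1.0' with '1.00' even when the underlying query now produces the latter, which is exactly the 'same to the eyeball' problem described above.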
Re: [HACKERS] Completing PL support for Event Triggers
Review of the PL/Tcl part: The functionality looks OK. There are some cosmetic issues. If those are addressed, I think this can be committed.
In the documentation, change "Event Triggers" to "Event triggers".
For the example in the documentation, please show the output, that is, what the trigger outputs.
Remove the extra space before "tclsnitch".
Document the return value (it is ignored). Will we need the return value in a future expansion? Should we leave room for that?
Change "command trigger" to "event trigger" in several places.
compile_pltcl_function() does not catch a trigger function being called as an event trigger or vice versa. Not sure if that should be covered.
The first PG_TRY block in pltcl_event_trigger_handler() is unnecessary, because there is nothing in there that can throw an exception.
I'd remove some comments from pltcl_event_trigger_handler(). They were obviously copied from pltcl_trigger_handler(), but that function is much longer, so more comments might have been appropriate there.
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
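For reference, the kind of documentation example being discussed; this is adapted from my reading of the patch, and the notice text is approximate:

CREATE OR REPLACE FUNCTION tclsnitch() RETURNS event_trigger AS $$
  elog NOTICE "tclsnitch: $TG_event $TG_tag"
$$ LANGUAGE pltcl;

CREATE EVENT TRIGGER tcl_a_snitch ON ddl_command_start EXECUTE PROCEDURE tclsnitch();

CREATE TABLE t (id int);
-- NOTICE:  tclsnitch: ddl_command_start CREATE TABLE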
[HACKERS] Documentation for SET var_name FROM CURRENT
While reading the documentation for the SET command, I observed that the FROM CURRENT syntax and its description are missing from the SET command's syntax page (http://www.postgresql.org/docs/devel/static/sql-set.html). Do you think the documentation should be updated accordingly, or is there a reason why it is not documented? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Documentation for SET var_name FROM CURRENT
Amit Kapila-2 wrote: While reading the documentation for the SET command, I observed that the FROM CURRENT syntax and its description are missing from the SET command's syntax page (http://www.postgresql.org/docs/devel/static/sql-set.html). Do you think the documentation should be updated accordingly, or is there a reason why it is not documented? It is documented as part of CREATE FUNCTION, since its use is only valid in that context. The paragraph with the link to CREATE FUNCTION seems sufficient to point people to the needed description. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Documentation-for-SET-var-name-FROM-CURRENT-tp5772920p5772922.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
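For reference, the FROM CURRENT form in its documented context; the function, table, and schema names here are made up:

-- Capture the session's current search_path as a per-function setting;
-- the function will then run with that value regardless of the caller's.
SET search_path = myschema, public;

CREATE FUNCTION report_rowcount() RETURNS bigint AS
  'SELECT count(*) FROM widgets'
  LANGUAGE sql
  SET search_path FROM CURRENT;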