Re: [HACKERS] add more NLS to bin
On Thu, Oct 27, 2016 at 10:02 AM, Peter Eisentraut wrote:
> Here is a series of patches to add NLS to the remaining bin programs,
> which were moved from contrib a while ago. (If you're missing pgbench,
> I'm skipping that for now because it's more complicated.) I'll add this
> to the commit fest.

I have been looking at this patch set, and this is definitely a good change to make.

1) 0001 for pg_archivecleanup is missing nothing.

2) For 0002 and pg_test_fsync, I am seeing a missing entry:
       printf(NA_FORMAT, "n/a*\n");

3) For pg_test_timing, the doc changes could go into a separate commit, but it's fine to group them as well. I am seeing no missing strings for translation.

4) 0004 and pg_upgrade... In check.c, three places like this:
       if (!db_used)
       {
           fprintf(script, "Database: %s\n", active_db->db_name);
           db_used = true;
       }
   In exec.c:
       #endif
           fprintf(log, "command: %s\n", cmd);
       #ifdef WIN32
   Then there is this:
       +GETTEXT_FLAGS= \
       +pg_fatal:1:c-format \
       +pg_log:2:c-format \
       +prep_status:1:c-format \
       +report_stats:2:c-format
   s/report_stats/report_status/
   In info.c, some entries seem to be missing in report_unmatched_relation() when reporting unmatched relations.
   In parseCommandLine() of option.c, the "pg_upgrade run on" string needs to be handled.
   In util.c, doesn't pg_log_v() need to handle the strings used in fprintf?
   In version.c, this one:
       if (!db_used)
       {
           fprintf(script, "Database: %s\n", active_db->db_name);
           db_used = true;
       }

5) 0005 and pg_xlogdump: I am not seeing a missing entry.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
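[The review above is about which printf/fprintf strings still lack translation markers. As a stand-alone sketch of the gettext marking convention being discussed: xgettext only collects strings wrapped in _() (or named via GETTEXT_TRIGGERS/GETTEXT_FLAGS, as in the nls.mk diff above) into the message catalog, so bare literals like the "Database: %s\n" lines never reach translators. The _() macro below is a local identity stub so the example compiles on its own; in real frontend code it expands to gettext().]

```c
#include <stdio.h>

/* Stub for the gettext _() marker.  In PostgreSQL frontend code _()
 * expands to gettext(), which returns the locale's translation of the
 * string; here it is the identity so the sketch is self-contained. */
#define _(x) (x)

/* Because the format string is wrapped in _(), xgettext will pick it up
 * for the .po catalogs; an unwrapped literal would stay untranslated. */
static int format_database_line(char *buf, size_t buflen, const char *db_name)
{
    return snprintf(buf, buflen, _("Database: %s\n"), db_name);
}
```

With the C locale (no catalogs installed), the stub simply reproduces the original string.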
Re: [HACKERS] Performance improvement for joins where outer side is unique
On 8 April 2016 at 06:49, Tom Lane wrote:
> David Rowley writes:
> Just had a thought about this, which should have crystallized a long
> time ago perhaps. Where I'd originally imagined you were going with
> this idea is to do what the thread title actually says, and check for
> joins in which the *outer* side is unique. I can't see that that's
> of any value for nestloop or hash joins, but for merge joins, knowing
> that the outer side is unique could be extremely valuable because
> we could skip doing mark/restore backups on the inner side, hugely
> reducing the cost when the inner side has many duplicates. Now I'm
> not asking for the initially-committed version of the patch to do that,
> but I think we need to design it to be readily extensible to do so.

I've rebased the changes I made to address this back in April onto current master. The changes get rid of the JOIN_INNER to JOIN_SEMI conversion and revert to the original "inner_unique" marking of the joins. In addition I've also added "outer_unique", which is only made use of in merge join, to control whether the outer side needs to enable mark and restore or not.

However, having said that, I'm not sure why we'd need outer_unique available in order to know that we can skip mark/restore. I think inner_unique is enough for this purpose. Take the comment from nodeMergejoin.c:

 * outer: (0 ^1 1 2 5 5 5 6 6 7)  current tuple: 1
 * inner: (1 ^3 5 5 5 5 6)        current tuple: 3
 ...
 *
 * Consider the above relations and suppose that the executor has
 * just joined the first outer "5" with the last inner "5". The
 * next step is of course to join the second outer "5" with all
 * the inner "5's". This requires repositioning the inner "cursor"
 * to point at the first inner "5". This is done by "marking" the
 * first inner 5 so we can restore the "cursor" to it before joining
 * with the second outer 5. The access method interface provides
 * routines to mark and restore to a tuple.
If only one inner "5" can exist (unique_inner), then isn't the inner side already in the correct place, and no restore is required?
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment: unique_joins_2016-10-31.patch
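[David's question above can be made concrete with a toy merge join over sorted int arrays. This is only a sketch of the mark/restore bookkeeping, not the executor's code: when inner_unique is known, the inner cursor can be left in place after emitting the single match, so even duplicate outer keys never force a restore.]

```c
#include <stddef.h>

/* Toy merge join that counts how many times the inner "cursor" has to
 * be restored to a mark.  With inner_unique set, there is at most one
 * matching inner tuple per key, so we never advance past it looking for
 * duplicates and never need to back up. */
static int join_count(const int *outer, size_t nouter,
                      const int *inner, size_t ninner,
                      int inner_unique, int *restores)
{
    size_t i = 0, j = 0, mark = 0;
    int matches = 0;

    *restores = 0;
    while (i < nouter && j < ninner)
    {
        if (outer[i] < inner[j])
            i++;
        else if (outer[i] > inner[j])
            j++;
        else if (inner_unique)
        {
            /* At most one inner match: emit it and advance the outer
             * side, leaving the inner cursor in place so a duplicate
             * outer key finds it again.  No mark/restore needed. */
            matches++;
            i++;
        }
        else
        {
            int key = outer[i];

            mark = j;                   /* mark the first matching inner */
            while (i < nouter && outer[i] == key)
            {
                if (j != mark)
                {
                    j = mark;           /* restore for the next outer dup */
                    (*restores)++;
                }
                while (j < ninner && inner[j] == key)
                {
                    matches++;          /* emit a join pair here */
                    j++;
                }
                i++;
            }
        }
    }
    return matches;
}
```

Running the nodeMergejoin.c example through the non-unique path performs restores for the duplicated outer 1s, 5s, and 6s; with a unique inner, the restore count stays at zero.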
Re: [HACKERS] commit fest manager for CF 2016-11?
On Mon, Oct 31, 2016 at 1:49 PM, Peter Eisentraut wrote:
> Commit fest CF 2016-11 is supposed to start in about a day. I don't
> think a commit fest manager was chosen yet. Volunteers?

I'd like to pass my turn as CFM on this one, but I can put in some time for the January one.
--
Michael
[HACKERS] commit fest manager for CF 2016-11?
Commit fest CF 2016-11 is supposed to start in about a day. I don't think a commit fest manager was chosen yet. Volunteers?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Microvacuum support for Hash Index
Hi Amit,

Thanks for showing your interest and reviewing my patch. I have started looking into your review comments. I will share the updated patch in a day or two.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

On Fri, Oct 28, 2016 at 4:42 PM, Amit Kapila wrote:
> On Mon, Oct 24, 2016 at 2:21 PM, Ashutosh Sharma wrote:
>> Hi All,
>>
>> I have added microvacuum support for the hash index access method, and
>> attached is the v1 patch for the same.
>
> This is an important functionality for hash indexes, as we already
> have the same functionality for other types of indexes like btree.
>
>> The patch basically takes care of the following things:
>>
>> 1. Firstly, it changes the marking of dead tuples from
>> 'tuple-at-a-time' to 'page-at-a-time' during a hash index scan. For
>> this we accumulate the heap tids and offsets of all the hash index
>> tuples pointed to by kill_prior_tuple during the scan, and then mark
>> all accumulated tids as LP_DEAD either while stepping from one page to
>> another (for scans in both the forward and backward direction), at the
>> end of the hash index scan, or during a rescan.
>>
>> 2. Secondly, when inserting a tuple into the hash index, if not enough
>> space is found on the current page, it ensures that we first clean up
>> any dead tuples found on the current hash index page before moving to
>> the next page in the bucket chain or going for a bucket split. This
>> increases page reusability and reduces the number of page splits,
>> thereby reducing the overall size of the hash index.
>
> Few comments on the patch:
>
> 1.
> +static void
> +hash_xlog_vacuum_one_page(XLogReaderState *record)
> +{
> +    XLogRecPtr      lsn = record->EndRecPtr;
> +    xl_hash_vacuum *xldata = (xl_hash_vacuum *) XLogRecGetData(record);
> +    Buffer          bucketbuf = InvalidBuffer;
> +    Buffer          buffer;
> +    Buffer          metabuf;
> +    Page            page;
> +    XLogRedoAction  action;
>
> While replaying the delete/vacuum record on standby, it can conflict
> with some already running queries. Basically the replay can remove
> some row which can be visible on standby. You need to resolve
> conflicts similar to what we do in btree delete records (refer
> btree_xlog_delete).
>
> 2.
> +    /*
> +     * Write-lock the meta page so that we can decrement
> +     * tuple count.
> +     */
> +    _hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_WRITE);
> +
> +    _hash_vacuum_one_page(rel, metabuf, buf, bucket_buf,
> +                          (buf == bucket_buf) ? true : false);
> +
> +    _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
>
> It seems here the meta page lock is acquired for a duration longer than
> required, and it is also not required when there are no deletable items
> on the page. You can take the metapage lock just before decrementing the
> count.
>
> 3.
>     Assert(offnum <= maxoff);
> +
>
> Spurious space. There are some other similar spurious whitespace
> changes in the patch; remove them as well.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
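[The "page-at-a-time" marking described in point 1 of the patch summary can be sketched as below. This is a toy model, not the patch's actual code: offsets of killed tuples are merely remembered during the scan, and the page's line pointers are flagged LP_DEAD in one pass only when the scan leaves the page, ends, or rescans. The struct and names are assumptions echoing the btree convention.]

```c
#include <stddef.h>

#define MAX_KILLED 64

enum { LP_NORMAL = 0, LP_DEAD = 1 };

/* Per-scan state: offsets of index tuples whose heap tuples are dead. */
typedef struct
{
    int killed_items[MAX_KILLED];
    int nkilled;
} ScanState;

/* Called as the scan sees kill_prior_tuple: just remember the offset. */
static void remember_dead(ScanState *so, int offnum)
{
    if (so->nkilled < MAX_KILLED)
        so->killed_items[so->nkilled++] = offnum;
}

/* Called when stepping off the page, at end of scan, or at rescan:
 * flag everything remembered in a single pass over the page. */
static void kill_prior_tuples(ScanState *so, int *line_pointers)
{
    for (int k = 0; k < so->nkilled; k++)
        line_pointers[so->killed_items[k]] = LP_DEAD;
    so->nkilled = 0;
}
```

The point of batching is that the page needs to be dirtied (and WAL-logged) once per page rather than once per tuple.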
Re: [HACKERS] make coverage-html on OS X
On 10/29/16 3:31 PM, Jim Nasby wrote:
tl;dr: It's critical that you actually do a make install, or at least it is if you've set --prefix with configure. If you don't, then even if you do make check you're going to get the *installed* libpq, and not the *built* libpq.

Actually, sometimes that isn't even enough. I had one checkout that refused to successfully build coverage (producing an old .gcda for *everything*) until I rm'd the entire install directory (make uninstall was not enough). I also killed the running postmaster as part of that, so it's possible that's what was causing the problems.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)  mobile: 512-569-9461
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 10/30/16 1:32 PM, Tomas Vondra wrote:
> Now, maybe this has nothing to do with PostgreSQL itself, but maybe it's
> some sort of CPU / OS scheduling artifact. For example, the system has
> 36 physical cores, 72 virtual ones (thanks to HT). I find it strange
> that the "good" client counts are always multiples of 72, while the
> "bad" ones fall in between.
>
>  72 = 72 * 1   (good)
> 108 = 72 * 1.5 (bad)
> 144 = 72 * 2   (good)
> 180 = 72 * 2.5 (bad)
> 216 = 72 * 3   (good)
> 252 = 72 * 3.5 (bad)
> 288 = 72 * 4   (good)
>
> So maybe this has something to do with how the OS schedules the tasks,
> or maybe some internal heuristics in the CPU, or something like that.

It might be enlightening to run a series of tests that are 72 * 0.1 or 72 * 0.2 apart (say, 72, 79, 86, ..., 137, 144).
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Re: [HACKERS] sequential scan result order vs performance
On 10/30/16 9:12 AM, Tom Lane wrote:
> I think there will be a lot of howls. People expect that creating a
> table by inserting a bunch of rows, and then reading back those rows,
> will not change the order. We already futzed with that guarantee a bit
> with syncscans, but that only affects quite large tables --- and even
> there, we were forced to provide a way to turn it off.

Leaving a 30% performance improvement on the floor because some people don't grok how sets work seems insane to me. We could have a GUC to disable this. I suspect ORDER BY ctid would be another option.

BTW, I've sometimes wished for a mode where queries would silently have their result ordering intentionally futzed, to eliminate any possibility of dependence on tuple ordering (as well as having sequences start at some random value). I guess with the hooks that are in place today it wouldn't be hard to stick an ORDER BY random() in if there wasn't already a Sort node at the top level...
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Re: [HACKERS] Patch to implement pg_current_logfile() function
Hi Gilles,

On Sat, 29 Oct 2016 22:00:08 +0200 Gilles Darold wrote:
> The attached v10 of the current_logfiles patch

Attached is a patch to apply on top of the v10 patch. It updates current_logfiles only once per log rotation. I see no reason to open and write the file twice if both csvlog and stderr logging are happening.

I have two more questions.

First, I don't understand why you're calling logfile_writename(last_file_name, last_csv_file_name); in the SIGHUP code. Presumably you've already written the old logfile names to current_logfiles. On SIGHUP you want to write the new names, not the old ones. But the point of the SIGHUP code is to look for changes in logfile writing and then call logfile_rotate() to make those changes, and logfile_rotate() calls logfile_writename() as appropriate. Shouldn't the logfile_writename call in the SIGHUP code be eliminated?

Second (and I've not paid really close attention here), you're calling logfile_writename() with last_file_name and last_csv_file_name in a number of places. Are you sure that last_file_name and last_csv_file_name are reset to the empty string if stderr/csvlog logging is turned off and the config file re-read? You might be recording that logging is going somewhere that's been turned off by a config change. I've not noticed last_file_name and (especially) last_csv_file_name getting reset to the empty string.

FYI: ultimately, in order to retry writes to current_logfiles after ENFILE and EMFILE failure, I'm thinking that I'll probably wind up with logfile_writename() taking no arguments. It will always write last_file_name and last_csv_file_name; then it's a matter of making sure that these variables contain accurate values. It would be helpful to let me know if you have any insight regarding the config file re-read and the resetting of these variables before I dive into writing code that retries writes to current_logfiles.

Regards,
Karl
Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein

Attachment: patch_pg_current_logfile-v10.diff.only_once
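[The "write current_logfiles once per rotation" idea above can be sketched as a single open/write covering both log destinations. This is an illustration only: the function name, the "<destination> <path>" line format, and the error convention are assumptions for the sketch, not the patch's actual code.]

```c
#include <stdio.h>

/* Write both the stderr and csvlog file names in one open/write instead
 * of opening the status file twice per rotation.  Destinations whose
 * name is unset (NULL or empty) are simply omitted. */
static int write_current_logfiles(const char *path,
                                  const char *stderr_name,
                                  const char *csv_name)
{
    FILE *fh = fopen(path, "w");

    if (fh == NULL)
        return -1;              /* caller may retry after ENFILE/EMFILE */
    if (stderr_name && *stderr_name)
        fprintf(fh, "stderr %s\n", stderr_name);
    if (csv_name && *csv_name)
        fprintf(fh, "csvlog %s\n", csv_name);
    fclose(fh);
    return 0;
}
```

Karl's later suggestion (a zero-argument logfile_writename() that always writes last_file_name and last_csv_file_name) would correspond to calling this with those two globals.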
Re: [HACKERS] Radix tree for character conversion
Hello,

At Fri, 28 Oct 2016 09:42:25 -0400, Tom Lane wrote in <13049.1477662...@sss.pgh.pa.us>:
> Robert Haas writes:
>> On Thu, Oct 27, 2016 at 3:23 AM, Kyotaro HORIGUCHI wrote:
>>> Perhaps we can put the files into our repository by providing some
>>> notifications.
>>
>> Uggh, I don't much like advertising clauses.
>
> Even if the license were exactly compatible with ours, I'd be -1 on
> bloating our tarballs with these files. They're large and only a
> tiny fraction of developers, let alone end users, will ever care
> to look at them.

I understood that the intention of Heikki's suggestion, namely that these files might be included in PostgreSQL's repository, is to get a kind of stability, or consistency. The source files are not revision-managed. If the authority files receive unwanted changes or become unavailable, the .map files would have to be edited independently of them. Actually, some map files have already lost their authority file, and other map files have received several direct modifications. We would be free from such disturbances by carrying "frozen" copies of the authority files. On the other hand, I also agree that the advertising clause and the additional bloat of the source repository are a nuisance.

> I think it's fine as long as we have a README file that explains
> where to get them. (I'm not even very thrilled with the proposed
> auto-download script, as it makes undesirable assumptions about
> which internet tools you use, not to mention that it won't work
> at all on Windows.)

Mmm. It would be a pain in the neck. Some of the files are already stored in the "OBSOLETE" directory on the Unicode consortium ftp site, and one of them has vanished and is now only available from another place, as a part of the ICU source tree. On the other hand, the map files are assumed to be generated from the scripts, and direct editing of them is discouraged. The radix map files are not editable by hand and are currently made from the authority files.
If some authority files are gone, the additional edits would have to be made directly onto the map files, which would in turn become the authority for the radix files. (It's quite easy to change the authority to the current map files, though.)

By the way, note the following phrase of the terms of license, http://www.unicode.org/copyright.html:

| COPYRIGHT AND PERMISSION NOTICE
|
| Copyright (c) 1991-2016 Unicode, Inc. All rights reserved.
| Distributed under the Terms of Use in http://www.unicode.org/copyright.html.
|
| Permission is hereby granted, free of charge, to any person obtaining
| a copy of the Unicode data files and any associated documentation
| (the "Data Files") or Unicode software and any associated documentation
| (the "Software") to deal in the Data Files or Software
| without restriction, including without limitation the rights to use,
| copy, modify, merge, publish, distribute, and/or sell copies of
| the Data Files or Software, and to permit persons to whom the Data Files
| or Software are furnished to do so, provided that either
| (a) this copyright and permission notice appear with all copies
| of the Data Files or Software, or
| (b) this copyright and permission notice appear in associated
| Documentation.

I'm afraid that the map files (and the _radix.map files) are translations of the "Data Files", and 'translate' is a part of 'modify'. Whether the notice is necessary or not, if we decide to wipe the 'true' authority out of our source tree, I'd like to make the map files (preferably with comments) the second authority, from which the _radix.map files are generated, since the latter are not editable.

> I'd actually vote for getting rid of the reference files we
> have in the tree now (src/backend/utils/mb/Unicode/*txt), on
> the same grounds. That's 600K of stuff that does not need to
> be in our tarballs.

Anyway, I'd like to register this as an item of this CF.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Logical Replication WIP
On 10/24/2016 09:22 AM, Petr Jelinek wrote:
> Hi, attached is an updated version of the patch. There are quite a few
> improvements and restructuring; I fixed all the bugs and basically
> everything that came up from the reviews and was agreed on. There are
> still a couple of things missing, i.e. the column type definition in
> the protocol and some things related to existing data copy.

Here are a few things I've noticed so far.

The documentation shows:

+CREATE SUBSCRIPTION mysub WITH CONNECTION dbname=foo host=bar user=repuser PUBLICATION mypub;

This doesn't match the syntax; CONNECTION needs to be in single quotes, not double quotes. I think you want:

+CREATE SUBSCRIPTION mysub WITH CONNECTION 'dbname=foo host=bar user=repuser' PUBLICATION mypub;

I am not sure if this is a known issue covered by your comments about data copy, but I am still having issues with error reporting on a failed subscription. I created a subscription, dropped the subscription, and created a second one. The second subscription isn't active but shows no errors.
P: create publication mypub for table public.a;
S: create subscription mysub with connection 'dbname=test host=localhost port=5440' publication mypub;
P: insert into a(b) values ('t');
S: select * FROM a;
 a | b
---+---
 1 | t
(1 row)

Everything is good. Then I do:

S: drop subscription mysub;
S: create subscription mysub2 with connection 'dbname=test host=localhost port=5440' publication mypub;
P: insert into a(b) values ('f');
S: select * FROM a;
 a | b
---+---
 1 | t

The data doesn't replicate.

select * FROM pg_stat_subscription;
 subid | subname | pid | relid | received_lsn | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time
-------+---------+-----+-------+--------------+--------------------+-----------------------+----------------+-----------------
 16398 | mysub2  |     |       |              |                    |                       |                |
(1 row)

The only thing in my log is:

2016-10-30 15:27:27.038 EDT [6028] NOTICE: dropped replication slot "mysub" on publisher
2016-10-30 15:27:36.072 EDT [6028] NOTICE: created replication slot "mysub2" on publisher
2016-10-30 15:27:36.082 EDT [6028] NOTICE: synchronized table states

I'd expect an error in the log or something. However, if I delete everything from the table on the subscriber then the subscription proceeds.

I think there are still problems with signal handling in the initial sync. If I try to drop mysub2 (while the subscription is stuck, instead of deleting the data) the drop hangs. If I then try to kill the postmaster for the subscriber nothing happens; I have to send it a -9 to make it go away. However, once I do that and then restart the postmaster for the subscriber, I start to see the duplicate key errors in the log:

2016-10-30 16:00:54.635 EDT [7018] ERROR: duplicate key value violates unique constraint "a_pkey"
2016-10-30 16:00:54.635 EDT [7018] DETAIL: Key (a)=(1) already exists.
2016-10-30 16:00:54.635 EDT [7018] CONTEXT: COPY a, line 1
2016-10-30 16:00:54.637 EDT [7007] LOG: worker process: logical replication worker 16400 sync 16387 (PID 7018) exited with exit code 1

I'm not sure why I didn't get those until I restarted the postmaster, but it seems to happen whenever I drop a subscription and then create a new one. Creating the second subscription from the same psql session as I create/drop the first seems important in reproducing this.

I am also having issues dropping a second subscription from the same psql session (table a is empty on both nodes to avoid duplicate key errors):

S: create subscription sub1 with connection 'host=localhost dbname=test port=5440' publication mypub;
S: create subscription sub2 with connection 'host=localhost dbname=test port=5440' publication mypub;
S: drop subscription sub1;
S: drop subscription sub2;

At this point the drop subscription hangs.

> The biggest changes are: I added one more prerequisite patch (the first
> one) which adds ephemeral slots (or, well, implements a UI on top of
> the code that was mostly already there). The ephemeral slots are
> different in that they go away either on error or when the session is
> closed. This means the initial data sync does not have to worry about
> cleaning up the slots after itself. I think this will be useful in
> other places as well (for example basebackup). I originally wanted to
> call them temporary slots in the UI, but since the behavior is a bit
> different from temp tables I decided to go with what the underlying
> code calls them in the UI as well.
>
> I also split out the libpqwalreceiver rewrite into a separate patch
> which does just the re-architecture and does not really add new
> functionality. And I did the re-architecture bit differently based on
> the review. There is now a new executor module in execReplication.c,
> no new nodes but several utility commands. I moved there the tuple
> lookup functions from apply and also wrote new interfaces for doing
> inserts/updates/deletes to
Re: [HACKERS] sources.sgml typo
>> In doc/src/sgml/sources.sgml:
>>
>>   When the definition an inline function references symbols
>>   (i.e. variables, functions) that are only available as part of the
>>   backend, the function may not be visible when included from frontend
>>   code.
>>
>> Shinichi Matsuda reported that there should be "of" between
>> "definition" and "an inline function" and I think he is right. Patch
>> attached. If there's no objection, I will commit it.
>
> Sounds good.

Done.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
Hi,

On 10/27/2016 01:44 PM, Amit Kapila wrote:
> On Thu, Oct 27, 2016 at 4:15 AM, Tomas Vondra wrote:
>> FWIW I plan to run the same test with logged tables - if it shows
>> similar regression, I'll be much more worried, because that's a fairly
>> typical scenario (logged tables, data set > shared buffers), and we
>> surely can't just go and break that.
>
> Sure, please do those tests.

OK, so I do have results for those tests - that is, scale 3000 with shared_buffers=16GB (so continuously writing out dirty buffers). The following reports show the results slightly differently - all three "tps charts" next to each other, then the speedup charts and tables.

Overall, the results are surprisingly positive - look at these results (all ending with "-retest"):

[1] http://tvondra.bitbucket.org/index2.html#dilip-3000-logged-sync-retest
[2] http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-noskip-retest
[3] http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-skip-retest

All three show significant improvement, even with fairly low client counts. For example with 72 clients, the tps improves 20%, without significantly affecting variability of the results (measured as stddev; more on this later).

It's however interesting that "no_content_lock" is almost exactly the same as master, while the other two cases improve significantly.

The other interesting thing is that "pgbench -N" [3] shows no such improvement, unlike regular pgbench and Dilip's workload. Not sure why, though - I'd expect to see significant improvement in this case.

I have also repeated those tests with clog buffers increased to 512 (so 4x the current maximum of 128). I only have results for Dilip's workload and "pgbench -N":

[4] http://tvondra.bitbucket.org/index2.html#dilip-3000-logged-sync-retest-512
[5] http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-skip-retest-512

The results are somewhat surprising, I guess, because the effect is wildly different for each workload.
For Dilip's workload, increasing clog buffers to 512 pretty much eliminates all benefits of the patches. For example with 288 clients, the group_update patch gives ~60k tps on 128 buffers [1] but only 42k tps on 512 buffers [4].

With "pgbench -N", the effect is exactly the opposite - while with 128 buffers there was pretty much no benefit from any of the patches [3], with 512 buffers we suddenly get almost 2x the throughput, but only for group_update and master (while the other two patches show no improvement at all).

I don't have results for the regular pgbench ("noskip") with 512 buffers yet, but I'm curious what that will show.

In general, however, I think that the patches don't show any regression in any of those workloads (at least not with 128 buffers). Based solely on the results, I like group_update more, because it performs as well as master or significantly better.

> 2. We do see in some cases that the granular_locking and
> no_content_lock patches have shown a significant increase in contention
> on CLOGControlLock. I have already shared my analysis of the same
> upthread [8].

I've read that analysis, but I'm not sure I see how it explains the "zig zag" behavior. I do understand that shifting the contention to some other (already busy) lock may negatively impact throughput, or that the group_update may result in updating multiple clog pages, but I don't understand two things:

(1) Why this should result in the fluctuations we observe in some of the cases. For example, why should we see 150k tps on 72 clients, then drop to 92k with 108 clients, then back to 130k on 144 clients, then 84k on 180 clients, etc. That seems fairly strange.

(2) Why this should affect all three patches, when only group_update has to modify multiple clog pages.
For example consider this:

http://tvondra.bitbucket.org/index2.html#dilip-300-logged-async

Looking at the % of time spent on different locks with the group_update patch, I see this (ignoring locks with ~1%):

 event_type     wait_event       36 72 108 144 180 216 252 288
 - - -
 60 63 45 53 38 50 33 48   Client         ClientRead
 33 239 146 1048           LWLockNamed    CLogControlLock
 27 33 14 34 14 33 14      LWLockTranche  buffer_content
 029 13 19 18 26 22

I don't see any sign of contention shifting to other locks, just CLogControlLock fluctuating between 14% and 33% for some reason.

Now, maybe this has nothing to do with PostgreSQL itself, but maybe it's some sort of CPU / OS scheduling artifact. For example, the system has 36 physical cores, 72 virtual ones (thanks to HT). I find it strange that the "good" client counts are always multiples of 72, while the "bad" ones fall in between.

72 = 72 * 1 (good)
Re: [HACKERS] [sqlsmith] Missing CHECK_FOR_INTERRUPTS in tsquery_rewrite
I wrote:
> Also, I think this is outright *wrong* for phrase search --- dropping some
> of the child nodes without any other adjustment isn't valid, is it?

After further digging, it seems there's no bug, because the tree is originally binary and QTNTernary won't try to flatten OP_PHRASE nodes, so we can't actually get to this logic for such nodes. But it seems like an Assert to that effect wouldn't be a bad thing.

regards, tom lane
Re: [HACKERS] [sqlsmith] Missing CHECK_FOR_INTERRUPTS in tsquery_rewrite
Andreas Seltenreich writes:
> testing with sqlsmith yielded an uncancellable backend hogging CPU time.
> Gdb showed it was busy in findeq() of tsquery_rewrite.c. This function
> appears to have exponential complexity wrt. the size of the involved
> tsqueries. The following query runs for 12s on my machine with no way
> to cancel it, and incrementing the length of the first argument by 1
> doubles this time.
>
> select ts_rewrite(
>   (select string_agg(i::text, '&')::tsquery from generate_series(1,32) g(i)),
>   (select string_agg(i::text, '&')::tsquery from generate_series(1,19) g(i)),
>   'foo');
>
> The attached patch adds a CHECK_FOR_INTERRUPTS to make it cancellable.

A CHECK_FOR_INTERRUPTS is probably a good idea, but man is this code stupid. It seems to be checking for subset inclusion by forming every possible subset of the test node and then checking for exact equality to the target set. Seems like we should be able to do better.

Also, I think this is outright *wrong* for phrase search --- dropping some of the child nodes without any other adjustment isn't valid, is it?

regards, tom lane
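[The fix being discussed is a single cancellation check at the top of an exponential recursion. A self-contained sketch of the pattern follows: a plain subset enumeration stands in for findeq(), and CHECK_FOR_INTERRUPTS is a stub that longjmps when a flag is set. In the backend the real macro instead tests InterruptPending and calls ProcessInterrupts(), which raises an error on query cancel.]

```c
#include <setjmp.h>

static volatile int InterruptPending = 0;
static jmp_buf cancel_jmp;
static long calls = 0;

/* Stub: the real CHECK_FOR_INTERRUPTS tests InterruptPending and calls
 * ProcessInterrupts(); here a cancellation unwinds via longjmp. */
#define CHECK_FOR_INTERRUPTS() \
    do { if (InterruptPending) longjmp(cancel_jmp, 1); } while (0)

/* Enumerate all 2^n subsets of {0..n-1}, the shape of work whose cost
 * doubles with each added element, as in the report above.  The check at
 * the top of each call is the entire fix: without it, a long run cannot
 * be cancelled. */
static void enumerate_subsets(int n, int depth)
{
    CHECK_FOR_INTERRUPTS();            /* the one-line fix */
    calls++;
    if (depth == n)
        return;
    enumerate_subsets(n, depth + 1);   /* subsets without element 'depth' */
    enumerate_subsets(n, depth + 1);   /* subsets with element 'depth' */
}
```

Because the recursion visits exponentially many nodes, checking once per call keeps cancellation latency bounded by the cost of a single node rather than the whole run.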
Re: [HACKERS] delta relations in AFTER triggers
On Sun, Oct 2, 2016 at 11:20 PM, Michael Paquier wrote:
> On Sat, Sep 10, 2016 at 7:28 AM, Kevin Grittner wrote:
>> v6 fixes recently-introduced bit-rot.
>
> Not as big as I thought, only 2k when both patches are combined... The
> patch without noapi in its name needs to be applied first, and after
> that the patch with noapi can be applied.
> 60 files changed, 2073 insertions(+), 63 deletions(-)
> Moved to next CF.

In an attempt to make this patch more digestible for reviewers, I have split it up as follows:

transition-c-triggers-only-v7.diff

 contrib/pgstattuple/pgstattuple.c    | 2 +
 doc/src/sgml/catalogs.sgml           | 16 ++
 doc/src/sgml/ref/create_trigger.sgml | 94 +--
 src/backend/commands/tablecmds.c     | 5 +-
 src/backend/commands/trigger.c       | 327 -
 src/backend/nodes/copyfuncs.c        | 16 ++
 src/backend/nodes/equalfuncs.c       | 14 ++
 src/backend/nodes/outfuncs.c         | 13 +
 src/backend/parser/gram.y            | 70 +-
 src/backend/utils/adt/ruleutils.c    | 23 ++
 src/bin/pg_basebackup/walmethods.c   | 5 -
 src/include/catalog/pg_trigger.h     | 13 +-
 src/include/commands/trigger.h       | 2 +
 src/include/nodes/nodes.h            | 1 +
 src/include/nodes/parsenodes.h       | 17 ++
 src/include/parser/kwlist.h          | 3 +
 src/include/utils/reltrigger.h       | 7 +
 17 files changed, 581 insertions(+), 47 deletions(-)

This part is virtually unchanged (just curing bit-rot) since August, 2014, when I believe I had addressed all issues raised by reviewers. It does provide a barely usable feature, since the syntax for transition tables is added and tuplestores are created when needed (and only when needed), with references stored in the TriggerData structure. No new execution nodes are provided, so only C triggers can use these relations, and they must explicitly and directly access the tuplestores from within the C code -- there is no support for referencing these tuplestores from within queries. This is basic infrastructure needed for the more complete feature. As far as I know there are no objections to what is implemented here.
I have pulled it out to make the review of the more controversial portions easier. Since it had quite a bit of review two years ago, I will do some testing to make sure that nothing has broken and then push this part in a few days if there are no objections.

transition-noapi.diff

 contrib/pgstattuple/pgstattuple.c         |   2 -
 doc/src/sgml/spi.sgml                     | 279
 src/backend/commands/explain.c            |  10 +
 src/backend/executor/Makefile             |   3 +-
 src/backend/executor/execAmi.c            |   6 +
 src/backend/executor/execProcnode.c       |  14 +
 src/backend/executor/nodeTuplestorescan.c | 200 ++
 src/backend/nodes/copyfuncs.c             |  25 ++
 src/backend/nodes/nodeFuncs.c             |   2 +
 src/backend/nodes/outfuncs.c              |  20 ++
 src/backend/nodes/print.c                 |   4 +
 src/backend/nodes/readfuncs.c             |   7 +
 src/backend/optimizer/path/allpaths.c     |  44 +++
 src/backend/optimizer/path/costsize.c     |  66 +
 src/backend/optimizer/plan/createplan.c   |  71 +
 src/backend/optimizer/plan/setrefs.c      |  11 +
 src/backend/optimizer/plan/subselect.c    |   5 +
 src/backend/optimizer/prep/prepjointree.c |   2 +
 src/backend/optimizer/util/pathnode.c     |  25 ++
 src/backend/optimizer/util/plancat.c      |   4 +-
 src/backend/optimizer/util/relnode.c      |   1 +
 src/backend/parser/Makefile               |   3 +-
 src/backend/parser/analyze.c              |  11 +
 src/backend/parser/parse_clause.c         |  23 +-
 src/backend/parser/parse_node.c           |   2 +
 src/backend/parser/parse_relation.c       | 115 +++-
 src/backend/parser/parse_target.c         |   2 +
 src/backend/parser/parse_tuplestore.c     |  34 +++
 src/backend/tcop/utility.c                |   3 +-
 src/backend/utils/adt/ruleutils.c         |   1 +
 src/backend/utils/cache/Makefile          |   2 +-
 src/backend/utils/cache/tsrcache.c        | 111
 src/bin/pg_basebackup/walmethods.c        |   5 +
 src/include/executor/nodeTuplestorescan.h |  24 ++
 src/include/nodes/execnodes.h             |  18 ++
 src/include/nodes/nodes.h                 |   2 +
 src/include/nodes/parsenodes.h            |  11 +-
 src/include/nodes/plannodes.h             |  10 +
 src/include/optimizer/cost.h              |   3 +
 src/include/optimizer/pathnode.h          |   2 +
 src/include/parser/parse_node.h           |   2 +
 src/include/parser/parse_relation.h       |   4 +
 src/include/parser/parse_tuplestore.h     |  24 ++
 src/include/utils/tsrcache.h              |  27 ++
 src/include/utils/tsrmd.h                 |  29 ++
 src/include/utils/tsrmdcache.h            |  31 +++
 src/include/utils/tuplestore.h            |  14 +
 src/pl/plpgsql/src/pl_comp.c              |
Re: [HACKERS] sequential scan result order vs performance
Andres Freund writes:
> It's quite easy to change iteration so we start with the latest item,
> and iterate till the first, rather than the other way round. In
> benchmarks with somewhat wide columns and aggregation, this yields
> speedups of over 30%, before hitting other bottlenecks.
> I do wonder however if it's acceptable to change the result order of
> sequential scans.

I think there will be a lot of howls. People expect that creating a table by inserting a bunch of rows, and then reading back those rows, will not change the order. We already futzed with that guarantee a bit with syncscans, but that only affects quite large tables --- and even there, we were forced to provide a way to turn it off. If you were talking about 3X then maybe it would be worth it, but for 30% (on a subset of queries) I am not excited.

I wonder whether we could instead adjust the rules for insertion so that tuples tend to be physically in order by itemid. I'm imagining leaving two "holes" in a page and sometimes (hopefully not often) having to shift data during insert to preserve the ordering.

regards, tom lane
Re: [HACKERS] sources.sgml typo
On 2016-10-30 21:30:45 +0900, Tatsuo Ishii wrote:
> In doc/src/sgml/sources.sgml:
>
> When the definition an inline function references symbols
> (i.e. variables, functions) that are only available as part of the
> backend, the function may not be visible when included from frontend
> code.
>
> Shinichi Matsuda reported that there should be "of" between
> "definition" and "an inline function" and I think he is right. Patch
> attached. If there's no objection, I will commit it.

Sounds good.
[HACKERS] sources.sgml typo
In doc/src/sgml/sources.sgml:

When the definition an inline function references symbols (i.e. variables, functions) that are only available as part of the backend, the function may not be visible when included from frontend code.

Shinichi Matsuda reported that there should be "of" between "definition" and "an inline function" and I think he is right. Patch attached. If there's no objection, I will commit it.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

diff --git a/doc/src/sgml/sources.sgml b/doc/src/sgml/sources.sgml
index d132ee4..877fced 100644
--- a/doc/src/sgml/sources.sgml
+++ b/doc/src/sgml/sources.sgml
@@ -898,7 +898,7 @@ BETTER: unrecognized node type: 42
    expressions of various types need to be passed to the macro.

-   When the definition an inline function references symbols
+   When the definition of an inline function references symbols
    (i.e. variables, functions) that are only available as part of the
    backend, the function may not be visible when included from frontend
    code.
Re: [HACKERS] Mention column name in error messages
On Sun, Oct 30, 2016 at 12:04 AM, Michael Paquier wrote:
> Okay, so I have reworked the patch a bit and finished with the
> attached, adapting the context message to give more information. I
> have noticed as well a bug in the patch: the context callback was set
> before checking if the column used for transformation is checked on
> being a system column or not, the problem being that the callback
> could be called without the fields set.

Interesting. I wasn't sure it was set at the right time indeed.

> I have updated the regression tests that I found, the main portion of
> the patch being dedicated to that and being sure that all the
> alternate outputs are correctly refreshed. In this case int8, float4,
> float8, xml and contrib/citext are the ones impacted by the change
> with alternate outputs.

Thanks! I couldn't find time to update it last week, so thanks for chiming in.

> I am passing that down to a committer for review. The patch looks
> large, but at 95% it involves diffs in the regression tests,
> alternative outputs taking a large role in the bloat.

Great, thanks Michael!
--
Franck
Re: [HACKERS] JIT compiler for expressions
Hi,

On 2016-10-28 14:47:35 +0300, Dmitry Melnik wrote:
> We'd like to present our work on adding LLVM JIT compilation of expressions
> in SQL queries for PostgreSQL.

Great! I'm also working in the area, albeit with, I think, a bit different approach [1]. Is your goal to integrate this work into postgres proper, or is this more an academic research project? If the former, let's try to collaborate. If the latter, let's talk, so we're not integrating something dumb ;)

[1] So far I've basically converted expression evaluation, and tuple deforming, into small interpreted (switch/computed goto opcode dispatch) mini-languages, which then can be JITed. Adding a small handwritten x86-64 JIT (out of fun, not because I think that's a good approach) also resulted in quite noticeable speedups. Did you experiment with JITing tuple deforming as well? The reason I was thinking of going in this direction is that it's a lot faster to compile such mini pieces of code, and it already gives significant speedups. There still are function calls to postgres functions, but they're all direct function calls, instead of indirect ones.

> Currently, our JIT is used to compile expressions in every query, so for
> short-running queries JIT compilation may take longer than query execution
> itself. We plan to address it by using planner estimate to decide whether
> it worth JIT compiling, also we may try parallel JIT compilation. But for
> now we recommend testing it on a large workload in order to pay off the
> compilation (we've tested on 40GB database for Q1).

Could you give some estimates about how long JITing takes for you, say for Q1? Different approaches here obviously have very different tradeoffs.

> This work is a part of our greater effort on implementing full JIT compiler
> in PostgreSQL, where along with JITting expressions we've changed the
> iteration model from Volcano-style to push-model and reimplemented code
> generation with LLVM for most of Scan/Aggregation/Join methods.
> That approach gives much better speedup (x4-5 times on Q1), but it takes many
> code changes, so we're developing it as a PostgreSQL extension.

FWIW, I think long term we're going to want something like that in core. I'm currently working on a "gradual" transformation towards that, by *optionally* dealing with "batches" of tuples which get passed around. Noticeable speedups, but not in the 4-5x range.

> Also we're going to give a lightning talk at upcoming PGConf.EU in Tallinn,
> and discuss the further development with PostgreSQL community. We'd
> appreciate any feedback!

Cool, let's chat a bit, I'm also here.

Greetings,

Andres Freund
Re: [HACKERS] sequential scan result order vs performance
Hi,

On 2016-10-30 00:36:55 -0700, Andres Freund wrote:
> It's quite easy to change iteration so we start with the latest item,
> and iterate till the first, rather than the other way round. In
> benchmarks with somewhat wide columns and aggregation, this yields
> speedups of over 30%, before hitting other bottlenecks.

One more point: over IM Robert commented that it's not guaranteed that itemid order correlates precisely with reverse "tuple data" order. After PageRepairFragmentation() that'll no longer be the case. That's true, but I suspect that in many cases with larger analytics queries the correlation will still be significant, and we could also make it guaranteed at the price of making PageRepairFragmentation() a bit more expensive.

Greetings,

Andres Freund
Re: [HACKERS] JIT compiler for expressions
This sounds amazing. My only comment is that LLVM 3.7 is kind of old in the accelerated world of LLVM. If there are patches to LLVM that you need, you won't have much success submitting them against 3.7. The current stable release is 3.9 and the development snapshots are 4.0. I know LLVM moves quickly, and that makes it hard to track its development. If you worked with 4.0 you might find the APIs you're using getting deprecated and rewritten several times while your project is under development.
[HACKERS] sequential scan result order vs performance
Hi,

While working on the executor, to process "batches" or "bubbles" of tuples, I hit some weird performance issues (as in things didn't improve as much as I had hoped). A fair amount of headscratching later I figured out that the tuple order in sequential scans is a major bottleneck.

When iterating over a page we return tuples in itemid order, which makes them returned in *descending* order address-wise, as tuples are stored starting from the end of the page. But when actually accessing the tuples, we access them in increasing address order (header, and then column by column). It appears that this, quite understandably, confuses prefetch units, leading to drastically increased cache miss ratios.

It's quite easy to change iteration so we start with the latest item, and iterate till the first, rather than the other way round. In benchmarks with somewhat wide columns and aggregation, this yields speedups of over 30%, before hitting other bottlenecks.

I do wonder however if it's acceptable to change the result order of sequential scans. People don't tend to specify ORDER BY everywhere - as evidenced by large swathes of our regression tests failing spuriously - so they might not be happy to see a somewhat weird order (pages sequentially increasing, but individual tuples inside a page in reverse order). We could change the order only in cases where the user doesn't actually see the result, say below aggregation, sort, and whatnot nodes. On the other hand the benefit is quite significant for heavily filtered sequential scans as well, and COPY out also benefits.

Comments?

Greetings,

Andres Freund
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Tue, 28 Jun 2016 11:06:24 +0200 Gilles Darold wrote:
> On 07/04/2016 08:30, Karl O. Pinc wrote:
>> "src/backend/postmaster/syslogger.c expects to see fopen() fail
>> with ENFILE and EMFILE. What will you do if you get these?"
>
> Nothing, if the problem occurs during the log rotate call, then
> log rotation file is disabled so logfile_writename() will not be
> called. Case where the problem occurs between the rotation call and
> the logfile_writename() call is possible but I don't think that it
> will be useful to try again. In this last case log filename will be
> updated during next rotation.

The case I'm interested in is when the rotation call succeeds but you get ENFILE or EMFILE in logfile_writename() and current_logfiles is not updated. This looks like an ugly problem that only happens sporadically under load. If I've set log rotation to, say, 1 week, and I'm post-processing my logs based on the current_logfiles content, and the logs rotate but current_logfiles is not updated, then I lose a week of log post-processing.

I'm experimenting with some code that retries writing current_logfiles, if it failed due to ENFILE or EMFILE, the next time a log message is written. If that's too often, it'd be easy enough to add a backoff counter that retries progressively less frequently based on a "clock tick" per log message write. However, per my last email, it'll be Tuesday before I really get back to this.

Let me know if, instead, you want to jump in and write the code, have a better idea, think this is a really stupid approach, or have other reasons why I should abandon this plan.

Regards,

Karl
Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Re: [HACKERS] Patch to implement pg_current_logfile() function
On Sat, 29 Oct 2016 22:00:08 +0200 Gilles Darold wrote:
> The attached v10 of the current_logfiles patch include your last
> changes on documentation but not the patch on v9 about the
> user-supplied GUC value. I think the v10 path is ready for committers
> and that the additional patch to add src/include/utils/guc_values.h
> to define user GUC values is something that need to be taken outside
> this one. Imo, thoses GUC values (stderr, csvlog) are not expected to
> change so often to require a global definition, but why not, if
> committers think this must be done I can add it to a v11 patch.

I agree that the guc_values.h patches should be submitted separately to the committers, when your patch is submitted. Creating symbols for these values is a matter of coding style they should resolve. (IMO it's not whether the values will change, it's whether someone reading the code can read the letters "stdout" and know to what they refer and where to find related usage elsewhere in the code.) I'll keep up the guc_values.h patches and have them ready for submission along with your patch.

I don't think your patch is quite ready for submission to the committers. Attached is a patch to be applied on top of your v10 patch which does basic fixup to logfile_writename().

I have some questions about logfile_writename():

Why does the logfile_open() call fail silently? Why not use ereport() here? (With a WARNING level.)

Why do the ereport() invocations all have a LOG level? You're not recovering, and the errors are unexpected, so I'd think WARNING would be the right level. (I previously, IIRC, suggested LOG level -- only if you are retrying and recovering from an error.)

Have you given any thought to my proposal to change CURRENT_LOG_FILENAME to LOG_METAINFO_FILE?

I'm not sure I've looked at every bit of your patch yet. I won't have much time to do more real work until after Tuesday morning.

Regards,

Karl
Free Software: "You don't pay back, you pay forward." -- Robert A.
Heinlein

patch_pg_current_logfile-v10.diff.logfile_writename
Description: Binary data