Re: [HACKERS] Parallel build with MSVC
On Thu, Sep 8, 2016 at 3:54 PM, Christian Ullrich wrote:
> Much apologizings for coming in late again, but I just realized it would be
> better if the user-controlled flags came after all predefined options the
> user might want to override. Right now that is only /verbosity in both build
> and clean operations.
>
> Patch attached, also reordering the ecpg-clean command line in clean.bat to
> match the others that have the project file first.

-"msbuild $buildwhat.vcxproj $msbflags /verbosity:normal /p:Configuration=$bconf"
+"msbuild $buildwhat.vcxproj /verbosity:normal $msbflags /p:Configuration=$bconf"

Why not... If people are willing to switch to /verbosity:detailed and
double the amount of build time...
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel build with MSVC
* Noah Misch wrote:
> Committed.

Much apologizings for coming in late again, but I just realized it would be
better if the user-controlled flags came after all predefined options the
user might want to override. Right now that is only /verbosity in both build
and clean operations.

Patch attached, also reordering the ecpg-clean command line in clean.bat to
match the others that have the project file first.

--
Christian

>From 09a5f3945e2b87b56ca3525a56db970e9ecf8ffd Mon Sep 17 00:00:00 2001
From: Christian Ullrich
Date: Thu, 8 Sep 2016 08:34:45 +0200
Subject: [PATCH] Let MSBFLAGS be used to override predefined options.

---
 src/tools/msvc/build.pl  | 4 ++--
 src/tools/msvc/clean.bat | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/tools/msvc/build.pl b/src/tools/msvc/build.pl
index 5273977..a5469cd 100644
--- a/src/tools/msvc/build.pl
+++ b/src/tools/msvc/build.pl
@@ -54,7 +54,7 @@ elsif (uc($ARGV[0]) ne "RELEASE")
 if ($buildwhat and $vcver >= 10.00)
 {
 	system(
-		"msbuild $buildwhat.vcxproj $msbflags /verbosity:normal /p:Configuration=$bconf"
+		"msbuild $buildwhat.vcxproj /verbosity:normal $msbflags /p:Configuration=$bconf"
 	);
 }
 elsif ($buildwhat)
@@ -63,7 +63,7 @@ elsif ($buildwhat)
 }
 else
 {
-	system("msbuild pgsql.sln $msbflags /verbosity:normal /p:Configuration=$bconf");
+	system("msbuild pgsql.sln /verbosity:normal $msbflags /p:Configuration=$bconf");
 }
 
 # report status
diff --git a/src/tools/msvc/clean.bat b/src/tools/msvc/clean.bat
index e21e37f..b972ddf 100755
--- a/src/tools/msvc/clean.bat
+++ b/src/tools/msvc/clean.bat
@@ -107,6 +107,6 @@ REM for /r %%f in (*.sql) do if exist %%f.in del %%f
 cd %D%
 
 REM Clean up ecpg regression test files
-msbuild %MSBFLAGS% /NoLogo ecpg_regression.proj /t:clean /v:q
+msbuild ecpg_regression.proj /NoLogo /v:q %MSBFLAGS% /t:clean
 goto :eof
-- 
2.10.0.windows.1
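The reordering above relies on last-flag-wins option handling: if the
predefined /verbosity:normal comes after the user's $msbflags, the user can
never override it. A minimal C sketch of that behavior (illustrative only,
not msbuild code; the function name and the assumption that the last
occurrence of /verbosity wins are mine, taken from the patch's rationale):

```c
#include <assert.h>
#include <string.h>

/*
 * Toy option scan where a later occurrence of /verbosity: overrides an
 * earlier one -- the premise behind putting user flags last.
 */
static const char *
effective_verbosity(const char *argv[], int argc)
{
    const char *value = "normal";   /* built-in default */

    for (int i = 0; i < argc; i++)
    {
        if (strncmp(argv[i], "/verbosity:", 11) == 0)
            value = argv[i] + 11;   /* later occurrence wins */
    }
    return value;
}
```

With the old ordering the user's choice is silently clobbered; with the new
ordering it takes effect.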
Re: [HACKERS] Truncating/vacuuming relations on full tablespaces
On Mon, Jun 20, 2016 at 9:28 PM, Asif Naeem wrote:
> Thank you for useful suggestions. PFA patch, I have tried to cover all the
> points mentioned.

The patch needs a rebase; it is failing to apply on latest master. There
are also some pending comment fixes from Robert.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Bug in two-phase transaction recovery
On Wed, Sep 7, 2016 at 10:48 PM, Stas Kelvich wrote:
> Some time ago the two-phase state file format was changed to have a
> variable-size GID, but several places that read those files were not
> updated to use the new offsets. The problem exists in master and 9.6 and
> can be reproduced on prepared transactions with savepoints.

Oops and meh. This merits an open item, and had better be fixed by 9.6.0.
I am glad you noticed that, honestly. We had better take care of this
issue as soon as possible.

> For example:
>
> create table t(id int);
> begin;
> insert into t values (42);
> savepoint s1;
> insert into t values (43);
> prepare transaction 'x';
> begin;
> insert into t values (142);
> savepoint s1;
> insert into t values (143);
> prepare transaction 'y';
>
> and restart the server. Startup process will hang. Fix attached.

In crash recovery this is very likely to fail with an assertion if those
are enabled:

TRAP: FailedAssertion("!(TransactionIdFollows(subxid, xid))", File:
"twophase.c", Line: 1767)

And the culprit is e0694cf9, which makes this open item owned by Simon.

I also had a look at the patch you are proposing, and I think that it is a
correct fix. I also looked at the other code paths scanning the 2PC state
file and did not notice extra problems. The good news is that 2PC file
generation is not impacted, only its scan at recovery, so the impact of
the bug is limited for existing 9.6 deployments where both 2PC and
subtransactions are involved.

> Also while looking at StandbyRecoverPreparedTransactions() I've noticed
> that the buffer for the 2PC file is allocated in TopMemoryContext but
> never freed. That has probably existed for a long time.

@@ -1886,6 +1886,8 @@ StandbyRecoverPreparedTransactions(bool overwriteOK)
 		Assert(TransactionIdFollows(subxid, xid));
 		SubTransSetParent(xid, subxid, overwriteOK);
 	}
+
+	pfree(buf);
 }

This one is a good catch. I have checked the other callers of
ReadTwoPhaseFile as well, but I am not seeing any other leak. That is a
leak, though not a critical one, so applying it only on HEAD would be
enough IMO.
--
Michael
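The offset bug described above is the classic hazard of a variable-size
field in an otherwise fixed header: readers must skip the actual GID
length, not a stale fixed offset. A hedged C sketch of the pattern (the
struct, field names, and alignment macro are toy stand-ins, not
PostgreSQL's TwoPhaseFileHeader or MAXALIGN):

```c
#include <stddef.h>
#include <stdint.h>

/* Toy analogue of a 2PC state file header followed by a variable GID. */
typedef struct ToyStateFileHeader
{
    uint32_t    magic;
    uint32_t    nsubxacts;
    uint16_t    gidlen;         /* variable-size GID follows the header */
} ToyStateFileHeader;

/* round up to 8-byte alignment, as data sections typically require */
#define TOYALIGN(LEN) (((uintptr_t) (LEN) + 7) & ~((uintptr_t) 7))

/*
 * Offset of the data that follows the GID.  Using a compile-time
 * constant here instead of hdr->gidlen is exactly the kind of bug the
 * fix addresses.
 */
static size_t
toy_subxacts_offset(const ToyStateFileHeader *hdr)
{
    return TOYALIGN(sizeof(ToyStateFileHeader) + hdr->gidlen);
}
```

Two files with different GID lengths place their subxact arrays at
different offsets, so any reader using a fixed offset reads garbage.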
Re: [HACKERS] Supporting SJIS as a database encoding
Hello,

At Wed, 07 Sep 2016 16:13:04 +0900 (Tokyo Standard Time), Kyotaro
HORIGUCHI wrote in
<20160907.161304.112519789.horiguchi.kyot...@lab.ntt.co.jp>
> > Implementing the radix tree code, then redefining the format of the
> > mapping table to support a radix tree, then modifying the mapping
> > generator script are needed.
> >
> > If no one opposes this, I'll do that.

So, I did that as a PoC. The radix tree takes a little less than 100k
bytes (far smaller than expected:) and it is definitely faster than
binsearch.

The attached patch does the following things.

- Defines a struct for a static radix tree (utf_radix_tree). Currently it
  supports up to 3-byte encodings.
- Adds a map generator script UCS_to_SJIS_radix.pl, which generates
  utf8_to_sjis_radix.map from utf8_to_sjis.map.
- Adds a new conversion function utf8_to_sjis_radix.
- Modifies UtfToLocal so as to allow the map to be NULL.
- Modifies utf8_to_sjis to use the new conversion function instead of
  ULmapSJIS.

The following are still to be done.

- utf8_to_sjis_radix could be more generic.
- SJIS->UTF8 is not implemented, but it would be easily done since there
  is no difference in using the radix tree mechanism. (The output
  character is currently assumed to be 2 bytes long, though.)
- It doesn't support 4-byte codes, so this is not applicable to
  sjis_2004. Extending the radix tree to support 4 bytes wouldn't be
  hard.

The following is the result of a simple test.

=# create table t (a text);
=# alter table t alter column a storage plain;
=# insert into t values ('... 7130 characters containing (I believe) all characters in SJIS encoding');
=# insert into t values ('... 7130 characters containing (I believe) all characters in SJIS encoding');
# Doing that twice is just my mistake.

$ export PGCLIENTENCODING=SJIS
$ psql -c '\encoding' postgres
SJIS
$ time psql postgres -c 'select t.a from t, generate_series(0, )' > /dev/null

real	0m22.696s
user	0m16.991s
sys	0m0.182s

Using binsearch, the result for the same operation was

real	0m35.296s
user	0m17.166s
sys	0m0.216s

Returning in UTF-8 bloats the result string by about 1.5 times, so it
doesn't seem to make sense to compare with it. But it takes real = 47.35s.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

0001-Use-radix-tree-to-encoding-characters-of-Shift-JIS.patch.gz
Description: Binary data
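The core idea of the PoC is that a radix tree indexes the mapping byte by
byte instead of binary-searching a sorted pair table. A minimal C sketch
for 2-byte UTF-8 input (the table layout, names, and the mapped value are
illustrative; the posted patch handles 3-byte input with a compressed
table):

```c
#include <stdint.h>

/*
 * Two-level table: the first level is indexed by the payload bits of a
 * 2-byte lead byte (110xxxxx), the second by the payload bits of the
 * continuation byte (10xxxxxx).  0 means "no mapping".
 */
#define RADIX_FANOUT 64         /* continuation bytes carry 6 bits */

typedef struct ToyRadixTree
{
    uint16_t    map[32][RADIX_FANOUT];
} ToyRadixTree;

static uint16_t
toy_radix_lookup(const ToyRadixTree *tree, const unsigned char *utf8)
{
    if ((utf8[0] & 0xe0) != 0xc0 || (utf8[1] & 0xc0) != 0x80)
        return 0;               /* not a valid 2-byte sequence */
    return tree->map[utf8[0] & 0x1f][utf8[1] & 0x3f];
}
```

Each lookup is two array indexes regardless of table size, versus
O(log n) comparisons for binsearch, which matches the reported speedup.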
Re: [HACKERS] Parallel build with MSVC
On Mon, Sep 05, 2016 at 02:43:48PM +0900, Michael Paquier wrote:
> On Mon, Sep 5, 2016 at 9:18 AM, Noah Misch wrote:
>> Every vcbuild and msbuild invocation ought to recognize this variable,
>> so please update the two places involving ecpg_regression.proj. Apart
>> from that, the patch looks good.
>
> Good catch. I did not notice those during my lookups of those patches.
> Not my intention to bump into Christian's feet, but here are updated
> patches.

Committed.

> Actually, is that worth adding for clean.bat? I doubt that many people
> would care if MSBFLAGS is not supported in it. The cleanup script is
> not the bottleneck, the build script is. I added it in the patch 0001
> attached but I doubt that's worth it to be honest.

If parallelism were the only consideration, then I would agree. We don't
know, in general, how each operation will respond to arbitrary flags the
user selects. I did remove your clean.bat documentation change;
documenting MSBFLAGS in the one place suffices.
Re: [HACKERS] autonomous transactions
Hi,

On 2016-08-30 21:50:05 -0400, Peter Eisentraut wrote:
> I would like to propose the attached patch implementing autonomous
> transactions for discussion and review.
>
> This work was mostly inspired by the discussion about pg_background a
> while back [0]. It seemed that most people liked the idea of having
> something like that, but couldn't perhaps agree on the final interface.
> Most if not all of the preliminary patches in that thread were
> committed, but the user interface portions were then abandoned in favor
> of other work. (I'm aware that rebased versions of pg_background exist.
> I have one, too.)

I kind of dislike this approach for a different reason than those already
mentioned in this thread: namely, it is not re-usable for implementing
streaming logical replication of large transactions, i.e. allowing to
decode and apply uncommitted changes. The issue there is that one really
needs to have multiple transactions active in the same connection, which
this approach does not allow.

That's not necessarily a fatal objection, but I did want to throw that
out there as a design goal.

- Andres
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On Thu, Sep 8, 2016 at 10:02 AM, Mark Kirkwood wrote:
> Repeating my tests with these new patches applied points to the hang
> issue being solved. I tested several 10 minute runs (any of which was
> enough to elicit the hang previously). I'll do some longer ones, but
> looks good!

Thanks for verifying the issue and doing further testing of the patch. It
is really helpful.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Hash Indexes
On Wed, Sep 7, 2016 at 11:49 PM, Jeff Janes wrote:
> On Thu, Sep 1, 2016 at 8:55 PM, Amit Kapila wrote:
>> I have fixed all other issues you have raised. Updated patch is
>> attached with this mail.
>
> I am finding the comments (particularly README) quite hard to follow.
> There are many references to an "overflow bucket", or similar phrases. I
> think these should be "overflow pages". A bucket is a conceptual thing
> consisting of a primary page for that bucket and zero or more overflow
> pages for the same bucket. There are no overflow buckets, unless you are
> referring to the new bucket to which things are being moved.

Hmm. I think page or block is a concept of database systems, and bucket
is a general concept used in hashing technology. I think the difference
is that there are primary buckets and overflow buckets. I have checked
how they are referred to in one of the wiki pages [1]; search for
overflow on that wiki page. Now, I agree we shouldn't be inconsistent in
using them. I will make the usage consistent if I find any inconsistency,
based on what you or other people think is the better way to refer to
overflow space.

> Was maintaining on-disk compatibility a major concern for this patch?
> Would you do things differently if that were not a concern?

I would not have done much differently from what it is now; however, one
thing I considered during development was to change the hash index tuple
structure as below to mark the index tuples as moved-by-split:

typedef struct
{
	IndexTuple	entry;			/* tuple to insert */
	bool		moved_by_split;
} HashEntryData;

The other alternative was to use the (unused) bit in
IndexTupleData->tinfo. I have chosen the latter approach. Now, one could
definitely argue that it is the last available bit in IndexTuple and
using it for hash indexes might or might not be the best thing to do.
However, I think it is also not advisable to break compatibility if we
can use some existing bit. In any case, the same question can arise
whenever anyone wants to use it for some other purpose.

> In particular, I am thinking about the need for every insert to
> exclusive-content-lock the meta page to increment the index-wide tuple
> count.

This is not what this patch has changed. The main purpose of this patch
is to change heavy-weight locking to light-weight locking and provide a
way to handle incomplete splits, both of which are required to sensibly
write WAL for hash indexes. Having said that, I agree with your point
that we can improve the insertion logic so that we don't need to
write-lock the meta-page at each insert. I have noticed some other
potential improvements in hash indexes as well during this work: caching
the meta page, reducing lock/unlock calls for retrieving tuples from a
page by making hash index scans work a page at a time as we do for btree
scans, the kill_prior_tuple mechanism, which is currently quite naive and
needs improvement, and, the biggest one, the create-index logic, where we
are inserting tuple by tuple whereas btree operates at the page level and
also bypasses shared buffers. One such improvement (caching the meta
page) is already being worked on by my colleague, and the patch [2] for
it is in the CF. The main point I want to highlight is that apart from
what this patch does, there are a number of other potential areas for
improvement in hash indexes, and I think it is better to do those as
separate enhancements rather than as a single patch.

> I think that this is going to be a huge bottleneck on update intensive
> workloads (which I don't believe have been performance tested as of
> yet).

I have done some performance testing with this patch, and I found a
significant improvement compared to what we have now in hash indexes,
even for a read-write workload. I think the better idea is to compare it
with btree, but in any case, even if this proves to be a bottleneck, we
should try to improve it in a separate patch rather than as part of this
patch.

> I was wondering if we might not want to change that so that each bucket
> keeps a local count, and sweeps that up to the meta page only when it
> exceeds a threshold. But this would require the bucket page to have an
> area to hold such a count. Another idea would be to keep not a count of
> tuples, but of buckets with at least one overflow page, and split when
> there are too many of those.

I think both of these ideas could change the point (tuple count) at which
we currently split. This might impact search speed and space usage. Yet
another alternative could be to change hashm_ntuples to 64-bit and use
64-bit atomics to operate on it, or maybe use a separate spinlock to
protect it. However, whatever we decide to do with it, I think it is a
matter for a separate patch.

Thanks for looking into the patch.

[1] - https://en.wikipedia.org/wiki/Linear_hashing
[2] - https://commitfest.postgresql.org/10/715/

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
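The "use a spare bit in the tuple info word" alternative mentioned above
can be sketched in a few lines of C. The mask value and helper names here
are hypothetical, purely to show why this preserves on-disk compatibility:
the existing bits (e.g. the size) are left untouched, so old tuples remain
readable.

```c
#include <stdint.h>

#define TOY_MOVED_BY_SPLIT 0x2000   /* hypothetical spare bit in t_info */

static inline void
toy_mark_moved(uint16_t *t_info)
{
    *t_info |= TOY_MOVED_BY_SPLIT;
}

static inline int
toy_is_moved(uint16_t t_info)
{
    return (t_info & TOY_MOVED_BY_SPLIT) != 0;
}

static inline void
toy_clear_moved(uint16_t *t_info)
{
    /* all other bits (e.g. the encoded size) are left untouched */
    *t_info &= ~TOY_MOVED_BY_SPLIT;
}
```

The trade-off discussed in the mail is exactly this: the flag costs no
extra on-disk space, but it consumes the last free bit in the info word.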
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On 07/09/16 21:58, Amit Kapila wrote:
> On Wed, Aug 24, 2016 at 10:32 PM, Jeff Janes wrote:
>> On Tue, Aug 23, 2016 at 10:05 PM, Amit Kapila wrote:
>>> On Wed, Aug 24, 2016 at 2:37 AM, Jeff Janes wrote:
>>>> After an intentionally created crash, I get an Assert triggering:
>>>>
>>>> TRAP: FailedAssertion("!(((freep)[(bitmapbit)/32] & (1<<((bitmapbit)%32",
>>>> File: "hashovfl.c", Line: 553)
>>>>
>>>> freep[0] is zero and bitmapbit is 16.
>>>
>>> Here what is happening is that when it tries to clear the bitmapbit,
>>> it expects it to be set. Now, I think the reason why it didn't find
>>> the bit set could be that after the new overflow page is added and the
>>> bit corresponding to it is set, you might have crashed the system, and
>>> the replay would not have set the bit. Then while freeing the overflow
>>> page it can hit the Assert as mentioned by you. I think the problem
>>> here could be that I am using REGBUF_STANDARD to log the bitmap page
>>> updates, which seems to be causing the issue. As the bitmap page
>>> doesn't follow the standard page layout, it would have omitted the
>>> actual contents while taking the full page image, and then during
>>> replay it would not have set the bit, because the page doesn't need
>>> REDO. I think here the fix is to use REGBUF_NO_IMAGE as we use for vm
>>> buffers.
>>>
>>> If you can send me the detailed steps for how you have produced the
>>> problem, then I can verify after fixing whether you are seeing the
>>> same problem or something else.
>>
>> The test is rather awkward; it might be easier to just have me test it.
>
> Okay, I have fixed this issue as explained above. Apart from that, I
> have fixed another issue reported by Mark Kirkwood upthread and a few
> other issues found during internal testing by Ashutosh Sharma.
>
> The locking issue reported by Mark and Ashutosh is that the patch didn't
> maintain the locking order while adding an overflow page as it maintains
> in other write operations (lock the bucket pages first and then the
> metapage to perform the write operation). I have added comments in
> _hash_addovflpage() to explain the locking order used in the modified
> patch.
>
> During stress testing with pgbench using a master-standby setup, we
> found an issue which indicates that the WAL replay machinery doesn't
> expect completely zeroed pages (see the explanation of RBM_NORMAL mode
> atop XLogReadBufferExtended). Previously, before freeing the overflow
> page we were zeroing it; now I have changed it to just initialize the
> page such that the page will be empty.
>
> Apart from the above, I have added support for old snapshot threshold in
> the hash index code.
>
> Thanks to Ashutosh Sharma for doing the testing of the patch and helping
> me in analyzing some of the above issues.
>
> I forgot to mention in my initial mail that Robert and I had some
> off-list discussions about the design of this patch; many thanks to him
> for providing inputs.

Repeating my tests with these new patches applied points to the hang
issue being solved. I tested several 10 minute runs (any of which was
enough to elicit the hang previously). I'll do some longer ones, but
looks good!

regards

Mark
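The zeroed-page point in the thread above hinges on the difference between
an all-zeroes page (never initialized, which replay in RBM_NORMAL mode
treats as an error) and an initialized-but-empty page (valid header, no
data). A hedged C sketch of that distinction; the header layout and names
below are stand-ins, not PostgreSQL's PageHeaderData:

```c
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 256

typedef struct ToyPageHeader
{
    uint16_t    pd_lower;       /* start of free space */
    uint16_t    pd_upper;       /* end of free space */
} ToyPageHeader;

/* Initialize a page: empty, but with a valid header. */
static void
toy_page_init(char *page)
{
    ToyPageHeader *hdr = (ToyPageHeader *) page;

    memset(page, 0, TOY_PAGE_SIZE);
    hdr->pd_lower = sizeof(ToyPageHeader);
    hdr->pd_upper = TOY_PAGE_SIZE;
}

/* "New" = all zeroes, which strict replay refuses to operate on. */
static int
toy_page_is_new(const char *page)
{
    return ((const ToyPageHeader *) page)->pd_upper == 0;
}
```

Freeing an overflow page by zeroing it would make replay see a "new" page;
initializing it instead keeps the page valid while leaving it empty.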
[HACKERS] Useless dependency assumption libxml2 -> libxslt in MSVC scripts
Hi all,

Looking at the MSVC scripts for some stuff, I have noticed the following
thing:

    if ($options->{xml})
    {
        if (!($options->{xslt} && $options->{iconv}))
        {
            die "XML requires both XSLT and ICONV\n";
        }
    }

But I don't understand the reason behind such a restriction, to be
honest, because libxml2 does not depend on libxslt. The contrary is true:
libxslt needs libxml2. Note as well that libxml2 does depend on ICONV,
though. So I think that this condition should be relaxed as follows:

    if ($options->{xml} && !$options->{iconv})
    {
        die "XML requires ICONV\n";
    }

And we also need to be sure that when libxslt is specified, libxml2 is
there so that the build works correctly. Relaxing that would allow people
to compile contrib/xml2 with just a dependency on libxml2, without
libxslt, something possible on any *nix system.

As far as I can see, this restriction comes from 9 years ago, in 2007 and
commit 7f58ed1a. So nobody has complained about that so far :)

Attached is a patch to address both issues. Comments are welcome.
--
Michael

msvc_xml_relax.patch
Description: application/download
Re: [HACKERS] autonomous transactions
On 8 September 2016 at 08:18, Tsunakawa, Takayuki wrote:
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Craig Ringer
>>> Of course, if we could decrease the startup cost of a bgworker
>>
>> For this use in autonomous tx's we could probably pool workers. Or at
>> least lazily terminate them so that the loop cases work better by
>> re-using an existing bgworker.
>
> I'm not sure whether to be nervous about the context switch cost in the
> use cases of autonomous transactions.

That's probably going to be one of the smaller costs. Doing this with
bgworkers won't be cheap, but you need to consider the alternative:
factoring out all transaction-specific data currently stored in or
pointed to by globals into a transaction state struct that can be swapped
out, making PgXact and PGPROC capable of representing multiple/suspended
transactions, and lots more. Much of that would have a performance impact
on all day-to-day operations, whether or not autonomous xacts are
actually in use.

I haven't looked into it in detail. Peter can probably explain more and
better. I'm just pointing out that I doubt there's any way to do this
without a cost somewhere, and having that cost limited to actual uses of
autonomous xacts would be nice.

(BTW, could you please set your reply style so that your mail client
quotes text and it's possible to see what you wrote vs what is quoted?)

-- 
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Is tuplesort_heap_siftup() a misnomer?
On Wed, Sep 7, 2016 at 2:42 PM, Tom Lane wrote:
> The reason it's called siftup is that that's what Knuth calls it.
> See Algorithm 5.2.3H (Heapsort), pp 146-147 in the first edition of
> Volume 3; tuplesort_heap_siftup corresponds directly to steps H3-H8.

I see that Knuth explains that these steps form the siftup procedure.
What steps does tuplesort_heap_insert correspond to, if any?

5.2.3H is about heapsort, and so the Wikipedia article on heapsort (not
the one on heaps in general, which I referenced first) may be a useful
reference here. It's also freely available. Consider the Wikipedia
pseudocode for siftup [1], from classic heapsort. That version goes from
child to parent each iteration (in the first iteration, "child" is
initialized to "end" -- not root). So, it *ascends* the heap ("child" is
assigned from "parent" for any subsequent iterations). OTOH,
tuplesort_heap_siftup always starts from the root of tuplesort's
top-level heap (or the "hole" at the root position, if you prefer), and
*descends* the heap.

> I think this patch is misguided, as it unnecessarily moves the code
> away from standard nomenclature.

As I mentioned, the patch is guided by the descriptions of fundamental
operations on heaps from Wikipedia (I didn't think much about heapsort
until now). I'm not really proposing to change things in a way that is
inconsistent with Knuth (regardless of how the term siftup is
understood). I'm proposing to change things in a way that seems clearer
overall, particularly given the way that these various routines are used
in fairly distinct contexts.

The terminology in this area can be confusing. My Sedgewick/Wayne
Algorithms book never uses the term shift or sift when discussing heaps,
even once in passing. They call sift-up "swim", and sift-down "sink",
possibly because they'd like to avoid any baggage that other terminology
has.

I propose, as a compromise, to not introduce the term "sift down" at all
in this patch, but to still rename tuplesort_heap_siftup to
tuplesort_heap_compact. Even assuming that I'm wrong about siftup here, I
think that that still represents an improvement. Would that be acceptable
to you?

[1] https://en.wikipedia.org/wiki/Heapsort#Pseudocode
-- 
Peter Geoghegan
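The operation being argued about above can be shown concretely. The
routine fills the hole at the root with the last element and lets it
*descend* until the heap property holds again, which is what Sedgewick
calls "sink" and the Wikipedia heapsort article calls sift-down. A
minimal C sketch on an int min-heap (names and the array representation
are mine, not tuplesort's):

```c
/*
 * Remove the root of a binary min-heap stored in heap[0..*n-1]:
 * move the last element into the hole at the root, then sift it down.
 */
static void
heap_remove_root(int *heap, int *n)
{
    int     i = 0;
    int     val;

    if (*n <= 0)
        return;
    val = heap[--(*n)];             /* last element fills the hole */
    for (;;)
    {
        int     child = 2 * i + 1;

        if (child >= *n)
            break;
        if (child + 1 < *n && heap[child + 1] < heap[child])
            child++;                /* pick the smaller child */
        if (val <= heap[child])
            break;                  /* heap property restored */
        heap[i] = heap[child];      /* pull the child up into the hole */
        i = child;
    }
    heap[i] = val;
}
```

Note the loop starts at the root and moves toward the leaves, never the
other way, which is the crux of the naming complaint.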
Re: [HACKERS] autonomous transactions
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Craig Ringer
>> Of course, if we could decrease the startup cost of a bgworker
>
> For this use in autonomous tx's we could probably pool workers. Or at
> least lazily terminate them so that the loop cases work better by
> re-using an existing bgworker.

Though I may say something odd, isn't the bgworker approach going to
increase context switches? I thought PostgreSQL has made efforts to
decrease context switches, e.g.:

* Each backend itself writes WAL to disk, unlike Oracle, which requests
  the LGWR process to write REDO to disk.

* Releasing and re-acquiring a lwlock appears to try to avoid context
  switches:

	/*
	 * Loop here to try to acquire lock after each time we are signaled by
	 * LWLockRelease.
	 *
	 * NOTE: it might seem better to have LWLockRelease actually grant us the
	 * lock, rather than retrying and possibly having to go back to sleep. But
	 * in practice that is no good because it means a process swap for every
	 * lock acquisition when two or more processes are contending for the same
	 * lock. Since LWLocks are normally used to protect not-very-long
	 * sections of computation, a process needs to be able to acquire and
	 * release the same lock many times during a single CPU time slice, even
	 * in the presence of contention. The efficiency of being able to do that
	 * outweighs the inefficiency of sometimes wasting a process dispatch
	 * cycle because the lock is not free when a released waiter finally gets
	 * to run. See pgsql-hackers archives for 29-Dec-01.
	 */

I'm not sure whether to be nervous about the context switch cost in the
use cases of autonomous transactions.

Regards
Takayuki Tsunakawa
Re: [HACKERS] Long options for pg_ctl waiting
On Thu, Sep 8, 2016 at 8:56 AM, Tom Lane wrote:
> Vik Fearing writes:
>> On 09/08/2016 01:05 AM, Tom Lane wrote:
>>> I'm pretty much -1 on printing a warning. There's no ambiguity, and no
>>> real reason for us ever to remove the old spellings. Standardizing on
>>> "no-" going forward makes sense, but let's not slap people's wrists
>>> for existing usage. (Or: if it ain't broke, don't break it.)
>
>> One could also argue that 2 out of 53 "no" options omitting the dash is
>> in fact broken, and a real reason to remove them.
>
> I do not buy that. As a counter argument, consider that removing them
> would make it impossible to write a script that works with both old
> and new versions of PG. That's a mighty high price to pay for what
> is little more than pedantry.

Perhaps discussing that on another thread would be better, and I was the
one who began this thing... So I'll do it.

Vik's patch just adds --no-wait and --wait long option aliases to pg_ctl.
That clearly improves readability for users, so that's a +1 from here.
And let's just use the v1 presented at the beginning of this thread. I
agree with the feeling that standardizing such option names would be
better, by the way.
-- 
Michael
Re: [HACKERS] Long options for pg_ctl waiting
On Thu, Sep 8, 2016 at 5:41 AM, Alvaro Herrera wrote:
> Gavin Flower wrote:
>> possibly '--nosync' (& any similar) should have a '--no-sync' variation
>> added, with the '--nosync' variation documented as deprecated?
>
> I agree -- I would go as far as just documenting --no-sync only and
> keeping the --nosync one working with minimal (if any) visibility in
> docs.

Keeping no visibility at all in the docs, with an alias in the binaries,
sounds fine to me if we want to standardize things a bit more.
-- 
Michael
Re: [HACKERS] Long options for pg_ctl waiting
Vik Fearing writes:
> On 09/08/2016 01:05 AM, Tom Lane wrote:
>> I'm pretty much -1 on printing a warning. There's no ambiguity, and no
>> real reason for us ever to remove the old spellings. Standardizing on
>> "no-" going forward makes sense, but let's not slap people's wrists
>> for existing usage. (Or: if it ain't broke, don't break it.)

> One could also argue that 2 out of 53 "no" options omitting the dash is
> in fact broken, and a real reason to remove them.

I do not buy that. As a counter argument, consider that removing them
would make it impossible to write a script that works with both old
and new versions of PG. That's a mighty high price to pay for what
is little more than pedantry.

			regards, tom lane
Re: [HACKERS] Long options for pg_ctl waiting
On 09/08/2016 01:05 AM, Tom Lane wrote:
> Vik Fearing writes:
>> On 09/07/2016 11:39 PM, Gavin Flower wrote:
>>> Possibly generate warnings when the non-hyphenated form is used?
>
>> I'm not quite sure how I got volunteered to do this work, but it's easy
>> enough so I don't mind.
>> Here is a new patch that emits a warning when --noclean and/or --nosync
>> are used.
>
> I'm pretty much -1 on printing a warning. There's no ambiguity, and no
> real reason for us ever to remove the old spellings. Standardizing on
> "no-" going forward makes sense, but let's not slap people's wrists for
> existing usage. (Or: if it ain't broke, don't break it.)

One could also argue that 2 out of 53 "no" options omitting the dash is
in fact broken, and a real reason to remove them. I don't see the warning
as "slapping wrists" so much as saying that we're harmonizing our
conventions and their scripts need to be updated for the better.

That said, I'm not going to fight for this. My only goal here is to get
--wait and --no-wait added to pg_ctl.
-- 
Vik Fearing
+33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
[HACKERS] High-CPU consumption on information_schema (only) query
Hi,

A query (containing only information_schema-related JOINs), when
triggered, runs at max CPU (and never ends - killed after 2 days).

- It runs similarly (very slow) on a replicated server that acts as a
  read-only slave.
- Top shows only postgres as hitting max CPU (nothing else). When the
  query is killed, CPU goes to near 0%.
- When the DB is restored on a separate test server (with the exact same
  postgresql.conf), the same query works fine.
- There is no concurrent usage on the replicated / test server (although
  the primary is a Production server and has concurrent users).

Questions:
- If this were a postgres bug or a configuration issue, the query on the
  restored DB should have been slow too. Is there something very basic I
  am missing here?
- If someone asks for it I could provide SQL + EXPLAIN, but it feels
  irrelevant here. I'm not looking for a specific solution, but what else
  should I be looking at here?

p.s.: All postgres servers are running v9.3.10.

- robins
Re: [HACKERS] autonomous transactions
On 8 Sep. 2016 3:47 am, "Robert Haas" wrote: > > Of course, if we could decrease the startup cost of a bgworker For this use in autonomous tx's we could probably pool workers. Or at least lazily terminate them so that the loop cases work better by re-using an existing bgworker. I'm pretty sure we're going to need a bgworker pool sooner or later anyway.
Re: [HACKERS] Long options for pg_ctl waiting
Vik Fearing writes: > On 09/07/2016 11:39 PM, Gavin Flower wrote: >> Possibly generate warnings when the non-hyphenated form is used? > I'm not quite sure how I got volunteered to do this work, but it's easy > enough so I don't mind. > Here is a new patch that emits a warning when --noclean and/or --nosync > are used. I'm pretty much -1 on printing a warning. There's no ambiguity, and no real reason for us ever to remove the old spellings. Standardizing on "no-" going forward makes sense, but let's not slap people's wrists for existing usage. (Or: if it ain't broke, don't break it.) regards, tom lane
Re: [HACKERS] Long options for pg_ctl waiting
On 09/07/2016 11:39 PM, Gavin Flower wrote: > On 08/09/16 09:08, Vik Fearing wrote: >> On 09/07/2016 10:41 PM, Alvaro Herrera wrote: >>> Gavin Flower wrote: >>> possibly '--nosync' (& any similar) should have a '--no-sync' variation added, with the '--nosync' variation documented as deprecated? >>> I agree -- I would go as far as just documenting --no-sync only and >>> keeping the --nosync one working with minimal (if any) visibility in >>> docs. >> Okay, here's a patch to do that. I don't think it's the other patch's >> job to do it. >> >> I also changed --noclean to --no-clean, and --no-locale was already >> correct. > > Suggest a comment along the lines "Where flags of the form --xxx have a > negated form, then the preferred negated form is --no-xxx - and that any > existing use of the form --noxxx should be converted to --no-xxx, as the > non-hyphenated form is now deprecated & will be removed in a future > version of Postgres." I have verified that these are the only two options anywhere in the tree that start with "no" and not "no-" so it should be pretty easy for future options to conform on their own. I don't see adding a comment like this to every long option definition block to be very helpful, and only adding it to initdb is just weird. So I don't see the need for it. > Possibly generate warnings when the non-hyphenated form is used? I'm not quite sure how I got volunteered to do this work, but it's easy enough so I don't mind. Here is a new patch that emits a warning when --noclean and/or --nosync are used. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support initdb_no_opts_02.patch
Re: [HACKERS] GiST penalty functions [PoC]
Heikki Linnakangas writes: > Unfortunately, sqrt(x) isn't very cheap. You'd be surprised: sqrt is built-in on most modern hardware. On my three-year-old workstation, sqrt(x) seems to take about 2.6ns. For comparison, the pack_float version posted in takes 3.9ns (and draws multiple compiler warnings, too). regards, tom lane
Re: [HACKERS] (Comment)Bug in CteScanNext
On 9/6/16 12:00 PM, Tom Lane wrote: On the other hand, if eof_cte is true, then what happened on the last call is that we tried to fetch forwards, reached EOF on the underlying query, and returned NULL. In that case, a backwards fetch *should* produce the last row in the tuplestore. Patch attached. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461 diff --git a/src/backend/executor/nodeCtescan.c b/src/backend/executor/nodeCtescan.c index 3c2f684..64248cb 100644 --- a/src/backend/executor/nodeCtescan.c +++ b/src/backend/executor/nodeCtescan.c @@ -53,8 +53,21 @@ CteScanNext(CteScanState *node) */ eof_tuplestore = tuplestore_ateof(tuplestorestate); + /* +* Before fetching, we need to handle a special case: when reversing +* direction at the end of the tuplestore, the next +* tuplestore_gettupleslot() call will return the last tuple. But since +* that tuple was just seen, we want to move past it. This is necessary +* because the tuplestore get routines always move the current pointer, +* unless they hit the end (or beginning) of the store. +*/ if (!forward && eof_tuplestore) { + /* +* However, once we hit the end of the underlying node any call while +* at the end of the tuplestore will return NULL. In that case, we DO +* want to return the last row in the tuplestore. 
+*/ if (!node->leader->eof_cte) { /* diff --git a/src/backend/executor/nodeMaterial.c b/src/backend/executor/nodeMaterial.c index 9ab03f3..197b91b 100644 --- a/src/backend/executor/nodeMaterial.c +++ b/src/backend/executor/nodeMaterial.c @@ -82,16 +82,23 @@ ExecMaterial(MaterialState *node) eof_tuplestore = (tuplestorestate == NULL) || tuplestore_ateof(tuplestorestate); + /* +* Before fetching, we need to handle a special case: when reversing +* direction at the end of the tuplestore, the next +* tuplestore_gettupleslot() call will return the last tuple. But since +* that tuple was just seen, we want to move past it. This is necessary +* because the tuplestore get routines always move the current pointer, +* unless they hit the end (or beginning) of the store. +*/ if (!forward && eof_tuplestore) { + /* +* However, once we hit the end of the underlying node any call while +* at the end of the tuplestore will return NULL. In that case, we DO +* want to return the last row in the tuplestore. +*/ if (!node->eof_underlying) { - /* -* When reversing direction at tuplestore EOF, the first -* gettupleslot call will fetch the last-added tuple; but we want -* to return the one before that, if possible. So do an extra -* fetch. -*/ if (!tuplestore_advance(tuplestorestate, forward)) return NULL;/* the tuplestore must be empty */ } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Is tuplesort_heap_siftup() a misnomer?
Peter Geoghegan writes: > On Fri, Aug 12, 2016 at 4:30 PM, Peter Geoghegan wrote: >> Doesn't tuplesort_heap_siftup() actually shift-down? The reason it's called siftup is that that's what Knuth calls it. See Algorithm 5.2.3H (Heapsort), pp 146-147 in the first edition of Volume 3; tuplesort_heap_siftup corresponds directly to steps H3-H8. > Heikki (CC'd) and I discussed this privately today, and we were in > agreement that this needs to be fixed, so I wrote a patch, which I > attach here. I think this patch is misguided, as it unnecessarily moves the code away from standard nomenclature. regards, tom lane
Re: [HACKERS] Long options for pg_ctl waiting
On 08/09/16 09:08, Vik Fearing wrote: On 09/07/2016 10:41 PM, Alvaro Herrera wrote: Gavin Flower wrote: possibly '--nosync' (& any similar) should have a '--no-sync' variation added, with the '--nosync' variation documented as deprecated? I agree -- I would go as far as just documenting --no-sync only and keeping the --nosync one working with minimal (if any) visibility in docs. Okay, here's a patch to do that. I don't think it's the other patch's job to do it. I also changed --noclean to --no-clean, and --no-locale was already correct. Suggest a comment along the lines "Where flags of the form --xxx have a negated form, then the preferred negated form is --no-xxx - and that any existing use of the form --noxxx should be converted to --no-xxx, as the non-hyphenated form is now deprecated & will be removed in a future version of Postgres." Possibly generate warnings when the non-hyphenated form is used? Cheers, Gavin
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
On 09/07/2016 09:17 AM, Peter Geoghegan wrote: On Tue, Sep 6, 2016 at 11:09 PM, Heikki Linnakangas wrote: The big picture here is that you can't only USEMEM() for tapes as the need arises for new tapes as new runs are created. You'll just run a massive availMem deficit, that you have no way of paying back, because you can't "liquidate assets to pay off your creditors" (e.g., release a bit of the memtuples memory). The fact is that memtuples growth doesn't work that way. The memtuples array never shrinks. Hmm. But memtuples is empty, just after we have built the initial runs. Why couldn't we shrink, i.e. free and reallocate, it? After we've built the initial runs, we do in fact give a FREEMEM() refund to those tapes that were not used within beginmerge(), as I mentioned just now (with a high workMem, this is often the great majority of many thousands of logical tapes -- that's how you get to wasting 8% of 5GB of maintenance_work_mem). Peter and I chatted over IM about this. Let me try to summarize the problems, and my plan: 1. When we start to build the initial runs, we currently reserve memory for tape buffers, maxTapes * TAPE_BUFFER_OVERHEAD. But we only actually need the buffers for tapes that are really used. We "refund" the buffers for the unused tapes after we've built the initial runs, but we're still wasting that while building the initial runs. We didn't actually allocate it, but we could've used it for other things. Peter's solution to this was to put a cap on maxTapes. 2. My observation is that during the build-runs phase, you only actually need those tape buffers for the one tape you're currently writing to. When you switch to a different tape, you could flush and free the buffers for the old tape. So reserving maxTapes * TAPE_BUFFER_OVERHEAD is excessive, 1 * TAPE_BUFFER_OVERHEAD would be enough. logtape.c doesn't have an interface for doing that today, but it wouldn't be hard to add. 3.
If we do that, we'll still have to reserve the tape buffers for all the tapes that we use during merge. So after we've built the initial runs, we'll need to reserve memory for those buffers. That might require shrinking memtuples. But that's OK: after building the initial runs, memtuples is empty, so we can shrink it. - Heikki
Re: [HACKERS] GiST penalty functions [PoC]
On 09/07/2016 09:20 PM, Andrew Borodin wrote:

Well, arithmetic is too fragile. This version of float packing with arithmetical packaging:

static float
pack_float(float actualValue, int realm)
{
	double max, min;

	max = FLT_MAX / (8 >> realm);
	min = FLT_MAX / (16 >> realm);
	if (realm == 0)
		min = 0;
	/* squeeze the actual value between min and max */
	return (min + (actualValue * (max - min) / FLT_MAX));
}

Inserts are the same as with the bithacked pack_float, but selects are 5 times slower. When we try to pack the value linearly into a range, we lose too many bits. That's why I suggested scaling it by the new value's volume and/or edge-sum.

I was hoping that the old and new values are roughly of the same magnitude, so that it would work out. I guess not. But we could probably do something like the above too, if we use logarithmic or exponential, rather than linear, packing. Something like:

static float
pack_float(float actualValue, int realm)
{
	/* compress the value, then scale it into the realm's band */
	double val = sqrt(sqrt(actualValue));

	if (realm == 0)
		return val;
	if (realm == 1)
		return val * sqrt(sqrt(FLT_MAX));
	return val * sqrt(FLT_MAX);		/* realm == 2 */
}

Unfortunately, sqrt(x) isn't very cheap.

- Heikki

Re: [HACKERS] Is tuplesort_heap_siftup() a misnomer?
On Fri, Aug 12, 2016 at 4:30 PM, Peter Geoghegan wrote: > Doesn't tuplesort_heap_siftup() actually shift-down? > > The Wikipedia article on heaps [1] lists "shift-down" (never "sift > down", FWIW) as a common operation on a heap: > > "shift-down: move a node down in the tree, similar to shift-up; used > to restore heap condition after deletion or replacement." > > This seems to be what tuplesort_heap_siftup() actually does (among > other things; its job is to compact the heap following caller > returning the root, where caller leaves a "hole" at the root position > 0). Heikki (CC'd) and I discussed this privately today, and we were in agreement that this needs to be fixed, so I wrote a patch, which I attach here. -- Peter Geoghegan From b747c088e5ad7fb831e21e6d3a503f594cbc8e12 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan Date: Wed, 7 Sep 2016 13:51:38 -0700 Subject: [PATCH] Rename tuplesort_heap_siftup function tuplesort_heap_siftup is renamed to tuplesort_heap_compact. The name tuplesort_heap_siftup did not accurately represent the underlying behavior. The function shifts down, rather than sifting (shifting up). The new name is chosen to describe the function at a slightly higher level, since the underlying heap operation is just an implementation detail. 
--- src/backend/utils/sort/tuplesort.c | 28 +--- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c index aa8e0e4..02d4b6b 100644 --- a/src/backend/utils/sort/tuplesort.c +++ b/src/backend/utils/sort/tuplesort.c @@ -570,7 +570,7 @@ static void sort_bounded_heap(Tuplesortstate *state); static void tuplesort_sort_memtuples(Tuplesortstate *state); static void tuplesort_heap_insert(Tuplesortstate *state, SortTuple *tuple, int tupleindex, bool checkIndex); -static void tuplesort_heap_siftup(Tuplesortstate *state, bool checkIndex); +static void tuplesort_heap_compact(Tuplesortstate *state, bool checkIndex); static void reversedirection(Tuplesortstate *state); static unsigned int getlen(Tuplesortstate *state, int tapenum, bool eofOK); static void markrunend(Tuplesortstate *state, int tapenum); @@ -1617,9 +1617,9 @@ puttuple_common(Tuplesortstate *state, SortTuple *tuple) } else { -/* discard top of heap, sift up, insert new tuple */ +/* discard root of heap to compact, insert new tuple */ free_sort_tuple(state, &state->memtuples[0]); -tuplesort_heap_siftup(state, false); +tuplesort_heap_compact(state, false); tuplesort_heap_insert(state, tuple, 0, false); } break; @@ -1987,7 +1987,7 @@ tuplesort_gettuple_common(Tuplesortstate *state, bool forward, * more generally. 
*/ *stup = state->memtuples[0]; -tuplesort_heap_siftup(state, false); +tuplesort_heap_compact(state, false); if ((tupIndex = state->mergenext[srcTape]) == 0) { /* @@ -2642,8 +2642,7 @@ mergeonerun(Tuplesortstate *state) /* writetup adjusted total free space, now fix per-tape space */ spaceFreed = state->availMem - priorAvail; state->mergeavailmem[srcTape] += spaceFreed; - /* compact the heap */ - tuplesort_heap_siftup(state, false); + tuplesort_heap_compact(state, false); if ((tupIndex = state->mergenext[srcTape]) == 0) { /* out of preloaded data on this tape, try to read more */ @@ -3275,13 +3274,12 @@ dumptuples(Tuplesortstate *state, bool alltuples) * Still holding out for a case favorable to replacement * selection. Still incrementally spilling using heap. * - * Dump the heap's frontmost entry, and sift up to remove it from - * the heap. + * Dump the heap's root entry, and remove it from the heap. */ Assert(state->memtupcount > 0); WRITETUP(state, state->tp_tapenum[state->destTape], &state->memtuples[0]); - tuplesort_heap_siftup(state, true); + tuplesort_heap_compact(state, true); } else { @@ -3663,7 +3661,7 @@ make_bounded_heap(Tuplesortstate *state) if (state->memtupcount > state->bound) { free_sort_tuple(state, &state->memtuples[0]); -tuplesort_heap_siftup(state, false); +tuplesort_heap_compact(state, false); } } } @@ -3693,8 +3691,8 @@ sort_bounded_heap(Tuplesortstate *state) { SortTuple stup = state->memtuples[0]; - /* this sifts-up the next-largest entry and decreases memtupcount */ - tuplesort_heap_siftup(state, false); + /* this puts next-largest entry at root, and decreases memtupcount */ + tuplesort_heap_compact(state, false); state->memtuples[state->memtupcount] = stup; } state->memtupcount = tupcount; @@ -3784,10 +3782,10 @@ tuplesort_heap_insert(Tuplesortstate *state, SortTuple *tuple, /* * The tuple at state->memtuples[0] has been removed from the heap. - * Decrement memtupcount, and sift up to maintain the heap invariant. 
+ * Decrement memtupcount, and shift down to maintain the heap invariant. */ static void -tuplesort_heap_siftu
Re: [HACKERS] \timing interval
> > ... and it would probably greatly reduce the amount of mailing list > traffic asking for version if nothing else. That was the major reason for wanting it. The second is that if an explain were posted to a forum like stackexchange, the reader wouldn't have to wonder what version produced the plan.
Re: [HACKERS] \timing interval
On 9/6/16 1:45 PM, Tom Lane wrote: > It's sorta out of my hands now, but what Tom said earlier is that because > this is client-side code, it wouldn't use existing interval code. > EXPLAIN *is* server-side, we couldn't use this code, but we could leverage > existing interval code there to achieve a similar concept. If we like this specific output format, I'd be inclined to just copy-and-paste the code from psql. I seriously doubt that getting type interval involved in the discussion would lead to a shorter or better-performing solution. If we could actually execute user functions as part of EXPLAIN generating its output then it might be a lot less invasive... but I don't think that's an option. > I have another thing I'd like to add to EXPLAIN output : server version > number output. So maybe we can pick those up in another thread. Ugh. There are multiple ways to get that already, and it's not like space in EXPLAIN's output is not a precious resource. I don't think adding a line would be that bad, and it would probably greatly reduce the amount of mailing list traffic asking for version if nothing else. It might also be useful to tools like https://explain.depesz.com/. If nothing else it's probably worth adding to the non-text output formats. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Re: [HACKERS] Long options for pg_ctl waiting
On 09/07/2016 10:41 PM, Alvaro Herrera wrote: > Gavin Flower wrote: > >> possibly '--nosync' (& any similar) should have a '--no-sync' variation >> added, with the '--nosync' variation documented as deprecated? > > I agree -- I would go as far as just documenting --no-sync only and > keeping the --nosync one working with minimal (if any) visibility in > docs. Okay, here's a patch to do that. I don't think it's the other patch's job to do it. I also changed --noclean to --no-clean, and --no-locale was already correct. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support pg_ctl_no_opts_01.patch
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Dilip Kumar writes: > Basically this patch changes errors at three places. > 1. getBaseTypeAndTypmod: This is being called from domain_in exposed > function (domain_in-> > domain_state_setup-> getBaseTypeAndTypmod). Though this function is > being called from many other places which are not exposed functions, > but I don't think this will have any impact, will it? I really don't like this solution. Changing this one function, out of the dozens just like it in lsyscache.c, seems utterly random and arbitrary. I'm actually not especially convinced that passing domain_in a random OID needs to not produce an XX000 error. That isn't a user-facing function and people shouldn't be calling it by hand at all. The same goes for record_in. But if we do want those cases to avoid this, I think it's appropriate to fix it in those functions, not decree that essentially-random other parts of the system should bear the responsibility. (Thought experiment: if we changed the order of operations in domain_in so that getBaseTypeAndTypmod were no longer the first place to fail, would we undo this change and then change wherever the failure had moved to? That's pretty messy.) In the case of record_in, it seems like it'd be easy enough to use lookup_rowtype_tupdesc_noerror() instead of lookup_rowtype_tupdesc() and then throw a user-facing error right there. Not sure about a nice solution for domain_in. We might have to resort to an extra syscache lookup there, but even if we did, it should happen only once per query so that's not very awful. > 3. CheckFunctionValidatorAccess: This is being called from all > language validator functions. This part seems reasonable, since the validator functions are documented as something users might call, and CheckFunctionValidatorAccess seems like an apropos place to handle it. regards, tom lane
Re: [HACKERS] Long options for pg_ctl waiting
Gavin Flower wrote: > possibly '--nosync' (& any similar) should have a '--no-sync' variation > added, with the '--nosync' variation documented as deprecated? I agree -- I would go as far as just documenting --no-sync only and keeping the --nosync one working with minimal (if any) visibility in docs. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Possible optimization on Function Scan
Hi, On 2016-09-07 15:29:08 -0500, Jim Nasby wrote: > I was a bit surprised to discover the difference below in calling an SRF as > part of a target list vs part of the from clause. The from clause generates > a Function Scan, which (apparently blindly) builds a tuplestore. Is there a > relatively easy way to either transform this type of query so the SRF is > back in a target list, or teach Function Scan that it doesn't always need to > create a tuplestore? It would be nice if we could just not use a tuplestore > at all (depending on the planner to add a Materialize node if necessary), > but AIUI functions can directly return a tuplestore, so I guess that's not > an option... I've recently implemented ValuePerCall support for SRF in FROM http://archives.postgresql.org/message-id/20160827214829.zo2dfb5jaikii5nw%40alap3.anarazel.de One mail up in https://www.postgresql.org/message-id/20160822214023.aaxz5l4igypowyri%40alap3.anarazel.de there's before/after performance numbers showing that removing the materialization fixes the issue. Andres
[HACKERS] Possible optimization on Function Scan
I was a bit surprised to discover the difference below in calling an SRF as part of a target list vs part of the from clause. The from clause generates a Function Scan, which (apparently blindly) builds a tuplestore. Is there a relatively easy way to either transform this type of query so the SRF is back in a target list, or teach Function Scan that it doesn't always need to create a tuplestore? It would be nice if we could just not use a tuplestore at all (depending on the planner to add a Materialize node if necessary), but AIUI functions can directly return a tuplestore, so I guess that's not an option... ~@decina/45678# explain (analyze,verbose,buffers) select count(*) from (select generate_series(1,)) c; QUERY PLAN Aggregate (cost=17.51..17.52 rows=1 width=8) (actual time=27085.104..27085.104 rows=1 loops=1) Output: count(*) -> Result (cost=0.00..5.01 rows=1000 width=4) (actual time=0.007..14326.945 rows= loops=1) Output: generate_series(1, ) Planning time: 0.125 ms Execution time: 27085.153 ms (6 rows) Time: 27087.624 ms ~@decina/45678# explain (analyze,verbose,buffers) select count(*) from generate_series(1,); QUERY PLAN -- Aggregate (cost=12.50..12.51 rows=1 width=8) (actual time=57968.811..57968.812 rows=1 loops=1) Output: count(*) Buffers: temp read=170900 written=170899 -> Function Scan on pg_catalog.generate_series (cost=0.00..10.00 rows=1000 width=0) (actual time=22407.515..44908.001 rows= loops=1) Output: generate_series Function Call: generate_series(1, ) Buffers: temp read=170900 written=170899 Planning time: 0.060 ms Execution time: 58054.981 ms (9 rows) Time: 58055.929 ms -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Alter or rename enum value
ilm...@ilmari.org (Dagfinn Ilmari Mannsåker) writes: > Here is version 6 of the patch, which just adds RENAME VALUE with no IF > [NOT] EXISTS, rebased onto current master (particularly the > transactional ADD VALUE patch). Pushed with some adjustments. The only thing that wasn't a matter of taste is you forgot to update the backend/nodes/ support functions for struct AlterEnumStmt. regards, tom lane
Re: [HACKERS] autonomous transactions
On Sat, Sep 3, 2016 at 7:09 AM, Simon Riggs wrote: > On 2 September 2016 at 09:45, Robert Haas wrote: >> On Wed, Aug 31, 2016 at 7:20 AM, Peter Eisentraut >> wrote: >>> I would like to propose the attached patch implementing autonomous >>> transactions for discussion and review. >> >> I'm pretty skeptical of this approach. Like Greg Stark, Serge Rielau, >> and Constantin Pan, I had expected that autonomous transactions should >> be implemented inside of a single backend, without relying on workers. > > The problem is that we have limits on the number of concurrent > transactions, which is currently limited by the number of procs. True. I believe this issue has been discussed before, and I think it's a solvable issue. I believe that an autonomous transaction could be made to appear to the rest of the system (outside the affected backend) as if it were a subtransaction, but then get committed before the containing transaction gets committed. This would avoid needing more PGPROCs (but getting deadlock detection to work is hard). > So doing autonomous transactions inside a single backend doesn't gain > you very much, yet it is an enormously invasive patch to do it that > way, not least because it requires you to rewrite locking and > deadlocks to make them work correctly when proc is not 1:1 with xid. > And as Serge points out it introduces other restrictions that we know > about now, perhaps more as well. Yes, but I think that if we want to really meet the needs of Oracle users who are used to being able to slap an autonomous transaction pragma on any function that they like, we're going to need a solution that is far lighter-weight than starting up a new backend. Any approach based on background workers is going to make the cost of a function call something like 4ms, which I suspect is going to make it useless for a pretty significant percentage of the cases where users want to use autonomous transactions. 
For example, if you want to log every attempt to access an object, this is a phenomenally expensive way to get there. Calling a per-row trigger is already pretty expensive; calling a per-row trigger that has to *launch a process for every row* is going to be insanely bad. >> That approach would be much less likely to run afoul of limits on the >> number of background workers > > That would also be an argument against using them for parallelism, yet > we used background workers there. I guess that's sort of true, but parallel query intrinsically requires multiple processes or threads, whereas autonomous transactions only require that if you pick an implementation that requires that. Also, the parallel query facility is designed to only apply to operations that are already pretty expensive, namely long-running queries, but there are lots of use cases where an autonomous transaction gets spawned to do a very small amount of work, and not infrequently in a loop. So the startup cost is a significantly more serious problem for this use case. Of course, if we could decrease the startup cost of a bgworker, that'd be good for parallel query, too, because then we could deploy it on shorter queries. But the point remains that for parallel query the planner can always decide to use a non-parallel plan if the query is cheap enough that the startup cost of a worker will be a problem. That isn't the case here; if the worker startup cost is a problem, then you'll just end up with really slow SQL code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Long options for pg_ctl waiting
On 08/09/16 07:31, Robert Haas wrote: On Sat, Sep 3, 2016 at 7:13 PM, Michael Paquier wrote: On Sun, Sep 4, 2016 at 5:57 AM, Vik Fearing wrote: One thing that has been irking me ever since I came to PostgreSQL is the fact that pg_ctl -w (and -W) don't have longhand equivalents. I like to use the long version in scripts and such as extra documentation, and I've never been able to with these. What's more, I keep forgetting that --wait (and --no-wait) aren't a thing. Trivial patch attached. Nit: Like --nosync we could use --nowait, without a hyphen. But is that actually better? I think that the idea of omitting the dash here is one of those things that sounds good at first, and then later you realize that it was actually a dumb idea all along. If somebody has an option for --body or --on or --table and has to negate it by running --nobody or --noon or --notable, some confusion may result, because in each case you get a word that is not really the logical inverse of the original option. Also, if you end up with any multi-word options, like --save-backup-files, then users wonder why the opposite, --nosave-backup-files, has a dash between words 2 and 3 and between words 3 and 4, but not between words 1 and 2. I suggest we'd do better to standardize on always including a dash in such cases. +1 possibly '--nosync' (& any similar) should have a '--no-sync' variation added, with the '--nosync' variation documented as deprecated? Cheers, Gavin
Re: [HACKERS] Long options for pg_ctl waiting
On Sat, Sep 3, 2016 at 7:13 PM, Michael Paquier wrote: > On Sun, Sep 4, 2016 at 5:57 AM, Vik Fearing wrote: >> One thing that has been irking me ever since I came to PostgreSQL is the >> fact that pg_ctl -w (and -W) don't have longhand equivalents. I like to >> use the long version in scripts and such as extra documentation, and >> I've never been able to with these. What's more, I keep forgetting that >> --wait (and --no-wait) aren't a thing. >> >> Trivial patch attached. > > Nit: Like --nosync we could use --nowait, without an hyphen. But is that actually better? I think that the idea of omitting the dash here is one of those things that sounds good at first, and then later you realize that it was actually a dumb idea all along. If somebody has an option for --body or --on or --table and has to negate it by running --nobody or --noon or --notable, some confusion may result, because in each case you get a word that is not really the logical inverse of the original option. Also, if you end up with any multi-word options, like --save-backup-files, then users wonder why the opposite, --nosave-backup-files, has a dash between words 2 and 3 and between words 3 and 4, but not between words 1 and 2. I suggest we'd do better to standardize on always including a dash in such cases. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SELECT FOR UPDATE regression in 9.5
On 07/09/16 7:29 PM, Alvaro Herrera wrote: Marko, does this fix your reported problem too? Both the assertion and the overall test case that causes it to fire? The test case never realized anything was wrong, but the assertion is gone. So yup, problem solved on this end, at least. .m -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] amcheck (B-Tree integrity checking tool)
On Fri, Sep 2, 2016 at 11:19 AM, Kevin Grittner wrote: > IMV the process is to post a patch to this list to certify that it > is yours to contribute and free of IP encumbrances that would > prevent us from using it. I will wait for that post. I attach my V3. There are only very minor code changes here, such as breaking out a separate elevel constant for DEBUG2 traces that show information about how the B-Tree is accessed (e.g. current level, block number). Tiny tweaks to about 3 comments were also performed. These changes are trivial. I've also removed the tests I had added, since external sort regression test coverage is in a much better place now. Finally, the documentation was updated, to make it closer to the Github project's documentation. I expanded what started as the original amcheck sgml documentation based on the experience of using amcheck on production databases at Heroku. This isn't a big change, but it's the biggest in V3. The documentation now emphasizes the likelihood of amcheck finding software errors, which is all we found. Jim Gray predicted that fault-tolerant systems will tend to have almost all problems arise from operator error and software faults in practice, so perhaps I should have predicted that that would be the general trend. -- Peter Geoghegan From 03046b49c893c7695bbae365c33e3d7b27a1f9a3 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan Date: Tue, 10 Jun 2014 22:20:40 -0700 Subject: [PATCH] Add amcheck extension to contrib The new extension makes available two SQL-callable functions, each of which accepts a single argument (a regclass/nbtree index relation argument) and returns void. The SQL-callable function bt_index_check() checks invariants of the target index by walking the tree, performing various checks of the sanity of the page space using insertion scankeys to compare items. 
The SQL-callable function bt_index_parent_check() performs a superset of the checks performed by bt_index_check(): the same checks, as well as another check verifying that each downlink is actually respected as a lower bound on its child page. bt_index_check() requires only an AccessShareLock on the target. bt_index_parent_check() requires an ExclusiveLock on the target, and ShareLock on its heap relation. --- contrib/Makefile |1 + contrib/amcheck/Makefile | 19 + contrib/amcheck/amcheck--1.0.sql | 20 + contrib/amcheck/amcheck.c| 1267 ++ contrib/amcheck/amcheck.control |5 + doc/src/sgml/amcheck.sgml| 278 + doc/src/sgml/contrib.sgml|1 + doc/src/sgml/filelist.sgml |1 + 8 files changed, 1592 insertions(+) create mode 100644 contrib/amcheck/Makefile create mode 100644 contrib/amcheck/amcheck--1.0.sql create mode 100644 contrib/amcheck/amcheck.c create mode 100644 contrib/amcheck/amcheck.control create mode 100644 doc/src/sgml/amcheck.sgml diff --git a/contrib/Makefile b/contrib/Makefile index 25263c0..4acd50e 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -6,6 +6,7 @@ include $(top_builddir)/src/Makefile.global SUBDIRS = \ adminpack \ + amcheck \ auth_delay \ auto_explain \ bloom \ diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile new file mode 100644 index 000..7bb29aa --- /dev/null +++ b/contrib/amcheck/Makefile @@ -0,0 +1,19 @@ +# contrib/amcheck/Makefile + +MODULE_big = amcheck +OBJS = amcheck.o $(WIN32RES) + +EXTENSION = amcheck +DATA = amcheck--1.0.sql +PGFILEDESC = "amcheck - functions to verify access method invariants" + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/amcheck +top_builddir = ../.. 
+include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/contrib/amcheck/amcheck--1.0.sql b/contrib/amcheck/amcheck--1.0.sql new file mode 100644 index 000..ebbd6ac --- /dev/null +++ b/contrib/amcheck/amcheck--1.0.sql @@ -0,0 +1,20 @@ +/* contrib/amcheck/amcheck--1.0.sql */ + +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use "CREATE EXTENSION amcheck" to load this file. \quit + +-- +-- bt_index_check() +-- +CREATE FUNCTION bt_index_check(index regclass) +RETURNS VOID +AS 'MODULE_PATHNAME', 'bt_index_check' +LANGUAGE C STRICT; + +-- +-- bt_index_parent_check() +-- +CREATE FUNCTION bt_index_parent_check(index regclass) +RETURNS VOID +AS 'MODULE_PATHNAME', 'bt_index_parent_check' +LANGUAGE C STRICT; diff --git a/contrib/amcheck/amcheck.c b/contrib/amcheck/amcheck.c new file mode 100644 index 000..5c4d02d --- /dev/null +++ b/contrib/amcheck/amcheck.c @@ -0,0 +1,1267 @@ +/*- + * + * amcheck.c + * Verifies the integrity of access methods based on invariants. + * + * Provides SQL-callable functions for verifying that various logical + * invariants in the structure of index access method
Re: [HACKERS] GiST penalty functions [PoC]
Well, arithmetic is too fragile. This version of float packing uses arithmetic packing:

static float
pack_float(float actualValue, int realm)
{
	double		max, min;

	max = FLT_MAX / (8 >> realm);
	min = FLT_MAX / (16 >> realm);
	if (realm == 0)
		min = 0;
	/* squeeze the actual value between min and max */
	return (min + (actualValue * (max - min) / FLT_MAX));
}

Inserts are the same as with the bithacked pack_float, but selects are 5 times slower. When we try to pack a value linearly into a range, we lose too many bits. Here is how it happens: with floats, adding a small value to a big value loses the small value. Applied to Heikki's algorithm, this means that nV, nE and dV can all be in different mantissa ranges. (Please note that dV can be many times larger than nV, or many times smaller.) An integer bitshift of a float has no arithmetical meaning; it would be hard to describe in formulas what carrying bits between the exponent and significand fields means. But this bitshift preserves the order among almost all float values (except for 2 controllably lost bits, and some values become sNaN). The entire idea of the advanced GiST penalty stands on this. The other way is to add to the API an optional handler which executes the whole choose-subtree algorithm inside cube (or another extension). Best regards, Andrey Borodin, Octonica & Ural Federal University. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hash Indexes
On Thu, Sep 1, 2016 at 8:55 PM, Amit Kapila wrote: > > I have fixed all other issues you have raised. Updated patch is > attached with this mail. > I am finding the comments (particularly README) quite hard to follow. There are many references to an "overflow bucket", or similar phrases. I think these should be "overflow pages". A bucket is a conceptual thing consisting of a primary page for that bucket and zero or more overflow pages for the same bucket. There are no overflow buckets, unless you are referring to the new bucket to which things are being moved. Was maintaining on-disk compatibility a major concern for this patch? Would you do things differently if that were not a concern? If we would benefit from a break in format, I think it would be better to do that now while hash indexes are still discouraged, rather than in a future release. In particular, I am thinking about the need for every insert to exclusive-content-lock the meta page to increment the index-wide tuple count. I think that this is going to be a huge bottleneck on update-intensive workloads (which I don't believe have been performance tested as of yet). I was wondering if we might not want to change that so that each bucket keeps a local count, and sweeps that up to the meta page only when it exceeds a threshold. But this would require the bucket page to have an area to hold such a count. Another idea would be to keep not a count of tuples, but of buckets with at least one overflow page, and split when there are too many of those. I bring it up now because it would be a shame to ignore it until 10.0 is out the door, and then need to break things in 11.0. Cheers, Jeff
Re: Install extensions using update scripts (was Re: [HACKERS] Remove superuser() checks from pgstattuple)
I wrote: > Still no SGML doc updates. And here's a doc addition. regards, tom lane diff --git a/doc/src/sgml/extend.sgml b/doc/src/sgml/extend.sgml index df88380..1c8c420 100644 *** a/doc/src/sgml/extend.sgml --- b/doc/src/sgml/extend.sgml *** SELECT * FROM pg_extension_update_paths( *** 885,890 --- 885,927 + + Installing Extensions using Update Scripts + + + An extension that has been around for awhile will probably exist in + several versions, for which the author will need to write update scripts. + For example, if you have released a foo extension in + versions 1.0, 1.1, and 1.2, there + should be update scripts foo--1.0--1.1.sql + and foo--1.1--1.2.sql. + Before PostgreSQL 10, it was necessary to create new + script files foo--1.1.sql and foo--1.2.sql + containing the same changes, or else the newer versions could not be + installed directly, only by installing 1.0 and then updating. + Now, CREATE EXTENSION can do that automatically — + for example, if only the script + files foo--1.0.sql, foo--1.0--1.1.sql, + and foo--1.1--1.2.sql are available then a request to + install version 1.2 is honored by running those three scripts + in sequence. (As with update requests, if multiple pathways are + available then the shortest is preferred.) + Arranging an extension's script files in this style can reduce the amount + of maintenance effort needed to produce small updates. + + + + If you use secondary (version-specific) control files with an extension + maintained in this style, keep in mind that each version needs a control + file even if it has no stand-alone installation script, as that control + file will determine how the implicit update to that version is performed. + For example, if foo--1.0.control specifies requires + = 'bar' but foo's other control files do not, the + extension's dependency on bar will be dropped when updating + from 1.0 to another version. 
+ + + Extension Example -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ICU integration
> - I can't remember the specific language but they had the collation rule > that "CH" was treated as a distinct entity between C and D. This gave the > order C < CG < CI < CZ < CH < D. Then they removed CH as special which gave > C < CG < CH < CI < CZ < D. Suppose there was the constraint CHECK (COL > BETWEEN 'C' AND 'CH'). Originally it would allow (almost) all strings that > started with C. After the change the constraint would block everything > that started with CI - CZ, leaving many rows that no longer qualify in the > database. (This was probably Spanish.) -- Álvaro Herrera                http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimization for lazy_scan_heap
On Tue, Sep 6, 2016 at 11:13 PM, Masahiko Sawada wrote: > I understood, thank you. > > I've measured the performance benefit of this patch by following steps. > 1. Create very large table and set all-visible flag to all blocks. > 2. Measure the execution time of vacuum that skips to scan all heap pages. > > * 1TB Table(visibility map size is 32MB) > HEAD : 11012.274 ms (00:11.012) > Patched : 6742.481 ms (00:06.742) > > * 8TB Table(visibility map size is 64MB) > HEAD : 69886.605 ms (01:09.887) > Patched : 35562.131 ms (00:35.562) > > * 32TB Table(visibility map size is 258MB) > HEAD: 265901.096 ms (04:25.901) > Patched: 131779.082 ms (02:11.779) > > Since current HEAD could scan visibility map twice, the execution time > of Patched is approximately half of HEAD. > But when table is several hundreds gigabyte, performance benefit would > not be large. Wow, those are surprisingly good results. Interesting. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ICU integration
> > This isn't a problem for Postgres, or at least wouldn't be right now, > because we don't have case insensitive collations. I was wondering if Postgres might be that way. It does avoid the RI constraint problem, but there are still troubles with range based predicates. (My previous project wanted case/accent insensitive collations, so we got to deal with it all.) > So, we use a strcmp()/memcmp() tie-breaker when strcoll() indicates > equality, while also making the general notion of text equality actually > mean binary equality. We used a similar tie breaker in places. (e.g. Index keys needed to be identical, not just equal. We also broke ties in sort to make its behaviour more deterministic.) > I would like to get case insensitive collations some day, and was > really hoping that ICU would help. That being said, the need for a > strcmp() tie-breaker makes that hard. Oh well. Prior to adding ICU to my previous project, it had the assumption that equal meant identical as well. It turned out to be a lot easier to break this assumption than I expected, but that code base had religiously used its own string comparison function for user data - strcmp()/memcmp() was never called for user data. (I don't know if the same can be said for Postgres.) We found that very few places needed to be aware of values that were equal but not identical. (Index and sort were the big two.) Hopefully Postgres will be the same. -- Doug Doole
Re: [HACKERS] SELECT FOR UPDATE regression in 9.5
Marti Raudsepp wrote: > Hello list > > While testing an application with PostgreSQL 9.5, we experienced an issue > involving aborted subtransactions and SELECT FOR UPDATE. In certain > situations, a locking query doesn't return rows that should be visible and > already locked by the current transaction. Okay, so the assertion failure is fixed by the attached patch. Also, the division-by-zero that your test case says shouldn't occur doesn't occur. But does it solve the larger problem of not returning rows that should be visible? Marko, does this fix your reported problem too? Both the assertion and the overall test case that causes it to fire? (The problem fixed by the patch is that we were trying to lock tuples down the update chain, but one of the tuples in the chain had been updated by an aborted subtransaction. Obviously, there is no point in locking such a tuple because it effectively "doesn't exist" in the first place.) Thanks! -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index b8f4fe5..9846b9f 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -5403,6 +5403,16 @@ l4: return HeapTupleMayBeUpdated; } + /* + * Also check Xmin: if this tuple was created by an aborted + * transaction, we're done. + */ + if (TransactionIdDidAbort(HeapTupleHeaderGetXmin(mytup.t_data))) + { + UnlockReleaseBuffer(buf); + return HeapTupleMayBeUpdated; + } + old_infomask = mytup.t_data->t_infomask; old_infomask2 = mytup.t_data->t_infomask2; xmax = HeapTupleHeaderGetRawXmax(mytup.t_data); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ICU integration
> I understand that in principle, but I don't see operating system > providers shipping a bunch of ICU versions to facilitate that. They > will usually ship one. Yep. If you want the protection I've described, you can't depend on the OS copy of ICU. You need to have multiple ICU libraries that are named/installed such that you can load the specific version you want. It also means that you can have dependencies on versions of ICU that are no longer supported. (In my previous project, we were shipping 3 copies of the ICU library, going back to 2.x. Needless to say, we didn't add support for every drop of ICU.) On Wed, Sep 7, 2016 at 5:53 AM Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 9/6/16 1:40 PM, Doug Doole wrote: > > We carried the ICU version numbers around on our collation and locale > > IDs (such as fr_FR%icu36) . The database would load multiple versions of > > the ICU library so that something created with ICU 3.6 would always be > > processed with ICU 3.6. This avoided the problems of trying to change > > the rules on the user. (We'd always intended to provide tooling to allow > > the user to move an existing object up to a newer version of ICU, but we > > never got around to doing it.) In the code, this meant we were > > explicitly calling the versioned API so that we could keep the calls > > straight. > > I understand that in principle, but I don't see operating system > providers shipping a bunch of ICU versions to facilitate that. They > will usually ship one. > > -- > Peter Eisentraut http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Wed, Sep 7, 2016 at 12:12 PM, Greg Stark wrote: > On Wed, Sep 7, 2016 at 1:45 PM, Simon Riggs wrote: >> On 6 September 2016 at 19:59, Tom Lane wrote: >> >>> The idea of looking to the stats to *guess* about how many tuples are >>> removable doesn't seem bad at all. But imagining that that's going to be >>> exact is folly of the first magnitude. >> >> Yes. Bear in mind I had already referred to allowing +10% to be safe, >> so I think we agree that a reasonably accurate, yet imprecise >> calculation is possible in most cases. > > That would all be well and good if it weren't trivial to do what > Robert suggested. This is just a large unsorted list that we need to > iterate through. Just allocate chunks of a few megabytes and when > it's full allocate a new chunk and keep going. There's no need to get > tricky with estimates and resizing and whatever. I agree. While the idea of estimating the right size sounds promising a priori, the estimate can go wrong and over- or under-allocate quite severely; given the alternative of a dynamic allocation strategy, the risks outweigh the benefits. Unless the dynamic strategy has a bigger CPU impact than expected, I believe it is the superior approach. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fun fact about autovacuum and orphan temp tables
On Mon, Sep 5, 2016 at 1:14 PM, Bruce Momjian wrote: > On Mon, Sep 5, 2016 at 12:48:32PM -0300, Alvaro Herrera wrote: >> > The least invasive solution would be to have a guc, something like >> > 'keep_orphan_temp_tables' with boolean value. >> > Which would determine a autovacuum worker policy toward encountered orphan >> > temp tables. >> >> The stated reason for keeping them around is to ensure you have time to >> do some forensics research in case there was something useful in the >> crashing backend. My feeling is that if the reason they are kept around >> is not a crash but rather some implementation defect that broke end-time >> cleanup, then they don't have their purported value and I would rather >> just remove them. >> >> I have certainly faced my fair share of customers with dangling temp >> tables, and would like to see this changed in some way or another. > > I don't think we look at those temp tables frequently enough to justify > keeping them around for all users. +1. I think it would be much better to nuke them more aggressively. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Wed, Sep 7, 2016 at 1:45 PM, Simon Riggs wrote: > On 6 September 2016 at 19:59, Tom Lane wrote: > >> The idea of looking to the stats to *guess* about how many tuples are >> removable doesn't seem bad at all. But imagining that that's going to be >> exact is folly of the first magnitude. > > Yes. Bear in mind I had already referred to allowing +10% to be safe, > so I think we agree that a reasonably accurate, yet imprecise > calculation is possible in most cases. That would all be well and good if it weren't trivial to do what Robert suggested. This is just a large unsorted list that we need to iterate through. Just allocate chunks of a few megabytes, and when one is full allocate a new chunk and keep going. There's no need to get tricky with estimates and resizing and whatever. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index Onlys Scan for expressions
Hi Vladimir, On 05.09.2016 16:38, Ildar Musin wrote: Hi Vladimir, On 03.09.2016 19:31, Vladimir Sitnikov wrote: Ildar>The reason why this doesn't work is that '~~' operator (which is a Ildar>synonym for 'like') isn't supported by operator class for btree. Since Ildar>the only operators supported by btree are <, <=, =, >=, >, you can use Ildar>it with queries like: Ildar>And in 3rd query 'OFFSET' statement prevents rewriter from Ildar>transforming the query, so it is possible to use index only scan on Ildar>subquery and then filter the result of subquery with '~~' operator. I'm afraid I do not follow you. Note: query 3 is 100% equivalent of query 2, however query 3 takes 55 times less reads. It looks like either an optimizer bug, or some missing feature in the "index only scan" logic. Here's quote from "query 2" (note % are at both ends): ... where type=42) as x where upper_vc like '%ABC%'; Note: I do NOT use "indexed scan" for the like operator. I'm very well aware that LIKE patterns with leading % cannot be optimized to a btree range scan. What I want is "use the first indexed column as index scan, then use the second column for filtering". As shown in "query 2" vs "query 3", PostgreSQL cannot come up with such a plan on its own for some reason. This is not a theoretical issue, but it is something that I use a lot with Oracle DB (it just creates a good plan for "query 2"). Vladimir Thanks, I get it now. The reason why it acts like this is that I used match_clause_to_index() function to determine if IOS can be used with the specified clauses. This function among other things checks if operator matches the index opfamily. Apparently this isn't correct. I wrote another prototype to test your case and it seems to work. But it's not ready for public yet, I'll publish it in 1-2 days. Here is a new patch version. I modified check_index_only_clauses() so that it doesn't use match_clause_to_indexcol() anymore. 
Instead it handles different types of expressions including binary operator expressions, scalar array expressions, row compare expressions (e.g. (a,b)<(1,2)) and null tests and tries to match each part of expression to index regardless an operator. I reproduced your example and was able to get index only scan on all queries. Could you please try the patch and tell if it works for you? -- Ildar Musin i.mu...@postgrespro.ru diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index 2952bfb..e81f914 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -25,6 +25,7 @@ #include "catalog/pg_opfamily.h" #include "catalog/pg_type.h" #include "nodes/makefuncs.h" +#include "nodes/nodeFuncs.h" #include "optimizer/clauses.h" #include "optimizer/cost.h" #include "optimizer/pathnode.h" @@ -78,6 +79,14 @@ typedef struct int indexcol; /* index column we want to match to */ } ec_member_matches_arg; +/* Context for index only expression checker */ +typedef struct +{ + IndexOptInfo *index; + bool match; /* TRUE if expression matches + * index */ +} check_index_only_walker_ctx; + static void consider_index_join_clauses(PlannerInfo *root, RelOptInfo *rel, IndexOptInfo *index, @@ -130,7 +139,15 @@ static PathClauseUsage *classify_index_clause_usage(Path *path, static Relids get_bitmap_tree_required_outer(Path *bitmapqual); static void find_indexpath_quals(Path *bitmapqual, List **quals, List **preds); static int find_list_position(Node *node, List **nodelist); -static bool check_index_only(RelOptInfo *rel, IndexOptInfo *index); +static bool check_index_only(PlannerInfo *root, RelOptInfo *rel, IndexOptInfo *index); +static bool check_index_only_targetlist(PlannerInfo *root, + RelOptInfo *rel, + IndexOptInfo *index, + Bitmapset *index_canreturn_attrs); +static bool check_index_only_clauses(List *clauses, + IndexOptInfo *index); +static bool check_index_only_expr_walker(Node *node, check_index_only_walker_ctx 
*ctx); +static bool check_index_only_expr(Node *expr, IndexOptInfo *index); static double get_loop_count(PlannerInfo *root, Index cur_relid, Relids outer_relids); static double adjust_rowcount_for_semijoins(PlannerInfo *root, Index cur_relid, @@ -1020,7 +1037,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel, * index data retrieval anyway. */ index_only_scan = (scantype != ST_BITMAPSCAN && - check_index_only(rel, index)); + check_index_only(root, rel, index)); /* * 4. Generate an indexscan path if there are relevant restriction clauses @@ -1786,7 +1803,7 @@ find_list_position(Node *node, List **nodelist) * Determine whether an index-only scan is possible for this index. */ static bool -check_index_only(RelOptInfo *rel, IndexOptInfo *index) +check_index_only(PlannerInfo *root, RelOptInfo *rel, IndexOptInfo *index) { bool result; Bitmapset *attrs_used = NULL; @@ -1837,8 +1854,10 @@ check_inde
Re: [HACKERS] Suggestions for first contribution?
> On 05 Sep 2016, at 20:25, Christian Convey wrote: > > Hi guys, > > Can anyone suggest a project for my first PG contribution? > > My first two ideas didn't pan out: Yury doesn't seem to need help > with CMake, and the TODO list's "-Wcast-align" project (now deleted) > appeared impractical. There is also a list of projects for Google Summer of Code: https://wiki.postgresql.org/wiki/GSoC_2016 Those topics are expected to be doable by a newcomer over several months. The list is slightly outdated, but you can always check the current state of things by searching this mailing list for the topic you are interested in. Also, Postgres has several internal APIs, like fdw[1] or gist[2], that allow you to implement something useful without digging too deeply into Postgres internals. [1] https://www.postgresql.org/docs/current/static/fdwhandler.html [2] https://www.postgresql.org/docs/current/static/gist-extensibility.html -- Stas Kelvich Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggestions for first contribution?
Christian Convey wrote: Yury doesn't seem to need help with CMake Hello. I am sorry that I am only answering now. I do need help, but with writing the CMake code itself: 1. Making the ecpg tests work 2. MinGW/Msys support 3. many more ... All targets and discussion are here: https://github.com/stalkerg/postgres_cmake/issues If you can only test CMake on Ubuntu x86_64, that is not necessary. If I have not fully understood you, sorry. -- Yury Zhuravlev Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Implement failover on libpq connect level.
> > 8) get_next_element procedure implementation is way too smart (read > > "complicated"). You could probably just store current list length and > > generate a random number between 0 and length-1. > > No, the algorithm here is more complicated. It must ensure that there will > not be a second attempt to connect to a host for which an unsuccessful > connection attempt was already made. So there is a list rearrangement. > > The algorithm for picking a random list element in a single pass is quite trivial. Great! In this case it would be _trivial_ for you to write a comment that describes how this procedure works and what makes you think that it gives a good distribution in all possible cases (e.g. if there are more than 0x1 elements in a list - why not), etc. Right? :) -- Best regards, Aleksander Alekseev -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggestions for first contribution?
Here is another idea for a contribution - refactoring. Currently there are a lot of procedures in PostgreSQL code that definitely don't fit on one screen (i.e. ~50 lines). Also many files are larger than, say, 1000 lines of code. For instance, the psql_completion procedure is more than 2000 lines long! I think patches that would make such code easier to read would have a good chance to be accepted. In case you are interested, I wrote a script that scans PostgreSQL code and then prints procedure names and the corresponding number of lines of code (see attachment). -- Best regards, Aleksander Alekseev

#!/usr/bin/env python3

import sys
import os
import re

def process_file(fname):
    nlines = 0
    procname = ""
    with open(fname) as f:
        for line in f:
            nlines += 1
            if re.search("^{", line):
                nlines = 0
            elif re.search("^}", line) and procname and (nlines > 0):
                print("{:08d}:{}".format(nlines - 1, procname))
                procname = ""
            else:
                m = re.search("^(\\w+)\\s*\\(", line)
                if m:
                    procname = m.group(1)

def recursive_search(path):
    for file in os.listdir(path):
        if (file == ".") or (file == ".."):
            continue
        full_name = os.path.join(path, file)
        if not os.path.islink(full_name):
            if os.path.isdir(full_name):
                recursive_search(full_name)
            if re.search("\\.c$", file):
                process_file(full_name)

if len(sys.argv) < 2:
    print("Usage " + sys.argv[0] + " ")
    sys.exit(1)

start_path = sys.argv[1]
recursive_search(start_path)

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench - allow to store select results into variables
Hello Amit, Custom script looks like: \; select a \into a from tab where a = 1; \set i debug(:a) I get the following error: undefined variable "a" client 0 aborted in state 1; execution of meta-command failed Good catch! Indeed, there is a problem with empty commands which are simply ignored by libpq/postgres if there are other commands around, so my synchronization between results & commands was too naive. In order to fix this, I made the scanner also count empty commands and ignore comments. I guessed that proposing to change libpq/postgres behavior was not an option. Comments on the patch follow: + + + Stores the first fields of the resulting row from the current or preceding + SELECT command into these variables. Any command returning rows ought to work, no? Yes. I put "SQL command" instead. -char *line; /* text of command line */ +char *line; /* first line for short display */ +char *lines; /* full multi-line text of command */ When I looked at this and related hunks (and also the hunks related to semicolons), it made me wonder whether this patch contains two logical changes. Is this just a refactoring for the \into implementation or does this provide some new externally visible feature? There is essentially a refactoring that I did when updating how Command is managed, because it has to be done in several stages to fit "into" into it and to take care of compounds. However there was a small "new externally visible feature": on -r, instead of cutting abruptly a multiline command at the end of the first line it appended "..." as an ellipsis because it looked nicer. I've removed this small visible change. char *argv[MAX_ARGS]; /* command word list */ +int compound; /* last compound command (number of \;) */ +char ***intos; /* per-compound command \into variables */ Need an extra space for intos to align with earlier fields. Ok. Also I wonder if intonames or intoargs would be a slightly better name for the field. I put "intovars" as they are variable names. 
+static bool +read_response(CState *st, char ** intos[]) Usual style seems to be to use ***intos here. Ok. +fprintf(stderr, +"client %d state %d compound %d: " +"cannot apply \\into to non SELECT statement\n", +st->id, st->state, compound); How about making this error message more like other messages related to \into, perhaps something like: "\\into cannot follow non SELECT statements\n" As you pointed out above, there may be statements without "SELECT" which return a row. I wrote "\\into expects a row" instead. /* * Read and discard the query result; note this is not included in - * the statement latency numbers. + * the statement latency numbers (above), thus if reading the + * response fails the transaction is counted nevertheless. */ Does this comment need to mention that the result is not discarded when \into is specified? Hmmm. The result structure is discarded, but the results are copied into variables before that, so the comment seems ok... +my_command->intos = pg_malloc0(sizeof(char**) * (compounds+1)); Need space: s/char**/char **/g Ok. This comment applies also to a couple of nearby places. Indeed. -my_command->line = pg_malloc(nlpos - p + 1); +my_command->line = pg_malloc(nlpos - p + 1 + 3); A comment mentioning what the above means would be helpful. Ok. I removed the "+ 3" anyway. +bool sql_command_in_progress = false; This variable's name could be slightly confusing to readers. If I understand its purpose correctly, perhaps it could be called sql_command_continues. It is possible. I like 'in progress' though. Why is it confusing? It means that the current command is not finished yet and more is expected, that is the final ';' has not been encountered. +if (index == 0) +syntax_error(desc, lineno, NULL, NULL, + "\\into cannot start a script", + NULL, -1); How about: "\\into cannot be at the beginning of a script" ? It is possible, but it's quite a bit longer... I'm not a native speaker, so I do not know whether it would be better. 
The attached patch takes into account all your comments except: - the comment about discarded results... - the sql_command_in_progress variable name change - the longer message on \into at the start of a script -- Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 285608d..0a474e1 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -809,6 +809,30 @@ pgbench options dbname + + + \into var1 [var2 ...] +
Re: [HACKERS] [COMMITTERS] pgsql: Fix VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL
Simon Riggs writes: > On 7 September 2016 at 13:47, Fujii Masao wrote: >> On Tue, Sep 6, 2016 at 11:41 PM, Simon Riggs wrote: >>> lazy_truncate_heap() was waiting for >>> VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL, but in microseconds >>> not milliseconds as originally intended. >> Don't we need to back-patch this? > If we do then a database-wide VACUUM on a busy database will take > substantially longer than it does now. On the other hand, it's also more likely to successfully perform desired truncations. > That may not be perceived as a "fix" by everybody, so we should not do > it without an explicit agreement by many. Agreed, but I vote with Fujii-san for back-patching. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Implement failover on libpq connect level.
On Mon, 5 Sep 2016 14:03:11 +0300 Aleksander Alekseev wrote: > Hello, Victor. > > > 1) It looks like code is not properly formatted. > Thanks for pointing to the documentation and formatting problems. I'll fix them. > > static int > > connectDBStart(PGconn *conn) > > { > > + struct nodeinfo > > + { > > + char *host; > > + char *port; > > + }; > > Not sure if all compilers we support can parse this. I suggest to > declare the nodeinfo structure outside of the procedure, just to be safe. > Block-local structs are part of the C language standard. I don't remember when they appeared first (maybe in K&R C), but at least since C89 AFAIR they are allowed explicitly. And using the most local scope possible is always a good thing. So, I'll leave this structure function-local for now. > > You should check return values of malloc, calloc and strdup > procedures, or use safe pg_* equivalents. Thanks, I'll fix this one. > 6) > > > + for (p = addrs; p->ai_next != NULL; p = > > p->ai_next); > > + p->ai_next = this_node_addrs; > > Looks a bit scary. I suggest to add an empty scope ( {} ) and a > comment to make our intention clear here. Ok, it really would make the code more clear. > 7) Code compiles with following warnings (CLang 3.8): > > > 1 warning generated. > > fe-connect.c:5239:39: warning: if statement has empty body > > [-Wempty-body] > > > > errorMessage, > > false, true)); > > > > ^ > > fe-connect.c:5239:39: note: put the semicolon on a separate line to > > silence this warning > > 1 warning generated. Oops, it looks like a logic error, which should be fixed. Thanks for testing my patch with a more picky compiler. > 8) get_next_element procedure implementation is way too smart (read > "complicated"). You could probably just store current list length and > generate a random number between 0 and length-1. No, the algorithm here is more complicated. It must ensure that there would not be a second attempt to connect to a host for which an unsuccessful connection attempt was already made. 
So, there is a list rearrangement. The algorithm for picking a random list element in a single pass is quite trivial. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
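For readers following along, the two techniques discussed here can be sketched side by side: a single-pass uniform pick (reservoir sampling) and a swap-and-shrink rearrangement that guarantees a host which has already failed is never attempted a second time. This is an illustrative sketch only; the function and variable names are invented and do not come from the patch.

```python
import random

def pick_random_single_pass(items):
    """Reservoir sampling with k=1: return a uniformly random element of
    `items` in a single pass, without knowing the length in advance."""
    chosen = None
    for n, item in enumerate(items, start=1):
        # Keep the n-th element with probability 1/n; after the loop each
        # element has been retained with probability 1/len(items).
        if random.randrange(n) == 0:
            chosen = item
    return chosen

def next_host(hosts, nleft):
    """Swap-and-shrink rearrangement: pick a random host among the first
    `nleft` (not yet tried) entries and swap it past the boundary, so a
    failed host is never retried.  Returns (host, new_nleft)."""
    i = random.randrange(nleft)
    hosts[i], hosts[nleft - 1] = hosts[nleft - 1], hosts[i]
    return hosts[nleft - 1], nleft - 1
```

The swap-and-shrink variant is the one that matters for failover: each call removes the chosen host from the "untried" prefix, so the loop terminates after exactly one attempt per host.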
[HACKERS] Bug in two-phase transaction recovery
Hello. Some time ago the two-phase state file format was changed to have a variable-size GID, but several places that read those files were not updated to use the new offsets. The problem exists in master and 9.6 and can be reproduced with prepared transactions with savepoints. For example:

create table t(id int);

begin;
insert into t values (42);
savepoint s1;
insert into t values (43);
prepare transaction 'x';

begin;
insert into t values (142);
savepoint s1;
insert into t values (143);
prepare transaction 'y';

and restart the server. The startup process will hang. Fix attached. Also, while looking at StandbyRecoverPreparedTransactions() I've noticed that the buffer for the 2PC file is allocated in TopMemoryContext but never freed. That has probably existed for a long time.

gidlen_fixes.diff Description: Binary data

standby_recover_pfree.diff Description: Binary data

-- Stas Kelvich Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
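To make the offset problem concrete: once the GID became variable-size, every reader of the state file must step over the actual GID (plus alignment padding) rather than a fixed maximum. The following is a toy model of that layout rule; the structure sizes and names are illustrative, not the actual TwoPhaseFileHeader definitions.

```python
def maxalign(n, align=8):
    """Round n up to a multiple of `align`, like PostgreSQL's MAXALIGN
    macro (assuming 8-byte maximum alignment)."""
    return (n + align - 1) & ~(align - 1)

def first_record_offset(header_size, gid):
    """Offset of the first 2PC record in the state file: the fixed-size
    header, then the NUL-terminated GID, padded to alignment.  Code that
    still reads at a fixed offset (header plus a maximum GID length)
    computes the wrong position as soon as the GID length varies, which
    is the shape of the bug described above."""
    return maxalign(header_size + len(gid) + 1)
```

With an 8-byte GID the records start at a different offset than with a 1-byte GID, so any fixed-offset reader breaks.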
Re: Install extensions using update scripts (was Re: [HACKERS] Remove superuser() checks from pgstattuple)
Andres Freund writes: > On 2016-09-05 22:24:09 -0400, Tom Lane wrote: >> Ordinarily I'd be willing to stick this on the queue for the next >> commitfest, but it seems like we ought to try to get it pushed now >> so that Stephen can make use of the feature for his superuser changes. >> Thoughts? > Seems sensible to me. I can have a look at it one of the next few days > if you want. Thanks for offering. Attached is an updated patch that addresses a couple of issues I noticed: 1. When I did the previous patch, I was thinking that any parameters specified in an auxiliary .control file for the base version would apply to any non-base versions based on it; so I had the pg_available_extension_versions view just duplicate those parameters. But actually, most of those parameters are just applied on-the-fly while processing the update script, so it *does* work for them to be different for different versions. The exceptions are schema (which you can't modify during an update) and comment (while we could replace the comment during an update, it doesn't seem like a good idea because we might overwrite a user-written comment if we did). (I think this behavior is undocumented BTW :-(.) So this fixes the view to only inherit those two parameters from the base version. 2. I noticed that CASCADE was not implemented for required extensions processed by ApplyExtensionUpdates, which seems like a bad thing if CREATE processing is going to rely more heavily on that. This only matters if different versions have different requires lists, but that is supposed to be supported, and all the catalog-hacking for it is there. So I made this work. It took a fair amount of refactoring in order to avoid code duplication, but it wasn't really a very big change. At this point it's awfully tempting to make ALTER EXTENSION UPDATE grow a CASCADE option to allow automatic installation of new requirements of the new version, but I didn't do that here. Still no SGML doc updates. 
regards, tom lane diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c index df49a78..59f9e21 100644 *** a/src/backend/commands/extension.c --- b/src/backend/commands/extension.c *** typedef struct ExtensionVersionInfo *** 100,113 static List *find_update_path(List *evi_list, ExtensionVersionInfo *evi_start, ExtensionVersionInfo *evi_target, bool reinitialize); static void get_available_versions_for_extension(ExtensionControlFile *pcontrol, Tuplestorestate *tupstore, TupleDesc tupdesc); static void ApplyExtensionUpdates(Oid extensionOid, ExtensionControlFile *pcontrol, const char *initialVersion, ! List *updateVersions); static char *read_whole_file(const char *filename, int *length); --- 100,124 static List *find_update_path(List *evi_list, ExtensionVersionInfo *evi_start, ExtensionVersionInfo *evi_target, + bool reject_indirect, bool reinitialize); + static Oid get_required_extension(char *reqExtensionName, + char *extensionName, + char *origSchemaName, + bool cascade, + List *parents, + bool is_create); static void get_available_versions_for_extension(ExtensionControlFile *pcontrol, Tuplestorestate *tupstore, TupleDesc tupdesc); + static Datum convert_requires_to_datum(List *requires); static void ApplyExtensionUpdates(Oid extensionOid, ExtensionControlFile *pcontrol, const char *initialVersion, ! List *updateVersions, ! char *origSchemaName, ! bool cascade, ! bool is_create); static char *read_whole_file(const char *filename, int *length); *** identify_update_path(ExtensionControlFil *** 1071,1077 evi_target = get_ext_ver_info(newVersion, &evi_list); /* Find shortest path */ ! result = find_update_path(evi_list, evi_start, evi_target, false); if (result == NIL) ereport(ERROR, --- 1082,1088 evi_target = get_ext_ver_info(newVersion, &evi_list); /* Find shortest path */ ! 
result = find_update_path(evi_list, evi_start, evi_target, false, false); if (result == NIL) ereport(ERROR, *** identify_update_path(ExtensionControlFil *** 1086,1091 --- 1097,1105 * Apply Dijkstra's algorithm to find the shortest path from evi_start to * evi_target. * + * If reject_indirect is true, ignore paths that go through installable + * versions (presumably, caller will consider starting from such versions). + * * If reinitialize is false, assume the ExtensionVersionInfo list has not * been used for this before, and the initialization done by get_ext_ver_info * is still good. *** static List * *** 1097,1102 --- ,1117 find_update_path(List *evi_list, ExtensionVersionInfo *evi_start, ExtensionVersionInfo *evi_target, + bool reject_in
Re: [HACKERS] ICU integration
On 9/6/16 1:40 PM, Doug Doole wrote: > We carried the ICU version numbers around on our collation and locale > IDs (such as fr_FR%icu36) . The database would load multiple versions of > the ICU library so that something created with ICU 3.6 would always be > processed with ICU 3.6. This avoided the problems of trying to change > the rules on the user. (We'd always intended to provide tooling to allow > the user to move an existing object up to a newer version of ICU, but we > never got around to doing it.) In the code, this meant we were > explicitly calling the versioned API so that we could keep the calls > straight. I understand that in principle, but I don't see operating system providers shipping a bunch of ICU versions to facilitate that. They will usually ship one. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL consistency check facility
On Wed, Sep 7, 2016 at 3:52 PM, Kuntal Ghosh wrote: > > I got two types of inconsistencies as follows: > > 1. For Btree/UNLINK_PAGE_META, btpo_flags are different. In backup > page, BTP_DELETED and BTP_LEAF both the flags are set, whereas after > redo, only BTP_DELETED flag is set in buffer page. > I see that inconsistency in the code as well. I think this is harmless, because after the page is marked as deleted, it is not used for any purpose other than to recycle it for re-use. After re-using it, the caller is always supposed to initialize the flags based on its usage, and I see that is happening in the code unless I am missing something. > I assume that we > should clear all btpo_flags before setting BTP_DELETED in > _bt_unlink_halfdead_page(). > Yeah, we can do that for consistency. If we see any problem in doing so, then I think we can log the flags and set them during replay. Note - Please post your replies inline rather than top posting them. It breaks the discussion link, if you top post it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
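The inconsistency being discussed is simply the difference between OR-ing the deleted bit into the flag word versus overwriting the whole word. A tiny model (the flag bit values here are illustrative, not copied from nbtree.h):

```python
# Flag bits in the style of nbtree's btpo_flags (values illustrative).
BTP_LEAF = 1 << 0
BTP_DELETED = 1 << 2

def delete_page_or(flags):
    """What the backup page shows: BTP_DELETED OR-ed in, so BTP_LEAF
    (and any other bits) survive."""
    return flags | BTP_DELETED

def delete_page_replace(flags):
    """What redo produces, and what clearing all btpo_flags before
    setting BTP_DELETED in _bt_unlink_halfdead_page() would give."""
    return BTP_DELETED
```

The two functions agree only when no other flag bits were set, which is exactly why the WAL consistency check flagged a leaf page being unlinked.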
Re: [HACKERS] [COMMITTERS] pgsql: Fix VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL
On 7 September 2016 at 13:47, Fujii Masao wrote: > On Tue, Sep 6, 2016 at 11:41 PM, Simon Riggs wrote: >> Fix VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL >> >> lazy_truncate_heap() was waiting for >> VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL, but in microseconds >> not milliseconds as originally intended. > > Don't we need to back-patch this? If we do then a database-wide VACUUM on a busy database will take substantially longer than it does now. That may not be perceived as a "fix" by everybody, so we should not do it without an explicit agreement by many. Thoughts? -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL
On Tue, Sep 6, 2016 at 11:41 PM, Simon Riggs wrote: > Fix VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL > > lazy_truncate_heap() was waiting for > VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL, but in microseconds > not milliseconds as originally intended. Don't we need to back-patch this? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
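The unit mix-up is easy to model: pg_usleep() takes microseconds, while the constant was intended as milliseconds, so the value has to be scaled by 1000 before sleeping. A sketch of the before/after behavior (the constant's value here is illustrative):

```python
VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL = 50  # intended unit: milliseconds

def usec_slept_before_fix(interval_ms):
    """The bug: the millisecond constant was passed straight to a
    microsecond-granularity sleep, waiting 1000x too briefly."""
    return interval_ms

def usec_slept_after_fix(interval_ms):
    """The fix: convert milliseconds to microseconds first."""
    return interval_ms * 1000
```

This also explains the back-patching concern downthread: the fixed code waits 1000 times longer between lock attempts, which is the intended behavior but a visible slowdown relative to the buggy one.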
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On 6 September 2016 at 19:59, Tom Lane wrote: > The idea of looking to the stats to *guess* about how many tuples are > removable doesn't seem bad at all. But imagining that that's going to be > exact is folly of the first magnitude. Yes. Bear in mind I had already referred to allowing +10% to be safe, so I think we agree that a reasonably accurate, yet imprecise calculation is possible in most cases. If a recent transaction has committed, we will see both committed dead rows and stats to show they exist. I'm sure there are corner cases and race conditions where a major effect (greater than 10%) could occur, in which case we run the index scan more than once, just as we do now. The attached patch raises the limits as suggested by Claudio, allowing for larger memory allocations if possible, yet limits the allocation for larger tables based on the estimate gained from pg_stats, while adding 10% for caution. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services vacuum_mem_estimate.v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
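A sketch of the sizing rule described above: cap the dead-TID array by both the memory limit and the stats-based estimate of removable tuples plus a 10% safety margin. The formula and names below paraphrase the idea; they are not the patch's exact code.

```python
TUPLE_ENTRY_BYTES = 6  # sizeof(ItemPointerData), one entry per dead tuple

def vacuum_max_dead_tuples(maintenance_work_mem_bytes, n_dead_estimate):
    """Size the dead-TID array: never more entries than fit in the
    memory budget, and no more than the pg_stats-derived estimate of
    removable tuples plus 10% for caution (hypothetical formula
    mirroring the proposal, not the actual patch)."""
    by_memory = maintenance_work_mem_bytes // TUPLE_ENTRY_BYTES
    by_stats = int(n_dead_estimate * 1.10)
    return max(1, min(by_memory, by_stats))
```

If the estimate turns out too low, the array fills up and the index scan simply runs more than once, just as it does today when maintenance_work_mem is exhausted.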
Re: [HACKERS] WAL consistency check facility
On Wed, Sep 7, 2016 at 3:52 PM, Kuntal Ghosh wrote: > Hello, > > As per the earlier discussions, I've attached the updated patch for > WAL consistency check feature. This is how the patch works: > The earlier patch (wal_consistency_v6.patch) was based on the commit id 67e1e2aaff. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup
On 9/7/16 8:21 AM, David Steele wrote: > On 9/6/16 10:25 PM, Michael Paquier wrote: >> I find that ugly. I'd rather use an array with undefined size for the >> fixed elements finishing by NULL, remove EXCLUDE_DIR_MAX and >> EXCLUDE_DIR_STAT_TMP and use a small routine to do the work done on >> _tarWriteHeader... > > My goal was to be able to fully reuse the code that creates the paths, > but this could also be done by following your suggestion and also moving > the path code into a function. I just realized all I did was repeat what you said... -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup
On 9/6/16 10:25 PM, Michael Paquier wrote: > On Wed, Sep 7, 2016 at 12:16 AM, David Steele wrote: >> Attached is a new patch that adds sgml documentation. I can expand on each >> directory individually if you think that's necessary, but thought it was >> better to lump them into a few categories. > > +be ommitted from the backup as they will be initialized on postmaster > +startup. If the is set and is > +under the database cluster directory then the contents of the directory > +specified by can also > be ommitted. > > s/ommitted/omitted/ Thanks! > +#define EXCLUDE_DIR_MAX 8 > +#define EXCLUDE_DIR_STAT_TMP 0 > + > +const char *excludeDirContents[EXCLUDE_DIR_MAX] = > +{ > +/* > + * Skip temporary statistics files. The first array position will be > + * filled with the value of pgstat_stat_directory relative to PGDATA. > + * PG_STAT_TMP_DIR must be skipped even when stats_temp_directory is set > + * because PGSS_TEXT_FILE is always created there. > + */ > +NULL, > I find that ugly. I'd rather use an array with undefined size for the > fixed elements finishing by NULL, remove EXCLUDE_DIR_MAX and > EXCLUDE_DIR_STAT_TMP and use a small routine to do the work done on > _tarWriteHeader... My goal was to be able to fully reuse the code that creates the paths, but this could also be done by following your suggestion and also moving the path code into a function. Any opinion on this, Peter? -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
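The shape Michael is suggesting — a sentinel-terminated array of excluded directories plus a small routine that checks a path against it, with the first slot filled at runtime from stats_temp_directory — can be sketched as follows. In C this would be an array of char pointers ending in NULL; the directory names beyond pg_stat_tmp are illustrative, not the patch's exact list.

```python
# Directories whose contents are skipped in a base backup.  The None slot
# is a placeholder filled at runtime with pgstat_stat_directory relative
# to PGDATA; PG_STAT_TMP_DIR must be skipped even when stats_temp_directory
# is set, because PGSS_TEXT_FILE is always created there.
exclude_dir_contents = [
    None,           # runtime: stats_temp_directory
    "pg_stat_tmp",  # always, see comment above
    "pg_snapshots",
    "pg_notify",
]

def is_excluded(path, stats_dir):
    """Return True if `path` is one of the excluded directories or lies
    underneath one of them (the small helper routine the review asks
    to factor out of _tarWriteHeader)."""
    dirs = [stats_dir if d is None else d for d in exclude_dir_contents]
    return any(path == d or path.startswith(d + "/") for d in dirs)
```

This keeps the path construction in one place, so both the fixed entries and the runtime stats directory go through the same matching code.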
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/07/2016 01:13 PM, Amit Kapila wrote: > On Wed, Sep 7, 2016 at 1:08 AM, Tomas Vondra > wrote: >> On 09/06/2016 04:49 AM, Amit Kapila wrote: >>> On Mon, Sep 5, 2016 at 11:34 PM, Tomas Vondra >>> wrote: On 09/05/2016 06:03 AM, Amit Kapila wrote: > So, in short we have to compare three > approaches here. > > 1) Group mode to reduce CLOGControlLock contention > 2) Use granular locking model > 3) Use atomic operations > > For approach-1, you can use patch [1]. For approach-2, you can use > 0001-Improve-64bit-atomics-support patch[2] and the patch attached > with this mail. For approach-3, you can use > 0001-Improve-64bit-atomics-support patch[2] and the patch attached > with this mail by commenting USE_CONTENT_LOCK. If the third doesn't > work for you then for now we can compare approach-1 and approach-2. > OK, I can compile all three cases - but only with gcc 4.7 or newer. Sadly the 4-socket 64-core machine runs Debian Jessie with just gcc 4.6 and my attempts to update to a newer version were unsuccessful so far. >>> >>> So which of the patches are you able to compile on the 4-socket m/c? I >>> think it is better to measure the performance on the bigger machine. >> >> Oh, sorry - I forgot to mention that only the last test (with >> USE_CONTENT_LOCK commented out) fails to compile, because the functions >> for atomics were added in gcc-4.7. >> > > No issues, in that case we can leave the last test for now and do it later. > FWIW I've managed to compile a new GCC on the system (all I had to do was to actually read the damn manual), so I'm ready to do the test once I get a bit of time. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] identity columns
Hello, The first look at the patch: On 8/30/16, Peter Eisentraut wrote: > Here is another attempt to implement identity columns. This is a > standard-conforming variant of PostgreSQL's serial columns. > > ... > > Some comments on the implementation, and where it differs from previous > patches: > > - The new attidentity column stores whether a column is an identity > column and when it is generated (always/by default). I kept this > independent from atthasdef mainly for the benefit of existing (client) > code querying those catalogs, but I kept confusing myself by this, so > I'm not sure about that choice. Basically we need to store four > distinct states (nothing, default, identity always, identity by default) > somehow. I don't have a strong opinion on which way is preferred. I think if the community is not against it, it can be left as is. > ... > - I did not implement the restriction of one identity column per table. > That didn't seem necessary. I think it should be mentioned in the docs' "Compatibility" part as a PG extension (similar to "Zero-column Tables"). > ... > > -- > Peter Eisentraut http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > Questions: 1. Does your patch implement the T174 feature? Should a corresponding line be changed in src/backend/catalog/sql_features.txt? 2. Initializing attidentity in most places is ' ' but makefuncs.c has "n->identity = 0;". Is it correct? 3. doc/src/sgml/ref/create_table.sgml (5th chunk) has "TODO". Why? 4. There is "ADD GENERATED", but the standard says it should be "SET GENERATED" (ISO/IEC 9075-2 Subcl.11.20) 5. In ATExecAddIdentity: is it a good idea to check whether "attTup->attidentity" is the same as the one given in "(ADD) SET GENERATED" and do nothing (except changing sequence's options) in addition to strict checking for "unset" (" ")? 6. 
In ATExecDropIdentity: is it a good idea to do nothing if the column is already not an identity (the same behavior as DROP NOT NULL/DROP DEFAULT)? 7. Is there any reason to insert CREATE_TABLE_LIKE_IDENTITY before CREATE_TABLE_LIKE_INDEXES, not at the end? Why do you change catversion.h? It can lead to conflicts when other patches that influence it are committed... I'll have a closer look soon. -- Best regards, Vitaly Burovoy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Alter or rename enum value
Emre Hasegeli writes: >> Bottom line here is that I'd rather commit ALTER TYPE RENAME VALUE with >> no EXISTS features and then see it accrete those features together with >> other types of RENAME, when and if there's a will to make that happen. > > This sounds like a good conclusion to me. Works for me. I mainly added the IF [NOT] EXISTS support to be consistent with ADD VALUE, and because I like idempotency, but I'm not married to it. Here is version 6 of the patch, which just adds RENAME VALUE with no IF [NOT] EXISTS, rebased onto current master (particularly the transactional ADD VALUE patch). >From e4ee07d53bbab5ee434a12a2f30a9ac9bb5c89d3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= Date: Tue, 6 Sep 2016 14:38:06 +0100 Subject: [PATCH 1/2] =?UTF-8?q?Add=20ALTER=20TYPE=20=E2=80=A6=20RENAME=20V?= =?UTF-8?q?ALUE=20=E2=80=A6=20TO=20=E2=80=A6?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Unlike adding values to an enum, altering an existing value doesn't affect the OID values or ordering, so can be done in a transaction regardless of whether the enum was created in the same transaction. 
--- doc/src/sgml/ref/alter_type.sgml | 18 +++- src/backend/catalog/pg_enum.c | 91 ++ src/backend/commands/typecmds.c| 12 +++-- src/backend/parser/gram.y | 14 ++ src/include/catalog/pg_enum.h | 2 + src/include/nodes/parsenodes.h | 1 + src/test/regress/expected/enum.out | 58 src/test/regress/sql/enum.sql | 27 +++ 8 files changed, 217 insertions(+), 6 deletions(-) diff --git a/doc/src/sgml/ref/alter_type.sgml b/doc/src/sgml/ref/alter_type.sgml index aec73f6..bdc2fa2 100644 --- a/doc/src/sgml/ref/alter_type.sgml +++ b/doc/src/sgml/ref/alter_type.sgml @@ -29,6 +29,7 @@ ALTER TYPE name RENAME ATTRIBUTE name RENAME TO new_name ALTER TYPE name SET SCHEMA new_schema ALTER TYPE name ADD VALUE [ IF NOT EXISTS ] new_enum_value [ { BEFORE | AFTER } existing_enum_value ] +ALTER TYPE name RENAME VALUE existing_enum_value TO new_enum_value where action is one of: @@ -123,6 +124,17 @@ ALTER TYPE name ADD VALUE [ IF NOT + +RENAME VALUE TO + + + This form renames a value in an enum type. + The value's place in the enum's ordering is not affected. + An error will occur if the old value is not already present or the new value is. + + + + CASCADE @@ -241,7 +253,8 @@ ALTER TYPE name ADD VALUE [ IF NOT new_enum_value -The new value to be added to an enum type's list of values. +The new value to be added to an enum type's list of values, +or to rename an existing one to. Like all enum literals, it needs to be quoted. @@ -252,7 +265,8 @@ ALTER TYPE name ADD VALUE [ IF NOT The existing enum value that the new value should be added immediately -before or after in the enum type's sort ordering. +before or after in the enum type's sort ordering, +or renamed from. Like all enum literals, it needs to be quoted. diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c index c66f963..2e48dfe 100644 --- a/src/backend/catalog/pg_enum.c +++ b/src/backend/catalog/pg_enum.c @@ -465,6 +465,97 @@ AddEnumLabel(Oid enumTypeOid, } +/* + * RenameEnumLabel + * Rename a label in an enum set. 
+ */ +void +RenameEnumLabel(Oid enumTypeOid, +const char *oldVal, +const char *newVal) +{ + Relation pg_enum; + HeapTuple old_tup = NULL; + HeapTuple enum_tup; + CatCList *list; + int nelems; + int nbr_index; + Form_pg_enum nbr_en; + boolfound_new = false; + + /* check length of new label is ok */ + if (strlen(newVal) > (NAMEDATALEN - 1)) + ereport(ERROR, +(errcode(ERRCODE_INVALID_NAME), + errmsg("invalid enum label \"%s\"", newVal), + errdetail("Labels must be %d characters or less.", + NAMEDATALEN - 1))); + + /* + * Acquire a lock on the enum type, which we won't release until commit. + * This ensures that two backends aren't concurrently modifying the same + * enum type. Without that, we couldn't be sure to get a consistent view + * of the enum members via the syscache. Note that this does not block + * other backends from inspecting the type; see comments for + * RenumberEnumType. + */ + LockDatabaseObject(TypeRelationId, enumTypeOid, 0, ExclusiveLock); + + pg_enum = heap_open(EnumRelationId, RowExclusiveLock); + + /* Get the list of existing members of the enum */ + list = SearchSysCacheList1(ENUMTYPOIDNAME, + ObjectIdGetDatum(enumTypeOid)); + nelems = list->n_members; + + /* Locate the element to rename and check if the new label is already in + * use. The unique index on pg_enum would catch this anyway, but we + * prefer a friendlier error message. + */ + for
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Wed, Sep 7, 2016 at 1:08 AM, Tomas Vondra wrote: > On 09/06/2016 04:49 AM, Amit Kapila wrote: >> On Mon, Sep 5, 2016 at 11:34 PM, Tomas Vondra >> wrote: >>> >>> >>> On 09/05/2016 06:03 AM, Amit Kapila wrote: So, in short we have to compare three approaches here. 1) Group mode to reduce CLOGControlLock contention 2) Use granular locking model 3) Use atomic operations For approach-1, you can use patch [1]. For approach-2, you can use 0001-Improve-64bit-atomics-support patch[2] and the patch attached with this mail. For approach-3, you can use 0001-Improve-64bit-atomics-support patch[2] and the patch attached with this mail by commenting USE_CONTENT_LOCK. If the third doesn't work for you then for now we can compare approach-1 and approach-2. >>> >>> OK, I can compile all three cases - but only with gcc 4.7 or newer. Sadly >>> the 4-socket 64-core machine runs Debian Jessie with just gcc 4.6 and my >>> attempts to update to a newer version were unsuccessful so far. >>> >> >> So which of the patches are you able to compile on the 4-socket m/c? I >> think it is better to measure the performance on the bigger machine. > > Oh, sorry - I forgot to mention that only the last test (with > USE_CONTENT_LOCK commented out) fails to compile, because the functions > for atomics were added in gcc-4.7. > No issues, in that case we can leave the last test for now and do it later. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
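To illustrate what approach 2 (granular locking) buys conceptually: instead of one CLOGControlLock serializing every status update, each bank of the status array gets its own lock, so updates landing in different banks do not contend. Below is a toy thread-based simulation of that idea only — the bank geometry and names are invented for illustration and this is not the actual SLRU/CLOG code.

```python
import threading

NBANKS = 8
clog = [0] * 1024  # simulated transaction status array
bank_locks = [threading.Lock() for _ in range(NBANKS)]

def set_status_granular(xid, status):
    """Granular locking sketch: take only the lock covering the bank this
    xid falls into, so writers to different banks proceed in parallel
    (approach 1 would batch updates behind one leader; approach 3 would
    use an atomic update instead of a lock)."""
    bank = (xid // (len(clog) // NBANKS)) % NBANKS
    with bank_locks[bank]:
        clog[xid] = status

def worker(xids, status):
    for xid in xids:
        set_status_granular(xid, status)

# Four writers, each handling an interleaved quarter of the xid space.
threads = [threading.Thread(target=worker, args=(range(i, 1024, 4), 1))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the join, every slot has been set exactly once despite concurrent writers, which is the correctness property any of the three approaches must preserve.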
Re: [HACKERS] [PATCH] Reload SSL certificates on SIGHUP
On Wed, 7 Sep 2016 17:09:17 +0900 Michael Paquier wrote: > On Sun, Sep 4, 2016 at 11:39 PM, Andreas Karlsson > wrote: > > 1) Serialize the certificates, key, and CRL and write them to the > > backend_var temp file and then deserialize everything in the > > backends. > > > > Sounds like you would need to write some code for every SSL library > > to support the serialization and deserialization, which I am not a > > fan of doing just for one platform since I worry about little used > > code paths. Additionally this would mean that we write a copy of > > the private key to potentially another file system than the one > > where the private key is stored, this sounds like a bad idea from a > > security point of view. > > Yeah... This would result in something that is heavily SSL-dependent, > which would be an additional maintenance pain when trying to support > future OpenSSL versions. OpenSSL has a documented API for serializing and deserializing each and every cryptographic format it supports. And this API is quite unlikely to change in future OpenSSL versions. Moreover, LibreSSL is compatible with this API as far as I know. Really, Apache has done a similar thing for ages - it starts as root, loads certificates and keys, serializes them in memory, then forks and drops privileges, and then uses these keys and certificates. There are two problems with this approach: 1. You have to carefully design the data structures to store serialized keys. Apache made a mistake there and didn't allow for future invention of new public key algorithms. So, in 2008 I had problems adding Russian GOST cryptography there. 2. Your keys and certificates might not be stored in the filesystem at all. They can live in some hardware cryptography module, which doesn't let private keys out, just provides some handle. (OpenSSL supports this via loadable engine modules, and Postgres has allowed the use of engine modules since 8.2, so people can use them with Postgres.)
Really, OpenSSL/LibreSSL provide a useful abstraction in the memory BIO (where BIO stands for basic input/output), which can be used to store serialized cryptographic data. And the serializing/deserializing API is designed to work with BIOs anyway. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
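To make point 1 above concrete: the pitfall is serializing keys into a fixed per-algorithm structure. A tagged, length-prefixed record avoids it, because a new algorithm (such as GOST) is just a new tag value. The following is a toy sketch with invented names, not real OpenSSL code; in practice the blob would be DER produced by OpenSSL's i2d_* routines into a memory BIO:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * One serialized key: a free-form algorithm tag plus a length-prefixed blob.
 * Because the reader interprets only the tag and the length, adding a new
 * algorithm needs no format change -- unlike a fixed struct per algorithm.
 */
typedef struct SerializedKey
{
	char		alg[16];	/* e.g. "RSA", "GOST2001" */
	uint32_t	len;		/* length of the DER blob that follows */
	const unsigned char *der;
} SerializedKey;

static size_t
write_record(unsigned char *out, const SerializedKey *k)
{
	size_t		off = 0;

	memcpy(out + off, k->alg, sizeof(k->alg)); off += sizeof(k->alg);
	memcpy(out + off, &k->len, sizeof(k->len)); off += sizeof(k->len);
	memcpy(out + off, k->der, k->len); off += k->len;
	return off;
}

static SerializedKey
read_record(const unsigned char *in)
{
	SerializedKey k;
	size_t		off = 0;

	memcpy(k.alg, in + off, sizeof(k.alg)); off += sizeof(k.alg);
	memcpy(&k.len, in + off, sizeof(k.len)); off += sizeof(k.len);
	k.der = in + off;		/* points into the caller's buffer */
	return k;
}
```

The postmaster would write such records before forking (or before spawning backends on Windows), and each backend would read them back without touching the filesystem.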
Re: [HACKERS] Write Ahead Logging for Hash Indexes
> Thanks to Ashutosh Sharma for doing the testing of the patch and > helping me in analyzing some of the above issues.

Hi All,

I would like to summarize the test-cases that I have executed for validating WAL logging in the hash index feature.

1) I have mainly run the pgbench test with a read-write workload at a scale factor of 1000 and various client counts like 16, 64 and 128, for durations of 30 mins, 1 hr and 24 hrs. I have executed this test on a highly configured power2 machine with 128 cores and 512GB of RAM. I ran the test-case both with and without the replication setup. Please note that I have changed the schema of the pgbench tables created during the initialisation phase. The new schema of the pgbench tables looks as shown below on both master and standby:

postgres=# \d pgbench_accounts
   Table "public.pgbench_accounts"
  Column  |     Type      | Modifiers
----------+---------------+-----------
 aid      | integer       | not null
 bid      | integer       |
 abalance | integer       |
 filler   | character(84) |
Indexes:
    "pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
    "pgbench_accounts_bid" hash (bid)

postgres=# \d pgbench_history
        Table "public.pgbench_history"
 Column |            Type             | Modifiers
--------+-----------------------------+-----------
 tid    | integer                     |
 bid    | integer                     |
 aid    | integer                     |
 delta  | integer                     |
 mtime  | timestamp without time zone |
 filler | character(22)               |
Indexes:
    "pgbench_history_bid" hash (bid)

2) I have also made use of the following tools for analyzing the hash index tables created on master and standby.

a) pgstattuple : Using this tool, I could verify the tuple-level statistics, which include the index table's physical length, dead tuple percentage, and other info, on both master and standby. However, I could see the below error message when using this tool for one of the hash index tables on both the master and standby server.
postgres=# SELECT * FROM pgstattuple('pgbench_history_bid');
ERROR: index "pgbench_history_bid" contains unexpected zero page at block 2482297

To investigate this, I made use of the pageinspect contrib module and came to know that the above block contains an uninitialised page. But this is quite possible in a hash index, as the bucket split happens in powers of 2 and we may find some uninitialised bucket pages.

b) pg_filedump : Using this tool, I could take a dump of all the hash index tables on master and standby and compare the dump files to verify whether there is any difference between the hash index pages on master and standby.

In short, this is all I did to verify the patch for WAL logging in hash index. I would like to thank Amit and Robert for their valuable inputs during the hash index testing phase. I am also planning to verify WAL logging in hash index using Kuntal's patch for the WAL consistency check once it is stabilized.

With Regards, Ashutosh Sharma EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
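The zero-page behaviour described above can be pictured with a tiny sketch (helper names invented here; the real logic lives in the hash access method's bucket-split code): bucket pages are allocated in power-of-2 batches, but only the buckets that have actually been split into are initialized, so the trailing pages stay all-zero.

```c
#include <assert.h>

/*
 * Bucket pages are allocated a power-of-2 batch at a time, so with nbuckets
 * buckets in use the relation contains allocated_buckets(nbuckets) bucket
 * pages.  The ones past the current split point have never been written and
 * read back as all-zero pages -- which is what pgstattuple trips over.
 */
static unsigned
allocated_buckets(unsigned nbuckets)
{
	unsigned	n = 1;

	while (n < nbuckets)
		n <<= 1;
	return n;
}

static int
bucket_page_is_initialized(unsigned bucket, unsigned nbuckets)
{
	return bucket < nbuckets;
}
```

With 5 buckets in use, for example, 8 bucket pages exist on disk and buckets 5 through 7 are legitimate zero pages, not corruption.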
Re: [HACKERS] Declarative partitioning - another take
Hi, On 2016/09/07 17:56, Rajkumar Raghuwanshi wrote: > Hi, > > I have a query regarding list partitioning, > > For example if I want to store employee data in a table, with "IT" dept > employee in emp_p1 partition, "HR" dept employee in emp_p2 partition and if > employee belongs to other than these two, should come in emp_p3 partition. > > In this case not sure how to create partition table. Do we have something > like we have UNBOUNDED for range partition or oracle have "DEFAULT" for > list partition. > > create table employee (empid int, dept varchar) partition by list(dept); > create table emp_p1 partition of employee for values in ('IT'); > create table emp_p2 partition of employee for values in ('HR'); > create table emp_p3 partition of employee for values in (??); Sorry, no such feature is currently offered. It might be possible to offer something like a "default" list partition which accepts values other than those specified for other existing partitions. However, that means if we add a non-default list partition after a default one has been created, the implementation must make sure that it moves any values from the default partition that now belong to the newly created partition. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
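A small sketch of the consequence discussed above (all names invented for illustration): without a default partition, tuple routing for a value outside every declared list has nowhere to go, which is why the server would raise an error for, say, dept = 'Sales'.

```c
#include <assert.h>
#include <string.h>

/*
 * Toy tuple router for the employee example: emp_p1 takes 'IT', emp_p2
 * takes 'HR'.  With no DEFAULT partition, anything else yields -1, which
 * the server would turn into a "no partition found" style error.
 */
static int
route_by_dept(const char *dept)
{
	static const char *const lists[] = {"IT", "HR"};
	int			i;

	for (i = 0; i < 2; i++)
	{
		if (strcmp(dept, lists[i]) == 0)
			return i;			/* 0 => emp_p1, 1 => emp_p2 */
	}
	return -1;					/* no default partition to fall back to */
}
```

A hypothetical "default" partition would replace the -1 branch; the complication Amit describes is that later adding a non-default partition would require moving matching rows out of it.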
Re: [HACKERS] WAL consistency check facility
Hello, As per the earlier discussions, I've attached the updated patch for the WAL consistency check feature. This is how the patch works:

- If the WAL consistency check is enabled for an rmgrID, we always include the backup image in the WAL record.
- I've extended the RmgrTable with a new function pointer rm_checkConsistency, which is called after rm_redo (only when the WAL consistency check is enabled for this rmgrID).
- In each rm_checkConsistency, both backup pages and buffer pages are masked accordingly before any comparison.
- In postgresql.conf, a new GUC variable named 'wal_consistency' is added. The default value of this variable is 'None'. Valid values are combinations of Heap2, Heap, Btree, Hash, Gin, Gist, Sequence, SPGist, BRIN, Generic and XLOG. It can also be set to 'All' to enable all the values.
- In the recovery tests (src/test/recovery/t), I've added the wal_consistency parameter to the existing scripts. This feature doesn't change the expected output. If there is any inconsistency, it can be verified in the corresponding log file.

Results: I've tested with installcheck and installcheck-world in a master-standby set-up. The following are the configuration parameters.

Master:
wal_level = replica
max_wal_senders = 3
wal_keep_segments = 4000
hot_standby = on
wal_consistency = 'All'

Standby:
wal_consistency = 'All'

I got two types of inconsistencies, as follows:

1. For Btree/UNLINK_PAGE_META, btpo_flags are different. In the backup page, the BTP_DELETED and BTP_LEAF flags are both set, whereas after redo, only the BTP_DELETED flag is set in the buffer page. I assume that we should clear all btpo_flags before setting BTP_DELETED in _bt_unlink_halfdead_page().

2. For BRIN/UPDATE+INIT, block numbers (in rm_tid[0]) are different in the REVMAP page. This happens only for two cases. I'm not sure what the reason can be.

I haven't done sufficient tests yet to measure the overhead of this modification. I'll do that next. Thanks to Amit Kapila, Dilip Kumar and Robert Haas for their off-line suggestions.
Thoughts? -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com On Thu, Sep 1, 2016 at 11:34 PM, Peter Geoghegan wrote: > On Thu, Sep 1, 2016 at 9:23 AM, Robert Haas wrote: >> Indeed, it had occurred to me that we might not even want to compile >> this code into the server unless WAL_DEBUG is defined; after all, how >> does it help a regular user to detect that the server has a bug? Bug >> or no bug, that's the code they've got. But on further reflection, it >> seems like it could be useful: if we suspect a bug in the redo code >> but we can't reproduce it here, we could ask the customer to turn this >> option on to see whether it produces logging indicating the nature of >> the problem. However, because of the likely expense of enabling the >> feature, it seems like it would be quite desirable to limit the >> expense of generating many extra FPWs to the affected rmgr. For >> example, if a user has a table with a btree index and a gin index, and >> we suspect a bug in GIN, it would be nice for the user to be able to >> enable the feature *only for GIN* rather than paying the cost of >> enabling it for btree and heap as well.[2] > > Yes, that would be rather a large advantage. > > I think that there really is no hard distinction between users and > hackers. Some people will want to run this in production, and it would > be a lot better if performance was at least not atrocious. If amcheck > couldn't do the majority of its verification with only an > AccessShareLock, then users probably just couldn't use it. Heroku > wouldn't have been able to use it on all production databases. It > wouldn't have mattered that the verification was no less effective, > since the bugs it found would simply never have been observed in > practice.
> 
> -- 
> Peter Geoghegan

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index 27ba0a9..4c63ded 100644
--- a/src/backend/access/brin/brin_xlog.c
+++ b/src/backend/access/brin/brin_xlog.c
@@ -14,7 +14,7 @@
 #include "access/brin_pageops.h"
 #include "access/brin_xlog.h"
 #include "access/xlogutils.h"
-
+#include "storage/bufmask.h"
 
 /*
  * xlog replay routines
@@ -286,3 +286,84 @@ brin_redo(XLogReaderState *record)
 		elog(PANIC, "brin_redo: unknown op code %u", info);
 	}
 }
+
+/*
+ * It checks whether the current buffer page and backup page stored in the
+ * WAL record are consistent or not. Before comparing the two pages, it applies
+ * appropriate masking to the pages to ignore certain areas like hint bits,
+ * unused space between pd_lower and pd_upper etc. For more information about
+ * masking, see the masking function.
+ * This function should be called once WAL replay has been completed.
+ */
+void
+brin_checkConsistency(XLogReaderState *record)
+{
+	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+	int			block_id;
+	RelFileNode
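The masking step this patch relies on can be sketched with a toy page layout (an illustration only, not the real PageHeaderData): zero every region that may legitimately differ between the backup image and the replayed buffer, such as the unused hole between pd_lower and pd_upper, then memcmp the two images.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGESZ 64				/* toy page size */

typedef struct ToyPage
{
	uint16_t	pd_lower;		/* end of line-pointer array */
	uint16_t	pd_upper;		/* start of tuple space */
	uint8_t		data[PAGESZ - 4];
} ToyPage;

/* Zero the unused hole so leftover garbage there never causes a false alarm. */
static void
mask_unused_space(ToyPage *page)
{
	size_t		lo = page->pd_lower;
	size_t		hi = page->pd_upper;

	if (lo < hi && hi <= PAGESZ)
		memset((char *) page + lo, 0, hi - lo);
}

/* Compare a backup image against a replayed buffer, ignoring masked areas. */
static int
pages_consistent(ToyPage a, ToyPage b)
{
	mask_unused_space(&a);
	mask_unused_space(&b);
	return memcmp(&a, &b, sizeof(ToyPage)) == 0;
}
```

The real patch does the same per rmgr, additionally masking things like hint bits whose state is allowed to diverge between master and standby.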
Re: [HACKERS] Write Ahead Logging for Hash Indexes
On Wed, Sep 7, 2016 at 3:28 PM, Amit Kapila wrote: > > Okay, I have fixed this issue as explained above. Apart from that, I > have fixed another issue reported by Mark Kirkwood upthread and few > other issues found during internal testing by Ashutosh Sharma. > Forgot to mention that you need to apply the latest concurrent hash index patch [1] before applying this patch. [1] - https://www.postgresql.org/message-id/CAA4eK1J6b8O4PcEPqRxNYbLVbfToNMJEEm%2Bqn0jZX31-obXrJw%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical Replication WIP
On 07/09/16 02:56, Peter Eisentraut wrote: Review of 0002-Add-SUBSCRIPTION-catalog-and-DDL.patch: Similar concerns as before about ALTER syntax, e.g., does ALTER SUBSCRIPTION ... PUBLICATION add to or replace the publication set? It sets. For that matter, why is there no way to add? It didn't seem all that useful; the expectation here is that most subscriptions will use one or a couple of publications. I think we should allow creating subscriptions initially without publications. This could be useful for example to test connections, or create slots before later on adding publications. Seeing that there is support for changing the publications later, this shouldn't be a problem. Sure, but they need to be created disabled then. Larger conceptual issues: I haven't read the rest of the code yet to understand why pg_subscription needs to be a shared catalog, but be that as it may, I would still make it so that subscription names appear local to the database. We already have the database OID in the pg_subscription catalog, so I would make the key (subname, subdatid). DDL commands would only operate on subscriptions in the current database (or you could use "datname"."subname" syntax), and \drs would only show subscriptions in the current database. That way the shared catalog is an implementation detail that can be changed in the future. I think it would be very helpful for users if publications and subscriptions appear to work in a parallel way. If I have two databases that I want to replicate between two servers, I might want to have a publication "mypub" in each database and a subscription "mysub" in each database. If I learn that the subscriptions can't be named that way, then I have to go back to rename the publications, and it'll all be a bit frustrating. Okay, that makes sense. pg_subscription is a shared catalog so that we can have one launcher per cluster instead of one per database. Otherwise there is no reason why it could not behave like a local catalog.
Some thoughts on pg_dump and such: Even an INITIALLY DISABLED subscription needs network access to create the replication slot. So restoring a dump when the master is not available will have some difficulties. And restoring master and slave at the same time (say disaster recovery) will not necessarily work well either. Also, the general idea of doing network access during a backup restore without obvious prior warning sounds a bit unsafe. I imagine maybe having three states for subscriptions: DISABLED, PREPARED, ENABLED (to use existing key words). DISABLED just exists in the catalog, PREPARED has the slots set up, ENABLED starts replicating. So you can restore a dump with all slots disabled. And then it might be good to have a command to "prepare" or "enable" all subscriptions at once. Well, the DISABLED keyword is also used in ALTER to stop the subscription but not remove it, and that would no longer map well if we used the behavior you described. That being said, I agree with the idea of having a subscription that exists just locally in the catalog; we just need to figure out better naming. As I had mentioned privately before, I would perhaps have CREATE SUBSCRIPTION use the foreign server infrastructure for storing connection information. Hmm, yeah, it's an idea. My worry there is that it will make it a bit more complex to set up, as the user will have to first create a server and user mapping before creating a subscription. We'll have to keep thinking about ways to handle abandoned replication slots. I imagine that people will want to create subscriber instances in fully automated ways. If that fails every so often and requires manual cleanup of replication slots on the master some of the time, that will get messy. I don't have well-formed ideas about this, though. Yes, it's a potential issue; I don't have a good solution for it either. It's a loosely coupled system, so we can't have 100% control over everything.
-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Declarative partitioning - another take
Hi, I have a query regarding list partitioning, For example if I want to store employee data in a table, with "IT" dept employee in emp_p1 partition, "HR" dept employee in emp_p2 partition and if employee belongs to other than these two, should come in emp_p3 partition. In this case not sure how to create partition table. Do we have something like we have UNBOUNDED for range partition or oracle have "DEFAULT" for list partition. create table employee (empid int, dept varchar) partition by list(dept); create table emp_p1 partition of employee for values in ('IT'); create table emp_p2 partition of employee for values in ('HR'); create table emp_p3 partition of employee for values in (??); Thanks & Regards, Rajkumar Raghuwanshi QMG, EnterpriseDB Corporation On Tue, Sep 6, 2016 at 6:37 PM, Amit Langote wrote: > On Tue, Sep 6, 2016 at 9:19 PM, Robert Haas wrote: > > On Wed, Aug 31, 2016 at 1:05 PM, Amit Langote > > wrote: > >>> However, it seems a lot better to make it a property of the parent > >>> from a performance point of view. Suppose there are 1000 partitions. > >>> Reading one toasted value for pg_class and running stringToNode() on > >>> it is probably a lot faster than scanning pg_inherits to find all of > >>> the child partitions and then doing an index scan to find the pg_class > >>> tuple for each and then decoding all of those tuples and assembling > >>> them into some data structure. > >> > >> Seems worth trying. One point that bothers me a bit is how do we > enforce > >> partition bound condition on individual partition basis. For example > when > >> a row is inserted into a partition directly, we better check that it > does > >> not fall outside the bounds and issue an error otherwise. With current > >> approach, we just look up a partition's bound from the catalog and gin > up > >> a check constraint expression (and cache in relcache) to be enforced in > >> ExecConstraints(). 
With the new approach, I guess we would need to look > >> up the parent's partition descriptor. Note that the checking in > >> ExecConstraints() is turned off when routing a tuple from the parent. > > > > [ Sorry for the slow response. ] > > > > Yeah, that's a problem. Maybe it's best to associate this data with > > the childrels after all - or halfway in between, e.g. augment > > pg_inherits with this information. After all, the performance problem > > I was worried about above isn't really much of an issue: each backend > > will build a relcache entry for the parent just once and then use it > > for the lifetime of the session unless some invalidation occurs. So > > if that takes a small amount of extra time, it's probably not really a > > big deal. On the other hand, if we can't build the implicit > > constraint for the child table without opening the parent, that's > > probably going to cause us some serious inconvenience. > > Agreed. So I will stick with the existing approach. > > Thanks, > Amit > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Re: [HACKERS] [PATCH] Reload SSL certificates on SIGHUP
On Sun, Sep 4, 2016 at 11:39 PM, Andreas Karlsson wrote: > 1) Serialize the certificates, key, and CRL and write them to the > backend_var temp file and then deserialize everything in the backends. > > Sounds like you would need to write some code for every SSL library to > support the serialization and deserialization, which I am not a fan of doing > just for one platform since I worry about little used code paths. > Additionally this would mean that we write a copy of the private key to > potentially another file system than the one where the private key is > stored, this sounds like a bad idea from a security point of view. Yeah... This would result in something that is heavily SSL-dependent, which would be an additional maintenance pain when trying to support future OpenSSL versions. > 2) Copy all the SSL related files into the data directory at SIGHUP, before > loading them. While this does not require any serialization of certificates > it still has the problem of writing private keys to disk. You expressed enough concern about that upthread, copying private keys into PGDATA is a security concern. > 3) Leave my patch as it is now. This means the postmaster will reload > certificates on SIGHUP while the backends will also load them when spawning. > This means windows will continue to work the same as before my patch. > > Is there any other way to pass the current set of loaded certificates and > keys from the postmaster to the backends on Windows? I guess you could use a > pipe, but if so we should probably send all data on this pipe, not just the > SSL stuff. > > I am leaning towards doing (3) but I know I am biased since it is less work > and I do not care much for Windows. Seriously... The benefit of this feature is clear for a lot of people. And the implementation dedicated only to Windows would just result in a grotty thing anyway. So I'd say that at this point we could just push for 3) and facilitate the life of most with SSL configuration. 
The behavior across platforms needs to be properly documented for sure. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speedup twophase transactions
> On 07 Sep 2016, at 03:09, Michael Paquier wrote: > >>> On 06 Sep 2016, at 12:03, Michael Paquier wrote: >>> >>> On Tue, Sep 6, 2016 at 5:58 PM, Stas Kelvich >>> wrote: Oh, I was preparing a new version of the patch, after a fresh look at it. Probably I should have said that in this topic. I’ve found a bug in subtransaction handling and am now working on a fix. >>> >>> What's the problem actually? >> >> Handling of the xids_p array in PrescanPreparedTransactions() is wrong for >> prepared tx's in memory. >> Also I want to double-check and add comments to the RecoveryInProgress() checks >> in FinishGXact. >> >> I’ll post the reworked patch in several days. > > Could you use as a base the version I just sent yesterday then? I > noticed many mistakes in the comments, for example many s/it's/its/ > and did a couple of adjustments around the code; the goto next_file > was particularly ugly. That will be that much work not to do again > later. Yes. Already merged my branch with your version. -- Stas Kelvich Postgres Professional: http://www.postgrespro.com Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Forbid use of LF and CR characters in database and role names
On Wed, Sep 7, 2016 at 2:32 AM, Tom Lane wrote: > I think probably a better answer is to reject bad paths earlier, eg have > initdb error out before doing anything if the proposed -D path contains > CR/LF. Yes, that's a bug that we had better address. It is not nice to not clean up PGDATA in this case. > Given the collection of security bugs we just fixed, and the > strong likelihood that user-written scripts contain other instances of > the same type of problem, I do not think we are doing anybody any favors > if we sweat bullets to support such paths. Yep. Database and role names are the two patterns defined by the user at SQL level that can be reused by command lines, so it still seems to me that the patch I sent is the correct call. > And I'm not even convinced > that we *can* support such paths on Windows; no one was able to find a > safe shell quoting solution there. None has been found as far as I know; the problem goes down into the Windows core, with paths passed to cmd.exe, which cannot handle those two characters correctly. > And yeah, the docs probably need work. Patch 0001 is what I have done for the database and role names. Patch 0002 adds a routine in string_utils.c to check whether a given string is valid for shell commands or not. I added a call to that in initdb.c, which is at least what we'd want to check because it directly calls appendShellString. Now I am a bit reluctant to spread that elsewhere... Users would see the message only with initdb if they hit the problem. I have added a note on the initdb page as well, but that does not seem enough to me; we may want to add a warning in a more common place. Does somebody have an idea where we could add that? Also, in 0001 I have updated the docs to refer to newline and carriage return characters instead of CR and LF.
-- 
Michael

From 016a8168b39b67a6468ee1a752ee9ec82cf3fecb Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Wed, 7 Sep 2016 16:39:57 +0900
Subject: [PATCH 1/2] Forbid newline and carriage return characters in
 database and role names

---
 doc/src/sgml/ref/create_database.sgml |  7 +++
 doc/src/sgml/ref/create_role.sgml     |  7 +++
 src/backend/commands/dbcommands.c     | 23 +++
 src/backend/commands/user.c           | 21 +
 src/fe_utils/string_utils.c           |  4 +---
 5 files changed, 59 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index cf33746..2477ba7 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -260,6 +260,13 @@ CREATE DATABASE name
    connection slot remains for the database, it is possible that both will
    fail. Also, the limit is not enforced against superusers.
+
+
+    Database names cannot include newline or carriage return characters
+    as those could be at the origin of security breaches, particularly on
+    Windows where the command shell is unusable with arguments containing
+    such characters.
+

diff --git a/doc/src/sgml/ref/create_role.sgml b/doc/src/sgml/ref/create_role.sgml
index 38cd4c8..9d6926e 100644
--- a/doc/src/sgml/ref/create_role.sgml
+++ b/doc/src/sgml/ref/create_role.sgml
@@ -404,6 +404,13 @@ CREATE ROLE name [ [ WITH ]
    \password that can be used to safely change the
    password later.
+
+
+    Role names cannot include newline or carriage return characters
+    as those could be at the origin of security breaches, particularly on
+    Windows where the command shell is unusable with arguments containing
+    such characters.
+

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index ef48659..22712ec 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -77,6 +77,7 @@ typedef struct
 } movedb_failure_params;
 
 /* non-export function prototypes */
+static void check_db_name(const char *dbname);
 static void createdb_failure_callback(int code, Datum arg);
 static void movedb(const char *dbname, const char *tblspcname);
 static void movedb_failure_callback(int code, Datum arg);
@@ -469,6 +470,9 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 		/* Note there is no additional permission check in this path */
 	}
 
+	/* do sanity checks on database name */
+	check_db_name(dbname);
+
 	/*
 	 * Check for db name conflict. This is just to give a more friendly error
 	 * message than "unique index violation". There's a race condition but
@@ -758,6 +762,22 @@ check_encoding_locale_matches(int encoding, const char *collate, const char *ctype)
 				pg_encoding_to_char(collate_encoding)));
 }
 
+/*
+ * Perform sanity checks on the database name.
+ */
+static void
+check_db_name(const char *dbname)
+{
+	if (strchr(dbname, '\n') != NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("database name cannot include newline character")));
+	if (strchr(dbname, '\r') != NULL)
+		ereport
Re: [HACKERS] [PATCH] Alter or rename enum value
> Bottom line here is that I'd rather commit ALTER TYPE RENAME VALUE with > no EXISTS features and then see it accrete those features together with > other types of RENAME, when and if there's a will to make that happen. This sounds like a good conclusion to me. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stopping logical replication protocol
>Review comments on the 2nd patch, i.e. the 2nd half of your original patch: > >* Other places in logical decoding use the CB suffix for callback >types. This should do the same. > >* I'm not too keen on the name is_active for the callback. We >discussed the name continue_decoding_cb in our prior conversation. > >* Instead of having LogicalContextAlwaysActive, would it be better to >test if the callback pointer is null and skip it if so? > >* Lots of typos in LogicalContextAlwaysActive comments, and wording is unclear > >* comment added to arguments of pg_logical_slot_get_changes_guts is >not in PostgreSQL style - it goes way wide on the line. Probably >better to put the comment above the call and mention which argument it >refers to. > >* A comment somewhere is needed - probably on the IsStreamingActive() >callback - to point readers at the fact that WalSndWriteData() calls >ProcessRepliesIfAny() and that's how IsStreamingActive can get set. I >had to look at the email discussion history to remind myself how this >worked when I was re-reading the code. > >* I'm not keen on > >typedef ReorderBufferIsActive LogicalDecondingContextIsActive; > > I think instead a single callback name that encompasses both should >be used. ContinueDecodingCB ? Fixed patch attached. > * There are no tests. I don't see any really simple way to test this, though. I will be grateful if you specify the best way to do it. I test these patches with pgjdbc driver tests, which also build against the head of postgres.
You can check the test scenario for both patches in PR https://github.com/pgjdbc/pgjdbc/pull/550: org.postgresql.replication.LogicalReplicationTest#testDuringSendBigTransactionReplicationStreamCloseNotActive org.postgresql.replication.LogicalReplicationTest#testAfterCloseReplicationStreamDBSlotStatusNotActive 2016-09-06 4:10 GMT+03:00 Craig Ringer : > On 25 August 2016 at 13:04, Craig Ringer wrote: > > > By the way, I now think that the second part of your patch, to allow > > interruption during ReorderBufferCommit processing, is also very > > desirable. > > I've updated your patch, rebasing it on top of 10.0 master and > splitting it into two pieces. I thought you were planning on following > up with an update to the second part, but maybe that was unclear from > our prior discussion, so I'm doing this to help get this moving again. > > The first patch is what I posted earlier - the part of your patch that > allows the walsender to exit COPY BOTH mode and return to command mode > in response to a client-initiated CopyDone when it's in the > WalSndLoop. For logical decoding this means that it will notice a > client CopyDone when it's between transactions or while it's ingesting > transaction changes. It won't react to CopyDone while it's in > ReorderBufferCommit and sending a transaction to an output plugin > though. > > The second piece is the other half of your original patch. Currently > unmodified from what you wrote. It adds a hook in ReorderBufferCommit > that calls back into the walsender to check for new client input and > interrupt processing. I haven't yet investigated this in detail. > > > Review comments on the 2nd patch, i.e. the 2nd half of your original patch: > > * Other places in logical decoding use the CB suffix for callback > types. This should do the same. > > * I'm not too keen on the name is_active for the callback. We > discussed the name continue_decoding_cb in our prior conversation.
>
> * Instead of having LogicalContextAlwaysActive, would it be better to
>   test if the callback pointer is null and skip it if so?
>
> * Lots of typos in the LogicalContextAlwaysActive comments, and the
>   wording is unclear.
>
> * The comment added to the arguments of pg_logical_slot_get_changes_guts
>   is not in PostgreSQL style - it goes way wide on the line. Probably
>   better to put the comment above the call and mention which argument it
>   refers to.
>
> * A comment somewhere is needed - probably on the IsStreamingActive()
>   callback - to point readers at the fact that WalSndWriteData() calls
>   ProcessRepliesIfAny() and that's how IsStreamingActive can get set. I
>   had to look at the email discussion history to remind myself how this
>   worked when I was re-reading the code.
>
> * I'm not keen on
>
>       typedef ReorderBufferIsActive LogicalDecondingContextIsActive;
>
>   I think instead a single callback name that encompasses both should
>   be used. ContinueDecodingCB?
>
> * There are no tests. I don't see any really simple way to test this,
>   though.
>
> I suggest that you apply these updated/split patches to a fresh copy
> of master and then see if you can update them based on this feedback.
> Something like:
>
>   git checkout master
>   git pull
>   git checkout -b stop-decoding-v10
>   git am /path/to/0001-Respect-client-initiated-CopyDone-in-walsender.patch
>   git am /path/to/0002-Second-part-of-Vladimir-Gordiychuk-s-patch.patch
>
> then modify the code per the above, "git commit --amend" the
> changes, and send an updated set of patches.
Re: [HACKERS] Supporting SJIS as a database encoding
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kyotaro Horiguchi
> Thanks, by the way, there's another issue related to SJIS conversion.
> MS932 has several characters that have multiple code points. Converting
> text in this encoding to and from Unicode causes a round-trip problem.
> For example,
>
> 8754 (ROMAN NUMERAL ONE in NEC specials)
> => U+2160 (ROMAN NUMERAL ONE)
> => FA4A (ROMAN NUMERAL ONE in IBM extension)
>
> By my count, 398 characters are affected by this kind of replacement.
> In addition to that, "GAIJI" (private use area) is not supported. Does
> this meet your purpose?

Supporting GAIJI is not a requirement as far as I know. Thank you for
sharing this information.

# I realize my lack of knowledge about character sets...

Regards
Takayuki Tsunakawa
Re: [HACKERS] patch: function xmltable
On 7 September 2016 at 14:44, Pavel Stehule wrote:
>>
>> Suggested comment:
>>
>>   /*
>>    * This is the parsenode for a column definition in a table-expression
>>    * like XMLTABLE.
>>    *
>>    * We can't re-use ColumnDef here; the utility command column
>>    * definition has all the wrong attributes for use in table-expressions
>>    * and just doesn't make sense here.
>>    */
>>   typedef struct TableExprColumn
>>   {
>>       ...
>>   };
>>
>> ?
>>
>> Why "RawCol"? What does it become when it's not "raw" anymore? Is
>> that a reference to ColumnDef's raw_default and cooked_default for
>> untransformed vs transformed parse-trees?
>
> My previous reply was wrong - it is used by the parser only and holds a
> TypeName field. The analogy with ColumnDef's raw_default is perfect.

Cool, let's just comment that then.

I'll wait on an updated patch per the discussion to date. Hopefully
somebody else with more of a clue than me can offer a better review of
the executor/view/caching part you specifically called out as complex.
Otherwise maybe it'll be clearer in a revised version.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Supporting SJIS as a database encoding
Hello,

At Tue, 6 Sep 2016 03:43:46 +, "Tsunakawa, Takayuki" wrote in
<0A3221C70F24FB45833433255569204D1F5E66CE@G01JPEXMBYT05>
> > From: pgsql-hackers-ow...@postgresql.org
> > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kyotaro
> > HORIGUCHI
> > Implementing the radix tree code, then redefining the format of the
> > mapping table to support radix trees, then modifying the mapping
> > generator script are needed.
> >
> > If no one opposes this, I'll do that.
>
> +100
> Great analysis, and I admire your guts. I very much appreciate your trial!

Thanks. By the way, there's another issue related to SJIS conversion.
MS932 has several characters that have multiple code points. Converting
text in this encoding to and from Unicode causes a round-trip problem.
For example,

8754 (ROMAN NUMERAL ONE in NEC specials)
=> U+2160 (ROMAN NUMERAL ONE)
=> FA4A (ROMAN NUMERAL ONE in IBM extension)

By my count, 398 characters are affected by this kind of replacement.
In addition to that, "GAIJI" (private use area) is not supported. Does
this meet your purpose?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
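[Editor's illustration] The round-trip problem described above can be reproduced with Python's cp932 codec, which follows Microsoft's CP932 mapping table. The byte values are the ones given in the email; which of the two encodings U+2160 maps back to is a codec choice, so the sketch only checks that a single byte sequence comes back - meaning the other original is silently rewritten:

```python
# Both the NEC-specials code point and the IBM-extension code point
# decode to the same Unicode character, ROMAN NUMERAL ONE (U+2160).
nec = b"\x87\x54"   # NEC specials
ibm = b"\xfa\x4a"   # IBM extension

assert nec.decode("cp932") == ibm.decode("cp932") == "\u2160"

# Encoding U+2160 back yields exactly one of the two byte sequences,
# so at least one of the originals cannot survive the round trip.
roundtrip = "\u2160".encode("cp932")
assert roundtrip in (nec, ibm)
print("U+2160 encodes to bytes", roundtrip.hex())
```

This is the duplication affecting the 398 characters mentioned: any mapping through Unicode must pick one canonical code point per character, losing the distinction between the NEC and IBM variants.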
Re: [HACKERS] Optimization for lazy_scan_heap
On 7 September 2016 at 04:13, Masahiko Sawada wrote:
> Since current HEAD could scan the visibility map twice, the execution
> time of Patched is approximately half of HEAD.

Sounds good. To ensure we are doing exactly the same amount of work as
before, did you check the output of VACUUM VERBOSE?

Can we produce a test that verifies the result patched/unpatched?

-- 
 Simon Riggs                    http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] GiST penalty functions [PoC]
Oh, sorry, I made one more attempt and now I see your algorithm
differently. So you propose to use oE and oV as markers of the borders
of what I call realms. But there may be very few significant bits in
one of those ranges. The pg_sphere and PostGIS extensions tried to use
1 as a marker, with a similar formula, and that did not generate a good
tree. Here's how they did it:
https://github.com/postgis/postgis/commit/9569e898078eeac8928891e8019ede2cbf27d06f

I'll try to compose a patch for your formula later. I think we should
just try it and see; that is easier than reasoning about the details of
floating-point representation.

Best regards, Andrey Borodin, Octonica & Ural Federal University.