Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On Sun, Sep 1, 2013 at 4:06 AM, Andres Freund wrote: >> We're looking for the first duplicate. So it would probably be okay >> for the IGNORE case to not bother retrying and re-locking if the other >> transaction committed (which, at least very broadly speaking, is the >> likely outcome). > > Hm. Could you explain this a bit more? Do you mean that we can stop > trying to insert after the first violation (yes, sure if so)? Yeah, that's all I mean. Nothing controversial there. > I can't really agree with this part. First off, a heap_insert() > certainly isn't lightweight operation. There's several things that can > take extended time: > * toasting of rows (writing a gigabyte worth of data...) > * extension of the heap/toast table > * writeout of a dirty victim buffer if the page isn't in s_b > * reading in the target page if the page isn't in s_b > * waiting for an exclusive lock on the target pages These are all valid concerns. I'll need to better assess just how bad this can get (hopefully with help from others). But some of this stuff can surely be done separately for my case. For example, surely we could make TOASTing occur after index insertion proper when locks are released, without too much work. > Secondly, you seem to be saying that such a lock would only block other > sessions from finishing of tuple insertions - but it will block normal > index searches as well? I have a hard time accepting the fact that one > session using INSERT ... KEY IGNORE will block lookup of a range of values > from the index in *other* sessions. I'm sorry that I created the impression that I was denying an impact on read queries as compared to regular insertion. Of course, regular insertion also blocks read queries for very small intervals, and there could be quite a bit of work to be done with that exclusive lock held already. I am only proposing increasing that period for this case, and usually by not terribly much. > I'd expect that with the current implementation strategy if you > e.g. made pgbench use INSERT .. ON DUPLICATE IGNORE without ever > actually hitting duplicates, you'd see a very, very noticeable hit in > scalability. I wouldn't expect a single session to get much slower bug > trying a dozen or two should show a very noticeable dip. I do not accept the premise of your questions regarding the patch's performance, which appears to be that it's fair to expect the same level of performance of INSERT .,. ON DUPLICATE IGNORE as it is to expect of regular INSERT. No scheme is going to get that, including your scheme, which necessitates WAL logging everything related to index tuple insertion twice for each unique index. We certainly didn't require SSI to perform as well as the current REPEATABLE READ, and certainly not as well as READ COMMITTED, and for good reasons. That said, I'm not dismissive of these concerns. As I said already, by all means, let us scrutinise the performance of the patch. I'm not going to bother benchmarking the patch myself until I give it another whack. There is some really obvious low-hanging performance fruit. Obviously by this I mean doing as little as possible when holding the exclusive lock. > I think we could optimize away the second search with some > cleverness. If we remember & pin the page from the first search we know > that the promise will have to be either on that page or - after a page > split - potentially to ones on the right. Which we should be able to find > by repeatedly stepping right. Similar to logic of doing the > _bt_moveright() in _bt_doinsert(). That may well be possible, but we'd still have to do a binary search on the page. And under your proposal, that will happen twice - each time with an exclusive buffer lock held. Since our intent would be to update in-place, the entire binary search operation would need a full write/exclusive lock the second time around as well. You might even end up doing more work with an exclusive/write lock held than under my proposal, before you even factor in the other increased costs (granted, that's not that likely, but you're still doing much more with exclusive locks held than the regular insertion case). What you describe here is the kind of trick that could be very useful to another, existing common cases -- blocking on an xid in the event of a possible duplicate (as you know, it's okay to hold pins pretty much indefinitely) -- and yet has remained unimplemented for many years. > I am pretty sure it would be better. We know that nbtree works pretty > damn well under concurrency. And the "promise" logic wouldn't change > anything of that. It would have multiple extra steps. Including WAL-logging of the initial insertion, and clean-up for the IGNORE case. And WAL-logging of the cleanup. Instead of doing exactly nothing extra in the IGNORE case, which, under my proposal, has an additional duration of the exclusive lock being held of zero (instead of the ereport() call we just immediately release the lock and report
Re: [HACKERS] backup.sgml-cmd-v003.patch
On 09/02/2013 10:56:54 PM, Karl O. Pinc wrote: > I have frobbed your to adjust the indentation and > line-wrap style. Oops. Somehow left a \ out of this. Anyhow, you get the idea. Karl Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Further XLogInsert scaling tweaking
On Mon, Sep 2, 2013 at 10:14:03AM +0300, Heikki Linnakangas wrote: > diff --git a/src/backend/access/transam/xlog.c > b/src/backend/access/transam/xlog.c > index 39c58d0..28e62ea 100644 > --- a/src/backend/access/transam/xlog.c > +++ b/src/backend/access/transam/xlog.c > @@ -428,8 +428,14 @@ typedef struct XLogCtlInsert > uint64 CurrBytePos; > uint64 PrevBytePos; > > - /* insertion slots, see above for details */ > - XLogInsertSlotPadded *insertSlots; > + /* > + * Make sure the above heavily-contended spinlock and byte positions are > + * on their own cache line. In particular, the RedoRecPtr and full page > + * write variables below should be on a different cache line. They are > + * read on every WAL insertion, but updated rarely, and we don't want > + * those reads to steal the cache line containing Curr/PrevBytePos. > + */ > + charpad[128]; Do we adjust for cache line lengths anywhere else? PGPROC? Should it be a global define? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory
On Tue, Sep 03, 2013 at 12:52:22AM +0200, Andres Freund wrote: > On 2013-09-01 12:07:04 -0400, Noah Misch wrote: > > On Sun, Sep 01, 2013 at 05:08:38PM +0200, Andres Freund wrote: > > > On 2013-09-01 09:24:00 -0400, Noah Misch wrote: > > > > The difficulty depends on whether processes other than the segment's > > > > creator > > > > will attach anytime or only as they start. Attachment at startup is > > > > enough > > > > for parallel query, but it's not enough for something like lock table > > > > expansion. I'll focus on the attach-anytime case since it's more > > > > general. > > > > > > Even on startup it might get more complicated than one immediately > > > imagines on EXEC_BACKEND type platforms because their memory layout > > > doesn't need to be the same. The more shared memory you need, the harder > > > that will be. Afair > > > > Non-Windows EXEC_BACKEND is already facing a dead end that way. > > Not sure whether you mean non-windows EXEC_BACKEND isn't going to be > supported for much longer or that it already has problems. It already has problems: ASLR measures sometimes prevent reattachment of the main shared memory segment. Multiplying the combined size of our fixed-address mappings does not push us over some threshold where this becomes a problem, because it is already a problem. > > > Note that allocating a large mapping, even without using it, has > > > noticeable cost, at least under linux. The kernel has to create & copy > > > data to track each pages state (without copying the memory content's > > > itself due to COW) for every fork afterwards. > So, after reading up on the issue a bit more and reading some more > kernel code, a large mmap(PROT_NONE, MAP_PRIVATE) won't cause much > problems except counting in ulimit -v. It will *not* cause overcommit > violations. mmap(PROT_NONE, MAP_SHARED) will tho, even if not yet > faulted. Which means that to be reliable and not violate overcommit we'd > need to munmap() a chunk of PROT_NONE, MAP_PRIVATE memory, and > immediately (without interceding mallocs, using mmap itself) map it again. > > It only gets really expensive in the sense of making fork expensive if > you set protections on many regions in that mapping individually. Each > mprotect() call will split the VMA into distinct pieces and they won't > get merged even if there are neighboors with the same settings. Thanks for researching that. > > > > I don't foresee fundamental differences on 32-bit. All the allocation > > > > maximums scale down, but that's the usual story for 32-bit. > > > > > > If you actually want to allocate memory after starting up, without > > > carving a section out for that from the beginning, the memory > > > fragmentation will make it very hard to find memory addresses of the > > > same across processes. > > > > True. I wouldn't feel bad if total dynamic shared memory usage above, say, > > 256 MiB were unreliable on 32-bit. If you're still running 32-bit in 2015, > > you probably have a low-memory platform. > > Not sure. I think that will partially depend on whether x32 will have > any success which I still find hard to judge. I won't hold my breath for x32 becoming a common platform for high-memory database servers, regardless of other successes it might find. Not impossible, but I recommend placing trivial priority on maximizing performance for that scenario. > > I think the take-away is that we have a lot of knobs available, not a bright > > line between possible and impossible. Robert opted to omit provision for > > reliable fixed addresses, and the upsides of that decision are the absence > > of > > a DBA-unfriendly space-reservation GUC, trivial overhead when the APIs are > > not > > used, and a clearer portability outlook. > > I guess my point is that if we want to develop stuff that requires > reliable addresses, we should build support for that from a low level > up. Not rely on a hack^Wlayer ontop of the actual dynamic shared memory > API. > That is, it should be a flag to dsm_create() that we require a fixed > address and dsm_attach() will then automatically use that or die > trying. Requiring implementations to take care about passing addresses > around and fiddling with mmap/windows api to make sure those mappings > are possible doesn't strike me to be a good idea. I agree. > In the end, you're going to be the primary/first user as far as I > understand things, so you'll have to argue whether we need fixed > addresses or not. I don't think it's a good idea to forgo this decision > on this layer and bolt on another ontop if we decide it's neccessary. We don't need fixed addresses. Parallel internal sort will probably include the equivalent of a SortTuple array in its shared memory segment, and that implies relative pointers to the tuples also stored in shared memory. I expect that wart to be fairly isolated within the code, so little harm done. I don't think we will have at all painted ourselves into a corner, should
Re: [HACKERS] Extension Templates S03E11
On Tue, Sep 3, 2013 at 4:20 AM, David Fetter wrote: > On Mon, Sep 02, 2013 at 02:32:16AM -0400, Peter Eisentraut wrote: >> On Thu, 2013-08-29 at 12:16 +0200, Dimitri Fontaine wrote: >> > Here's v14 of the patch with pg_upgrade support fixed, so that the >> > automated setup that Peter built is able to have at it! >> >> Fails cpluspluscheck: >> >> In file included from /tmp/cpluspluscheck.5g2uWw/test.cpp:3:0: >> ./src/include/commands/template.h:47:8: error: ‘ExtensionControl’ does not >> name a type >> ./src/include/commands/template.h:51:8: error: ‘ExtensionControl’ does not >> name a type >> >> I think this actually just means the header does not include all it needs by >> itself. > > Is there some standard set of checks you run on new patches, and are > the results showing up on, say, the buildfarm or some other CI > dashboard? I believe that Peter does all those checks using his own Jenkins environment: http://pgci.eisentraut.org/jenkins/ For the commit fest patches here you go: http://pgci.eisentraut.org/jenkins/view/All/job/postgresql_commitfest_world/ -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?
On Mon, Sep 2, 2013 at 7:10 PM, Jeff Davis wrote: > I'd like to submit a patch to just make it into a PGC_POSTMASTER and > remove the code to support changing it. Makes sense to me. I wonder, is anyone really running in production with full_page_writes off? I talked to someone a while ago who used Postgres on ZFS with data journaling, and was perfectly well aware of the fact that theoretically he could safely turn the setting off, and yet chose not to. Now, I know that Greg Smith's book describes the conditions in which it's acceptable and the precautions that should be taken and so on, but in my (admittedly relatively limited) experience, no one actually does it in production. My sample size for people who at least considered doing it (that both believed that Postgres could write pages atomically on their hardware/FS, and also knew that full_page_writes exists and what it means) is only 1. At least in people's minds, it might be that the knowledge that no one runs with full_page_writes off is enough to put them off. It's like building Postgres with a non-standard BLCKSZ -- even if you had solid evidence that it would help performance in your case, would you really want to do it? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [9.4] Make full_page_writes only settable on server start?
There is a significant amount of code supporting the changing of full_page_writes while the server is running, including an XLOG_FPW_CHANGE wal record. But I don't see the use-case; surely almost nobody changes this on a running server, because either your filesystem guarantees atomic page writes for you or not. I'd like to submit a patch to just make it into a PGC_POSTMASTER and remove the code to support changing it. Then, I intend to write another patch to make the full-page writes for checksums honor the full_page_writes setting. That will be easier to write once it's a PGC_POSTMASTER. Any reason to keep the setting as a PGC_SIGHUP? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory
On Mon, Sep 2, 2013 at 6:52 PM, Andres Freund wrote: > Not sure whether you mean non-windows EXEC_BACKEND isn't going to be > supported for much longer or that it already has problems. I'm not sure what Noah was getting at, but I have used EXEC_BACKEND twice now during development, in situations where I would have needed a Windows development otherwise. So it's definitely useful, at least to me. But on my MacBook Pro, you have to compile it with -fno-pie (I think that's the right flag) to disable ASLR in order to get reliable operation. I imagine such problems will become commonplace on more and more platforms as time wears on. > I guess my point is that if we want to develop stuff that requires > reliable addresses, we should build support for that from a low level > up. Not rely on a hack^Wlayer ontop of the actual dynamic shared memory > API. > That is, it should be a flag to dsm_create() that we require a fixed > address and dsm_attach() will then automatically use that or die > trying. Requiring implementations to take care about passing addresses > around and fiddling with mmap/windows api to make sure those mappings > are possible doesn't strike me to be a good idea. > > In the end, you're going to be the primary/first user as far as I > understand things, so you'll have to argue whether we need fixed > addresses or not. I don't think it's a good idea to forgo this decision > on this layer and bolt on another ontop if we decide it's neccessary. I didn't intend to punt that decision to another layer so much as another patch and a more detailed examination of requirements. IME, given a choice between something that is 99% reliable and provides more functionality, or something that is 99.99% reliable and provides less functionality, this community picks the latter every time. And that's why I've left out any capability to insist on a fixed address from this patch. It would be nice to have, to be sure. But it also would take more work and add more complexity, and I don't have a clear sense that that work would be justified. Now, we might get to a point where it seems clear that we're not going to get any further with parallelism without adding a capability for fixed-address mappings. If that happens, I think that's the time to come back to this layer and add that capability. But right now it doesn't seem essential. Now, having said that, I didn't see any particular reason to bury the ability to pass mmap() or shmat() a *preferred* address. But IJWH. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 RC1 psql encoding reporting inconsistently?
Tom Lane-2 wrote > Michael Nolan < > htfoot@ > > writes: >> This is 9.3 RC1 on a Fedora 7 system. Why does \l report the encoding >> as SQL_ASCII and \set report it as UTF8? > > psql sets client_encoding based on its environment (LANG or related > variables). That's been true for some time --- since 9.1, according > to a quick check. > > regards, tom lane My knowledge of encoding is minimal but to expand on the comment: Client and server (or, more specifically, database) encodings can and often do differ just as you are seeing here. I'm guessing that somewhere deep inside psql and/or postgres encoding conversion is performed if the client and server do not match. While I guess it is possible to try and auto-adapt the client encoding to match the server/database the current policy is to require the user to explicitly (so to speak) declare the encoding they are using on their client. I guess a counter-question would be: what would you expect "\set" to report and why? David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/9-3-RC1-psql-encoding-reporting-inconsistently-tp5769334p5769339.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.4] row level security
On Fri, Aug 30, 2013 at 04:24:06PM -0400, Stephen Frost wrote: > > The scenario I'm worried about is where somebody says "hey, Postgres has > > RLS now, I can rely on that to hide my sooper sekrit data from other users > > in the same database", and later they have a security breach through some > > covert-channel attack. Are they going to blame themselves? No, they're > > gonna blame Postgres. Or consider the case where some bozo publishes > > a method for such an attack and uses it to badmouth us as insecure. > > In my experience, issues are discovered, patched, and released as > security updates. Does adding RLS mean we might have more of those? > Sure, as did column level privileges and as does *any* such increase in > the granularity of what we can provide security-wise. Releasing a feature we think is perfect, and finding out later is isn't, and fixing it, is not the same as releasing a feature we _know_ isn't perfect, and having no idea how to fix it. >From later discussions, it seems like we have to accept that we will never be able to avoid all convert channels, e.g. timing queries, but we can avoid the most obvious ones, e.g. EXPLAIN, and improve it as we go. Knowing there is no end to improving it does make some people feel uncomfortable, and I can't think of another feature we have added with such an open-ended nature. We do have open-ended performance features, but here is seems the value of the feature itself, security, is not attainable. Perhaps reasonable security is attainable. I am not saying we should reject this feature --- just that the calculus of how we decide to add this feature doesn't fit our normal analysis. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME
Jeff Davis-8 wrote > Is there any semantic difference between marking a constraint as > DISABLED and simply dropping it? Or does it just make it easier to > re-add it later? I cannot answer the question but if there is none then the main concern I'd have is capturing "meta-information" about WHY such a constraint has been disabled instead of dropped. I guess this whole feature extends from the trigger disable feature that already exists. Given we have the one adding this seems symmetrical... I cannot really see using either feature on a production system (if following best practices) but I can imagine where they could both be helpful during development. Note with this usage pattern the meta-information about "why" becomes considerably less important. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/ENABLE-DISABLE-CONSTRAINT-NAME-tp5769136p5769337.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.4] row level security
On Sun, Sep 1, 2013 at 11:05:58AM -0700, Josh Berkus wrote: > > Security community also concludes it is not avoidable nature as long > > as human can observe system behavior and estimate something, thus, > > security evaluation criteria does not require eliminate covert-channels > > or does not pay attention about covert-channels for the products that > > is installed on the environment with basic robustness (that means, > > non-military, regular enterprise usage). > > To be completely blunt, the security community does not understand > databases. At all. I'd think if anything had become clear through the > course of work on SEPosgres, it would be that. Agreed. The security community realizes these covert channels exist, but doesn't really have any recommendations on how to avoid them. You could argue that avoiding them is too tied to specific database implementations, but there are general channels, like insert with a unique key, that should at least have well-defined solutions. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 RC1 psql encoding reporting inconsistently?
Michael Nolan writes: > This is 9.3 RC1 on a Fedora 7 system. Why does \l report the encoding > as SQL_ASCII and \set report it as UTF8? psql sets client_encoding based on its environment (LANG or related variables). That's been true for some time --- since 9.1, according to a quick check. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] 9.3 RC1 psql encoding reporting inconsistently?
This is 9.3 RC1 on a Fedora 7 system. Why does \l report the encoding as SQL_ASCII and \set report it as UTF8? psql (9.3rc1) Type "help" for help. postgres=# \l List of databases Name Owner Encoding Collate Ctype Access privileges - - --- - - postgres postgres SQL_ASCII C C template0 postgres SQL_ASCII C C =c/postgres + postgres=CTc/postgres template1 postgres SQL_ASCII C C =c/postgres + postgres=CTc/postgres (3 rows) postgres=# \set AUTOCOMMIT = 'on' ECHO = 'queries' PROMPT1 = '%/%R%# ' PROMPT2 = '%/%R%# ' PROMPT3 = '>> ' VERBOSITY = 'default' VERSION = 'PostgreSQL 9.3rc1 on i686-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-27), 32-bit' DBNAME = 'postgres' USER = 'postgres' PORT = '5432' ENCODING = 'UTF8' postgres=# -- Mike Nolan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory
Hi Noah! On 2013-09-01 12:07:04 -0400, Noah Misch wrote: > On Sun, Sep 01, 2013 at 05:08:38PM +0200, Andres Freund wrote: > > On 2013-09-01 09:24:00 -0400, Noah Misch wrote: > > > The difficulty depends on whether processes other than the segment's > > > creator > > > will attach anytime or only as they start. Attachment at startup is > > > enough > > > for parallel query, but it's not enough for something like lock table > > > expansion. I'll focus on the attach-anytime case since it's more general. > > > > Even on startup it might get more complicated than one immediately > > imagines on EXEC_BACKEND type platforms because their memory layout > > doesn't need to be the same. The more shared memory you need, the harder > > that will be. Afair > > Non-Windows EXEC_BACKEND is already facing a dead end that way. Not sure whether you mean non-windows EXEC_BACKEND isn't going to be supported for much longer or that it already has problems. > > > On a system supporting MAP_FIXED, implement this by having the postmaster > > > reserve address space under a PROT_NONE mapping, then carving out from > > > that > > > mapping for each fixed-address dynamic segment. The size of the > > > reservation > > > would be controlled by a GUC; one might set it to several times > > > anticipated > > > peak usage. (The overhead of doing that depends on the kernel.) Windows > > > permits the same technique with its own primitives. > > > > Note that allocating a large mapping, even without using it, has > > noticeable cost, at least under linux. The kernel has to create & copy > > data to track each pages state (without copying the memory content's > > itself due to COW) for every fork afterwards. If you don't believe me, > > check the whole discussion about go's (the language) memory > > management... > > I believe you, but I'd appreciate a link to the discussion you have in mind. Unfortunately I could only find the first half of the discussion about the issue. Turns out it's not the greatest idea to name your fancy new programming language "go" (yesyes, petpeeve of mine). http://lkml.org/lkml/2011/2/8/118 https://lwn.net/Articles/428100/ So, after reading up on the issue a bit more and reading some more kernel code, a large mmap(PROT_NONE, MAP_PRIVATE) won't cause much problems except counting in ulimit -v. It will *not* cause overcommit violations. mmap(PROT_NONE, MAP_SHARED) will tho, even if not yet faulted. Which means that to be reliable and not violate overcommit we'd need to munmap() a chunk of PROT_NONE, MAP_PRIVATE memory, and immediately (without interceding mallocs, using mmap itself) map it again. It only gets really expensive in the sense of making fork expensive if you set protections on many regions in that mapping individually. Each mprotect() call will split the VMA into distinct pieces and they won't get merged even if there are neighboors with the same settings. > > > I don't foresee fundamental differences on 32-bit. All the allocation > > > maximums scale down, but that's the usual story for 32-bit. > > > > If you actually want to allocate memory after starting up, without > > carving a section out for that from the beginning, the memory > > fragmentation will make it very hard to find memory addresses of the > > same across processes. > > True. I wouldn't feel bad if total dynamic shared memory usage above, say, > 256 MiB were unreliable on 32-bit. If you're still running 32-bit in 2015, > you probably have a low-memory platform. Not sure. I think that will partially depend on whether x32 will have any success which I still find hard to judge. > I think the take-away is that we have a lot of knobs available, not a bright > line between possible and impossible. Robert opted to omit provision for > reliable fixed addresses, and the upsides of that decision are the absence of > a DBA-unfriendly space-reservation GUC, trivial overhead when the APIs are not > used, and a clearer portability outlook. I guess my point is that if we want to develop stuff that requires reliable addresses, we should build support for that from a low level up. Not rely on a hack^Wlayer ontop of the actual dynamic shared memory API. That is, it should be a flag to dsm_create() that we require a fixed address and dsm_attach() will then automatically use that or die trying. Requiring implementations to take care about passing addresses around and fiddling with mmap/windows api to make sure those mappings are possible doesn't strike me to be a good idea. In the end, you're going to be the primary/first user as far as I understand things, so you'll have to argue whether we need fixed addresses or not. I don't think it's a good idea to forgo this decision on this layer and bolt on another ontop if we decide it's neccessary. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailin
Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME
On Fri, 2013-08-30 at 09:57 +0800, wangs...@highgo.com.cn wrote: > Hi hackers, > >In order to achieve enable/disable constraint name,I made a few > modifications to the code. > >First, someone used to build the constraints while building > table. Then inserting data must follow a certain order. >And people usually like to insert the data but not affected by > foreign keys or check. Is there any semantic difference between marking a constraint as DISABLED and simply dropping it? Or does it just make it easier to re-add it later? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On Mon, Sep 2, 2013 at 6:25 AM, Craig Ringer wrote: > It'll be yet another way for people to get upsert wrong, of course. > They'll use a wCTE with RETURNING REJECTS to do an UPDATE of the rejects > w/o locking the table against writes first. Documenting this pitfall > should be enough, though. My preferred solution is to actually provide a variant to lock the rows implicated in the would-be unique constraint violation. Obviously that's a harder problem to solve. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] currval and DISCARD ALL
On 19-08-2013 16:10, Boszormenyi Zoltan wrote: I am reviewing your patch. Thanks... * Is the patch in a patch format which has context? (eg: context diff format) Yes. * Does it apply cleanly to the current git master? Almost. No rejects, no fuzz, only offset for some files. * Does it include reasonable tests, necessary doc patches, etc? Documentation, yes. Tests, no. * Does the patch actually implement what it's supposed to do? Yes. * Do we want that? Yes. * Do we already have it? No. * Does it follow SQL spec, or the community-agreed behavior? The SQL standard doesn't have DISCARD. Otherwise the behaviour is obvious. * Does it include pg_dump support (if applicable)? n/a * Are there dangers? It changes applications' assumptions slightly but takes the behaviour closer to the wording of the documentation. * Have all the bases been covered? Yes. * Does the feature work as advertised? Yes. * Are there corner cases the author has failed to consider? No. * Are there any assertion failures or crashes? No. * Does the patch slow down simple tests? No. * If it claims to improve performance, does it? n/a * Does it slow down other things? No. * Does it follow the project coding guidelines? Yes. Maybe a little stylistic comment: +void +ReleaseSequenceCaches() +{ + SeqTableData *ptr = seqtab; + SeqTableData *tmp = NULL; + + while (ptr != NULL) + { + tmp = ptr; + ptr = ptr->next; + free(tmp); + } + + seqtab = NULL; +} I would rename the variables to "seq" and "next" from "ptr" and "tmp", respectively, to make them even more obvious. This looks a little better: +void +ReleaseSequenceCaches() +{ + SeqTableData *seq = seqtab; + SeqTableData *next; + + while (seq) + { + next = seq->next; + free(seq); + seq = next; + } + + seqtab = NULL; +} Done! * Are there portability issues? No. * Will it work on Windows/BSD etc? It should. There are no extra system calls. * Are the comments sufficient and accurate? The feature needs very little code which is downright obvious. There are no comments but I don't think the code needs it. * Does it do what it says, correctly? Yes. I lied. There is one little problem. There is no command tag reported for DISCARD SEQUENCES: zozo=# create sequence s1; CREATE SEQUENCE zozo=# select nextval('s1'); nextval - 1 (1 row) zozo=# select currval('s1'); currval - 1 (1 row) zozo=# discard all; DISCARD ALL zozo=# discard sequences; ??? zozo=# select currval('s1'); ERROR: currval of sequence "s1" is not yet defined in this session Fixed! * Does it produce compiler warnings? Only one: src/backend/commands/sequence.c should have #include because of this: sequence.c:1608:1: warning: no previous prototype for ‘ReleaseSequenceCaches’ [-Wmissing-prototypes] ReleaseSequenceCaches() ^ Fixed! * Can you make it crash? No. * Is everything done in a way that fits together coherently with other features/modules? Yes. * Are there interdependencies that can cause problems? I don't think so. The attached patch fix the items reviewed by you. Regards, -- Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento diff --git a/doc/src/sgml/ref/discard.sgml b/doc/src/sgml/ref/discard.sgml index 65ebbae..abd3e28 100644 --- a/doc/src/sgml/ref/discard.sgml +++ b/doc/src/sgml/ref/discard.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation -DISCARD { ALL | PLANS | TEMPORARY | TEMP } +DISCARD { ALL | PLANS | SEQUENCES | TEMPORARY | TEMP } @@ -67,6 +67,15 @@ DISCARD { ALL | PLANS | TEMPORARY | TEMP } +SEQUENCES + + + Releases all internally cached sequences. + + + + + ALL @@ -83,6 +92,7 @@ UNLISTEN *; SELECT pg_advisory_unlock_all(); DISCARD PLANS; DISCARD TEMP; +DISCARD SEQUENCES; diff --git a/src/backend/commands/discard.c b/src/backend/commands/discard.c index 76f3ab6..f4e7e06 100644 --- a/src/backend/commands/discard.c +++ b/src/backend/commands/discard.c @@ -18,13 +18,14 @@ #include "commands/async.h" #include "commands/discard.h" #include "commands/prepare.h" +#include "commands/sequence.h" #include "utils/guc.h" #include "utils/portal.h" static void DiscardAll(bool isTopLevel); /* - * DISCARD { ALL | TEMP | PLANS } + * DISCARD { ALL | SEQUENCES | TEMP | PLANS } */ void DiscardCommand(DiscardStmt *stmt, bool isTopLevel) @@ -39,6 +40,10 @@ DiscardCommand(DiscardStmt *stmt, bool isTopLevel) ResetPlanCache(); break; + case DISCARD_SEQUENCES: + ReleaseSequenceCaches(); + break; + case DISCARD_TEMP: ResetTempTableNamespace(); break; @@ -69,4 +74,5 @@ DiscardAll(bool isTopLevel) LockReleaseAll(USER_LOCKMETHOD, true); ResetPlanCache();
Re: [HACKERS] Extension Templates S03E11
On Mon, Sep 02, 2013 at 02:32:16AM -0400, Peter Eisentraut wrote: > On Thu, 2013-08-29 at 12:16 +0200, Dimitri Fontaine wrote: > > Here's v14 of the patch with pg_upgrade support fixed, so that the > > automated setup that Peter built is able to have at it! > > Fails cpluspluscheck: > > In file included from /tmp/cpluspluscheck.5g2uWw/test.cpp:3:0: > ./src/include/commands/template.h:47:8: error: ‘ExtensionControl’ does not > name a type > ./src/include/commands/template.h:51:8: error: ‘ExtensionControl’ does not > name a type > > I think this actually just means the header does not include all it needs by > itself. Is there some standard set of checks you run on new patches, and are the results showing up on, say, the buildfarm or some other CI dashboard? Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Freezing without write I/O
On Fri, 2013-08-30 at 20:34 +0200, Andres Freund wrote: > I have a quick question: The reason I'd asked about the status of the > patch was that I was thinking about the state of the "forensic freezing" > patch. After a quick look at your proposal, we still need to freeze in > some situations (old & new data on the same page basically), so I'd say > it still makes sense to apply the forensic freezing patch, right? > > Differing Opinions? The Freeze Forensically patch is nice because (if I understand it correctly) it allows us to freeze at the same time as we mark PD_ALL_VISIBLE, which avoids the potential extra page write. But that's not such a big advantage if we don't ordinarily have to write again for freezing, anyway. However, there are still some cases where we'd be able to preserve the forensic information. If nothing else, that might help debug this patch, if necessary. There might also be cases where we can freeze more eagerly to avoid the case where very old (but unfrozen) and very new tuples mix on the same page. Perhaps Robert has some thoughts here, as well. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Next CFM?
On Mon, Sep 02, 2013 at 12:00:02PM -0500, Josh Berkus wrote: > Hackers, > > We need a Commit Fest manager for the September CF. I'm not going > to do it; this month is a heavy travel month for me (3 conferences > and a wedding). > > For help, here's the Commitfest Checklist Mike and I assembled: > > https://wiki.postgresql.org/wiki/CommitFest_Checklist > > Mind you, Peter E. seems to be getting patches organized ... are you > CFM for this one, Peter? If Peter won't, I will. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max freeze age query in docs
On 09/02/2013 02:26 PM, Andres Freund wrote: On 2013-09-02 14:20:57 -0400, Andrew Dunstan wrote: On 09/02/2013 01:30 PM, Tom Lane wrote: Andrew Dunstan writes: Yes, possibly, but we can't do that now, but I would like to fix the docs now. If you want this in 9.3.0 it needs to be committed in the next couple of hours. FWIW, the idea seemed generally sane to me, but I'd suggest not depending on reltoastrelid being zero when and only when there's no match. Why not test whether t.oid IS NULL, instead? Or actually, code it like this GREATEST(age(c.relfrozenxid), age(t.relfrozenxid)) and be done, as well as not having an ugly direct use of int4larger. OK, I'll do it that way. Working on it now. I'd vote for c.relkind != 't' AND NOT c.relfrozenxid = 0; instead of relkind = 'r' for the main relation, that way you'd include materialized views and stuff. See what was just committed - the matview case is included for 9.3+ (as it was in fact in the original - I must have been looking at older docs when saw it wasn't there.) I'll be back in an hour or so if any final tweeks are needed. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max freeze age query in docs
On 2013-09-02 14:20:57 -0400, Andrew Dunstan wrote: > > On 09/02/2013 01:30 PM, Tom Lane wrote: > >Andrew Dunstan writes: > >>Yes, possibly, but we can't do that now, but I would like to fix the > >>docs now. > >If you want this in 9.3.0 it needs to be committed in the next couple of > >hours. > > > >FWIW, the idea seemed generally sane to me, but I'd suggest not depending > >on reltoastrelid being zero when and only when there's no match. > >Why not test whether t.oid IS NULL, instead? > > > >Or actually, code it like this > > > >GREATEST(age(c.relfrozenxid), age(t.relfrozenxid)) > > > >and be done, as well as not having an ugly direct use of int4larger. > > > > > > > OK, I'll do it that way. Working on it now. I'd vote for c.relkind != 't' AND NOT c.relfrozenxid = 0; instead of relkind = 'r' for the main relation, that way you'd include materialized views and stuff. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max freeze age query in docs
On 09/02/2013 01:30 PM, Tom Lane wrote: Andrew Dunstan writes: Yes, possibly, but we can't do that now, but I would like to fix the docs now. If you want this in 9.3.0 it needs to be committed in the next couple of hours. FWIW, the idea seemed generally sane to me, but I'd suggest not depending on reltoastrelid being zero when and only when there's no match. Why not test whether t.oid IS NULL, instead? Or actually, code it like this GREATEST(age(c.relfrozenxid), age(t.relfrozenxid)) and be done, as well as not having an ugly direct use of int4larger. OK, I'll do it that way. Working on it now. cheers andrew cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max freeze age query in docs
Andrew Dunstan writes: > Yes, possibly, but we can't do that now, but I would like to fix the > docs now. If you want this in 9.3.0 it needs to be committed in the next couple of hours. FWIW, the idea seemed generally sane to me, but I'd suggest not depending on reltoastrelid being zero when and only when there's no match. Why not test whether t.oid IS NULL, instead? Or actually, code it like this GREATEST(age(c.relfrozenxid), age(t.relfrozenxid)) and be done, as well as not having an ugly direct use of int4larger. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max freeze age query in docs
On 09/01/2013 10:33 PM, Josh Berkus wrote: Maybe for bonus points we'd print out the schema (e.g. by selectting c.oid::regclass instead of c.relname), and also include materialized views which are omitted from the query altogether. Given the importance of this, maybe we need to have it as part of pg_stat_user_tables? Yes, possibly, but we can't do that now, but I would like to fix the docs now. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Next CFM?
Hackers, We need a Commit Fest manager for the September CF. I'm not going to do it; this month is a heavy travel month for me (3 conferences and a wedding). For help, here's the Commitfest Checklist Mike and I assembled: https://wiki.postgresql.org/wiki/CommitFest_Checklist Mind you, Peter E. seems to be getting patches organized ... are you CFM for this one, Peter? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE
On 08/31/2013 06:40 AM, Josh Berkus wrote: >> > 3) RETURNING is expanded - "RETURNING REJECTS *" is now possible where >> > that makes sense. > Oh, nifty! OK, now I can *really* use this feature. Absolutely; especially combined with COPY to a staging TEMPORARY or UNLOGGED table. It'll be yet another way for people to get upsert wrong, of course. They'll use a wCTE with RETURNING REJECTS to do an UPDATE of the rejects w/o locking the table against writes first. Documenting this pitfall should be enough, though. Speaking of upsert, I'm starting to think that to solve the upsert problem without forcing full-table locking on people we need a lock type that conflicts only with DELETE/INSERT and a way to prevent UPDATEs from changing the key. Kind of like the table level inverse of FOR KEY UPDATE - a way to say "you can change rows, just cannot add or remove from the set of keys". -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extension Templates S03E11
Peter Eisentraut writes: > Fails cpluspluscheck: Turns out I'm discovering that particular check, thanks! I could reproduce and fix the error locally after being led to the command ./src/tools/pginclude/cpluspluscheck. So please find v15 of the patch attached to this email, that passes all previously done checks and this one too now. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support templates.v15.patch.gz Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql and pset without any arguments
Patch added to current open commitfest under the Client section with title: "Call \pset without any arguments displays current status of all printing options" Status: Need review. Let me know if it should not be there. Regards, Le 29/06/2013 01:08, Gilles Darold a écrit : > Hi, > > I was looking at psql 8.3 documention about \pset options and saw that > there was the following note : > > "Note: It is an error to call \pset without any arguments. In the > future this case might show the current status of all printing options." > > I looked backward and forward to find that this note is present in all > versions since 7.1 up to 9.3, maybe it is time to add this little feature. > > I've attached a patch to add the usage of the \pset command without any > arguments to displays current status of all printing options instead of > the error message. Here is a sample output: > > (postgres@[local]:5494) [postgres] > \pset > Output format is aligned. > Border style is 2. > Expanded display is used automatically. > Null display is "NULL". > Field separator is "|". > Tuples only is off. > Title is unset. > Table attributes unset. > Line style is unicode. > Pager is used for long output. > Record separator is . > (postgres@[local]:5494) [postgres] > > > To avoid redundant code I've added a new method printPsetInfo() so that > do_pset() and exec_command() will used the same output message, they are > all in src/bin/psql/command.c. For example: > > (postgres@[local]:5494) [postgres] > \pset null 'NULL' > Null display is "NULL". > (postgres@[local]:5494) [postgres] > > > The patch print all variables information from struct printTableOpt when > \pset is given without any arguments and also update documentation. > > Let me know if there's any additional work to do on this basic patch or > something that I've omitted. > > Best regards, > -- Gilles Darold http://dalibo.com - http://dalibo.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Further XLogInsert scaling tweaking
Now that I've had a little break from the big XLogInsert scaling patch, I went back to do some testing and profiling of it. I saw a lot of contention on the first access of RedoRecPtr and force/fullPageWrites, which made me realize that I put those variables right next to the heavily-contended insertpos_lck spinlock and the variables that it protects. Needless to say, that's a recipe for contention. I added some padding between those two, per attached patch, and got a big improvement. So we should definitely do that. I just added a char[128] field between them, which is enough to put them on different cache lines on the server I'm testing on. I don't think there is any platform-independent way to get the current cache line size, unfortunately. Since this doesn't affect correctness, I'm inclined to just add the 128-byte padding field there. Attached is a graph generated by pgbench-tools. Full results are available here: http://hlinnaka.iki.fi/xloginsert-scaling/padding/. The test query used was: insert into foo:client_id select generate_series(1, 100); That is, each client inserts a lot of rows into a different table. The table has no indexes. This is pretty much the worst-case scenario for stressing XLogInsert. The "master-b03d196" is unpatched git master, "xlog-padding-fb741c0" is with the padding. The -16 plots are the same, but with xloginsert_slots=16 (the default is 8). The server has 16 cores, 32 with hyperthreading. It's interesting that although the peak performance is better with 8 slots than with 16, it doesn't scale as gracefully with 16 slots. I believe that's because with more insertion slots, you have more backends fighting over the insertpos_lck. With fewer slots, the extra work is queued behind the insertion slots instead, and sleeping is better than spinning. In either case, the performance with the padding is better than without. - Heikki <>diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 39c58d0..28e62ea 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -428,8 +428,14 @@ typedef struct XLogCtlInsert uint64 CurrBytePos; uint64 PrevBytePos; - /* insertion slots, see above for details */ - XLogInsertSlotPadded *insertSlots; + /* + * Make sure the above heavily-contended spinlock and byte positions are + * on their own cache line. In particular, the RedoRecPtr and full page + * write variables below should be on a different cache line. They are + * read on every WAL insertion, but updated rarely, and we don't want + * those reads to steal the cache line containing Curr/PrevBytePos. + */ + char pad[128]; /* * fullPageWrites is the master copy used by all backends to determine @@ -455,6 +461,9 @@ typedef struct XLogCtlInsert bool exclusiveBackup; int nonExclusiveBackups; XLogRecPtr lastBackupStart; + + /* insertion slots, see XLogInsertSlot struct above for details */ + XLogInsertSlotPadded *insertSlots; } XLogCtlInsert; /* -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers