Re: [HACKERS] logical decoding of two-phase transactions
>> >> Yes, that’s also possible but seems to be less flexible, restricting us
>> >> to some specific GID format.
>> >>
>> >> Anyway, I can measure the WAL space overhead introduced by the GIDs
>> >> inside commit records to know exactly what the cost of such an approach
>> >> will be.
>>
>> Stas,
>>
>> Have you had a chance to look at this further?

Generally I’m okay with Simon’s approach and will send an updated patch.
I still want to perform some tests to estimate how much disk space is
actually taken up by the extra WAL records.

>> I think the approach of storing just the xid and fetching the GID
>> during logical decoding of the PREPARE TRANSACTION is probably the
>> best way forward, per my prior mail.

I don’t think that’s possible this way. If we don't put the GID in the
commit record, then by the time logical decoding happens the transaction
will already be committed/aborted and there will be no easy way to get
that GID. I thought about several possibilities:

* Tracking an xid/gid map in memory also doesn’t help much — if the server
  reboots between prepare and commit we lose that mapping.

* We could provide some hooks on prepared transaction recovery during
  startup, but that approach also fails if the reboot happens between the
  commit and the decoding of that commit.

* Logical messages are WAL-logged, but they don’t have any redo function,
  so they don’t help much either.

So to support user-accessible 2PC over replication based on 2PC decoding,
we would have to invent something nastier, like writing the mappings into
a table.

>> That should eliminate Simon's
>> objection re the cost of tracking GIDs and still let us have access to
>> them when we want them, which is the best of both worlds really.

Having 2PC decoding in core is a good thing anyway, even without GID
tracking =)

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speedup twophase transactions
On Thu, Jan 26, 2017 at 4:09 PM, Nikhil Sontakke wrote:
>> I look at this patch from you and that's present for me:
>> https://www.postgresql.org/message-id/CAMGcDxf8Bn9ZPBBJZba9wiyQq-Qk5uqq=vjomnrnw5s+fks...@mail.gmail.com
>>
>> --- a/src/backend/access/transam/xlog.c
>> +++ b/src/backend/access/transam/xlog.c
>> @@ -9573,6 +9573,7 @@ xlog_redo(XLogReaderState *record)
>>                  (errmsg("unexpected timeline ID %u (should be %u) in checkpoint record",
>>                          checkPoint.ThisTimeLineID, ThisTimeLineID)));
>>
>> +               KnownPreparedRecreateFiles(checkPoint.redo);
>>                 RecoveryRestartPoint(&checkPoint);
>>         }
>
> Oh, sorry. I was asking about CheckpointTwoPhase(). I don't see a
> function by this name. And now I see, the name is CheckPointTwoPhase()
> :-)

My mistake then :D

>> And actually, when a XLOG_CHECKPOINT_SHUTDOWN record is taken, 2PC
>> files are not flushed to disk with this patch. This is a problem as a
>> new restart point is created... Having the flush in CheckpointTwoPhase
>> really makes the most sense.
>
> Umm, AFAICS, CheckPointTwoPhase() does not get called in the "standby
> promote" code path.

CreateRestartPoint() calls it via CheckPointGuts() while in recovery.
--
Michael
Re: [HACKERS] Radix tree for character conversion
On Wed, Jan 25, 2017 at 7:18 PM, Ishii Ayumi wrote:
> I applied the 4-patch set and ran "make", but it failed.
> Is this a bug, or my mistake? I'm sorry if I'm wrong.
>
> [$(TOP)]$ patch -p1 < ../0001-Add-missing-semicolon.patch
> [$(TOP)]$ patch -p1 < ../0002-Correct-reference-resolution-syntax.patch
> [$(TOP)]$ patch -p1 < ../0003-Apply-pgperltidy-on-src-backend-utils-mb-Unicode.patch
> [$(TOP)]$ patch -p1 < ../0004-Use-radix-tree-for-character-conversion.patch
> [$(TOP)]$ ./configure
> [Unicode]$ make
> '/usr/bin/perl' UCS_to_most.pl
> Type of arg 1 to keys must be hash (not hash element) at convutils.pm line 443, near "})"
> Type of arg 1 to values must be hash (not hash element) at convutils.pm line 596, near "})"
> Type of arg 1 to each must be hash (not private variable) at convutils.pm line 755, near "$map)"
> Compilation failed in require at UCS_to_most.pl line 19.
> make: *** [iso8859_2_to_utf8.map] Error 255

Hm, I am not sure what you are missing. I was able to get things to build.
--
Michael
Re: [HACKERS] Radix tree for character conversion
On Tue, Jan 10, 2017 at 8:22 PM, Kyotaro HORIGUCHI wrote:
> [...patch...]

Nobody has showed up yet to review this patch, so I am giving it a shot.

The patch file sizes are scary at first sight, but after having a look:
 36 files changed, 1411 insertions(+), 54398 deletions(-)
Yes, that's a surprise. Something like git diff --irreversible-delete
would have helped, as most of the diffs are just caused by 3 files being
deleted in patch 0004, sending 50k lines to the abyss of deletion.

> Hello, I found a bug in my portion while rebasing.

Right, that's 0001. Nice catch.

> The attached files are the following. This patchset is not
> complete, missing changes to the map files. The change is tremendously
> large but generatable.
>
> 0001-Add-missing-semicolon.patch
>
>   UCS_to_EUC_JP.pl has a line missing its terminating semicolon. This
>   doesn't do harm, but it is surely a syntax error. This patch fixes it.
>   This should perhaps be a separate patch.

This requires a back-patch. It makes me wonder how long this script has
actually not been run...

> 0002-Correct-reference-resolution-syntax.patch
>
>   convutils.pm has lines with different syntaxes of reference
>   resolution. This unifies the syntax.

Yes, that looks right to me. I am not the best Perl guru on this list,
but looking around, $$var{foo} is bad, ${$var}{foo} is better, and
$var->{foo} is even better. This also generates no diffs when running
make in src/backend/utils/mb/Unicode/. So no objections to that.

> 0003-Apply-pgperltidy-on-src-backend-utils-mb-Unicode.patch
>
>   Before adding the radix tree stuff, this applies pgperltidy and
>   inserts a format-skipping pragma for the parts where perltidy seems
>   to do too much.

Which version of perltidy did you use? Looking at the archives, the Perl
code is cleaned up with a specific version, v20090616. See
https://www.postgresql.org/message-id/20151204054322.ga2070...@tornado.leadboat.com
for example on the matter. As perltidy changes over time, this may be a
sensitive change if done this way.

> 0004-Use-radix-tree-for-character-conversion.patch
>
>   The radix tree body.

Well, here a lot of diffs could have been saved.

> The unattached fifth patch is generated by the following steps.
>
> [$(TOP)]$ ./configure
> [Unicode]$ make
> [Unicode]$ make distclean
> [Unicode]$ git add .
> [Unicode]$ commit
> === COMMIT MESSAGE
> Replace map files with radix tree files.
>
> These encodings no longer use the former map files and use the new
> radix tree files instead.
> ===

OK, I can see that working, with 200k of maps generated... So, going
through the important bits of this jungle:

+/*
+ * radix tree conversion function - this should be identical to the function in
+ * ../conv.c with the same name
+ */
+static inline uint32
+pg_mb_radix_conv(const pg_mb_radix_tree *rt,
+                 int l,
+                 unsigned char b1,
+                 unsigned char b2,
+                 unsigned char b3,
+                 unsigned char b4)

This is not nice. Having a duplication like that is a recipe for
forgetting about it, as this patch introduces a dependency between conv.c
and the radix tree generation.

Having a .gitignore in Unicode/ would be nice, particularly to avoid
committing map_checker.

A README documenting things would be welcome, or at least comments at the
top of map_checker.c. Why is map_checker essential? What does it do? There
is no way to understand that easily, except that it includes a "radix tree
conversion function" and that it performs sanity checks on the radix trees
to be sure that they are in good shape. But that is something one would
guess only after looking at your patch and the code (at least I will sleep
less stupid tonight after reading this stuff).
--- a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
 # Drop these SJIS codes from the source for UTF8=>SJIS conversion
 #<<< do not let perltidy touch this
-my @reject_sjis =(
+my @reject_sjis = (
    0xed40..0xeefc, 0x8754..0x875d, 0x878a, 0x8782,
-   0x8784, 0xfa5b, 0xfa54, 0x8790..0x8792, 0x8795..0x8797,
+   0x8784, 0xfa5b, 0xfa54, 0x8790..0x8792, 0x8795..0x8797,
    0x879a..0x879c
-);
+   );

This is not generated; it would be nice to drop the noise from the patch.
Here is another one:

-       $i->{code} = $jis | (
-           $jis < 0x100
-           ? 0x8e00
-           : ($sjis >= 0xeffd ? 0x8f8080 : 0x8080));
-
+#<<< do not let perltidy touch this
+       $i->{code} = $jis | ($jis < 0x100 ? 0x8e00 :
+                            ($sjis >= 0xeffd ? 0x8f8080 : 0x8080));
+#>>>

         if (l == 2)
         {
-            iutf = *utf++ << 8;
-            iutf |= *utf++;
+            b3 = *utf++;
+            b4 = *utf++;
         }

Ah, OK. This conversion is important so that it performs a minimum of
bitwise operations. Yes, let's keep that. It's pretty cool to get a
faster operation.
--
Michael
Re: [HACKERS] Speedup twophase transactions
> I look at this patch from you and that's present for me:
> https://www.postgresql.org/message-id/CAMGcDxf8Bn9ZPBBJZba9wiyQq-Qk5uqq=vjomnrnw5s+fks...@mail.gmail.com
>
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -9573,6 +9573,7 @@ xlog_redo(XLogReaderState *record)
>                  (errmsg("unexpected timeline ID %u (should be %u) in checkpoint record",
>                          checkPoint.ThisTimeLineID, ThisTimeLineID)));
>
> +               KnownPreparedRecreateFiles(checkPoint.redo);
>                 RecoveryRestartPoint(&checkPoint);
>         }

Oh, sorry. I was asking about CheckpointTwoPhase(). I don't see a
function by this name. And now I see, the name is CheckPointTwoPhase()
:-)

> And actually, when a XLOG_CHECKPOINT_SHUTDOWN record is taken, 2PC
> files are not flushed to disk with this patch. This is a problem as a
> new restart point is created... Having the flush in CheckpointTwoPhase
> really makes the most sense.

Umm, AFAICS, CheckPointTwoPhase() does not get called in the "standby
promote" code path.

Regards,
Nikhils
--
Nikhil Sontakke                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Speedup twophase transactions
On Thu, Jan 26, 2017 at 1:38 PM, Nikhil Sontakke wrote:
>> We should really try to do things right now, or we'll never come back
>> to it. 9.3 (if my memory does not fail me?) has reduced the time to do
>> promotion by removing the need of the end-of-recovery checkpoint, and
>> while I agree that there should not be that many 2PC transactions at
>> this point, if there are for a reason or another, the time it takes to
>> complete promotion would be impacted. So let's refactor
>> PrescanPreparedTransactions() so as it is able to handle 2PC data from
>> a buffer extracted by XlogReadTwoPhaseData(), and we are good to go.
>
> Not quite. If we modify PrescanPreparedTransactions(), we also need to
> make RecoverPreparedTransactions() and
> StandbyRecoverPreparedTransactions() handle 2PC data via
> XlogReadTwoPhaseData().

Ah, right on both counts, even for RecoverPreparedTransactions(), which
happens at the end of recovery. Thanks for noticing. The patch mentions
that as well:

+ * At the end of recovery we move all known prepared transactions to disk.
+ * This allows RecoverPreparedTransactions() and
+ * StandbyRecoverPreparedTransactions() to do their work.

I need some strong coffee...

>> +       KnownPreparedRecreateFiles(checkPoint.redo);
>>         RecoveryRestartPoint(&checkPoint);
>> Looking again at this code, I think that this is incorrect. The
>> checkpointer should be in charge of doing this work and not the
>> startup process, so this should go into CheckpointTwoPhase() instead.
>
> I don't see a function by the above name in the code?

I look at this patch from you and that's present for me:
https://www.postgresql.org/message-id/CAMGcDxf8Bn9ZPBBJZba9wiyQq-Qk5uqq=vjomnrnw5s+fks...@mail.gmail.com
If I look as well at the last version from Stas, it is here:
https://www.postgresql.org/message-id/becc988a-db74-48d5-b5d5-a54551a62...@postgrespro.ru
As this change:

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9573,6 +9573,7 @@ xlog_redo(XLogReaderState *record)
                 (errmsg("unexpected timeline ID %u (should be %u) in checkpoint record",
                         checkPoint.ThisTimeLineID, ThisTimeLineID)));

+               KnownPreparedRecreateFiles(checkPoint.redo);
                RecoveryRestartPoint(&checkPoint);
        }

And actually, when a XLOG_CHECKPOINT_SHUTDOWN record is taken, 2PC
files are not flushed to disk with this patch. This is a problem as a
new restart point is created... Having the flush in CheckpointTwoPhase
really makes the most sense.
--
Michael
Re: [HACKERS] sequence data type
On Wed, Jan 25, 2017 at 11:53 PM, Peter Eisentraut wrote:
> Here is an updated patch that allows changing the sequence type. This
> was clearly a concern for reviewers, and the presented use cases seemed
> convincing.

I have been torturing this patch and it looks rather solid to me. Here
are a couple of comments:

@@ -15984,6 +15992,9 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
                       "CREATE SEQUENCE %s\n",
                       fmtId(tbinfo->dobj.name));

+    if (strcmp(seqtype, "bigint") != 0)
+        appendPQExpBuffer(query, "AS %s\n", seqtype);

Wouldn't it be better to append that unconditionally? There is no reason
that a dump taken with pg_dump version X should work on X - 1 (and no
reason to make the life of users uselessly difficult either, that's my
point), but that seems better to me than relying on the sequence type
being hardcoded in all the pre-10 dump queries for sequences. That would
also bring more consistency to the CREATE SEQUENCE queries of
test_pg_dump/t/001_base.pl.

Could you increase the regression test coverage to cover some of the new
code paths? For example, such cases are not tested:

=# create sequence toto as smallint;
CREATE SEQUENCE
=# alter sequence toto as smallint maxvalue 1000;
ERROR:  22023: RESTART value (2147483646) cannot be greater than MAXVALUE (1000)
LOCATION:  init_params, sequence.c:1537
=# select setval('toto', 1);
 setval
--------
      1
(1 row)
=# alter sequence toto as smallint;
ERROR:  22023: MAXVALUE (2147483647) is too large for sequence data type smallint
LOCATION:  init_params, sequence.c:1407

+    if ((seqform->seqtypid == INT2OID && seqform->seqmin < PG_INT16_MIN)
+        || (seqform->seqtypid == INT4OID && seqform->seqmin < PG_INT32_MIN)
+        || (seqform->seqtypid == INT8OID && seqform->seqmin < PG_INT64_MIN))
+    {
+        char        bufm[100];
+
+        snprintf(bufm, sizeof(bufm), INT64_FORMAT, seqform->seqmin);
+
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("MINVALUE (%s) is too large for sequence data type %s",
+                        bufm, format_type_be(seqform->seqtypid))));
+    }

"large" does not apply to values lower than the minimum, no? Also, the
int64 branch is never going to be reached (same for the max value check);
it doesn't hurt to code it, I agree.

Testing serial columns, the changes are consistent with the previous
releases.
--
Michael
Re: [HACKERS] Speedup twophase transactions
>> The question remains whether saving off a few fsyncs/reads for these
>> long-lived prepared transactions is worth the additional code churn.
>> Even if we add code to go through the KnownPreparedList, we still will
>> have to go through the other on-disk 2PC transactions anyway. So,
>> maybe not.
>
> We should really try to do things right now, or we'll never come back
> to it. 9.3 (if my memory does not fail me?) has reduced the time to do
> promotion by removing the need of the end-of-recovery checkpoint, and
> while I agree that there should not be that many 2PC transactions at
> this point, if there are for a reason or another, the time it takes to
> complete promotion would be impacted. So let's refactor
> PrescanPreparedTransactions() so as it is able to handle 2PC data from
> a buffer extracted by XlogReadTwoPhaseData(), and we are good to go.

Not quite. If we modify PrescanPreparedTransactions(), we also need to
make RecoverPreparedTransactions() and
StandbyRecoverPreparedTransactions() handle 2PC data via
XlogReadTwoPhaseData().

> +       /*
> +        * Move prepared transactions, if any, from KnownPreparedList to files.
> +        * It is possible to skip this step and teach subsequent code about
> +        * KnownPreparedList, but PrescanPreparedTransactions() happens once
> +        * during end of recovery or on promote, so probably it isn't worth
> +        * the additional code.
> +        */

This comment is misplaced. It does not make sense before this specific
call.

> +       KnownPreparedRecreateFiles(checkPoint.redo);
>         RecoveryRestartPoint(&checkPoint);
> Looking again at this code, I think that this is incorrect. The
> checkpointer should be in charge of doing this work and not the
> startup process, so this should go into CheckpointTwoPhase() instead.

I don't see a function by the above name in the code?

Regards,
Nikhils
--
Nikhil Sontakke                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Checksums by default?
* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Jan 25, 2017 at 7:37 PM, Andres Freund wrote:
>> On 2017-01-25 19:30:08 -0500, Stephen Frost wrote:
>>> * Peter Geoghegan (p...@heroku.com) wrote:
>>>> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote:
>>>>> As it is, there are backup solutions which *do* check the checksum when
>>>>> backing up PG. This is no longer, thankfully, some hypothetical thing,
>>>>> but something which really exists and will hopefully keep users from
>>>>> losing data.
>>>>
>>>> Wouldn't that have issues with torn pages?
>>>
>>> No, why would it? The page has either been written out by PG to the OS,
>>> in which case the backup s/w will see the new page, or it hasn't been.
>>
>> Uh. Writes aren't atomic on that granularity. That means you very well
>> *can* see a torn page (in Linux you can, e.g., on 4KB OS page boundaries
>> of an 8KB Postgres page). Just read a page while it's being written out.
>
> Yeah. This is also why backups force full page writes on even if
> they're turned off in general.

I've got a question in to David about this. I know we chatted about the
risk at one point; I just don't recall what we ended up doing, or whether
we concluded it wasn't a risk for some, perhaps incorrect, reason and need
to revisit it. I can imagine a few different approaches: re-read the page,
which isn't a guarantee but reduces the chances a fair bit; check the LSN;
or perhaps the plan was to just check whether the page is in the WAL, as I
mentioned.

Thanks!

Stephen
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> Your proposed policy is essentially that functions should have
> built-in superuser checks if having access to that function is
> sufficient to escalate your account to full superuser privileges.

Yes, I do.

> 1. There's no consensus on any such policy.

Evidently. I will say that it was brought up on a thread with quite a bit
of general discussion and resulted in a committed patch. If you want my
2c on it, there was at least a consensus on it at that time, even if one
no longer exists. I certainly don't recall anyone clamouring at the time
that we should re-evaluate how my assessment was done or the choices made
regarding the functions which you now, two years later, wish to change.

> 2. If there were such a policy it would favor, not oppose, changing
> pg_ls_dir(), because you can't escalate to superuser given access to
> pg_ls_dir(). Yet you are also opposed to changing pg_ls_dir() for
> reasons that boil down to a personal preference on your part for
> people not using it to build monitoring scripts.

If your argument were specifically about pg_ls_dir() and pg_ls_dir()
only, then I would have misgivings about it, because it's a piss-poor
solution to the problem that you've actually presented, and I am
concerned that such a "design" will lead to the later implication that
it's the "right" thing to do and that we shouldn't try to do better. In
other words, the exact argument we always tend to have when we're talking
about new features and their design: is it the right design, etc.

> 3. Such a policy can only be enforced to the extent that we can
> accurately predict which functions can be used to escalate to
> superuser, which is not necessarily obvious in every case. Under your
> proposed policy, if a given function turns out to be more dangerous
> than we'd previously thought, we'd have to stick the superuser check
> back in for the next release. And if it turns out to be less
> dangerous than we thought, we'd take the check out. That would be
> silly.

If the proposed function were able to be used to escalate a user to
superuser status, then I'd think we'd want to do a security release to
fix whatever that hole is, since it's almost certainly not the intended
use of the function and the lack of proper checking is a security risk.
If we stop doing that, where do we draw the line? Is it OK for
pg_terminate_backend() to be used to gain superuser-level access through
some kind of race condition? What about pg_start_backup()? Which
functions are "OK" to be allowed to make users superuser, and which are
not, when we don't have superuser() checks in any of them? Further, how
is the user supposed to know? Are we going to start sticking labels on
functions which say "WARNING: This can be used to become a superuser!"?
I certainly hope not.

> 4. Such a policy is useless from a security perspective because you
> can't actually prevent superusers from delegating access to those
> functions. You can force them to use wrapper functions but that
> doesn't eo ipso improve security. It might make security better or
> worse depending on how well the functions are written, and it seems
> extremely optimistic to suppose that everyone who writes a security
> definer wrapper function will actually do anything more than expose
> the underlying function as-is (and maybe forget to schema-qualify
> something).

This is not about preventing superusers from delegating access to
whichever users they wish to, where it makes sense to do so.

> 5. If you're worried about people granting access to functions that
> allow escalation to the superuser account, what you really ought to do
> is put some effort into documenting which functions have such hazards
> and for what general reasons.

I did, by making the ones that I don't believe can be used to gain
superuser-level access GRANT'able to non-superusers. The ones which I
didn't feel comfortable changing in that way were left with the
superuser() checks in them. Maybe I got that wrong, but that was
certainly the idea, as discussed on the thread which I linked you to.

> I'd be willing to agree to write documentation along the lines
> suggested in (5) as a condition of removing the remaining superuser
> checks if you'd be willing to review it and suggest a place to put it.

I'm really not anxious to have a bunch of such warnings throughout the
docs. Nor am I thrilled about the idea of having to argue about what can
and what can't: would you say pg_read_file() is superuser-equivalent? I
would, because in a great many cases, for all practical purposes, it is;
a ton of users run with md5, and the auth info could be pulled from
pg_authid directly with pg_read_file(). The lack of any way to contain
the access that such a function would provide would also mean that users
would end up making a similar choice: GRANT access to this function that
an attacker could use to gain superuser() rights, or not? That's not t
Re: [HACKERS] Checksums by default?
On 26 January 2017 at 01:58, Thomas Munro wrote:
>
> I don't know how comparable it is to our checksum technology, but
> MySQL seems to have some kind of checksums on table data, and you can
> find public emails, blogs etc lamenting corrupted databases by
> searching Google for the string "InnoDB: uncompressed page, stored
> checksum in field1" (that's the start of a longer error message that
> includes actual and expected checksums).

I'm not sure what exactly that teaches us, however. I see these errors
were often associated with software bugs (apparently MySQL long assumed
that a checksum of 0 never happened, for example). Every non-software
case I stumbled across seemed to follow a power failure. Apparently MySQL
uses a "doublewrite buffer" to protect against torn pages, but when I
search for that I get tons of people inquiring how to turn it off... So
even without software bugs in the checksum code, I don't know that the
frequency of the error necessarily teaches us anything about the
frequency of hardware corruption either.

And more to the point, it seems what people are asking for in all those
lamentations is how they can convince MySQL to continue and ignore the
corruption. A typical response was "We slightly modified innochecksum and
added option -f that means if the checksum of a page is wrong, rewrite it
in the InnoDB page header." Which begs the question...

--
greg
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 7:37 PM, Andres Freund wrote:
> On 2017-01-25 19:30:08 -0500, Stephen Frost wrote:
>> * Peter Geoghegan (p...@heroku.com) wrote:
>>> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote:
>>>> As it is, there are backup solutions which *do* check the checksum when
>>>> backing up PG. This is no longer, thankfully, some hypothetical thing,
>>>> but something which really exists and will hopefully keep users from
>>>> losing data.
>>>
>>> Wouldn't that have issues with torn pages?
>>
>> No, why would it? The page has either been written out by PG to the OS,
>> in which case the backup s/w will see the new page, or it hasn't been.
>
> Uh. Writes aren't atomic on that granularity. That means you very well
> *can* see a torn page (in Linux you can, e.g., on 4KB OS page boundaries
> of an 8KB Postgres page). Just read a page while it's being written out.

Yeah. This is also why backups force full page writes on even if
they're turned off in general.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
On Wed, Jan 25, 2017 at 8:22 PM, Stephen Frost wrote:
> Apparently we disagree about what is a 'reasonable manner'.

Yes. I think that a "reasonable manner" should mean "what the DBA thinks
is reasonable", whereas you apparently think it should mean "what the DBA
thinks is reasonable, but only if the core developers and in particular
Stephen Frost also think it's reasonable".

Your proposed policy is essentially that functions should have built-in
superuser checks if having access to that function is sufficient to
escalate your account to full superuser privileges. But:

1. There's no consensus on any such policy.

2. If there were such a policy, it would favor, not oppose, changing
pg_ls_dir(), because you can't escalate to superuser given access to
pg_ls_dir(). Yet you are also opposed to changing pg_ls_dir() for reasons
that boil down to a personal preference on your part for people not using
it to build monitoring scripts.

3. Such a policy can only be enforced to the extent that we can
accurately predict which functions can be used to escalate to superuser,
which is not necessarily obvious in every case. Under your proposed
policy, if a given function turns out to be more dangerous than we'd
previously thought, we'd have to stick the superuser check back in for
the next release. And if it turns out to be less dangerous than we
thought, we'd take the check out. That would be silly.

4. Such a policy is useless from a security perspective because you can't
actually prevent superusers from delegating access to those functions.
You can force them to use wrapper functions, but that doesn't eo ipso
improve security. It might make security better or worse depending on how
well the functions are written, and it seems extremely optimistic to
suppose that everyone who writes a security definer wrapper function will
actually do anything more than expose the underlying function as-is (and
maybe forget to schema-qualify something).

5. If you're worried about people granting access to functions that allow
escalation to the superuser account, what you really ought to do is put
some effort into documenting which functions have such hazards and for
what general reasons. That would have a much better chance of preventing
people from delegating access to dangerous functions inadvertently than
the current method, which relies on people: knowing (without
documentation) that you've attempted to leave hard-coded superuser()
checks in some functions but not others, for reasons that sometimes but
not always involve privilege escalation; correctly distinguishing which
such cases involve privilege escalation as opposed to other arbitrary
criteria; and deciding neither to create secdef wrappers for anything
that has a built-in check for reasons of privilege isolation, nor to give
up on privilege isolation and hand out superuser to people who need
pg_ls_dir(). While such a clever chain of deductive reasoning cannot be
ruled out, it would be astonishing if it happened very often.

I'd be willing to agree to write documentation along the lines suggested
in (5) as a condition of removing the remaining superuser checks if you'd
be willing to review it and suggest a place to put it. But I have a
feeling compromise may not be possible here. To me, the hand-wringing
about the evils of pg_ls_dir() on this thread contrasts rather starkly
with the complete lack of discussion about whether the patch removing
superuser checks from pgstattuple was opening up any security
vulnerabilities. And given that the aforesaid patch lets a user who has
EXECUTE privileges on pgstattuple run that function even on relations for
which they have no other privileges, such as say pg_authid, it hardly
seems self-evident that there are no leaks there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
On 15 January 2017 at 05:18, Tom Lane wrote: > Fabien COELHO writes: >>> It ends up being about 30 fewer lines of code overall, despite there >>> being four places that have to make ad-hoc RawStmt nodes. On the whole >>> I feel like this is cleaner, > >> I agree: Better typing, more homogeneous code (PlannedStmt for all), >> less ad-hoc checks to work around utility statements... > > OK, pushed like that. Thanks very much for that Tom, it's great to see this change. One suggestion: it's currently non-obvious that ProcessUtility_hook gets called with the full text of all parts of a multi-statement. I suggest the following wording added to the comment on ProcessUtility() in src/backend/tcop/utility.c, after the note on the query string, or something like it: The same query string may be passed to multiple invocations of ProcessUtility if a utility statement in turn invokes other utility statements, or if the user supplied a query string containing multiple semicolon-separated statements in a single protocol message. It is also possible for the query text to contain other non-utility-statement text like comments, empty statements, and plannable statements. Callers that use the queryString should use pstmt->stmt_location and pstmt->stmt_len to extract the text for the statement of interest and should guard against re-entrant invocation. That should help with at least some of the traps around ProcessUtility_hook, and I certainly wish I'd known it some months ago. For the record, this is commits ab1f0c8225714aaa18d2f9ca4f80cd009f145421 and 83f2061dd037477ec8479ee160367840e203a722 . 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ab1f0c8225714aaa18d2f9ca4f80cd009f145421 https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=83f2061dd037477ec8479ee160367840e203a722 -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
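To make the stmt_location/stmt_len convention described above concrete, here is a rough sketch of the extraction rule (Python purely for illustration; real hooks do this in C, and the function name and example offsets below are hypothetical, not taken from PostgreSQL source):

```python
# How a ProcessUtility_hook should carve one statement's text out of the
# full queryString using pstmt->stmt_location and pstmt->stmt_len.
# Convention (assumed here): location of -1 means "unknown, use the whole
# string"; a length of 0 means "from location to end of string".

def statement_text(query_string, stmt_location, stmt_len):
    """Return the text of one statement from a multi-statement query string."""
    if stmt_location < 0:          # location unknown: fall back to whole string
        return query_string
    if stmt_len == 0:              # 0 means "rest of the string"
        return query_string[stmt_location:]
    return query_string[stmt_location:stmt_location + stmt_len]

# A single protocol message may carry several semicolon-separated statements,
# plus comments and plannable statements, all sharing one query string:
query = "CREATE TABLE t(a int); /* comment */ DROP TABLE t;"
assert statement_text(query, 0, 21) == "CREATE TABLE t(a int)"
assert statement_text(query, 37, 12) == "DROP TABLE t"
assert statement_text(query, 37, 0) == "DROP TABLE t;"
assert statement_text(query, -1, 0) == query
```

A hook that logs or rewrites utility statements would apply this slicing before looking at the text, rather than assuming queryString contains exactly one statement.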
Re: [HACKERS] pg_hba_file_settings view patch
On Thu, Jan 26, 2017 at 2:32 AM, Tom Lane wrote:
> The way I'd be inclined to make the individual reporting changes is like
>
>      if (!EnableSSL)
> +    {
> -        ereport(LOG,
> +        ereport(elevel,
>                  (errcode(ERRCODE_CONFIG_FILE_ERROR),
>                   errmsg("hostssl record cannot match because SSL is disabled"),
>                   errhint("Set ssl = on in postgresql.conf."),
>                   errcontext("line %d of configuration file \"%s\"",
>                              line_num, HbaFileName)));
> +        *err_msg = pstrdup("hostssl record cannot match because SSL is disabled");
> +    }
>
> which is considerably less invasive and hence easier to review, and supports reporting different text in the view than appears in the log, should we need that. It seems likely also that we could drop the pstrdup in the case of constant strings (we'd still need psprintf if we want to insert values into the view messages), which would make this way cheaper than what's in the patch now.

I don't really understand the argument about the patch's readability: what Haribabu has proposed simply avoids duplicating the strings, and the diffs of the patch are really clear. For the sake of not translating the strings sent back to the system view, though, I can buy it. -- Michael
Re: [HACKERS] logical decoding of two-phase transactions
On 5 January 2017 at 20:43, Stas Kelvich wrote: > >> On 5 Jan 2017, at 13:49, Simon Riggs wrote: >> >> Surely in this case the master server is acting as the Transaction >> Manager, and it knows the mapping, so we are good? >> >> I guess if you are using >2 nodes then you need to use full 2PC on each node. >> >> Please explain precisely how you expect to use this, to check that GID >> is required. >> > > For example, if we are using logical replication just for failover/HA and > allowing the user to act as the transaction manager itself. Then suppose that > the user prepared a tx on server A and server A crashed. After that the client > may want to reconnect to server B and commit/abort that tx. But the user only > has the GID that was used during prepare. > >> But even then, if you adopt the naming convention that all in-progress >> xacts will be called RepOriginId-EPOCH-XID, so they have a fully >> unique GID on all of the child nodes then we don't need to add the >> GID. > > Yes, that's also possible but seems to be less flexible, restricting us to some > specific GID format. > > Anyway, I can measure the WAL space overhead introduced by the GIDs inside > commit records to know exactly what the cost of such an approach would be. Stas, Have you had a chance to look at this further? I think the approach of storing just the xid and fetching the GID during logical decoding of the PREPARE TRANSACTION is probably the best way forward, per my prior mail. That should eliminate Simon's objection re the cost of tracking GIDs and still let us have access to them when we want them, which is the best of both worlds really. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
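On the WAL-space question Stas raises above, a back-of-the-envelope sketch may help frame the discussion (Python purely for illustration; the 8-byte alignment rule, the GID format, and the commit rate are assumptions for the sake of the estimate, not measured PostgreSQL numbers):

```python
# Rough estimate of the extra WAL incurred by carrying the GID in each
# COMMIT PREPARED record. Assumptions (not measured values): the GID is
# stored NUL-terminated and the addition is rounded up to 8-byte alignment.

def maxalign(n, align=8):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align

def gid_overhead_per_commit(gid):
    """Extra aligned bytes to carry a NUL-terminated GID in a commit record."""
    return maxalign(len(gid) + 1)       # +1 for the trailing NUL byte

# e.g. a GID following the RepOriginId-EPOCH-XID convention Simon suggested:
gid = "1-42-123456"
per_commit = gid_overhead_per_commit(gid)
per_day = per_commit * 5000 * 86400     # at a sustained 5000 2PC commits/s

assert per_commit == 16                 # 12 bytes, rounded up to 16
assert per_day == 6912000000            # about 6.9 GB/day at that extreme rate
```

Even at an unrealistically high sustained 2PC rate, the overhead is small relative to the rest of the commit-record traffic, which is the kind of number an actual measurement would need to confirm or refute.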
Re: [HACKERS] Speedup twophase transactions
On Wed, Jan 25, 2017 at 11:55 PM, Nikhil Sontakke wrote: >> We are talking about the recovery/promote code path. Specifically this >> call to KnownPreparedRecreateFiles() in PrescanPreparedTransactions(). >> >> We write the files to disk and they get immediately read up in the >> following code. We could instead not write the files to disk and read >> KnownPreparedList in the code path that follows, as well as elsewhere. > > Thinking more on this. > > The only optimization that's really remaining is handling of prepared > transactions that have not been committed or will linger around for > long. The short-lived 2PC transactions have been optimized already via > this patch. > > The question remains whether saving off a few fsyncs/reads for these > long-lived prepared transactions is worth the additional code churn. > Even if we add code to go through the KnownPreparedList, we still will > have to go through the other on-disk 2PC transactions anyway. So, > maybe not. We should really try to do things right now, or we'll never come back to it. 9.3 (if my memory does not fail me) reduced the time needed for promotion by removing the need for the end-of-recovery checkpoint. While I agree that there should not be that many 2PC transactions at this point, if there are some for one reason or another, the time it takes to complete promotion would be impacted. So let's refactor PrescanPreparedTransactions() so that it is able to handle 2PC data from a buffer extracted by XlogReadTwoPhaseData(), and we are good to go.

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9573,6 +9573,15 @@ xlog_redo(XLogReaderState *record)
                  (errmsg("unexpected timeline ID %u (should be %u) in checkpoint record",
                          checkPoint.ThisTimeLineID, ThisTimeLineID)));
+
+        /*
+         * Move prepared transactions, if any, from KnownPreparedList to files.
+         * It is possible to skip this step and teach subsequent code about
+         * KnownPreparedList, but PrescanPreparedTransactions() happens once
+         * during end of recovery or on promote, so probably it isn't worth
+         * the additional code.
+         */
+        KnownPreparedRecreateFiles(checkPoint.redo);
         RecoveryRestartPoint(&checkPoint);

Looking again at this code, I think that this is incorrect. The checkpointer should be in charge of doing this work, not the startup process, so this should go into CheckpointTwoPhase() instead. In the end, we should be able to live without KnownPreparedRecreateFiles() entirely and just rip it out of the patch. -- Michael
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 2:28 PM, Stephen Frost wrote: > Sadly, without having them enabled by default, there's not a huge corpus > of example cases to draw from. > > There have been a few examples already posted about corruption failures > with PG, but one can't say with certainty that they would have been > caught sooner if checksums had been enabled. I don't know how comparable it is to our checksum technology, but MySQL seems to have some kind of checksums on table data, and you can find public emails, blogs etc lamenting corrupted databases by searching Google for the string "InnoDB: uncompressed page, stored checksum in field1" (that's the start of a longer error message that includes actual and expected checksums). -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] COPY as a set returning function
> > >> I don't understand why do we have all these checks. Can't we just pass >> the values to COPY and let it apply the checks? That way, when COPY is >> updated to support multibyte escape chars (for example) we don't need to >> touch this code. Together with removing the unneeded braces that would >> make these stanzas about six lines long instead of fifteen. >> > > If I understand you correctly, COPY (via BeginCopyFrom) itself relies on > having a relation in pg_class to reference for attributes. > In this case, there is no such relation. So I'd have to fake a relcache > entry, or refactor BeginCopyFrom() to extract a ReturnSetInfo from the > Relation and pass that along to a new function BeginCopyFromReturnSet. I'm > happy to go that route if you think it's a good idea. > > >> >> >> > + tuple = heap_form_tuple(tupdesc,values,nulls); >> > + //tuple = BuildTupleFromCStrings(attinmeta, >> field_strings); >> > + tuplestore_puttuple(tupstore, tuple); >> >> No need to form a tuple; use tuplestore_putvalues here. >> > > Good to know! > > > >> >> I wonder if these should be an auxiliary function in copy.c to do this. >> Surely copy.c itself does pretty much the same thing ... >> > > Yes. This got started as a patch to core because not all of the parts of > COPY are externally callable, and aren't broken down in a way that allowed > for use in a SRF. > > I'll get to work on these suggestions. > I've put in some more work on this patch, mostly just taking Alvaro's suggestions, which resulted in big code savings. I had to add a TupleDesc parameter to BeginCopy() and BeginCopyFrom(). This seemed the easiest way to leverage the existing tested code (and indeed, it worked nearly out-of-the-box). The only drawback is that a minor change will have to be made to the BeginCopyFrom() call in file_fdw.c, and any other extensions that leverage COPY. 
We could make compatibility functions that take the original signature and pass it along to the corresponding function with rsTupDesc set to NULL.

Some issues:
- I'm still not sure if the direction we want to go is a set returning function, or a change in grammar that lets us use COPY as a CTE or similar.
- This function will have the same difficulties as adding the program option did to file_fdw: there's very little we can reference that isn't OS/environment-specific.
- Inline (STDIN) prompts the user for input, but gives the error: server sent data ("D" message) without prior row description ("T" message). I looked for a place where the Relation was consulted for the row description, but I'm not finding it.

I can continue to flesh this out with documentation and test cases if there is consensus that this is the way to go.

# select * from copy_srf('echo "x\ty"',true) as t(x text, y text);
 x | y
---+---
 x | y
(1 row)

Time: 1.074 ms

# select * from copy_srf('echo "x\t4"',true) as t(x text, y integer);
 x | y
---+---
 x | 4
(1 row)

Time: 1.095 ms

# select * from copy_srf(null) as t(x text, y integer);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> a4
>> b5
>> \.
server sent data ("D" message) without prior row description ("T" message)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 4dfedf8..26f81f3 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1065,6 +1065,21 @@ LANGUAGE INTERNAL
 STRICT IMMUTABLE PARALLEL SAFE
 AS 'jsonb_insert';
 
+CREATE OR REPLACE FUNCTION copy_srf(
+    IN filename text,
+    IN is_program boolean DEFAULT false,
+    IN format text DEFAULT null,
+    IN delimiter text DEFAULT null,
+    IN null_string text DEFAULT null,
+    IN header boolean DEFAULT null,
+    IN quote text DEFAULT null,
+    IN escape text DEFAULT null,
+    IN encoding text DEFAULT null)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+VOLATILE ROWS 1000 COST 1000 CALLED ON NULL INPUT
+AS 'copy_srf';
+
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
 -- than use explicit 'superuser()' checks in those functions, we use the GRANT

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f9362be..4e6a32c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -30,6 +30,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -286,7 +287,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
-static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
+static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
+                TupleDesc rsTupDesc,
                 RawStmt *raw_query, Oi
Re: [HACKERS] pgbench more operators & functions
Fabien, * Fabien COELHO (coe...@cri.ensmp.fr) wrote: > I think that there is a misunderstanding, most of which being my fault. No worries, it happens. :) > I have really tried to do everything that was required from > committers, including revising the patch to match all previous > feedback. Thanks for continuing to try to work through everything. I know it can be a difficult process, but it's all towards a (hopefully) improved and better PG. > Version 6 sent on Oct 4 did include all fixes required at the time > (no if, no unusual and operators, TAP tests)... However I forgot to > remove some documentation about the removed stuff, which made Robert > think that I had not done it. I apologise for this mistake and the > subsequent misunderstanding:-( Ok, that helps clarify things. As does the rest of your email, for me, anyway. > If pgbench is about being seated on a bench and running postgres on > your laptop to get some heat, my mistake... I thought it was about > benchmarking, which does imply a few extra capabilities. I believe we do want to improve pgbench and your changes are generally welcome when it comes to adding useful capabilities. Your explanation was also helpful about the specific requirements. > IMHO the relevant current status of the patch should be "Needs > review" and possibly "Move to next CF". For my 2c, at least, while I'm definitely interested in this, it's not nearly high enough on my plate with everything else going on to get any attention in the next few weeks, at least. I do think that, perhaps, this patch may deserve a bit of a break, to allow people to come back to it with a fresh perspective, so perhaps moving it to the next commitfest would be a good idea, in a Needs Review state. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Checksums by default?
Peter, * Peter Geoghegan (p...@heroku.com) wrote: > On Wed, Jan 25, 2017 at 1:22 PM, Peter Geoghegan wrote: > > I understand that my experience with storage devices is unusually > > narrow compared to everyone else here. That's why I remain neutral on > > the high level question of whether or not we ought to enable checksums > > by default. I'll ask other hackers to answer what may seem like a very > > naive question, while bearing what I just said in mind. The question > > is: Have you ever actually seen a checksum failure in production? And, > > if so, how helpful was it? > > I'm surprised that nobody has answered my question yet. > > I'm not claiming that not actually seeing any corruption in the wild > due to a failing checksum invalidates any argument. I *do* think that > data points like this can be helpful, though. Sadly, without having them enabled by default, there's not a huge corpus of example cases to draw from. There have been a few examples already posted about corruption failures with PG, but one can't say with certainty that they would have been caught sooner if checksums had been enabled. Thanks! Stephen
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 1:22 PM, Peter Geoghegan wrote: > I understand that my experience with storage devices is unusually > narrow compared to everyone else here. That's why I remain neutral on > the high level question of whether or not we ought to enable checksums > by default. I'll ask other hackers to answer what may seem like a very > naive question, while bearing what I just said in mind. The question > is: Have you ever actually seen a checksum failure in production? And, > if so, how helpful was it? I'm surprised that nobody has answered my question yet. I'm not claiming that not actually seeing any corruption in the wild due to a failing checksum invalidates any argument. I *do* think that data points like this can be helpful, though. -- Peter Geoghegan
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Andres, * Andres Freund (and...@anarazel.de) wrote: > On 2017-01-25 18:04:09 -0500, Stephen Frost wrote: > > Robert's made it clear that he'd like to have a blanket rule that we > > don't have superuser checks in these code paths if they can be GRANT'd > > at the database level, which goes beyond pg_ls_dir. > > That seems right to me. I don't see much benefit for the superuser() > style checks, with a few exceptions. Granting by default is obviously > an entirely different question. Well, for my part at least, I disagree. Superuser is a very different animal, imv, than privileges which can be GRANT'd, and I feel that's an altogether good thing. > In other words, you're trying to force people to do stuff your preferred > way, instead of allowing them to get things done in a reasonable manner. Apparently we disagree about what is a 'reasonable manner'. Thanks! Stephen
Re: [HACKERS] Checksums by default?
* Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 6:30 PM, Stephen Frost wrote: > > I hope to discuss it further after we have the ability to turn it off > > easily. > > I think we should have the ability to flip it in BOTH directions easily. Presumably you imply this to mean "before we enable it by default." I'm not sure that I can agree with that, but we haven't got it in either direction yet, so it's not terribly interesting to discuss that particular "what if." > It sounds to me like you are misleading users about the positives and > negatives of checksums, which then causes them to be shocked that they > are not the default. I don't try to claim that they are without downsides or performance impacts, if that's the implication here. > > [ more unsolicited bragging about an unspecified backup tool, presumably still > > pgbackrest ] It was explicitly to counter the claim that there aren't things out there which are working to actively check the checksums. > > I'd rather walk into an engagement where the user is saying "yeah, we > > enabled checksums and it caught this corruption issue" than having to > > break the bad news, which I've had to do over and over, that their > > existing system hasn't got checksums enabled. This isn't hypothetical, > > it's what I run into regularly with entirely reasonable and skilled > > engineers who have been deploying PG. > > Maybe you should just stop telling them and use the time thus freed up > to work on improving the checksum feature. I'm working to improve the usefulness of our checksum feature in a way which will produce practical and much more immediate results than anything I could do today in PG. That said, I do plan to also support working on checksums as I'm able to. At the moment, that's supporting Magnus' thread about enabling them by default.
I'd be a bit surprised if he was trying to force a change on PG because he thinks it's going to improve things for pgbackrest, but if so, I'm not going to complain when it seems like an entirely sensible and good change which will benefit PG's users too. Even better would be if we had an independent tool to check checksums endorsed by the PG community, but that won't happen for a release cycle. I'd also be extremely happy if the other backup tools out there grew the ability to check checksums in PG pages; frankly, I hope that adding it to pgbackrest will push them to do so. > I'm skeptical of this whole discussion because you seem to be filled > with unalloyed confidence that checksums have little performance > impact and will do wonderful things to prevent data loss, whereas I > think they have significant performance impact and will only very > slightly help to prevent data loss. I admit that they'll have a significant performance impact in some environments, but I think the vast majority of installations won't see anything different, while some of them may be saved by it, including, as likely as not, a number of actual corruption issues that have been brought up on these lists in the past few days, simply because reports were asked for. > I admit that the idea of having > pgbackrest verify checksums while backing up seems like it could > greatly improve the chances of checksums being useful, but I'm not > going to endorse changing PostgreSQL's default for pgbackrest's > benefit. I'm glad to hear that you generally endorse the idea of having a backup tool verify checksums. I'd love it if all of them did and I'm not going to apologize for, as far as I'm aware, being the first to even make an effort in that direction. > It's got to be to the benefit of PostgreSQL users broadly, > not just the subset of those people who use one particular backup > tool. 
Hopefully, other backup solutions will add similar capability, and perhaps someone will also write an independent tool, and eventually those will get out in released versions, and maybe PG will grow a tool to check checksums too, but I can't make other tool authors implement it, nor can I make other committers work on it, and while I'm doing what I can, as I'm sure you understand, we all have a lot of different hats. > Also, the massive hit that will probably occur on > high-concurrency OLTP workloads larger than shared_buffers is going to > be hard to justify for any amount of backup security. I think that > problem's got to be solved or at least mitigated before we think about > changing this. I realize that not everyone would set the bar that > high, but I see far too many customers with exactly that workload to > dismiss it lightly. I have a sneaking suspicion that the customers which you get directly involved with tend to be at a different level than the majority of PG users which exist out in the wild (I can't say that it's really any different for me). I don't think that's a bad thing, but I do think users at all levels deserve consideration and not just those running close to the limits of their gear. Thanks! Stephen
Re: [HACKERS] patch: function xmltable
2017-01-25 15:07 GMT+01:00 Alvaro Herrera : > Tom Lane wrote: > > Andres Freund writes: > > > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote: > > >> XMLTABLE is specified by the standard to return multiple rows ... but > > >> then as far as my reading goes, it is only supposed to be supported in > > >> the range table (FROM clause) not in the target list. I wonder if > > >> this would end up better if we only tried to support it in RT. I asked > > >> Pavel to implement it like that a few weeks ago, but ... > > > > > Right - it makes sense in the FROM list - but then it should be an > > > executor node, instead of some expression thingy. > > > > +1 --- we're out of the business of having simple expressions that > > return rowsets. > > Well, that's it. I'm not committing this patch against two other > committers' opinion, plus I was already on the fence about the > implementation anyway. I think you should just go with the flow and > implement this by creating nodeTableexprscan.c. It's not even > difficult. > I am playing with this, and the patch is about 15kB longer just from implementing the basic scan functionality, and I haven't touched the planner yet. I am not happy with this - I still have a feeling that I should try to reimplement it as a reduced SRF instead. Regards Pavel > > -- > Álvaro Herrera    https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: [HACKERS] Checksums by default?
Michael, * Michael Paquier (michael.paqu...@gmail.com) wrote: > That would be enough. It should also be rare enough that there would > not be that many pages to track when looking at records from the > backup start position to minimum recovery point. It could also be > simpler, though more time-consuming, to just let a backup recover up > to the minimum recovery point (recovery_target = 'immediate'), and > then run the checksum sanity checks. There are other checks usually > needed on a backup anyway like being sure that index pages are in good > shape even with a correct checksum, etc. Believe me, I'm all for *all* of that. > But here I am really hijacking the thread, so I'll stop.. If you have further thoughts, I'm all ears. This is all relatively new, and I don't expect to have all of the answers or solutions. Obviously, having to bring up a full database is an extra step (one we try to make easy to do), but, sadly, we don't have any way to ask PG to verify all the checksums with released versions, so that's what we're working with. Thanks! Stephen
Re: [HACKERS] Checksums by default?
* Andres Freund (and...@anarazel.de) wrote: > On 2017-01-25 19:30:08 -0500, Stephen Frost wrote: > > * Peter Geoghegan (p...@heroku.com) wrote: > > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > > > > As it is, there are backup solutions which *do* check the checksum when > > > > backing up PG. This is no longer, thankfully, some hypothetical thing, > > > > but something which really exists and will hopefully keep users from > > > > losing data. > > > > > > Wouldn't that have issues with torn pages? > > > > No, why would it? The page has either been written out by PG to the OS, > > in which case the backup s/w will see the new page, or it hasn't been. > > Uh. Writes aren't atomic on that granularity. That means you very well > *can* see a torn page (in linux you can e.g. on 4KB os page boundaries > of a 8KB postgres page). Just read a page while it's being written out. > > You simply can't reliably verify checksums without replaying WAL (or > creating a manual version of replay, as in checking the WAL for a FPW). Looking through the WAL isn't any surprise and is something we've been planning to do for other reasons anyway. Thanks! Stephen
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 9:32 AM, Stephen Frost wrote: > * Robert Haas (robertmh...@gmail.com) wrote: >> On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier >> wrote: >> > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: >> >> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >> >>> As it is, there are backup solutions which *do* check the checksum when >> >>> backing up PG. This is no longer, thankfully, some hypothetical thing, >> >>> but something which really exists and will hopefully keep users from >> >>> losing data. >> >> >> >> Wouldn't that have issues with torn pages? >> > >> > Why? What do you foresee here? I would think such backup solutions are >> > careful enough to ensure correctly the durability of pages so as they >> > are not partially written. >> >> Well, you'd have to keep a read(fd, buf, 8192) performed by the backup >> tool from overlapping with a write(fd, buf, 8192) performed by the >> backend. > > As Michael mentioned, that'd depend on whether things are atomic from a > user's perspective at certain sizes (perhaps 4k, which wouldn't be too > surprising, but may also be system-dependent), in which case verifying > that the page is in the WAL would be sufficient. That would be enough. It should also be rare enough that there would not be that many pages to track when looking at records from the backup start position to minimum recovery point. It could also be simpler, though more time-consuming, to just let a backup recover up to the minimum recovery point (recovery_target = 'immediate'), and then run the checksum sanity checks. There are other checks usually needed on a backup anyway, like being sure that index pages are in good shape even with a correct checksum, etc. But here I am really hijacking the thread, so I'll stop.. -- Michael
Re: [HACKERS] Checksums by default?
On 2017-01-25 19:30:08 -0500, Stephen Frost wrote: > * Peter Geoghegan (p...@heroku.com) wrote: > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > > > As it is, there are backup solutions which *do* check the checksum when > > > backing up PG. This is no longer, thankfully, some hypothetical thing, > > > but something which really exists and will hopefully keep users from > > > losing data. > > > > Wouldn't that have issues with torn pages? > > No, why would it? The page has either been written out by PG to the OS, > in which case the backup s/w will see the new page, or it hasn't been. Uh. Writes aren't atomic at that granularity. That means you very well *can* see a torn page (on Linux you can, e.g., at 4KB OS page boundaries of an 8KB postgres page). Just read a page while it's being written out. You simply can't reliably verify checksums without replaying WAL (or creating a manual version of replay, as in checking the WAL for a FPW). > This isn't like a case where only half the page made it to the disk > because of a system failure though; everything is online and working > properly during an online backup. I don't think that really changes anything. Greetings, Andres Freund
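The torn-read race Andres describes is easy to demonstrate in miniature (a Python sketch for illustration only; a toy additive checksum stands in for PostgreSQL's real page checksum, and checksumming the whole page rather than a header field is a simplification):

```python
# An 8KB PostgreSQL page spans two 4KB OS pages, so a reader racing a
# writer can observe the new first half paired with the old second half.
# The resulting page's checksum then matches neither version on its own,
# which is why naive file-level verification reports false positives.

PAGE, OS_PAGE = 8192, 4096

def checksum(page: bytes) -> int:
    return sum(page) % 65536          # toy checksum, for illustration only

old_page = bytes([1]) * PAGE          # version already on disk
new_page = bytes([2]) * PAGE          # version being written out

# Torn read: first 4KB already updated, second 4KB still the old contents.
torn = new_page[:OS_PAGE] + old_page[OS_PAGE:]

assert checksum(torn) != checksum(new_page)   # 12288 vs 16384
assert checksum(torn) != checksum(old_page)   # 12288 vs 8192
```

Resolving such a mismatch requires consulting the WAL: if a full-page write for that page exists after the backup start point, the torn on-disk image is superseded during replay and the apparent checksum failure is harmless.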
Re: [HACKERS] Checksums by default?
* Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier > wrote: > > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > >> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > >>> As it is, there are backup solutions which *do* check the checksum when > >>> backing up PG. This is no longer, thankfully, some hypothetical thing, > >>> but something which really exists and will hopefully keep users from > >>> losing data. > >> > >> Wouldn't that have issues with torn pages? > > > > Why? What do you foresee here? I would think such backup solutions are > > careful enough to ensure correctly the durability of pages so as they > > are not partially written. > > Well, you'd have to keep a read(fd, buf, 8192) performed by the backup > tool from overlapping with a write(fd, buf, 8192) performed by the > backend. As Michael mentioned, that'd depend on whether things are atomic from a user's perspective at certain sizes (perhaps 4k, which wouldn't be too surprising, but may also be system-dependent), in which case verifying that the page is in the WAL would be sufficient. Thanks! Stephen
Re: [HACKERS] Checksums by default?
* Michael Paquier (michael.paqu...@gmail.com) wrote: > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > >> As it is, there are backup solutions which *do* check the checksum when > >> backing up PG. This is no longer, thankfully, some hypothetical thing, > >> but something which really exists and will hopefully keep users from > >> losing data. > > > > Wouldn't that have issues with torn pages? > > Why? What do you foresee here? I would think such backup solutions are > careful enough to ensure correctly the durability of pages so as they > are not partially written. I believe his concern was that the backup sw might see a partially-updated page when it reads the file while PG is writing it. In other words, would the kernel return some intermediate state of data while an fwrite() is in progress. Thanks! Stephen
Re: [HACKERS] Checksums by default?
* Peter Geoghegan (p...@heroku.com) wrote: > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > > As it is, there are backup solutions which *do* check the checksum when > > backing up PG. This is no longer, thankfully, some hypothetical thing, > > but something which really exists and will hopefully keep users from > > losing data. > > Wouldn't that have issues with torn pages? No, why would it? The page has either been written out by PG to the OS, in which case the backup s/w will see the new page, or it hasn't been. Our testing has not turned up any issues as yet. That said, it's relatively new and I wouldn't be surprised if we need to do some adjustments in that area, which might be system-dependent even. We could certainly check the WAL for the page that had a checksum error (we currently simply report them, though don't throw away a prior backup if we detect one). This isn't like a case where only half the page made it to the disk because of a system failure though; everything is online and working properly during an online backup. Thanks! Stephen
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 9:22 AM, Andres Freund wrote: > On 2017-01-26 09:19:28 +0900, Michael Paquier wrote: >> On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: >> > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >> >> As it is, there are backup solutions which *do* check the checksum when >> >> backing up PG. This is no longer, thankfully, some hypothetical thing, >> >> but something which really exists and will hopefully keep users from >> >> losing data. >> > >> > Wouldn't that have issues with torn pages? >> >> Why? What do you foresee here? I would think such backup solutions are >> careful enough to ensure correctly the durability of pages so as they >> are not partially written. > > That means you have to replay enough WAL to get into a consistent > state... Ah, OK I got the point. Yes that would be a problem to check this field on raw backups except if the page size matches the kernel's one at 4k. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier wrote: > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: >> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >>> As it is, there are backup solutions which *do* check the checksum when >>> backing up PG. This is no longer, thankfully, some hypothetical thing, >>> but something which really exists and will hopefully keep users from >>> losing data. >> >> Wouldn't that have issues with torn pages? > > Why? What do you foresee here? I would think such backup solutions are > careful enough to ensure correctly the durability of pages so as they > are not partially written. Well, you'd have to keep a read(fd, buf, 8192) performed by the backup tool from overlapping with a write(fd, buf, 8192) performed by the backend. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 6:30 PM, Stephen Frost wrote: > I hope to discuss it further after we have the ability to turn it off > easily. I think we should have the ability to flip it in BOTH directions easily. >> Second, really hard to enable is a relative term. I accept that >> enabling checksums is not a pleasant process. Right now, you'd have >> to do a dump/restore, or use logical replication to replicate the data >> to a new cluster and then switch over. On the other hand, if >> checksums are really a critical feature, how are people getting to the >> point where they've got a mission-critical production system and only >> then discovering that they want to enable checksums? > > I truely do wish everyone would come talk to me before building out a > database. Perhaps that's been your experience, in which case, I envy > you, but I tend to get a reaction more along the lines of "wait, what do > you mean I had to pass some option to initdb to enable checksum?!?!". > The fact that we've got a WAL implementation and clearly understand > fsync requirements, why full page writes make sense, and that our WAL > has its own CRCs which isn't possible to disable, tends to lead people > to think we really know what we're doing and that we care a lot about > their data. It sounds to me like you are misleading users about the positives and negatives of checksums, which then causes them to be shocked that they are not the default. > As I have said, I don't believe it has to be on for everyone. For the second time, I didn't say that. But the default has a powerful influence on behavior. If it didn't, you wouldn't be trying to get it changed. > [ unsolicited bragging about an unspecified backup tool, presumably > pgbackrest ] Great. > Presently, last I checked at least, the database system doesn't fall > over and die if a single page's checksum fails. This is another thing that I never said. 
> [ more unsolicited bragging about an unspecified backup tool, presumably still > pgbackrest ] Swell. >> I'm not trying to downplay the usefulness of checksums *in a certain >> context*. It's a good feature, and I'm glad we have it. But I think >> you're somewhat inflating the utility of it while discounting the very >> real costs. > > The costs for checksums don't bother me any more than the costs for WAL > or WAL CRCs or full page writes. Obviously. But I think they should. Frankly, I think the costs for full page writes should bother the heck out of all of us, but the solution isn't to shut them off any more than it is to enable checksums despite the cost. It's to find a way to reduce the costs. > They may not be required on every > system, but they're certainly required on more than 'zero' entirely > reasonable systems which people deploy in their production environments. Nobody said otherwise. > I'd rather walk into an engagement where the user is saying "yeah, we > enabled checksums and it caught this corruption issue" than having to > break the bad news, which I've had to do over and over, that their > existing system hasn't got checksums enabled. This isn't hypothetical, > it's what I run into regularly with entirely reasonable and skilled > engineers who have been deploying PG. Maybe you should just stop telling them and use the time thus freed up to work on improving the checksum feature. I'm skeptical of this whole discussion because you seem to be filled with unalloyed confidence that checksums have little performance impact and will do wonderful things to prevent data loss, whereas I think they have significant performance impact and will only very slightly help to prevent data loss. I admit that the idea of having pgbackrest verify checksums while backing up seems like it could greatly improve the chances of checksums being useful, but I'm not going to endorse changing PostgreSQL's default for pgbackrest's benefit. 
It's got to be to the benefit of PostgreSQL users broadly, not just the subset of those people who use one particular backup tool. Also, the massive hit that will probably occur on high-concurrency OLTP workloads larger than shared_buffers is going to be hard to justify for any amount of backup security. I think that problem's got to be solved or at least mitigated before we think about changing this. I realize that not everyone would set the bar that high, but I see far too many customers with exactly that workload to dismiss it lightly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Checksums by default?
On 2017-01-26 09:19:28 +0900, Michael Paquier wrote: > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > >> As it is, there are backup solutions which *do* check the checksum when > >> backing up PG. This is no longer, thankfully, some hypothetical thing, > >> but something which really exists and will hopefully keep users from > >> losing data. > > > > Wouldn't that have issues with torn pages? > > Why? What do you foresee here? I would think such backup solutions are > careful enough to ensure correctly the durability of pages so as they > are not partially written. That means you have to replay enough WAL to get into a consistent state... - Andres
Re: [HACKERS] safer node casting
Andres Freund writes: > On 2016-12-31 12:08:22 -0500, Peter Eisentraut wrote: >> RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(lc)); > Are you planning to add this / update this patch? Because I really would > have liked this a number of times already... I can update it according > to my suggestions (to avoid multiple eval scenarios) if helpful Yeah, I'd like that in sooner rather than later, too. But we do need it to be foolproof - no multiple evals. The first draft had inadequate-parenthesization hazards, too. regards, tom lane
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >> As it is, there are backup solutions which *do* check the checksum when >> backing up PG. This is no longer, thankfully, some hypothetical thing, >> but something which really exists and will hopefully keep users from >> losing data. > > Wouldn't that have issues with torn pages? Why? What do you foresee here? I would think such backup solutions are careful enough to ensure correctly the durability of pages so as they are not partially written. -- Michael
Re: [HACKERS] safer node casting
Hi Peter, On 2016-12-31 12:08:22 -0500, Peter Eisentraut wrote: > RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(lc)); Are you planning to add this / update this patch? Because I really would have liked this a number of times already... I can update it according to my suggestions (to avoid multiple eval scenarios) if helpful Greetings, Andres Freund
Re: [HACKERS] Should buffer of initialization fork have a BM_PERMANENT flag
(Adding Robert in CC.) On Thu, Jan 26, 2017 at 4:34 AM, Wang Hao wrote: > An unlogged table has an initialization fork. The initialization fork does > not have a BM_PERMANENT flag when getting a buffer. > At checkpoint (not shutdown or end of recovery), it will not be written to disk. > After a crash recovery, the page of the initialization fork will not be correct, > which then makes the main fork incorrect too. For init forks the flush absolutely needs to happen, so that's really not good. We ought to fix BufferAlloc() appropriately here. > Here is an example for GIN index. > > create unlogged table gin_test_tbl(i int4[]); > create index gin_test_idx on gin_test_tbl using gin (i); > checkpoint; > > kill all the postgres processes, and restart again. > > vacuum gin_test_tbl; -- crash. > It seems the same problem exists in BRIN, GIN, GiST and HASH indexes, which > use a buffer for meta page initialization in the ambuildempty function. Yeah, other index AMs deal directly with the sync of the page, that's why there is no issue for them. So the patch attached fixes the problem by changing BufferAlloc() in such a way that initialization forks are permanently written to disk, which is what you are suggesting. As a simple fix for back-branches that's enough, though on HEAD I think that we should really rework the empty() routines so that the write goes through shared buffers first, which seems more solid than relying on the smgr routines to do this work. Robert, what do you think? -- Michael unlogged-flush-fix.patch
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > As it is, there are backup solutions which *do* check the checksum when > backing up PG. This is no longer, thankfully, some hypothetical thing, > but something which really exists and will hopefully keep users from > losing data. Wouldn't that have issues with torn pages? -- Peter Geoghegan
Re: [HACKERS] Checksums by default?
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 2:23 PM, Stephen Frost wrote: > > Yet, our default is to have them disabled and *really* hard to enable. > > First of all, that could be fixed by further development. I'm certainly all for doing so, but I don't agree that it necessarily is required before we flip the default. That said, if the way to get checksums enabled by default is providing a relatively easy way to turn them off, then that's something which I'll do what I can to help work towards. In other words, I'm not going to continue to argue, given the various opinions of the group, that we should just flip it tomorrow. I hope to discuss it further after we have the ability to turn it off easily. > Second, really hard to enable is a relative term. I accept that > enabling checksums is not a pleasant process. Right now, you'd have > to do a dump/restore, or use logical replication to replicate the data > to a new cluster and then switch over. On the other hand, if > checksums are really a critical feature, how are people getting to the > point where they've got a mission-critical production system and only > then discovering that they want to enable checksums? I truly do wish everyone would come talk to me before building out a database. Perhaps that's been your experience, in which case, I envy you, but I tend to get a reaction more along the lines of "wait, what do you mean I had to pass some option to initdb to enable checksum?!?!". The fact that we've got a WAL implementation and clearly understand fsync requirements, why full page writes make sense, and that our WAL has its own CRCs which can't be disabled, tends to lead people to think we really know what we're doing and that we care a lot about their data. 
> > I agree that it's unfortunate that we haven't put more effort into > > fixing that- I'm all for it, but it's disappointing to see that people > > are not in favor of changing the default as I believe it would both help > > our users and encourage more development of the feature. > > I think it would help some users and hurt others. I do agree that it > would encourage more development of the feature -- almost of > necessity. In particular, I bet it would spur development of an > efficient way of turning checksums off -- but I'd rather see us > approach it from the other direction: let's develop an efficient way > of turning the feature on and off FIRST. Deciding that the feature > has to be on for everyone because turning it on later is too hard for > the people who later decide they want it is letting the tail wag the > dog. As I have said, I don't believe it has to be on for everyone. > Also, I think that one of the big problems with the way checksums work > is that you don't find problems with your archived data until it's too > late. Suppose that in February bits get flipped in a block. You > don't access the data until July[1]. Well, it's nice to have the > system tell you that the data is corrupted, but what are you going to > do about it? By that point, all of your backups are probably > corrupted. So it's basically: If your backup system is checking the checksums when backing up PG, which I think every backup system *should* be doing, then guess what? You've got a backup which you can go back to immediately, possibly with the ability to restore all of the data from WAL. That won't always be the case, naturally, but it's a much better position than simply having a system which continues to degrade until you've actually reached the "you're screwed" level because PG will no longer read a page or perhaps can't even start up, *and* you no longer have any backups. As it is, there are backup solutions which *do* check the checksum when backing up PG. 
This is no longer, thankfully, some hypothetical thing, but something which really exists and will hopefully keep users from losing data. > It's nice to know that (maybe?) but without a recovery strategy a > whole lot of people who get that message are going to immediately > start asking "How do I ignore the fact that I'm screwed and try to > read the data anyway?". And we have options for that. > And then you wonder what the point of having > the feature turned on is, especially if it's costly. It's almost an > attractive nuisance at that point - nobody wants to be the user that > turns off checksums because they sound good on paper, but when you > actually have a problem an awful lot of people are NOT going to want > to try to restore from backup and maybe lose recent transactions. > They're going to want to ignore the checksum failures. That's kind of > awful. Presently, last I checked at least, the database system doesn't fall over and die if a single page's checksum fails. I agree entirely that we want the system to fail gracefully (unless the user instructs us otherwise, perhaps because they have a redundant system that they can flip to immediately). > Pe
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
[ in the service of closing out this thread... ] Peter Geoghegan writes: > Finally, 0003-* is a Valgrind suppression borrowed from my parallel > CREATE INDEX patch. It's self-explanatory. Um, I didn't find it all that self-explanatory. Why wouldn't we want to avoid writing undefined data? I think the comment at least needs to explain exactly what part of the written data might be uninitialized. And I'd put the comment into valgrind.supp, too, not in the commit msg. Also, the suppression seems far too broad. It would for instance block any complaint about a write() invoked via an elog call from any function invoked from any LogicalTape* function, no matter how far removed. regards, tom lane
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
On 2017-01-25 18:04:09 -0500, Stephen Frost wrote: > Andres, > > * Andres Freund (and...@anarazel.de) wrote: > > On 2017-01-25 16:52:38 -0500, Stephen Frost wrote: > > > * Robert Haas (robertmh...@gmail.com) wrote: > > > > Preventing people from calling functions by denying the ability to > > > > meaningfully GRANT EXECUTE on those functions doesn't actually stop > > > > them from delegating those functions to non-superusers. It either (a) > > > > causes them to give superuser privileges to accounts that otherwise > > > > would have had lesser privileges or (b) forces them to use wrapper > > > > functions to grant access rather than doing it directly or (c) some > > > > other dirty hack. If they pick (a), security is worse; if they pick > > > > (b) or (c), you haven't prevented them from doing what they wanted to > > > > do anyway. You've just made it annoying and inconvenient. > > > > > > The notion that security is 'worse' under (a) is flawed- it's no > > > different. > > > > Huh? Obviously that's nonsense, given the pg_ls_dir example. > > Robert's made it clear that he'd like to have a blanket rule that we > don't have superuser checks in these code paths if they can be GRANT'd > at the database level, which goes beyond pg_ls_dir. That seems right to me. I don't see much benefit for the superuser() style checks, with a few exceptions. Granting by default is obviously an entirely different question. > If the question was only about pg_ls_dir, then I still wouldn't be for > it, because, as the bits you didn't quote discussed, it encourages users > and 3rd party tool authors to base more things off of pg_ls_dir to > look into the way PG stores data on disk, and affords more access than > the monitoring user has any need for, none of which are good, imv. It > also discourages people from implementing proper solutions when you can > 'just use pg_ls_dir()', which I also don't agree with. 
In other words, you're trying to force people to do stuff your preferred way, instead of allowing them to get things done in a reasonable manner. > If you really want to do an ls, go talk to the OS. ACLs are possible to > provide that with more granularity than what would be available through > pg_ls_dir(). We aren't in the "give a user the ability to do an ls" > business, frankly. Wut. > > > I am suggesting that we shouldn't make it look like there are > > > distinctions when there is actually no difference. That is a good thing > > > for our users because it keeps them informed about what they're actually > > > doing when they grant access. > > > > This position doesn't make a lick of sense to me. There's simply no > > benefit at all in requiring to create wrapper functions, over allowing > > to grant to non-superuser. Both is possible, secdef is a lot harder to > > get right. And you already heavily started down the path of removing > > superuser() type checks - you're just arguing to make it more or less > > randomly less consistent. > > I find this bizarre considering I went through a detailed effort to go > look at every superuser check in the system and discussed, on this list, > the reasoning behind each and every one of them. I do generally > consider arbitrary access to syscalls via the database to be a privilege > which really only the superuser should have. Just because you argued doesn't mean I agree. Greetings, Andres Freund
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
On Wed, Jan 25, 2017 at 3:11 PM, Tom Lane wrote: > Please. You might want to hit the existing ones with a separate patch, > but it doesn't much matter; I'd be just as happy with a patch that did > both things. Got it. -- Peter Geoghegan
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
Peter Geoghegan writes: > It means "another call to tuplesort_gettupleslot", but I believe that > it would be safer (more future-proof) to actually specify "the slot > contents may be invalidated by any subsequent manipulation of the > tuplesort's state" instead. WFM. >> There are several other uses of "call here", both in this patch and >> pre-existing in tuplesort.c, that I find equally vague and unsatisfactory. >> Let's try to improve that. > Should I write a patch along those lines? Please. You might want to hit the existing ones with a separate patch, but it doesn't much matter; I'd be just as happy with a patch that did both things. regards, tom lane
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
On Wed, Jan 25, 2017 at 2:49 PM, Tom Lane wrote: > I looked at the 0002 patch, and while the code is probably OK, I am > dissatisfied with this API spec: > > + * If copy is TRUE, the slot receives a copied tuple that will stay valid > + * regardless of future manipulations of the tuplesort's state. Memory is > + * owned by the caller. If copy is FALSE, the slot may just receive a > pointer > + * to a tuple held within the tuplesort. The latter is more efficient, but > + * the slot contents may be corrupted if there is another call here before > + * previous slot contents are used. > > What does "here" mean? If that means specifically "another call of > tuplesort_gettupleslot", say so. If "here" refers to the whole module, > it would be better to say something like "the slot contents may be > invalidated by any subsequent manipulation of the tuplesort's state". > In any case it'd be a good idea to delineate safe usage patterns, perhaps > "copy=FALSE is recommended only when the next tuplesort manipulation will > be another tuplesort_gettupleslot fetch into the same slot." I agree with your analysis. It means "another call to tuplesort_gettupleslot", but I believe that it would be safer (more future-proof) to actually specify "the slot contents may be invalidated by any subsequent manipulation of the tuplesort's state" instead. > There are several other uses of "call here", both in this patch and > pre-existing in tuplesort.c, that I find equally vague and unsatisfactory. > Let's try to improve that. Should I write a patch along those lines? -- Peter Geoghegan
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Andres, * Andres Freund (and...@anarazel.de) wrote: > On 2017-01-25 16:52:38 -0500, Stephen Frost wrote: > > * Robert Haas (robertmh...@gmail.com) wrote: > > > Preventing people from calling functions by denying the ability to > > > meaningfully GRANT EXECUTE on those functions doesn't actually stop > > > them from delegating those functions to non-superusers. It either (a) > > > causes them to give superuser privileges to accounts that otherwise > > > would have had lesser privileges or (b) forces them to use wrapper > > > functions to grant access rather than doing it directly or (c) some > > > other dirty hack. If they pick (a), security is worse; if they pick > > > (b) or (c), you haven't prevented them from doing what they wanted to > > > do anyway. You've just made it annoying and inconvenient. > > > > The notion that security is 'worse' under (a) is flawed- it's no > > different. > > Huh? Obviously that's nonsense, given the pg_ls_dir example. Robert's made it clear that he'd like to have a blanket rule that we don't have superuser checks in these code paths if they can be GRANT'd at the database level, which goes beyond pg_ls_dir. If the question was only about pg_ls_dir, then I still wouldn't be for it, because, as the bits you didn't quote discussed, it encourages users and 3rd party tool authors to base more things off of pg_ls_dir to look into the way PG stores data on disk, and affords more access than the monitoring user has any need for, none of which are good, imv. It also discourages people from implementing proper solutions when you can 'just use pg_ls_dir()', which I also don't agree with. If you really want to do an ls, go talk to the OS. ACLs are possible to provide that with more granularity than what would be available through pg_ls_dir(). We aren't in the "give a user the ability to do an ls" business, frankly. 
> > With regard to 'b', if their wrapper function is > > sufficiently careful to ensure that the caller isn't able to do anything > > which would increase the caller's level to that of superuser, then > > security is improved. > > Given how complex "sufficiently careful" is for security definer UDFs, > in comparison to estimating the security of granting to a function like > pg_ls_dir (or pretty much any other that doesn't call out to SQL level > stuff like operators, output functions, etc), I don't understand this. If you're implying that security definer UDFs are hard to write and get correct, then I agree with you there. I was affording the benefit of the doubt to that proposed approach. > > If the wrapper simply turns around and calls the underlying function, > > then it's no different from '(a)'. > > Except for stuff like search path. If you consider '(a)' to be the same as superuser, which I was postulating, then this doesn't strike me as making terribly much difference. > > I am suggesting that we shouldn't make it look like there are > > distinctions when there is actually no difference. That is a good thing > > for our users because it keeps them informed about what they're actually > > doing when they grant access. > > This position doesn't make a lick of sense to me. There's simply no > benefit at all in requiring to create wrapper functions, over allowing > to grant to non-superuser. Both is possible, secdef is a lot harder to > get right. And you already heavily started down the path of removing > superuser() type checks - you're just arguing to make it more or less > randomly less consistent. I find this bizarre considering I went through a detailed effort to go look at every superuser check in the system and discussed, on this list, the reasoning behind each and every one of them. I do generally consider arbitrary access to syscalls via the database to be a privilege which really only the superuser should have. 
> > I've commented on here and spoken numerous times about exactly that goal > > of reducing the checks in check_postgres.pl which require superuser. > > You're not actually doing that and nothing you've outlined in here so > > far makes me believe you see how having pg_write_file() access is > > equivalent to giving someone superuser, and that concerns me. > > That's the user's responsibility, and Robert didn't really suggest > granting pg_write_file() permissions, so this seems to be a straw man. He was not arguing for only pg_ls_dir(), but for checks in all "friends", which he later clarified to include pg_write_file(). Thanks! Stephen
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
Peter Geoghegan writes: > I should have already specifically pointed out that the original > discussion on what became 0002-* is here: > postgr.es/m/7256.1476711...@sss.pgh.pa.us > As I said already, the general idea seems uncontroversial. I looked at the 0002 patch, and while the code is probably OK, I am dissatisfied with this API spec: + * If copy is TRUE, the slot receives a copied tuple that will stay valid + * regardless of future manipulations of the tuplesort's state. Memory is + * owned by the caller. If copy is FALSE, the slot may just receive a pointer + * to a tuple held within the tuplesort. The latter is more efficient, but + * the slot contents may be corrupted if there is another call here before + * previous slot contents are used. What does "here" mean? If that means specifically "another call of tuplesort_gettupleslot", say so. If "here" refers to the whole module, it would be better to say something like "the slot contents may be invalidated by any subsequent manipulation of the tuplesort's state". In any case it'd be a good idea to delineate safe usage patterns, perhaps "copy=FALSE is recommended only when the next tuplesort manipulation will be another tuplesort_gettupleslot fetch into the same slot." There are several other uses of "call here", both in this patch and pre-existing in tuplesort.c, that I find equally vague and unsatisfactory. Let's try to improve that. regards, tom lane
Re: [HACKERS] patch: function xmltable
2017-01-25 23:33 GMT+01:00 Andres Freund : > On 2017-01-25 22:51:37 +0100, Pavel Stehule wrote: > > 2017-01-25 22:40 GMT+01:00 Andres Freund : > > > > I afraid when I cannot to reuse a SRF infrastructure, I have to > > > reimplement > > > > it partially :( - mainly for usage in "ROWS FROM ()" > > > > > > > The TableExpr implementation is based on SRF now. You and Alvaro propose > > an independent implementation as a generic executor node. I am sceptical > > that FunctionScan supports reading from a generic executor node. > > Why would it need to? > Simply for consistency with any other function that can return rows. Maybe I don't understand Alvaro's proposal well - I have the XMLTABLE function - a TableExpr that looks like an SRF function and has similar behavior - it returns multiple rows - but it would be implemented significantly differently and would have different limits - it should not be usable in various places ... It is hard for me to see the consistency there. Regards Pavel
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Hi, On 2017-01-25 16:52:38 -0500, Stephen Frost wrote: > * Robert Haas (robertmh...@gmail.com) wrote: > > Preventing people from calling functions by denying the ability to > > meaningfully GRANT EXECUTE on those functions doesn't actually stop > > them from delegating those functions to non-superusers. It either (a) > > causes them to give superuser privileges to accounts that otherwise > > would have had lesser privileges or (b) forces them to use wrapper > > functions to grant access rather than doing it directly or (c) some > > other dirty hack. If they pick (a), security is worse; if they pick > > (b) or (c), you haven't prevented them from doing what they wanted to > > do anyway. You've just made it annoying and inconvenient. > > The notion that security is 'worse' under (a) is flawed- it's no > different. Huh? Obviously that's nonsense, given the pg_ls_dir example. > With regard to 'b', if their wrapper function is > sufficiently careful to ensure that the caller isn't able to do anything > which would increase the caller's level to that of superuser, then > security is improved. Given how complex "sufficiently careful" is for security definer UDFs, in comparison to estimating the security of granting to a function like pg_ls_dir (or pretty much any other that doesn't call out to SQL level stuff like operators, output functions, etc), I don't understand this. > If the wrapper simply turns around and calls the underlying function, > then it's no different from '(a)'. Except for stuff like search path. > I am suggesting that we shouldn't make it look like there are > distinctions when there is actually no difference. That is a good thing > for our users because it keeps them informed about what they're actually > doing when they grant access. This position doesn't make a lick of sense to me. There's simply no benefit at all in requiring wrapper functions over allowing a direct grant to a non-superuser. Both are possible, but secdef is a lot harder to get right. 
And you have already started heavily down the path of removing superuser() type checks - you're just arguing to make it less consistent in a more or less random way. > I've commented on here and spoken numerous times about exactly that goal > of reducing the checks in check_postgres.pl which require superuser. > You're not actually doing that and nothing you've outlined in here so > far makes me believe you see how having pg_write_file() access is > equivalent to giving someone superuser, and that concerns me. That's the user's responsibility, and Robert didn't really suggest granting pg_write_file() permissions, so this seems to be a straw man. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: function xmltable
On 2017-01-25 22:51:37 +0100, Pavel Stehule wrote: > 2017-01-25 22:40 GMT+01:00 Andres Freund : > > > I afraid when I cannot to reuse a SRF infrastructure, I have to > > reimplement > > > it partially :( - mainly for usage in "ROWS FROM ()" > > > > The TableExpr implementation is based on SRF now. You and Alvaro propose > independent implementation like generic executor node. I am sceptic so > FunctionScan supports reading from generic executor node. Why would it need to? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 3:17 PM, Stephen Frost wrote: > > Your email is 'pg_ls_dir & friends', which I took to imply at *least* > > pg_read_file() and pg_read_binary_file(), and it's not unreasonable to > > think you may have meant everything in adminpack, which also includes > > pg_file_write(), pg_file_rename() and pg_file_unlink(). > > Well, I was talking about genfile.c, which doesn't contain those > functions, but for the record I'm in favor of extirpating every single > hard-coded superuser check from the backend without exception. Then it was correct for me to assume that's what you meant, which means my reaction and response are on point. > Preventing people from calling functions by denying the ability to > meaningfully GRANT EXECUTE on those functions doesn't actually stop > them from delegating those functions to non-superusers. It either (a) > causes them to give superuser privileges to accounts that otherwise > would have had lesser privileges or (b) forces them to use wrapper > functions to grant access rather than doing it directly or (c) some > other dirty hack. If they pick (a), security is worse; if they pick > (b) or (c), you haven't prevented them from doing what they wanted to > do anyway. You've just made it annoying and inconvenient. The notion that security is 'worse' under (a) is flawed - it's no different. With regard to 'b', if their wrapper function is sufficiently careful to ensure that the caller isn't able to do anything which would increase the caller's level to that of superuser, then security is improved. If the wrapper simply turns around and calls the underlying function, then it's no different from '(a)'. I am suggesting that we shouldn't make it look like there are distinctions when there is actually no difference. That is a good thing for our users because it keeps them informed about what they're actually doing when they grant access. 
> Both EnterpriseDB's PEM and check_postgres.pl currently have a bunch > of things that don't work unless you run as superuser. I think we > should be trying to provide ways of reducing those lists. If I can't > get agreement on method (a), I'm going to go look for ways of doing > (b) or (c), which is more work and uglier but if I can't get consensus > here then oh well. I've commented on here and spoken numerous times about exactly that goal of reducing the checks in check_postgres.pl which require superuser. You're not actually doing that, and nothing you've outlined here so far makes me believe you see how having pg_write_file() access is equivalent to giving someone superuser, and that concerns me. As someone who has contributed code and committed code back to check_postgres.pl, I would be against making changes there which install security definer functions to give the monitoring user superuser-level access, and I believe Greg S-M would feel the same way considering that he and I have discussed exactly that in the past. I don't mean to speak for him and perhaps his opinion has changed, but it seems unlikely to me. If the DBA wants to give the monitoring user superuser-level access to run the superuser-requiring checks that check_postgres.pl has, they're welcome to do so, but they'll be making an informed decision that they have weighed the risk of their monitoring user being compromised against the value of that additional monitoring, which is an entirely appropriate and reasonable decision for them to make. > I do not accept your authority to determine what monitoring tools need > to or should do. Monitoring tools that use pg_ls_dir are facts on the > ground, and your disapprobation doesn't change that at all. It just > obstructs moving to a saner permissions framework. Allowing GRANT to be used to give access to pg_write_file and friends is not a 'permissions framework'. 
Further, I am not claiming authority over what monitoring tools should need to do or not do, and a great many people run their monitoring tools as superuser. I am not trying to take away their ability to do so. The way to make progress here is not, however, to just decide that all those superuser() checks we put in place were silly and should be removed, it's to provide better ways to monitor PG which provide exactly the monitoring information needed in a useful and sensible way. I understand the allure of just removing a few lines of code to make things "easier" or "faster" or "better", but I don't think it's a good idea to remove these superuser checks, nor do I think it's a good idea to remove our WAL CRCs even if it'd help some of my clients, nor do I think it's a good idea to have checksums disabled by default, or to rip out all of WAL as being "unnecessary" because we have ZFS and it should handle all things data and we could just fsync all of the heap files on commit and be done with it. Admittedly, you're not arguing for half of what I just mentioned, but I'm not arguing
Re: [HACKERS] patch: function xmltable
2017-01-25 22:40 GMT+01:00 Andres Freund : > Hi, > > > > > I'll try to explain my motivation. Please, check it and correct me > if I > > > am > > > > wrong. I don't keep on my implementation - just try to implement > XMLTABLE > > > > be consistent with another behave and be used all time without any > > > > surprise. > > > > > > > > 1. Any function that produces a content can be used in target list. > We > > > > support SRF in target list and in FROM part. Why XMLTABLE should be a > > > > exception? > > > > > > targetlist SRFs were a big mistake. They cause a fair number of > problems > > > code-wise. They permeated for a long while into bits of both planner > and > > > executor, where they really shouldn't belong. Even after the recent > > > changes there's a fair amount of uglyness associated with them. We > > > can't remove tSRFs for backward compatibility reasons, but that's not > > > true for XMLTABLE > > > > > > > > > > > ok > > > > I afraid when I cannot to reuse a SRF infrastructure, I have to > reimplement > > it partially :( - mainly for usage in "ROWS FROM ()" > The TableExpr implementation is based on SRF now. You and Alvaro propose an independent implementation, like a generic executor node. I am sceptical that FunctionScan supports reading from a generic executor node. Regards Pavel > Huh? > > Greetings, > > Andres Freund >
Re: [HACKERS] PATCH: recursive json_populate_record()
Nikita Glukhov writes: > On 25.01.2017 23:58, Tom Lane wrote: >> I think you need to take a second look at the code you're producing >> and realize that it's not so clean either. This extract from >> populate_record_field, for example, is pretty horrid: > But what if we introduce some helper macros like this: > #define JsValueIsNull(jsv) \ > ((jsv)->is_json ? !(jsv)->val.json.str \ > : !(jsv)->val.jsonb || (jsv)->val.jsonb->type == jbvNull) > #define JsValueIsString(jsv) \ > ((jsv)->is_json ? (jsv)->val.json.type == JSON_TOKEN_STRING \ > : (jsv)->val.jsonb && (jsv)->val.jsonb->type == jbvString) Yeah, I was wondering about that too. I'm not sure that you can make a reasonable set of helper macros that will fix this, but if you want to try, go for it. BTW, just as a stylistic thing, I find "a?b:c||d" unreadable: I have to go back to the manual every darn time to convince myself whether that means (a?b:c)||d or a?b:(c||d). It's better not to rely on the reader (... or the author) having memorized C's precedence rules in quite that much detail. Extra parens help. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multivariate statistics (v19)
On Wed, Jan 25, 2017 at 9:56 PM, Alvaro Herrera wrote: > Michael Paquier wrote: >> And nothing has happened since. Are there people willing to review >> this patch and help it proceed? > > I am going to grab this patch as committer. Thanks, that's good to know. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: function xmltable
Hi, > > > I'll try to explain my motivation. Please, check it and correct me if I > > am > > > wrong. I don't keep on my implementation - just try to implement XMLTABLE > > > be consistent with another behave and be used all time without any > > > surprise. > > > > > > 1. Any function that produces a content can be used in target list. We > > > support SRF in target list and in FROM part. Why XMLTABLE should be a > > > exception? > > > > targetlist SRFs were a big mistake. They cause a fair number of problems > > code-wise. They permeated for a long while into bits of both planner and > > executor, where they really shouldn't belong. Even after the recent > > changes there's a fair amount of uglyness associated with them. We > > can't remove tSRFs for backward compatibility reasons, but that's not > > true for XMLTABLE > > > > > > > ok > > I afraid when I cannot to reuse a SRF infrastructure, I have to reimplement > it partially :( - mainly for usage in "ROWS FROM ()" Huh? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: recursive json_populate_record()
On 25.01.2017 23:58, Tom Lane wrote: I think you need to take a second look at the code you're producing and realize that it's not so clean either. This extract from populate_record_field, for example, is pretty horrid: But what if we introduce some helper macros like this: #define JsValueIsNull(jsv) \ ((jsv)->is_json ? !(jsv)->val.json.str \ : !(jsv)->val.jsonb || (jsv)->val.jsonb->type == jbvNull) #define JsValueIsString(jsv) \ ((jsv)->is_json ? (jsv)->val.json.type == JSON_TOKEN_STRING \ : (jsv)->val.jsonb && (jsv)->val.jsonb->type == jbvString) /* prepare column metadata cache for the given type */ if (col->typid != typid || col->typmod != typmod) prepare_column_cache(col, typid, typmod, mcxt, jsv->is_json); *isnull = JsValueIsNull(jsv); typcat = col->typcat; /* try to convert json string to a non-scalar type through input function */ if (JsValueIsString(jsv) && (typcat == TYPECAT_ARRAY || typcat == TYPECAT_COMPOSITE)) typcat = TYPECAT_SCALAR; /* we must perform domain checks for NULLs */ if (*isnull && typcat != TYPECAT_DOMAIN) return (Datum) 0; -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
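For what it's worth, the helper-macro idea is easy to prototype. Here is a rough Python analogue (the JsValue fields and type tags are simplified stand-ins, not the actual patch structures) of how JsValueIsNull / JsValueIsString would hide the per-format conditional behind one predicate each:

```python
from dataclasses import dataclass
from typing import Optional

# Toy tagged union mirroring the proposed JsValue: exactly one "side"
# (json or jsonb) is meaningful, selected by is_json.  The jsonb side
# is modelled as a simple type tag; None stands for a missing value.
@dataclass
class JsValue:
    is_json: bool
    json_str: Optional[str] = None    # json: raw token text (None => null)
    json_type: Optional[str] = None   # json: token type, e.g. "string"
    jsonb_type: Optional[str] = None  # jsonb: value type, e.g. "null"/"string"

def js_value_is_null(jsv):
    # counterpart of the proposed JsValueIsNull macro
    if jsv.is_json:
        return jsv.json_str is None
    return jsv.jsonb_type is None or jsv.jsonb_type == "null"

def js_value_is_string(jsv):
    # counterpart of the proposed JsValueIsString macro
    if jsv.is_json:
        return jsv.json_type == "string"
    return jsv.jsonb_type == "string"

# With the predicates factored out, the caller reads like the cleaned-up
# populate_record_field excerpt: one branch-free line per decision.
v = JsValue(is_json=True, json_str='"hi"', json_type="string")
print(js_value_is_null(v), js_value_is_string(v))  # False True
```

The is_json conditionals still exist, of course; the question Tom raises is whether hiding them this way is enough, or whether the two formats should be split into separate code paths entirely.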
Re: [HACKERS] patch: function xmltable
> > > > > > > If you plan to hold support SRF in target list, then nothing is different. > > In last patch is executed under nodeProjectSet. > > It is, because we suddenly need to call different functions - and I'm > revamping most of execQual to have an opcode dispatch based execution > model (which then also can be JITed). > > > > > XMLTABLE is specified by the standard to return multiple rows ... but > > > > then as far as my reading goes, it is only supposed to be supported > in > > > > the range table (FROM clause) not in the target list. I wonder if > > > > this would end up better if we only tried to support it in RT. I > asked > > > > Pavel to implement it like that a few weeks ago, but ... > > > > > > Right - it makes sense in the FROM list - but then it should be an > > > executor node, instead of some expression thingy. > > > > > > > The XMLTABLE function is from user perspective, from implementation > > perspective a form of SRF function. I use own executor node, because > fcinfo > > is complex already and not too enough to hold all information about > result > > columns. > > > > The implementation as RT doesn't reduce code - it is just moving to > > different file. > > You're introducing a wholly separate callback system (TableExprRoutine) > for the new functionality. And that stuff is excruciatingly close to > stuff that the normal executor already knows how to do. > These callbacks isolate the TableExpr infrastructure from the TableExpr implementation; this design is prepared for reuse by the JSON_TABLE function. Wherever the TableExpr code is placed, it should not affect this callback system (or I am completely off and the executor is able to do some work that is hidden from me). > > > > > I'll try to explain my motivation. Please, check it and correct me if I > am > > wrong. I don't keep on my implementation - just try to implement XMLTABLE > > be consistent with another behave and be used all time without any > > surprise. > > > > 1. 
Any function that produces a content can be used in target list. We > > support SRF in target list and in FROM part. Why XMLTABLE should be a > > exception? > > targetlist SRFs were a big mistake. They cause a fair number of problems > code-wise. They permeated for a long while into bits of both planner and > executor, where they really shouldn't belong. Even after the recent > changes there's a fair amount of uglyness associated with them. We > can't remove tSRFs for backward compatibility reasons, but that's not > true for XMLTABLE > > > ok I'm afraid that if I cannot reuse the SRF infrastructure, I will have to partially reimplement it :( - mainly for usage in "ROWS FROM ()" Greetings, > > Andres Freund >
Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..
Hi, Synthetic PG workload or real world production workload? Both might work, production-like has bigger pull, but I'd guess synthetic is good enough. Thanks! The box should get PostgreSQL in the not too distant future. It'll get a backup from prod, but will act as new prod, so it might take some time until a job can be run and a profile collected. So how would I do a perf profile that would be acceptable as proof? You'd have to look at cpu time, not number of syscalls. IIRC I suggested doing a cycles profile with -g and then using "perf report --children" to see how many cycles are spent somewhere below lseek. Understood. Either profile manually or expand the function. I'd also suggest sharing a cycles profile, it's quite likely that the overhead is completely elsewhere. Yeah, could be. It'll be interesting to see for sure. I should get a chance to collect such a profile and then I'll post it back here - /Tobias -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 12:23 PM, Robert Haas wrote: > Also, I think that one of the big problems with the way checksums work > is that you don't find problems with your archived data until it's too > late. Suppose that in February bits get flipped in a block. You > don't access the data until July[1]. Well, it's nice to have the > system tell you that the data is corrupted, but what are you going to > do about it? By that point, all of your backups are probably > corrupted. So it's basically: > > ERROR: you're screwed > > It's nice to know that (maybe?) but without a recovery strategy a > whole lot of people who get that message are going to immediately > start asking "How do I ignore the fact that I'm screwed and try to > read the data anyway?". That's also how I tend to think about it. I understand that my experience with storage devices is unusually narrow compared to everyone else here. That's why I remain neutral on the high level question of whether or not we ought to enable checksums by default. I'll ask other hackers to answer what may seem like a very naive question, while bearing what I just said in mind. The question is: Have you ever actually seen a checksum failure in production? And, if so, how helpful was it? I myself have not, despite the fact that Heroku uses checksums wherever possible, and has the technical means to detect problems like this across the entire fleet of customer databases. Not even once. This is not what I would have expected myself several years ago. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : For Auto-Prewarm.
On Wed, Jan 25, 2017 at 2:46 PM, Jim Nasby wrote: > I think the two need to be integrated much better than they are right now. > They should certainly be in the same .so, and as others have mentioned the > docs need to be fixed. For consistency, I think the name should just be > pg_prewarm, as well as the prefix for the GUC. Yikes. +1, definitely. > It would also be handy of those functions > accepted a different filename. That way you could reset shared_buffers to a > known condition before running a test. That would have some pretty unpleasant security implications unless it is awfully carefully thought out. I doubt this has enough utility to make it worth thinking that hard about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
On Wed, Jan 25, 2017 at 3:17 PM, Stephen Frost wrote: > Your email is 'pg_ls_dir & friends', which I took to imply at *least* > pg_read_file() and pg_read_binary_file(), and it's not unreasonable to > think you may have meant everything in adminpack, which also includes > pg_file_write(), pg_file_rename() and pg_file_unlink(). Well, I was talking about genfile.c, which doesn't contain those functions, but for the record I'm in favor of extirpating every single hard-coded superuser check from the backend without exception. Preventing people from calling functions by denying the ability to meaningfully GRANT EXECUTE on those functions doesn't actually stop them from delegating those functions to non-superusers. It either (a) causes them to give superuser privileges to accounts that otherwise would have had lesser privileges or (b) forces them to use wrapper functions to grant access rather than doing it directly or (c) some other dirty hack. If they pick (a), security is worse; if they pick (b) or (c), you haven't prevented them from doing what they wanted to do anyway. You've just made it annoying and inconvenient. Both EnterpriseDB's PEM and check_postgres.pl currently have a bunch of things that don't work unless you run as superuser. I think we should be trying to provide ways of reducing those lists. If I can't get agreement on method (a), I'm going to go look for ways of doing (b) or (c), which is more work and uglier but if I can't get consensus here then oh well. >> I don't really think it's necessary to outline the use case more than >> I have already. It's perfectly reasonable to want a monitoring tool >> to have access to pg_ls_dir() - for example, you could use that to >> monitor for relation files orphaned by a previous crash. > > Sure. What I believe would be better would be a function in PG that > allows you to know about all of the relation files which were orphaned > from a previous crash, and perhaps even one to go clean all of those up. 
> I certainly would have no issue with both of those being available to a > non-superuser. > > Sure, you could do the same with pg_ls_dir(), various complex bits of > SQL to figure out what's been orphaned, and pg_file_unlink() to handle > the nasty part of unlinking the files, but you could also use > pg_ls_dir() to look at the files in PG's home directory, or > pg_file_unlink() to remove the PG user's .bashrc. > > Does the monitoring tool need to be able to see the files in PG's root > directory, or to be able to unlink the PG user's .bashrc? No. I do not accept your authority to determine what monitoring tools need to or should do. Monitoring tools that use pg_ls_dir are facts on the ground, and your disapprobation doesn't change that at all. It just obstructs moving to a saner permissions framework. I agree that pg_file_unlink() is an unlikely need for a monitoring tool, but (1) conflating pg_ls_dir with pg_file_unlink is a bit unfair and (2) there's no actual point whatsoever in trying to restrict to whom the superuser can give permissions, because the superuser has numerous ways of working around any such restriction. > I don't think it's nannyism to keep things like pg_read_file and > pg_file_write as superuser-only. I entirely disagree. It is for the DBA to decide to whom and for what purpose he wishes to give permissions. And if somebody writes a tool -- monitoring or otherwise -- that needs those permissions, the DBA should be empowered to give those permissions and no more, and should be empowered to do that in a nice, clean way without a lot of hassle. The idea that you or I would try to decide which functions a user is allowed to want to grant access to and which ones they are not allowed to want to grant access to seems laughable to me. We're not smart enough to foresee every possible use case, and we shouldn't try. 
Even for a function that theoretically allows a privilege escalation to superuser (like pg_write_file), it's got to be better to give a user account permission to use that one function than to just give up and give that account superuser privileges, because escalating privileges with that blunt instrument would take more work than a casual hacker would be willing to put in. For a function like the one that this thread actually started out being about, like pg_ls_dir, it's vastly better. You mention that you're not sure that pg_ls_dir() would allow a privilege escalation to superuser, but I think we both know that there is probably no such escalation. Why you'd rather have people running check_postgres.pl's wal_files check under an actual superuser account rather than an account that only has rights to do pg_ls_dir is beyond me. Your response will no doubt be that there should otherwise be some other way to write that check that doesn't use pg_ls_dir. Maybe. But the thing is that pg_ls_dir is a general tool. You can use it to do a lot of things, and you can write a monitoring probe that does those thing
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
Looking at your 0002 patch now. It no longer applies, but the conflicts are trivial to fix. Please rebase and resubmit. I think the way WARM works has been pretty well hammered by now, other than the CREATE INDEX CONCURRENTLY issues, so I'm looking at the code from a maintainability point of view only. I think we should have some test harness for WARM as part of the source repository. A test that runs for an hour hammering the machine to highest possible load cannot be run in "make check", of course, but we could have some specific Make target to run it manually. We don't have this for any other feature, but this looks like a decent place to start. Maybe we should even do it before going any further. The test code you submitted looks OK to test the feature, but I'm not in love with it enough to add it to the repo. Maybe I will spend some time trying to convert it to Perl using PostgresNode. I think having the "recheck" index methods create an ExecutorState looks out of place. How difficult is it to pass the estate from the calling code? IMO heap_get_root_tuple_one should be called just heap_get_root_tuple(). That function and its plural sibling heap_get_root_tuples() should indicate in their own comments what the expectations are regarding the root_offsets output argument, rather than deferring to the comments in the "internal" function, since they differ on that point; for the rest of the invariants I think it makes sense to say "Also see the comment for heap_get_root_tuples_internal". I wonder if heap_get_root_tuple should just return the ctid instead of assigning the value to a passed-in pointer, i.e.

OffsetNumber
heap_get_root_tuple(Page page, OffsetNumber target_offnum)
{
    OffsetNumber off;

    heap_get_root_tuples_internal(page, target_offnum, &off);
    return off;
}

The simple_heap_update + CatalogUpdateIndexes pattern is getting obnoxious. 
How about creating something like catalog_heap_update which does both things at once, and stop bothering each callsite with the WARM stuff? In fact, given that CatalogUpdateIndexes is used in other places, maybe we should leave its API alone and create another function, so that we don't have to change the many places that only do simple_heap_insert. (Places like OperatorCreate which do either insert or update could just move the index update call into each branch.) I'm not real sure about the interface between index AM and executor, namely IndexScanDesc->xs_tuple_recheck. For example, this pattern: if (!scan->xs_recheck) scan->xs_tuple_recheck = false; else scan->xs_tuple_recheck = true; can become simply scan->xs_tuple_recheck = scan->xs_recheck; which looks odd. I can't pinpoint exactly what the problem is, though. I'll continue looking at this one. I wonder if heap_hot_search_buffer() and heap_hot_search() should return a tri-valued enum instead of boolean; that idea looks reasonable in theory but callers have to do more work afterwards, so maybe not. I think heap_hot_search() sometimes leaving the buffer pinned is confusing. Really, the whole idea of having heap_hot_search have a buffer output argument is an important API change that should be thought through better. Maybe it'd be better to always return the buffer pinned, with the caller always in charge of unpinning if not InvalidBuffer. Or perhaps we need a completely new function, given how different it is to the original? If you tried to document in the comment above heap_hot_search how it works, you'd find that it's difficult to describe, which'd be an indicator that it's not well considered. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
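The tri-valued-result idea Alvaro floats for heap_hot_search_buffer() can be sketched abstractly. This Python fragment uses invented names (HotSearchResult, classify) purely to illustrate the trade-off: the enum makes the WARM "found but needs recheck" case explicit, at the cost of every caller having to handle three cases instead of one boolean:

```python
from enum import Enum, auto

class HotSearchResult(Enum):
    """Hypothetical tri-valued result for a heap_hot_search-style API:
    distinguish the two 'found' cases a WARM chain can produce."""
    NOT_FOUND = auto()
    FOUND = auto()
    FOUND_NEEDS_RECHECK = auto()  # WARM: index quals must be rechecked

def classify(found, needs_recheck):
    # maps today's (bool found, bool recheck) pair onto the enum
    if not found:
        return HotSearchResult.NOT_FOUND
    return (HotSearchResult.FOUND_NEEDS_RECHECK if needs_recheck
            else HotSearchResult.FOUND)

# The cost Alvaro notes: every caller now dispatches over three cases,
# typically with an explicit match like this, instead of one 'if'.
for found, recheck in [(False, False), (True, False), (True, True)]:
    print(classify(found, recheck).name)
```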
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
2017-01-25 21:06 GMT+01:00 Jim Nasby : > On 1/23/17 11:38 PM, Pavel Stehule wrote: > >> >> Instead of paralleling all the existing namespace stuff, I wonder if >> it'd be better to create explicit block infrastructure. AFAIK >> PRAGMAs are going to have a lot of the same requirements (certainly >> the nesting is the same), and we might want more of this kind of >> stuff in the future. (I've certainly wished I could set a GUC in a >> plpgsql block and have its settings revert when exiting the block...) >> >> >> I am not sure if I understand. ?? Setting GUC by PRAGMA can work - the >> syntax supports it and GUC API supports nesting. Not sure about >> exception handling - but it should not be a problem, probably. >> >> Please, can you show some examples. >> > > From a code standpoint, there's already some ugliness around blocks: > there's the code that handles blocks themselves (which IIRC is responsible > for subtransactions), then there's the namespace code, which is very > separate even though namespaces are very much tied to blocks. Your patch is > adding another layer into the mix, separate from both blocks and > namespaces. I think it would be better to combine all 3 together, or at > least not make matters worse. So IMHO the pragma stuff should be part of > handling blocks, and not something that's stand alone. IE: make the pragma > info live in PLpgSQL_stmt_block. > I don't think that is fully correct: a pragma can be related to a function too, and namespaces can be related to some other statements, such as loops. Every PLpgSQL_stmt_block adds some overhead, and we would probably have to build fake statements to ensure a 1:1 relation between namespaces and blocks. I didn't implement or propose a third, statement-level kind of pragma. For example, assertions in the Ada language are implemented with a pragma. Currently I am not considering this form for Postgres. The cursor options are better stored in the expression; block-related GUCs should probably be stored in stmt_block. 
A pragma is additional information; how that information is used, and what impact it has on the generated code, depends on the particular pragma and can differ. > GUCs support SET LOCAL, but that's not the same as local scoping because > the setting stays in effect unless the substrans aborts. What I'd like is > the ability to set a GUC in a plpgsql block *and have the setting revert on > block exit*. I think it is solvable. Regards Pavel > > -- > Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX > Experts in Analytics, Data Architecture and PostgreSQL > Data in Trouble? Get it in Treble! http://BlueTreble.com > 855-TREBLE2 (855-873-2532) >
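The block-scoped behaviour Jim is asking for (set on entry, revert on exit, even on error) is essentially a save/restore wrapped around the block. Here is a language-neutral sketch of those semantics in Python, with a toy settings table standing in for the real GUC machinery:

```python
from contextlib import contextmanager

# Toy stand-in for the GUC table; not the real GUC API.
settings = {"work_mem": "4MB"}

@contextmanager
def scoped_setting(name, value):
    """Set a value for the duration of a block and revert on exit,
    whether the block succeeds or raises -- the block-scoped behaviour
    being wished for here, which SET LOCAL (transaction-scoped, reverted
    only at transaction end or subtransaction abort) does not give."""
    saved = settings[name]
    settings[name] = value
    try:
        yield
    finally:
        settings[name] = saved   # revert even on error, not just on abort

with scoped_setting("work_mem", "64MB"):
    assert settings["work_mem"] == "64MB"   # inside the "block"
print(settings["work_mem"])                 # reverted: 4MB
```

This also shows why the pragma naturally wants to live with the block structure: the save/restore points are exactly block entry and exit.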
Re: [HACKERS] PATCH: recursive json_populate_record()
Nikita Glukhov writes: > On 22.01.2017 21:58, Tom Lane wrote: >> 5. The business with having some arguments that are only for json and >> others that are only for jsonb, eg in populate_scalar(), also makes me >> itch. > I've refactored json/jsonb passing using new struct JsValue which contains > union for json/jsonb values. I'm not too enamored of that fix. It doesn't do much for readability, and at least with my compiler (gcc 4.4.7), I sometimes get "variable might be used uninitialized" warnings, probably due to not all fields of the union getting set in every code path. >> I wonder whether this wouldn't all get a lot simpler and more >> readable if we gave up on trying to share code between the two cases. > Maybe two separate families of functions like this > ... > could slightly improve readability, but such code duplication (I believe it is > a duplicate code) would never be acceptable to me. I think you need to take a second look at the code you're producing and realize that it's not so clean either. This extract from populate_record_field, for example, is pretty horrid: /* prepare column metadata cache for the given type */ if (col->typid != typid || col->typmod != typmod) prepare_column_cache(col, typid, typmod, mcxt, jsv->is_json); *isnull = jsv->is_json ? jsv->val.json.str == NULL : jsv->val.jsonb == NULL || jsv->val.jsonb->type == jbvNull; typcat = col->typcat; /* try to convert json string to a non-scalar type through input function */ if ((jsv->is_json ? jsv->val.json.type == JSON_TOKEN_STRING : jsv->val.jsonb && jsv->val.jsonb->type == jbvString) && (typcat == TYPECAT_ARRAY || typcat == TYPECAT_COMPOSITE)) typcat = TYPECAT_SCALAR; /* we must perform domain checks for NULLs */ if (*isnull && typcat != TYPECAT_DOMAIN) return (Datum) 0; When every other line contains an is_json conditional, you are not making good readable code. 
It's also worth noting that this is going to become even less readable after pgindent gets done with it: /* prepare column metadata cache for the given type */ if (col->typid != typid || col->typmod != typmod) prepare_column_cache(col, typid, typmod, mcxt, jsv->is_json); *isnull = jsv->is_json ? jsv->val.json.str == NULL : jsv->val.jsonb == NULL || jsv->val.jsonb->type == jbvNull; typcat = col->typcat; /* try to convert json string to a non-scalar type through input function */ if ((jsv->is_json ? jsv->val.json.type == JSON_TOKEN_STRING : jsv->val.jsonb && jsv->val.jsonb->type == jbvString) && (typcat == TYPECAT_ARRAY || typcat == TYPECAT_COMPOSITE)) typcat = TYPECAT_SCALAR; /* we must perform domain checks for NULLs */ if (*isnull && typcat != TYPECAT_DOMAIN) return (Datum) 0; You could maybe improve that result with some different formatting choices, but I think it's basically a readability fail to be relying heavily on multiline x?y:z conditionals in the first place. And I still maintain that it's entirely silly to be structuring populate_scalar() this way. So I really think it'd be a good idea to explore separating the json and jsonb code paths. This direction isn't looking like a win. >> 7. More generally, the patch is hard to review because it whacks the >> existing code around so thoroughly that it's tough even to match up >> old and new code to get a handle on what you changed functionally. >> Maybe it would be good if you could separate it into a refactoring >> patch that just moves existing code around without functional changes, >> and then a patch on top of that that adds code and makes only localized >> changes in what's already there. > I've split this patch into two patches as you asked. That didn't really help :-( ... the 0002 patch is still nigh unreadable. Maybe it's trying to do too much in one step. 
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: function xmltable
Hi, On 2017-01-25 05:45:24 +0100, Pavel Stehule wrote: > 2017-01-25 1:35 GMT+01:00 Andres Freund : > > > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote: > > > Andres Freund wrote: > > > > Hi, > > > > > > > > On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote: > > > > > +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext > > *econtext, > > > > > + bool *isnull); > > > > > +static Datum ExecEvalTableExprFast(TableExprState *exprstate, > > ExprContext *econtext, > > > > > + bool *isNull); > > > > > +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext > > *econtext, > > > > > + bool *isNull); > > > > > +static void tabexprInitialize(TableExprState *tstate, ExprContext > > *econtext, > > > > > + Datum doc); > > > > > +static void ShutdownTableExpr(Datum arg); > > > > > > > > To me this (and a lot of the other code) hints quite strongly that > > > > expression evalution is the wrong approach to implementing this. What > > > > you're essentially doing is building a vulcano style scan node. Even > > if > > > > we can this, we shouldn't double down on the bad decision to have these > > > > magic expressions that return multiple rows. There's historical reason > > > > for tSRFs, but we shouldn't add more weirdness like this. > > > > > > Thanks for giving it a look. I have long thought that this patch would > > > be at odds with your overall executor work. > > > > Not fundamentally, but it makes it harder. > > > > If you plan to hold support SRFin target list, then nothing is different. > In last patch is executed under nodeProjectSet. It is, because we suddenly need to call different functions - and I'm revamping most of execQual to have an opcode dispatch based execution model (which then also can be JITed). > > > XMLTABLE is specified by the standard to return multiple rows ... but > > > then as far as my reading goes, it is only supposed to be supported in > > > the range table (FROM clause) not in the target list. 
I wonder if > > > this would end up better if we only tried to support it in the RT. I asked > > > Pavel to implement it like that a few weeks ago, but ... > > > > Right - it makes sense in the FROM list - but then it should be an > > executor node, instead of some expression thingy. > > > > From the user's perspective XMLTABLE is a function; from the implementation > perspective it is a form of SRF. I use my own executor node because fcinfo > is already complex and still not enough to hold all the information about the result > columns. > Implementing it as an RT entry doesn't reduce the code - it just moves it to a > different file. You're introducing a wholly separate callback system (TableExprRoutine) for the new functionality. And that stuff is excruciatingly close to stuff that the normal executor already knows how to do. > I'll try to explain my motivation. Please check it and correct me if I am > wrong. I am not attached to my implementation - I just try to make XMLTABLE > consistent with other behavior and usable everywhere without any > surprise. > > 1. Any function that produces content can be used in a target list. We > support SRFs in the target list and in the FROM part. Why should XMLTABLE be an > exception? Targetlist SRFs were a big mistake. They cause a fair number of problems code-wise. They permeated for a long while into bits of both planner and executor, where they really shouldn't belong. Even after the recent changes there's a fair amount of ugliness associated with them. We can't remove tSRFs for backward compatibility reasons, but that's not true for XMLTABLE. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 2:23 PM, Stephen Frost wrote: >> Sure. If the database runs fast enough with checksums enabled, >> there's basically no reason to have them turned off. The issue is >> when it doesn't. > > I don't believe we're talking about forcing every user to have checksums > enabled. We are discussing the default. I never said otherwise. > Would you say that most users' databases run fast enough with checksums > enabled? Or more than most, maybe 70%? 80%? In today's environment, > I'd probably say that it's more like 90+%. I don't have statistics on that, but I'd certainly agree that it's over 90%. However, I estimate that the percentage of people who wouldn't be helped by checksums is also over 90%. I don't think it's easy to say whether there are more people who would benefit from checksums than would be hurt by the performance penalty or vice versa. My own feeling is the second, but I understand that yours is the first. > Yet, our default is to have them disabled and *really* hard to enable. First of all, that could be fixed by further development. Second, really hard to enable is a relative term. I accept that enabling checksums is not a pleasant process. Right now, you'd have to do a dump/restore, or use logical replication to replicate the data to a new cluster and then switch over. On the other hand, if checksums are really a critical feature, how are people getting to the point where they've got a mission-critical production system and only then discovering that they want to enable checksums? If you tell somebody "we have an optional feature called checksums and you should really use it" and they respond "well, I'd like to, but I already put my system into critical production use and it's not worth it to me to take downtime to get them enabled", that sounds to me like the feature is nice-to-have, not absolutely essential.
When something is essential, you find a way to get it done, whether it's painful or not, because that's what essential means. And if checksums are not essential, then they shouldn't be enabled by default unless they're very cheap -- and I think we already know that's not true in all workloads. > I agree that it's unfortunate that we haven't put more effort into > fixing that- I'm all for it, but it's disappointing to see that people > are not in favor of changing the default as I believe it would both help > our users and encourage more development of the feature. I think it would help some users and hurt others. I do agree that it would encourage more development of the feature -- almost of necessity. In particular, I bet it would spur development of an efficient way of turning checksums off -- but I'd rather see us approach it from the other direction: let's develop an efficient way of turning the feature on and off FIRST. Deciding that the feature has to be on for everyone because turning it on later is too hard for the people who later decide they want it is letting the tail wag the dog. Also, I think that one of the big problems with the way checksums work is that you don't find problems with your archived data until it's too late. Suppose that in February bits get flipped in a block. You don't access the data until July[1]. Well, it's nice to have the system tell you that the data is corrupted, but what are you going to do about it? By that point, all of your backups are probably corrupted. So it's basically: ERROR: you're screwed It's nice to know that (maybe?) but without a recovery strategy a whole lot of people who get that message are going to immediately start asking "How do I ignore the fact that I'm screwed and try to read the data anyway?". And then you wonder what the point of having the feature turned on is, especially if it's costly. 
It's almost an attractive nuisance at that point - nobody wants to be the user that turns off checksums because they sound good on paper, but when you actually have a problem an awful lot of people are NOT going to want to try to restore from backup and maybe lose recent transactions. They're going to want to ignore the checksum failures. That's kind of awful. Peter's comments upthread get at this: "We need to invest in corruption detection/verification tools that are run on an as-needed basis." Exactly. If we could verify that our data is good before throwing away our old backups, that'd be good. If we could verify that our indexes were structurally sane, that would be superior to anything checksums can ever give us because it catches not only storage failures but also software failures within PostgreSQL itself and user malfeasance above the PostgreSQL layer (e.g. redefining the supposedly-immutable function to give different answers) and damage inflicted inadvertently by environmental changes (e.g. upgrading glibc and having strcoll() change its mind). If we could verify that every XID and MXID in the heap points to a clog or multixact record that still exists, that'
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 8:18 PM, Robert Haas wrote: > Also, it's not as if there are no other ways of checking whether your > disks are failing. SMART, for example, is supposed to tell you about > incipient hardware failures before PostgreSQL ever sees a bit flip. > Surely an average user would love to get a heads-up that their > hardware is failing even when that hardware is not being used to power > PostgreSQL, yet many people don't bother to configure SMART (or > similar proprietary systems provided by individual vendors). You really can't rely on SMART to tell you about hardware failures. 1 in 4 drives fail completely with no SMART indication at all [1]. And for the 1-in-1000 annual checksum failure rate, indicators other than system restarts had only a weak correlation [2]. And this is without filesystem and other OS bugs that SMART knows nothing about. My view may be biased by mostly seeing the cases where things have already gone wrong, but I recommend that support clients turn checksums on unless it's known that write IO is going to be an issue. Especially because I know that if it turns out to be a problem I can go in and quickly hack together a tool to help them turn it off. I do agree that to change the PostgreSQL default, at least some tool to turn it off online should be included. [1] https://www.backblaze.com/blog/what-smart-stats-indicate-hard-drive-failures/ [2] https://www.usenix.org/legacy/event/fast08/tech/full_papers/bairavasundaram/bairavasundaram.pdf Regards, Ants Aasma -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Should buffer of initialization fork have a BM_PERMANENT flag
An unlogged table has an initialization fork. The buffer for the initialization fork does not get the BM_PERMANENT flag when it is obtained, so an ordinary checkpoint (as opposed to a shutdown or end-of-recovery checkpoint) will not write it to disk. After crash recovery, the page of the initialization fork can therefore be incorrect, which in turn makes the rebuilt main fork incorrect too. Here is an example for a GIN index: create unlogged table gin_test_tbl(i int4[]); create index gin_test_idx on gin_test_tbl using gin (i); checkpoint; Kill all the postgres processes and restart. vacuum gin_test_tbl; -- crash. The same problem seems to exist in the BRIN, GIN, GiST and HASH indexes, which use a buffer to initialize the meta page in their ambuildempty functions.
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 2:08 PM, Stephen Frost wrote: > > That isn't what you're doing with those functions though, you're giving > > the monitoring tool superuser-level rights but trying to pretend like > > you're not. > > Uh, how so? None of those functions can be used to escalate to > superuser privileges. I am trying to give SOME superuser privileges > and not others. That IS how good security works. Your email is 'pg_ls_dir & friends', which I took to imply at *least* pg_read_file() and pg_read_binary_file(), and it's not unreasonable to think you may have meant everything in adminpack, which also includes pg_file_write(), pg_file_rename() and pg_file_unlink(). > I don't really think it's necessary to outline the use case more than > I have already. It's perfectly reasonable to want a monitoring tool > to have access to pg_ls_dir() - for example, you could use that to > monitor for relation files orphaned by a previous crash. Sure. What I believe would be better would be a function in PG that allows you to know about all of the relation files which were orphaned from a previous crash, and perhaps even one to go clean all of those up. I certainly would have no issue with both of those being available to a non-superuser. Sure, you could do the same with pg_ls_dir(), various complex bits of SQL to figure out what's been orphaned, and pg_file_unlink() to handle the nasty part of unlinking the files, but you could also use pg_ls_dir() to look at the files in PG's home directory, or pg_file_unlink() to remove the PG user's .bashrc. Does the monitoring tool need to be able to see the files in PG's root directory, or to be able to unlink the PG user's .bashrc? No. > Also, as > mentioned above, I don't think this should have to be litigated for > every single function individually. 
If it's a good idea for a > non-superuser to be able to pg_start_backup() and pg_stop_backup(), > the person doing that has access to read every byte of data in the > filesystem; if they don't, there's no point in giving them access to > run those functions. Access to just pg_ls_dir(), for example, can't > be any more dangerous than that. Indeed, I'd argue that it's a heck > of a lot LESS dangerous. It wasn't litigated for every single function. A review of all cases was performed, and a very lengthy discussion was held about how to give non-superusers access to the functions where that made sense, with the resulting work finally being accepted and included in 9.6. Further, as discussed and explained on the thread I just linked you to, the assumption that a user who can call pg_start/stop_backup() has access to read every byte on the filesystem isn't correct. Am I sure that someone who has pg_ls_dir() could get superuser access on the box? No, but I am sure that, in my opinion, it's frankly a pretty dirty hack to use pg_ls_dir() in a monitoring tool and we should provide better ways to monitor PG to our users. Also, am I sure that we shouldn't ever give a user the ability to arbitrarily list directories on the filesystem? No, but this isn't a good justification for that change. If we *are* going to go down the road of giving users filesystem-like access of any kind then I would *much* rather look at the approach that I outlined before and that other database systems have, which is to have a way to say "user X can read, write, list, do whatever, with files in directory /blah/blah". Perhaps with sub-directory abilities too. Nannyism, if we had that ability, would be to say "you can set DIRECTORY to anything but PGDATA." or "you can't set DIRECTORY to be /". I don't think it's nannyism to keep things like pg_read_file and pg_file_write as superuser-only. Thanks! Stephen
Re: [HACKERS] Checksums by default?
On 01/25/2017 11:41 AM, Tom Lane wrote: Stephen Frost writes: Would you say that most user's databases run fast enough with checksums enabled? Or more than most, maybe 70%? 80%? In today's environment, I'd probably say that it's more like 90+%. It would be nice if there were some actual evidence about this, rather than numbers picked out of the air. I agree that it's unfortunate that we haven't put more effort into fixing that- I'm all for it, but it's disappointing to see that people are not in favor of changing the default as I believe it would both help our users and encourage more development of the feature. I think the really key point is that a whole lot of infrastructure work needs to be done still, and changing the default before that work has been done is not going to be user-friendly. The most pressing issue being the difficulty of changing the setting after the fact. It would be a *whole* lot easier to sell default-on if there were a way to turn it off, and yet you want us to buy into default-on before that way exists. Come back after that feature is in, and we can talk. +1 Sincerely, JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them. Unless otherwise stated, opinions are my own. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Wed, Jan 25, 2017 at 1:54 PM, Masahiko Sawada wrote: > Thank you for updating the patch! > > + /* > +* Quickly rule out by lower bound (should happen a lot) Upper bound > was > +* already checked by segment search > +*/ > + if (vac_cmp_itemptr((void *) itemptr, (void *) rseg->dead_tuples) < 0) > + return false; > > I think that if the above result is 0, we can return true as itemptr > matched lower bound item pointer in rseg->dead_tuples. That's right. Possibly not a great speedup but... why not? > > +typedef struct DeadTuplesSegment > +{ > + int num_dead_tuples;/* # of > entries in the segment */ > + int max_dead_tuples;/* # of > entries allocated in the segment */ > + ItemPointerData last_dead_tuple;/* Copy of the last > dead tuple (unset > + > * until the segment is fully > + > * populated) */ > + unsigned short padding; > + ItemPointer dead_tuples;/* Array of dead tuples */ > +} DeadTuplesSegment; > + > +typedef struct DeadTuplesMultiArray > +{ > + int num_entries;/* current # of entries */ > + int max_entries;/* total # of slots > that can be allocated in > +* array */ > + int num_segs; /* number of > dead tuple segments allocated */ > + int last_seg; /* last dead > tuple segment with data (or 0) */ > + DeadTuplesSegment *dead_tuples; /* array of num_segs segments > */ > +} DeadTuplesMultiArray; > > It's a matter of personal preference but some same dead_tuples > variables having different meaning confused me. > If we want to access first dead tuple location of first segment, we > need to do 'vacrelstats->dead_tuples.dead_tuples.dead_tuples'. For > example, 'vacrelstats->dead_tuples.dt_segment.dt_array' is better to > me. Yes, I can see how that could be confusing. 
I went for vacrelstats->dead_tuples.dt_segments[i].dt_tids[j] > + nseg->num_dead_tuples = 0; > + nseg->max_dead_tuples = 0; > + nseg->dead_tuples = NULL; > + vacrelstats->dead_tuples.num_segs++; > + } > + seg = DeadTuplesCurrentSegment(vacrelstats); > + } > + vacrelstats->dead_tuples.last_seg++; > + seg = DeadTuplesCurrentSegment(vacrelstats); > > Because seg is always set later I think the first line starting with > "seg = ..." is not necessary. Thought? That's correct. Attached a v6 with those changes (and rebased). Make check still passes. From c89019089a71517befac0920f22ed75577dda6c6 Mon Sep 17 00:00:00 2001 From: Claudio Freire Date: Mon, 12 Sep 2016 23:36:42 -0300 Subject: [PATCH] Vacuum: allow using more than 1GB work mem Turn the dead_tuples array into a structure composed of several exponentially bigger arrays, to enable usage of more than 1GB of work mem during vacuum and thus reduce the number of full index scans necessary to remove all dead tids when the memory is available. --- src/backend/commands/vacuumlazy.c| 291 --- src/test/regress/expected/vacuum.out | 8 + src/test/regress/sql/vacuum.sql | 10 ++ 3 files changed, 250 insertions(+), 59 deletions(-) diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c index 005440e..1d2441f 100644 --- a/src/backend/commands/vacuumlazy.c +++ b/src/backend/commands/vacuumlazy.c @@ -93,6 +93,14 @@ #define LAZY_ALLOC_TUPLES MaxHeapTuplesPerPage /* + * Minimum (starting) size of the dead_tuples array segments. Will allocate + * space for 128MB worth of tid pointers in the first segment, further segments + * will grow in size exponentially. Don't make it too small or the segment list + * will grow bigger than the sweetspot for search efficiency on big vacuums. + */ +#define LAZY_MIN_TUPLES Max(MaxHeapTuplesPerPage, (128<<20) / sizeof(ItemPointerData)) + +/* * Before we consider skipping a page that's marked as clean in * visibility map, we must've seen at least this many clean pages. 
*/ @@ -104,6 +112,27 @@ */ #define PREFETCH_SIZE ((BlockNumber) 32) +typedef struct DeadTuplesSegment +{ + int num_dead_tuples; /* # of entries in the segment */ + int max_dead_tuples; /* # of entries allocated in the segment */ + ItemPointerData last_dead_tuple; /* Copy of the last dead tuple (unset + * until the segment is fully + * populated) */ + unsigned short padding; + ItemPointer dt_tids; /* Array of dead tuples */ +} DeadTuplesSegment; + +typedef struct DeadTuplesMultiArray +{ + int num_entries; /* current
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
On 1/23/17 11:38 PM, Pavel Stehule wrote: Instead of paralleling all the existing namespace stuff, I wonder if it'd be better to create explicit block infrastructure. AFAIK PRAGMAs are going to have a lot of the same requirements (certainly the nesting is the same), and we might want more of this kind of stuff in the future. (I've certainly wished I could set a GUC in a plpgsql block and have its settings revert when exiting the block...) I am not sure if I understand. ?? Setting a GUC by PRAGMA can work - the syntax supports it and the GUC API supports nesting. Not sure about exception handling - but it should probably not be a problem. Please, can you show some examples? From a code standpoint, there's already some ugliness around blocks: there's the code that handles blocks themselves (which IIRC is responsible for subtransactions), then there's the namespace code, which is very separate even though namespaces are very much tied to blocks. Your patch is adding another layer into the mix, separate from both blocks and namespaces. I think it would be better to combine all 3 together, or at least not make matters worse. So IMHO the pragma stuff should be part of handling blocks, and not something that's stand alone. IE: make the pragma info live in PLpgSQL_stmt_block. GUCs support SET LOCAL, but that's not the same as local scoping because the setting stays in effect unless the subtrans aborts. What I'd like is the ability to set a GUC in a plpgsql block *and have the setting revert on block exit*. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 2:13 PM, Stephen Frost wrote: > > I went over *every* superuser check in the system when I did that work, > > wrote up a long email about why I made the decisions that I did, posted > > it here, had follow-on discussions, all of which led to the patch which > > ended up going in. > > Link to that email? I went back and looked at that thread and didn't > see anything that looked like a general policy statement to me. But I > may have missed it. Not sure which thread you were looking at, but this one: https://www.postgresql.org/message-id/20141015052259.GG28859%40tamriel.snowman.net Has a review of all superuser checks in the backend, as noted in the first paragraph ("shown below in a review of the existing superuser checks in the backend"). Later on in that thread, at least in: https://www.postgresql.org/message-id/20160106161302.GP3685%40tamriel.snowman.net In an email to you and Noah: The general approach which I've been using for the default roles is that they should grant rights which aren't able to be used to trivially make oneself a superuser. My recollection is that I said that about 10 times during that period, though perhaps I am exaggerating due to it being a rather painful process to get through. > I assume we're > both coming at these issues with the intention of making PostgreSQL > better; Always. > the fact that we don't always agree on everything is probably > inevitable. Agreed. Thanks! Stephen
[HACKERS] Re: [BUGS] Problem in using pgbench's --connect(-C) and --rate=rate(-R rate) options together.
Repost from bugs. -- Fabien. -- Forwarded message -- Date: Wed, 25 Jan 2017 18:59:45 +0100 (CET) From: Fabien COELHO To: nuko yokohama Cc: PostgreSQL Bugs List Subject: Re: [BUGS] Problem in using pgbench's --connect(-C) and --rate=rate(-R rate) options together. It operates normally when only the -C option or only the -R option is specified. In the PostgreSQL documentation, it is not stated that "these two options can not be specified at the same time". Is this a problem of pgbench? Yes, indeed there is a problem. Thanks for the report. Option -C is seldom used and seldom tested. The problem is already fixed in head. Looking at the git log, it was unclear at first which change fixed it... On another reading I got it: it was fixed by Heikki's restructuring patch 12788ae49e1933f463bc59a6efe46c4a01701b76, which is not intended to be backpatched to prior versions... The bug is that before --rate existed, doCustom always disconnected/reconnected without exiting, but with --rate it returns if it has to wait. However, threadRun tests whether there is a connection before calling doCustom again, so doCustom was never called. This is exactly the kind of unmanageable state combination that the refactoring has cleaned up. Attached a small patch which fixes the issue, I think, in 9.6. Fixing it exposed another issue with some stats under -C, which I fixed as well.
-- Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 531671a..1f1b7bf 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -1967,7 +1967,6 @@ top: st->listen = false; st->sleeping = false; st->throttling = false; - st->is_throttled = false; memset(st->prepared, 0, sizeof(st->prepared)); } @@ -4345,6 +4344,12 @@ threadRun(void *arg) remains--; /* I've aborted */ } } + else if (is_connect && st->sleeping) + { +/* it is sleeping for throttling, maybe it is done, let us try */ +if (!doCustom(thread, st, &aggs)) + remains--; + } if (st->ecnt > prev_ecnt && commands[st->state]->type == META_COMMAND) { -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..
Hi, On 2017-01-25 10:16:32 +0100, Tobias Oberstein wrote: > > > Using pread instead of lseek+read halves the syscalls. > > > > > > I really don't understand what you are fighting here .. > > > > Sure, there's some overhead. And as I said upthread, I'm much less > > against this change than Tom. What I'm saying is that your benchmarks > > haven't shown a benefit in a meaningful way, so I don't think I can > > agree with > > > > > "Well, my point remains that I see little value in messing with > > > long-established code if you can't demonstrate a benefit that's clearly > > > above the noise level." > > > > > > I have done lots of benchmarking over the last days on a massive box, and > > > I > > > can provide numbers that I think show that the impact can be significant. > > > > since you've not actually shown that the impact is above the noise level > > when measured with an actual postgres workload. > > I can follow that. > > So real proof cannot be done with FIO, but requires an "actual PG workload". Right. > Synthetic PG workload or real world production workload? Both might work, production-like has bigger pull, but I'd guess synthetic is good enough. > Also: regarding the perf profiles from production that show lseek as the #1 syscall. You'll, depending on your workload, still have a lot of lseeks even if we were to use pread/pwrite, because we do lseek(SEEK_END) to get file sizes. > You said it wouldn't be proof either, because it only shows the number of > syscalls, and though it is clear that millions of syscalls/sec do come with > overhead, it is still not showing "above noise" level relevance (because PG > is such a CPU hog in itself anyways;) Yep. > So how would I do a perf profile that would be acceptable as proof? You'd have to look at cpu time, not number of syscalls. IIRC I suggested doing a cycles profile with -g and then using "perf report --children" to see how many cycles are spent somewhere below lseek.
I'd also suggest sharing a profile cycles profile, it's quite likely that the overhead is completely elsewhere. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
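The halving Tobias describes can be seen directly in a small sketch (Python here for brevity; the actual proposal is a C change inside PostgreSQL's file-access layer, and `os.pread` is POSIX-only):

```python
import os
import tempfile

# Scratch file to read from.
fd, path = tempfile.mkstemp()
os.write(fd, b"0123456789abcdef")

# Traditional two-syscall pattern: position, then read.
os.lseek(fd, 4, os.SEEK_SET)
via_lseek = os.read(fd, 4)

# pread(2) does the same work in one syscall, and it does not
# disturb the file offset (handy for concurrent readers too).
via_pread = os.pread(fd, 4, 4)
offset_after = os.lseek(fd, 0, os.SEEK_CUR)

os.close(fd)
os.remove(path)

print(via_lseek, via_pread, offset_after)  # → b'4567' b'4567' 8
```

Note that, as Andres points out, the `lseek(SEEK_END)` calls used to obtain file sizes would remain even after such a conversion.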
Re: [HACKERS] Proposal : For Auto-Prewarm.
On 1/25/17 1:46 PM, Jim Nasby wrote: Based on that and other feedback I'm going to mark this as returned with feedback, though if you're able to get a revised patch in the next few days please do. Actually, based on the message that popped up when I went to do that I guess it's better not to do that, so I marked it as Waiting on Author. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
Re: [HACKERS] Proposal : For Auto-Prewarm.
On 1/24/17 11:13 PM, Beena Emerson wrote: On Wed, Jan 25, 2017 at 10:36 AM, Jim Nasby <jim.na...@bluetreble.com> wrote: On 1/24/17 2:26 AM, Mithun Cy wrote: Thanks for looking into this patch. I just downloaded the patch and applied it to the latest code; I can see the file "autoprewarm.save" in $PGDATA which is created and dumped at shutdown time, and an activity is logged as below: 2017-01-24 13:22:25.012 IST [91755] LOG: Buffer Dump: saved metadata of 59 blocks. Yeah, I wasn't getting that at all, though I did see the shared library being loaded. If I get a chance I'll try it again. Hope you added the following to the conf file: shared_preload_libraries = 'pg_autoprewarm' # (change requires restart) Therein lay my problem: I was preloading pg_prewarm, not pg_autoprewarm. I think the two need to be integrated much better than they are right now. They should certainly be in the same .so, and as others have mentioned the docs need to be fixed. For consistency, I think the name should just be pg_prewarm, as well as the prefix for the GUC. Based on that and other feedback I'm going to mark this as returned with feedback, though if you're able to get a revised patch in the next few days please do. FYI (and this is just a suggestion), for testing purposes, it might also be handy to allow manual dump and load via functions, with the load function giving you the option to forcibly load (instead of doing nothing if there are no buffers on the free list). It would also be handy if those functions accepted a different filename. That way you could reset shared_buffers to a known condition before running a test. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
Re: [HACKERS] Checksums by default?
Stephen Frost writes: > Would you say that most user's databases run fast enough with checksums > enabled? Or more than most, maybe 70%? 80%? In today's environment, > I'd probably say that it's more like 90+%. It would be nice if there were some actual evidence about this, rather than numbers picked out of the air. > I agree that it's unfortunate that we haven't put more effort into > fixing that- I'm all for it, but it's disappointing to see that people > are not in favor of changing the default as I believe it would both help > our users and encourage more development of the feature. I think the really key point is that a whole lot of infrastructure work needs to be done still, and changing the default before that work has been done is not going to be user-friendly. The most pressing issue being the difficulty of changing the setting after the fact. It would be a *whole* lot easier to sell default-on if there were a way to turn it off, and yet you want us to buy into default-on before that way exists. Come back after that feature is in, and we can talk. regards, tom lane
Re: [HACKERS] Checksums by default?
* Peter Geoghegan (p...@heroku.com) wrote: > On Wed, Jan 25, 2017 at 10:18 AM, Robert Haas wrote: > > Trying to force those people to use checksums is just masterminding; > > they've made their own decision that it's not worth bothering with. > > When something goes wrong, WE still care about distinguishing hardware > > failure from PostgreSQL failure. Our pride is on the line. But the > > customer often doesn't. The DBA isn't the same person as the > > operating system guy, and the operating system guy isn't going to > > listen to the DBA even if the DBA complains of checksum failures. > > We need to invest in corruption detection/verification tools that are > run on an as-needed basis. They are available to users of every other > major database system. Agreed. Thanks! Stephen
Re: [HACKERS] Checksums by default?
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 12:02 AM, Jim Nasby wrote: > > I'm not completely grokking your second paragraph, but I would think that an > > average user would love to get a heads-up that their hardware is failing. > > Sure. If the database runs fast enough with checksums enabled, > there's basically no reason to have them turned off. The issue is > when it doesn't. I don't believe we're talking about forcing every user to have checksums enabled. We are discussing the default. Would you say that most user's databases run fast enough with checksums enabled? Or more than most, maybe 70%? 80%? In today's environment, I'd probably say that it's more like 90+%. Yet, our default is to have them disabled and *really* hard to enable. I agree that it's unfortunate that we haven't put more effort into fixing that- I'm all for it, but it's disappointing to see that people are not in favor of changing the default as I believe it would both help our users and encourage more development of the feature. Thanks! Stephen
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
On Wed, Jan 25, 2017 at 2:13 PM, Stephen Frost wrote: > I went over *every* superuser check in the system when I did that work, > wrote up a long email about why I made the decisions that I did, posted > it here, had follow-on discussions, all of which lead to the patch which > ended up going in. Link to that email? I went back and looked at that thread and didn't see anything that looked like a general policy statement to me. But I may have missed it. > I am not anxious to revisit that decision and certainly not based on > an argument that, so far, boils down to "I think a monitoring system > might be able to use this function that allows it to read pg_authid > directly, so we should drop the superuser() check in it." Well, I'm not eager to revisit all the decisions you'd like to overturn either, but we'll just both have to cope. I assume we're both coming at these issues with the intention of making PostgreSQL better; the fact that we don't always agree on everything is probably inevitable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
On Wed, Jan 25, 2017 at 2:08 PM, Stephen Frost wrote: > That isn't what you're doing with those functions though, you're giving > the monitoring tool superuser-level rights but trying to pretend like > you're not. Uh, how so? None of those functions can be used to escalate to superuser privileges. I am trying to give SOME superuser privileges and not others. That IS how good security works. I don't really think it's necessary to outline the use case more than I have already. It's perfectly reasonable to want a monitoring tool to have access to pg_ls_dir() - for example, you could use that to monitor for relation files orphaned by a previous crash. Also, as mentioned above, I don't think this should have to be litigated for every single function individually. If it's a good idea for a non-superuser to be able to pg_start_backup() and pg_stop_backup(), the person doing that has access to read every byte of data in the filesystem; if they don't, there's no point in giving them access to run those functions. Access to just pg_ls_dir(), for example, can't be any more dangerous than that. Indeed, I'd argue that it's a heck of a lot LESS dangerous. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > Also, the same argument could be made about removing the built-in > superuser check from ANY function, and we've already rejected that > argument for a bunch of other functions. If we say that argument is > valid for some functions but not others, then we've got to decide for > which ones it's valid and for which ones it isn't, and consensus will > not be forthcoming. I take the position that hard-coded superuser > checks stink in general, and I'm grateful to Stephen for his work > making dump/restore work properly on system catalog permissions so > that we can support better alternatives. I'm not asking for anything > more than that we apply that same policy here as we have in other > cases. I went over *every* superuser check in the system when I did that work, wrote up a long email about why I made the decisions that I did, posted it here, had follow-on discussions, all of which lead to the patch which ended up going in. I am not anxious to revisit that decision and certainly not based on an argument that, so far, boils down to "I think a monitoring system might be able to use this function that allows it to read pg_authid directly, so we should drop the superuser() check in it." Thanks! Stephen
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Greg, * Greg Stark (st...@mit.edu) wrote: > I tend to agree. But in the past when this came up people pointed out > you could equally do things this way and still grant all the access > you wanted using SECURITY DEFINER. Arguably that's a better approach > because then instead of auditing the entire monitor script you only > need to audit this one wrapper function, pg_ls_monitor_dir() which > just calls pg_ls_dir() on this one directory. I'm not a fan of SECURITY DEFINER functions for this sort of thing and don't like the suggestion of simply wrapping functions that provide superuser-level access in a security definer function and then saying that giving someone access to that function isn't giving them superuser, because that's just false. Thanks! Stephen
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > The use case I have in mind is > a monitoring tool that needs access to one or more of those functions -- > in keeping with the principle of least privilege, it's much better to > give the monitoring user only the privileges which it actually needs > than to make it a superuser. That isn't what you're doing with those functions though, you're giving the monitoring tool superuser-level rights but trying to pretend like you're not. That's not really how good security works. I am entirely in agreement with providing a way to give monitoring tools more information, but that should be done through proper design and consideration of just what info they actually need (as well as what a useful format for it is). On my plate for a long time has been to add a function to return how much WAL is remaining in pg_wal for a monitoring system to be able to report on. That could be done with something like pg_ls_dir, but that's a rather hokey way to get it, and it'd be a lot nicer to just have a function which returns it, or maybe one that returns the oldest WAL position available on the system and what the current position is, which I think we might actually have. In other words, please actually outline a use-case, and let's design a proper solution. Thanks! Stephen
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 1:37 PM, Peter Geoghegan wrote: > On Wed, Jan 25, 2017 at 10:18 AM, Robert Haas wrote: >> Trying to force those people to use checksums is just masterminding; >> they've made their own decision that it's not worth bothering with. >> When something goes wrong, WE still care about distinguishing hardware >> failure from PostgreSQL failure. Our pride is on the line. But the >> customer often doesn't. The DBA isn't the same person as the >> operating system guy, and the operating system guy isn't going to >> listen to the DBA even if the DBA complains of checksum failures. > > We need to invest in corruption detection/verification tools that are > run on an as-needed basis. They are available to users of every other > major database system. +1, but the trick is (a) figuring out exactly what to develop and (b) finding the time to develop it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] COPY as a set returning function
> > +	/* param 7: escape text default null, -- defaults to whatever quote is */
> > +	if (PG_ARGISNULL(7))
> > +	{
> > +		copy_state.escape = copy_state.quote;
> > +	}
> > +	else
> > +	{
> > +		if (copy_state.csv_mode)
> > +		{
> > +			copy_state.escape = TextDatumGetCString(PG_GETARG_TEXT_P(7));
> > +			if (strlen(copy_state.escape) != 1)
> > +			{
> > +				ereport(ERROR,
> > +						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +						 errmsg("COPY escape must be a single one-byte character")));
> > +			}
> > +		}
> > +		else
> > +		{
> > +			ereport(ERROR,
> > +					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +					 errmsg("COPY escape available only in CSV mode")));
> > +		}
> > +	}
>
> I don't understand why we have all these checks. Can't we just pass
> the values to COPY and let it apply the checks? That way, when COPY is
> updated to support multibyte escape chars (for example) we don't need to
> touch this code. Together with removing the unneeded braces that would
> make these stanzas about six lines long instead of fifteen.

If I understand you correctly, COPY (via BeginCopyFrom) itself relies on having a relation in pg_class to reference for attributes. In this case, there is no such relation. So I'd have to fake a relcache entry, or refactor BeginCopyFrom() to extract a ReturnSetInfo from the Relation and pass that along to a new function BeginCopyFromReturnSet. I'm happy to go that route if you think it's a good idea.

> > +	tuple = heap_form_tuple(tupdesc, values, nulls);
> > +	//tuple = BuildTupleFromCStrings(attinmeta, field_strings);
> > +	tuplestore_puttuple(tupstore, tuple);
>
> No need to form a tuple; use tuplestore_putvalues here.

Good to know!

> > +	}
> > +
> > +	/* close "file" */
> > +	if (copy_state.is_program)
> > +	{
> > +		int		pclose_rc;
> > +
> > +		pclose_rc = ClosePipeStream(copy_state.copy_file);
> > +		if (pclose_rc == -1)
> > +			ereport(ERROR,
> > +					(errcode_for_file_access(),
> > +					 errmsg("could not close pipe to external command: %m")));
> > +		else if (pclose_rc != 0)
> > +			ereport(ERROR,
> > +					(errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
> > +					 errmsg("program \"%s\" failed",
> > +							copy_state.filename),
> > +					 errdetail_internal("%s", wait_result_to_str(pclose_rc))));
> > +	}
> > +	else
> > +	{
> > +		if (copy_state.filename != NULL && FreeFile(copy_state.copy_file))
> > +			ereport(ERROR,
> > +					(errcode_for_file_access(),
> > +					 errmsg("could not close file \"%s\": %m",
> > +							copy_state.filename)));
> > +	}
>
> I wonder if these should be an auxiliary function in copy.c to do this.
> Surely copy.c itself does pretty much the same thing ...

Yes. This got started as a patch to core because not all of the parts of COPY are externally callable, and aren't broken down in a way that allowed for use in an SRF. I'll get to work on these suggestions.
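The escape-defaults-to-quote behavior being validated in the quoted stanza mirrors the usual CSV convention, where escaping a quote inside a quoted field is done by doubling the quote character unless a separate escape character is chosen. A quick illustration with Python's csv module (an analogy only, not PostgreSQL's COPY parser):

```python
import csv
import io

# Default CSV escaping: the escape character *is* the quote character,
# so an embedded quote is written doubled ("").
data = 'id,note\n1,"say ""hi"""\n'
rows = list(csv.reader(io.StringIO(data)))
print(rows)  # → [['id', 'note'], ['1', 'say "hi"']]

# An explicit one-byte escape character instead, analogous to
# COPY ... CSV ESCAPE '\':
data2 = 'id,note\n1,"say \\"hi\\""\n'
rows2 = list(csv.reader(io.StringIO(data2), doublequote=False, escapechar='\\'))
print(rows2)  # → [['id', 'note'], ['1', 'say "hi"']]
```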
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 10:18 AM, Robert Haas wrote: > Trying to force those people to use checksums is just masterminding; > they've made their own decision that it's not worth bothering with. > When something goes wrong, WE still care about distinguishing hardware > failure from PostgreSQL failure. Our pride is on the line. But the > customer often doesn't. The DBA isn't the same person as the > operating system guy, and the operating system guy isn't going to > listen to the DBA even if the DBA complains of checksum failures. We need to invest in corruption detection/verification tools that are run on an as-needed basis. They are available to users of every other major database system. -- Peter Geoghegan
Re: [HACKERS] PATCH: recursive json_populate_record()
Nikita Glukhov writes: > On 22.01.2017 21:58, Tom Lane wrote: >> If you want such macros I think it would be better to submit a separate >> cosmetic patch that tries to hide such bit-tests behind macros throughout >> the jsonb code. > I've attached that patch, but not all the bit-tests were hidden: some of them > in jsonb_util.c still remain valid after upcoming refactoring because they > don't belong to generic code (there might be better to use JBC_XXX() macros). Pushed this; grepping found a couple other places that could be replaced by the macros, so I did. I didn't include the JsonContainerIsEmpty macro, though. It wasn't used anywhere, and I'm not exactly convinced that "IsEmpty" is more readable than "Size == 0", anyhow. We can add it later if the use-case gets stronger. > Sorry for this obvious mistake. But macros JB_ROOT_IS_XXX() also contain the > same hazard. Good point, fixed. I'll look at the rest of this in a bit. regards, tom lane
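The macros under discussion just put names on header bit-tests. The idea can be sketched in Python with made-up flag values (the real C macros and flag constants live in src/include/utils/jsonb.h; these numbers are illustrative, not the actual ones):

```python
# Hypothetical container-header flag bits, C-macro style.
JB_FSCALAR = 0x10000000
JB_FOBJECT = 0x20000000
JB_FARRAY  = 0x40000000
JB_CMASK   = 0x0FFFFFFF  # low bits: number of contained elements

def JsonContainerIsObject(header):
    # Named predicate instead of a raw (header & JB_FOBJECT) scattered around.
    return (header & JB_FOBJECT) != 0

def JsonContainerIsArray(header):
    return (header & JB_FARRAY) != 0

def JsonContainerSize(header):
    return header & JB_CMASK

hdr = JB_FOBJECT | 3  # an object holding three key/value pairs
print(JsonContainerIsObject(hdr), JsonContainerIsArray(hdr), JsonContainerSize(hdr))
# → True False 3
```

Hiding the bit-test behind one definition also confines hazards like unparenthesized macro arguments (the JB_ROOT_IS_XXX() issue mentioned above) to a single place.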
Re: [HACKERS] COPY as a set returning function
On Wed, Jan 25, 2017 at 1:10 PM, David Fetter wrote: > On Wed, Jan 25, 2017 at 12:23:23PM -0500, Corey Huinker wrote: > > > > Feel free to mark it returned as feedback. The concept didn't > > generate as much enthusiasm as I had hoped, so I think the right > > thing to do now is scale it back to a patch that makes > > CopyFromRawFields() externally visible so that extensions can use > > them. > > You're getting enthusiasm in the form of these reviews and suggestions > for improvement. That it doesn't always happen immediately is a > byproduct of the scarcity of developer time and the sheer volume of > things to which people need to pay attention. True about that. I was referring to "ooh! I need that!"-type interest. I'll proceed, keeping in mind that there's a fallback position of just making some of the guts of COPY available to be called by extensions like was done for file_fdw.
Re: [HACKERS] Declarative partitioning vs. information_schema
On Wed, Jan 25, 2017 at 1:04 PM, Peter Eisentraut wrote: > On 1/18/17 2:32 PM, Robert Haas wrote: >> Unless we can find something official, I suppose we should just >> display BASE TABLE in that case as we do in other cases. I wonder if >> the schema needs some broader revision; for example, are there >> information_schema elements intended to show information about >> partitions? > > Is it intentional that we show the partitions by default in \d, > pg_tables, information_schema.tables? Or should we treat those as > somewhat-hidden details? I'm not really sure what the right thing to do is there. I was hoping you had an opinion. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 12:02 AM, Jim Nasby wrote: > I'm not completely grokking your second paragraph, but I would think that an > average user would love to get a heads-up that their hardware is failing. Sure. If the database runs fast enough with checksums enabled, there's basically no reason to have them turned off. The issue is when it doesn't. Also, it's not as if there are no other ways of checking whether your disks are failing. SMART, for example, is supposed to tell you about incipient hardware failures before PostgreSQL ever sees a bit flip. Surely an average user would love to get a heads-up that their hardware is failing even when that hardware is not being used to power PostgreSQL, yet many people don't bother to configure SMART (or similar proprietary systems provided by individual vendors). Trying to force those people to use checksums is just masterminding; they've made their own decision that it's not worth bothering with. When something goes wrong, WE still care about distinguishing hardware failure from PostgreSQL failure. Our pride is on the line. But the customer often doesn't. The DBA isn't the same person as the operating system guy, and the operating system guy isn't going to listen to the DBA even if the DBA complains of checksum failures. Or the customer has 100 things on the same piece of hardware and PostgreSQL is the only one that failed; or alternatively they all failed around the same time; either way the culprit is obvious. Or the remedy is to restore from backup[1] whether the problem is hardware or software and regardless of whose software is to blame. Or their storage cost a million dollars and is a year old and they simply won't believe that it's failing. Or their storage cost a hundred dollars and is 8 years old and they're looking for an excuse to replace it whether it's responsible for the problem du jour or not.
I think it's great that we have a checksum feature and I think it's great for people who want to use it and are willing to pay the cost of it to turn it on. I don't accept the argument that all of our users, or even most of them, fall into that category. I also think it's disappointing that there's such a vigorous argument for changing the default when so little follow-on development has gone into this feature. If we had put any real effort into making this easier to turn on and off, for example, the default value would be less important, because people could change it more easily. But nobody's making that effort. I suggest that the people who think this a super-high-value feature should be willing to put some real work into improving it instead of trying to force it down everybody's throat as-is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] Alternatively, sometimes the remedy is to wish they had a usable backup while frantically running pg_resetxlog.
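For readers following the thread, the mechanism being debated is simple: a per-page checksum is computed over the page (with its own checksum field zeroed) and mixed with the block number, so both bit flips and misplaced pages are detectable at read time. A simplified CRC-based sketch follows; this is *not* PostgreSQL's actual FNV-1a-derived algorithm (see src/include/storage/checksum_impl.h), and the 2-byte field offset here is made up:

```python
import zlib

PAGE_SIZE = 8192
CKSUM_OFF = 8  # hypothetical offset of the 2-byte checksum field in the header

def page_checksum(page: bytes, block_no: int) -> int:
    # Checksum the page with its own checksum field zeroed out, and mix
    # in the block number so identical pages at different blocks differ.
    masked = page[:CKSUM_OFF] + b"\x00\x00" + page[CKSUM_OFF + 2:]
    return zlib.crc32(masked) ^ block_no

page = bytearray(PAGE_SIZE)
page[100:105] = b"hello"
good = page_checksum(bytes(page), block_no=7)

page[100] ^= 0x01  # a single flipped bit on disk...
bad = page_checksum(bytes(page), block_no=7)
print(good != bad)  # → True: the flip is detected
```

The WAL overhead discussed upthread comes not from this computation but from the need to log full-page images on first modification after a checkpoint even for hint-bit-only changes, so that torn pages cannot produce false checksum failures.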
Re: [HACKERS] COPY as a set returning function
On Wed, Jan 25, 2017 at 12:23:23PM -0500, Corey Huinker wrote: > > Feel free to mark it returned as feedback. The concept didn't > generate as much enthusiasm as I had hoped, so I think the right > thing to do now is scale it back to a patch that makes > CopyFromRawFields() externally visible so that extensions can use > them. You're getting enthusiasm in the form of these reviews and suggestions for improvement. That it doesn't always happen immediately is a byproduct of the scarcity of developer time and the sheer volume of things to which people need to pay attention. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Checksums by default?
On Sat, Jan 21, 2017 at 11:57 AM, Andres Freund wrote: > On 2017-01-21 11:39:18 +0100, Magnus Hagander wrote: >> Is it time to enable checksums by default, and give initdb a switch to turn >> it off instead? > > -1 - the WAL overhead is quite massive, and in contrast to the other > GUCs recently changed you can't just switch this around. I agree. I bet that if somebody does the test suggested by Amit downthread, it'll turn out that the performance is just awful. And those cases are common. I think the people saying "well, the overhead is worth it" must be people whose systems (or whose customer's systems) aren't processing continuous heavy OLTP workloads. If you've got a data warehousing workload, checksums are probably pretty cheap. If you've got a low-velocity OLTP workload, or an OLTP workload that fits in shared_buffers, it's probably bearable. But if you've got 8GB of shared_buffers and 100GB of data, and you've got 100 or so backends continuously doing random updates, I think checksums are going to nail you to the wall. And EnterpriseDB, at least, has lots of customers that do exactly that sort of thing. Having said that, I've certainly run into situations where I speculated that a customer had a hardware problem and they speculated that we had given them buggy database software. In a pretty significant number of cases, the customer turned out to be right; for example, some of those people were suffering from multixact bugs that resulted in unexplainable corruption. Now, would it have been useful to know that checksums were passing (suggesting a PostgreSQL problem) rather than failing (suggesting an OS problem)? Yes, that would have been great. I could have given those customers better support. On the other hand, I think I've run into MORE cases where the customer was desperately seeking options to improve write performance, which remains a pretty significant problem for PostgreSQL.
I can't see taking a significant hit in that area for my convenience in understanding what's going on in data corruption situations. The write performance penalty is paid by everybody all the time, whereas data corruption is a rare event even among support cases. And even when you do have corruption, whether or not the data corruption is accompanied by a checksum failure is only ONE extra bit of useful data. A failure doesn't guarantee a hardware problem; it could be caused by a faulty backup procedure, like forgetting to run pg_start_backup(). The lack of a failure doesn't guarantee a software problem; it could be caused by a faulty backup procedure, like using an OS-level snapshot facility that isn't exactly simultaneous across tablespaces. What you really need to do when a customer has corruption is figure out why they have corruption, and the leading cause by far is neither the hardware nor the software but some kind of user error. Checksums are at best a very modest assist in figuring out whether an error has been made and if so of what type. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning vs. information_schema
On 1/18/17 2:32 PM, Robert Haas wrote: > Unless we can find something official, I suppose we should just > display BASE TABLE in that case as we do in other cases. I wonder if > the schema needs some broader revision; for example, are there > information_schema elements intended to show information about > partitions? Is it intentional that we show the partitions by default in \d, pg_tables, information_schema.tables? Or should we treat those as somewhat-hidden details? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pdf versus single-html
On 1/21/17 6:29 AM, Erik Rijkers wrote: > It might even be good to include such a single-file html in the Makefile > as an option. > > I'll give it a try but has anyone done this work already, perhaps? Already exists: doc/src/sgml$ make postgres.html -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] logical-replication.sgml improvements
On 1/20/17 11:00 AM, Erik Rijkers wrote: > logical-replication.sgml.diff changes Committed, thanks! -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans
On 25 January 2017 at 17:34, Julian Markwort wrote: > Analogous to this, a bad_plan is saved when the time has been exceeded by a > factor greater than 1.1. ...and the plan differs? Probably best to use some stat math to calculate deviation, rather than a fixed %. Sounds good. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
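Simon's "stat math" suggestion could amount to keeping a running mean and standard deviation of each statement's execution time and flagging a plan as bad only when its time is a genuine statistical outlier. A hypothetical Python sketch of that idea (pg_stat_statements itself is C and maintains its own counters, e.g. via Welford's online algorithm):

```python
import math

class PlanStats:
    """Running mean/stddev (Welford's algorithm) for one statement's
    execution times, used to decide whether a new timing is an outlier
    rather than just exceeding a fixed 1.1x factor."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def add(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)

    def stddev(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_outlier(self, x, k=2.0):
        # Flag times more than k standard deviations above the mean.
        return self.n > 1 and x > self.mean + k * self.stddev()

s = PlanStats()
for t in [10.0, 11.0, 9.5, 10.5, 10.2]:
    s.add(t)
print(s.is_outlier(10.9), s.is_outlier(25.0))  # → False True
```

With a fixed 1.1 factor, the 10.9 timing above would already count as "bad" against a 9.5 baseline even though it sits well inside normal variation, which is exactly the objection being raised.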