Re: [HACKERS] logical decoding of two-phase transactions
>> >> Yes, that’s also possible but seems to be less flexible, restricting us
>> >> to some specific GID format.
>> >>
>> >> Anyway, I can measure the WAL space overhead introduced by the GIDs
>> >> inside commit records to know exactly what the cost of such an approach
>> >> will be.
>>
>> Stas,
>>
>> Have you had a chance to look at this further?

Generally I’m okay with Simon’s approach and will send an updated patch.
I still want to perform some tests to estimate how much disk space is
actually taken up by the extra WAL records.

>> I think the approach of storing just the xid and fetching the GID
>> during logical decoding of the PREPARE TRANSACTION is probably the
>> best way forward, per my prior mail.

I don’t think that’s possible this way. If we don't put the GID in the
commit record, then by the time logical decoding happens the transaction
will already be committed/aborted and there will be no easy way to get
that GID. I thought about several possibilities:

* Tracking an xid/gid map in memory also doesn’t help much — if the server
  reboots between prepare and commit we lose that mapping.

* We could provide some hooks on prepared transaction recovery during
  startup, but that approach also fails if the reboot happens between the
  commit and the decoding of that commit.

* Logical messages are WAL-logged, but they don’t have any redo function,
  so they don’t help much either.

So to support user-accessible 2PC over replication based on 2PC decoding,
we would have to invent something nastier, like writing the mappings into
a table.

>> That should eliminate Simon's
>> objection re the cost of tracking GIDs and still let us have access to
>> them when we want them, which is the best of both worlds really.

Having 2PC decoding in core is a good thing anyway, even without GID
tracking =)

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speedup twophase transactions
On Thu, Jan 26, 2017 at 4:09 PM, Nikhil Sontakke wrote:
>> I look at this patch from you and that's present for me:
>> https://www.postgresql.org/message-id/CAMGcDxf8Bn9ZPBBJZba9wiyQq-Qk5uqq=vjomnrnw5s+fks...@mail.gmail.com
>>
>> --- a/src/backend/access/transam/xlog.c
>> +++ b/src/backend/access/transam/xlog.c
>> @@ -9573,6 +9573,7 @@ xlog_redo(XLogReaderState *record)
>>                  (errmsg("unexpected timeline ID %u (should be %u) in checkpoint record",
>>                          checkPoint.ThisTimeLineID, ThisTimeLineID)));
>>
>> +               KnownPreparedRecreateFiles(checkPoint.redo);
>>                 RecoveryRestartPoint(&checkPoint);
>>         }
>
> Oh, sorry. I was asking about CheckpointTwoPhase(). I don't see a
> function by this name. And now I see, the name is CheckPointTwoPhase()
> :-)

My mistake then :D

>> And actually, when a XLOG_CHECKPOINT_SHUTDOWN record is taken, 2PC
>> files are not flushed to disk with this patch. This is a problem as a
>> new restart point is created... Having the flush in CheckpointTwoPhase
>> really makes the most sense.
>
> Umm, AFAICS, CheckPointTwoPhase() does not get called in the "standby
> promote" code path.

CreateRestartPoint() calls it via CheckPointGuts() while in recovery.
--
Michael
Re: [HACKERS] Radix tree for character conversion
On Wed, Jan 25, 2017 at 7:18 PM, Ishii Ayumi wrote:
> I applied the 4-patch set and ran "make", but it failed.
> Is this a bug, or my mistake? I'm sorry if I'm wrong.
>
> [$(TOP)]$ patch -p1 < ../0001-Add-missing-semicolon.patch
> [$(TOP)]$ patch -p1 < ../0002-Correct-reference-resolution-syntax.patch
> [$(TOP)]$ patch -p1 < ../0003-Apply-pgperltidy-on-src-backend-utils-mb-Unicode.patch
> [$(TOP)]$ patch -p1 < ../0004-Use-radix-tree-for-character-conversion.patch
> [$(TOP)]$ ./configure
> [Unicode]$ make
> '/usr/bin/perl' UCS_to_most.pl
> Type of arg 1 to keys must be hash (not hash element) at convutils.pm line 443, near "})"
> Type of arg 1 to values must be hash (not hash element) at convutils.pm line 596, near "})"
> Type of arg 1 to each must be hash (not private variable) at convutils.pm line 755, near "$map)"
> Compilation failed in require at UCS_to_most.pl line 19.
> make: *** [iso8859_2_to_utf8.map] Error 255

Hm, I am not sure what you are missing. I was able to get things to build.
--
Michael
Re: [HACKERS] Radix tree for character conversion
On Tue, Jan 10, 2017 at 8:22 PM, Kyotaro HORIGUCHI wrote:
> [...patch...]

Nobody has showed up yet to review this patch, so I am giving it a shot.

The patch file sizes are scary at first sight, but after having a look:
 36 files changed, 1411 insertions(+), 54398 deletions(-)
Yes, that's a surprise. Something like git diff --irreversible-delete
would have helped, as most of the diffs are just caused by 3 files being
deleted in patch 0004, sending 50k lines to the abyss of deletion.

> Hello, I found a bug in my portion while rebasing.

Right, that's 0001. Nice catch.

> The attached files are the following. This patchset is not
> complete, missing changes to the map files. The change is tremendously
> large but generatable.
>
> 0001-Add-missing-semicolon.patch
>
>   UCS_to_EUC_JP.pl has a line missing its terminating semicolon. This
>   doesn't do harm, but it is surely a syntax error. This patch fixes it.
>   This should perhaps be a separate patch.

This requires a back-patch. It makes me wonder how long this script has
actually not been run...

> 0002-Correct-reference-resolution-syntax.patch
>
>   convutils.pm has lines with different syntaxes of reference
>   resolution. This unifies the syntax.

Yes, that looks right to me. I am not the best Perl guru on this list,
but looking around, $$var{foo} is bad, ${$var}{foo} is better, and
$var->{foo} is even better. This also generates no diffs when running
make in src/backend/utils/mb/Unicode/. So no objections to that.

> 0003-Apply-pgperltidy-on-src-backend-utils-mb-Unicode.patch
>
>   Before adding the radix tree stuff, this applies pgperltidy and
>   inserts a format-skipping pragma for the parts where perltidy seems
>   to do too much.

Which version of perltidy did you use? Looking at the archives, the Perl
code is cleaned up with a specific version, v20090616. See
https://www.postgresql.org/message-id/20151204054322.ga2070...@tornado.leadboat.com
for example on the matter. As perltidy changes over time, this may be a
sensitive change if done this way.

> 0004-Use-radix-tree-for-character-conversion.patch
>
>   The radix tree body.

Well, here a lot of diffs could have been saved.

> The unattached fifth patch is generated by the following steps.
>
> [$(TOP)]$ ./configure
> [Unicode]$ make
> [Unicode]$ make distclean
> [Unicode]$ git add .
> [Unicode]$ commit
> === COMMIT MESSAGE
> Replace map files with radix tree files.
>
> These encodings no longer use the former map files and use the new
> radix tree files instead.
> ===

OK, I can see that working, with 200k of maps generated... So, going
through the important bits of this jungle:

+/*
+ * radix tree conversion function - this should be identical to the function in
+ * ../conv.c with the same name
+ */
+static inline uint32
+pg_mb_radix_conv(const pg_mb_radix_tree *rt,
+                 int l,
+                 unsigned char b1,
+                 unsigned char b2,
+                 unsigned char b3,
+                 unsigned char b4)

This is not nice. Having a duplication like that is a recipe for
forgetting about it, as this patch introduces a dependency between conv.c
and the radix tree generation.

Having a .gitignore in Unicode/ would be nice, particularly to avoid
committing map_checker.

A README documenting things would be welcome, or at least comments at the
top of map_checker.c. Why is map_checker essential? What does it do? There
is no way to understand that easily, except that it includes a "radix tree
conversion function" and that it performs sanity checks on the radix trees
to be sure that they are in good shape. But that is something one would
guess only after looking at your patch and the code (at least I will sleep
less stupid tonight after reading this stuff).
--- a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
 # Drop these SJIS codes from the source for UTF8=>SJIS conversion
 #<<< do not let perltidy touch this
-my @reject_sjis =(
+my @reject_sjis = (
    0xed40..0xeefc, 0x8754..0x875d, 0x878a, 0x8782,
-   0x8784, 0xfa5b, 0xfa54, 0x8790..0x8792, 0x8795..0x8797,
+   0x8784, 0xfa5b, 0xfa54, 0x8790..0x8792, 0x8795..0x8797,
    0x879a..0x879c
-);
+   );

This is not generated; it would be nice to drop the noise from the patch.
Here is another one:

-       $i->{code} = $jis | (
-           $jis < 0x100
-           ? 0x8e00
-           : ($sjis >= 0xeffd ? 0x8f8080 : 0x8080));
-
+#<<< do not let perltidy touch this
+       $i->{code} = $jis | ($jis < 0x100 ? 0x8e00 :
+                            ($sjis >= 0xeffd ? 0x8f8080 : 0x8080));
+#>>>

         if (l == 2)
         {
-            iutf = *utf++ << 8;
-            iutf |= *utf++;
+            b3 = *utf++;
+            b4 = *utf++;
         }

Ah, OK. This conversion is important so that it performs a minimum of
bitwise operations. Yes, let's keep that. It's pretty cool to get a
faster operation.
--
Michael
Re: [HACKERS] Speedup twophase transactions
> I look at this patch from you and that's present for me:
> https://www.postgresql.org/message-id/CAMGcDxf8Bn9ZPBBJZba9wiyQq-Qk5uqq=vjomnrnw5s+fks...@mail.gmail.com
>
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -9573,6 +9573,7 @@ xlog_redo(XLogReaderState *record)
>                  (errmsg("unexpected timeline ID %u (should be %u) in checkpoint record",
>                          checkPoint.ThisTimeLineID, ThisTimeLineID)));
>
> +               KnownPreparedRecreateFiles(checkPoint.redo);
>                 RecoveryRestartPoint(&checkPoint);
>         }

Oh, sorry. I was asking about CheckpointTwoPhase(). I don't see a
function by this name. And now I see, the name is CheckPointTwoPhase()
:-)

> And actually, when a XLOG_CHECKPOINT_SHUTDOWN record is taken, 2PC
> files are not flushed to disk with this patch. This is a problem as a
> new restart point is created... Having the flush in CheckpointTwoPhase
> really makes the most sense.

Umm, AFAICS, CheckPointTwoPhase() does not get called in the "standby
promote" code path.

Regards,
Nikhils
--
Nikhil Sontakke                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Speedup twophase transactions
On Thu, Jan 26, 2017 at 1:38 PM, Nikhil Sontakke wrote:
>> We should really try to do things right now, or we'll never come back
>> to it. 9.3 (if my memory does not fail me?) has reduced the time to do
>> promotion by removing the need of the end-of-recovery checkpoint, and
>> while I agree that there should not be that many 2PC transactions at
>> this point, if there are for a reason or another, the time it takes to
>> complete promotion would be impacted. So let's refactor
>> PrescanPreparedTransactions() so as it is able to handle 2PC data from
>> a buffer extracted by XlogReadTwoPhaseData(), and we are good to go.
>
> Not quite. If we modify PrescanPreparedTransactions(), we also need to
> make RecoverPreparedTransactions() and
> StandbyRecoverPreparedTransactions() handle 2PC data via
> XlogReadTwoPhaseData().

Ah, right on both counts, even for RecoverPreparedTransactions(), which
happens at the end of recovery. Thanks for noticing. The patch mentions
that as well:

+ * At the end of recovery we move all known prepared transactions to disk.
+ * This allows RecoverPreparedTransactions() and
+ * StandbyRecoverPreparedTransactions() to do their work.

I need some strong coffee...

>> +       KnownPreparedRecreateFiles(checkPoint.redo);
>>         RecoveryRestartPoint(&checkPoint);
>> Looking again at this code, I think that this is incorrect. The
>> checkpointer should be in charge of doing this work and not the
>> startup process, so this should go into CheckpointTwoPhase() instead.
>
> I don't see a function by the above name in the code?

I look at this patch from you and that's present for me:
https://www.postgresql.org/message-id/CAMGcDxf8Bn9ZPBBJZba9wiyQq-Qk5uqq=vjomnrnw5s+fks...@mail.gmail.com
If I look as well at the last version from Stas, it is here:
https://www.postgresql.org/message-id/becc988a-db74-48d5-b5d5-a54551a62...@postgrespro.ru
As this change:

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9573,6 +9573,7 @@ xlog_redo(XLogReaderState *record)
                 (errmsg("unexpected timeline ID %u (should be %u) in checkpoint record",
                         checkPoint.ThisTimeLineID, ThisTimeLineID)));

+               KnownPreparedRecreateFiles(checkPoint.redo);
                RecoveryRestartPoint(&checkPoint);
        }

And actually, when a XLOG_CHECKPOINT_SHUTDOWN record is taken, 2PC
files are not flushed to disk with this patch. This is a problem as a
new restart point is created... Having the flush in CheckpointTwoPhase
really makes the most sense.
--
Michael
Re: [HACKERS] sequence data type
On Wed, Jan 25, 2017 at 11:53 PM, Peter Eisentraut wrote:
> Here is an updated patch that allows changing the sequence type. This
> was clearly a concern for reviewers, and the presented use cases seemed
> convincing.

I have been torturing this patch and it looks rather solid to me. Here
are a couple of comments:

@@ -15984,6 +15992,9 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
                       "CREATE SEQUENCE %s\n",
                       fmtId(tbinfo->dobj.name));

+    if (strcmp(seqtype, "bigint") != 0)
+        appendPQExpBuffer(query, "AS %s\n", seqtype);

Wouldn't it be better to append that unconditionally? There is no reason
that a dump taken with pg_dump version X should work on X - 1 (and no
reason to make the life of users uselessly difficult either, that's my
point), but that seems better to me than relying on the sequence type
being hardcoded in all the pre-10 dump queries for sequences. That would
also bring more consistency to the CREATE SEQUENCE queries of
test_pg_dump/t/001_base.pl.

Could you increase the regression test coverage to cover some of the new
code paths? For example, such cases are not tested:

=# create sequence toto as smallint;
CREATE SEQUENCE
=# alter sequence toto as smallint maxvalue 1000;
ERROR:  22023: RESTART value (2147483646) cannot be greater than MAXVALUE (1000)
LOCATION:  init_params, sequence.c:1537
=# select setval('toto', 1);
 setval
--------
      1
(1 row)
=# alter sequence toto as smallint;
ERROR:  22023: MAXVALUE (2147483647) is too large for sequence data type smallint
LOCATION:  init_params, sequence.c:1407

+    if ((seqform->seqtypid == INT2OID && seqform->seqmin < PG_INT16_MIN)
+        || (seqform->seqtypid == INT4OID && seqform->seqmin < PG_INT32_MIN)
+        || (seqform->seqtypid == INT8OID && seqform->seqmin < PG_INT64_MIN))
+    {
+        char        bufm[100];
+
+        snprintf(bufm, sizeof(bufm), INT64_FORMAT, seqform->seqmin);
+
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("MINVALUE (%s) is too large for sequence data type %s",
+                        bufm, format_type_be(seqform->seqtypid))));
+    }

"large" does not apply to values lower than the minimum, no? Also, the
int64 branch is never going to be reached (same for the max value check);
it doesn't hurt to code it, I agree.

Testing serial columns, the changes are consistent with the previous
releases.
--
Michael
Re: [HACKERS] Speedup twophase transactions
>> The question remains whether saving off a few fsyncs/reads for these
>> long-lived prepared transactions is worth the additional code churn.
>> Even if we add code to go through the KnownPreparedList, we still will
>> have to go through the other on-disk 2PC transactions anyway. So,
>> maybe not.
>
> We should really try to do things right now, or we'll never come back
> to it. 9.3 (if my memory does not fail me?) has reduced the time to do
> promotion by removing the need of the end-of-recovery checkpoint, and
> while I agree that there should not be that many 2PC transactions at
> this point, if there are for a reason or another, the time it takes to
> complete promotion would be impacted. So let's refactor
> PrescanPreparedTransactions() so as it is able to handle 2PC data from
> a buffer extracted by XlogReadTwoPhaseData(), and we are good to go.

Not quite. If we modify PrescanPreparedTransactions(), we also need to
make RecoverPreparedTransactions() and
StandbyRecoverPreparedTransactions() handle 2PC data via
XlogReadTwoPhaseData().

> +       /*
> +        * Move prepared transactions, if any, from KnownPreparedList to files.
> +        * It is possible to skip this step and teach subsequent code about
> +        * KnownPreparedList, but PrescanPreparedTransactions() happens once
> +        * during end of recovery or on promote, so probably it isn't worth
> +        * the additional code.
> +        */

This comment is misplaced. It does not make sense before this specific
call.

> +       KnownPreparedRecreateFiles(checkPoint.redo);
>         RecoveryRestartPoint(&checkPoint);
> Looking again at this code, I think that this is incorrect. The
> checkpointer should be in charge of doing this work and not the
> startup process, so this should go into CheckpointTwoPhase() instead.

I don't see a function by the above name in the code?

Regards,
Nikhils
--
Nikhil Sontakke                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Checksums by default?
* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Jan 25, 2017 at 7:37 PM, Andres Freund wrote:
>> On 2017-01-25 19:30:08 -0500, Stephen Frost wrote:
>>> * Peter Geoghegan (p...@heroku.com) wrote:
>>>> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote:
>>>>> As it is, there are backup solutions which *do* check the checksum when
>>>>> backing up PG. This is no longer, thankfully, some hypothetical thing,
>>>>> but something which really exists and will hopefully keep users from
>>>>> losing data.
>>>>
>>>> Wouldn't that have issues with torn pages?
>>>
>>> No, why would it? The page has either been written out by PG to the OS,
>>> in which case the backup s/w will see the new page, or it hasn't been.
>>
>> Uh. Writes aren't atomic on that granularity. That means you very well
>> *can* see a torn page (in Linux you can, e.g., on 4KB OS page boundaries
>> of an 8KB Postgres page). Just read a page while it's being written out.
>
> Yeah. This is also why backups force full page writes on even if
> they're turned off in general.

I've got a question in to David about this. I know we chatted about the
risk at one point; I just don't recall what we ended up doing, or whether
we concluded it wasn't a risk for some, perhaps incorrect, reason and need
to revisit it. I can imagine a few different approaches: re-read the page,
which isn't a guarantee but reduces the chances a fair bit; check the LSN;
or perhaps the plan was to just check whether the page is in the WAL, as I
mentioned.

Thanks!

Stephen
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> Your proposed policy is essentially that functions should have
> built-in superuser checks if having access to that function is
> sufficient to escalate your account to full superuser privileges.

Yes, I do.

> 1. There's no consensus on any such policy.

Evidently. I will say that it was brought up on a thread with quite a bit
of general discussion and resulted in a committed patch. If you want my
2c on it, there was at least a consensus on it at that time, even if one
no longer exists. I certainly don't recall anyone clamouring at the time
that we should re-evaluate how my assessment was done or the choices made
regarding the functions which you now, two years later, wish to change.

> 2. If there were such a policy it would favor, not oppose, changing
> pg_ls_dir(), because you can't escalate to superuser given access to
> pg_ls_dir(). Yet you are also opposed to changing pg_ls_dir() for
> reasons that boil down to a personal preference on your part for
> people not using it to build monitoring scripts.

If your argument were specifically about pg_ls_dir() and pg_ls_dir()
only, then I would have misgivings about it, because it's a piss-poor
solution to the problem that you've actually presented, and I am
concerned that such a "design" will lead to the later implication that
it's the "right" thing to do and that we shouldn't try to do better. In
other words, the exact argument we always tend to have when we're talking
about new features and their design: is it the right design, etc.

> 3. Such a policy can only be enforced to the extent that we can
> accurately predict which functions can be used to escalate to
> superuser, which is not necessarily obvious in every case. Under your
> proposed policy, if a given function turns out to be more dangerous
> than we'd previously thought, we'd have to stick the superuser check
> back in for the next release. And if it turns out to be less
> dangerous than we thought, we'd take the check out. That would be
> silly.

If the proposed function were able to be used to escalate a user to
superuser status, then I'd think we'd want to do a security release to
fix whatever that hole is, since it's almost certainly not the intended
use of the function and the lack of proper checking is a security risk.
If we stop doing that, where do we draw the line? Is it OK for
pg_terminate_backend() to be used to gain superuser-level access through
some kind of race condition? What about pg_start_backup()? Which
functions are "OK" to be allowed to make users superuser, and which are
not, when we don't have superuser() checks in any of them? Further, how
is the user supposed to know? Are we going to start sticking labels on
functions which say "WARNING: This can be used to become a superuser!"?
I certainly hope not.

> 4. Such a policy is useless from a security perspective because you
> can't actually prevent superusers from delegating access to those
> functions. You can force them to use wrapper functions but that
> doesn't eo ipso improve security. It might make security better or
> worse depending on how well the functions are written, and it seems
> extremely optimistic to suppose that everyone who writes a security
> definer wrapper function will actually do anything more than expose
> the underlying function as-is (and maybe forget to schema-qualify
> something).

This is not about preventing superusers from delegating access to
whichever users they wish to, where it makes sense to do so.

> 5. If you're worried about people granting access to functions that
> allow escalation to the superuser account, what you really ought to do
> is put some effort into documenting which functions have such hazards
> and for what general reasons.

I did, by making the ones that I don't believe can be used to gain
superuser-level access GRANT'able to non-superusers. The ones which I
didn't feel comfortable changing in that way were left with the
superuser() checks in them. Maybe I got that wrong, but that was
certainly the idea, as discussed on the thread which I linked you to.

> I'd be willing to agree to write documentation along the lines
> suggested in (5) as a condition of removing the remaining superuser
> checks if you'd be willing to review it and suggest a place to put it.

I'm really not anxious to have a bunch of such warnings throughout the
docs. Nor am I thrilled about the idea of having to argue about what can
and what can't: would you say pg_read_file() is superuser-equivalent? I
would, because in a great many cases, for all practical purposes, it is;
a ton of users run with md5, and the auth info could be pulled from
pg_authid directly with pg_read_file(). The lack of any way to contain
the access that such a function would provide would also mean that users
would end up making a similar choice: GRANT access to this function that
an attacker could use to gain superuser() rights, or not? That's not t
Re: [HACKERS] Checksums by default?
On 26 January 2017 at 01:58, Thomas Munro wrote:
>
> I don't know how comparable it is to our checksum technology, but
> MySQL seems to have some kind of checksums on table data, and you can
> find public emails, blogs etc lamenting corrupted databases by
> searching Google for the string "InnoDB: uncompressed page, stored
> checksum in field1" (that's the start of a longer error message that
> includes actual and expected checksums).

I'm not sure what exactly that teaches us, however. I see these errors
were often associated with software bugs (apparently MySQL long assumed
that a checksum of 0 never happened, for example). Every non-software
case I stumbled across seemed to follow a power failure. Apparently MySQL
uses a "doublewrite buffer" to protect against torn pages, but when I
search for that I get tons of people inquiring how to turn it off... So
even without software bugs in the checksum code, I don't know that the
frequency of the error necessarily teaches us anything about the
frequency of hardware corruption either.

And more to the point, it seems what people are asking for in all those
lamentations is how they can convince MySQL to continue and ignore the
corruption. A typical response was "We slightly modified innochecksum and
added option -f that means if the checksum of a page is wrong, rewrite it
in the InnoDB page header." Which begs the question...

--
greg
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 7:37 PM, Andres Freund wrote:
> On 2017-01-25 19:30:08 -0500, Stephen Frost wrote:
>> * Peter Geoghegan (p...@heroku.com) wrote:
>>> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote:
>>>> As it is, there are backup solutions which *do* check the checksum when
>>>> backing up PG. This is no longer, thankfully, some hypothetical thing,
>>>> but something which really exists and will hopefully keep users from
>>>> losing data.
>>>
>>> Wouldn't that have issues with torn pages?
>>
>> No, why would it? The page has either been written out by PG to the OS,
>> in which case the backup s/w will see the new page, or it hasn't been.
>
> Uh. Writes aren't atomic on that granularity. That means you very well
> *can* see a torn page (in Linux you can, e.g., on 4KB OS page boundaries
> of an 8KB Postgres page). Just read a page while it's being written out.

Yeah. This is also why backups force full page writes on even if
they're turned off in general.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
On Wed, Jan 25, 2017 at 8:22 PM, Stephen Frost wrote:
> Apparently we disagree about what is a 'reasonable manner'.

Yes. I think that a "reasonable manner" should mean "what the DBA thinks
is reasonable", whereas you apparently think it should mean "what the DBA
thinks is reasonable, but only if the core developers and in particular
Stephen Frost also think it's reasonable".

Your proposed policy is essentially that functions should have built-in
superuser checks if having access to that function is sufficient to
escalate your account to full superuser privileges. But:

1. There's no consensus on any such policy.

2. If there were such a policy, it would favor, not oppose, changing
pg_ls_dir(), because you can't escalate to superuser given access to
pg_ls_dir(). Yet you are also opposed to changing pg_ls_dir() for reasons
that boil down to a personal preference on your part for people not using
it to build monitoring scripts.

3. Such a policy can only be enforced to the extent that we can
accurately predict which functions can be used to escalate to superuser,
which is not necessarily obvious in every case. Under your proposed
policy, if a given function turns out to be more dangerous than we'd
previously thought, we'd have to stick the superuser check back in for
the next release. And if it turns out to be less dangerous than we
thought, we'd take the check out. That would be silly.

4. Such a policy is useless from a security perspective because you can't
actually prevent superusers from delegating access to those functions.
You can force them to use wrapper functions, but that doesn't eo ipso
improve security. It might make security better or worse depending on how
well the functions are written, and it seems extremely optimistic to
suppose that everyone who writes a security definer wrapper function will
actually do anything more than expose the underlying function as-is (and
maybe forget to schema-qualify something).

5. If you're worried about people granting access to functions that allow
escalation to the superuser account, what you really ought to do is put
some effort into documenting which functions have such hazards and for
what general reasons. That would have a much better chance of preventing
people from delegating access to dangerous functions inadvertently than
the current method, which relies on people: knowing (without
documentation) that you've attempted to leave hard-coded superuser()
checks in some functions but not others, for reasons that sometimes but
not always involve privilege escalation; correctly distinguishing which
such cases involve privilege escalation as opposed to other arbitrary
criteria; and deciding neither to create secdef wrappers for anything
that has a built-in check for reasons of privilege isolation, nor to give
up on privilege isolation and hand out superuser to people who need
pg_ls_dir(). While such a clever chain of deductive reasoning cannot be
ruled out, it would be astonishing if it happened very often.

I'd be willing to agree to write documentation along the lines suggested
in (5) as a condition of removing the remaining superuser checks if you'd
be willing to review it and suggest a place to put it. But I have a
feeling compromise may not be possible here. To me, the hand-wringing
about the evils of pg_ls_dir() on this thread contrasts rather starkly
with the complete lack of discussion about whether the patch removing
superuser checks from pgstattuple was opening up any security
vulnerabilities. And given that the aforesaid patch lets a user who has
EXECUTE privileges on pgstattuple run that function even on relations for
which they have no other privileges, such as say pg_authid, it hardly
seems self-evident that there are no leaks there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
On 15 January 2017 at 05:18, Tom Lane wrote: > Fabien COELHO writes: >>> It ends up being about 30 fewer lines of code overall, despite there >>> being four places that have to make ad-hoc RawStmt nodes. On the whole >>> I feel like this is cleaner, > >> I agree: Better typing, more homogeneous code (PlannedStmt for all), >> less ad-hoc checks to work around utility statements... > > OK, pushed like that. Thanks very much for that Tom, it's great to see this change. One suggestion: it's currently non-obvious that ProcessUtility_hook gets called with the full text of all parts of a multi-statement. I suggest the following wording added to the comment on ProcessUtility() in src/backend/tcop/utility.c, after the note on the query string, or something like it: The same query string may be passed to multiple invocations of ProcessUtility if a utility statement in turn invokes other utility statements, or if the user supplied a query string containing multiple semicolon-separated statements in a single protocol message. It is also possible for the query text to contain other non-utility-statement text like comments, empty statements, and plannable statements. Callers that use the queryString should use pstmt->stmt_location and pstmt->stmt_len to extract the text for the statement of interest and should guard against re-entrant invocation. That should help with at least some of the traps around ProcessUtility_hook, and I certainly wish I'd known it some months ago. For the record, this is commits ab1f0c8225714aaa18d2f9ca4f80cd009f145421 and 83f2061dd037477ec8479ee160367840e203a722 . 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ab1f0c8225714aaa18d2f9ca4f80cd009f145421 https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=83f2061dd037477ec8479ee160367840e203a722 -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
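To make the stmt_location/stmt_len convention described above concrete, here is a rough sketch of the extraction rule (Python purely for illustration; real hooks do this in C, and the function name and example offsets below are hypothetical, not taken from PostgreSQL source):

```python
# How a ProcessUtility_hook should carve one statement's text out of the
# full queryString using pstmt->stmt_location and pstmt->stmt_len.
# Convention (assumed here): location of -1 means "unknown, use the whole
# string"; a length of 0 means "from location to end of string".

def statement_text(query_string, stmt_location, stmt_len):
    """Return the text of one statement from a multi-statement query string."""
    if stmt_location < 0:          # location unknown: fall back to whole string
        return query_string
    if stmt_len == 0:              # 0 means "rest of the string"
        return query_string[stmt_location:]
    return query_string[stmt_location:stmt_location + stmt_len]

# A single protocol message may carry several semicolon-separated statements,
# plus comments and plannable statements, all sharing one query string:
query = "CREATE TABLE t(a int); /* comment */ DROP TABLE t;"
assert statement_text(query, 0, 21) == "CREATE TABLE t(a int)"
assert statement_text(query, 37, 12) == "DROP TABLE t"
assert statement_text(query, 37, 0) == "DROP TABLE t;"
assert statement_text(query, -1, 0) == query
```

A hook that logs or rewrites utility statements would apply this slicing before looking at the text, rather than assuming queryString contains exactly one statement.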
Re: [HACKERS] pg_hba_file_settings view patch
On Thu, Jan 26, 2017 at 2:32 AM, Tom Lane wrote:
> The way I'd be inclined to make the individual reporting changes is like
>
>      if (!EnableSSL)
> +    {
> -        ereport(LOG,
> +        ereport(elevel,
>                  (errcode(ERRCODE_CONFIG_FILE_ERROR),
>                   errmsg("hostssl record cannot match because SSL is disabled"),
>                   errhint("Set ssl = on in postgresql.conf."),
>                   errcontext("line %d of configuration file \"%s\"",
>                              line_num, HbaFileName)));
> +        *err_msg = pstrdup("hostssl record cannot match because SSL is disabled");
> +    }
>
> which is considerably less invasive and hence easier to review, and supports reporting different text in the view than appears in the log, should we need that. It seems likely also that we could drop the pstrdup in the case of constant strings (we'd still need psprintf if we want to insert values into the view messages), which would make this way cheaper than what's in the patch now.

I don't really understand the argument about the patch's readability: what Haribabu has proposed simply avoids duplicating the strings, and the diffs of the patch are really clear. For the sake of not translating the strings sent back to the system view, though, I can buy it. -- Michael
Re: [HACKERS] logical decoding of two-phase transactions
On 5 January 2017 at 20:43, Stas Kelvich wrote: > >> On 5 Jan 2017, at 13:49, Simon Riggs wrote: >> >> Surely in this case the master server is acting as the Transaction >> Manager, and it knows the mapping, so we are good? >> >> I guess if you are using >2 nodes then you need to use full 2PC on each node. >> >> Please explain precisely how you expect to use this, to check that GID >> is required. >> > > For example, if we are using logical replication just for failover/HA and > allowing the user to act as the transaction manager itself. Then suppose that > the user prepared a tx on server A and server A crashed. After that the client > may want to reconnect to server B and commit/abort that tx. But the user only > has the GID that was used during prepare. > >> But even then, if you adopt the naming convention that all in-progress >> xacts will be called RepOriginId-EPOCH-XID, so they have a fully >> unique GID on all of the child nodes then we don't need to add the >> GID. > > Yes, that's also possible but seems to be less flexible, restricting us to some > specific GID format. > > Anyway, I can measure the WAL space overhead introduced by the GIDs inside > commit records to know exactly what the cost of such an approach would be. Stas, Have you had a chance to look at this further? I think the approach of storing just the xid and fetching the GID during logical decoding of the PREPARE TRANSACTION is probably the best way forward, per my prior mail. That should eliminate Simon's objection re the cost of tracking GIDs and still let us have access to them when we want them, which is the best of both worlds really. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
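On the WAL-space question Stas raises above, a back-of-the-envelope sketch may help frame the discussion (Python purely for illustration; the 8-byte alignment rule, the GID format, and the commit rate are assumptions for the sake of the estimate, not measured PostgreSQL numbers):

```python
# Rough estimate of the extra WAL incurred by carrying the GID in each
# COMMIT PREPARED record. Assumptions (not measured values): the GID is
# stored NUL-terminated and the addition is rounded up to 8-byte alignment.

def maxalign(n, align=8):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align

def gid_overhead_per_commit(gid):
    """Extra aligned bytes to carry a NUL-terminated GID in a commit record."""
    return maxalign(len(gid) + 1)       # +1 for the trailing NUL byte

# e.g. a GID following the RepOriginId-EPOCH-XID convention Simon suggested:
gid = "1-42-123456"
per_commit = gid_overhead_per_commit(gid)
per_day = per_commit * 5000 * 86400     # at a sustained 5000 2PC commits/s

assert per_commit == 16                 # 12 bytes, rounded up to 16
assert per_day == 6912000000            # about 6.9 GB/day at that extreme rate
```

Even at an unrealistically high sustained 2PC rate, the overhead is small relative to the rest of the commit-record traffic, which is the kind of number an actual measurement would need to confirm or refute.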
Re: [HACKERS] Speedup twophase transactions
On Wed, Jan 25, 2017 at 11:55 PM, Nikhil Sontakke wrote: >> We are talking about the recovery/promote code path. Specifically this >> call to KnownPreparedRecreateFiles() in PrescanPreparedTransactions(). >> >> We write the files to disk and they get immediately read up in the >> following code. We could instead not write the files to disk and read >> KnownPreparedList in the code path that follows, as well as elsewhere. > > Thinking more on this. > > The only optimization that's really remaining is handling of prepared > transactions that have not been committed or will linger around for > long. The short-lived 2PC transactions have been optimized already via > this patch. > > The question remains whether saving off a few fsyncs/reads for these > long-lived prepared transactions is worth the additional code churn. > Even if we add code to go through the KnownPreparedList, we still will > have to go through the other on-disk 2PC transactions anyway. So, > maybe not. We should really try to do things right now, or we'll never come back to it. 9.3 (if my memory does not fail me) reduced the time needed for promotion by removing the need for the end-of-recovery checkpoint. While I agree that there should not be that many 2PC transactions at this point, if there are some for one reason or another, the time it takes to complete promotion would be impacted. So let's refactor PrescanPreparedTransactions() so that it is able to handle 2PC data from a buffer extracted by XlogReadTwoPhaseData(), and we are good to go.

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9573,6 +9573,15 @@ xlog_redo(XLogReaderState *record)
                  (errmsg("unexpected timeline ID %u (should be %u) in checkpoint record",
                          checkPoint.ThisTimeLineID, ThisTimeLineID)));
+
+        /*
+         * Move prepared transactions, if any, from KnownPreparedList to files.
+         * It is possible to skip this step and teach subsequent code about
+         * KnownPreparedList, but PrescanPreparedTransactions() happens once
+         * during end of recovery or on promote, so probably it isn't worth
+         * the additional code.
+         */
+        KnownPreparedRecreateFiles(checkPoint.redo);
         RecoveryRestartPoint(&checkPoint);

Looking again at this code, I think that this is incorrect. The checkpointer should be in charge of doing this work, not the startup process, so this should go into CheckpointTwoPhase() instead. In the end, we should be able to live without KnownPreparedRecreateFiles() entirely and just rip it out of the patch. -- Michael
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 2:28 PM, Stephen Frost wrote: > Sadly, without having them enabled by default, there's not a huge corpus > of example cases to draw from. > > There have been a few examples already posted about corruption failures > with PG, but one can't say with certainty that they would have been > caught sooner if checksums had been enabled. I don't know how comparable it is to our checksum technology, but MySQL seems to have some kind of checksums on table data, and you can find public emails, blogs etc lamenting corrupted databases by searching Google for the string "InnoDB: uncompressed page, stored checksum in field1" (that's the start of a longer error message that includes actual and expected checksums). -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] COPY as a set returning function
> > >> I don't understand why do we have all these checks. Can't we just pass >> the values to COPY and let it apply the checks? That way, when COPY is >> updated to support multibyte escape chars (for example) we don't need to >> touch this code. Together with removing the unneeded braces that would >> make these stanzas about six lines long instead of fifteen. >> > > If I understand you correctly, COPY (via BeginCopyFrom) itself relies on > having a relation in pg_class to reference for attributes. > In this case, there is no such relation. So I'd have to fake a relcache > entry, or refactor BeginCopyFrom() to extract a ReturnSetInfo from the > Relation and pass that along to a new function BeginCopyFromReturnSet. I'm > happy to go that route if you think it's a good idea. > > >> >> >> > + tuple = heap_form_tuple(tupdesc,values,nulls); >> > + //tuple = BuildTupleFromCStrings(attinmeta, >> field_strings); >> > + tuplestore_puttuple(tupstore, tuple); >> >> No need to form a tuple; use tuplestore_putvalues here. >> > > Good to know! > > > >> >> I wonder if these should be an auxiliary function in copy.c to do this. >> Surely copy.c itself does pretty much the same thing ... >> > > Yes. This got started as a patch to core because not all of the parts of > COPY are externally callable, and aren't broken down in a way that allowed > for use in a SRF. > > I'll get to work on these suggestions. > I've put in some more work on this patch, mostly just taking Alvaro's suggestions, which resulted in big code savings. I had to add a TupleDesc parameter to BeginCopy() and BeginCopyFrom(). This seemed the easiest way to leverage the existing tested code (and indeed, it worked nearly out-of-the-box). The only drawback is that a minor change will have to be made to the BeginCopyFrom() call in file_fdw.c, and any other extensions that leverage COPY. 
We could make compatibility functions that take the original signature and pass it along to the corresponding function with rsTupDesc set to NULL.

Some issues:
- I'm still not sure if the direction we want to go is a set returning function, or a change in grammar that lets us use COPY as a CTE or similar.
- This function will have the same difficulties as adding the program option did to file_fdw: there's very little we can reference that isn't OS/environment-specific.
- Inline (STDIN) prompts the user for input, but gives the error: server sent data ("D" message) without prior row description ("T" message). I looked for a place where the Relation was consulted for the row description, but I'm not finding it.

I can continue to flesh this out with documentation and test cases if there is consensus that this is the way to go.

# select * from copy_srf('echo "x\ty"',true) as t(x text, y text);
 x | y
---+---
 x | y
(1 row)

Time: 1.074 ms

# select * from copy_srf('echo "x\t4"',true) as t(x text, y integer);
 x | y
---+---
 x | 4
(1 row)

Time: 1.095 ms

# select * from copy_srf(null) as t(x text, y integer);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> a4
>> b5
>> \.
server sent data ("D" message) without prior row description ("T" message)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 4dfedf8..26f81f3 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1065,6 +1065,21 @@ LANGUAGE INTERNAL
 STRICT IMMUTABLE PARALLEL SAFE
 AS 'jsonb_insert';
 
+CREATE OR REPLACE FUNCTION copy_srf(
+    IN filename text,
+    IN is_program boolean DEFAULT false,
+    IN format text DEFAULT null,
+    IN delimiter text DEFAULT null,
+    IN null_string text DEFAULT null,
+    IN header boolean DEFAULT null,
+    IN quote text DEFAULT null,
+    IN escape text DEFAULT null,
+    IN encoding text DEFAULT null)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+VOLATILE ROWS 1000 COST 1000 CALLED ON NULL INPUT
+AS 'copy_srf';
+
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
 -- than use explicit 'superuser()' checks in those functions, we use the GRANT

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f9362be..4e6a32c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -30,6 +30,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -286,7 +287,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 /* non-export function prototypes */
-static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
+static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
+                TupleDesc rsTupDesc,
                 RawStmt *raw_query, Oi
Re: [HACKERS] pgbench more operators & functions
Fabien, * Fabien COELHO (coe...@cri.ensmp.fr) wrote: > I think that there is a misunderstanding, most of which being my fault. No worries, it happens. :) > I have really tried to do everything that was required from > committers, including revising the patch to match all previous > feedback. Thanks for continuing to try to work through everything. I know it can be a difficult process, but it's all towards a (hopefully) improved and better PG. > Version 6 sent on Oct 4 did include all fixes required at the time > (no if, no unusual and operators, TAP tests)... However I forgot to > remove some documentation about the removed stuff, which made Robert > think that I had not done it. I apologise for this mistake and the > subsequent misunderstanding:-( Ok, that helps clarify things. As does the rest of your email, for me, anyway. > If pgbench is about being seated on a bench and running postgres on > your laptop to get some heat, my mistake... I thought it was about > benchmarking, which does imply a few extra capabilities. I believe we do want to improve pgbench and your changes are generally welcome when it comes to adding useful capabilities. Your explanation was also helpful about the specific requirements. > IMHO the relevant current status of the patch should be "Needs > review" and possibly "Move to next CF". For my 2c, at least, while I'm definitely interested in this, it's not nearly high enough on my plate with everything else going on to get any attention in the next few weeks, at least. I do think that, perhaps, this patch may deserve a bit of a break, to allow people to come back to it with a fresh perspective, so perhaps moving it to the next commitfest would be a good idea, in a Needs Review state. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Checksums by default?
Peter, * Peter Geoghegan (p...@heroku.com) wrote: > On Wed, Jan 25, 2017 at 1:22 PM, Peter Geoghegan wrote: > > I understand that my experience with storage devices is unusually > > narrow compared to everyone else here. That's why I remain neutral on > > the high level question of whether or not we ought to enable checksums > > by default. I'll ask other hackers to answer what may seem like a very > > naive question, while bearing what I just said in mind. The question > > is: Have you ever actually seen a checksum failure in production? And, > > if so, how helpful was it? > > I'm surprised that nobody has answered my question yet. > > I'm not claiming that not actually seeing any corruption in the wild > due to a failing checksum invalidates any argument. I *do* think that > data points like this can be helpful, though. Sadly, without having them enabled by default, there's not a huge corpus of example cases to draw from. There have been a few examples already posted about corruption failures with PG, but one can't say with certainty that they would have been caught sooner if checksums had been enabled. Thanks! Stephen
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 1:22 PM, Peter Geoghegan wrote: > I understand that my experience with storage devices is unusually > narrow compared to everyone else here. That's why I remain neutral on > the high level question of whether or not we ought to enable checksums > by default. I'll ask other hackers to answer what may seem like a very > naive question, while bearing what I just said in mind. The question > is: Have you ever actually seen a checksum failure in production? And, > if so, how helpful was it? I'm surprised that nobody has answered my question yet. I'm not claiming that not actually seeing any corruption in the wild due to a failing checksum invalidates any argument. I *do* think that data points like this can be helpful, though. -- Peter Geoghegan
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Andres, * Andres Freund (and...@anarazel.de) wrote: > On 2017-01-25 18:04:09 -0500, Stephen Frost wrote: > > Robert's made it clear that he'd like to have a blanket rule that we > > don't have superuser checks in these code paths if they can be GRANT'd > > at the database level, which goes beyond pg_ls_dir. > > That seems right to me. I don't see much benefit for the superuser() > style checks, with a few exceptions. Granting by default is obviously > an entirely different question. Well, for my part at least, I disagree. Superuser is a very different animal, imv, than privileges which can be GRANT'd, and I feel that's an altogether good thing. > In other words, you're trying to force people to do stuff your preferred > way, instead of allowing them to get things done in a reasonable manner. Apparently we disagree about what is a 'reasonable manner'. Thanks! Stephen
Re: [HACKERS] Checksums by default?
* Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 6:30 PM, Stephen Frost wrote: > > I hope to discuss it further after we have the ability to turn it off > > easily. > > I think we should have the ability to flip it in BOTH directions easily. Presumably you imply this to mean "before we enable it by default." I'm not sure that I can agree with that, but we haven't got it in either direction yet, so it's not terribly interesting to discuss that particular "what if." > It sounds to me like you are misleading users about the positives and > negatives of checksums, which then causes them to be shocked that they > are not the default. I don't try to claim that they are without downsides or performance impacts, if that's the implication here. > > [ more unsolicited bragging about an unspecified backup tool, presumably still > > pgbackrest ] It was explicitly to counter the claim that there aren't things out there which are working to actively check the checksums. > > I'd rather walk into an engagement where the user is saying "yeah, we > > enabled checksums and it caught this corruption issue" than having to > > break the bad news, which I've had to do over and over, that their > > existing system hasn't got checksums enabled. This isn't hypothetical, > > it's what I run into regularly with entirely reasonable and skilled > > engineers who have been deploying PG. > > Maybe you should just stop telling them and use the time thus freed up > to work on improving the checksum feature. I'm working to improve the usefulness of our checksum feature in a way which will produce practical and much more immediate results than anything I could do today in PG. That said, I do plan to also support working on checksums as I'm able to. At the moment, that's supporting Magnus' thread about enabling them by default.
I'd be a bit surprised if he was trying to force a change on PG because he thinks it's going to improve things for pgbackrest, but if so, I'm not going to complain when it seems like an entirely sensible and good change which will benefit PG's users too. Even better would be if we had an independent tool to check checksums endorsed by the PG community, but that won't happen for a release cycle. I'd also be extremely happy if the other backup tools out there grew the ability to check checksums in PG pages; frankly, I hope that adding it to pgbackrest will push them to do so. > I'm skeptical of this whole discussion because you seem to be filled > with unalloyed confidence that checksums have little performance > impact and will do wonderful things to prevent data loss, whereas I > think they have significant performance impact and will only very > slightly help to prevent data loss. I admit that they'll have a significant performance impact in some environments, but I think the vast majority of installations won't see anything different, while some of them may be saved by it, including, as likely as not, a number of actual corruption issues that have been brought up on these lists in the past few days, simply because reports were asked for. > I admit that the idea of having > pgbackrest verify checksums while backing up seems like it could > greatly improve the chances of checksums being useful, but I'm not > going to endorse changing PostgreSQL's default for pgbackrest's > benefit. I'm glad to hear that you generally endorse the idea of having a backup tool verify checksums. I'd love it if all of them did and I'm not going to apologize for, as far as I'm aware, being the first to even make an effort in that direction. > It's got to be to the benefit of PostgreSQL users broadly, > not just the subset of those people who use one particular backup > tool. 
Hopefully, other backup solutions will add similar capability, and perhaps someone will also write an independent tool, and eventually those will get out in released versions, and maybe PG will grow a tool to check checksums too, but I can't make other tool authors implement it, nor can I make other committers work on it, and while I'm doing what I can, as I'm sure you understand, we all have a lot of different hats. > Also, the massive hit that will probably occur on > high-concurrency OLTP workloads larger than shared_buffers is going to > be hard to justify for any amount of backup security. I think that > problem's got to be solved or at least mitigated before we think about > changing this. I realize that not everyone would set the bar that > high, but I see far too many customers with exactly that workload to > dismiss it lightly. I have a sneaking suspicion that the customers which you get directly involved with tend to be at a different level than the majority of PG users which exist out in the wild (I can't say that it's really any different for me). I don't think that's a bad thing, but I do think users at all levels deserve consideration and not just those running close to the limits of their gear. Thanks! Stephen
Re: [HACKERS] patch: function xmltable
2017-01-25 15:07 GMT+01:00 Alvaro Herrera : > Tom Lane wrote: > > Andres Freund writes: > > > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote: > > >> XMLTABLE is specified by the standard to return multiple rows ... but > > >> then as far as my reading goes, it is only supposed to be supported in > > >> the range table (FROM clause) not in the target list. I wonder if > > >> this would end up better if we only tried to support it in RT. I asked > > >> Pavel to implement it like that a few weeks ago, but ... > > > > > Right - it makes sense in the FROM list - but then it should be an > > > executor node, instead of some expression thingy. > > > > +1 --- we're out of the business of having simple expressions that > > return rowsets. > > Well, that's it. I'm not committing this patch against two other > committers' opinion, plus I was already on the fence about the > implementation anyway. I think you should just go with the flow and > implement this by creating nodeTableexprscan.c. It's not even > difficult. > I am playing with this, and the patch is about 15kB longer just from implementing the basic scan functionality, and I haven't touched the planner yet. I am not happy with this - I still have a feeling that I should try to reimplement it as a reduced SRF instead. Regards Pavel > > -- > Álvaro Herrera    https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: [HACKERS] Checksums by default?
Michael, * Michael Paquier (michael.paqu...@gmail.com) wrote: > That would be enough. It should also be rare enough that there would > not be that many pages to track when looking at records from the > backup start position to minimum recovery point. It could also be > simpler, though more time-consuming, to just let a backup recover up > to the minimum recovery point (recovery_target = 'immediate'), and > then run the checksum sanity checks. There are other checks usually > needed on a backup anyway like being sure that index pages are in good > shape even with a correct checksum, etc. Believe me, I'm all for *all* of that. > But here I am really hijacking the thread, so I'll stop.. If you have further thoughts, I'm all ears. This is all relatively new, and I don't expect to have all of the answers or solutions. Obviously, having to bring up a full database is an extra step (one we try to make easy to do), but, sadly, we don't have any way to ask PG to verify all the checksums with released versions, so that's what we're working with. Thanks! Stephen
Re: [HACKERS] Checksums by default?
* Andres Freund (and...@anarazel.de) wrote: > On 2017-01-25 19:30:08 -0500, Stephen Frost wrote: > > * Peter Geoghegan (p...@heroku.com) wrote: > > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > > > > As it is, there are backup solutions which *do* check the checksum when > > > > backing up PG. This is no longer, thankfully, some hypothetical thing, > > > > but something which really exists and will hopefully keep users from > > > > losing data. > > > > > > Wouldn't that have issues with torn pages? > > > > No, why would it? The page has either been written out by PG to the OS, > > in which case the backup s/w will see the new page, or it hasn't been. > > Uh. Writes aren't atomic on that granularity. That means you very well > *can* see a torn page (in linux you can e.g. on 4KB os page boundaries > of a 8KB postgres page). Just read a page while it's being written out. > > You simply can't reliably verify checksums without replaying WAL (or > creating a manual version of replay, as in checking the WAL for a FPW). Looking through the WAL isn't any surprise and is something we've been planning to do for other reasons anyway. Thanks! Stephen
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 9:32 AM, Stephen Frost wrote: > * Robert Haas (robertmh...@gmail.com) wrote: >> On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier >> wrote: >> > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: >> >> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >> >>> As it is, there are backup solutions which *do* check the checksum when >> >>> backing up PG. This is no longer, thankfully, some hypothetical thing, >> >>> but something which really exists and will hopefully keep users from >> >>> losing data. >> >> >> >> Wouldn't that have issues with torn pages? >> > >> > Why? What do you foresee here? I would think such backup solutions are >> > careful enough to ensure correctly the durability of pages so as they >> > are not partially written. >> >> Well, you'd have to keep a read(fd, buf, 8192) performed by the backup >> tool from overlapping with a write(fd, buf, 8192) performed by the >> backend. > > As Michael mentioned, that'd depend on whether things are atomic from a > user's perspective at certain sizes (perhaps 4k, which wouldn't be too > surprising, but may also be system-dependent), in which case verifying > that the page is in the WAL would be sufficient. That would be enough. It should also be rare enough that there would not be that many pages to track when looking at records from the backup start position to minimum recovery point. It could also be simpler, though more time-consuming, to just let a backup recover up to the minimum recovery point (recovery_target = 'immediate'), and then run the checksum sanity checks. There are other checks usually needed on a backup anyway, like being sure that index pages are in good shape even with a correct checksum, etc. But here I am really hijacking the thread, so I'll stop.. -- Michael
Re: [HACKERS] Checksums by default?
On 2017-01-25 19:30:08 -0500, Stephen Frost wrote: > * Peter Geoghegan (p...@heroku.com) wrote: > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > > > As it is, there are backup solutions which *do* check the checksum when > > > backing up PG. This is no longer, thankfully, some hypothetical thing, > > > but something which really exists and will hopefully keep users from > > > losing data. > > > > Wouldn't that have issues with torn pages? > > No, why would it? The page has either been written out by PG to the OS, > in which case the backup s/w will see the new page, or it hasn't been. Uh. Writes aren't atomic at that granularity. That means you very well *can* see a torn page (on Linux you can, e.g., at 4KB OS page boundaries of an 8KB postgres page). Just read a page while it's being written out. You simply can't reliably verify checksums without replaying WAL (or creating a manual version of replay, as in checking the WAL for a FPW). > This isn't like a case where only half the page made it to the disk > because of a system failure though; everything is online and working > properly during an online backup. I don't think that really changes anything. Greetings, Andres Freund
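The torn-read race Andres describes is easy to demonstrate in miniature (a Python sketch for illustration only; a toy additive checksum stands in for PostgreSQL's real page checksum, and checksumming the whole page rather than a header field is a simplification):

```python
# An 8KB PostgreSQL page spans two 4KB OS pages, so a reader racing a
# writer can observe the new first half paired with the old second half.
# The resulting page's checksum then matches neither version on its own,
# which is why naive file-level verification reports false positives.

PAGE, OS_PAGE = 8192, 4096

def checksum(page: bytes) -> int:
    return sum(page) % 65536          # toy checksum, for illustration only

old_page = bytes([1]) * PAGE          # version already on disk
new_page = bytes([2]) * PAGE          # version being written out

# Torn read: first 4KB already updated, second 4KB still the old contents.
torn = new_page[:OS_PAGE] + old_page[OS_PAGE:]

assert checksum(torn) != checksum(new_page)   # 12288 vs 16384
assert checksum(torn) != checksum(old_page)   # 12288 vs 8192
```

Resolving such a mismatch requires consulting the WAL: if a full-page write for that page exists after the backup start point, the torn on-disk image is superseded during replay and the apparent checksum failure is harmless.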
Re: [HACKERS] Checksums by default?
* Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier > wrote: > > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > >> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > >>> As it is, there are backup solutions which *do* check the checksum when > >>> backing up PG. This is no longer, thankfully, some hypothetical thing, > >>> but something which really exists and will hopefully keep users from > >>> losing data. > >> > >> Wouldn't that have issues with torn pages? > > > > Why? What do you foresee here? I would think such backup solutions are > > careful enough to ensure correctly the durability of pages so as they > > are not partially written. > > Well, you'd have to keep a read(fd, buf, 8192) performed by the backup > tool from overlapping with a write(fd, buf, 8192) performed by the > backend. As Michael mentioned, that'd depend on whether things are atomic from a user's perspective at certain sizes (perhaps 4k, which wouldn't be too surprising, but may also be system-dependent), in which case verifying that the page is in the WAL would be sufficient. Thanks! Stephen
Re: [HACKERS] Checksums by default?
* Michael Paquier (michael.paqu...@gmail.com) wrote: > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > >> As it is, there are backup solutions which *do* check the checksum when > >> backing up PG. This is no longer, thankfully, some hypothetical thing, > >> but something which really exists and will hopefully keep users from > >> losing data. > > > > Wouldn't that have issues with torn pages? > > Why? What do you foresee here? I would think such backup solutions are > careful enough to ensure correctly the durability of pages so as they > are not partially written. I believe his concern was that the backup sw might see a partially-updated page when it reads the file while PG is writing it. In other words, would the kernel return some intermediate state of data while an fwrite() is in progress. Thanks! Stephen
Re: [HACKERS] Checksums by default?
* Peter Geoghegan (p...@heroku.com) wrote: > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > > As it is, there are backup solutions which *do* check the checksum when > > backing up PG. This is no longer, thankfully, some hypothetical thing, > > but something which really exists and will hopefully keep users from > > losing data. > > Wouldn't that have issues with torn pages? No, why would it? The page has either been written out by PG to the OS, in which case the backup s/w will see the new page, or it hasn't been. Our testing has not turned up any issues as yet. That said, it's relatively new and I wouldn't be surprised if we need to do some adjustments in that area, which might be system-dependent even. We could certainly check the WAL for the page that had a checksum error (we currently simply report them, though don't throw away a prior backup if we detect one). This isn't like a case where only half the page made it to the disk because of a system failure though; everything is online and working properly during an online backup. Thanks! Stephen
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 9:22 AM, Andres Freund wrote: > On 2017-01-26 09:19:28 +0900, Michael Paquier wrote: >> On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: >> > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >> >> As it is, there are backup solutions which *do* check the checksum when >> >> backing up PG. This is no longer, thankfully, some hypothetical thing, >> >> but something which really exists and will hopefully keep users from >> >> losing data. >> > >> > Wouldn't that have issues with torn pages? >> >> Why? What do you foresee here? I would think such backup solutions are >> careful enough to ensure correctly the durability of pages so as they >> are not partially written. > > That means you have to replay enough WAL to get into a consistent > state... Ah, OK I got the point. Yes that would be a problem to check this field on raw backups except if the page size matches the kernel's one at 4k. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 7:19 PM, Michael Paquier wrote: > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: >> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >>> As it is, there are backup solutions which *do* check the checksum when >>> backing up PG. This is no longer, thankfully, some hypothetical thing, >>> but something which really exists and will hopefully keep users from >>> losing data. >> >> Wouldn't that have issues with torn pages? > > Why? What do you foresee here? I would think such backup solutions are > careful enough to ensure correctly the durability of pages so as they > are not partially written. Well, you'd have to keep a read(fd, buf, 8192) performed by the backup tool from overlapping with a write(fd, buf, 8192) performed by the backend. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 6:30 PM, Stephen Frost wrote: > I hope to discuss it further after we have the ability to turn it off > easily. I think we should have the ability to flip it in BOTH directions easily. >> Second, really hard to enable is a relative term. I accept that >> enabling checksums is not a pleasant process. Right now, you'd have >> to do a dump/restore, or use logical replication to replicate the data >> to a new cluster and then switch over. On the other hand, if >> checksums are really a critical feature, how are people getting to the >> point where they've got a mission-critical production system and only >> then discovering that they want to enable checksums? > > I truely do wish everyone would come talk to me before building out a > database. Perhaps that's been your experience, in which case, I envy > you, but I tend to get a reaction more along the lines of "wait, what do > you mean I had to pass some option to initdb to enable checksum?!?!". > The fact that we've got a WAL implementation and clearly understand > fsync requirements, why full page writes make sense, and that our WAL > has its own CRCs which isn't possible to disable, tends to lead people > to think we really know what we're doing and that we care a lot about > their data. It sounds to me like you are misleading users about the positives and negatives of checksums, which then causes them to be shocked that they are not the default. > As I have said, I don't believe it has to be on for everyone. For the second time, I didn't say that. But the default has a powerful influence on behavior. If it didn't, you wouldn't be trying to get it changed. > [ unsolicited bragging about an unspecified backup tool, presumably > pgbackrest ] Great. > Presently, last I checked at least, the database system doesn't fall > over and die if a single page's checksum fails. This is another thing that I never said. 
> [ more unsolicited bragging about an unspecified backup tool, presumably still > pgbackrest ] Swell. >> I'm not trying to downplay the usefulness of checksums *in a certain >> context*. It's a good feature, and I'm glad we have it. But I think >> you're somewhat inflating the utility of it while discounting the very >> real costs. > > The costs for checksums don't bother me any more than the costs for WAL > or WAL CRCs or full page writes. Obviously. But I think they should. Frankly, I think the costs for full page writes should bother the heck out of all of us, but the solution isn't to shut them off any more than it is to enable checksums despite the cost. It's to find a way to reduce the costs. > They may not be required on every > system, but they're certainly required on more than 'zero' entirely > reasonable systems which people deploy in their production environments. Nobody said otherwise. > I'd rather walk into an engagement where the user is saying "yeah, we > enabled checksums and it caught this corruption issue" than having to > break the bad news, which I've had to do over and over, that their > existing system hasn't got checksums enabled. This isn't hypothetical, > it's what I run into regularly with entirely reasonable and skilled > engineers who have been deploying PG. Maybe you should just stop telling them and use the time thus freed up to work on improving the checksum feature. I'm skeptical of this whole discussion because you seem to be filled with unalloyed confidence that checksums have little performance impact and will do wonderful things to prevent data loss, whereas I think they have significant performance impact and will only very slightly help to prevent data loss. I admit that the idea of having pgbackrest verify checksums while backing up seems like it could greatly improve the chances of checksums being useful, but I'm not going to endorse changing PostgreSQL's default for pgbackrest's benefit. 
It's got to be to the benefit of PostgreSQL users broadly, not just the subset of those people who use one particular backup tool. Also, the massive hit that will probably occur on high-concurrency OLTP workloads larger than shared_buffers is going to be hard to justify for any amount of backup security. I think that problem's got to be solved or at least mitigated before we think about changing this. I realize that not everyone would set the bar that high, but I see far too many customers with exactly that workload to dismiss it lightly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Checksums by default?
On 2017-01-26 09:19:28 +0900, Michael Paquier wrote: > On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > >> As it is, there are backup solutions which *do* check the checksum when > >> backing up PG. This is no longer, thankfully, some hypothetical thing, > >> but something which really exists and will hopefully keep users from > >> losing data. > > > > Wouldn't that have issues with torn pages? > > Why? What do you foresee here? I would think such backup solutions are > careful enough to ensure correctly the durability of pages so as they > are not partially written. That means you have to replay enough WAL to get into a consistent state... - Andres
Re: [HACKERS] safer node casting
Andres Freund writes: > On 2016-12-31 12:08:22 -0500, Peter Eisentraut wrote: >> RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(lc)); > Are you planning to add this / update this patch? Because I really would > have liked this a number of times already... I can update it according > to my suggestions (to avoid multiple eval scenarios) if helpful Yeah, I'd like that in sooner rather than later, too. But we do need it to be foolproof - no multiple evals. The first draft had inadequate-parenthesization hazards, too. regards, tom lane
Re: [HACKERS] Checksums by default?
On Thu, Jan 26, 2017 at 9:14 AM, Peter Geoghegan wrote: > On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: >> As it is, there are backup solutions which *do* check the checksum when >> backing up PG. This is no longer, thankfully, some hypothetical thing, >> but something which really exists and will hopefully keep users from >> losing data. > > Wouldn't that have issues with torn pages? Why? What do you foresee here? I would think such backup solutions are careful enough to ensure correctly the durability of pages so as they are not partially written. -- Michael
Re: [HACKERS] safer node casting
Hi Peter, On 2016-12-31 12:08:22 -0500, Peter Eisentraut wrote: > RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(lc)); Are you planning to add this / update this patch? Because I really would have liked this a number of times already... I can update it according to my suggestions (to avoid multiple eval scenarios) if helpful Greetings, Andres Freund
Re: [HACKERS] Should buffer of initialization fork have a BM_PERMANENT flag
(Adding Robert in CC.) On Thu, Jan 26, 2017 at 4:34 AM, Wang Hao wrote: > An unlogged table has an initialization fork. The initialization fork does > not have a BM_PERMANENT flag when getting a buffer. > At checkpoint (not shutdown or end of recovery), it will not be written to disk. > After a crash recovery, the page of the initialization fork will not be correct, > which then makes the main fork incorrect too. For init forks the flush absolutely needs to happen, so that's really not good. We ought to fix BufferAlloc() appropriately here. > Here is an example for GIN index. > > create unlogged table gin_test_tbl(i int4[]); > create index gin_test_idx on gin_test_tbl using gin (i); > checkpoint; > > kill all the postgres processes, and restart again. > > vacuum gin_test_tbl; -- crash. > It seems the same problem exists in BRIN, GIN, GiST and HASH indexes, which > use a buffer for meta page initialization in the ambuildempty function. Yeah, other index AMs deal directly with the sync of the page, that's why there is no issue for them. So the patch attached fixes the problem by changing BufferAlloc() in such a way that initialization forks are permanently written to disk, which is what you are suggesting. As a simple fix for back-branches that's enough, though on HEAD I think that we should really rework the empty() routines so that the write goes through shared buffers first, which seems more solid than relying on the smgr routines to do this work. Robert, what do you think? -- Michael unlogged-flush-fix.patch
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost wrote: > As it is, there are backup solutions which *do* check the checksum when > backing up PG. This is no longer, thankfully, some hypothetical thing, > but something which really exists and will hopefully keep users from > losing data. Wouldn't that have issues with torn pages? -- Peter Geoghegan
Re: [HACKERS] Checksums by default?
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 2:23 PM, Stephen Frost wrote: > > Yet, our default is to have them disabled and *really* hard to enable. > > First of all, that could be fixed by further development. I'm certainly all for doing so, but I don't agree that it necessarily is required before we flip the default. That said, if the way to get checksums enabled by default is providing a relatively easy way to turn them off, then that's something which I'll do what I can to help work towards. In other words, I'm not going to continue to argue, given the various opinions of the group, that we should just flip it tomorrow. I hope to discuss it further after we have the ability to turn it off easily. > Second, really hard to enable is a relative term. I accept that > enabling checksums is not a pleasant process. Right now, you'd have > to do a dump/restore, or use logical replication to replicate the data > to a new cluster and then switch over. On the other hand, if > checksums are really a critical feature, how are people getting to the > point where they've got a mission-critical production system and only > then discovering that they want to enable checksums? I truly do wish everyone would come talk to me before building out a database. Perhaps that's been your experience, in which case, I envy you, but I tend to get a reaction more along the lines of "wait, what do you mean I had to pass some option to initdb to enable checksum?!?!". The fact that we've got a WAL implementation and clearly understand fsync requirements, why full page writes make sense, and that our WAL has its own CRCs which can't be disabled, tends to lead people to think we really know what we're doing and that we care a lot about their data. 
> > I agree that it's unfortunate that we haven't put more effort into > > fixing that- I'm all for it, but it's disappointing to see that people > > are not in favor of changing the default as I believe it would both help > > our users and encourage more development of the feature. > > I think it would help some users and hurt others. I do agree that it > would encourage more development of the feature -- almost of > necessity. In particular, I bet it would spur development of an > efficient way of turning checksums off -- but I'd rather see us > approach it from the other direction: let's develop an efficient way > of turning the feature on and off FIRST. Deciding that the feature > has to be on for everyone because turning it on later is too hard for > the people who later decide they want it is letting the tail wag the > dog. As I have said, I don't believe it has to be on for everyone. > Also, I think that one of the big problems with the way checksums work > is that you don't find problems with your archived data until it's too > late. Suppose that in February bits get flipped in a block. You > don't access the data until July[1]. Well, it's nice to have the > system tell you that the data is corrupted, but what are you going to > do about it? By that point, all of your backups are probably > corrupted. So it's basically: If your backup system is checking the checksums when backing up PG, which I think every backup system *should* be doing, then guess what? You've got a backup which you can go back to immediately, possibly with the ability to restore all of the data from WAL. That won't always be the case, naturally, but it's a much better position than simply having a system which continues to degrade until you've actually reached the "you're screwed" level because PG will no longer read a page or perhaps can't even start up, *and* you no longer have any backups. As it is, there are backup solutions which *do* check the checksum when backing up PG. 
This is no longer, thankfully, some hypothetical thing, but something which really exists and will hopefully keep users from losing data. > It's nice to know that (maybe?) but without a recovery strategy a > whole lot of people who get that message are going to immediately > start asking "How do I ignore the fact that I'm screwed and try to > read the data anyway?". And we have options for that. > And then you wonder what the point of having > the feature turned on is, especially if it's costly. It's almost an > attractive nuisance at that point - nobody wants to be the user that > turns off checksums because they sound good on paper, but when you > actually have a problem an awful lot of people are NOT going to want > to try to restore from backup and maybe lose recent transactions. > They're going to want to ignore the checksum failures. That's kind of > awful. Presently, last I checked at least, the database system doesn't fall over and die if a single page's checksum fails. I agree entirely that we want the system to fail gracefully (unless the user instructs us otherwise, perhaps because they have a redundant system that they can flip to immediately). > Pe
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
[ in the service of closing out this thread... ] Peter Geoghegan writes: > Finally, 0003-* is a Valgrind suppression borrowed from my parallel > CREATE INDEX patch. It's self-explanatory. Um, I didn't find it all that self-explanatory. Why wouldn't we want to avoid writing undefined data? I think the comment at least needs to explain exactly what part of the written data might be uninitialized. And I'd put the comment into valgrind.supp, too, not in the commit msg. Also, the suppression seems far too broad. It would for instance block any complaint about a write() invoked via an elog call from any function invoked from any LogicalTape* function, no matter how far removed. regards, tom lane
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
On 2017-01-25 18:04:09 -0500, Stephen Frost wrote: > Andres, > > * Andres Freund (and...@anarazel.de) wrote: > > On 2017-01-25 16:52:38 -0500, Stephen Frost wrote: > > > * Robert Haas (robertmh...@gmail.com) wrote: > > > > Preventing people from calling functions by denying the ability to > > > > meaningfully GRANT EXECUTE on those functions doesn't actually stop > > > > them from delegating those functions to non-superusers. It either (a) > > > > causes them to give superuser privileges to accounts that otherwise > > > > would have had lesser privileges or (b) forces them to use wrapper > > > > functions to grant access rather than doing it directly or (c) some > > > > other dirty hack. If they pick (a), security is worse; if they pick > > > > (b) or (c), you haven't prevented them from doing what they wanted to > > > > do anyway. You've just made it annoying and inconvenient. > > > > > > The notion that security is 'worse' under (a) is flawed- it's no > > > different. > > > > Huh? Obviously that's nonsense, given the pg_ls_dir example. > > Robert's made it clear that he'd like to have a blanket rule that we > don't have superuser checks in these code paths if they can be GRANT'd > at the database level, which goes beyond pg_ls_dir. That seems right to me. I don't see much benefit for the superuser() style checks, with a few exceptions. Granting by default is obviously an entirely different question. > If the question was only about pg_ls_dir, then I still wouldn't be for > it, because, as the bits you didn't quote discussed, it encourages users > and 3rd party tool authors to base more things off of pg_ls_dir to > look into the way PG stores data on disk, and affords more access than > the monitoring user has any need for, none of which are good, imv. It > also discourages people from implementing proper solutions when you can > 'just use pg_ls_dir()', which I also don't agree with. 
In other words, you're trying to force people to do stuff your preferred way, instead of allowing them to get things done in a reasonable manner. > If you really want to do an ls, go talk to the OS. ACLs are possible to > provide that with more granularity than what would be available through > pg_ls_dir(). We aren't in the "give a user the ability to do an ls" > business, frankly. Wut. > > > I am suggesting that we shouldn't make it look like there are > > > distinctions when there is actually no difference. That is a good thing > > > for our users because it keeps them informed about what they're actually > > > doing when they grant access. > > > > This position doesn't make a lick of sense to me. There's simply no > > benefit at all in requiring to create wrapper functions, over allowing > > to grant to non-superuser. Both is possible, secdef is a lot harder to > > get right. And you already heavily started down the path of removing > > superuser() type checks - you're just arguing to make it more or less > > randomly less consistent. > > I find this bizarre considering I went through a detailed effort to go > look at every superuser check in the system and discussed, on this list, > the reasoning behind each and every one of them. I do generally > consider arbitrary access to syscalls via the database to be a privilege > which really only the superuser should have. Just because you argued doesn't mean I agree. Greetings, Andres Freund
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
On Wed, Jan 25, 2017 at 3:11 PM, Tom Lane wrote: > Please. You might want to hit the existing ones with a separate patch, > but it doesn't much matter; I'd be just as happy with a patch that did > both things. Got it. -- Peter Geoghegan
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
Peter Geoghegan writes: > It means "another call to tuplesort_gettupleslot", but I believe that > it would be safer (more future-proof) to actually specify "the slot > contents may be invalidated by any subsequent manipulation of the > tuplesort's state" instead. WFM. >> There are several other uses of "call here", both in this patch and >> pre-existing in tuplesort.c, that I find equally vague and unsatisfactory. >> Let's try to improve that. > Should I write a patch along those lines? Please. You might want to hit the existing ones with a separate patch, but it doesn't much matter; I'd be just as happy with a patch that did both things. regards, tom lane
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
On Wed, Jan 25, 2017 at 2:49 PM, Tom Lane wrote: > I looked at the 0002 patch, and while the code is probably OK, I am > dissatisfied with this API spec: > > + * If copy is TRUE, the slot receives a copied tuple that will stay valid > + * regardless of future manipulations of the tuplesort's state. Memory is > + * owned by the caller. If copy is FALSE, the slot may just receive a > pointer > + * to a tuple held within the tuplesort. The latter is more efficient, but > + * the slot contents may be corrupted if there is another call here before > + * previous slot contents are used. > > What does "here" mean? If that means specifically "another call of > tuplesort_gettupleslot", say so. If "here" refers to the whole module, > it would be better to say something like "the slot contents may be > invalidated by any subsequent manipulation of the tuplesort's state". > In any case it'd be a good idea to delineate safe usage patterns, perhaps > "copy=FALSE is recommended only when the next tuplesort manipulation will > be another tuplesort_gettupleslot fetch into the same slot." I agree with your analysis. It means "another call to tuplesort_gettupleslot", but I believe that it would be safer (more future-proof) to actually specify "the slot contents may be invalidated by any subsequent manipulation of the tuplesort's state" instead. > There are several other uses of "call here", both in this patch and > pre-existing in tuplesort.c, that I find equally vague and unsatisfactory. > Let's try to improve that. Should I write a patch along those lines? -- Peter Geoghegan
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Andres, * Andres Freund (and...@anarazel.de) wrote: > On 2017-01-25 16:52:38 -0500, Stephen Frost wrote: > > * Robert Haas (robertmh...@gmail.com) wrote: > > > Preventing people from calling functions by denying the ability to > > > meaningfully GRANT EXECUTE on those functions doesn't actually stop > > > them from delegating those functions to non-superusers. It either (a) > > > causes them to give superuser privileges to accounts that otherwise > > > would have had lesser privileges or (b) forces them to use wrapper > > > functions to grant access rather than doing it directly or (c) some > > > other dirty hack. If they pick (a), security is worse; if they pick > > > (b) or (c), you haven't prevented them from doing what they wanted to > > > do anyway. You've just made it annoying and inconvenient. > > > > The notion that security is 'worse' under (a) is flawed- it's no > > different. > > Huh? Obviously that's nonsense, given the pg_ls_dir example. Robert's made it clear that he'd like to have a blanket rule that we don't have superuser checks in these code paths if they can be GRANT'd at the database level, which goes beyond pg_ls_dir. If the question was only about pg_ls_dir, then I still wouldn't be for it, because, as the bits you didn't quote discussed, it encourages users and 3rd party tool authors to base more things off of pg_ls_dir to look into the way PG stores data on disk, and affords more access than the monitoring user has any need for, none of which are good, imv. It also discourages people from implementing proper solutions when you can 'just use pg_ls_dir()', which I also don't agree with. If you really want to do an ls, go talk to the OS. ACLs are possible to provide that with more granularity than what would be available through pg_ls_dir(). We aren't in the "give a user the ability to do an ls" business, frankly. 
> > With regard to 'b', if their wrapper function is > > sufficiently careful to ensure that the caller isn't able to do anything > > which would increase the caller's level to that of superuser, then > > security is improved. > > Given how complex "sufficiently careful" is for security definer UDFs, > in comparison to estimating the security of granting to a function like > pg_ls_dir (or pretty much any other that doesn't call out to SQL level > stuff like operators, output functions, etc), I don't understand this. If you're implying that security definer UDFs are hard to write and get correct, then I agree with you there. I was affording the benefit of the doubt to that proposed approach. > > If the wrapper simply turns around and calls the underlying function, > > then it's no different from '(a)'. > > Except for stuff like search path. If you consider '(a)' to be the same as superuser, which I was postulating, then this doesn't strike me as making terribly much difference. > > I am suggesting that we shouldn't make it look like there are > > distinctions when there is actually no difference. That is a good thing > > for our users because it keeps them informed about what they're actually > > doing when they grant access. > > This position doesn't make a lick of sense to me. There's simply no > benefit at all in requiring to create wrapper functions, over allowing > to grant to non-superuser. Both is possible, secdef is a lot harder to > get right. And you already heavily started down the path of removing > superuser() type checks - you're just arguing to make it more or less > randomly less consistent. I find this bizarre considering I went through a detailed effort to go look at every superuser check in the system and discussed, on this list, the reasoning behind each and every one of them. I do generally consider arbitrary access to syscalls via the database to be a privilege which really only the superuser should have. 
> > I've commented on here and spoken numerous times about exactly that goal > > of reducing the checks in check_postgres.pl which require superuser. > > You're not actually doing that and nothing you've outlined in here so > > far makes me believe you see how having pg_write_file() access is > > equivalent to giving someone superuser, and that concerns me. > > That's the user's responsibility, and Robert didn't really suggest > granting pg_write_file() permissions, so this seems to be a straw man. He was not arguing for only pg_ls_dir(), but for checks in all "friends", which he later clarified to include pg_write_file(). Thanks! Stephen
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
Peter Geoghegan writes: > I should have already specifically pointed out that the original > discussion on what became 0002-* is here: > postgr.es/m/7256.1476711...@sss.pgh.pa.us > As I said already, the general idea seems uncontroversial. I looked at the 0002 patch, and while the code is probably OK, I am dissatisfied with this API spec: + * If copy is TRUE, the slot receives a copied tuple that will stay valid + * regardless of future manipulations of the tuplesort's state. Memory is + * owned by the caller. If copy is FALSE, the slot may just receive a pointer + * to a tuple held within the tuplesort. The latter is more efficient, but + * the slot contents may be corrupted if there is another call here before + * previous slot contents are used. What does "here" mean? If that means specifically "another call of tuplesort_gettupleslot", say so. If "here" refers to the whole module, it would be better to say something like "the slot contents may be invalidated by any subsequent manipulation of the tuplesort's state". In any case it'd be a good idea to delineate safe usage patterns, perhaps "copy=FALSE is recommended only when the next tuplesort manipulation will be another tuplesort_gettupleslot fetch into the same slot." There are several other uses of "call here", both in this patch and pre-existing in tuplesort.c, that I find equally vague and unsatisfactory. Let's try to improve that. regards, tom lane
Re: [HACKERS] patch: function xmltable
2017-01-25 23:33 GMT+01:00 Andres Freund : > On 2017-01-25 22:51:37 +0100, Pavel Stehule wrote: > > 2017-01-25 22:40 GMT+01:00 Andres Freund : > > > > I afraid when I cannot to reuse a SRF infrastructure, I have to > > > reimplement > > > > it partially :( - mainly for usage in "ROWS FROM ()" > > > > > > > The TableExpr implementation is based on SRF now. You and Alvaro propose > > an independent implementation as a generic executor node. I am sceptical > > that FunctionScan supports reading from a generic executor node. > > Why would it need to? > Simply for consistency with any other function that can return rows. Maybe I don't understand Alvaro's proposal well - I have the XMLTABLE function - a TableExpr that looks like an SRF function and has similar behavior - it returns multiple rows - but it would be implemented significantly differently and would have different limits - it should not be usable in various places ... It is hard for me to see the consistency there. Regards Pavel
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Hi, On 2017-01-25 16:52:38 -0500, Stephen Frost wrote: > * Robert Haas (robertmh...@gmail.com) wrote: > > Preventing people from calling functions by denying the ability to > > meaningfully GRANT EXECUTE on those functions doesn't actually stop > > them from delegating those functions to non-superusers. It either (a) > > causes them to give superuser privileges to accounts that otherwise > > would have had lesser privileges or (b) forces them to use wrapper > > functions to grant access rather than doing it directly or (c) some > > other dirty hack. If they pick (a), security is worse; if they pick > > (b) or (c), you haven't prevented them from doing what they wanted to > > do anyway. You've just made it annoying and inconvenient. > > The notion that security is 'worse' under (a) is flawed- it's no > different. Huh? Obviously that's nonsense, given the pg_ls_dir example. > With regard to 'b', if their wrapper function is > sufficiently careful to ensure that the caller isn't able to do anything > which would increase the caller's level to that of superuser, then > security is improved. Given how complex "sufficiently careful" is for security definer UDFs, in comparison to estimating the security of granting to a function like pg_ls_dir (or pretty much any other that doesn't call out to SQL level stuff like operators, output functions, etc), I don't understand this. > If the wrapper simply turns around and calls the underlying function, > then it's no different from '(a)'. Except for stuff like search path. > I am suggesting that we shouldn't make it look like there are > distinctions when there is actually no difference. That is a good thing > for our users because it keeps them informed about what they're actually > doing when they grant access. This position doesn't make a lick of sense to me. There's simply no benefit at all in requiring wrapper functions over allowing a direct grant to a non-superuser. Both are possible, but secdef is a lot harder to get right. 
And you have already started heavily down the path of removing superuser() type checks - you're just arguing to make it less consistent in a more or less random way. > I've commented on here and spoken numerous times about exactly that goal > of reducing the checks in check_postgres.pl which require superuser. > You're not actually doing that and nothing you've outlined in here so > far makes me believe you see how having pg_write_file() access is > equivalent to giving someone superuser, and that concerns me. That's the user's responsibility, and Robert didn't really suggest granting pg_write_file() permissions, so this seems to be a straw man. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: function xmltable
On 2017-01-25 22:51:37 +0100, Pavel Stehule wrote: > 2017-01-25 22:40 GMT+01:00 Andres Freund : > > > I afraid when I cannot to reuse a SRF infrastructure, I have to > > reimplement > > > it partially :( - mainly for usage in "ROWS FROM ()" > > > > The TableExpr implementation is based on SRF now. You and Alvaro propose > independent implementation like generic executor node. I am sceptic so > FunctionScan supports reading from generic executor node. Why would it need to? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 3:17 PM, Stephen Frost wrote: > > Your email is 'pg_ls_dir & friends', which I took to imply at *least* > > pg_read_file() and pg_read_binary_file(), and it's not unreasonable to > > think you may have meant everything in adminpack, which also includes > > pg_file_write(), pg_file_rename() and pg_file_unlink(). > > Well, I was talking about genfile.c, which doesn't contain those > functions, but for the record I'm in favor of extirpating every single > hard-coded superuser check from the backend without exception. Then it was correct for me to assume that's what you meant, which means my reaction and response are on point. > Preventing people from calling functions by denying the ability to > meaningfully GRANT EXECUTE on those functions doesn't actually stop > them from delegating those functions to non-superusers. It either (a) > causes them to give superuser privileges to accounts that otherwise > would have had lesser privileges or (b) forces them to use wrapper > functions to grant access rather than doing it directly or (c) some > other dirty hack. If they pick (a), security is worse; if they pick > (b) or (c), you haven't prevented them from doing what they wanted to > do anyway. You've just made it annoying and inconvenient. The notion that security is 'worse' under (a) is flawed - it's no different. With regard to 'b', if their wrapper function is sufficiently careful to ensure that the caller isn't able to do anything which would increase the caller's level to that of superuser, then security is improved. If the wrapper simply turns around and calls the underlying function, then it's no different from '(a)'. I am suggesting that we shouldn't make it look like there are distinctions when there is actually no difference. That is a good thing for our users because it keeps them informed about what they're actually doing when they grant access. 
> Both EnterpriseDB's PEM and check_postgres.pl currently have a bunch > of things that don't work unless you run as superuser. I think we > should be trying to provide ways of reducing those lists. If I can't > get agreement on method (a), I'm going to go look for ways of doing > (b) or (c), which is more work and uglier but if I can't get consensus > here then oh well. I've commented on here and spoken numerous times about exactly that goal of reducing the checks in check_postgres.pl which require superuser. You're not actually doing that, and nothing you've outlined here so far makes me believe you see how having pg_write_file() access is equivalent to giving someone superuser, and that concerns me. As someone who has contributed code and committed code back to check_postgres.pl, I would be against making changes there which install security definer functions to give the monitoring user superuser-level access, and I believe Greg S-M would feel the same way considering that he and I have discussed exactly that in the past. I don't mean to speak for him and perhaps his opinion has changed, but it seems unlikely to me. If the DBA wants to give the monitoring user superuser-level access to run the superuser-requiring checks that check_postgres.pl has, they're welcome to do so, but they'll be making an informed decision that they have weighed the risk of their monitoring user being compromised against the value of that additional monitoring, which is an entirely appropriate and reasonable decision for them to make. > I do not accept your authority to determine what monitoring tools need > to or should do. Monitoring tools that use pg_ls_dir are facts on the > ground, and your disapprobation doesn't change that at all. It just > obstructs moving to a saner permissions framework. Allowing GRANT to be used to give access to pg_write_file and friends is not a 'permissions framework'. 
Further, I am not claiming authority over what monitoring tools should need to do or not do, and a great many people run their monitoring tools as superuser. I am not trying to take away their ability to do so. The way to make progress here is not, however, to just decide that all those superuser() checks we put in place were silly and should be removed, it's to provide better ways to monitor PG which provide exactly the monitoring information needed in a useful and sensible way. I understand the allure of just removing a few lines of code to make things "easier" or "faster" or "better", but I don't think it's a good idea to remove these superuser checks, nor do I think it's a good idea to remove our WAL CRCs even if it'd help some of my clients, nor do I think it's a good idea to have checksums disabled by default, or to rip out all of WAL as being "unnecessary" because we have ZFS and it should handle all things data and we could just fsync all of the heap files on commit and be done with it. Admittedly, you're not arguing for half of what I just mentioned, but I'm not arguing
Re: [HACKERS] patch: function xmltable
2017-01-25 22:40 GMT+01:00 Andres Freund : > Hi, > > > > > I'll try to explain my motivation. Please, check it and correct me > if I > > > am > > > > wrong. I don't keep on my implementation - just try to implement > XMLTABLE > > > > be consistent with another behave and be used all time without any > > > > surprise. > > > > > > > > 1. Any function that produces a content can be used in target list. > We > > > > support SRF in target list and in FROM part. Why XMLTABLE should be a > > > > exception? > > > > > > targetlist SRFs were a big mistake. They cause a fair number of > problems > > > code-wise. They permeated for a long while into bits of both planner > and > > > executor, where they really shouldn't belong. Even after the recent > > > changes there's a fair amount of uglyness associated with them. We > > > can't remove tSRFs for backward compatibility reasons, but that's not > > > true for XMLTABLE > > > > > > > > > > > ok > > > > I afraid when I cannot to reuse a SRF infrastructure, I have to > reimplement > > it partially :( - mainly for usage in "ROWS FROM ()" > The TableExpr implementation is based on SRF now. You and Alvaro propose an independent implementation, like a generic executor node. I am sceptical that FunctionScan supports reading from a generic executor node. Regards Pavel > Huh? > > Greetings, > > Andres Freund >
Re: [HACKERS] PATCH: recursive json_populate_record()
Nikita Glukhov writes: > On 25.01.2017 23:58, Tom Lane wrote: >> I think you need to take a second look at the code you're producing >> and realize that it's not so clean either. This extract from >> populate_record_field, for example, is pretty horrid: > But what if we introduce some helper macros like this: > #define JsValueIsNull(jsv) \ > ((jsv)->is_json ? !(jsv)->val.json.str \ > : !(jsv)->val.jsonb || (jsv)->val.jsonb->type == jbvNull) > #define JsValueIsString(jsv) \ > ((jsv)->is_json ? (jsv)->val.json.type == JSON_TOKEN_STRING \ > : (jsv)->val.jsonb && (jsv)->val.jsonb->type == jbvString) Yeah, I was wondering about that too. I'm not sure that you can make a reasonable set of helper macros that will fix this, but if you want to try, go for it. BTW, just as a stylistic thing, I find "a?b:c||d" unreadable: I have to go back to the manual every darn time to convince myself whether that means (a?b:c)||d or a?b:(c||d). It's better not to rely on the reader (... or the author) having memorized C's precedence rules in quite that much detail. Extra parens help. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multivariate statistics (v19)
On Wed, Jan 25, 2017 at 9:56 PM, Alvaro Herrera wrote: > Michael Paquier wrote: >> And nothing has happened since. Are there people willing to review >> this patch and help it proceed? > > I am going to grab this patch as committer. Thanks, that's good to know. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: function xmltable
Hi, > > > I'll try to explain my motivation. Please, check it and correct me if I > > am > > > wrong. I don't keep on my implementation - just try to implement XMLTABLE > > > be consistent with another behave and be used all time without any > > > surprise. > > > > > > 1. Any function that produces a content can be used in target list. We > > > support SRF in target list and in FROM part. Why XMLTABLE should be a > > > exception? > > > > targetlist SRFs were a big mistake. They cause a fair number of problems > > code-wise. They permeated for a long while into bits of both planner and > > executor, where they really shouldn't belong. Even after the recent > > changes there's a fair amount of uglyness associated with them. We > > can't remove tSRFs for backward compatibility reasons, but that's not > > true for XMLTABLE > > > > > > > ok > > I afraid when I cannot to reuse a SRF infrastructure, I have to reimplement > it partially :( - mainly for usage in "ROWS FROM ()" Huh? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: recursive json_populate_record()
On 25.01.2017 23:58, Tom Lane wrote: I think you need to take a second look at the code you're producing and realize that it's not so clean either. This extract from populate_record_field, for example, is pretty horrid: But what if we introduce some helper macros like this: #define JsValueIsNull(jsv) \ ((jsv)->is_json ? !(jsv)->val.json.str \ : !(jsv)->val.jsonb || (jsv)->val.jsonb->type == jbvNull) #define JsValueIsString(jsv) \ ((jsv)->is_json ? (jsv)->val.json.type == JSON_TOKEN_STRING \ : (jsv)->val.jsonb && (jsv)->val.jsonb->type == jbvString) /* prepare column metadata cache for the given type */ if (col->typid != typid || col->typmod != typmod) prepare_column_cache(col, typid, typmod, mcxt, jsv->is_json); *isnull = JsValueIsNull(jsv); typcat = col->typcat; /* try to convert json string to a non-scalar type through input function */ if (JsValueIsString(jsv) && (typcat == TYPECAT_ARRAY || typcat == TYPECAT_COMPOSITE)) typcat = TYPECAT_SCALAR; /* we must perform domain checks for NULLs */ if (*isnull && typcat != TYPECAT_DOMAIN) return (Datum) 0; -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
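For what it's worth, the helper-macro idea is easy to prototype. Here is a rough Python analogue (the JsValue fields and type tags are simplified stand-ins, not the actual patch structures) of how JsValueIsNull / JsValueIsString would hide the per-format conditional behind one predicate each:

```python
from dataclasses import dataclass
from typing import Optional

# Toy tagged union mirroring the proposed JsValue: exactly one "side"
# (json or jsonb) is meaningful, selected by is_json.  The jsonb side
# is modelled as a simple type tag; None stands for a missing value.
@dataclass
class JsValue:
    is_json: bool
    json_str: Optional[str] = None    # json: raw token text (None => null)
    json_type: Optional[str] = None   # json: token type, e.g. "string"
    jsonb_type: Optional[str] = None  # jsonb: value type, e.g. "null"/"string"

def js_value_is_null(jsv):
    # counterpart of the proposed JsValueIsNull macro
    if jsv.is_json:
        return jsv.json_str is None
    return jsv.jsonb_type is None or jsv.jsonb_type == "null"

def js_value_is_string(jsv):
    # counterpart of the proposed JsValueIsString macro
    if jsv.is_json:
        return jsv.json_type == "string"
    return jsv.jsonb_type == "string"

# With the predicates factored out, the caller reads like the cleaned-up
# populate_record_field excerpt: one branch-free line per decision.
v = JsValue(is_json=True, json_str='"hi"', json_type="string")
print(js_value_is_null(v), js_value_is_string(v))  # False True
```

The is_json conditionals still exist, of course; the question Tom raises is whether hiding them this way is enough, or whether the two formats should be split into separate code paths entirely.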
Re: [HACKERS] patch: function xmltable
> > > > > > > If you plan to hold support SRF in target list, then nothing is different. > > In last patch is executed under nodeProjectSet. > > It is, because we suddenly need to call different functions - and I'm > revamping most of execQual to have an opcode dispatch based execution > model (which then also can be JITed). > > > > > XMLTABLE is specified by the standard to return multiple rows ... but > > > > then as far as my reading goes, it is only supposed to be supported > in > > > > the range table (FROM clause) not in the target list. I wonder if > > > > this would end up better if we only tried to support it in RT. I > asked > > > > Pavel to implement it like that a few weeks ago, but ... > > > > > > Right - it makes sense in the FROM list - but then it should be an > > > executor node, instead of some expression thingy. > > > > > > > The XMLTABLE function is from user perspective, from implementation > > perspective a form of SRF function. I use own executor node, because > fcinfo > > is complex already and not too enough to hold all information about > result > > columns. > > > > The implementation as RT doesn't reduce code - it is just moving to > > different file. > > You're introducing a wholly separate callback system (TableExprRoutine) > for the new functionality. And that stuff is excruciatingly close to > stuff that the normal executor already knows how to do. > These callbacks isolate the TableExpr infrastructure from the TableExpr implementation; this design is prepared for reuse by the JSON_TABLE function. Wherever the TableExpr code is placed, it should not affect this callback system (or I am completely off and the executor is able to do some work that is hidden from me). > > > > > I'll try to explain my motivation. Please, check it and correct me if I > am > > wrong. I don't keep on my implementation - just try to implement XMLTABLE > > be consistent with another behave and be used all time without any > > surprise. > > > > 1. 
Any function that produces a content can be used in target list. We > > support SRF in target list and in FROM part. Why XMLTABLE should be a > > exception? > > targetlist SRFs were a big mistake. They cause a fair number of problems > code-wise. They permeated for a long while into bits of both planner and > executor, where they really shouldn't belong. Even after the recent > changes there's a fair amount of uglyness associated with them. We > can't remove tSRFs for backward compatibility reasons, but that's not > true for XMLTABLE > > > ok I'm afraid that if I cannot reuse the SRF infrastructure, I will have to partially reimplement it :( - mainly for usage in "ROWS FROM ()" Greetings, > > Andres Freund >
Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..
Hi, Synthetic PG workload or real world production workload? Both might work, production-like has bigger pull, but I'd guess synthetic is good enough. Thanks! The box should get PostgreSQL in the not too distant future. It'll get a backup from prod, but will act as new prod, so it might take some time until a job can be run and a profile collected. So how would I do a perf profile that would be acceptable as proof? You'd have to look at cpu time, not number of syscalls. IIRC I suggested doing a cycles profile with -g and then using "perf report --children" to see how many cycles are spent somewhere below lseek. Understood. Either profile manually or expand the function. I'd also suggest sharing a cycles profile, it's quite likely that the overhead is completely elsewhere. Yeah, could be. It'll be interesting to see for sure. I should get a chance to collect such a profile and then I'll post it back here - /Tobias -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 12:23 PM, Robert Haas wrote: > Also, I think that one of the big problems with the way checksums work > is that you don't find problems with your archived data until it's too > late. Suppose that in February bits get flipped in a block. You > don't access the data until July[1]. Well, it's nice to have the > system tell you that the data is corrupted, but what are you going to > do about it? By that point, all of your backups are probably > corrupted. So it's basically: > > ERROR: you're screwed > > It's nice to know that (maybe?) but without a recovery strategy a > whole lot of people who get that message are going to immediately > start asking "How do I ignore the fact that I'm screwed and try to > read the data anyway?". That's also how I tend to think about it. I understand that my experience with storage devices is unusually narrow compared to everyone else here. That's why I remain neutral on the high level question of whether or not we ought to enable checksums by default. I'll ask other hackers to answer what may seem like a very naive question, while bearing what I just said in mind. The question is: Have you ever actually seen a checksum failure in production? And, if so, how helpful was it? I myself have not, despite the fact that Heroku uses checksums wherever possible, and has the technical means to detect problems like this across the entire fleet of customer databases. Not even once. This is not what I would have expected myself several years ago. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : For Auto-Prewarm.
On Wed, Jan 25, 2017 at 2:46 PM, Jim Nasby wrote: > I think the two need to be integrated much better than they are right now. > They should certainly be in the same .so, and as others have mentioned the > docs need to be fixed. For consistency, I think the name should just be > pg_prewarm, as well as the prefix for the GUC. Yikes. +1, definitely. > It would also be handy of those functions > accepted a different filename. That way you could reset shared_buffers to a > known condition before running a test. That would have some pretty unpleasant security implications unless it is awfully carefully thought out. I doubt this has enough utility to make it worth thinking that hard about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
On Wed, Jan 25, 2017 at 3:17 PM, Stephen Frost wrote: > Your email is 'pg_ls_dir & friends', which I took to imply at *least* > pg_read_file() and pg_read_binary_file(), and it's not unreasonable to > think you may have meant everything in adminpack, which also includes > pg_file_write(), pg_file_rename() and pg_file_unlink(). Well, I was talking about genfile.c, which doesn't contain those functions, but for the record I'm in favor of extirpating every single hard-coded superuser check from the backend without exception. Preventing people from calling functions by denying the ability to meaningfully GRANT EXECUTE on those functions doesn't actually stop them from delegating those functions to non-superusers. It either (a) causes them to give superuser privileges to accounts that otherwise would have had lesser privileges or (b) forces them to use wrapper functions to grant access rather than doing it directly or (c) some other dirty hack. If they pick (a), security is worse; if they pick (b) or (c), you haven't prevented them from doing what they wanted to do anyway. You've just made it annoying and inconvenient. Both EnterpriseDB's PEM and check_postgres.pl currently have a bunch of things that don't work unless you run as superuser. I think we should be trying to provide ways of reducing those lists. If I can't get agreement on method (a), I'm going to go look for ways of doing (b) or (c), which is more work and uglier but if I can't get consensus here then oh well. >> I don't really think it's necessary to outline the use case more than >> I have already. It's perfectly reasonable to want a monitoring tool >> to have access to pg_ls_dir() - for example, you could use that to >> monitor for relation files orphaned by a previous crash. > > Sure. What I believe would be better would be a function in PG that > allows you to know about all of the relation files which were orphaned > from a previous crash, and perhaps even one to go clean all of those up. 
> I certainly would have no issue with both of those being available to a > non-superuser. > > Sure, you could do the same with pg_ls_dir(), various complex bits of > SQL to figure out what's been orphaned, and pg_file_unlink() to handle > the nasty part of unlinking the files, but you could also use > pg_ls_dir() to look at the files in PG's home directory, or > pg_file_unlink() to remove the PG user's .bashrc. > > Does the monitoring tool need to be able to see the files in PG's root > directory, or to be able to unlink the PG user's .bashrc? No. I do not accept your authority to determine what monitoring tools need to or should do. Monitoring tools that use pg_ls_dir are facts on the ground, and your disapprobation doesn't change that at all. It just obstructs moving to a saner permissions framework. I agree that pg_file_unlink() is an unlikely need for a monitoring tool, but (1) conflating pg_ls_dir with pg_file_unlink is a bit unfair and (2) there's no actual point whatsoever in trying to restrict to whom the superuser can give permissions, because the superuser has numerous ways of working around any such restriction. > I don't think it's nannyism to keep things like pg_read_file and > pg_file_write as superuser-only. I entirely disagree. It is for the DBA to decide to whom and for what purpose he wishes to give permissions. And if somebody writes a tool -- monitoring or otherwise -- that needs those permissions, the DBA should be empowered to give those permissions and no more, and should be empowered to do that in a nice, clean way without a lot of hassle. The idea that you or I would try to decide which functions a user is allowed to want to grant access to and which ones they are not allowed to want to grant access to seems laughable to me. We're not smart enough to foresee every possible use case, and we shouldn't try. 
Even for a function that theoretically allows a privilege escalation to superuser (like pg_write_file), it's got to be better to give a user account permission to use that one function than to just give up and give that account superuser privileges, because escalating privileges with that blunt instrument would take more work than a casual hacker would be willing to put in. For a function like the one that this thread actually started out being about, like pg_ls_dir, it's vastly better. You mention that you're not sure that pg_ls_dir() would allow a privilege escalation to superuser, but I think we both know that there is probably no such escalation. Why you'd rather have people running check_postgres.pl's wal_files check under an actual superuser account rather than an account that only has rights to do pg_ls_dir is beyond me. Your response will no doubt be that there should otherwise be some other way to write that check that doesn't use pg_ls_dir. Maybe. But the thing is that pg_ls_dir is a general tool. You can use it to do a lot of things, and you can write a monitoring probe that does those thing
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
Looking at your 0002 patch now. It no longer applies, but the conflicts are trivial to fix. Please rebase and resubmit. I think the way WARM works has been pretty well hammered by now, other than the CREATE INDEX CONCURRENTLY issues, so I'm looking at the code from a maintainability point of view only. I think we should have some test harness for WARM as part of the source repository. A test that runs for an hour hammering the machine to highest possible load cannot be run in "make check", of course, but we could have some specific Make target to run it manually. We don't have this for any other feature, but this looks like a decent place to start. Maybe we should even do it before going any further. The test code you submitted looks OK to test the feature, but I'm not in love with it enough to add it to the repo. Maybe I will spend some time trying to convert it to Perl using PostgresNode. I think having the "recheck" index methods create an ExecutorState looks out of place. How difficult is it to pass the estate from the calling code? IMO heap_get_root_tuple_one should be called just heap_get_root_tuple(). That function and its plural sibling heap_get_root_tuples() should indicate in their own comments what the expectations are regarding the root_offsets output argument, rather than deferring to the comments in the "internal" function, since they differ on that point; for the rest of the invariants I think it makes sense to say "Also see the comment for heap_get_root_tuples_internal". I wonder if heap_get_root_tuple should just return the ctid instead of assigning the value to a passed-in pointer, i.e.

OffsetNumber
heap_get_root_tuple(Page page, OffsetNumber target_offnum)
{
    OffsetNumber off;

    heap_get_root_tuples_internal(page, target_offnum, &off);
    return off;
}

The simple_heap_update + CatalogUpdateIndexes pattern is getting obnoxious. 
How about creating something like catalog_heap_update which does both things at once, and stop bothering each callsite with the WARM stuff? In fact, given that CatalogUpdateIndexes is used in other places, maybe we should leave its API alone and create another function, so that we don't have to change the many places that only do simple_heap_insert. (Places like OperatorCreate which do either insert or update could just move the index update call into each branch.) I'm not real sure about the interface between index AM and executor, namely IndexScanDesc->xs_tuple_recheck. For example, this pattern: if (!scan->xs_recheck) scan->xs_tuple_recheck = false; else scan->xs_tuple_recheck = true; can become simply scan->xs_tuple_recheck = scan->xs_recheck; which looks odd. I can't pinpoint exactly what the problem is, though. I'll continue looking at this one. I wonder if heap_hot_search_buffer() and heap_hot_search() should return a tri-valued enum instead of boolean; that idea looks reasonable in theory but callers have to do more work afterwards, so maybe not. I think heap_hot_search() sometimes leaving the buffer pinned is confusing. Really, the whole idea of having heap_hot_search have a buffer output argument is an important API change that should be thought through better. Maybe it'd be better to always return the buffer pinned, with the caller always in charge of unpinning if not InvalidBuffer. Or perhaps we need a completely new function, given how different it is to the original? If you tried to document in the comment above heap_hot_search how it works, you'd find that it's difficult to describe, which'd be an indicator that it's not well considered. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
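The tri-valued-result idea Alvaro floats for heap_hot_search_buffer() can be sketched abstractly. This Python fragment uses invented names (HotSearchResult, classify) purely to illustrate the trade-off: the enum makes the WARM "found but needs recheck" case explicit, at the cost of every caller having to handle three cases instead of one boolean:

```python
from enum import Enum, auto

class HotSearchResult(Enum):
    """Hypothetical tri-valued result for a heap_hot_search-style API:
    distinguish the two 'found' cases a WARM chain can produce."""
    NOT_FOUND = auto()
    FOUND = auto()
    FOUND_NEEDS_RECHECK = auto()  # WARM: index quals must be rechecked

def classify(found, needs_recheck):
    # maps today's (bool found, bool recheck) pair onto the enum
    if not found:
        return HotSearchResult.NOT_FOUND
    return (HotSearchResult.FOUND_NEEDS_RECHECK if needs_recheck
            else HotSearchResult.FOUND)

# The cost Alvaro notes: every caller now dispatches over three cases,
# typically with an explicit match like this, instead of one 'if'.
for found, recheck in [(False, False), (True, False), (True, True)]:
    print(classify(found, recheck).name)
```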
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
2017-01-25 21:06 GMT+01:00 Jim Nasby : > On 1/23/17 11:38 PM, Pavel Stehule wrote: > >> >> Instead of paralleling all the existing namespace stuff, I wonder if >> it'd be better to create explicit block infrastructure. AFAIK >> PRAGMAs are going to have a lot of the same requirements (certainly >> the nesting is the same), and we might want more of this kind of >> stuff in the future. (I've certainly wished I could set a GUC in a >> plpgsql block and have its settings revert when exiting the block...) >> >> >> I am not sure if I understand. ?? Setting GUC by PRAGMA can work - the >> syntax supports it and GUC API supports nesting. Not sure about >> exception handling - but it should not be a problem, probably. >> >> Please, can you show some examples. >> > > From a code standpoint, there's already some ugliness around blocks: > there's the code that handles blocks themselves (which IIRC is responsible > for subtransactions), then there's the namespace code, which is very > separate even though namespaces are very much tied to blocks. Your patch is > adding another layer into the mix, separate from both blocks and > namespaces. I think it would be better to combine all 3 together, or at > least not make matters worse. So IMHO the pragma stuff should be part of > handling blocks, and not something that's stand alone. IE: make the pragma > info live in PLpgSQL_stmt_block. > I don't think that is fully correct: a pragma can be related to a function too, and namespaces can be related to some other statements, such as loops. Every PLpgSQL_stmt_block adds some overhead, and we would probably have to build fake statements to ensure a 1:1 relation between namespaces and blocks. I didn't implement or propose a third, statement-level kind of pragma. For example, assertions in the Ada language are implemented with a pragma. Currently I am not considering this form for Postgres. The cursor options are better stored in the expression; block-related GUCs should probably be stored in stmt_block. 
A pragma is additional information; how that information is used, and what impact it has on the generated code, depends on the particular pragma and can differ. > GUCs support SET LOCAL, but that's not the same as local scoping because > the setting stays in effect unless the substrans aborts. What I'd like is > the ability to set a GUC in a plpgsql block *and have the setting revert on > block exit*. I think it is solvable. Regards Pavel > > -- > Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX > Experts in Analytics, Data Architecture and PostgreSQL > Data in Trouble? Get it in Treble! http://BlueTreble.com > 855-TREBLE2 (855-873-2532) >
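The block-scoped behaviour Jim is asking for (set on entry, revert on exit, even on error) is essentially a save/restore wrapped around the block. Here is a language-neutral sketch of those semantics in Python, with a toy settings table standing in for the real GUC machinery:

```python
from contextlib import contextmanager

# Toy stand-in for the GUC table; not the real GUC API.
settings = {"work_mem": "4MB"}

@contextmanager
def scoped_setting(name, value):
    """Set a value for the duration of a block and revert on exit,
    whether the block succeeds or raises -- the block-scoped behaviour
    being wished for here, which SET LOCAL (transaction-scoped, reverted
    only at transaction end or subtransaction abort) does not give."""
    saved = settings[name]
    settings[name] = value
    try:
        yield
    finally:
        settings[name] = saved   # revert even on error, not just on abort

with scoped_setting("work_mem", "64MB"):
    assert settings["work_mem"] == "64MB"   # inside the "block"
print(settings["work_mem"])                 # reverted: 4MB
```

This also shows why the pragma naturally wants to live with the block structure: the save/restore points are exactly block entry and exit.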
Re: [HACKERS] PATCH: recursive json_populate_record()
Nikita Glukhov writes: > On 22.01.2017 21:58, Tom Lane wrote: >> 5. The business with having some arguments that are only for json and >> others that are only for jsonb, eg in populate_scalar(), also makes me >> itch. > I've refactored json/jsonb passing using new struct JsValue which contains > union for json/jsonb values. I'm not too enamored of that fix. It doesn't do much for readability, and at least with my compiler (gcc 4.4.7), I sometimes get "variable might be used uninitialized" warnings, probably due to not all fields of the union getting set in every code path. >> I wonder whether this wouldn't all get a lot simpler and more >> readable if we gave up on trying to share code between the two cases. > Maybe two separate families of functions like this > ... > could slightly improve readability, but such code duplication (I believe it is > a duplicate code) would never be acceptable to me. I think you need to take a second look at the code you're producing and realize that it's not so clean either. This extract from populate_record_field, for example, is pretty horrid: /* prepare column metadata cache for the given type */ if (col->typid != typid || col->typmod != typmod) prepare_column_cache(col, typid, typmod, mcxt, jsv->is_json); *isnull = jsv->is_json ? jsv->val.json.str == NULL : jsv->val.jsonb == NULL || jsv->val.jsonb->type == jbvNull; typcat = col->typcat; /* try to convert json string to a non-scalar type through input function */ if ((jsv->is_json ? jsv->val.json.type == JSON_TOKEN_STRING : jsv->val.jsonb && jsv->val.jsonb->type == jbvString) && (typcat == TYPECAT_ARRAY || typcat == TYPECAT_COMPOSITE)) typcat = TYPECAT_SCALAR; /* we must perform domain checks for NULLs */ if (*isnull && typcat != TYPECAT_DOMAIN) return (Datum) 0; When every other line contains an is_json conditional, you are not making good readable code. 
It's also worth noting that this is going to become even less readable after pgindent gets done with it: /* prepare column metadata cache for the given type */ if (col->typid != typid || col->typmod != typmod) prepare_column_cache(col, typid, typmod, mcxt, jsv->is_json); *isnull = jsv->is_json ? jsv->val.json.str == NULL : jsv->val.jsonb == NULL || jsv->val.jsonb->type == jbvNull; typcat = col->typcat; /* try to convert json string to a non-scalar type through input function */ if ((jsv->is_json ? jsv->val.json.type == JSON_TOKEN_STRING : jsv->val.jsonb && jsv->val.jsonb->type == jbvString) && (typcat == TYPECAT_ARRAY || typcat == TYPECAT_COMPOSITE)) typcat = TYPECAT_SCALAR; /* we must perform domain checks for NULLs */ if (*isnull && typcat != TYPECAT_DOMAIN) return (Datum) 0; You could maybe improve that result with some different formatting choices, but I think it's basically a readability fail to be relying heavily on multiline x?y:z conditionals in the first place. And I still maintain that it's entirely silly to be structuring populate_scalar() this way. So I really think it'd be a good idea to explore separating the json and jsonb code paths. This direction isn't looking like a win. >> 7. More generally, the patch is hard to review because it whacks the >> existing code around so thoroughly that it's tough even to match up >> old and new code to get a handle on what you changed functionally. >> Maybe it would be good if you could separate it into a refactoring >> patch that just moves existing code around without functional changes, >> and then a patch on top of that that adds code and makes only localized >> changes in what's already there. > I've split this patch into two patches as you asked. That didn't really help :-( ... the 0002 patch is still nigh unreadable. Maybe it's trying to do too much in one step. 
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: function xmltable
Hi, On 2017-01-25 05:45:24 +0100, Pavel Stehule wrote: > 2017-01-25 1:35 GMT+01:00 Andres Freund : > > > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote: > > > Andres Freund wrote: > > > > Hi, > > > > > > > > On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote: > > > > > +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext > > *econtext, > > > > > + bool *isnull); > > > > > +static Datum ExecEvalTableExprFast(TableExprState *exprstate, > > ExprContext *econtext, > > > > > + bool *isNull); > > > > > +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext > > *econtext, > > > > > + bool *isNull); > > > > > +static void tabexprInitialize(TableExprState *tstate, ExprContext > > *econtext, > > > > > + Datum doc); > > > > > +static void ShutdownTableExpr(Datum arg); > > > > > > > > To me this (and a lot of the other code) hints quite strongly that > > > > expression evalution is the wrong approach to implementing this. What > > > > you're essentially doing is building a vulcano style scan node. Even > > if > > > > we can this, we shouldn't double down on the bad decision to have these > > > > magic expressions that return multiple rows. There's historical reason > > > > for tSRFs, but we shouldn't add more weirdness like this. > > > > > > Thanks for giving it a look. I have long thought that this patch would > > > be at odds with your overall executor work. > > > > Not fundamentally, but it makes it harder. > > > > If you plan to hold support SRFin target list, then nothing is different. > In last patch is executed under nodeProjectSet. It is, because we suddenly need to call different functions - and I'm revamping most of execQual to have an opcode dispatch based execution model (which then also can be JITed). > > > XMLTABLE is specified by the standard to return multiple rows ... but > > > then as far as my reading goes, it is only supposed to be supported in > > > the range table (FROM clause) not in the target list. 
I wonder if > > > this would end up better if we only tried to support it in the RT. I asked > > > Pavel to implement it like that a few weeks ago, but ... > > > > Right - it makes sense in the FROM list - but then it should be an > > executor node, instead of some expression thingy. > > > > From the user's perspective XMLTABLE is a function; from the implementation > perspective it is a form of SRF. I use my own executor node because fcinfo > is already complex and still not enough to hold all the information about the result > columns. > Implementing it as an RT entry doesn't reduce the code - it just moves it to a > different file. You're introducing a wholly separate callback system (TableExprRoutine) for the new functionality. And that stuff is excruciatingly close to stuff that the normal executor already knows how to do. > I'll try to explain my motivation. Please check it and correct me if I am > wrong. I am not attached to my implementation - I just try to make XMLTABLE > consistent with other behavior and usable everywhere without any > surprise. > > 1. Any function that produces content can be used in a target list. We > support SRFs in the target list and in the FROM part. Why should XMLTABLE be an > exception? Targetlist SRFs were a big mistake. They cause a fair number of problems code-wise. They permeated for a long while into bits of both planner and executor, where they really shouldn't belong. Even after the recent changes there's a fair amount of ugliness associated with them. We can't remove tSRFs for backward compatibility reasons, but that's not true for XMLTABLE. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 2:23 PM, Stephen Frost wrote: >> Sure. If the database runs fast enough with checksums enabled, >> there's basically no reason to have them turned off. The issue is >> when it doesn't. > > I don't believe we're talking about forcing every user to have checksums > enabled. We are discussing the default. I never said otherwise. > Would you say that most users' databases run fast enough with checksums > enabled? Or more than most, maybe 70%? 80%? In today's environment, > I'd probably say that it's more like 90+%. I don't have statistics on that, but I'd certainly agree that it's over 90%. However, I estimate that the percentage of people who wouldn't be helped by checksums is also over 90%. I don't think it's easy to say whether there are more people who would benefit from checksums than would be hurt by the performance penalty or vice versa. My own feeling is the second, but I understand that yours is the first. > Yet, our default is to have them disabled and *really* hard to enable. First of all, that could be fixed by further development. Second, really hard to enable is a relative term. I accept that enabling checksums is not a pleasant process. Right now, you'd have to do a dump/restore, or use logical replication to replicate the data to a new cluster and then switch over. On the other hand, if checksums are really a critical feature, how are people getting to the point where they've got a mission-critical production system and only then discovering that they want to enable checksums? If you tell somebody "we have an optional feature called checksums and you should really use it" and they respond "well, I'd like to, but I already put my system into critical production use and it's not worth it to me to take downtime to get them enabled", that sounds to me like the feature is nice-to-have, not absolutely essential.
When something is essential, you find a way to get it done, whether it's painful or not, because that's what essential means. And if checksums are not essential, then they shouldn't be enabled by default unless they're very cheap -- and I think we already know that's not true in all workloads. > I agree that it's unfortunate that we haven't put more effort into > fixing that- I'm all for it, but it's disappointing to see that people > are not in favor of changing the default as I believe it would both help > our users and encourage more development of the feature. I think it would help some users and hurt others. I do agree that it would encourage more development of the feature -- almost of necessity. In particular, I bet it would spur development of an efficient way of turning checksums off -- but I'd rather see us approach it from the other direction: let's develop an efficient way of turning the feature on and off FIRST. Deciding that the feature has to be on for everyone because turning it on later is too hard for the people who later decide they want it is letting the tail wag the dog. Also, I think that one of the big problems with the way checksums work is that you don't find problems with your archived data until it's too late. Suppose that in February bits get flipped in a block. You don't access the data until July[1]. Well, it's nice to have the system tell you that the data is corrupted, but what are you going to do about it? By that point, all of your backups are probably corrupted. So it's basically: ERROR: you're screwed It's nice to know that (maybe?) but without a recovery strategy a whole lot of people who get that message are going to immediately start asking "How do I ignore the fact that I'm screwed and try to read the data anyway?". And then you wonder what the point of having the feature turned on is, especially if it's costly. 
It's almost an attractive nuisance at that point - nobody wants to be the user that turns off checksums because they sound good on paper, but when you actually have a problem an awful lot of people are NOT going to want to try to restore from backup and maybe lose recent transactions. They're going to want to ignore the checksum failures. That's kind of awful. Peter's comments upthread get at this: "We need to invest in corruption detection/verification tools that are run on an as-needed basis." Exactly. If we could verify that our data is good before throwing away our old backups, that'd be good. If we could verify that our indexes were structurally sane, that would be superior to anything checksums can ever give us because it catches not only storage failures but also software failures within PostgreSQL itself and user malfeasance above the PostgreSQL layer (e.g. redefining the supposedly-immutable function to give different answers) and damage inflicted inadvertently by environmental changes (e.g. upgrading glibc and having strcoll() change its mind). If we could verify that every XID and MXID in the heap points to a clog or multixact record that still exists, that'
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 8:18 PM, Robert Haas wrote: > Also, it's not as if there are no other ways of checking whether your > disks are failing. SMART, for example, is supposed to tell you about > incipient hardware failures before PostgreSQL ever sees a bit flip. > Surely an average user would love to get a heads-up that their > hardware is failing even when that hardware is not being used to power > PostgreSQL, yet many people don't bother to configure SMART (or > similar proprietary systems provided by individual vendors). You really can't rely on SMART to tell you about hardware failures. 1 in 4 drives fail completely with no SMART indication at all [1]. And for the 1-in-1000 annual checksum failure rate, indicators other than system restarts had only a weak correlation [2]. And this is without filesystem and other OS bugs that SMART knows nothing about. My view may be biased by mostly seeing the cases where things have already gone wrong, but I recommend that support clients turn checksums on unless it's known that write IO is going to be an issue. Especially because I know that if it turns out to be a problem I can go in and quickly hack together a tool to help them turn it off. I do agree that to change the PostgreSQL default, at least some tool to turn it off online should be included. [1] https://www.backblaze.com/blog/what-smart-stats-indicate-hard-drive-failures/ [2] https://www.usenix.org/legacy/event/fast08/tech/full_papers/bairavasundaram/bairavasundaram.pdf Regards, Ants Aasma -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Should buffer of initialization fork have a BM_PERMANENT flag
An unlogged table has an initialization fork. The buffer for the initialization fork does not get the BM_PERMANENT flag when it is obtained, so an ordinary checkpoint (as opposed to a shutdown or end-of-recovery checkpoint) will not write it to disk. After crash recovery, the page of the initialization fork can therefore be incorrect, which in turn makes the rebuilt main fork incorrect too. Here is an example for a GIN index: create unlogged table gin_test_tbl(i int4[]); create index gin_test_idx on gin_test_tbl using gin (i); checkpoint; Kill all the postgres processes and restart. vacuum gin_test_tbl; -- crash. The same problem seems to exist in the BRIN, GIN, GiST and HASH indexes, which use a buffer to initialize the meta page in their ambuildempty functions.
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 2:08 PM, Stephen Frost wrote: > > That isn't what you're doing with those functions though, you're giving > > the monitoring tool superuser-level rights but trying to pretend like > > you're not. > > Uh, how so? None of those functions can be used to escalate to > superuser privileges. I am trying to give SOME superuser privileges > and not others. That IS how good security works. Your email is 'pg_ls_dir & friends', which I took to imply at *least* pg_read_file() and pg_read_binary_file(), and it's not unreasonable to think you may have meant everything in adminpack, which also includes pg_file_write(), pg_file_rename() and pg_file_unlink(). > I don't really think it's necessary to outline the use case more than > I have already. It's perfectly reasonable to want a monitoring tool > to have access to pg_ls_dir() - for example, you could use that to > monitor for relation files orphaned by a previous crash. Sure. What I believe would be better would be a function in PG that allows you to know about all of the relation files which were orphaned from a previous crash, and perhaps even one to go clean all of those up. I certainly would have no issue with both of those being available to a non-superuser. Sure, you could do the same with pg_ls_dir(), various complex bits of SQL to figure out what's been orphaned, and pg_file_unlink() to handle the nasty part of unlinking the files, but you could also use pg_ls_dir() to look at the files in PG's home directory, or pg_file_unlink() to remove the PG user's .bashrc. Does the monitoring tool need to be able to see the files in PG's root directory, or to be able to unlink the PG user's .bashrc? No. > Also, as > mentioned above, I don't think this should have to be litigated for > every single function individually. 
If it's a good idea for a > non-superuser to be able to pg_start_backup() and pg_stop_backup(), > the person doing that has access to read every byte of data in the > filesystem; if they don't, there's no point in giving them access to > run those functions. Access to just pg_ls_dir(), for example, can't > be any more dangerous than that. Indeed, I'd argue that it's a heck > of a lot LESS dangerous. It wasn't litigated for every single function. A review of all cases was performed, and a very lengthy discussion was held about how to give non-superusers access to the functions where that made sense, with the resulting work finally being accepted and included in 9.6. Further, as discussed and explained on the thread I just linked you to, the assumption that a user who can call pg_start/stop_backup() has access to read every byte on the filesystem isn't correct. Am I sure that someone who has pg_ls_dir() could get superuser access on the box? No, but I am sure that, in my opinion, it's frankly a pretty dirty hack to use pg_ls_dir() in a monitoring tool and we should provide better ways to monitor PG to our users. Also, am I sure that we shouldn't ever give a user the ability to arbitrarily list directories on the filesystem? No, but this isn't a good justification for that change. If we *are* going to go down the road of giving users filesystem-like access of any kind then I would *much* rather look at the approach that I outlined before and that other database systems have, which is to have a way to say "user X can read, write, list, do whatever, with files in directory /blah/blah". Perhaps with sub-directory abilities too. Nannyism, if we had that ability, would be to say "you can set DIRECTORY to anything but PGDATA." or "you can't set DIRECTORY to be /". I don't think it's nannyism to keep things like pg_read_file and pg_file_write as superuser-only. Thanks! Stephen
Re: [HACKERS] Checksums by default?
On 01/25/2017 11:41 AM, Tom Lane wrote: Stephen Frost writes: Would you say that most user's databases run fast enough with checksums enabled? Or more than most, maybe 70%? 80%? In today's environment, I'd probably say that it's more like 90+%. It would be nice if there were some actual evidence about this, rather than numbers picked out of the air. I agree that it's unfortunate that we haven't put more effort into fixing that- I'm all for it, but it's disappointing to see that people are not in favor of changing the default as I believe it would both help our users and encourage more development of the feature. I think the really key point is that a whole lot of infrastructure work needs to be done still, and changing the default before that work has been done is not going to be user-friendly. The most pressing issue being the difficulty of changing the setting after the fact. It would be a *whole* lot easier to sell default-on if there were a way to turn it off, and yet you want us to buy into default-on before that way exists. Come back after that feature is in, and we can talk. +1 Sincerely, JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them. Unless otherwise stated, opinions are my own. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Wed, Jan 25, 2017 at 1:54 PM, Masahiko Sawada wrote: > Thank you for updating the patch! > > + /* > +* Quickly rule out by lower bound (should happen a lot) Upper bound > was > +* already checked by segment search > +*/ > + if (vac_cmp_itemptr((void *) itemptr, (void *) rseg->dead_tuples) < 0) > + return false; > > I think that if the above result is 0, we can return true as itemptr > matched lower bound item pointer in rseg->dead_tuples. That's right. Possibly not a great speedup but... why not? > > +typedef struct DeadTuplesSegment > +{ > + int num_dead_tuples;/* # of > entries in the segment */ > + int max_dead_tuples;/* # of > entries allocated in the segment */ > + ItemPointerData last_dead_tuple;/* Copy of the last > dead tuple (unset > + > * until the segment is fully > + > * populated) */ > + unsigned short padding; > + ItemPointer dead_tuples;/* Array of dead tuples */ > +} DeadTuplesSegment; > + > +typedef struct DeadTuplesMultiArray > +{ > + int num_entries;/* current # of entries */ > + int max_entries;/* total # of slots > that can be allocated in > +* array */ > + int num_segs; /* number of > dead tuple segments allocated */ > + int last_seg; /* last dead > tuple segment with data (or 0) */ > + DeadTuplesSegment *dead_tuples; /* array of num_segs segments > */ > +} DeadTuplesMultiArray; > > It's a matter of personal preference but some same dead_tuples > variables having different meaning confused me. > If we want to access first dead tuple location of first segment, we > need to do 'vacrelstats->dead_tuples.dead_tuples.dead_tuples'. For > example, 'vacrelstats->dead_tuples.dt_segment.dt_array' is better to > me. Yes, I can see how that could be confusing. 
I went for vacrelstats->dead_tuples.dt_segments[i].dt_tids[j] > + nseg->num_dead_tuples = 0; > + nseg->max_dead_tuples = 0; > + nseg->dead_tuples = NULL; > + vacrelstats->dead_tuples.num_segs++; > + } > + seg = DeadTuplesCurrentSegment(vacrelstats); > + } > + vacrelstats->dead_tuples.last_seg++; > + seg = DeadTuplesCurrentSegment(vacrelstats); > > Because seg is always set later I think the first line starting with > "seg = ..." is not necessary. Thought? That's correct. Attached a v6 with those changes (and rebased). Make check still passes. From c89019089a71517befac0920f22ed75577dda6c6 Mon Sep 17 00:00:00 2001 From: Claudio Freire Date: Mon, 12 Sep 2016 23:36:42 -0300 Subject: [PATCH] Vacuum: allow using more than 1GB work mem Turn the dead_tuples array into a structure composed of several exponentially bigger arrays, to enable usage of more than 1GB of work mem during vacuum and thus reduce the number of full index scans necessary to remove all dead tids when the memory is available. --- src/backend/commands/vacuumlazy.c| 291 --- src/test/regress/expected/vacuum.out | 8 + src/test/regress/sql/vacuum.sql | 10 ++ 3 files changed, 250 insertions(+), 59 deletions(-) diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c index 005440e..1d2441f 100644 --- a/src/backend/commands/vacuumlazy.c +++ b/src/backend/commands/vacuumlazy.c @@ -93,6 +93,14 @@ #define LAZY_ALLOC_TUPLES MaxHeapTuplesPerPage /* + * Minimum (starting) size of the dead_tuples array segments. Will allocate + * space for 128MB worth of tid pointers in the first segment, further segments + * will grow in size exponentially. Don't make it too small or the segment list + * will grow bigger than the sweetspot for search efficiency on big vacuums. + */ +#define LAZY_MIN_TUPLES Max(MaxHeapTuplesPerPage, (128<<20) / sizeof(ItemPointerData)) + +/* * Before we consider skipping a page that's marked as clean in * visibility map, we must've seen at least this many clean pages. 
*/ @@ -104,6 +112,27 @@ */ #define PREFETCH_SIZE ((BlockNumber) 32) +typedef struct DeadTuplesSegment +{ + int num_dead_tuples; /* # of entries in the segment */ + int max_dead_tuples; /* # of entries allocated in the segment */ + ItemPointerData last_dead_tuple; /* Copy of the last dead tuple (unset + * until the segment is fully + * populated) */ + unsigned short padding; + ItemPointer dt_tids; /* Array of dead tuples */ +} DeadTuplesSegment; + +typedef struct DeadTuplesMultiArray +{ + int num_entries; /* current
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
On 1/23/17 11:38 PM, Pavel Stehule wrote: Instead of paralleling all the existing namespace stuff, I wonder if it'd be better to create explicit block infrastructure. AFAIK PRAGMAs are going to have a lot of the same requirements (certainly the nesting is the same), and we might want more of this kind of stuff in the future. (I've certainly wished I could set a GUC in a plpgsql block and have its settings revert when exiting the block...) I am not sure if I understand. ?? Setting a GUC by PRAGMA can work - the syntax supports it and the GUC API supports nesting. Not sure about exception handling - but it should probably not be a problem. Please, can you show some examples? From a code standpoint, there's already some ugliness around blocks: there's the code that handles blocks themselves (which IIRC is responsible for subtransactions), then there's the namespace code, which is very separate even though namespaces are very much tied to blocks. Your patch is adding another layer into the mix, separate from both blocks and namespaces. I think it would be better to combine all 3 together, or at least not make matters worse. So IMHO the pragma stuff should be part of handling blocks, and not something that's stand alone. IE: make the pragma info live in PLpgSQL_stmt_block. GUCs support SET LOCAL, but that's not the same as local scoping because the setting stays in effect unless the subtrans aborts. What I'd like is the ability to set a GUC in a plpgsql block *and have the setting revert on block exit*. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 2:13 PM, Stephen Frost wrote: > > I went over *every* superuser check in the system when I did that work, > > wrote up a long email about why I made the decisions that I did, posted > > it here, had follow-on discussions, all of which led to the patch which > > ended up going in. > > Link to that email? I went back and looked at that thread and didn't > see anything that looked like a general policy statement to me. But I > may have missed it. Not sure which thread you were looking at, but this one: https://www.postgresql.org/message-id/20141015052259.GG28859%40tamriel.snowman.net Has a review of all superuser checks in the backend, as noted in the first paragraph ("shown below in a review of the existing superuser checks in the backend"). Later on in that thread, at least in: https://www.postgresql.org/message-id/20160106161302.GP3685%40tamriel.snowman.net In an email to you and Noah: The general approach which I've been using for the default roles is that they should grant rights which aren't able to be used to trivially make oneself a superuser. My recollection is that I said that about 10 times during that period, though perhaps I am exaggerating due to it being a rather painful process to get through. > I assume we're > both coming at these issues with the intention of making PostgreSQL > better; Always. > the fact that we don't always agree on everything is probably > inevitable. Agreed. Thanks! Stephen
[HACKERS] Re: [BUGS] Problem in using pgbench's --connect(-C) and --rate=rate(-R rate) options together.
Repost from bugs. -- Fabien. -- Forwarded message -- Date: Wed, 25 Jan 2017 18:59:45 +0100 (CET) From: Fabien COELHO To: nuko yokohama Cc: PostgreSQL Bugs List Subject: Re: [BUGS] Problem in using pgbench's --connect(-C) and --rate=rate(-R rate) options together. It operates normally when only the -C option or only the -R option is specified. In the PostgreSQL documentation, it is not stated that "these two options can not be specified at the same time". Is this a problem of pgbench? Yes, indeed there is a problem. Thanks for the report. Option -C is seldom used and seldom tested. The problem is already fixed in head. Looking at the git log, it was unclear at first which change fixed it... On another reading I got it: it was fixed by Heikki's restructuring patch 12788ae49e1933f463bc59a6efe46c4a01701b76, which is not intended to be backpatched to prior versions... The bug is that before --rate existed, doCustom always disconnected/reconnected without exiting, but with --rate it returns if it has to wait. However, threadRun tests whether there is a connection before calling doCustom again, so doCustom was never called. This is exactly the kind of unmanageable state combination that the refactoring has cleaned up. Attached a small patch which fixes the issue, I think, in 9.6. Fixing it exposed another issue with some stats under -C, which I fixed as well.
-- Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 531671a..1f1b7bf 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -1967,7 +1967,6 @@ top: st->listen = false; st->sleeping = false; st->throttling = false; - st->is_throttled = false; memset(st->prepared, 0, sizeof(st->prepared)); } @@ -4345,6 +4344,12 @@ threadRun(void *arg) remains--; /* I've aborted */ } } + else if (is_connect && st->sleeping) + { +/* it is sleeping for throttling, maybe it is done, let us try */ +if (!doCustom(thread, st, &aggs)) + remains--; + } if (st->ecnt > prev_ecnt && commands[st->state]->type == META_COMMAND) { -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..
Hi, On 2017-01-25 10:16:32 +0100, Tobias Oberstein wrote: > > > Using pread instead of lseek+read halves the syscalls. > > > > > > I really don't understand what you are fighting here .. > > > > Sure, there's some overhead. And as I said upthread, I'm much less > > against this change than Tom. What I'm saying is that your benchmarks > > haven't shown a benefit in a meaningful way, so I don't think I can > > agree with > > > > > "Well, my point remains that I see little value in messing with > > > long-established code if you can't demonstrate a benefit that's clearly > > > above the noise level." > > > > > > I have done lots of benchmarking over the last days on a massive box, and > > > I > > > can provide numbers that I think show that the impact can be significant. > > > > since you've not actually shown that the impact is above the noise level > > when measured with an actual postgres workload. > > I can follow that. > > So real proof cannot be done with FIO, but requires an "actual PG workload". Right. > Synthetic PG workload or real world production workload? Both might work, production-like has bigger pull, but I'd guess synthetic is good enough. > Also: regarding the perf profiles from production that show lseek as the #1 syscall. You'll, depending on your workload, still have a lot of lseeks even if we were to use pread/pwrite, because we do lseek(SEEK_END) to get file sizes. > You said it wouldn't be proof either, because it only shows the number of > syscalls, and though it is clear that millions of syscalls/sec do come with > overhead, it is still not showing "above noise" level relevance (because PG > is such a CPU hog in itself anyways;) Yep. > So how would I do a perf profile that would be acceptable as proof? You'd have to look at cpu time, not number of syscalls. IIRC I suggested doing a cycles profile with -g and then using "perf report --children" to see how many cycles are spent somewhere below lseek.
I'd also suggest sharing a profile cycles profile, it's quite likely that the overhead is completely elsewhere. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
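The halving Tobias describes can be seen directly in a small sketch (Python here for brevity; the actual proposal is a C change inside PostgreSQL's file-access layer, and `os.pread` is POSIX-only):

```python
import os
import tempfile

# Scratch file to read from.
fd, path = tempfile.mkstemp()
os.write(fd, b"0123456789abcdef")

# Traditional two-syscall pattern: position, then read.
os.lseek(fd, 4, os.SEEK_SET)
via_lseek = os.read(fd, 4)

# pread(2) does the same work in one syscall, and it does not
# disturb the file offset (handy for concurrent readers too).
via_pread = os.pread(fd, 4, 4)
offset_after = os.lseek(fd, 0, os.SEEK_CUR)

os.close(fd)
os.remove(path)

print(via_lseek, via_pread, offset_after)  # → b'4567' b'4567' 8
```

Note that, as Andres points out, the `lseek(SEEK_END)` calls used to obtain file sizes would remain even after such a conversion.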
Re: [HACKERS] Proposal : For Auto-Prewarm.
On 1/25/17 1:46 PM, Jim Nasby wrote: Based on that and other feedback I'm going to mark this as returned with feedback, though if you're able to get a revised patch in the next few days please do. Actually, based on the message that popped up when I went to do that I guess it's better not to do that, so I marked it as Waiting on Author. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
Re: [HACKERS] Proposal : For Auto-Prewarm.
On 1/24/17 11:13 PM, Beena Emerson wrote: On Wed, Jan 25, 2017 at 10:36 AM, Jim Nasby <jim.na...@bluetreble.com> wrote: On 1/24/17 2:26 AM, Mithun Cy wrote: Thanks for looking into this patch. I just downloaded the patch and applied it to the latest code; I can see the file "autoprewarm.save" in $PGDATA which is created and dumped at shutdown time, and an activity is logged as below: 2017-01-24 13:22:25.012 IST [91755] LOG: Buffer Dump: saved metadata of 59 blocks. Yeah, I wasn't getting that at all, though I did see the shared library being loaded. If I get a chance I'll try it again. Hope you added the following to the conf file: shared_preload_libraries = 'pg_autoprewarm' # (change requires restart) Therein lay my problem: I was preloading pg_prewarm, not pg_autoprewarm. I think the two need to be integrated much better than they are right now. They should certainly be in the same .so, and as others have mentioned the docs need to be fixed. For consistency, I think the name should just be pg_prewarm, as well as the prefix for the GUC. Based on that and other feedback I'm going to mark this as returned with feedback, though if you're able to get a revised patch in the next few days please do. FYI (and this is just a suggestion), for testing purposes, it might also be handy to allow manual dump and load via functions, with the load function giving you the option to forcibly load (instead of doing nothing if there are no buffers on the free list). It would also be handy if those functions accepted a different filename. That way you could reset shared_buffers to a known condition before running a test. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532)
Re: [HACKERS] Checksums by default?
Stephen Frost writes: > Would you say that most user's databases run fast enough with checksums > enabled? Or more than most, maybe 70%? 80%? In today's environment, > I'd probably say that it's more like 90+%. It would be nice if there were some actual evidence about this, rather than numbers picked out of the air. > I agree that it's unfortunate that we haven't put more effort into > fixing that- I'm all for it, but it's disappointing to see that people > are not in favor of changing the default as I believe it would both help > our users and encourage more development of the feature. I think the really key point is that a whole lot of infrastructure work needs to be done still, and changing the default before that work has been done is not going to be user-friendly. The most pressing issue being the difficulty of changing the setting after the fact. It would be a *whole* lot easier to sell default-on if there were a way to turn it off, and yet you want us to buy into default-on before that way exists. Come back after that feature is in, and we can talk. regards, tom lane
Re: [HACKERS] Checksums by default?
* Peter Geoghegan (p...@heroku.com) wrote: > On Wed, Jan 25, 2017 at 10:18 AM, Robert Haas wrote: > > Trying to force those people to use checksums is just masterminding; > > they've made their own decision that it's not worth bothering with. > > When something goes wrong, WE still care about distinguishing hardware > > failure from PostgreSQL failure. Our pride is on the line. But the > > customer often doesn't. The DBA isn't the same person as the > > operating system guy, and the operating system guy isn't going to > > listen to the DBA even if the DBA complains of checksum failures. > > We need to invest in corruption detection/verification tools that are > run on an as-needed basis. They are available to users of every other > major database system. Agreed. Thanks! Stephen
Re: [HACKERS] Checksums by default?
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Jan 25, 2017 at 12:02 AM, Jim Nasby wrote: > > I'm not completely grokking your second paragraph, but I would think that an > > average user would love to get a heads-up that their hardware is failing. > > Sure. If the database runs fast enough with checksums enabled, > there's basically no reason to have them turned off. The issue is > when it doesn't. I don't believe we're talking about forcing every user to have checksums enabled. We are discussing the default. Would you say that most user's databases run fast enough with checksums enabled? Or more than most, maybe 70%? 80%? In today's environment, I'd probably say that it's more like 90+%. Yet, our default is to have them disabled and *really* hard to enable. I agree that it's unfortunate that we haven't put more effort into fixing that- I'm all for it, but it's disappointing to see that people are not in favor of changing the default as I believe it would both help our users and encourage more development of the feature. Thanks! Stephen
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
On Wed, Jan 25, 2017 at 2:13 PM, Stephen Frost wrote: > I went over *every* superuser check in the system when I did that work, > wrote up a long email about why I made the decisions that I did, posted > it here, had follow-on discussions, all of which lead to the patch which > ended up going in. Link to that email? I went back and looked at that thread and didn't see anything that looked like a general policy statement to me. But I may have missed it. > I am not anxious to revisit that decision and certainly not based on > an argument that, so far, boils down to "I think a monitoring system > might be able to use this function that allows it to read pg_authid > directly, so we should drop the superuser() check in it." Well, I'm not eager to revisit all the decisions you'd like to overturn either, but we'll just both have to cope. I assume we're both coming at these issues with the intention of making PostgreSQL better; the fact that we don't always agree on everything is probably inevitable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
On Wed, Jan 25, 2017 at 2:08 PM, Stephen Frost wrote: > That isn't what you're doing with those functions though, you're giving > the monitoring tool superuser-level rights but trying to pretend like > you're not. Uh, how so? None of those functions can be used to escalate to superuser privileges. I am trying to give SOME superuser privileges and not others. That IS how good security works. I don't really think it's necessary to outline the use case more than I have already. It's perfectly reasonable to want a monitoring tool to have access to pg_ls_dir() - for example, you could use that to monitor for relation files orphaned by a previous crash. Also, as mentioned above, I don't think this should have to be litigated for every single function individually. If it's a good idea for a non-superuser to be able to pg_start_backup() and pg_stop_backup(), the person doing that has access to read every byte of data in the filesystem; if they don't, there's no point in giving them access to run those functions. Access to just pg_ls_dir(), for example, can't be any more dangerous than that. Indeed, I'd argue that it's a heck of a lot LESS dangerous. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > Also, the same argument could be made about removing the built-in > superuser check from ANY function, and we've already rejected that > argument for a bunch of other functions. If we say that argument is > valid for some functions but not others, then we've got to decide for > which ones it's valid and for which ones it isn't, and consensus will > not be forthcoming. I take the position that hard-coded superuser > checks stink in general, and I'm grateful to Stephen for his work > making dump/restore work properly on system catalog permissions so > that we can support better alternatives. I'm not asking for anything > more than that we apply that same policy here as we have in other > cases. I went over *every* superuser check in the system when I did that work, wrote up a long email about why I made the decisions that I did, posted it here, had follow-on discussions, all of which lead to the patch which ended up going in. I am not anxious to revisit that decision and certainly not based on an argument that, so far, boils down to "I think a monitoring system might be able to use this function that allows it to read pg_authid directly, so we should drop the superuser() check in it." Thanks! Stephen
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Greg, * Greg Stark (st...@mit.edu) wrote: > I tend to agree. But in the past when this came up people pointed out > you could equally do things this way and still grant all the access > you wanted using SECURITY DEFINER. Arguably that's a better approach > because then instead of auditing the entire monitor script you only > need to audit this one wrapper function, pg_ls_monitor_dir() which > just calls pg_ls_dir() on this one directory. I'm not a fan of SECURITY DEFINER functions for this sort of thing and don't like the suggestion of simply wrapping functions that provide superuser-level access in a security definer function and then saying that giving someone access to that function isn't giving them superuser, because that's just false. Thanks! Stephen
Re: [HACKERS] pg_ls_dir & friends still have a hard-coded superuser check
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > The use case I have in mind is > a monitoring tool that needs access to one or more of those functions -- > in keeping with the principle of least privilege, it's much better to > give the monitoring user only the privileges which it actually needs > than to make it a superuser. That isn't what you're doing with those functions though, you're giving the monitoring tool superuser-level rights but trying to pretend like you're not. That's not really how good security works. I am entirely in agreement with providing a way to give monitoring tools more information, but that should be done through proper design and consideration of just what info they actually need (as well as what a useful format for it is). On my plate for a long time has been to add a function to return how much WAL is remaining in pg_wal for a monitoring system to be able to report on. That could be done with something like pg_ls_dir, but that's a rather hokey way to get it, and it'd be a lot nicer to just have a function which returns it, or maybe one that returns the oldest WAL position available on the system and what the current position is, which I think we might actually have. In other words, please actually outline a use-case, and let's design a proper solution. Thanks! Stephen
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 1:37 PM, Peter Geoghegan wrote: > On Wed, Jan 25, 2017 at 10:18 AM, Robert Haas wrote: >> Trying to force those people to use checksums is just masterminding; >> they've made their own decision that it's not worth bothering with. >> When something goes wrong, WE still care about distinguishing hardware >> failure from PostgreSQL failure. Our pride is on the line. But the >> customer often doesn't. The DBA isn't the same person as the >> operating system guy, and the operating system guy isn't going to >> listen to the DBA even if the DBA complains of checksum failures. > > We need to invest in corruption detection/verification tools that are > run on an as-needed basis. They are available to users of every other > major database system. +1, but the trick is (a) figuring out exactly what to develop and (b) finding the time to develop it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] COPY as a set returning function
> > +	/* param 7: escape text default null, -- defaults to whatever quote is */
> > +	if (PG_ARGISNULL(7))
> > +	{
> > +		copy_state.escape = copy_state.quote;
> > +	}
> > +	else
> > +	{
> > +		if (copy_state.csv_mode)
> > +		{
> > +			copy_state.escape = TextDatumGetCString(PG_GETARG_TEXT_P(7));
> > +			if (strlen(copy_state.escape) != 1)
> > +			{
> > +				ereport(ERROR,
> > +						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +						 errmsg("COPY escape must be a single one-byte character")));
> > +			}
> > +		}
> > +		else
> > +		{
> > +			ereport(ERROR,
> > +					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +					 errmsg("COPY escape available only in CSV mode")));
> > +		}
> > +	}
>
> I don't understand why we have all these checks. Can't we just pass
> the values to COPY and let it apply the checks? That way, when COPY is
> updated to support multibyte escape chars (for example) we don't need to
> touch this code. Together with removing the unneeded braces that would
> make these stanzas about six lines long instead of fifteen.

If I understand you correctly, COPY (via BeginCopyFrom) itself relies on having a relation in pg_class to reference for attributes. In this case, there is no such relation. So I'd have to fake a relcache entry, or refactor BeginCopyFrom() to extract a ReturnSetInfo from the Relation and pass that along to a new function BeginCopyFromReturnSet. I'm happy to go that route if you think it's a good idea.

> > +	tuple = heap_form_tuple(tupdesc, values, nulls);
> > +	//tuple = BuildTupleFromCStrings(attinmeta, field_strings);
> > +	tuplestore_puttuple(tupstore, tuple);
>
> No need to form a tuple; use tuplestore_putvalues here.

Good to know!

> > +	}
> > +
> > +	/* close "file" */
> > +	if (copy_state.is_program)
> > +	{
> > +		int		pclose_rc;
> > +
> > +		pclose_rc = ClosePipeStream(copy_state.copy_file);
> > +		if (pclose_rc == -1)
> > +			ereport(ERROR,
> > +					(errcode_for_file_access(),
> > +					 errmsg("could not close pipe to external command: %m")));
> > +		else if (pclose_rc != 0)
> > +			ereport(ERROR,
> > +					(errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
> > +					 errmsg("program \"%s\" failed",
> > +							copy_state.filename),
> > +					 errdetail_internal("%s", wait_result_to_str(pclose_rc))));
> > +	}
> > +	else
> > +	{
> > +		if (copy_state.filename != NULL && FreeFile(copy_state.copy_file))
> > +			ereport(ERROR,
> > +					(errcode_for_file_access(),
> > +					 errmsg("could not close file \"%s\": %m",
> > +							copy_state.filename)));
> > +	}
>
> I wonder if these should be an auxiliary function in copy.c to do this.
> Surely copy.c itself does pretty much the same thing ...

Yes. This got started as a patch to core because not all of the parts of COPY are externally callable, and aren't broken down in a way that allowed for use in an SRF. I'll get to work on these suggestions.
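The escape-defaults-to-quote behavior being validated in the quoted stanza mirrors the usual CSV convention, where escaping a quote inside a quoted field is done by doubling the quote character unless a separate escape character is chosen. A quick illustration with Python's csv module (an analogy only, not PostgreSQL's COPY parser):

```python
import csv
import io

# Default CSV escaping: the escape character *is* the quote character,
# so an embedded quote is written doubled ("").
data = 'id,note\n1,"say ""hi"""\n'
rows = list(csv.reader(io.StringIO(data)))
print(rows)  # → [['id', 'note'], ['1', 'say "hi"']]

# An explicit one-byte escape character instead, analogous to
# COPY ... CSV ESCAPE '\':
data2 = 'id,note\n1,"say \\"hi\\""\n'
rows2 = list(csv.reader(io.StringIO(data2), doublequote=False, escapechar='\\'))
print(rows2)  # → [['id', 'note'], ['1', 'say "hi"']]
```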
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 10:18 AM, Robert Haas wrote: > Trying to force those people to use checksums is just masterminding; > they've made their own decision that it's not worth bothering with. > When something goes wrong, WE still care about distinguishing hardware > failure from PostgreSQL failure. Our pride is on the line. But the > customer often doesn't. The DBA isn't the same person as the > operating system guy, and the operating system guy isn't going to > listen to the DBA even if the DBA complains of checksum failures. We need to invest in corruption detection/verification tools that are run on an as-needed basis. They are available to users of every other major database system. -- Peter Geoghegan
Re: [HACKERS] PATCH: recursive json_populate_record()
Nikita Glukhov writes: > On 22.01.2017 21:58, Tom Lane wrote: >> If you want such macros I think it would be better to submit a separate >> cosmetic patch that tries to hide such bit-tests behind macros throughout >> the jsonb code. > I've attached that patch, but not all the bit-tests were hidden: some of them > in jsonb_util.c still remain valid after upcoming refactoring because they > don't belong to generic code (there might be better to use JBC_XXX() macros). Pushed this; grepping found a couple other places that could be replaced by the macros, so I did. I didn't include the JsonContainerIsEmpty macro, though. It wasn't used anywhere, and I'm not exactly convinced that "IsEmpty" is more readable than "Size == 0", anyhow. We can add it later if the use-case gets stronger. > Sorry for this obvious mistake. But macros JB_ROOT_IS_XXX() also contain the > same hazard. Good point, fixed. I'll look at the rest of this in a bit. regards, tom lane
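The macros under discussion just put names on header bit-tests. The idea can be sketched in Python with made-up flag values (the real C macros and flag constants live in src/include/utils/jsonb.h; these numbers are illustrative, not the actual ones):

```python
# Hypothetical container-header flag bits, C-macro style.
JB_FSCALAR = 0x10000000
JB_FOBJECT = 0x20000000
JB_FARRAY  = 0x40000000
JB_CMASK   = 0x0FFFFFFF  # low bits: number of contained elements

def JsonContainerIsObject(header):
    # Named predicate instead of a raw (header & JB_FOBJECT) scattered around.
    return (header & JB_FOBJECT) != 0

def JsonContainerIsArray(header):
    return (header & JB_FARRAY) != 0

def JsonContainerSize(header):
    return header & JB_CMASK

hdr = JB_FOBJECT | 3  # an object holding three key/value pairs
print(JsonContainerIsObject(hdr), JsonContainerIsArray(hdr), JsonContainerSize(hdr))
# → True False 3
```

Hiding the bit-test behind one definition also confines hazards like unparenthesized macro arguments (the JB_ROOT_IS_XXX() issue mentioned above) to a single place.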
Re: [HACKERS] COPY as a set returning function
On Wed, Jan 25, 2017 at 1:10 PM, David Fetter wrote: > On Wed, Jan 25, 2017 at 12:23:23PM -0500, Corey Huinker wrote: > > > > Feel free to mark it returned as feedback. The concept didn't > > generate as much enthusiasm as I had hoped, so I think the right > > thing to do now is scale it back to a patch that makes > > CopyFromRawFields() externally visible so that extensions can use > > them. > > You're getting enthusiasm in the form of these reviews and suggestions > for improvement. That it doesn't always happen immediately is a > byproduct of the scarcity of developer time and the sheer volume of > things to which people need to pay attention. True about that. I was referring to "ooh! I need that!"-type interest. I'll proceed, keeping in mind that there's a fallback position of just making some of the guts of COPY available to be called by extensions like was done for file_fdw.
Re: [HACKERS] Declarative partitioning vs. information_schema
On Wed, Jan 25, 2017 at 1:04 PM, Peter Eisentraut wrote: > On 1/18/17 2:32 PM, Robert Haas wrote: >> Unless we can find something official, I suppose we should just >> display BASE TABLE in that case as we do in other cases. I wonder if >> the schema needs some broader revision; for example, are there >> information_schema elements intended to show information about >> partitions? > > Is it intentional that we show the partitions by default in \d, > pg_tables, information_schema.tables? Or should we treat those as > somewhat-hidden details? I'm not really sure what the right thing to do is there. I was hoping you had an opinion. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Checksums by default?
On Wed, Jan 25, 2017 at 12:02 AM, Jim Nasby wrote: > I'm not completely grokking your second paragraph, but I would think that an > average user would love to get a heads-up that their hardware is failing. Sure. If the database runs fast enough with checksums enabled, there's basically no reason to have them turned off. The issue is when it doesn't. Also, it's not as if there are no other ways of checking whether your disks are failing. SMART, for example, is supposed to tell you about incipient hardware failures before PostgreSQL ever sees a bit flip. Surely an average user would love to get a heads-up that their hardware is failing even when that hardware is not being used to power PostgreSQL, yet many people don't bother to configure SMART (or similar proprietary systems provided by individual vendors). Trying to force those people to use checksums is just masterminding; they've made their own decision that it's not worth bothering with. When something goes wrong, WE still care about distinguishing hardware failure from PostgreSQL failure. Our pride is on the line. But the customer often doesn't. The DBA isn't the same person as the operating system guy, and the operating system guy isn't going to listen to the DBA even if the DBA complains of checksum failures. Or the customer has 100 things on the same piece of hardware and PostgreSQL is the only one that failed; or alternatively they all failed around the same time; either way the culprit is obvious. Or the remedy is to restore from backup[1] whether the problem is hardware or software and regardless of whose software is to blame. Or their storage cost a million dollars and is a year old and they simply won't believe that it's failing. Or their storage cost a hundred dollars and is 8 years old and they're looking for an excuse to replace it whether it's responsible for the problem du jour or not.
I think it's great that we have a checksum feature and I think it's great for people who want to use it and are willing to pay the cost of it to turn it on. I don't accept the argument that all of our users, or even most of them, fall into that category. I also think it's disappointing that there's such a vigorous argument for changing the default when so little follow-on development has gone into this feature. If we had put any real effort into making this easier to turn on and off, for example, the default value would be less important, because people could change it more easily. But nobody's making that effort. I suggest that the people who think this a super-high-value feature should be willing to put some real work into improving it instead of trying to force it down everybody's throat as-is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] Alternatively, sometimes the remedy is to wish they had a usable backup while frantically running pg_resetxlog.
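For readers following the thread, the mechanism being debated is simple: a per-page checksum is computed over the page (with its own checksum field zeroed) and mixed with the block number, so both bit flips and misplaced pages are detectable at read time. A simplified CRC-based sketch follows; this is *not* PostgreSQL's actual FNV-1a-derived algorithm (see src/include/storage/checksum_impl.h), and the 2-byte field offset here is made up:

```python
import zlib

PAGE_SIZE = 8192
CKSUM_OFF = 8  # hypothetical offset of the 2-byte checksum field in the header

def page_checksum(page: bytes, block_no: int) -> int:
    # Checksum the page with its own checksum field zeroed out, and mix
    # in the block number so identical pages at different blocks differ.
    masked = page[:CKSUM_OFF] + b"\x00\x00" + page[CKSUM_OFF + 2:]
    return zlib.crc32(masked) ^ block_no

page = bytearray(PAGE_SIZE)
page[100:105] = b"hello"
good = page_checksum(bytes(page), block_no=7)

page[100] ^= 0x01  # a single flipped bit on disk...
bad = page_checksum(bytes(page), block_no=7)
print(good != bad)  # → True: the flip is detected
```

The WAL overhead discussed upthread comes not from this computation but from the need to log full-page images on first modification after a checkpoint even for hint-bit-only changes, so that torn pages cannot produce false checksum failures.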
Re: [HACKERS] COPY as a set returning function
On Wed, Jan 25, 2017 at 12:23:23PM -0500, Corey Huinker wrote: > > Feel free to mark it returned as feedback. The concept didn't > generate as much enthusiasm as I had hoped, so I think the right > thing to do now is scale it back to a patch that makes > CopyFromRawFields() externally visible so that extensions can use > them. You're getting enthusiasm in the form of these reviews and suggestions for improvement. That it doesn't always happen immediately is a byproduct of the scarcity of developer time and the sheer volume of things to which people need to pay attention. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Checksums by default?
On Sat, Jan 21, 2017 at 11:57 AM, Andres Freund wrote: > On 2017-01-21 11:39:18 +0100, Magnus Hagander wrote: >> Is it time to enable checksums by default, and give initdb a switch to turn >> it off instead? > > -1 - the WAL overhead is quite massive, and in contrast to the other > GUCs recently changed you can't just switch this around. I agree. I bet that if somebody does the test suggested by Amit downthread, it'll turn out that the performance is just awful. And those cases are common. I think the people saying "well, the overhead is worth it" must be people whose systems (or whose customer's systems) aren't processing continuous heavy OLTP workloads. If you've got a data warehousing workload, checksums are probably pretty cheap. If you've got a low-velocity OLTP workload, or an OLTP workload that fits in shared_buffers, it's probably bearable. But if you've got 8GB of shared_buffers and 100GB of data, and you've got 100 or so backends continuously doing random updates, I think checksums are going to nail you to the wall. And EnterpriseDB, at least, has lots of customers that do exactly that sort of thing. Having said that, I've certainly run into situations where I speculated that a customer had a hardware problem and they speculated that we had given them buggy database software. In a pretty significant number of cases, the customer turned out to be right; for example, some of those people were suffering from multixact bugs that resulted in unexplainable corruption. Now, would it have been useful to know that checksums were passing (suggesting a PostgreSQL problem) rather than failing (suggesting an OS problem)? Yes, that would have been great. I could have given those customers better support. On the other hand, I think I've run into MORE cases where the customer was desperately seeking options to improve write performance, which remains a pretty significant problem for PostgreSQL.
I can't see taking a significant hit in that area for my convenience in understanding what's going on in data corruption situations. The write performance penalty is paid by everybody all the time, whereas data corruption is a rare event even among support cases. And even when you do have corruption, whether or not the data corruption is accompanied by a checksum failure is only ONE extra bit of useful data. A failure doesn't guarantee a hardware problem; it could be caused by a faulty backup procedure, like forgetting to run pg_start_backup(). The lack of a failure doesn't guarantee a software problem; it could be caused by a faulty backup procedure, like using an OS-level snapshot facility that isn't exactly simultaneous across tablespaces. What you really need to do when a customer has corruption is figure out why they have corruption, and the leading cause by far is neither the hardware nor the software but some kind of user error. Checksums are at best a very modest assist in figuring out whether an error has been made and if so of what type. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning vs. information_schema
On 1/18/17 2:32 PM, Robert Haas wrote: > Unless we can find something official, I suppose we should just > display BASE TABLE in that case as we do in other cases. I wonder if > the schema needs some broader revision; for example, are there > information_schema elements intended to show information about > partitions? Is it intentional that we show the partitions by default in \d, pg_tables, information_schema.tables? Or should we treat those as somewhat-hidden details? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pdf versus single-html
On 1/21/17 6:29 AM, Erik Rijkers wrote: > It might even be good to include such a single-file html in the Makefile > as an option. > > I'll give it a try but has anyone done this work already, perhaps? Already exists: doc/src/sgml$ make postgres.html -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] logical-replication.sgml improvements
On 1/20/17 11:00 AM, Erik Rijkers wrote: > logical-replication.sgml.diff changes Committed, thanks! -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans
On 25 January 2017 at 17:34, Julian Markwort wrote: > Analogous to this, a bad_plan is saved when the time has been exceeded by a > factor greater than 1.1. ...and the plan differs? Probably best to use some stat math to calculate deviation, rather than a fixed %. Sounds good. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
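Simon's "stat math" suggestion could amount to keeping a running mean and standard deviation of each statement's execution time and flagging a plan as bad only when its time is a genuine statistical outlier. A hypothetical Python sketch of that idea (pg_stat_statements itself is C and maintains its own counters, e.g. via Welford's online algorithm):

```python
import math

class PlanStats:
    """Running mean/stddev (Welford's algorithm) for one statement's
    execution times, used to decide whether a new timing is an outlier
    rather than just exceeding a fixed 1.1x factor."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def add(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)

    def stddev(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_outlier(self, x, k=2.0):
        # Flag times more than k standard deviations above the mean.
        return self.n > 1 and x > self.mean + k * self.stddev()

s = PlanStats()
for t in [10.0, 11.0, 9.5, 10.5, 10.2]:
    s.add(t)
print(s.is_outlier(10.9), s.is_outlier(25.0))  # → False True
```

With a fixed 1.1 factor, the 10.9 timing above would already count as "bad" against a 9.5 baseline even though it sits well inside normal variation, which is exactly the objection being raised.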