Re: [HACKERS] background sessions

2016-12-14 Thread amul sul
On Thu, Dec 15, 2016 at 12:24 PM, Andrew Borodin  wrote:
> 2016-12-15 0:30 GMT+05:00 Peter Eisentraut :
> TryBeginSession()?

 What exactly would that do?
>>> Return status (success\failure) and session object, if a function succeeded.
>>>
>>> If there is max_connections exceeded, then (false,null).
>>>
>>> I'm not sure whether this idiom is common for Python.
>>
>> You can catch PostgreSQL exceptions in PL/Python, so this can be handled
>> in user code.
>>
>> Some better connection management or pooling can probably be built on
>> top of the primitives later, I'd say.
>
> Agree, doing this in Python is the better option.
>
> And one more thing... Can we have BackgroundSessionExecute() splitted
> into two parts: start query and wait for results?
> It would allow pg_background to reuse bgsession's code.
>
+1

Regards,
Amul


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] background sessions

2016-12-14 Thread Andrew Borodin
2016-12-15 0:30 GMT+05:00 Peter Eisentraut :
 TryBeginSession()?
>>>
>>> What exactly would that do?
>> Return status (success\failure) and session object, if a function succeeded.
>>
>> If there is max_connections exceeded, then (false,null).
>>
>> I'm not sure whether this idiom is common for Python.
>
> You can catch PostgreSQL exceptions in PL/Python, so this can be handled
> in user code.
>
> Some better connection management or pooling can probably be built on
> top of the primitives later, I'd say.

Agree, doing this in Python is the better option.

And one more thing... Can we have BackgroundSessionExecute() splitted
into two parts: start query and wait for results?
It would allow pg_background to reuse bgsession's code.

Best regards, Andrey Borodin.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-12-14 Thread Michael Paquier
On Tue, Dec 13, 2016 at 2:44 PM, Michael Paquier
 wrote:
> SASLPrep is defined here:
> https://tools.ietf.org/html/rfc4013
> And stringprep is here:
> https://tools.ietf.org/html/rfc3454
> So that's roughly applying a conversion from the mapping table, taking
> into account prohibited, bi-directional, mapping characters, etc. The
> spec says that the password should be in unicode. But we cannot be
> sure of that, right? Those mapping tables should be likely a separated
> thing.. (perl has Unicode::Stringprep::Mapping for example).

OK. I have look at that and I have bumped into libidn, that offers a
couple of APIs that could be used directly for this purpose.
Particularly, what has caught my eyes is stringprep_profile():
https://www.gnu.org/software/libidn/manual/html_node/Stringprep-Functions.html
res = stringprep_profile (input, output, "SASLprep", STRINGPREP_NO_UNASSIGNED);

libidn can be installed on Windows, and I have found packages for
cygwin, mingw, linux, freebsd and macos via brew. In the case where
libidn is not installed, I think that the safest path would be to
check if the input string has any high bits set (0x80) and bail out
because that would mean that it is a UTF-8 string that we cannot
change. Any thoughts about using libidn?

Also, after discussion with Heikki, here are the things that we need to do:
1) In libpq, we need to check if the string is valid utf-8. If that's
valid utf-8, apply SASLprep. if not, copy the string as-is. We could
error as well in this case... Perhaps a WARNING could be more adapted,
that's the most tricky case, and if the client does not use utf-8 that
may lead to unexpected behavior.
2) In server, when the password verifier is created. If
client_encoding is utf-8, but not server_encoding, convert the
password to utf-8 and build the verifier after applying SASLprep.

In the case where the binaries are *not* built with libidn, I think
that we had better reject valid UTF-8 string directly and just allow
ASCII? SASLprep is a no-op on ASCII characters.

Thoughts about this approach?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-12-14 Thread Kyotaro HORIGUCHI
At Thu, 15 Dec 2016 14:20:53 +0900, Masahiko Sawada  
wrote in 
> On Thu, Dec 15, 2016 at 11:23 AM, Michael Paquier
>  wrote:
> > On Thu, Dec 15, 2016 at 11:04 AM, Fujii Masao  wrote:
> >> On Thu, Dec 15, 2016 at 6:47 AM, Michael Paquier
> >>  wrote:
> >>> On Wed, Dec 14, 2016 at 11:34 PM, Fujii Masao  
> >>> wrote:
>  If we drop the "standby_list" syntax, I don't think that new parameter is
>  necessary. We can keep s_s_names and just drop the support for that 
>  syntax
>  from s_s_names. This may be ok if we're really in "break all the things" 
>  mode
>  for PostgreSQL 10.
> >>>
> >>> Please let's not raise that as an argument again... And not break the
> >>> s_list argument. Many users depend on that for just single sync
> >>> standbys. FWIW, I'd be in favor of backward compatibility and say that
> >>> a standby list is a priority list if we can maintain that. Upthread
> >>> agreement was to break that, I did not insist further, and won't if
> >>> that's still the feeling.
> >>
> >> I wonder why you think that the backward-compatibility for standby_list is
> >> so "special". We renamed pg_xlog directory to pg_wal and are planning to
> >> change recovery.conf API at all, though they have bigger impacts on
> >> the existing users in terms of the backward compatibility. OTOH, so far,
> >> changing GUC between major releases happened several times.
> >
> > Silent failures for existing failover deployments is a pain to solve
> > after doing upgrades. That's my only concern. Changing pg_wal would
> > result in a hard failure when upgrading. And changing the meaning of
> > the standby list (without keyword ANY or FIRST!) does not fall into
> > that category... So yes just removing support for standby list would
> > result in a hard failure, which would be fine for the
> > let-s-break-all-things move.
> >
> >> But I'm not against keeping the backward compatibility for standby_list,
> >> to be honest. My concern is that the latest patch tries to support
> >> the backward compatibility "partially" and which would be confusing to 
> >> users,
> >> as I told upthread.
> > If we try to support backward compatibility, I'd personally do it
> > fully, and have a list of standby names specified meaning a priority
> > list.
> >
> >> So I'd like to propose to keep the backward compatibility fully for 
> >> s_s_names
> >> (i.e., both "standby_list" and "N (standby_list)" mean the priority method)
> >> at the first commit, then continue discussing this and change it if we 
> >> reach
> >> the consensus until PostgreSQL 10 is actually released. Thought?
> >
> > +1 on that.
> 
> +1.

FWIW, +1 from me.

> I'll update the patch.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-12-14 Thread Masahiko Sawada
On Thu, Dec 15, 2016 at 11:23 AM, Michael Paquier
 wrote:
> On Thu, Dec 15, 2016 at 11:04 AM, Fujii Masao  wrote:
>> On Thu, Dec 15, 2016 at 6:47 AM, Michael Paquier
>>  wrote:
>>> On Wed, Dec 14, 2016 at 11:34 PM, Fujii Masao  wrote:
 If we drop the "standby_list" syntax, I don't think that new parameter is
 necessary. We can keep s_s_names and just drop the support for that syntax
 from s_s_names. This may be ok if we're really in "break all the things" 
 mode
 for PostgreSQL 10.
>>>
>>> Please let's not raise that as an argument again... And not break the
>>> s_list argument. Many users depend on that for just single sync
>>> standbys. FWIW, I'd be in favor of backward compatibility and say that
>>> a standby list is a priority list if we can maintain that. Upthread
>>> agreement was to break that, I did not insist further, and won't if
>>> that's still the feeling.
>>
>> I wonder why you think that the backward-compatibility for standby_list is
>> so "special". We renamed pg_xlog directory to pg_wal and are planning to
>> change recovery.conf API at all, though they have bigger impacts on
>> the existing users in terms of the backward compatibility. OTOH, so far,
>> changing GUC between major releases happened several times.
>
> Silent failures for existing failover deployments is a pain to solve
> after doing upgrades. That's my only concern. Changing pg_wal would
> result in a hard failure when upgrading. And changing the meaning of
> the standby list (without keyword ANY or FIRST!) does not fall into
> that category... So yes just removing support for standby list would
> result in a hard failure, which would be fine for the
> let-s-break-all-things move.
>
>> But I'm not against keeping the backward compatibility for standby_list,
>> to be honest. My concern is that the latest patch tries to support
>> the backward compatibility "partially" and which would be confusing to users,
>> as I told upthread.
> If we try to support backward compatibility, I'd personally do it
> fully, and have a list of standby names specified meaning a priority
> list.
>
>> So I'd like to propose to keep the backward compatibility fully for s_s_names
>> (i.e., both "standby_list" and "N (standby_list)" mean the priority method)
>> at the first commit, then continue discussing this and change it if we reach
>> the consensus until PostgreSQL 10 is actually released. Thought?
>
> +1 on that.

+1.
I'll update the patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-12-14 Thread Amit Kapila
On Thu, Dec 15, 2016 at 7:53 AM, Michael Paquier
 wrote:
> On Thu, Dec 15, 2016 at 11:04 AM, Fujii Masao  wrote:
>> On Thu, Dec 15, 2016 at 6:47 AM, Michael Paquier
>>  wrote:
>
>> So I'd like to propose to keep the backward compatibility fully for s_s_names
>> (i.e., both "standby_list" and "N (standby_list)" mean the priority method)
>> at the first commit, then continue discussing this and change it if we reach
>> the consensus until PostgreSQL 10 is actually released. Thought?
>
> +1 on that.
>

+1.  That is the safest option to proceed.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Broken SSL tests in master

2016-12-14 Thread Michael Paquier
On Tue, Dec 13, 2016 at 10:07 PM, Robert Haas  wrote:
> On Mon, Dec 12, 2016 at 1:34 AM, Heikki Linnakangas  wrote:
>> On 12/01/2016 09:45 PM, Andres Freund wrote:
>>>
>>> And nobody has added a buildfarm module to run it manually on their
>>> servers either :(
>>
>> I just added a module to run "make -C src/test/ssl check" in chipmunk. So at
>> least that's covered now.
>>
>> http://pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=chipmunk=2016-12-10%2012%3A08%3A31=test-ssl-check
>
> Thanks.

Could you publish the module or send a pull request to Andrew?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

2016-12-14 Thread Andres Freund
Hi,

On 2016-12-12 16:46:38 +0900, Michael Paquier wrote:
> Ashutosh, could you try it and see if it improves things?

So what's the theory of why this triggers pldebugger hangs, but
apparently causes not many other problems? I tried to skim pldebugger
code, but it's state isn't very condusive to seing the problem :(

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)

2016-12-14 Thread Robert Haas
On Wed, Dec 14, 2016 at 10:08 PM, Andres Freund  wrote:
> On 2016-12-14 22:00:45 -0500, Robert Haas wrote:
>> I took a look at Andres's patches today and saw that they can't really
>> be applied as-is, because they "temporarily" change the master's
>> ParallelWorkerNumber but have no provision for restoring it after an
>> ERROR.
>
> Yea, that's not quite right. Although I'm not sure it really matters
> that much, given that we're not continuing execution after an error. We
> could just reset it at appropriate points - but that seems ugly.

Yes.

>> I think that approach isn't right anyway; it seems to me that
>> what we want to do is set hash_iv based on we're in a Partial HashAgg,
>> not whether we're in a parallel worker.  For instance, if we're
>> somehow in a nodeSubplan.c there's no need for this just because we
>> happen to be in a worker -- I think.  That led me to develop the
>> attached patch.
>
> Hm, I'd rather have this be a more general solution - it doesn't seem
> unlikely that other parts of the system want to know whether they're
> currently executing a parallel portion of the plan. E.g. it really
> should be forbidden to do modifications even in the parallel portion of
> the plan the master executes.  Right now that'd escape notice, which I
> don't think is good.

Actually, that wouldn't escape notice.  You can test with
IsInParallelMode().  That's entered before beginning execution of the
plan at the outermost level - see ExecutePlan().

>> This may not be perfect, but it causes TPC-H Q18 to complete instead
>> of hanging forever, so I'm going to commit it RSN unless there are
>> loud objections combined with promising steps toward some alternative
>> fix.  It's been over a month since these problems were reported, and
>> it is not good that the tree has been in a broken state for that
>> entire time.
>
> Yea, sorry for that :(. Unfortunately the JIT stuff currently occupies a
> large portion of my brain.
>
> I really hope to come up with a new version of the patch that does the
> "smarter" expansion soon.

I've got no problem with that at all, but I want to unbreak things
more or less immediately and then you/we can further improve it later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

2016-12-14 Thread Robert Haas
On Tue, Dec 13, 2016 at 11:19 AM, Andres Freund  wrote:
> On 2016-12-12 16:46:38 +0900, Michael Paquier wrote:
>> OK, I think that I have spotted an issue after additional read of the
>> code. When a WSA event is used for read/write activity on a socket,
>> the same WSA event gets reused again and again. That's fine for
>> performance reasons
>
> It's actually also required to not loose events,
> i.e. correctness. Windows events are edge, not level triggered.

Windows might not lose the event, but PostgreSQL can certainly lose
the event.  In the Windows implementation of WaitEventSetBlock, we'll
ignore WL_SOCKET_READABLE if WL_LATCH_SET also fires.  Now, since the
event is edge-triggered[1], it never fires again and we hang.  Oops.
Even if we rewrote WaitEventSetBlock to always process all of the
events, there's nothing in the latch API that requires the caller to
read from the socket the first time WL_SOCKET_READABLE shows up.  The
caller is perfectly entitled to write:

if (rv & WL_LATCH_SET)
   // do stuff
else if (rv & WL_SOCKET_READABLE)
   // do other stuff

I suppose we could make a coding rule that the caller must not rely on
WL_SOCKET_READABLE being level-triggered, but I think that's likely to
lead to a stream of bugs that only show up on Windows, because most
people are used to the Linux semantics and will naturally write code
like the above which isn't robust in the face of an edge-triggered
event -- just as you yourself did in WaitEventSetBlock.

> The
> previous implementation wasn't correct.  So just resetting things ain't
> ok.

How was it incorrect?  The previous implementation *worked* and the
new one doesn't, at least in the case complained of here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] 
https://msdn.microsoft.com/en-us/library/windows/desktop/ms741576(v=vs.85).aspx


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

2016-12-14 Thread Amit Kapila
On Thu, Dec 15, 2016 at 7:48 AM, Michael Paquier
 wrote:
> On Thu, Dec 15, 2016 at 1:18 AM, Robert Haas  wrote:
>> On Wed, Dec 14, 2016 at 5:35 AM, Ashutosh Sharma  
>> wrote:
 I have just read the patch, and hardcoding the array position for a
 socket event to 0 assumes that the socket event will be *always* the
 first one in the list. but that cannot be true all the time, any code
 that does not register the socket event as the first one in the list
 will likely break.
>>>
>>> I think your point is very valid and even i had similar thought in my
>>> mind when implementing it but as i mentioned upthread that's how it is
>>> being used in the current code base. Please check a function call to
>>> ModifyWaitEvent() in secure_read().
>>
>> That's kind of irrelevant. A WaitEventSet is a general-purpose
>> primitive, and it needs to work in all situations, current and future,
>> where a reasonable developer would expect it to work.  Sure, pq_init()
>> puts the socket in FeBeWaitSet at position 0.  However, in
>> WaitLatchOrSocket, the socket event, if required, is added after the
>> latch and postmaster-death events.  And new code written using
>> CreateWaitEventSet() / AddWaitEventToSet() in the future could put a
>> socket at any offset in the event array, or could add multiple
>> sockets.  So Michael is right: you can't just stick a hack into
>> WaitEventSetWait that assumes the socket is at offset 0 even if that
>> happens to fix the problem we are facing right at this moment.
>>
>> (I am not sure whether Michael's proposed placement is correct, but
>> I'm almost 100% sure the placement in the submitted patch is not.)
>

Robert, you are right.  As submitted, the placement is incorrect for
the reasons mentioned by you.  The intention was to do it in
secure_read() using FeBeWaitSet where we are sure it is at 0th
position.  Also, I think this is primarily required for windows, so
what we could instead do is introduce a new API ResetWaitEvent() which
will reset socket events passed to it.  This API can have port
specific reset functions something similar to what we have for
ModifyWaitEvent and for now we can have only Win32 specific function
as we are not aware if it is required for any other port.  Then call
this from secure_read before "goto retry;"

> What would seem like the good place is really WaitEventAdjustWin32()
> that gets called in ModifyWaitEvent() by secure_read().
>

Actually, we can't use or do this change in ModifyWaitEvent(), because
it is mainly a Set call, it won't allow you to reset it mainly due to
below check in this API.  Here, what we need is reset.

ModifyWaitEvent()
{
..
if (events == event->events &&
(!(event->events & WL_LATCH_SET) || set->latch == latch))
return;
..
}

If you refer the code prior to 9.6, we always reset the socket at end
of WaitLatchOrSocket for windows and the same cleanup is missing in
new code which is causing this issue.

/* Clean up the event object we created for the socket */
if (sockevent != WSA_INVALID_EVENT)
{
WSAEventSelect(sock, NULL, 0);
WSACloseEvent(sockevent);
}

Upthread Andres mentioned that it isn't okay to reset the events as
was done previously and I believe he is right with the minor exception
that we need it for some of the events like FD_READ.  So that's why I
think it is better to fix the problem for FD_READ.

> And
> WaitEventAdjustWin32() is the place where WSAEventSelect() gets called
> to adjust the flags FD_READ and FD_WRITE depending on the events
> WL_SOCKET_READABLE or WL_SOCKET_WRITEABLE.
>
> Amit, I have a question here. As far as I can read, secure_read() does
> set WL_SOCKET_READABLE as an event to wait for, so FD_READ is
> correctly being set by WSAEventSelect(). What the patch proposed is
> doing is making sure that *only* WL_SOCKET_READABLE is set in the
> event flags. Why is that necessary?
>

Oh no, the intention of the patch is to just reset the
WL_SOCKET_READABLE event flag after WaitEventSetWait() in
secure_read(), so that next time it calls ModifyWaitEvent(), it could
call WSAEventSelect() to re-enable the event.

> I can buy that this addresses the
> problem as this has been visibly tested, but I am unclear about why
> that's the correct thing to do. The patch clearly needs comments to
> clarify its position. Actually, I think that what is surprising in the
> solution proposed is actually that the event flags are reset *after*
> the while loop in WaitEventSetWait().
>

As mentioned above, the current patch doesn't do it correctly.

> I could understand something
> happening inside the loop if WSAEnumNetworkEvents() updates things on
> the fly though...
>

No, nothing wrong in that, it is just that for some of the events
(probably the events that are level triggered) we need to reenable
them by using WSAEventSelect before reuse.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via 

Re: [HACKERS] Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)

2016-12-14 Thread Andres Freund
On 2016-12-14 22:00:45 -0500, Robert Haas wrote:
> I took a look at Andres's patches today and saw that they can't really
> be applied as-is, because they "temporarily" change the master's
> ParallelWorkerNumber but have no provision for restoring it after an
> ERROR.

Yea, that's not quite right. Although I'm not sure it really matters
that much, given that we're not continuing execution after an error. We
could just reset it at appropriate points - but that seems ugly.


> I think that approach isn't right anyway; it seems to me that
> what we want to do is set hash_iv based on we're in a Partial HashAgg,
> not whether we're in a parallel worker.  For instance, if we're
> somehow in a nodeSubplan.c there's no need for this just because we
> happen to be in a worker -- I think.  That led me to develop the
> attached patch.

Hm, I'd rather have this be a more general solution - it doesn't seem
unlikely that other parts of the system want to know whether they're
currently executing a parallel portion of the plan. E.g. it really
should be forbidden to do modifications even in the parallel portion of
the plan the master executes.  Right now that'd escape notice, which I
don't think is good.


> This may not be perfect, but it causes TPC-H Q18 to complete instead
> of hanging forever, so I'm going to commit it RSN unless there are
> loud objections combined with promising steps toward some alternative
> fix.  It's been over a month since these problems were reported, and
> it is not good that the tree has been in a broken state for that
> entire time.

Yea, sorry for that :(. Unfortunately the JIT stuff currently occupies a
large portion of my brain.

I really hope to come up with a new version of the patch that does the
"smarter" expansion soon.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)

2016-12-14 Thread Robert Haas
On Thu, Dec 8, 2016 at 5:23 PM, Robert Haas  wrote:
> On Wed, Nov 23, 2016 at 3:33 AM, Andres Freund  wrote:
>> On 2016-11-18 08:00:40 -0500, Robert Haas wrote:
>>> On Tue, Nov 15, 2016 at 2:28 PM, Andres Freund  wrote:
>>> > I've a working fix for this, and for a similar issue Robert found. I'm
>>> > still playing around with it, but basically the fix is to make the
>>> > growth policy a bit more adaptive.
>>>
>>> Any chance you can post a patch soon?
>>
>> Here's my WIP series addressing this and related problems. With this
>> we're again noticeably faster than the dynahash implementation, in both
>> the case here, and the query you brought up over IM.
>>
>> This definitely needs some more TLC, but the general approach seems
>> good. I particularly like that it apparently allows us to increase the
>> default fillfactor without much downside according to my measurements.
>
> Are you going to commit something here?  At least enough to make
> Finalize HashAgg -> Gather -> Partial HashAgg terminate in finite
> time?  Because the fact that it doesn't really sucks.

I took a look at Andres's patches today and saw that they can't really
be applied as-is, because they "temporarily" change the master's
ParallelWorkerNumber but have no provision for restoring it after an
ERROR.  I think that approach isn't right anyway; it seems to me that
what we want to do is set hash_iv based on we're in a Partial HashAgg,
not whether we're in a parallel worker.  For instance, if we're
somehow in a nodeSubplan.c there's no need for this just because we
happen to be in a worker -- I think.  That led me to develop the
attached patch.

This may not be perfect, but it causes TPC-H Q18 to complete instead
of hanging forever, so I'm going to commit it RSN unless there are
loud objections combined with promising steps toward some alternative
fix.  It's been over a month since these problems were reported, and
it is not good that the tree has been in a broken state for that
entire time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/executor/execGrouping.c b/src/backend/executor/execGrouping.c
index 94cc59d..3149fbe 100644
--- a/src/backend/executor/execGrouping.c
+++ b/src/backend/executor/execGrouping.c
@@ -18,6 +18,8 @@
  */
 #include "postgres.h"
 
+#include "access/hash.h"
+#include "access/parallel.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
 #include "utils/lsyscache.h"
@@ -289,7 +291,8 @@ BuildTupleHashTable(int numCols, AttrNumber *keyColIdx,
 	FmgrInfo *eqfunctions,
 	FmgrInfo *hashfunctions,
 	long nbuckets, Size additionalsize,
-	MemoryContext tablecxt, MemoryContext tempcxt)
+	MemoryContext tablecxt, MemoryContext tempcxt,
+	bool use_variable_hash_iv)
 {
 	TupleHashTable hashtable;
 	Size		entrysize = sizeof(TupleHashEntryData) + additionalsize;
@@ -314,6 +317,19 @@ BuildTupleHashTable(int numCols, AttrNumber *keyColIdx,
 	hashtable->in_hash_funcs = NULL;
 	hashtable->cur_eq_funcs = NULL;
 
+	/*
+	 * If parallelism is in use, even if the master backend is performing the
+	 * scan itself, we don't want to create the hashtable exactly the same way
+	 * in all workers. As hashtables are iterated over in keyspace-order,
+	 * doing so in all processes in the same way is likely to lead to
+	 * "unbalanced" hashtables when the table size initially is
+	 * underestimated.
+	 */
+	if (use_variable_hash_iv)
+		hashtable->hash_iv = hash_uint32(ParallelWorkerNumber);
+	else
+		hashtable->hash_iv = 0;
+
 	hashtable->hashtab = tuplehash_create(tablecxt, nbuckets);
 	hashtable->hashtab->private_data = hashtable;
 
@@ -450,7 +466,7 @@ TupleHashTableHash(struct tuplehash_hash *tb, const MinimalTuple tuple)
 	TupleHashTable hashtable = (TupleHashTable) tb->private_data;
 	int			numCols = hashtable->numCols;
 	AttrNumber *keyColIdx = hashtable->keyColIdx;
-	uint32		hashkey = 0;
+	uint32		hashkey = hashtable->hash_iv;
 	TupleTableSlot *slot;
 	FmgrInfo   *hashfunctions;
 	int			i;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index eefb3d6..a093862 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -1723,7 +1723,8 @@ build_hash_table(AggState *aggstate)
 			  node->numGroups,
 			  additionalsize,
 			 aggstate->aggcontexts[0]->ecxt_per_tuple_memory,
-			  tmpmem);
+			  tmpmem,
+  !DO_AGGSPLIT_SKIPFINAL(aggstate->aggsplit));
 }
 
 /*
diff --git a/src/backend/executor/nodeRecursiveunion.c b/src/backend/executor/nodeRecursiveunion.c
index acded07..5b734c0 100644
--- a/src/backend/executor/nodeRecursiveunion.c
+++ b/src/backend/executor/nodeRecursiveunion.c
@@ -43,7 +43,8 @@ build_hash_table(RecursiveUnionState *rustate)
 			 node->numGroups,
 			 0,
 			 rustate->tableContext,
-			 rustate->tempContext);
+			 

Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-12-14 Thread Michael Paquier
On Thu, Dec 15, 2016 at 11:04 AM, Fujii Masao  wrote:
> On Thu, Dec 15, 2016 at 6:47 AM, Michael Paquier
>  wrote:
>> On Wed, Dec 14, 2016 at 11:34 PM, Fujii Masao  wrote:
>>> If we drop the "standby_list" syntax, I don't think that new parameter is
>>> necessary. We can keep s_s_names and just drop the support for that syntax
>>> from s_s_names. This may be ok if we're really in "break all the things" 
>>> mode
>>> for PostgreSQL 10.
>>
>> Please let's not raise that as an argument again... And not break the
>> s_list argument. Many users depend on that for just single sync
>> standbys. FWIW, I'd be in favor of backward compatibility and say that
>> a standby list is a priority list if we can maintain that. Upthread
>> agreement was to break that, I did not insist further, and won't if
>> that's still the feeling.
>
> I wonder why you think that the backward-compatibility for standby_list is
> so "special". We renamed pg_xlog directory to pg_wal and are planning to
> change recovery.conf API at all, though they have bigger impacts on
> the existing users in terms of the backward compatibility. OTOH, so far,
> changing GUC between major releases happened several times.

Silent failures for existing failover deployments is a pain to solve
after doing upgrades. That's my only concern. Changing pg_wal would
result in a hard failure when upgrading. And changing the meaning of
the standby list (without keyword ANY or FIRST!) does not fall into
that category... So yes just removing support for standby list would
result in a hard failure, which would be fine for the
let-s-break-all-things move.

> But I'm not against keeping the backward compatibility for standby_list,
> to be honest. My concern is that the latest patch tries to support
> the backward compatibility "partially" and which would be confusing to users,
> as I told upthread.

If we try to support backward compatibility, I'd personally do it
fully, and have a list of standby names specified meaning a priority
list.

> So I'd like to propose to keep the backward compatibility fully for s_s_names
> (i.e., both "standby_list" and "N (standby_list)" mean the priority method)
> at the first commit, then continue discussing this and change it if we reach
> the consensus until PostgreSQL 10 is actually released. Thought?

+1 on that.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

2016-12-14 Thread Michael Paquier
On Thu, Dec 15, 2016 at 1:18 AM, Robert Haas  wrote:
> On Wed, Dec 14, 2016 at 5:35 AM, Ashutosh Sharma  
> wrote:
>>> I have just read the patch, and hardcoding the array position for a
>>> socket event to 0 assumes that the socket event will be *always* the
>>> first one in the list. but that cannot be true all the time, any code
>>> that does not register the socket event as the first one in the list
>>> will likely break.
>>
>> I think your point is very valid and even i had similar thought in my
>> mind when implementing it but as i mentioned upthread that's how it is
>> being used in the current code base. Please check a function call to
>> ModifyWaitEvent() in secure_read().
>
> That's kind of irrelevant. A WaitEventSet is a general-purpose
> primitive, and it needs to work in all situations, current and future,
> where a reasonable developer would expect it to work.  Sure, pq_init()
> puts the socket in FeBeWaitSet at position 0.  However, in
> WaitLatchOrSocket, the socket event, if required, is added after the
> latch and postmaster-death events.  And new code written using
> CreateWaitEventSet() / AddWaitEventToSet() in the future could put a
> socket at any offset in the event array, or could add multiple
> sockets.  So Michael is right: you can't just stick a hack into
> WaitEventSetWait that assumes the socket is at offset 0 even if that
> happens to fix the problem we are facing right at this moment.
>
> (I am not sure whether Michael's proposed placement is correct, but
> I'm almost 100% sure the placement in the submitted patch is not.)

What would seem like the good place is really WaitEventAdjustWin32()
that gets called in ModifyWaitEvent() by secure_read(). And
WaitEventAdjustWin32() is the place where WSAEventSelect() gets called
to adjust the flags FD_READ and FD_WRITE depending on the events
WL_SOCKET_READABLE or WL_SOCKET_WRITEABLE.

Amit, I have a question here. As far as I can read, secure_read() does
set WL_SOCKET_READABLE as an event to wait for, so FD_READ is
correctly being set by WSAEventSelect(). What the patch proposed is
doing is making sure that *only* WL_SOCKET_READABLE is set in the
event flags. Why is that necessary? I can buy that this addresses the
problem as this has been visibly tested, but I am unclear about why
that's the correct thing to do. The patch clearly needs comments to
clarify its position. Actually, I think that what is surprising in the
solution proposed is actually that the event flags are reset *after*
the while loop in WaitEventSetWait(). I could understand something
happening inside the loop if WSAEnumNetworkEvents() updates things on
the fly though...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-12-14 Thread Fujii Masao
On Thu, Dec 15, 2016 at 6:47 AM, Michael Paquier
 wrote:
> On Wed, Dec 14, 2016 at 11:34 PM, Fujii Masao  wrote:
>> If we drop the "standby_list" syntax, I don't think that new parameter is
>> necessary. We can keep s_s_names and just drop the support for that syntax
>> from s_s_names. This may be ok if we're really in "break all the things" mode
>> for PostgreSQL 10.
>
> Please let's not raise that as an argument again... And not break the
> s_list argument. Many users depend on that for just single sync
> standbys. FWIW, I'd be in favor of backward compatibility and say that
> a standby list is a priority list if we can maintain that. Upthread
> agreement was to break that, I did not insist further, and won't if
> that's still the feeling.

I wonder why you think that the backward-compatibility for standby_list is
so "special". We renamed pg_xlog directory to pg_wal and are planning to
change recovery.conf API at all, though they have bigger impacts on
the existing users in terms of the backward compatibility. OTOH, so far,
changing GUC between major releases happened several times.

But I'm not against keeping the backward compatibility for standby_list,
to be honest. My concern is that the latest patch tries to support
the backward compatibility "partially" and which would be confusing to users,
as I told upthread.

So I'd like to propose to keep the backward compatibility fully for s_s_names
(i.e., both "standby_list" and "N (standby_list)" mean the priority method)
at the first commit, then continue discussing this and change it if we reach
the consensus until PostgreSQL 10 is actually released. Thought?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2016-12-14 Thread Michael Paquier
On Wed, Dec 14, 2016 at 8:33 PM, Heikki Linnakangas  wrote:
> But, a password stored in plaintext works with either MD5 or SCRAM, or any
> future authentication mechanism. So as soon as we have SCRAM authentication,
> it becomes somewhat useful again.
>
> In a nutshell:
>
> auth / stored   MD5 SCRAM   plaintext
> -
> passwordY   Y   Y
> md5 Y   N   Y
> scram   N   Y   Y
>
> If a password is stored in plaintext, it can be used with any authentication
> mechanism. And the plaintext 'password' authentication mechanism works with
> any kind of a stored password. But an MD5 hash cannot be used with SCRAM
> authentication, or vice versa.

So.. I have been thinking about this portion of the thread. And what I
find the most scary is not the fact that we use plain passwords for
SCRAM authentication, it is the fact that we would need to do a
catalog lookup earlier in the connection workflow to decide what is
the connection protocol to use depending on the username provided in
the startup packet if the pg_hba.conf entry matching the user and
database names uses "password".

And, honestly, why do we actually need to have a support table that
spread? SCRAM is designed to be secure, so it seems to me that it
would on the contrary a bad idea to encourage the use of plain
passwords if we actually think that they should never be used (they
are actually useful for located, development instances, not production
ones). So what I would suggest would be to have a support table like
that:
auth / stored   MD5 SCRAM   plaintext
-
passwordY   Y   N
md5 Y   N   Y
scram   N   N   Y

So here is an idea for things to do now:
1) do not change the format of the existing passwords
2) do not change pg_authid
3) block access to instances if "password" or "md5" are used in
pg_hba.conf if the user have a SCRAM verifier.
4) block access if "scram" is used and if user has a plain or md5 verifier.
5) Allow access if "scram" is used and if user has a SCRAM verifier.
We had a similar discussion regarding verifier/password formats last
year but that did not end well. It would be sad to fall back again
into this discussion and get no result. If somebody wants to support
access to SCRAM with plain password entries, why not. But that would
gain a -1 from me regarding the earlier lookup of pg_authid needed to
do the decision making on the protocol to use. And I think that we
want SCRAM to be designed to be a maximum stable and secure.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for changes to recovery.conf API

2016-12-14 Thread Bruce Momjian
On Wed, Dec 14, 2016 at 02:29:07PM -0800, Josh Berkus wrote:
> On 12/14/2016 08:06 AM, Bruce Momjian wrote:
> > On Fri, Dec  9, 2016 at 09:46:44AM +0900, Michael Paquier wrote:
>  My own take on it is that the release notes are already a massive
>  amount of work, and putting duplicative material in a bunch of other
>  places isn't going to make things better, it'll just increase the
>  maintenance burden.
> >>>
> >>> This would mean adding literally pages of material to the release notes.
> >>> In the past, folks have been very negative on anything which would make
> >>> the release notes longer.  Are you sure?
> >>
> >> As that's a per-version information, that seems adapted to me. There
> >> could be as well in the release notes a link to the portion of the
> >> docs holding this manual. Definitely this should be self-contained in
> >> the docs, and not mention the wiki. My 2c.
> > 
> > Yes, that is the usual approach.
> > 
> 
> So where in the docs should these go, then?  We don't (currently) have a
> place for this kind of doc.  Appendices?

You are saying this is more massive than any other change we have made
in the past?  In general, what need to be documented?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2016-12-14 Thread Michael Paquier
On Thu, Dec 15, 2016 at 1:20 AM, Vladimir Rusinov  wrote:
>
> On Wed, Dec 14, 2016 at 4:07 PM, Peter Eisentraut
>  wrote:
> Others will follow later in separate patches. Or is it preferred to have one
> huge patch submitted?

Please yes. One change makes little sense.

>> Personally, I think this is not important, but if you want to do it, I'd
>> follow the suggestion in the thread to rename all functions and leave
>> the old names as aliases or wrappers of some kind.
>
> I think I agree with Michael Paquier: "Better to do breakages in a single
> release rather than spreading them across releases.".
> There's going to be a lot of broken scripts following pg_xlog rename, so I
> think it makes sense to just drop functions as well.

For consistency that makes sense in my view. But I won't be too noisy
as well if people think that we should keep aliases for compatibility.

> We don't maintain pg_xlog -> pg_wal symlink, do we?

This part would be a mess for base backups, that's why we didn't do
it. Symlinks are equivalent to empty directories in the pg_basebackup
world. So by having one you would very likely break your backup *and*
restore tools silently.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Make pg_basebackup -x stream the default

2016-12-14 Thread Vladimir Rusinov
Summary

===


Thank you for submission! I think it needs a bit more work to be even
better.

Please deal with '-x' argument and with wording in documentation.


I'll set status to 'waiting on author' now.

Submission review

==

Patch is in correct format.

Patch applies cleanly on HEAD.

Tests pass. There seems to be sufficient amount of tests to cover a change.

Usability review




Patch sounds like a good idea and does what it supposed to do. /me in DBA
hat will be happy to have it.

However, it makes '-x' parameter a bit confusing/surprising: specifying it
will be equivalent to '-X fetch' which is surprising regression from the
new default.


I think we need to either change -x to be equivalent to '-X streaming' or
just get rid of it altogether.

Feature test

===


I've tested the change manually by doing pg_basebackup with -X none, -X
streaming and -X fetch and their shorthand variants, each with
max_wal_senders sent to 1 and 2.

I've got all the expected results/failures (e.g. -X streaming fails when
max_wal_senders set to 1).


Regression tests pass.

Coding review

===


One comment about docs:


 Includes the required transaction log files (WAL files) in the

 backup. This will include all transaction logs generated during

-the backup. If this option is specified, it is possible to start

-a postmaster directly in the extracted directory without the need

-to consult the log archive, thus making this a completely
standalone

-backup.

+the backup. Unless the option none is specified,

+it is possible to start a postmaster directly in the extracted

+directory without the need to consult the log archive, thus

+making this a completely standalone backup.





I suggest "method none" instead of "option
none". I found the word "option" confusing in that
sentence.



-- 
Vladimir Rusinov
Storage SRE, Google Ireland

Google Ireland Ltd.,Gordon House, Barrow Street, Dublin 4, Ireland
Registered in Dublin, Ireland
Registration Number: 368047

On Tue, Nov 8, 2016 at 5:45 PM, Magnus Hagander  wrote:

> Per some discussions with a number of different people at pgconfeu, here
> is a patch that changes the default mode of pg_basebackup to be streaming
> the wal, as this is what most users would want -- and those that don't want
> it have to make other changes as well. Doing the "most safe" thing by
> default is a good idea.
>
> The new option "-x none" is provided to turn this off and get the previous
> behavior back.
>
> I will add this to the next (January) commitfest.
>
> //Magnus
>
>
>
> --
>  Magnus Hagander
>  Me: http://www.hagander.net/
>  Work: http://www.redpill-linpro.com/
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Peter Eisentraut
On 12/14/16 5:12 PM, Stephen Frost wrote:
> For my 2c, at least, because we're going to be constantly fighting with
> the trailing whitespace in those examples.  If you forget to s/ the docs aren't going to build and it's going to be extremely obvious
> that you need to do something.  Not that I'm actually happy about that-
> I'd much rather tell the doc engine "copy this verbatim until you see a
>  tag" or whatever, and not have to go hack up psql
> output at all,

This already exists: git grep -w CDATA

> That said, if we can make git complain about trailing whitespace in the
> docs but not mind it in the regression test output, then at least most
> will hopefully realize that they need to go through and strip out the
> trailing whitespace before committing.

That would be easy to do.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Vladimir Rusinov
On Wed, Dec 14, 2016 at 9:37 PM, Stephen Frost  wrote:

> * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> > On 12/14/16 12:03 PM, Stephen Frost wrote:
> > > If we do want to change that, perhaps we should also change psql to not
> > > output the trailing whitespace in the first place..?
> >
> > Previous discussion:
> > https://www.postgresql.org/message-id/flat/1285093687.
> 5468.18.camel%40vanquo.pezone.net
>
> Thanks for that, but, frankly, it seems like most were in agreement that
> we should go ahead and get rid of the trailing whitespace from psql's
> output.  The last couple emails on that thread hardly seems like a
> critical issue (and I'm not sure why we couldn't eliminate that
> whitespace and then programmatically forbid trailing whitespace
> anyway..).
>

Thanks for the pointer, that's quite interesting.

The real problem here, IMO, is the break in expected regression outputs.
> The previous thread mainly discussed that in terms of its impact on
> third-party tests using pg_regress, but for our own purposes it would be
> just as nasty to need to adjust every single test case we back-patch for
> the next five years.


That's funny. Linked thread is from 2010. Here we are in 2016 arguing about
it (we would've done by now). :)
Looks like cost of having them around is not exactly 0.

Either way, I've attached another version of my patch - this time it avoids
touching example psql output. Baby steps.
I'll let you decide on the way forward. I'm just happy to send some patches.


Overall, regression tests that compare output of psql seems like a solution
not without drawbacks. I absolutely see the appeal, but this practically
freezes behavior of psql and makes e.g. server tests depend not only server
behavior but also on piece of irrelevant client-only code.
I could imagine a test system that is both has more-or-less human-readable
expected.out files and does not depend on exact decorations added by psql.

-- 
Vladimir Rusinov
Storage SRE, Google Ireland

Google Ireland Ltd.,Gordon House, Barrow Street, Dublin 4, Ireland
Registered in Dublin, Ireland
Registration Number: 368047
From 80cf7ee7f71c95c89e0908512e41154daff74d3a Mon Sep 17 00:00:00 2001
From: Vladimir Rusinov 
Date: Wed, 14 Dec 2016 22:19:08 +
Subject: [PATCH 1/1] Remove some of the trailing whitespaces from
 documentation.

They are considered bad practice in many style guides and many editors
configured to strip them on every save.

Such editors will produce spurious diffs when editing the documentation.

Signed-off-by: Vladimir Rusinov 
---
 doc/src/sgml/config.sgml  |  2 +-
 doc/src/sgml/parallel.sgml| 22 +++---
 doc/src/sgml/ref/alter_table.sgml |  2 +-
 doc/src/sgml/ref/insert.sgml  |  2 +-
 doc/src/sgml/ref/prepare.sgml |  2 +-
 doc/src/sgml/ref/reindexdb.sgml   |  2 +-
 6 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0fc4e57..3b614b6 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1181,7 +1181,7 @@ include_dir 'conf.d'
 it in plaintext. on and off are also accepted, as
 aliases for md5 and plain, respectively.

-   
+
   
  
 
diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
index cf4c1c9..bca4886 100644
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -134,12 +134,12 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 
   
 
-   
+  
 The query writes any data or locks any database rows. If a query
 contains a data-modifying operation either at the top level or within
 a CTE, no parallel plans for that query will be generated. This is a
 limitation of the current implementation which could be lifted in a
-future release. 
+future release.
   
 
 
@@ -153,7 +153,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 FOR x IN query LOOP .. END LOOP will never use a
 parallel plan, because the parallel query system is unable to verify
 that the code in the loop is safe to execute while parallel query is
-active. 
+active.
   
 
 
@@ -174,7 +174,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 query itself, that query will never use a parallel plan. This is a
 limitation of the current implementation, but it may not be desirable
 to remove this limitation, since it could result in a single query
-using a very large number of processes. 
+using a very large number of processes.
   
 
 
@@ -197,7 +197,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
 
   
 
-   
+  
 No background workers can be obtained because of the limitation that
 the total number of background workers cannot 

Re: [HACKERS] Proposal for changes to recovery.conf API

2016-12-14 Thread Josh Berkus
On 12/14/2016 08:06 AM, Bruce Momjian wrote:
> On Fri, Dec  9, 2016 at 09:46:44AM +0900, Michael Paquier wrote:
 My own take on it is that the release notes are already a massive
 amount of work, and putting duplicative material in a bunch of other
 places isn't going to make things better, it'll just increase the
 maintenance burden.
>>>
>>> This would mean adding literally pages of material to the release notes.
>>> In the past, folks have been very negative on anything which would make
>>> the release notes longer.  Are you sure?
>>
>> As that's a per-version information, that seems adapted to me. There
>> could be as well in the release notes a link to the portion of the
>> docs holding this manual. Definitely this should be self-contained in
>> the docs, and not mention the wiki. My 2c.
> 
> Yes, that is the usual approach.
> 

So where in the docs should these go, then?  We don't (currently) have a
place for this kind of doc.  Appendices?

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Stephen Frost
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Tom Lane wrote:
> 
> > Thinking about this, I'm wondering what is the connection between
> > what psql does and what should be in the SGML (or XML) docs, anyway.
> > Nobody says boo when we have to do s/ > to put it in the docs; why is stripping trailing whitespace a bigger
> > issue?
> 
> Why do we need to put regular psql output verbatim in SGML/XML?  One
> idea is to add an output format to psql for docbook table markup, then
> use that whenever we need to paste stuff into the docs.

I do tend to like this idea.

Though, if we want to talk about writing lots of new code, what I really
like is the pgBackrest doc building system, which actually executes
pgBackrest during the documentation build process to run the commands
and get exactly what they produce included in the docs.  There's been
more than one example of *incorrect* hacking up of the example in the
docs, to the point where the example doesn't actually run, just this
year.

Perhaps if the regression tests ran the examples and produced the output
in a way which could be included in the docs...

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Alvaro Herrera
Tom Lane wrote:

> Thinking about this, I'm wondering what is the connection between
> what psql does and what should be in the SGML (or XML) docs, anyway.
> Nobody says boo when we have to do s/ to put it in the docs; why is stripping trailing whitespace a bigger
> issue?

Why do we need to put regular psql output verbatim in SGML/XML?  One
idea is to add an output format to psql for docbook table markup, then
use that whenever we need to paste stuff into the docs.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> The real problem here, IMO, is the break in expected regression outputs.
> The previous thread mainly discussed that in terms of its impact on
> third-party tests using pg_regress, but for our own purposes it would be
> just as nasty to need to adjust every single test case we back-patch for
> the next five years.

I agree, that'd be annoying.  Having to rewrite the fix for dumping
casts with pg_dump for every single major version we support because the
patch from the prior version couldn't be applied has certainly been a
rather annoying exercise.

> Thinking about this, I'm wondering what is the connection between
> what psql does and what should be in the SGML (or XML) docs, anyway.
> Nobody says boo when we have to do s/ to put it in the docs; why is stripping trailing whitespace a bigger
> issue?

For my 2c, at least, because we're going to be constantly fighting with
the trailing whitespace in those examples.  If you forget to s/ tag" or whatever, and not have to go hack up psql
output at all, which I contend is also rather tedious and annoying to do
when writing new code and new tests.  Is that worse than having to deal
with back-patching regression test output?  Who knows.

That said, if we can make git complain about trailing whitespace in the
docs but not mind it in the regression test output, then at least most
will hopefully realize that they need to go through and strip out the
trailing whitespace before committing.  Maybe we can come up with a psql
output mode of "make this work for the docs" or a tool to pipe arbitrary
text through to have it "do the right thing" for including it in the
docs.

I'm still a bit split, as I do feel like it's 'bad' of psql to be
emitting the trailing whitespace in the first place, even if it's been
done that way forever and even though it'll be an annoyance if we
change it.  Ultimately, I'm alright with either way though, I just don't
like the idea of removing all the whitespace from the docs without a way
to minimize the chance that we end up adding it back due to copy/paste
from psql output.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Peter Eisentraut
On 12/14/16 4:51 PM, Tom Lane wrote:
> Thinking about this, I'm wondering what is the connection between
> what psql does and what should be in the SGML (or XML) docs, anyway.
> Nobody says boo when we have to do s/ to put it in the docs; why is stripping trailing whitespace a bigger
> issue?

There is a previous discussion for that, too, equally inconclusive:
https://www.postgresql.org/message-id/flat/1383707883.10722.5.camel%40vanquo.pezone.net

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Tom Lane
Stephen Frost  writes:
> * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
>> Previous discussion:
>> https://www.postgresql.org/message-id/flat/1285093687.5468.18.camel%40vanquo.pezone.net

> Thanks for that, but, frankly, it seems like most were in agreement that
> we should go ahead and get rid of the trailing whitespace from psql's
> output.  The last couple emails on that thread hardly seems like a
> critical issue (and I'm not sure why we couldn't eliminate that
> whitespace and then programmatically forbid trailing whitespace
> anyway..).

The real problem here, IMO, is the break in expected regression outputs.
The previous thread mainly discussed that in terms of its impact on
third-party tests using pg_regress, but for our own purposes it would be
just as nasty to need to adjust every single test case we back-patch for
the next five years.

The way I read that thread is that people were left feeling that it
was probably more work than it was worth.  That's what I'm thinking,
anyway.

Thinking about this, I'm wondering what is the connection between
what psql does and what should be in the SGML (or XML) docs, anyway.
Nobody says boo when we have to do s/http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-12-14 Thread Michael Paquier
On Wed, Dec 14, 2016 at 11:34 PM, Fujii Masao  wrote:
> If we drop the "standby_list" syntax, I don't think that new parameter is
> necessary. We can keep s_s_names and just drop the support for that syntax
> from s_s_names. This may be ok if we're really in "break all the things" mode
> for PostgreSQL 10.

Please let's not raise that as an argument again... And not break the
s_list argument. Many users depend on that for just single sync
standbys. FWIW, I'd be in favor of backward compatibility and say that
a standby list is a priority list if we can maintain that. Upthread
agreement was to break that, I did not insist further, and won't if
that's still the feeling.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Stephen Frost
* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 12/14/16 12:03 PM, Stephen Frost wrote:
> > If we do want to change that, perhaps we should also change psql to not
> > output the trailing whitespace in the first place..?
> 
> Previous discussion:
> https://www.postgresql.org/message-id/flat/1285093687.5468.18.camel%40vanquo.pezone.net

Thanks for that, but, frankly, it seems like most were in agreement that
we should go ahead and get rid of the trailing whitespace from psql's
output.  The last couple emails on that thread hardly seems like a
critical issue (and I'm not sure why we couldn't eliminate that
whitespace and then programmatically forbid trailing whitespace
anyway..).

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Peter Eisentraut
On 12/14/16 12:03 PM, Stephen Frost wrote:
> If we do want to change that, perhaps we should also change psql to not
> output the trailing whitespace in the first place..?

Previous discussion:
https://www.postgresql.org/message-id/flat/1285093687.5468.18.camel%40vanquo.pezone.net

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]

2016-12-14 Thread Kevin Grittner
On Wed, Dec 14, 2016 at 11:12 AM, Ian Jackson  wrote:
> Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: 
> Retry on constraint violation [and 2 more messages] [and 1 more messages]"):
>> On Wed, Dec 14, 2016 at 10:20 AM, Ian Jackson  
>> wrote:

>> I would alter that slightly to:
>>
>> There is a consistent serialization of all serializable
>> transactions which successfully commit.
>
> Here `serializable' means SERIALIZABLE ?

I'm not entirely sure what you mean to convey by the
capitalization, so I'll just say that 'serializable' there referred
to the transaction isolation level.  (I *think* that was what you
were getting at.)

>> For examples, please see this Wiki page.  You might be particularly
>> interested in the examples in the "Read Only Transactions" section:
>>
>> https://wiki.postgresql.org/wiki/SSI
>
> Thanks.  I read that part of the wiki page.  But in that example, we
> are told that T1 will be aborted, not T3.

That is true in the first "Deposit Report") example in that
section.  The second ("Rollover") example shows the read-only
transaction (T3) being the one which is aborted and retried.

> Can it happen that a transaction which does not make any update
> attempts, will see "impossible" data, and that this is only detected
> at COMMIT ?  Does that apply even to READ ONLY transactions ?

As Robert pointed out, a read-only transaction cannot normally be
aborted by a serialization failure on COMMIT.  Under exceptional
conditions, like an attempt to suppress the serialization failure,
you might see the commit aborted, though.

Also as pointed out by Robert, the state seen by a read-only
transaction doesn't lack internal consistency, but it will be
rolled back with a serialization failure exception if it can show a
state which is inconsistent with some successfully-committed state
of the database.

In the "Rollover" example, the first time T3 is attempted its
SELECT it would have shown rows containing 100 and 11, were it not
canceled.  That could have been consistent with the earlier state
of 100 and 10 and the business rules that the first number can only
change by having the second number added to it, and the second
number can only change by being incremented; but that state and
those rules don't fit with the *later* state of 110, 11, because
that requires that the second number be added to the first before
it was incremented, and if we allow the result of the first T3
transaction attempt to be seen, it would tell us that the increment
happened first.  Since we've already allowed successful commit of
transactions putting things into a state only consistent with
adding 10 to 100 before incrementing 10 to 11, cancel the read-only
transaction and start over.  This time it will show something
consistent with the apparent order of execution of the other
transactions.

Note that neither the order that the first two transaction started
in (T1->T2) nor the order they committed in (T2->T1) determines the
apparent order of execution.  It is the rw-dependencies that
control (T1 reads a version of data before T2's work is applied, so
T1 *appears* to have run before T2 in apparent order of execution.)
Since both are successfully committed with that apparent order of
execution, a third transaction (T3), which sees the work of T2
(since it had committed when the snapshot for T3 was taken) but not
T1 (since it had not committed when the snapshot for T3 was taken)
cannot be allowed to proceed.

I know an example like that can cause one's head to hurt a bit
(been there), but even if you don't fight your way to a full grasp
of that case, it will hopefully give you some idea of both why we
can have high concurrency with this approach, and why it is
necessary to discard results from failed transactions.

>> Once a serialization failure occurs the transaction is flagged as
>> "doomed" and will not, under any circumstances be allowed to
>> commit.  If you find any exception to that, please report it as a
>> bug.
>
> Right.  I think this prevents any exception-catching arrangements from
> suppressing the serialisation failure.  Since AIUI it is not possible
> to run the outer COMMIT from within an exception trapping context.

Right.

> If it /is/ possible to run that outer COMMIT in a way which swallows
> the exception then [...]

That is not possible, as I understand things.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Linear vs. nonlinear planner cost estimates

2016-12-14 Thread Tom Lane
Robert Haas  writes:
> On Wed, Dec 14, 2016 at 2:12 PM, Tom Lane  wrote:
>> I don't have a concrete proposal right now about how to fix this.  The
>> most expansive response would be to decorate every path with an explicitly
>> nonlinear cost function, which would need to be able to report the cost
>> to fetch the first N tuples rather than the total scan cost.  That would
>> be a lot of work though.

> And it would have some significant overhead, I bet.

I think the extra cost of running such a function would only come in when
we're actually considering a query with LIMIT and are ready to choose
which path gets the LIMIT stuck on top of it.  That seems okay to me.
My real pain point with extra planner work is when it happens in queries
that get no benefit.

The really *critical* aspect of this, which could cost far more than a
few extra executions of a costing function, comes in if it damages
add_path's ability to prune inferior paths early.  So we would not want
to simply drop the startup_cost field; and we certainly don't want
add_path calling this function during every path comparison, either.
So my thought is that we'd need to keep startup_cost for add_path to
use, though we could (and likely should) redefine it as below.  We
should still be able to assume that if path A dominates path B at
first-tuple and last-tuple costs, it dominates everywhere in between.

A thought that occurred to me after more study of the problem example
is that the "good" index is a bit bigger than the "bad" one, meaning
it will get charged a bit more in index descent costs.  With our current
definition of startup_cost as being only the index descent cost, that
means the good index looks worse on startup_cost than the bad one.  In
this example, the good index should survive the add_path tournament
anyway because it has better total_cost --- but it's easy to imagine
cases where that wouldn't be true.  So I'm thinking that redefining
startup_cost as time to fetch the first tuple would be a good thing
in terms of making sure that desirable plans don't lose out in add_path.

>> Maybe we could make this work just by setting the startup cost of an
>> indexscan differently.  We could redefine "startup cost" as "cost to fetch
>> the first tuple and then stop", and compute that independently of the
>> total cost for plan types where it'd matter.  That would fix the problem
>> for LIMIT 1 cases, and even for cases with a larger limit, scaling
>> linearly from that to the total cost would produce a better estimate than
>> what we get now.  But it feels like it's still a band-aid.

> I think that would be a good place to start.  I bet it would help
> materially, and it's a lot less work than anything more sophisticated.
> There are plenty of cases (semi joins, and many nested loops that are
> not semi joins) where the number of rows that we expect to fetch is 1
> or very close to 1; LIMIT 100 or LIMIT 1000 happens, but it's a lot
> less common.

Yeah.  Also, per the further thoughts above, we'd probably still end up
redefining startup_cost like this even if we then plastered on a function
to do more complex interpolation in between first and last tuple.  So
it'd make sense to do this as a first step even if we add the additional
mechanism later.

I'll put this on my to-do list...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Linear vs. nonlinear planner cost estimates

2016-12-14 Thread Robert Haas
On Wed, Dec 14, 2016 at 2:12 PM, Tom Lane  wrote:
> I don't have a concrete proposal right now about how to fix this.  The
> most expansive response would be to decorate every path with an explicitly
> nonlinear cost function, which would need to be able to report the cost
> to fetch the first N tuples rather than the total scan cost.  That would
> be a lot of work though.

And it would have some significant overhead, I bet.

> Maybe we could make this work just by setting the startup cost of an
> indexscan differently.  We could redefine "startup cost" as "cost to fetch
> the first tuple and then stop", and compute that independently of the
> total cost for plan types where it'd matter.  That would fix the problem
> for LIMIT 1 cases, and even for cases with a larger limit, scaling
> linearly from that to the total cost would produce a better estimate than
> what we get now.  But it feels like it's still a band-aid.

I think that would be a good place to start.  I bet it would help
materially, and it's a lot less work than anything more sophisticated.
There are plenty of cases (semi joins, and many nested loops that are
not semi joins) where the number of rows that we expect to fetch is 1
or very close to 1; LIMIT 100 or LIMIT 1000 happens, but it's a lot
less common.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creating a DSA area to provide work space for parallel execution

2016-12-14 Thread Robert Haas
On Mon, Dec 5, 2016 at 3:12 PM, Robert Haas  wrote:
> On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro
>  wrote:
>> On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
>>  wrote:
>>> Here's a new version to apply on top of dsa-v7.patch.
>>
>> Here's a version to go with dsa-v8.patch.
>
> Thomas has spent a fair amount of time beating me up off-list about
> the fact that we have no way to recycle LWLock tranche IDs, and I've
> spent a fair amount of time trying to defend that decision, but it's
> really a problem here.  This is all well and good the way it's written
> provided that there's only one parallel context in existence at a
> time, but should that number ever exceed 1, which it can, then the
> tranche's array_base will be incorrect for LWLocks in one or the other
> tranche.
>
> That *almost* doesn't matter.  If you're not compiling with dtrace or
> LOCK_DEBUG or LWLOCK_STATS, you'll be fine.  And if you are compiling
> with one of those things, I believe the consequences will be no worse
> than an occasional nonsensical lock ID.  It's halfway tempting to just
> accept that as a known wart, but, uggh, that sucks.
>
> It's a bit hard to come up with a better alternative.

After thinking about this some more, I realized that the problem here
boils down to T_ID() not being smart enough to cope with multiple
instances of the same tranche at different base addresses.  We can
either make it more complicated so that it can do that, or (drum roll,
please!) get rid of it altogether.  I don't think making it more
complicated is very desirable, because I think that we end up
computing T_ID() for every lock acquisition and release whenever
--enable-dtrace is used, even if dtrace is not actually in use.  And
the usefulness of T_ID() for debugging is pretty marginal, with one
important exception, which is that currently T_ID() is used to
distinguish between individual LWLocks in the main tranche.  So here's
my proposal:

1. Give every LWLock in the main tranche a separate tranche ID.  This
has been previously proposed, so it's not a revolutionary concept.

2. Always identify LWLocks in pg_stat_activity only by tranche ID,
never offset within the tranche, not even for the main tranche.  This
results in pg_stat_activity saying "LWLock" rather than either
"LWLockNamed" or "LWLockTranche", which is a user-visible behavior
change but not AFAICS a very upsetting one.

3. Change the dtrace probes that currently pass both T_NAME() and
T_ID() to pass only T_NAME().  This is a minor loss of information for
dtrace, but in view of (1) it's not a very significant loss.

4. Change LOCK_DEBUG and LWLOCK_STATS output that identifies locks
using T_NAME() and T_ID() to instead identify them by T_NAME() and the
pointer address.  Since these are only developer debugging facilities
not intended for production builds, I think it's OK to expose the
pointer address, and it's arguably MORE useful to do so than to expose
an offset into an array with an unknown base address.

5. Remove T_ID() from the can't-happen elog() in LWLockRelease().

6. Remove T_ID() itself.  And then, since that's the only client of
array_base/array_stride, remove those too.  And then, since there's
nothing left in LWLockTranche except for the tranche name, get rid of
the whole structure, simplifying a whole bunch of code all over the
system.

Patch implementing all of this attached.  There's further
simplification that could be done here -- with array_stride gone, we
could do more at compile time rather than run-time -- but I think that
can be left for another day.

The overall result of this is a considerable savings of code:

 doc/src/sgml/monitoring.sgml |  52 -
 src/backend/access/transam/slru.c|   6 +-
 src/backend/access/transam/xlog.c|   9 +-
 src/backend/postmaster/pgstat.c  |  10 +-
 src/backend/replication/logical/origin.c |   8 +-
 src/backend/replication/slot.c   |   8 +-
 src/backend/storage/buffer/buf_init.c|  16 +--
 src/backend/storage/ipc/procarray.c  |   9 +-
 src/backend/storage/lmgr/lwlock.c| 175 ++-
 src/backend/utils/mmgr/dsa.c |  15 +--
 src/backend/utils/probes.d   |  16 +--
 src/include/access/slru.h|   1 -
 src/include/pgstat.h |   3 +-
 src/include/storage/lwlock.h |  45 ++--
 14 files changed, 112 insertions(+), 261 deletions(-)

It also noticeably reduces the number of bytes of machine code
generated for lwlock.c:

[rhaas pgsql]$ size src/backend/storage/lmgr/lwlock.o # echo master
__TEXT__DATA__OBJCothersdechex
11815336004643061605f0a5
[rhaas pgsql]$ size src/backend/storage/lmgr/lwlock.o # echo lwlock
__TEXT__DATA__OBJCothersdechex
11199326404548759950ea2e

That's better than a 5% reduction in the code size of 

Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2016-12-14 Thread Joshua D. Drake

On 12/14/2016 11:41 AM, Stephen Frost wrote:

* Heikki Linnakangas (hlinn...@iki.fi) wrote:

On 14 December 2016 20:12:05 EET, Bruce Momjian  wrote:

On Wed, Dec 14, 2016 at 11:27:15AM +0100, Magnus Hagander wrote:



Storing plaintext passwords has been bad form for just about forever and
I wouldn't be sad to see our support of it go.  At the least, as was
discussed somewhere, but I'm not sure where it ended up, we should give
administrators the ability to control what ways a password can be
stored.  In particular, once a user has migrated all of their users to
SCRAM, they should be able to say "don't let new passwords be in any
format other than SCRAM-SHA-256".


It isn't as bad as it used to be. I remember with PASSWORD was the 
default. I agree that we should be able to set a policy that says, "we 
only allow X for password storage".


JD




Thanks!

Stephen




--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2016-12-14 Thread Stephen Frost
* Heikki Linnakangas (hlinn...@iki.fi) wrote:
> On 14 December 2016 20:12:05 EET, Bruce Momjian  wrote:
> >On Wed, Dec 14, 2016 at 11:27:15AM +0100, Magnus Hagander wrote:
> >> I would so like to just drop support for plain passwords completely
> >:) But
> >> there's a backwards compatibility issue to think about of course.
> >> 
> >> But -- is there any actual usecase for them anymore?
> >
> >I thought we recommended 'password' for SSL connections because if you
> >use MD5 passwords the password text layout is known and that simplifies
> >cryptanalysis.
> 
> No, that makes no sense. And whether you use 'password' or 'md5' 
> authentication is a different question than whether you store passwords in 
> plaintext or as md5 hashes. Magnus was asking whether it ever makes sense to 
> *store* passwords in plaintext.

Right.

> Since you brought it up, there is a legitimate argument to be made that 
> 'password' authentication is more secure than 'md5', when SSL is used. 
> Namely, if an attacker can acquire contents of pg_authid e.g. by stealing a 
> backup tape, with 'md5' authentication he can log in as any user, using just 
> the stolen hashes. But with 'password', he needs to reverse the hash first. 
> It's not a great difference, but it's something.

Tunnelled passwords which are stored as hashes is also well understood
and comparable to SSH with passwords in /etc/passwd.

Storing plaintext passwords has been bad form for just about forever and
I wouldn't be sad to see our support of it go.  At the least, as was
discussed somewhere, but I'm not sure where it ended up, we should give
administrators the ability to control what ways a password can be
stored.  In particular, once a user has migrated all of their users to
SCRAM, they should be able to say "don't let new passwords be in any
format other than SCRAM-SHA-256".

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2016-12-14 Thread Heikki Linnakangas


On 14 December 2016 20:12:05 EET, Bruce Momjian  wrote:
>On Wed, Dec 14, 2016 at 11:27:15AM +0100, Magnus Hagander wrote:
>> I would so like to just drop support for plain passwords completely
>:) But
>> there's a backwards compatibility issue to think about of course.
>> 
>> But -- is there any actual usecase for them anymore?
>
>I thought we recommended 'password' for SSL connections because if you
>use MD5 passwords the password text layout is known and that simplifies
>cryptanalysis.

No, that makes no sense. And whether you use 'password' or 'md5' authentication 
is a different question than whether you store passwords in plaintext or as md5 
hashes. Magnus was asking whether it ever makes sense to *store* passwords in 
plaintext.

Since you brought it up, there is a legitimate argument to be made that 
'password' authentication is more secure than 'md5', when SSL is used. Namely, 
if an attacker can acquire contents of pg_authid e.g. by stealing a backup 
tape, with 'md5' authentication he can log in as any user, using just the 
stolen hashes. But with 'password', he needs to reverse the hash first. It's 
not a great difference, but it's something.

 - Heikki


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] background sessions

2016-12-14 Thread Peter Eisentraut
On 12/14/16 11:33 AM, Andrew Borodin wrote:
> 2016-12-14 20:45 GMT+05:00 Peter Eisentraut 
> :
>> On 12/11/16 5:38 AM, Andrew Borodin wrote:
>>> 2. From my point of view the interface of the feature is the strong
>>> point here (compared to pg_background). But it does not help
>>> programmer to follow good practice: bgworker is a valuable and limited
>>> resource, may be we could start session with something like
>>> TryBeginSession()?
>>
>> What exactly would that do?
> Return status (success\failure) and session object, if a function succeeded.
> 
> If there is max_connections exceeded, then (false,null).
> 
> I'm not sure whether this idiom is common for Python.

You can catch PostgreSQL exceptions in PL/Python, so this can be handled
in user code.

Some better connection management or pooling can probably be built on
top of the primitives later, I'd say.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Linear vs. nonlinear planner cost estimates

2016-12-14 Thread Tom Lane
I've been looking into the problem reported here:
https://www.postgresql.org/message-id/CAOfJSTwzQ7Fx6Yjeg9mFkMsM5OVKPoa=egkhcegdkr1twg8...@mail.gmail.com

I can reproduce similar misbehavior with this test case:

create table updates as
select (-log(random()) * 10)::int as driver_id,
   now() + x * '1 second'::interval as time,
   random() as junk
from generate_series(1,1500) x;

(I've intentionally set up this test data so that "driver_id" is randomly
ordered while the "time" column is perfectly in table order.  While I
don't have direct evidence, it seems pretty plausible that the original
problem table might be somewhat like that.)

set maintenance_work_mem = '100MB';
set default_statistics_target = 1;
vacuum analyze updates;
create index on updates(driver_id, time);

explain SELECT * FROM updates WHERE driver_id=1 ORDER BY "time" DESC LIMIT 1;
 QUERY PLAN 
 
-
 Limit  (cost=0.56..0.73 rows=1 width=20)
   ->  Index Scan Backward using updates_driver_id_time_idx on updates  
(cost=0.56..470495.61 rows=2750660 width=20)
 Index Cond: (driver_id = 1)
(3 rows)

That's all good, but if we create this less-well-adapted index, the
planner prefers it:

create index on updates(time);

explain SELECT * FROM updates WHERE driver_id=1 ORDER BY "time" DESC LIMIT 1;
QUERY PLAN  
   
---
 Limit  (cost=0.43..0.62 rows=1 width=20)
   ->  Index Scan Backward using updates_time_idx on updates  
(cost=0.43..522569.43 rows=2750660 width=20)
 Filter: (driver_id = 1)
(3 rows)

Why did that happen?  The filter can be expected to reject about four out
of every five tuples, so this plan should be expected to cost about five
times as much to get the first row.  (If the driver_id values aren't
randomly located, things could be far worse; but the point here is that
even under the planner's assumption of uncorrelated row order, this should
not be considered a preferable plan.)

I think the reason is that, because the planner knows that using this
index will require a full-index scan, it's costing the updates_time_idx
plan on the basis of a full scan.  Moreover, that index is perfectly
correlated with the physical table order, so it gets a low cost per fetch.
Evidently, the cost is coming out at about 522569/1500 or 0.035 cost
units per tuple fetched.  Meanwhile, the better plan is being costed with
the knowledge that it fetches only one-fifth of the table; and we also see
that that index has zero ordering correlation with the table.  So the cost
per tuple fetched is coming out at about 0.171, which is enough more that
even having to fetch five tuples instead of one makes the poorer plan look
cheaper.

This is, of course, nonsense; we're costing the plans on the basis of
fetching many more tuples than will actually happen, and the cost
amortization effects that cost_index is accounting for will not happen.
So in short, we're taking a fundamentally nonlinear cost estimate from
cost_index and then scaling it linearly to try to account for the LIMIT.

I've known that this was a theoretical hazard of the way we do LIMIT
costing for some time (I recall bitching about it in my 2011 PGCon talk).
But this is the first case I've seen where it results in a wrong choice
in a relatively simple query.

I don't have a concrete proposal right now about how to fix this.  The
most expansive response would be to decorate every path with an explicitly
nonlinear cost function, which would need to be able to report the cost
to fetch the first N tuples rather than the total scan cost.  That would
be a lot of work though.

Maybe we could make this work just by setting the startup cost of an
indexscan differently.  We could redefine "startup cost" as "cost to fetch
the first tuple and then stop", and compute that independently of the
total cost for plan types where it'd matter.  That would fix the problem
for LIMIT 1 cases, and even for cases with a larger limit, scaling
linearly from that to the total cost would produce a better estimate than
what we get now.  But it feels like it's still a band-aid.

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-12-14 Thread Fujii Masao
On Wed, Dec 14, 2016 at 11:10 PM, Fujii Masao  wrote:
> On Tue, Dec 13, 2016 at 10:24 PM, Michael Paquier
>  wrote:
>> On Tue, Dec 13, 2016 at 12:49 PM, Fujii Masao  wrote:
>>
>> Thanks for the review.
>
> Thanks for the updated version of the patch!
>
>>> +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>> + errmsg("a backup is already starting")));
>>> +}
>>> +if (XLogCtl->Insert.exclusiveBackupState == 
>>> EXCLUSIVE_BACKUP_STOPPING)
>>> +{
>>> +WALInsertLockRelease();
>>> +ereport(ERROR,
>>> +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>> + errmsg("a backup is currently stopping")));
>>>
>>> Isn't it better to use "an exclusive backup" explicitly rather than
>>> "a backup" here?
>>
>> Yes. It would be better to not touch the original in-progress messages
>> though.
>
> On second thought, do users really want to distinguish those three errornous
> states? I'm inclined to merge the checks for those three conditions into one,
> that is,
>
> if (XLogCtl->Insert.exclusiveBackupState != EXCLUSIVE_BACKUP_IN_NONE)
> {
> WALInsertLockRelease();
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>  errmsg("a backup is already in progress"),
>
> Also it may be better to handle the similar checks in pg_stop_backup,
> in the same way.

Here is the updated version of the patch that I applied the above "merge" to.

Unfortunately this patch is not applied cleanly to old versions.
So we need to create more patches for back-patch.

Regards,

-- 
Fujii Masao


base-backup-crash-v7.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2016-12-14 Thread Bruce Momjian
On Wed, Dec 14, 2016 at 11:27:15AM +0100, Magnus Hagander wrote:
> I would so like to just drop support for plain passwords completely :) But
> there's a backwards compatibility issue to think about of course.
> 
> But -- is there any actual usecase for them anymore?

I thought we recommended 'password' for SSL connections because if you
use MD5 passwords the password text layout is known and that simplifies
cryptanalysis.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Stephen Frost
* Vladimir Rusinov (vrusi...@google.com) wrote:
> I'm not sure if it makes sense to merge just these, as it will not help
> people with whitespace-eating editors.

I think we've established that it's going to be quite a while before we
will reach a point where whitespace-eating editors aren't a problem when
it comes to working with our docs.

That doesn't mean we shouldn't fix obviously incorrect trailing
whitespace in the docs.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Vladimir Rusinov
On Wed, Dec 14, 2016 at 5:41 PM, Stephen Frost  wrote:

> > They are considered bad practice in many style guides and many editors
> > configured to stip them on every save.
> >
> > Such editors will produce spurious diffs when editing the documentation.
> >
> > Therefore, I propose this patch.
>
> As mentioned down-thread, most of this is from psql output and I don't
> know that we actually want to get rid of that whitespace.  Ideally, we
> could indicate that trailing whitespace should be preserved when
> printing examples, either with the SGML or the XML format.
>
> The (relativly few) ones I include below do look like cases we should
> probably fix and back-patch (to simplify later back-patching efforts).
> Most of these only go back to 9.6 (parallel.sgml) anyway.


I'm not sure if it makes sense to merge just these, as it will not help
people with whitespace-eating editors.

-- 
Vladimir Rusinov
Storage SRE, Google Ireland

Google Ireland Ltd.,Gordon House, Barrow Street, Dublin 4, Ireland
Registered in Dublin, Ireland
Registration Number: 368047


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Stephen Frost
* Vladimir Rusinov (vrusi...@google.com) wrote:
> They are considered bad practice in many style guides and many editors
> configured to stip them on every save.
> 
> Such editors will produce spurious diffs when editing the documentation.
> 
> Therefore, I propose this patch.

As mentioned down-thread, most of this is from psql output and I don't
know that we actually want to get rid of that whitespace.  Ideally, we
could indicate that trailing whitespace should be preserved when
printing examples, either with the SGML or the XML format.

The (relativly few) ones I include below do look like cases we should
probably fix and back-patch (to simplify later back-patching efforts).
Most of these only go back to 9.6 (parallel.sgml) anyway.

Thanks!

Stephen

> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index 0fc4e57..3b614b6 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -1181,7 +1181,7 @@ include_dir 'conf.d'
>  it in plaintext. on and off are also 
> accepted, as
>  aliases for md5 and plain, respectively.
> 
> -   
> +
>
>   

> diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml
> @@ -134,12 +134,12 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler 
> LIKE '%x%';
>  
>
>  
> -   
> +  
>  The query writes any data or locks any database rows. If a query
>  contains a data-modifying operation either at the top level or within
>  a CTE, no parallel plans for that query will be generated. This is a
>  limitation of the current implementation which could be lifted in a
> -future release. 
> +future release.
>
>  
>  
> @@ -153,7 +153,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE 
> '%x%';
>  FOR x IN query LOOP .. END LOOP will never use a
>  parallel plan, because the parallel query system is unable to verify
>  that the code in the loop is safe to execute while parallel query is
> -active. 
> +active.
>
>  
>  
> @@ -174,7 +174,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE 
> '%x%';
>  query itself, that query will never use a parallel plan. This is a
>  limitation of the current implementation, but it may not be desirable
>  to remove this limitation, since it could result in a single query
> -using a very large number of processes. 
> +using a very large number of processes.
>
>  
>  
> @@ -197,7 +197,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE 
> '%x%';
>  
>
>  
> -   
> +  
>  No background workers can be obtained because of the limitation that
>  the total number of background workers cannot exceed
>  .
> @@ -205,7 +205,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE 
> '%x%';
>  
>  
>  
> -   
> +  
>  No background workers can be obtained because of the limitation that
>  the total number of background workers launched for purposes of
>  parallel query cannot exceed  linkend="guc-max-parallel-workers">.
> @@ -213,7 +213,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE 
> '%x%';
>  
>  
>  
> -   
> +  
>  The client sends an Execute message with a non-zero fetch count.
>  See the discussion of the
>  extended query 
> protocol.
> @@ -228,7 +228,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE 
> '%x%';
>  
>  
>  
> -   
> +  
>  A prepared statement is executed using a CREATE TABLE .. AS
>  EXECUTE .. statement.  This construct converts what 
> otherwise
>  would have been a read-only operation into a read-write operation,
> @@ -237,7 +237,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE 
> '%x%';
>  
>  
>  
> -   
> +  
>  The transaction isolation level is serializable.  This situation
>  does not normally arise, because parallel query plans are not
>  generated when the transaction isolation level is serializable.
> @@ -467,7 +467,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE 
> '%x%';
>  transaction. If you write a function which does this, and this behavior
>  difference is important to you, mark such functions as
>  PARALLEL RESTRICTED
> -to ensure that they execute only in the leader. 
> +to ensure that they execute only in the leader.
>
>  
>
> @@ -475,7 +475,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE 
> '%x%';
>  parallel-restricted functions or aggregates involved in the query in
>  order to obtain a superior plan.  So, for example, if a WHERE
>  clause applied to a particular table is parallel restricted, the query
> -planner will not consider placing the scan of that table below a 
> +planner will not consider 

Re: [HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Vladimir Rusinov
On Wed, Dec 14, 2016 at 5:14 PM, Tom Lane  wrote:

> Stephen Frost  writes:
> > If we do want to change that, perhaps we should also change psql to not
> > output the trailing whitespace in the first place..?
>
> Yeah, maybe.  I seem to recall having looked at that a long time ago
> and deciding that it wasn't worth the trouble, but the code involved
> has probably been restructured since then, so maybe it'd be easier now.
>

That sounds like a good idea, I guess I can take another look.
I could think of no reason to keep whitespaces there.

Of course, that would also create a back-patch breakpoint for every
> single regression test expected-file, which is doubling down on the
> PITA factor in a rather major way.
>

It looks like tests (at least in master) ignore whitespace-only diffs.
So we can handle trailing whitespaces in expected output files as a
separate issue.
One way to ease the pain is to remove trailing whitespaces in all supported
branches via separate patches.

-- 
Vladimir Rusinov
Storage SRE, Google Ireland

Google Ireland Ltd.,Gordon House, Barrow Street, Dublin 4, Ireland
Registered in Dublin, Ireland
Registration Number: 368047


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > If we do want to change that, perhaps we should also change psql to not
> > output the trailing whitespace in the first place..?
> 
> Yeah, maybe.  I seem to recall having looked at that a long time ago
> and deciding that it wasn't worth the trouble, but the code involved
> has probably been restructured since then, so maybe it'd be easier now.
> 
> Of course, that would also create a back-patch breakpoint for every
> single regression test expected-file, which is doubling down on the
> PITA factor in a rather major way.

I'm not convinced that any of this is worth the effort.  Wouldn't it be
possible, if we are moving the docs to XML, to mark the examples in the
docs in a way similar to LaTeX's 'verbatim' and allow trailing
whitespace there?

We wouldn't be able to use git to complain about trailing whitespace in
the docs then, I don't think, but perhaps we could detect invalid
trailing whitespace when building the docs and complain then.  I expect
that most of us build the docs before we commit anything to them anyway.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Vladimir Rusinov
On Wed, Dec 14, 2016 at 4:50 PM, Tom Lane  wrote:

> Vladimir Rusinov  writes:
> > Therefore, I propose this patch.
>
> Right now is a really bad time to do that; what it will mostly accomplish
> is to break back-patching of doc fixes for little benefit.
>

ack. As I said, it's a proposal and I'm not too attached to it.
Glad it sparked some discussions though.

That said, I don't fully buy this argument: diff is very simple. Merge
conflicts during backporting will be trivial to fix, although a bit more
annoying.


> There is work afoot to convert the documentation to xml.  If that
> succeeds, it'd make sense to strip trailing spaces (and start enforcing
> that in .gitattributes) when we do that, since it'll be a back-patch
> breakpoint anyway.  But right now the PITA factor outweighs the benefit.
>

What is the ETA for this work to be complete (or fail and be abandoned)?

-- 
Vladimir Rusinov
Storage SRE, Google Ireland

Google Ireland Ltd.,Gordon House, Barrow Street, Dublin 4, Ireland
Registered in Dublin, Ireland
Registration Number: 368047


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [HACKERS] Hash Indexes

2016-12-14 Thread Robert Haas
On Wed, Dec 14, 2016 at 4:27 AM, Amit Kapila  wrote:
>> It's not required for enabling WAL.  You don't *have* to release and
>> reacquire the buffer lock; in fact, that just adds overhead.
>
> If we don't release the lock, then it will break the general coding
> pattern of writing WAL (Acquire pin and an exclusive lock,
> Markbufferdirty, Write WAL, Release Lock).  Basically, I think it is
> to ensure that we don't hold it for multiple atomic operations or in
> this case avoid calling MarkBufferDirty multiple times.

I think you're being too pedantic.  Yes, the general rule is to
release the lock after each WAL record, but no harm comes if we take
the lock, emit TWO WAL records, and then release it.

> It is possible that we can MarkBufferDirty multiple times (twice in
> hashbucketcleanup and then in _hash_squeezebucket) while holding the
> lock.  I don't think that is a big problem as of now but wanted to
> avoid it as I thought we need it for WAL patch.

I see no harm in calling MarkBufferDirty multiple times, either now or
after the WAL patch.  Of course we don't want to end up with tons of
extra calls - it's not totally free - but it's pretty cheap.

>>  Aside from hopefully fixing the bug we're talking about
>> here, this makes the logic in several places noticeably less
>> contorted.
>
> Yeah, it will fix the problem in hashbucketcleanup, but there are two
> other problems that need to be fixed for which I can send a separate
> patch.  By the way, as mentioned to you earlier that WAL patch already
> takes care of removing _hash_wrtbuf and simplified the logic wherever
> possible without introducing the logic of MarkBufferDirty multiple
> times under one lock.  However, if you want to proceed with the
> current patch, then I can send you separate patches for the remaining
> problems as addressed in bug fix patch.

That sounds good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Tom Lane
Stephen Frost  writes:
> If we do want to change that, perhaps we should also change psql to not
> output the trailing whitespace in the first place..?

Yeah, maybe.  I seem to recall having looked at that a long time ago
and deciding that it wasn't worth the trouble, but the code involved
has probably been restructured since then, so maybe it'd be easier now.

Of course, that would also create a back-patch breakpoint for every
single regression test expected-file, which is doubling down on the
PITA factor in a rather major way.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]

2016-12-14 Thread Robert Haas
On Wed, Dec 14, 2016 at 11:20 AM, Ian Jackson  wrote:
>> I'm not sure Ian is intentionally taking that position.  Not all of us
>> are as familiar with the ramifications of every serializability
>> behavior we may want as you are.
>
> Indeed.  I think it's fair to say that I'm totally unfamiliar with the
> ramifications.  You might also fairly characterise me as naive; I had
> certainly made some assumptions which it seems are known (around here)
> to be both false and difficult to make true.

We can't all be database gurus...

> Let me try to summarise my understanding of what the developers think
> they can and intend to promise, about SERIALIZABLE transactions:
>
>  There is a consistent serialisation of all transactions which
>  successfully commit; or which do not attempt to make any changes.

I think we've figured out that it is limited to transactions which
successfully commit plus read-only transactions that roll back at the
top level but never roll back a subtransaction.  And I'm not sure
there aren't other exceptions.  Basically, be very wary about relying
on information extracted from a transaction that rolled back: there
might be dragons there.

>  A "consistent serialisation" means that there is an order in which
>  the same transactions might have been executed giving exactly the
>  answers.  This includes, if applicable, the same errors.  (The
>  database is free to generate synchronisation failure errors 40P01 and
>  40001 whenever it chooses.)

Seems right.

>  A transaction which attempts to make any changes, and which does not
>  commit (whether because the application never asks for COMMIT, or
>  because of reported synchronisation failure) might see internally
>  inconsistent data, or an internally-consistent view which is not
>  compatible with any serialisation of other transactions.  An
>  application which needs a coherent view should not rely on any of the
>  information from such a transaction.

I think it will see an internally-consistent view which is not
compatible with any global serial ordering.  I don't see why it would
see an internally-inconsistent view; inconsistent how?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]

2016-12-14 Thread Ian Jackson
Kevin Grittner writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry 
on constraint violation [and 2 more messages] [and 1 more messages]"):
> On Wed, Dec 14, 2016 at 10:20 AM, Ian Jackson  
> wrote:
> 
> > Let me try to summarise my understanding of what the developers think
> > they can and intend to promise, about SERIALIZABLE transactions:
> >
> >  There is a consistent serialisation of all transactions which
> >  successfully commit; or which do not attempt to make any changes.
> >
> >  A "consistent serialisation" means that there is an order in which
> >  the same transactions might have been executed giving exactly the
> >  answers.  This includes, if applicable, the same errors.  (The
> >  database is free to generate synchronisation failure errors 40P01 and
> >  40001 whenever it chooses.)
> 
> I would alter that slightly to:
> 
> There is a consistent serialization of all serializable
> transactions which successfully commit.

Here `serializable' means SERIALIZABLE ?

> >  A transaction which attempts to make any changes, and which does not
> >  commit (whether because the application never asks for COMMIT, or
> >  because of reported synchronisation failure) might see internally
> >  inconsistent data, or an internally-consistent view which is not
> >  compatible with any serialisation of other transactions.  An
> >  application which needs a coherent view should not rely on any of the
> >  information from such a transaction.
> 
> Even a read-only transaction can see a state that is not consistent
> with business rules (as enforced in the software) given that some
> particular later state is reached.
> 
> For examples, please see this Wiki page.  You might be particularly
> interested in the examples in the "Read Only Transactions" section:
> 
> https://wiki.postgresql.org/wiki/SSI

Thanks.  I read that part of the wiki page.  But in that example, we
are told that T1 will be aborted, not T3.

Can it happen that a transaction which does not make any update
attempts, will see "impossible" data, and that this is only detected
at COMMIT ?  Does that apply even to READ ONLY transactions ?

> >  Serialisation failures in subtransactions might cause the parent
> >  transaction to experience a serialisation failure too.
> 
> There is currently at least one bug

Right.  I was trying to capture the intent, modulo bugs.

> Once a serialization failure occurs the transaction is flagged as
> "doomed" and will not, under any circumstances be allowed to
> commit.  If you find any exception to that, please report it as a
> bug.

Right.  I think this prevents any exception-catching arrangements from
suppressing the serialisation failure.  Since AIUI it is not possible
to run the outer COMMIT from within an exception trapping context.

If it /is/ possible to run that outer COMMIT in a way which swallows
the exception then this is not a practical problem but the wording
ought to be changed to refer to the success of the COMMIT statement.

Ian.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Vladimir Rusinov  writes:
> > Therefore, I propose this patch.
> 
> Right now is a really bad time to do that; what it will mostly accomplish
> is to break back-patching of doc fixes for little benefit.
> 
> There is work afoot to convert the documentation to xml.  If that
> succeeds, it'd make sense to strip trailing spaces (and start enforcing
> that in .gitattributes) when we do that, since it'll be a back-patch
> breakpoint anyway.  But right now the PITA factor outweighs the benefit.

Almost all of that patch was just removing the whitespace from examples
where the actual commands used produced output with trailing whitespace.

I'm not entirely sure we want to change those cases given that's what we
do, in fact, produce as output.  In other words, the trailing whitespace
there isn't just because someone mistakenly included it.

If we do want to change that, perhaps we should also change psql to not
output the trailing whitespace in the first place..?

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]

2016-12-14 Thread Kevin Grittner
On Wed, Dec 14, 2016 at 10:20 AM, Ian Jackson  wrote:

> Let me try to summarise my understanding of what the developers think
> they can and intend to promise, about SERIALIZABLE transactions:
>
>  There is a consistent serialisation of all transactions which
>  successfully commit; or which do not attempt to make any changes.
>
>  A "consistent serialisation" means that there is an order in which
>  the same transactions might have been executed giving exactly the
>  answers.  This includes, if applicable, the same errors.  (The
>  database is free to generate synchronisation failure errors 40P01 and
>  40001 whenever it chooses.)

I would alter that slightly to:

There is a consistent serialization of all serializable
transactions which successfully commit.

>  A transaction which attempts to make any changes, and which does not
>  commit (whether because the application never asks for COMMIT, or
>  because of reported synchronisation failure) might see internally
>  inconsistent data, or an internally-consistent view which is not
>  compatible with any serialisation of other transactions.  An
>  application which needs a coherent view should not rely on any of the
>  information from such a transaction.

Even a read-only transaction can see a state that is not consistent
with business rules (as enforced in the software) given that some
particular later state is reached.

For examples, please see this Wiki page.  You might be particularly
interested in the examples in the "Read Only Transactions" section:

https://wiki.postgresql.org/wiki/SSI

>  Serialisation failures in subtransactions might cause the parent
>  transaction to experience a serialisation failure too.

There is currently at least one bug which may allow serialization
anomalies into the database if a constraint violation error is
thrown in a subtransaction and that subtransaction catches and
suppresses that exception and rolls back its work without throwing
an error.  I expect that any bugs of this type are will be fixed in
a minor release set soon -- probably the next one that is released.
Note that I don't think that an exception from any source other
than a declarative constraint can cause this type of problem, and
that other conditions must exist in combination with this to create
a serialization anomaly.

A serialization failure within any subtransaction will ensure the
top level transaction will fail, even if there is an attempt to
catch this exception and commit the top level transaction.  It
would be possible to catch a serialization failure exception and
throw some *other* exception to terminate the transaction; however,
(to step into very convoluted territory) if that other exception is
caught and suppressed, the serialization failure error would occur.
Once a serialization failure occurs the transaction is flagged as
"doomed" and will not, under any circumstances be allowed to
commit.  If you find any exception to that, please report it as a
bug.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Tom Lane
Vladimir Rusinov  writes:
> Therefore, I propose this patch.

Right now is a really bad time to do that; what it will mostly accomplish
is to break back-patching of doc fixes for little benefit.

There is work afoot to convert the documentation to xml.  If that
succeeds, it'd make sense to strip trailing spaces (and start enforcing
that in .gitattributes) when we do that, since it'll be a back-patch
breakpoint anyway.  But right now the PITA factor outweighs the benefit.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] Remove trailing whitespaces from documentation

2016-12-14 Thread Vladimir Rusinov
They are considered bad practice in many style guides and many editors
configured to stip them on every save.

Such editors will produce spurious diffs when editing the documentation.

Therefore, I propose this patch.

-- 
Vladimir Rusinov
Storage SRE, Google Ireland

Google Ireland Ltd.,Gordon House, Barrow Street, Dublin 4, Ireland
Registered in Dublin, Ireland
Registration Number: 368047
From c1e5b6fa577c1af0934a3f0735b7860305ca7500 Mon Sep 17 00:00:00 2001
From: Vladimir Rusinov 
Date: Wed, 14 Dec 2016 16:34:59 +
Subject: [PATCH 1/1] Remove trailing whitespaces from documentation.

They are considered bad practice in many style guides and many editors
configured to stip them on every save.

Such editors will produce spurious diffs when editing the documentation.

Signed-off-by: Vladimir Rusinov 
---
 doc/src/sgml/advanced.sgml |   8 +--
 doc/src/sgml/charset.sgml  |  12 ++--
 doc/src/sgml/config.sgml   |   2 +-
 doc/src/sgml/datatype.sgml |  30 -
 doc/src/sgml/dblink.sgml   |   6 +-
 doc/src/sgml/ddl.sgml  |   6 +-
 doc/src/sgml/func.sgml |  90 +--
 doc/src/sgml/hstore.sgml   |   8 +--
 doc/src/sgml/json.sgml |   6 +-
 doc/src/sgml/pageinspect.sgml  |  14 ++---
 doc/src/sgml/parallel.sgml |  24 
 doc/src/sgml/pgfreespacemap.sgml   |   4 +-
 doc/src/sgml/plperl.sgml   |   2 +-
 doc/src/sgml/plpython.sgml |   4 +-
 doc/src/sgml/ref/alter_table.sgml  |   2 +-
 doc/src/sgml/ref/create_index.sgml |   4 +-
 doc/src/sgml/ref/explain.sgml  |   4 +-
 doc/src/sgml/ref/grant.sgml|   4 +-
 doc/src/sgml/ref/insert.sgml   |   2 +-
 doc/src/sgml/ref/prepare.sgml  |   2 +-
 doc/src/sgml/ref/psql-ref.sgml |  14 ++---
 doc/src/sgml/ref/reindexdb.sgml|   2 +-
 doc/src/sgml/ref/rollback_to.sgml  |   4 +-
 doc/src/sgml/ref/select.sgml   |   2 +-
 doc/src/sgml/ref/set_role.sgml |   4 +-
 doc/src/sgml/ref/set_session_auth.sgml |   4 +-
 doc/src/sgml/ref/show.sgml |   2 +-
 doc/src/sgml/rules.sgml|   8 +--
 doc/src/sgml/sepgsql.sgml  |   2 +-
 doc/src/sgml/sql.sgml  |   2 +-
 doc/src/sgml/syntax.sgml   |  14 ++---
 doc/src/sgml/textsearch.sgml   | 108 -
 doc/src/sgml/xaggr.sgml|   4 +-
 doc/src/sgml/xfunc.sgml|  14 ++---
 34 files changed, 209 insertions(+), 209 deletions(-)

diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml
index 1e45511..aab334d 100644
--- a/doc/src/sgml/advanced.sgml
+++ b/doc/src/sgml/advanced.sgml
@@ -344,7 +344,7 @@ SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM emps
 
 
 
-  depname  | empno | salary |  avg  
+  depname  | empno | salary |  avg
 ---+---++---
  develop   |11 |   5200 | 5020.
  develop   | 7 |   4200 | 5020.
@@ -395,7 +395,7 @@ FROM empsalary;
 
 
 
-  depname  | empno | salary | rank 
+  depname  | empno | salary | rank
 ---+---++--
  develop   | 8 |   6000 |1
  develop   |10 |   5200 |2
@@ -459,7 +459,7 @@ SELECT salary, sum(salary) OVER () FROM empsalary;
 
 
 
- salary |  sum  
+ salary |  sum
 +---
5200 | 47100
5000 | 47100
@@ -488,7 +488,7 @@ SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;
 
 
 
- salary |  sum  
+ salary |  sum
 +---
3500 |  3500
3900 |  7400
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index f8c7ac3..db5be56 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1056,13 +1056,13 @@ CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE=
 
 $ psql -l
  List of databases
-   Name|  Owner   | Encoding  |  Collation  |Ctype|  Access Privileges  
+   Name|  Owner   | Encoding  |  Collation  |Ctype|  Access Privileges
 ---+--+---+-+-+-
- clocaledb | hlinnaka | SQL_ASCII | C   | C   | 
- englishdb | hlinnaka | UTF8  | en_GB.UTF8  | en_GB.UTF8  | 
- japanese  | hlinnaka | UTF8  | ja_JP.UTF8  | ja_JP.UTF8  | 
- korean| hlinnaka | EUC_KR| ko_KR.euckr | ko_KR.euckr | 
- postgres  | hlinnaka | UTF8  | fi_FI.UTF8  | fi_FI.UTF8  | 
+ clocaledb | hlinnaka | SQL_ASCII | C   | C   |
+ englishdb | hlinnaka | UTF8  | en_GB.UTF8  | en_GB.UTF8  |
+ japanese  | hlinnaka | UTF8  | ja_JP.UTF8  | ja_JP.UTF8  |
+ korean| hlinnaka | EUC_KR| ko_KR.euckr | ko_KR.euckr |
+ 

Re: [HACKERS] background sessions

2016-12-14 Thread Andrew Borodin
2016-12-14 20:45 GMT+05:00 Peter Eisentraut :
> On 12/11/16 5:38 AM, Andrew Borodin wrote:
>> 2. From my point of view the interface of the feature is the strong
>> point here (compared to pg_background). But it does not help
>> programmer to follow good practice: bgworker is a valuable and limited
>> resource, may be we could start session with something like
>> TryBeginSession()?
>
> What exactly would that do?
Return status (success\failure) and session object, if a function succeeded.

If there is max_connections exceeded, then (false,null).

I'm not sure whether this idiom is common for Python.

Best regards, Andrey Borodin.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages] [and 1 more messages]

2016-12-14 Thread Ian Jackson
Robert Haas writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on 
constraint violation [and 2 more messages]"):
> On Wed, Dec 14, 2016 at 9:26 AM, Kevin Grittner  wrote:
> > considered.  Essentially, the position Ian has been taking is that
> > PostgreSQL should provide  the guarantee of (2) above.  As far as I
> > can see, that would require using S2PL -- something the community
> > ripped out of PostgreSQL because of its horrible performance and
> > has refused to consider restoring (for good reason, IMO).
> 
> I'm not sure Ian is intentionally taking that position.  Not all of us
> are as familiar with the ramifications of every serializability
> behavior we may want as you are.

Indeed.  I think it's fair to say that I'm totally unfamiliar with the
ramifications.  You might also fairly characterise me as naive; I had
certainly made some assumptions which it seems are known (around here)
to be both false and difficult to make true.

Robert Haas writes ("Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on 
constraint violation [and 2 more messages]"):
>  For example, imagine a transaction that queries pg_stat_activity or
> pg_locks and then makes decisions based on the contents thereof.

I agree that such behviour is unreasonable and should be excluded from
consistency guarantees!  I don't think (even very naive) application
programmers would disagree.  From my point of those tables are `part
of the innards', and expecting transactional behaviour from them is
clearly too optimistic.  (I guess that should be made clear somewhere
near where these kind of system tables are mentioned in the docs.)


Let me try to summarise my understanding of what the developers think
they can and intend to promise, about SERIALIZABLE transactions:

 There is a consistent serialisation of all transactions which
 successfully commit; or which do not attempt to make any changes.

 A "consistent serialisation" means that there is an order in which
 the same transactions might have been executed giving exactly the
 answers.  This includes, if applicable, the same errors.  (The
 database is free to generate synchronisation failure errors 40P01 and
 40001 whenever it chooses.)

 A transaction which attempts to make any changes, and which does not
 commit (whether because the application never asks for COMMIT, or
 because of reported synchronisation failure) might see internally
 inconsistent data, or an internally-consistent view which is not
 compatible with any serialisation of other transactions.  An
 application which needs a coherent view should not rely on any of the
 information from such a transaction.

 "Transactions which do not attempt to make any changes" excludes any
 transactions whose subtransactions try to make changes.
 Serialisation failures in subtransactions might cause the parent
 transaction to experience a serialisation failure too.  "Try to make
 changes" includes even DML statements which are bound to fail.

Is that an accurate statement of the current thinking ?

Ian.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2016-12-14 Thread Vladimir Rusinov
On Wed, Dec 14, 2016 at 4:07 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 12/13/16 12:47 PM, Vladimir Rusinov wrote:
> > Based on discussion in
> > https://www.postgresql.org/message-id/CAE1wr-w%
> 3DLE1cK5uG_rmAh-VBxc4_Bnw-gAE3qSqL-%3DtWwvLvjQ%40mail.gmail.com.
> >
> > Tested via regression tests.
> > To be applied in master only and to be included in 10.0.
>
> I don't think the idea was to rename just one "xlog" function to "wal".
>

Others will follow later in separate patches. Or is it preferred to have
one huge patch submitted?

Personally, I think this is not important, but if you want to do it, I'd
> follow the suggestion in the thread to rename all functions and leave
> the old names as aliases or wrappers of some kind.
>

I think I agree with Michael Paquier: "Better to do breakages in a single
release rather than spreading them across releases.".
There's going to be a lot of broken scripts following pg_xlog rename, so I
think it makes sense to just drop functions as well. We don't maintain
pg_xlog -> pg_wal symlink, do we?

-- 
Vladimir Rusinov
Storage SRE, Google Ireland

Google Ireland Ltd.,Gordon House, Barrow Street, Dublin 4, Ireland
Registered in Dublin, Ireland
Registration Number: 368047


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

2016-12-14 Thread Robert Haas
On Wed, Dec 14, 2016 at 5:35 AM, Ashutosh Sharma  wrote:
>> I have just read the patch, and hardcoding the array position for a
>> socket event to 0 assumes that the socket event will be *always* the
>> first one in the list. but that cannot be true all the time, any code
>> that does not register the socket event as the first one in the list
>> will likely break.
>
> I think your point is very valid and even i had similar thought in my
> mind when implementing it but as i mentioned upthread that's how it is
> being used in the current code base. Please check a function call to
> ModifyWaitEvent() in secure_read().

That's kind of irrelevant. A WaitEventSet is a general-purpose
primitive, and it needs to work in all situations, current and future,
where a reasonable developer would expect it to work.  Sure, pq_init()
puts the socket in FeBeWaitSet at position 0.  However, in
WaitLatchOrSocket, the socket event, if required, is added after the
latch and postmaster-death events.  And new code written using
CreateWaitEventSet() / AddWaitEventToSet() in the future could put a
socket at any offset in the event array, or could add multiple
sockets.  So Michael is right: you can't just stick a hack into
WaitEventSetWait that assumes the socket is at offset 0 even if that
happens to fix the problem we are facing right at this moment.

(I am not sure whether Michael's proposed placement is correct, but
I'm almost 100% sure the placement in the submitted patch is not.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

2016-12-14 Thread Robert Haas
On Wed, Dec 14, 2016 at 2:34 AM, Ashutosh Sharma  wrote:
>> Okay, but I think we need to re-enable the existing event handle for
>> required event (FD_READ) by using WSAEventSelect() to make it work
>> sanely.  We have tried something on those lines and it resolved the
>> problem.  Ashutosh will post a patch on those lines later today.  Let
>> us know if you have something else in mind.
>
> Attached is the patch based on Amit's explanation above. Please have a
> look and let me know for any concerns. Thank you Micheal and Andres
> for sharing your thoughts and Amit for your valuable inputs.

One immediate concern is that the patch has no comments.  Considering
this seems to be a pretty tricky issue, that seems like a problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2016-12-14 Thread Peter Eisentraut
On 12/13/16 12:47 PM, Vladimir Rusinov wrote:
> Based on discussion in
> https://www.postgresql.org/message-id/CAE1wr-w%3DLE1cK5uG_rmAh-VBxc4_Bnw-gAE3qSqL-%3DtWwvLvjQ%40mail.gmail.com.
> 
> Tested via regression tests.
> To be applied in master only and to be included in 10.0.

I don't think the idea was to rename just one "xlog" function to "wal".

Personally, I think this is not important, but if you want to do it, I'd
follow the suggestion in the thread to rename all functions and leave
the old names as aliases or wrappers of some kind.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for changes to recovery.conf API

2016-12-14 Thread Bruce Momjian
On Fri, Dec  9, 2016 at 09:46:44AM +0900, Michael Paquier wrote:
> >> My own take on it is that the release notes are already a massive
> >> amount of work, and putting duplicative material in a bunch of other
> >> places isn't going to make things better, it'll just increase the
> >> maintenance burden.
> >
> > This would mean adding literally pages of material to the release notes.
> > In the past, folks have been very negative on anything which would make
> > the release notes longer.  Are you sure?
> 
> As that's a per-version information, that seems adapted to me. There
> could be as well in the release notes a link to the portion of the
> docs holding this manual. Definitely this should be self-contained in
> the docs, and not mention the wiki. My 2c.

Yes, that is the usual approach.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds

2016-12-14 Thread Dagfinn Ilmari Mannsåker
Kevin Grittner  writes:

> On Wed, Dec 14, 2016 at 8:16 AM, Dagfinn Ilmari Mannsåker
>  wrote:
>
>> Attached is a patch
>
> Please add this to the open CommitFest to ensure that it is reviewed.

Done.

https://commitfest.postgresql.org/12/910/

-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of."  -- Skud's Meta-Law



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] background sessions

2016-12-14 Thread Peter Eisentraut
On 12/11/16 5:38 AM, Andrew Borodin wrote:
> 2. From my point of view the interface of the feature is the strong
> point here (compared to pg_background). But it does not help
> programmer to follow good practice: bgworker is a valuable and limited
> resource, may be we could start session with something like
> TryBeginSession()?

What exactly would that do?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds

2016-12-14 Thread Kevin Grittner
On Wed, Dec 14, 2016 at 8:16 AM, Dagfinn Ilmari Mannsåker
 wrote:

> Attached is a patch

Please add this to the open CommitFest to ensure that it is reviewed.

http://commitfest.postgresql.org/action/commitfest_view/open

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]

2016-12-14 Thread Kevin Grittner
On Wed, Dec 14, 2016 at 8:38 AM, Robert Haas  wrote:
> On Wed, Dec 14, 2016 at 9:26 AM, Kevin Grittner  wrote:

>> Predicate locks
>> from reads within subtransactions are not discarded, even if the
>> work of the subtransaction is otherwise discarded.
>
> Oh, interesting.  Just to be clear, I'm not lobbying to change that; I
> was guessing (very late at night) what decision you probably made and
> it seems I was incorrect.  But doesn't that imply that if a read fails
> in a subtransaction with serialization_error, the parent MUST also be
> killed with serialization_error?

To prevent serialization anomalies, a top level transaction which
gets a serialization error in a subtransaction must fail.  To
provide the best information for an application framework which
wants to make smart decisions about when to retry a transaction
from the start, it should fail with a serialization error.  I'm
starting to think that, in addition to the doomed flag mechanism,
we should perhaps (as you suggested earlier in this thread) not
allow the serialization failure exception to be caught.  Or require
that it be re-thrown?  I don't know.  Maybe if someone catches a
serialization failure error, they should just be told that they
can't complain about where, later in the transaction, it might be
re-thrown.  (And yes, that could be in a COMMIT, even for a
read-only transaction.)  It may be that catching a serialization
failure and throwing a *different* error could confuse the
application framework's retry logic; I think at that point we have
to give up and let that be.  There may be some circumstances where
there is a valid need to replace a serialization failure with some
other error that you *want* to have appear in front of an end user.

>> Fortunately, Thomas Munro took an interest in the problem as it
>> related to duplicates on primary keys, unique constraints, and
>> unique indexes, and put forward a patch that cured the defect in
>> the common cases, and provided an easy workaround for the one case
>> he was unable to fix in that initial patch.  To finish the work for
>> these constraints and indexes, I think we need to add predicate
>> locking while descending to the insertion point  during the check
>> for an existing duplicate.
>
> I suggest adding something about this to README-SSI as a known issue.

Good idea.  Will do.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Reload SSL certificates on SIGHUP

2016-12-14 Thread Peter Eisentraut
On 12/5/16 12:17 AM, Michael Paquier wrote:
> OK, here is attached what I had in mind as reload-ssl-v08-02.patch for
> reference. This applies on top of the main patch
> reload-ssl-v08-01.patch that is the same version as v7 with the issues
> I reported previously as addressed. LoadedSSL is mapped with a
> read-only GUC parameter that new sessions can query after connecting.
> The only use case where that would be useful would be when using
> sslmode=prefer to check whether the SSL context is loaded even if ssl
> has been switched from off to on. But let's be honest, pg_stat_ssl
> reports already this kind of information, making this patch at the end
> useless because LoadedSSL does not change for an already-spawned
> backend.

Yeah, it seems that if you want to know whether you are using SSL, then
we already have that.  I don't see the need for this new read-only setting.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2016-12-14 Thread Stephen Frost
* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 12/14/16 5:15 AM, Michael Paquier wrote:
> > I would be tempted to suggest adding the verifier type as a new column
> > of pg_authid
> 
> Yes please.

This discussion seems to continue to come up and I don't entirely
understand why we keep trying to shove more things into pg_authid, or
worse, into rolpassword.

We should have an independent table for the verifiers, which has a
different column for the verifier type, and either starts off supporting
multiple verifiers per role or at least gives us the ability to add that
easily later.  We should also move rolvaliduntil to that new table.

No, I am specifically *not* concerned with "backwards compatibility" of
that table- we continually add to it and change it and applications
which are so closely tied to PG that they look at pg_authid need to be
updated with nearly every release anyway.  What we *do* need to make
sure we get correct is what pg_dump/pg_upgrade do, but that's entirely
within our control to manage and shouldn't be that much of an issue to
implement.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Declarative partitioning - another take

2016-12-14 Thread Tomas Vondra

Hi Amit,

On 12/13/2016 09:45 AM, Amit Langote wrote:

On 2016/12/13 0:17, Tomas Vondra wrote:

On 12/12/2016 07:37 AM, Amit Langote wrote:


Hi Tomas,

On 2016/12/12 10:02, Tomas Vondra wrote:


2) I'm wondering whether having 'table' in the catalog name (and also in
the new relkind) is too limiting. I assume we'll have partitioned indexes
one day, for example - do we expect to use the same catalogs?


I am not sure I understand your idea of partitioned indexes, but I doubt
it would require entries in the catalog under consideration.  Could you
perhaps elaborate more?



OK, let me elaborate. Let's say we have a partitioned table, and I want to
create an index. The index may be either "global" i.e. creating a single
relation for data from all the partitions, or "local" (i.e. partitioned
the same way as the table).

Local indexes are easier to implement (it's essentially what we have now,
except that we need to create the indexes manually for each partition),
and don't work particularly well for some use cases (e.g. unique
constraints). This is what I mean by "partitioned indexes".

If the index is partitioned just like the table, we probably won't need to
copy the partition key info (so, nothing in pg_partitioned_table).
I'm not sure it makes sense to partition the index differently than the
table - I don't see a case where that would be useful.

The global indexes would work better for the unique constraint use case,
but it clearly contradicts our idea of TID (no information about which
partition that references).

So maybe the catalog really only needs to track info about tables? Not
sure. I'm just saying it'd be unfortunate to have _table in the name, and
end up using it for indexes too.


Hmm, I didn't quite think of the case where the index is partitioned
differently from the table, but perhaps that's possible with some other
databases.



I haven't thought about that very deeply either, so perhaps it's an 
entirely silly idea. Also, probably quite complex to implement I guess, 
so unlikely to be pursued soon.



What you describe as "local indexes" or "locally partitioned indexes" is
something I would like to see being pursued in the near term.  In that
case, we would allow defining indexes on the parent that are recursively
defined on the partitions and marked as inherited index, just like we have
inherited check constraints and NOT NULL constraints.  I have not studied
whether we could implement (globally) *unique* indexes with this scheme
though, wherein the index key is a superset of the partition key.



I think implementing UNIQUE constraint with local indexes is possible 
and possibly even fairly simple, but it likely requires SHARE lock on 
all partitions, which is not particularly nice.


When the partition key is referenced in the constraint, that may allow 
locking only a subset of the partitions, possibly even a single one. But 
with multi-level partitioning schemes that may be difficult.


Also, I don't think it's very likely to have the partitioning key as 
part of the unique constraint. For example 'users' table is unlikely to 
be distributed by 'login' and so on.


The  global indexes make this easier, because there's just a single 
index to check. But of course, attaching/detaching partitions gets more 
expensive.


Anyway, starting a detailed discussion about local/global indexes was 
not really what I meant to do.



Clearly, this is a consequence of building the partitioning on top of
inheritance (not objecting to that approach, merely stating a fact).

I'm fine with whatever makes the error messages more consistent, if it
does not make the code significantly more complex. It's a bit confusing
when some use 'child tables' and others 'partitions'. I suspect even a
single DML command may return a mix of those, depending on where exactly
it fails (old vs. new code).


So, we have mostly some old DDL (CREATE/ALTER TABLE) and maintenance
commands that understand inheritance.  All of the their error messages
apply to partitions as well, wherein they will be referred to as "child
tables" using old terms.  We now have some cases where the commands cause
additional error messages for only partitions because of additional
restrictions that apply to them.  We use "partitions" for them because
they are essentially new error messages.

There won't be a case where single DML command would mix the two terms,
because we do not allow mixing partitioning and regular inheritance.
Maybe I misunderstood you though.



Don't we call inheritance-related functions from the new DDL? In that 
case we'd fail with 'child tables' error messages in the old code, and 
'partitions' in the new code. I'd be surprised if there was no such code 
reuse, but I haven't checked.



Am I right that one of the ambitions of the new partitioning is to improve
behavior with large number of partitions?


Yes.  Currently, SELECT planning is O(n) with significantly large constant
factor.  It is possible now to make it O(log 

Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2016-12-14 Thread Peter Eisentraut
On 12/14/16 5:15 AM, Michael Paquier wrote:
> I would be tempted to suggest adding the verifier type as a new column
> of pg_authid

Yes please.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]

2016-12-14 Thread Robert Haas
On Wed, Dec 14, 2016 at 9:40 AM, Kevin Grittner  wrote:
> On Wed, Dec 14, 2016 at 8:27 AM, Robert Haas  wrote:
>> But even after that fix, at the least, you'll still be able to
>> demonstrate the same problem by trapping serialization_failure rather
>> than unique_constraint.
>
> I hope not; the "doomed" flag associated with a serializable
> transaction should cause another attempt to cancel the transaction
> at every subsequent opportunity, including commit.  While we're
> digging into bugs in this area it wouldn't hurt (as I said in my
> prior post) to confirm that this is being handled everywhere it
> should be, but I'd be kinda surprised if it wasn't.

Oh, hmm.  So you're saying that if I begin a subtransaction, read some
data that causes a serialization anomaly, and then rollback the
subtransaction, my toplevel transaction is now doomed?  If so, then
isn't that a counterexample to my proposition that a read-only
transaction can't be cancelled at commit time?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-12-14 Thread Fujii Masao
On Tue, Dec 13, 2016 at 5:06 PM, Kyotaro HORIGUCHI
 wrote:
> At Tue, 13 Dec 2016 08:46:06 +0530, Amit Kapila  
> wrote in 
>> On Mon, Dec 12, 2016 at 9:54 PM, Masahiko Sawada  
>> wrote:
>> > On Mon, Dec 12, 2016 at 9:52 PM, Fujii Masao  wrote:
>> >> On Mon, Dec 12, 2016 at 9:31 PM, Masahiko Sawada  
>> >> wrote:
>> >>> On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila  
>> >>> wrote:
>> 
>>  Few comments:
>> >>>
>> >>> Thank you for reviewing.
>> >>>
>>  + * In 10.0 we support two synchronization methods, priority and
>>  + * quorum. The number of synchronous standbys that transactions
>>  + * must wait for replies from and synchronization method are specified
>>  + * in synchronous_standby_names. This parameter also specifies a list
>>  + * of standby names, which determines the priority of each standby for
>>  + * being chosen as a synchronous standby. In priority method, the 
>>  standbys
>>  + * whose names appear earlier in the list are given higher priority
>>  + * and will be considered as synchronous. Other standby servers 
>>  appearing
>>  + * later in this list represent potential synchronous standbys. If any 
>>  of
>>  + * the current synchronous standbys disconnects for whatever reason,
>>  + * it will be replaced immediately with the next-highest-priority 
>>  standby.
>>  + * In quorum method, the all standbys appearing in the list are
>>  + * considered as a candidate for quorum commit.
>> 
>>  In the above description, is priority method represented by FIRST and
>>  quorum method by ANY in the synchronous_standby_names syntax?  If so,
>>  it might be better to write about it explicitly.
>> >>>
>> >>> Added description.
>> >>>
>>
>> + * specified in synchronous_standby_names. The priority method is
>> + * represented by FIRST, and the quorum method is represented by ANY
>>
>> Full stop is missing after "ANY".
>>
>> >>>
>>  6.
>>  + int sync_method; /* synchronization method */
>>    /* member_names contains nmembers consecutive nul-terminated C 
>>  strings */
>>    char member_names[FLEXIBLE_ARRAY_MEMBER];
>>   } SyncRepConfigData;
>> 
>>  Can't we use 1 or 2 bytes to store sync_method information?
>> >>>
>> >>> I changed it to uint8.
>> >>>
>>
>> + int8 sync_method; /* synchronization method */
>>
>> > I changed it to uint8.
>>
>> In mail, you have mentioned uint8, but in the code it is int8.  I
>> think you want to update code to use uint8.
>>
>>
>> >
>> >> +standby_list{ $$ =
>> >> create_syncrep_config("1", $1, SYNC_REP_PRIORITY); }
>> >> +| NUM '(' standby_list ')'{ $$ =
>> >> create_syncrep_config($1, $3, SYNC_REP_QUORUM); }
>> >> +| ANY NUM '(' standby_list ')'{ $$ =
>> >> create_syncrep_config($2, $4, SYNC_REP_QUORUM); }
>> >> +| FIRST NUM '(' standby_list ')'{ $$ =
>> >> create_syncrep_config($2, $4, SYNC_REP_PRIORITY); }
>> >>
>> >> Isn't this "partial" backward-compatibility (i.e., "NUM (list)" works
>> >> differently from curent version while "list" works in the same way as
>> >> current one) very confusing?
>> >>
>> >> I prefer to either of
>> >>
>> >> 1. break the backward-compatibility, i.e., treat the first syntax of
>> >> "standby_list" as quorum commit
>> >> 2. keep the backward-compatibility, i.e., treat the second syntax of
>> >> "NUM (standby_list)" as sync rep with the priority
>> >
>>
>> +1.
>>
>> > There were some comments when I proposed the quorum commit. If we do
>> > #1 it breaks the backward-compatibility with 9.5 or before as well. I
>> > don't think it's a good idea. On the other hand, if we do #2 then the
>> > behaviour of s_s_name is 'NUM (standby_list)' == 'FIRST NUM
>> > (standby_list)''. But it would not what most of user will want and
>> > would confuse the users of future version who will want to use the
>> > quorum commit. Since many hackers thought that the sensible default
>> > behaviour is 'NUM (standby_list)' == 'ANY NUM (standby_list)' the
>> > current patch chose to changes the behaviour of s_s_names and document
>> > that changes thoroughly.
>> >
>>
>> Your arguments are sensible, but I think we should address the point
>> of confusion raised by Fujii-san.  As a developer, I feel breaking
>> backward compatibility (go with Option-1 mentioned above) here is a
>> good move as it can avoid confusions in future.  However, I know many
>> a time users are so accustomed to the current usage that they feel
>> irritated with the change in behavior even it is for their betterment,
>> so it is advisable to do so only if it is necessary or we have
>> feedback from a couple of users.  So in this case, if we don't want to
>> go with 

Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]

2016-12-14 Thread Kevin Grittner
On Wed, Dec 14, 2016 at 8:27 AM, Robert Haas  wrote:

> But even after that fix, at the least, you'll still be able to
> demonstrate the same problem by trapping serialization_failure rather
> than unique_constraint.

I hope not; the "doomed" flag associated with a serializable
transaction should cause another attempt to cancel the transaction
at every subsequent opportunity, including commit.  While we're
digging into bugs in this area it wouldn't hurt (as I said in my
prior post) to confirm that this is being handled everywhere it
should be, but I'd be kinda surprised if it wasn't.

> imagine a transaction that queries pg_stat_activity or
> pg_locks and then makes decisions based on the contents thereof.  That
> transaction is determined to behave different under concurrency than
> it does on an idle system, and even the ineluctable triumvirate of
> Kevin Grittner, Dan Ports, and Michael Cahill will not be able to
> prevent it from doing so.  That's not a bug.

OK, I'll agree that it may be theoretically possible to create some
sort of "side channel" for seeing data which subverts
serializability in some arcane way.  I would agree that's not a bug
any more than limited data that is unavoidably leaked through
security barriers is.  I don't think that subtransactions should
rise to that level, though.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]

2016-12-14 Thread Robert Haas
Thanks for the reply.

On Wed, Dec 14, 2016 at 9:26 AM, Kevin Grittner  wrote:
> considered.  Essentially, the position Ian has been taking is that
> PostgreSQL should provide  the guarantee of (2) above.  As far as I
> can see, that would require using S2PL -- something the community
> ripped out of PostgreSQL because of its horrible performance and
> has refused to consider restoring (for good reason, IMO).

I'm not sure Ian is intentionally taking that position.  Not all of us
are as familiar with the ramifications of every serializability
behavior we may want as you are.

> Huh, I had to think about that for a minute, but you are right
> about never rolling back a read-only transaction at commit time ...

Yeah, I had to think about it for about an hour ... and look at the
code and README-SSI.  So if you only had to think about it for a
minute, you win. :-)

>> if either transaction had executed before the other in a
>> serial schedule, the second transaction in the schedule would have had
>> to have seen (A, B') or (A', B) rather than (A, B), but that's not
>> what happened.  But what if each of T1 and T2 did the reads in a
>> subtransaction, rolled it back, and then did the write in the main
>> transaction and committed?  The database system has two options.
>> First, it could assume that the toplevel transaction may have relied
>> on the results of the aborted subtransaction.
>
> This is what we do in our implementation of SSI.  Predicate locks
> from reads within subtransactions are not discarded, even if the
> work of the subtransaction is otherwise discarded.

Oh, interesting.  Just to be clear, I'm not lobbying to change that; I
was guessing (very late at night) what decision you probably made and
it seems I was incorrect.  But doesn't that imply that if a read fails
in a subtransaction with serialization_error, the parent MUST also be
killed with serialization_error?  If not, then I don't see how you can
escape having results that, overall, are not serializable.

> What we missed is that, while we took action to try to ensure that
> a serialization failure could not be discarded, we didn't consider
> that a constraint violation exception which was preventing an
> anomaly *could* be discarded.  Effectively, this has escalated
> detection of serialization failures involving reads which are part
> of enforcing declarative constraints from the level of a feature
> request to cure a significant annoyance for those using them, to a
> bug fix necessary to prevent serialization anomalies.

Hmm.  I see.

> Fortunately, Thomas Munro took an interest in the problem as it
> related to duplicates on primary keys, unique constraints, and
> unique indexes, and put forward a patch that cured the defect in
> the common cases, and provided an easy workaround for the one case
> he was unable to fix in that initial patch.  To finish the work for
> these constraints and indexes, I think we need to add predicate
> locking while descending to the insertion point  during the check
> for an existing duplicate.

I suggest adding something about this to README-SSI as a known issue.

> I'm not sure about foreign key constraints and exclusion
> constraints.  I have neither seen a failure related to either of
> these, nor proven that there cannot be one.  Without having
> assessed the scope of the problems (if any) in those constraints,
> it's hard to say what needs to be done or how much work it is.

I think it's a natural result of implementing techniques from academic
research papers that there will sometimes be open research questions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]

2016-12-14 Thread Kevin Grittner
On Wed, Dec 14, 2016 at 12:44 AM, Robert Haas  wrote:
> On Tue, Dec 13, 2016 at 1:00 PM, Ian Jackson  
> wrote:

> Saying that a set of transactions are serializable is equivalent to
> the statement that there is some serial order of execution which would
> have produced results equivalent to the actual results.  That is,
> there must be at least one schedule (T1, ..., Tn) such that running
> the transactions one after another in that order would have produced
> the same results that were obtained by running them concurrently.
> Any transactions that roll back whether due to serialization anomalies
> or manual user intervention or any other reason are not part of the
> schedule, which considers only successfully committed transactions.
> The rolled-back transactions effectively didn't happen, and
> serializability as a concept makes no claim about the behavior of such
> transactions.  That's not a PostgreSQL thing: that's database theory.

Right.  That said, with S2PL's reliance on blocking, it can provide
certain guarantees that are not required by the SQL standard nor by
normal definitions of serializable transactions.
(1)  The effective order of execution will match commit order of
successfully committed transactions.  Both the SQL standard and the
most definitions of serializable transactions merely require that
there is *some* order of transactions which provides the same
effects as having run the transactions one at a time in that order.
(2)  Even transactions terminated to resolve deadlocks never show
serialization anomalies before being canceled.

S2PL was the most common technique for implementing serializable
transactions for a long time, and some have come to think that its
guarantees should always be provided.  The problem is that in most
common workloads S2PL performs far worse than some other
techniques, including the SSI technique used by PostgreSQL, even
when the need to discard results from failed transactions is
considered.  Essentially, the position Ian has been taking is that
PostgreSQL should provide  the guarantee of (2) above.  As far as I
can see, that would require using S2PL -- something the community
ripped out of PostgreSQL because of its horrible performance and
has refused to consider restoring (for good reason, IMO).

> However, in practice, the scenario that you mention should generally
> work fine in PostgreSQL, I think.  If T performed any writes, the
> rollback throws them away, so imagine removing the actual T from the
> mix and replacing it with a substitute version T' which performs the
> same reads but no writes and then tries to COMMIT where T tried to
> ROLLBACK.  T' will succeed, because we never roll back a read-only
> transaction at commit time.

Huh, I had to think about that for a minute, but you are right
about never rolling back a read-only transaction at commit time
(except possibly for a prepared transaction -- I would have to look
at that more closely).  The reason is that to have a serialization
failure under SSI we must have a "dangerous structure" (of a pivot
(Tpivot) with both a rw-dependency in (Tin) and a rw-dependency out
(Tout)) and Tout must have committed (and committed first). If
Tpivot is still active, we will cancel it, because Tin, if it were
to be canceled and perform an immediate retry, could hit the exact
same problem, and be canceled again based on conflicts with the
exact same other transactions.  We don't want to burn resources
repeating transactions with no progress made.  So the only time the
read-only transaction can be canceled is when Tout and Tpivot are
running concurrently, Tout commits, Tpivot develops a rw-dependency
to Tout (either before or after the commit of Tout), Tin starts
(after the commit of Tout), Tpivot commits before Tin develops a
rw-dependency with it (otherwise Tpivot would be the one canceled),
and then Tin develops a rw-dependency to Tpivot.  That dependency
should be recognized when it is developed, causing Tin to be
canceled -- so the commit of Tin should never get the serialization
failure.

> If it were to fail, it would have to fail *while performing one
> of the reads*, not later.

Yeah, that's the concise statement of what my lengthy explanation
above proves.  ;-)

> But imagine a hypothetical database system in which anomalies are
> never detected until commit time.  We somehow track the global
> must-happen-before graph and refuse the commit of any transaction
> which will create a cycle.

You have just described optimistic concurrency control (OCC), which
has been used, although not in any really popular DBMS systems I
can think of.  It certainly has enjoyed a resurgence in some ORMs,
though -- implemented outside the DBMS.

> Let's also suppose that this system uses
> snapshots to implement MVCC.  In such a system, read-only transactions
> will sometimes fail at commit time if they've seen a view of the world
> that is inconsistent with the only 

Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]

2016-12-14 Thread Robert Haas
On Wed, Dec 14, 2016 at 9:05 AM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> I have not read any database literature on the interaction of
>> serializability with subtransactions.  This seems very thorny.
>> Suppose T1 reads A and B and updates A -> A' while concurrently T2
>> reads A and B and updates B -> B'.  This is obviously not
>> serializable; if either transaction had executed before the other in a
>> serial schedule, the second transaction in the schedule would have had
>> to have seen (A, B') or (A', B) rather than (A, B), but that's not
>> what happened.  But what if each of T1 and T2 did the reads in a
>> subtransaction, rolled it back, and then did the write in the main
>> transaction and committed?  The database system has two options.
>> First, it could assume that the toplevel transaction may have relied
>> on the results of the aborted subtransaction.  But if it does that,
>> then any serialization failure which afflicts a subtransaction must
>> necessarily also kill the main transaction, which seems pedantic and
>> unhelpful.  If you'd wanted the toplevel transaction to be killled,
>> you wouldn't have used a subtransaction, right?  Second, it could
>> assume that the toplevel transaction in no way relied on or used the
>> values obtained from the aborted subtransaction.  However, that
>> defeats the whole purpose of having subtransactions in the first
>> place.  What's the point of being able to start subtransactions if you
>> can't roll them back and then decide to do something else instead?  It
>> seems to me that what the database system should do is make that
>> second assumption, and what the user should do is understand that to
>> the degree that transactions depend on the results of aborted
>> subtransactions, there may be serialization anomalies that go
>> undetected.
>
> Actually, Ian's sample transaction is even more malicious, because the
> problem is not caused by reads within the aborted subtransaction -- the
> cached-in-app reads occured *before* the subtransaction started in the
> first place.  I think saving a count(*) across a possibly-failed
> insertion on the same table is wrong.  I can't blame the database for
> the behavior.

I don't see why it's wrong to cache a COUNT(*) across a
possibly-failed substransaction; instead, I would argue that the
problem is that by relying on the knowledge that the subtransaction
failed while inserting A as a justification for inserting B, it's
intrinsically doing something that it sensitive to concurrency.  I
understand that Ian --- and Kevin, and Thomas Munro --- would like
that subtransaction to fail with serialization_failure rather than a
unique_violation, and I previously argued that the change was fine but
shouldn't be back-patched since it might break things for existing
users.  But people are continuing to show up and say "hey, this is
broken", and now Kevin's chosen to back-patch, and I'm not going to
kick up a fuss about that.  After all, there's a growing number of
people complaining about the same thing and saying it's a bug, so I
guess it's a bug!

But even after that fix, at the least, you'll still be able to
demonstrate the same problem by trapping serialization_failure rather
than unique_constraint.  And even if you decree that trapping
serialization_failure is off the table, I bet there are other ways for
a supposedly-serializable transaction to sort of cheat, find out
enough information about what's going on in the system to adjust its
behavior, and then do something different based on that.  And if
that's possible and if a transaction finds a way to do it, then it's
going to allow results not equivalent to any serial schedule.  For
example, imagine a transaction that queries pg_stat_activity or
pg_locks and then makes decisions based on the contents thereof.  That
transaction is determined to behave different under concurrency than
it does on an idle system, and even the ineluctable triumvirate of
Kevin Grittner, Dan Ports, and Michael Cahill will not be able to
prevent it from doing so.  That's not a bug.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds

2016-12-14 Thread Dagfinn Ilmari Mannsåker
Hi hackers,

I have a workload using SSI that causes a lot of tuple predicate to be
promoted to page locks.  This causes a lot of spurious serialisability
errors, since the promotion happens at once three tuples on a page are
locked, and the affected tables have 30-90 tuples per page.

PredicateLockPromotionThreshold() has the following comment:

 * TODO SSI: We should do something more intelligent about what the
 * thresholds are, either making it proportional to the number of
 * tuples in a page & pages in a relation, or at least making it a
 * GUC. 

Attached is a patch that does the "at least" part of this.

One thing I don't like about this patch is that if a user has increased
max_pred_locks_per_transaction, they need to set
max_pred_locks_per_relation to half of that to retain the current
behaviour, or they'll suddenly find themselves with a lot more relation
locks.  If it's possible to make a GUCs default value dependent on the
value of another, that could be a solution.  Otherwise, the page lock
threshold GUC could be changed to be expressed as a fraction of
max_pred_locks_per_transaction, to keep the current behaviour.


Cheers,

Ilmari

>From bb81a54ee6c9a4855f6aeb52b968d188f44b14ac Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Mon, 12 Dec 2016 17:57:33 +
Subject: [PATCH] Add GUCs for predicate lock promotion thresholds

This addresses part of the TODO comment for predicate lock promotion
threshold, by making them configurable.  The default values are the same
as what used to be hardcoded.
---
 doc/src/sgml/config.sgml  | 36 +++
 doc/src/sgml/mvcc.sgml|  4 ++-
 src/backend/storage/lmgr/predicate.c  | 18 +-
 src/backend/utils/misc/guc.c  | 22 
 src/backend/utils/misc/postgresql.conf.sample |  3 +++
 src/include/storage/predicate.h   |  2 ++
 6 files changed, 78 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0fc4e57d90..6e133ffebd 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7172,6 +7172,42 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
  
 
+ 
+  max_pred_locks_per_relation (integer)
+  
+   max_pred_locks_per_relation configuration parameter
+  
+  
+  
+   
+This controls how many pages of a single relation can be
+predicate-locked before the lock is promoted to covering the whole
+relation.  The default is 32.  In previous versions
+of PostgreSQL it used to be hard-coded to half
+of , and you might
+want to raise this value if you raise that.  This parameter can only
+be set at server start.
+   
+
+  
+ 
+
+ 
+  max_pred_locks_per_page (integer)
+  
+   max_pred_locks_per_page configuration parameter
+  
+  
+  
+   
+This controls how many rows on a single page can be predicate-locked
+before the lock is promoted to covering the whole page.  The default
+is 3.  This parameter can only be set at server start.
+   
+
+  
+ 
+
  

 
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 306def4a15..4652cdf094 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -765,7 +765,9 @@ ERROR:  could not serialize access due to read/write dependencies among transact
locks into a single relation-level predicate lock because the predicate
lock table is short of memory, an increase in the rate of serialization
failures may occur.  You can avoid this by increasing
-   .
+   ,
+and/or
+   .
   
  
  
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 24ed21b487..e73b2b417c 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -355,6 +355,12 @@ static SERIALIZABLEXACT *OldCommittedSxact;
 /* This configuration variable is used to set the predicate lock table size */
 int			max_predicate_locks_per_xact;		/* set by guc.c */
 
+/* This configuration variable is used to decide when to upgrade a page lock to a relation lock */
+int			max_predicate_locks_per_relation;	/* set by guc.c */
+
+/* This configuration variable is used to decide when to upgrade a row lock to a page lock */
+int			max_predicate_locks_per_page;		/* set by guc.c */
+
 /*
  * This provides a list of objects in order to track transactions
  * participating in predicate locking.  Entries in the list are fixed size,
@@ -2124,10 +2130,10 @@ DeleteChildTargetLocks(const PREDICATELOCKTARGETTAG *newtargettag)
  * descendants, e.g. both tuples and pages for a relation lock.
  *
  * TODO SSI: We should do something more intelligent about what the
- * thresholds are, either making it proportional to the 

Re: [HACKERS] Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)

2016-12-14 Thread Fujii Masao
On Tue, Dec 13, 2016 at 10:24 PM, Michael Paquier
 wrote:
> On Tue, Dec 13, 2016 at 12:49 PM, Fujii Masao  wrote:
>
> Thanks for the review.

Thanks for the updated version of the patch!

>> +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> + errmsg("a backup is already starting")));
>> +}
>> +if (XLogCtl->Insert.exclusiveBackupState == 
>> EXCLUSIVE_BACKUP_STOPPING)
>> +{
>> +WALInsertLockRelease();
>> +ereport(ERROR,
>> +(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> + errmsg("a backup is currently stopping")));
>>
>> Isn't it better to use "an exclusive backup" explicitly rather than
>> "a backup" here?
>
> Yes. It would be better to not touch the original in-progress messages
> though.

On second thought, do users really want to distinguish those three errornous
states? I'm inclined to merge the checks for those three conditions into one,
that is,

if (XLogCtl->Insert.exclusiveBackupState != EXCLUSIVE_BACKUP_IN_NONE)
{
WALInsertLockRelease();
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 errmsg("a backup is already in progress"),

Also it may be better to handle the similar checks in pg_stop_backup,
in the same way.

>> You changed the code so that pg_stop_backup() removes backup_label before
>> it marks the exclusive-backup-state as not-in-progress. Could you tell me
>> why you did this change? It's better to explain it in the comment.
>
> That's actually mentioned in the comments :)
>
> This is to ensure that there cannot be any other pg_stop_backup() running
> in parallel and trying to remove the backup label file. The key point here
> is that the backup_label file is removed during EXCLUSIVE_BACKUP_STOPPING,
> that the removal of the backup_label file is kept last on purpose (that's
> what HEAD does anyway), and that we can rollback to an in-progress state
> in case of failure.

Okay.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [OSSTEST PATCH 0/1] PostgreSQL db: Retry on constraint violation [and 2 more messages]

2016-12-14 Thread Alvaro Herrera
Robert Haas wrote:

> I have not read any database literature on the interaction of
> serializability with subtransactions.  This seems very thorny.
> Suppose T1 reads A and B and updates A -> A' while concurrently T2
> reads A and B and updates B -> B'.  This is obviously not
> serializable; if either transaction had executed before the other in a
> serial schedule, the second transaction in the schedule would have had
> to have seen (A, B') or (A', B) rather than (A, B), but that's not
> what happened.  But what if each of T1 and T2 did the reads in a
> subtransaction, rolled it back, and then did the write in the main
> transaction and committed?  The database system has two options.
> First, it could assume that the toplevel transaction may have relied
> on the results of the aborted subtransaction.  But if it does that,
> then any serialization failure which afflicts a subtransaction must
> necessarily also kill the main transaction, which seems pedantic and
> unhelpful.  If you'd wanted the toplevel transaction to be killled,
> you wouldn't have used a subtransaction, right?  Second, it could
> assume that the toplevel transaction in no way relied on or used the
> values obtained from the aborted subtransaction.  However, that
> defeats the whole purpose of having subtransactions in the first
> place.  What's the point of being able to start subtransactions if you
> can't roll them back and then decide to do something else instead?  It
> seems to me that what the database system should do is make that
> second assumption, and what the user should do is understand that to
> the degree that transactions depend on the results of aborted
> subtransactions, there may be serialization anomalies that go
> undetected.

Actually, Ian's sample transaction is even more malicious, because the
problem is not caused by reads within the aborted subtransaction -- the
cached-in-app reads occured *before* the subtransaction started in the
first place.  I think saving a count(*) across a possibly-failed
insertion on the same table is wrong.  I can't blame the database for
the behavior.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_catversion builtin function

2016-12-14 Thread Jesper Pedersen

On 12/14/2016 08:52 AM, Robert Haas wrote:

But I understand your concern, so "Rejected" is ok under

https://commitfest.postgresql.org/12/906/


I have a better reason for rejecting this patch: we already have this feature.

rhaas=# select catalog_version_no from pg_control_system();
 catalog_version_no

  201612081
(1 row)




Ah, perfect !

Thanks, Robert

Best regards,
 Jesper



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_catversion builtin function

2016-12-14 Thread Robert Haas
On Wed, Dec 14, 2016 at 8:32 AM, Jesper Pedersen
 wrote:
> On 12/13/2016 10:33 AM, Tom Lane wrote:
>> Jesper Pedersen  writes:
>>> Attached is a new builtin function that exposes the CATALOG_VERSION_NO
>>> constant under the pg_catversion() function, e.g.
>>
>> I'm pretty sure that we intentionally didn't expose that, reasoning that
>> users should only care about the user-visible version number.  What
>> exactly is the argument for exposing this?
>
> I'm using it to get the catalog version from a running instance in order to
> figure out if a dump/restore is needed for the next daily build -- instead
> of keeping the catversion.h file around for each installation, with script
> magic.
>
> Test databases are external to PostgreSQL's test suite, and one is quite
> big, so "cp" is faster than dump/restore :)
>
> But I understand your concern, so "Rejected" is ok under
>
> https://commitfest.postgresql.org/12/906/

I have a better reason for rejecting this patch: we already have this feature.

rhaas=# select catalog_version_no from pg_control_system();
 catalog_version_no

  201612081
(1 row)


Here's the commit:

commit dc7d70ea05deca9dfc6a25043d406b57cc8f6c30
Author: Joe Conway 
Date:   Sat Mar 5 11:10:19 2016 -0800

Expose control file data via SQL accessible functions.

Add four new SQL accessible functions: pg_control_system(),
pg_control_checkpoint(), pg_control_recovery(), and pg_control_init()
which expose a subset of the control file data.

Along the way move the code to read and validate the control file to
src/common, where it can be shared by the new backend functions
and the original pg_controldata frontend program.

Patch by me, significant input, testing, and review by Michael Paquier.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_catversion builtin function

2016-12-14 Thread Jesper Pedersen

On 12/13/2016 10:33 AM, Tom Lane wrote:

Jesper Pedersen  writes:

Attached is a new builtin function that exposes the CATALOG_VERSION_NO
constant under the pg_catversion() function, e.g.


I'm pretty sure that we intentionally didn't expose that, reasoning that
users should only care about the user-visible version number.  What
exactly is the argument for exposing this?



I'm using it to get the catalog version from a running instance in order 
to figure out if a dump/restore is needed for the next daily build -- 
instead of keeping the catversion.h file around for each installation, 
with script magic.


Test databases are external to PostgreSQL's test suite, and one is quite 
big, so "cp" is faster than dump/restore :)


But I understand your concern, so "Rejected" is ok under

https://commitfest.postgresql.org/12/906/

Best regards,
 Jesper




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-12-14 Thread Ildar Musin

Hi,

On 13.12.2016 21:10, Robert Haas wrote:

On Tue, Dec 13, 2016 at 12:22 PM, Ildar Musin  wrote:

We've noticed that PartitionDispatch object is built on every INSERT query
and that it could create unnecessary overhead. Wouldn't it be better to keep
it in relcache?


You might be able to cache some of that data in the relcache, but List
*keystate is pointing to query-lifespan data, so you can't cache that.



Yes, you are right. I meant mostly the 'indexes' field. I've measured 
insert performance with perf in case when there are thousand partitions 
and it seems that 34% of the time it takes to run 
RelationGetPartitionDispatchInfo() which builds this indexes array. And 
the most of the time it spends on acquiring locks on all partitions 
which is unnecessary if we're inserting in just a single partition. 
Probably we could avoid this by moving at least indexes field into cache.


--
Ildar Musin
i.mu...@postgrespro.ru


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2016-12-14 Thread Heikki Linnakangas

On 12/14/2016 12:27 PM, Magnus Hagander wrote:

I would so like to just drop support for plain passwords completely :) But
there's a backwards compatibility issue to think about of course.

But -- is there any actual usecase for them anymore?


Hmm. At the moment, I don't think there is.

But, a password stored in plaintext works with either MD5 or SCRAM, or 
any future authentication mechanism. So as soon as we have SCRAM 
authentication, it becomes somewhat useful again.


In a nutshell:

auth / stored   MD5 SCRAM   plaintext
-
passwordY   Y   Y
md5 Y   N   Y
scram   N   Y   Y

If a password is stored in plaintext, it can be used with any 
authentication mechanism. And the plaintext 'password' authentication 
mechanism works with any kind of a stored password. But an MD5 hash 
cannot be used with SCRAM authentication, or vice versa.



I just noticed that the manual for CREATE ROLE says:


Note that older clients might lack support for the MD5 authentication
mechanism that is needed to work with passwords that are stored
encrypted.


That's is incorrect. The alternative to MD5 authentication is plain 
'password' authentication, and that works just fine with MD5-hashed 
passwords. I think that sentence is a leftover from when we still 
supported "crypt" authentication (so I actually get to blame you for 
that ;-), commit 53a5026b). Back then, it was true that if an MD5 hash 
was stored in pg_authid, you couldn't do "crypt" authentication. That 
might have left old clients out in the cold.


Now that we're getting SCRAM authentication, we'll need a similar notice 
there again, for the incompatibility of a SCRAM verifier with MDD5 
authentication and vice versa.




If not, another option could be to just specifically check that it's *not*
"md5" or "scram-:". That would invalidate
plaintext passwords that have those texts in them of course, but what's the
likelyhood of that in reality?


Hmm, we have dismissed that risk for the MD5 hashes (and we also have a 
length check for them), but as we get new hash formats, the risk 
increases. Someone might well want to use "plain:of:jars" as password. 
Perhaps we should use a more complicated pattern.


I googled around for how others store SCRAM and other password hashes. 
Many other systems seem to have similar naming schemes. The closest 
thing to a standard I could find was:


https://github.com/P-H-C/phc-string-format/blob/master/phc-sf-spec.md

Perhaps we should also use something like "$plain$" or 
"$scram-sha-256"?


There's also https://tools.ietf.org/html/rfc5803, which specifies how to 
store SCRAM verifiers in LDAP. I don't understand enough of LDAP to 
understand what those actually look like, though, and there were no 
examples in the RFC.


I wonder if we should also worry about storing multiple verifiers in 
rolpassword? We don't support that now, but we might in the future. It 
might come handy, if you could easily store multiple hashes in a single 
string, separated by commas for example.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

2016-12-14 Thread Ashutosh Sharma
Hi Micheal,,

> I have just read the patch, and hardcoding the array position for a
> socket event to 0 assumes that the socket event will be *always* the
> first one in the list. but that cannot be true all the time, any code
> that does not register the socket event as the first one in the list
> will likely break.

I think your point is very valid and even i had similar thought in my
mind when implementing it but as i mentioned upthread that's how it is
being used in the current code base. Please check a function call to
ModifyWaitEvent() in secure_read().


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2016-12-14 Thread Heikki Linnakangas

On 12/14/2016 12:15 PM, Michael Paquier wrote:

This work is definitely something that should be done before anything
else. Need a patch or are you on it?


I'm on it..

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2016-12-14 Thread Magnus Hagander
On Wed, Dec 14, 2016 at 9:51 AM, Heikki Linnakangas  wrote:

> On 12/09/2016 10:19 AM, Michael Paquier wrote:
>
>> On Fri, Dec 9, 2016 at 5:11 PM, Heikki Linnakangas 
>> wrote:
>>
>>> Couple of things I should write down before I forget:
>>>
>>> 1. It's a bit cumbersome that the scram verifiers stored in
>>> pg_authid.rolpassword don't have any clear indication that they're scram
>>> verifiers. MD5 hashes are readily identifiable by the "md5" prefix. I
>>> think
>>> we should use a "scram-sha-256:" for scram verifiers.
>>>
>>
>> scram-sha-256 would make the most sense to me.
>>
>> Actually, I think it'd be awfully nice to also prefix plaintext passwords
>>> with "plain:", but I'm not sure it's worth breaking the compatibility, if
>>> there are tools out there that peek into rolpassword. Thoughts?
>>>
>>
>> pgbouncer is the only thing coming up in mind. It looks at pg_shadow
>> for password values. pg_dump'ing data from pre-10 instances will also
>> need to adapt. I see tricky the compatibility with the exiting CREATE
>> USER PASSWORD command though, so I am wondering if that's worth the
>> complication.
>>
>> 2. It's currently not possible to use the plaintext "password"
>>> authentication method, for a user that has a SCRAM verifier in
>>> rolpassword.
>>> That seems like an oversight. We can't do MD5 authentication with a SCRAM
>>> verifier, but "password" we could.
>>>
>>
>> Yeah, that should be possible...
>>
>
> The tip of the work branch can now do SCRAM authentication, when a user
> has a plaintext password in pg_authid.rolpassword. The reverse doesn't
> work, however: you cannot do plain "password" authentication, when the user
> has a SCRAM verifier in pg_authid.rolpassword. It gets worse: plain
> "password" authentication doesn't check if the string stored in
> pg_authid.rolpassword is a SCRAM authenticator, and treats it as a
> plaintext password, so you can do this:
>
> PGPASSWORD="scram-sha-256:mDBuqO1mEekieg==:4096:17dc259499c1
> a184c26ee5b19715173d9354195f510b4d3af8be585acb39ae33:d3d7131
> 49c6becbbe56bae259aafe4e95b79ab7e3b50f2fbd850ea7d7b7c114f"  psql
> postgres  -h localhost -U scram_user
>
> I think we're going to have a more bugs like this, if we don't start to
> explicitly label plaintext passwords as such.
>
> So, let's add "plain:" prefix to plaintext passwords, in
> pg_authid.rolpassword. With that, these would be valid values in
> pg_authid.rolpassword:
>
> plain:foo
> md55a962ce7a24371a10e85627a484cac28
> scram-sha-256:mDBuqO1mEekieg==:4096:17dc259499c1a184c26ee5b1
> 9715173d9354195f510b4d3af8be585acb39ae33:d3d713149c6becbbe56
> bae259aafe4e95b79ab7e3b50f2fbd850ea7d7b7c114f
>

I would so like to just drop support for plain passwords completely :) But
there's a backwards compatibility issue to think about of course.

But -- is there any actual usecase for them anymore?

If not, another option could be to just specifically check that it's *not*
"md5" or "scram-:". That would invalidate
plaintext passwords that have those texts in them of course, but what's the
likelyhood of that in reality?

Though I guess that might at least in theory be more bug-prone, so going
with a "plain:" prefix seems like a good idea as well.



> But anything that doesn't begin with "plain:", "md5", or "scram-sha-256:"
> would be invalid. You shouldn't have invalid values in the column, but if
> you do, all the authentication mechanisms would reject it.
>
> It would be nice to also change the format of MD5 passwords to have a
> colon, as in "md5:", but that's probably not worth breaking
> compatibility for. Almost no-one stores passwords in plaintext, so changing
> the format of that wouldn't affect many people, but there might well be
> tools out there that peek into MD5 hashes.


There are definitely tools that do that, so +1 on leaving that alone.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

2016-12-14 Thread Michael Paquier
On Wed, Dec 14, 2016 at 5:38 PM, Ashutosh Sharma  wrote:
>> What I think you want to do is modify the flag events associated to
>> the socket read/write event to be updated in WaitEventAdjustWin32(),
>
> Well, this does not work as the following if check does not allow the
> FD_READ or FD_WRITE flags to be applied on the already existing socket
> handle. We have already debugged and verified this.
>
>if (events == event->events &&
> (!(event->events & WL_LATCH_SET) || set->latch == latch))
> return;
>
>> which gets called in ModifyWaitEvent(). By the way, position 0 refers
>> to a socket for FeBeWaitSet, but that's a mandatory thing when a list
>> of events is created with AddWaitEventToSet.
>
> Yes, 0th position refers to event for socket and we will have to pass
> 0 as an argument to get socket events from FeBeWaitSet. You may see
> the usage of ModifyWaitEvent() in secure_read where we are passing the
> hard coded value (0) to get the socket events from the FeBeWaitSet.

I have just read the patch, and hardcoding the array position for a
socket event to 0 assumes that the socket event will be *always* the
first one in the list. but that cannot be true all the time, any code
that does not register the socket event as the first one in the list
will likely break.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2016-12-14 Thread Michael Paquier
On Wed, Dec 14, 2016 at 5:51 PM, Heikki Linnakangas  wrote:
> The tip of the work branch can now do SCRAM authentication, when a user has
> a plaintext password in pg_authid.rolpassword. The reverse doesn't work,
> however: you cannot do plain "password" authentication, when the user has a
> SCRAM verifier in pg_authid.rolpassword. It gets worse: plain "password"
> authentication doesn't check if the string stored in pg_authid.rolpassword
> is a SCRAM authenticator, and treats it as a plaintext password, so you can
> do this:
>
> PGPASSWORD="scram-sha-256:mDBuqO1mEekieg==:4096:17dc259499c1a184c26ee5b19715173d9354195f510b4d3af8be585acb39ae33:d3d713149c6becbbe56bae259aafe4e95b79ab7e3b50f2fbd850ea7d7b7c114f"
> psql postgres  -h localhost -U scram_user

This one's fun.

> I think we're going to have a more bugs like this, if we don't start to
> explicitly label plaintext passwords as such.
>
> So, let's add "plain:" prefix to plaintext passwords, in
> pg_authid.rolpassword. With that, these would be valid values in
> pg_authid.rolpassword:
>
> [...]
>
> But anything that doesn't begin with "plain:", "md5", or "scram-sha-256:"
> would be invalid. You shouldn't have invalid values in the column, but if
> you do, all the authentication mechanisms would reject it.

I would be tempted to suggest adding the verifier type as a new column
of pg_authid, but as CREATE USER PASSWORD accepts strings with md5
prefix as-is for ages using the "plain:" prefix is definitely a better
plan. My opinion on the matter has changed compared to a couple of
months back.

> It would be nice to also change the format of MD5 passwords to have a colon,
> as in "md5:", but that's probably not worth breaking compatibility
> for. Almost no-one stores passwords in plaintext, so changing the format of
> that wouldn't affect many people, but there might well be tools out there
> that peek into MD5 hashes.

Yes, let's not take this road.

This work is definitely something that should be done before anything
else. Need a patch or are you on it?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.

2016-12-14 Thread Michael Paquier
On Wed, Dec 14, 2016 at 7:02 PM, Rahila Syed  wrote:
>>There is a similar code pattern for materialized views, see
>>create_ctas_nodata() where the attribute list is built
> create_ctas_nodata() is for creation of materialized views WITH NO DATA.
> For other materialized views and CREATE TABLE AS, column definitions are
> built in
> intorel_startup() function which has different code from that of CREATE VIEW
> which
> the patch deals with.
>
> Limiting the scope of the patch to include changing the type of literal
> constants
> to text only for plain views. Also, error out when column with UNKNOWN type
> is
> being created for other relations like tables and materialized views.

Matviews is the same way of thinking as views in terms of definition.
It is inconsistent to try to address a problem only partially if the
same behavior shows up in different code paths.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.

2016-12-14 Thread Rahila Syed
Hello,

Thank you for comments.

>There is a similar code pattern for materialized views, see
>create_ctas_nodata() where the attribute list is built
create_ctas_nodata() is for creation of materialized views WITH NO DATA.
For other materialized views and CREATE TABLE AS, column definitions are
built in
intorel_startup() function which has different code from that of CREATE
VIEW which
the patch deals with.

Limiting the scope of the patch to include changing the type of literal
constants
to text only for plain views. Also, error out when column with UNKNOWN type
is
being created for other relations like tables and materialized views.

>And actually, shouldn't this be just a plain error?
Changed it to error in the attached patch.

>Your patch has no regression tests, surely you want some to stress
>this code path
Added regression tests in the attached patch.

Also adding this patch to CF 2017-01

Thank you,
Rahila Syed


unknown_view_column_to_text_v1.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash Indexes

2016-12-14 Thread Amit Kapila
On Tue, Dec 13, 2016 at 11:30 PM, Robert Haas  wrote:
> On Mon, Dec 12, 2016 at 9:21 PM, Amit Kapila  wrote:
>> The reason is to make the operations consistent in master and standby.
>> In WAL patch, for clearing the SPLIT_CLEANUP flag, we write a WAL and
>> if we don't release the lock after writing a WAL, the operation can
>> appear on standby even before on master.   Currently, without WAL,
>> there is no benefit of doing so and we can fix by using
>> MarkBufferDirty, however, I thought it might be simpler to keep it
>> this way as this is required for enabling WAL.  OTOH, we can leave
>> that for WAL patch.  Let me know which way you prefer?
>
> It's not required for enabling WAL.  You don't *have* to release and
> reacquire the buffer lock; in fact, that just adds overhead.
>

If we don't release the lock, then it will break the general coding
pattern of writing WAL (Acquire pin and an exclusive lock,
Markbufferdirty, Write WAL, Release Lock).  Basically, I think it is
to ensure that we don't hold it for multiple atomic operations or in
this case avoid calling MarkBufferDirty multiple times.

> You *do*
> have to be aware that the standby will perhaps see the intermediate
> state, because it won't hold the lock throughout.  But that does not
> mean that the master must release the lock.
>

Okay, but I think that will be avoided to a great extent because we do
follow the practice of releasing the lock immediately after writing
the WAL.

>>> I don't think we should be afraid to call MarkBufferDirty() instead of
>>> going through these (fairly stupid) hasham-specific APIs.
>>
>> Right and anyway we need to use it at many more call sites when we
>> enable WAL for hash index.
>
> I propose the attached patch, which removes _hash_wrtbuf() and causes
> those functions which previously called it to do MarkBufferDirty()
> directly.
>


It is possible that we can MarkBufferDirty multiple times (twice in
hashbucketcleanup and then in _hash_squeezebucket) while holding the
lock.  I don't think that is a big problem as of now but wanted to
avoid it as I thought we need it for WAL patch.

>  Aside from hopefully fixing the bug we're talking about
> here, this makes the logic in several places noticeably less
> contorted.
>

Yeah, it will fix the problem in hashbucketcleanup, but there are two
other problems that need to be fixed for which I can send a separate
patch.  By the way, as mentioned to you earlier that WAL patch already
takes care of removing _hash_wrtbuf and simplified the logic wherever
possible without introducing the logic of MarkBufferDirty multiple
times under one lock.  However, if you want to proceed with the
current patch, then I can send you separate patches for the remaining
problems as addressed in bug fix patch.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)

2016-12-14 Thread Heikki Linnakangas

On 12/09/2016 10:19 AM, Michael Paquier wrote:

On Fri, Dec 9, 2016 at 5:11 PM, Heikki Linnakangas  wrote:

Couple of things I should write down before I forget:

1. It's a bit cumbersome that the scram verifiers stored in
pg_authid.rolpassword don't have any clear indication that they're scram
verifiers. MD5 hashes are readily identifiable by the "md5" prefix. I think
we should use a "scram-sha-256:" for scram verifiers.


scram-sha-256 would make the most sense to me.


Actually, I think it'd be awfully nice to also prefix plaintext passwords
with "plain:", but I'm not sure it's worth breaking the compatibility, if
there are tools out there that peek into rolpassword. Thoughts?


pgbouncer is the only thing coming up in mind. It looks at pg_shadow
for password values. pg_dump'ing data from pre-10 instances will also
need to adapt. I see tricky the compatibility with the exiting CREATE
USER PASSWORD command though, so I am wondering if that's worth the
complication.


2. It's currently not possible to use the plaintext "password"
authentication method, for a user that has a SCRAM verifier in rolpassword.
That seems like an oversight. We can't do MD5 authentication with a SCRAM
verifier, but "password" we could.


Yeah, that should be possible...


The tip of the work branch can now do SCRAM authentication, when a user 
has a plaintext password in pg_authid.rolpassword. The reverse doesn't 
work, however: you cannot do plain "password" authentication, when the 
user has a SCRAM verifier in pg_authid.rolpassword. It gets worse: plain 
"password" authentication doesn't check if the string stored in 
pg_authid.rolpassword is a SCRAM authenticator, and treats it as a 
plaintext password, so you can do this:


PGPASSWORD="scram-sha-256:mDBuqO1mEekieg==:4096:17dc259499c1a184c26ee5b19715173d9354195f510b4d3af8be585acb39ae33:d3d713149c6becbbe56bae259aafe4e95b79ab7e3b50f2fbd850ea7d7b7c114f" 
 psql postgres  -h localhost -U scram_user


I think we're going to have a more bugs like this, if we don't start to 
explicitly label plaintext passwords as such.


So, let's add "plain:" prefix to plaintext passwords, in 
pg_authid.rolpassword. With that, these would be valid values in 
pg_authid.rolpassword:


plain:foo
md55a962ce7a24371a10e85627a484cac28
scram-sha-256:mDBuqO1mEekieg==:4096:17dc259499c1a184c26ee5b19715173d9354195f510b4d3af8be585acb39ae33:d3d713149c6becbbe56bae259aafe4e95b79ab7e3b50f2fbd850ea7d7b7c114f

But anything that doesn't begin with "plain:", "md5", or 
"scram-sha-256:" would be invalid. You shouldn't have invalid values in 
the column, but if you do, all the authentication mechanisms would 
reject it.


It would be nice to also change the format of MD5 passwords to have a 
colon, as in "md5:", but that's probably not worth breaking 
compatibility for. Almost no-one stores passwords in plaintext, so 
changing the format of that wouldn't affect many people, but there might 
well be tools out there that peek into MD5 hashes.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

2016-12-14 Thread Ashutosh Sharma
Hi Micheal,

> I bet that this patch breaks many things for any non-WIN32 platform.

It seems to me like you have already identified some issues when
testing. If yes, could please share it. I have tested my patch  on
CentOS-7 and Windows-7 machines and have found no issues. I ran all
the regression test suites including the test cases for pldebugger.

> What I think you want to do is modify the flag events associated to
> the socket read/write event to be updated in WaitEventAdjustWin32(),

Well, this does not work as the following if check does not allow the
FD_READ or FD_WRITE flags to be applied on the already existing socket
handle. We have already debugged and verified this.

   if (events == event->events &&
(!(event->events & WL_LATCH_SET) || set->latch == latch))
return;

> which gets called in ModifyWaitEvent(). By the way, position 0 refers
> to a socket for FeBeWaitSet, but that's a mandatory thing when a list
> of events is created with AddWaitEventToSet.

Yes, 0th position refers to event for socket and we will have to pass
0 as an argument to get socket events from FeBeWaitSet. You may see
the usage of ModifyWaitEvent() in secure_read where we are passing the
hard coded value (0) to get the socket events from the FeBeWaitSet.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers