Re: [HACKERS] `array_position...()` causes SIGSEGV

2016-12-08 Thread Michael Paquier
On Fri, Dec 9, 2016 at 3:14 PM, Junseok Yang  wrote:
> I met a SIGSEGV when using `array_position()` with record type
> arguments, so I've written a patch which corrects this problem. It
> seems that `array_position...()` sets the wrong memory context for the
> cached function (in this case `record_eq()`) which is used to find a
> matching element.
>
> The problem is reproducible with the following query.
>
> SELECT array_position(ids, (1, 1))
> FROM (VALUES (ARRAY[(0, 0)]), (ARRAY[(1, 1)])) AS _(ids);

Good catch. That's present since 13dbc7a8 and the introduction of
array_offset(), or array_position() on HEAD, so the patch should be
applied down to 9.5.
-- 
Michael




Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

2016-12-08 Thread Michael Paquier
On Fri, Dec 9, 2016 at 3:23 PM, Michael Paquier
 wrote:
> This basically means that if the latch is set, we don't wait at all
> and drop the ball. I am wondering: isn't that a problem even if
> WL_LATCH_SET is *not* set? If I read this code correctly, even if
> the caller has not set WL_LATCH_SET and the latch is set, then the wait
> will stop.

Nah. I misread the code. set->latch is not NULL only if WL_LATCH_SET is enabled.
-- 
Michael




Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-12-08 Thread Masahiko Sawada
On Fri, Dec 9, 2016 at 3:02 PM, vinayak  wrote:
> On 2016/12/05 14:42, Ashutosh Bapat wrote:
>>
>> On Mon, Dec 5, 2016 at 11:04 AM, Haribabu Kommi
>>  wrote:
>>
>>
>> On Fri, Nov 11, 2016 at 5:38 PM, Masahiko Sawada 
>> wrote:


 2PC is a basic building block to support atomic commit, and there
 are some optimization techniques to reduce the disadvantages of 2PC. As
 you mentioned, it's hard to support a single model that would suit
 several types of FDW. But even apart from sharding, because many other
 databases that can be connected to PostgreSQL via FDW support 2PC,
 2PC for FDW would be useful for more than just sharding.
 That's why I have been focusing on implementing 2PC for FDW so far.
>>>
>>>
>>> Moved to next CF with "needs review" status.
>>
>> I think this should be changed to "returned with feedback". The
>> design and approach itself needs to be discussed. I think, we should
>> let authors decide whether they want it to be added to the next
>> commitfest or not.
>>
>> When I first started with this work, Tom had suggested that I try to
>> make PREPARE and COMMIT/ROLLBACK PREPARED involving foreign servers or
>> at least postgres_fdw servers work. I think, most of my work that
>> Vinayak and Sawada have rebased to the latest master will be required
>> for getting what Tom suggested done. We wouldn't need a lot of changes
>> to that design. PREPARE involving foreign servers errors out right
>> now. If we start supporting prepared transactions involving foreign
>> servers that will be a good improvement over the current status-quo.
>> Once we get that done, we can continue working on the larger problem
>> of supporting ACID transactions involving foreign servers.
>
> At the PGConf.Asia developers meeting, Bruce Momjian and other developers
> discussed FDW-based sharding [1]. The suggestion from other hackers was that
> we need to discuss the big picture and use cases of sharding. Bruce has
> listed all the building blocks of built-in sharding on the wiki [2]. IIUC,
> a transaction manager involving foreign servers is one part of sharding.

Yeah, 2PC on FDW is a basic building block for FDW-based sharding,
and it would be useful not only for FDW sharding but also for other
purposes. As far as I have surveyed in papers, many kinds of distributed
transaction management architectures use 2PC for atomic commit, with
some optimizations. And using 2PC to provide atomic commit for
distributed transactions fits well with the current PostgreSQL
implementation in some respects.
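
Concretely, the atomic-commit building block is the existing PREPARE
TRANSACTION / COMMIT PREPARED machinery driven on every participant.
A minimal sketch over plain libpq (illustrative only: the host names,
the table, and the GIDs are made up, error handling is kept minimal,
and in the FDW case postgres_fdw would own these connections):

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

/* Run one SQL command and bail out if it does not succeed. */
static void
run(PGconn *conn, const char *sql)
{
	PGresult   *res = PQexec(conn, sql);

	if (PQresultStatus(res) != PGRES_COMMAND_OK)
	{
		fprintf(stderr, "%s failed: %s", sql, PQerrorMessage(conn));
		exit(1);
	}
	PQclear(res);
}

int
main(void)
{
	/* Hypothetical shards, not real servers. */
	PGconn	   *s1 = PQconnectdb("host=shard1 dbname=postgres");
	PGconn	   *s2 = PQconnectdb("host=shard2 dbname=postgres");

	if (PQstatus(s1) != CONNECTION_OK || PQstatus(s2) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed\n");
		exit(1);
	}

	/* Do the distributed work inside ordinary transactions. */
	run(s1, "BEGIN");
	run(s2, "BEGIN");
	run(s1, "INSERT INTO t VALUES (1)");
	run(s2, "INSERT INTO t VALUES (2)");

	/* Phase 1: prepare on every participant; if any PREPARE fails,
	 * the coordinator rolls back everywhere instead. */
	run(s1, "PREPARE TRANSACTION 'demo_gid_1'");
	run(s2, "PREPARE TRANSACTION 'demo_gid_2'");

	/* Phase 2: commit the prepared transactions; from here the
	 * coordinator must keep retrying until every participant commits. */
	run(s1, "COMMIT PREPARED 'demo_gid_1'");
	run(s2, "COMMIT PREPARED 'demo_gid_2'");

	PQfinish(s1);
	PQfinish(s2);
	return 0;
}

Most of the hard part, and of the optimizations mentioned above, is
about who records the GIDs and who resolves in-doubt transactions if
a crash happens between the two phases.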

> As per Bruce's wiki page, there are two use cases for transactions
> involving multiple foreign servers:
> 1. Cross-node read-only queries on read/write shards:
> This will require a global snapshot manager to make sure the shards
> return consistent data.
> 2. Cross-node read-write queries:
> This will require a global snapshot manager and global transaction
> manager.
>
> I agree with you that if we start supporting PREPARE and COMMIT/ROLLBACK
> PREPARED
> involving foreign servers, that will be a good improvement.
>
> [1] https://wiki.postgresql.org/wiki/PgConf.Asia_2016_Developer_Meeting
> [2] https://wiki.postgresql.org/wiki/Built-in_Sharding
>

I also agree to work on implementing atomic commit across the
foreign servers and then continue to work on the larger problem.
I think that this will be a large step forward. I'm going to submit the
updated version of the patch to CF3.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

2016-12-08 Thread Michael Paquier
On Fri, Dec 9, 2016 at 2:38 AM, Ashutosh Sharma  wrote:
> Well, we are currently passing an INFINITE timeout value to the
> WaitForMultipleObjects() API, which is hanging. I thought of changing this
> INFINITE timeout interval to some finite value, say 1 sec, and could not
> reproduce the issue. Having said that, I checked the older code where
> this issue does not exist and found that there as well we are passing an
> INFINITE timeout value to WaitForMultipleObjects(). So I am not sure if
> passing an INFINITE timeout is really an issue. Attached is a small patch
> that has my changes.

That's obviously incorrect to me. This should not wait for a timeout
if the caller does not want one. You are likely breaking a bunch of
code by doing so, including many extensions on Windows. The
pre-WaitEventSet code did this:
-   if (wakeEvents & WL_TIMEOUT)
-   {
-   INSTR_TIME_SET_CURRENT(start_time);
-   Assert(timeout >= 0 && timeout <= INT_MAX);
-   cur_timeout = timeout;
-   }
-   else
-   cur_timeout = INFINITE;
Looking at the MS docs, INFINITE maps to -1, which is also what the new
code passes, so there is no inconsistency there. The new code simply
does not bother setting INFINITE and just uses -1.
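
As a quick throwaway check (not PostgreSQL code), INFINITE and a
timeout of -1 are the same DWORD value, so both spellings ask
WaitForMultipleObjects() for an unbounded wait:

#include <windows.h>
#include <stdio.h>

int
main(void)
{
	/* throwaway illustration, not code from latch.c */
	long		cur_timeout = -1;	/* what the new code ends up passing */

	/* INFINITE is defined as 0xFFFFFFFF, i.e. (DWORD) -1 */
	printf("INFINITE   = %lu\n", (unsigned long) INFINITE);
	printf("(DWORD) -1 = %lu\n", (unsigned long) (DWORD) cur_timeout);

	/*
	 * So WaitForMultipleObjects(nhandles, handles, FALSE,
	 * (DWORD) cur_timeout) blocks just as long as the old explicit
	 * INFINITE spelling did.
	 */
	return 0;
}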

I have tried to compile pldebugger with MSVC but gave up in the end,
so I am falling back to some code review for the time being. Andres
mentioned to me that this Windows code was in need of an extra look.
From what I can see, the logic that we had before 98a64d0 was a
set of handles passed to WaitForMultipleObjects with a pre-allocated
position in the handle array. The new code made things more generic by
allocating the events depending on what the user has set up.
pgwin32_signal_event correctly uses the first handle, and the other
registered events correctly adapt to this position.

I have spent some time reviewing the code, and I think that I have
spotted one problem. The event set that the code path you are triggering
waits on is FeBeWaitSet, which gets initialized in pq_init()@pqcomm.c;
3 events are set there. This goes through CreateWaitEventSet, which sets
up pgwin32_signal_event as the first handle. So far so good. Then a
bunch of events are added with AddWaitEventToSet. That is fine as well,
and the handling of handles with WSA_INVALID_EVENT looks correct,
because its value is NULL and there is a static assertion to check it.

One place in the code has caught my attention:
+#elif defined(WAIT_USE_WIN32)
+   set->handles = (HANDLE) data;
+   data += sizeof(HANDLE) * nevents;
+#endif
This should actually be (nevents + 1) to take pgwin32_signal_event into
account. That's harmless now, but it would be a problem if new fields
are added to WaitEventSet in the future.

A second thing that I am noticing is that in the new code:
+* Note: we assume that the kernel calls involved in latch management
+* will provide adequate synchronization on machines with weak memory
+* ordering, so that we cannot miss seeing is_set if a notification
+* has already been queued.
+*/
+   if (set->latch && set->latch->is_set)
+   {
+   occurred_events->fd = PGINVALID_SOCKET;
+   occurred_events->pos = set->latch_pos;
+   occurred_events->user_data =
+   set->events[set->latch_pos].user_data;
+   occurred_events->events = WL_LATCH_SET;
+   occurred_events++;
+   returned_events++;
+
+   break;
+   }
This basically means that if the latch is set, we don't wait at all
and drop the ball. I am wondering: isn't that a problem even if
WL_LATCH_SET is *not* set? If I read this code correctly, even if
the caller has not set WL_LATCH_SET and the latch is set, then the wait
will stop.

Still reviewing the code...
-- 
Michael




Re: [HACKERS] Declarative partitioning - another take

2016-12-08 Thread Venkata B Nagothi
Hi,

I am testing the partitioning feature from the latest master and got the
following error while loading the data -

db01=# create table orders_y1993 PARTITION OF orders FOR VALUES FROM
('1993-01-01') TO ('1993-12-31');
CREATE TABLE

db01=# copy orders from '/data/orders-1993.csv' delimiter '|';
ERROR:  could not read block 6060 in file "base/16384/16412": read only 0
of 8192 bytes
CONTEXT:  COPY orders, line 376589:
"9876391|374509|O|54847|1997-07-16|3-MEDIUM   |Clerk#01993|0|ithely
regular pack"

Am I doing something wrong?

Regards,

Venkata B N
Database Consultant


On Fri, Dec 9, 2016 at 3:58 PM, Amit Langote 
wrote:

> On 2016/12/09 0:25, Robert Haas wrote:
> > On Wed, Dec 7, 2016 at 11:42 PM, Michael Paquier
> >  wrote:
> >>> Congrats to everyone working on this! This is a large step forward.
> >>
> >> Congratulations to all! It was a long way to this result.
> >
> > Yes.  The last effort in this area which I can remember was by Itagaki
> > Takahiro in 2010, so we've been waiting for this for more than 6
> > years.  It's really good that Amit was able to put in the effort to
> > produce a committable patch, and I think he deserves all of our thanks
> > for getting that done - and NTT deserves our thanks for paying him to
> > do it.
> >
> > Even though I know he put in a lot more work than I did, let me just
> > say: phew, even reviewing that was a ton of work.
>
> Absolutely!  Your review comments and design suggestions have been
> instrumental in improving (and cutting down on the size of) the patches.
>
> > Of course, this is the beginning, not the end.
>
> +1000!
>
> > I've been thinking
> > about next steps -- here's an expanded list:
> >
> > - more efficient plan-time partition pruning (constraint exclusion is
> too slow)
> > - run-time partition pruning
> > - partition-wise join (Ashutosh Bapat is already working on this)
> > - try to reduce lock levels
> > - hash partitioning
> > - the ability to create an index on the parent and have all of the
> > children inherit it; this should work something like constraint
> > inheritance.  you could argue that this doesn't add any real new
> > capability but it's a huge usability feature.
> > - teaching autovacuum enough about inheritance hierarchies for it to
> > update the parent statistics when they get stale despite the lack of
> > any actual inserts/updates/deletes to the parent.  this has been
> > pending for a long time, but it's only going to get more important
> > - row movement (aka avoiding the need for an ON UPDATE trigger on each
> > partition)
> > - insert (and eventually update) tuple routing for foreign partitions
> > - not scanning the parent
> > - fixing the insert routing so that we can skip tuple conversion where
> possible
> > - fleshing out the documentation
>
> I would definitely want to contribute to some of these items.  It's great
> that many others plan to contribute toward this as well.
>
> > One thing I'm wondering is whether we can optimize away some of the
> > heavyweight locks.  For example, if somebody does SELECT * FROM ptab
> > WHERE id = 1, they really shouldn't need to lock the entire
> > partitioning hierarchy, but right now they do.  If the root knows
> > based on its own partitioning key that only one child is relevant, it
> > would be good to lock *only that child*.  For this feature to be
> > competitive, it needs to scale to at least a few thousand partitions,
> > and locking thousands of objects instead of one or two is bound to be
> > slow.  Similarly, you can imagine teaching COPY to lock partitions
> > only on demand; if no tuples are routed to a particular partition, we
> > don't need to lock it.  There's a manageability component here, too:
> > not locking partitions unnecessarily makes it easier to get DDL on
> > other partitions through.  Alternatively, maybe we could rewrite the
> > lock manager to be hierarchical, so that you can take a single lock
> > that represents an AccessShareLock on all partitions and only need to
> > make one entry in the lock table to do it.  That means that attempts
> > to lock individual partitions need to check not only for a lock on
> > that partition but also on anything further up in the hierarchy, but
> > that might be a good trade if it gives us O(1) locking on the parent.
> > And maybe we could also have a level of the hierarchy that represents
> > every-table-in-the-database, for the benefit of pg_dump.  Of course,
> > rewriting the lock manager is a big project not for the faint of
> > heart, but I think if we don't it's going to be a scaling bottleneck.
>
> Hierarchical lock manager stuff is interesting.  Are you perhaps alluding
> to a new *intention* lock mode as described in the literature on multiple
> granularity locking [1]?
>
> > We also need to consider other parts of the system that may not scale,
> > like pg_dump.  For a long time, we've been sorta-kinda willing to fix
> > the 

[HACKERS] `array_position...()` causes SIGSEGV

2016-12-08 Thread Junseok Yang
Hello hackers,

I met a SIGSEGV when using `array_position()` with record type
arguments, so I've written a patch which corrects this problem. It
seems that `array_position...()` sets the wrong memory context for the
cached function (in this case `record_eq()`) which is used to find a
matching element.

The problem is reproducible with the following query.

SELECT array_position(ids, (1, 1))
FROM (VALUES (ARRAY[(0, 0)]), (ARRAY[(1, 1)])) AS _(ids);
From 8868ae0050ec382bc1bae2b993742eb2a40bbb14 Mon Sep 17 00:00:00 2001
From: Junseok Yang 
Date: Thu, 8 Dec 2016 18:25:21 -0800
Subject: [PATCH] Fix memory context bugs in `array_position...()`

When `array_position()` is called with record type arguments, it uses
`record_eq()` to find a matching element. Before calling `record_eq()`,
it stores `FmgrInfo` of `record_eq()` to its `FmgrInfo` as extra data
in the context of `ecxt_per_query_memory`. However, it sets the context
of extra data for `FmgrInfo` of `record_eq()` to `CurrentMemoryContext`
which is `ecxt_per_tuple_memory`. And `record_eq()` also stores extra
data in the context set by `array_position()`. In this scenario, if
`array_position()` is called more than twice over tuples in a query,
the process for this session will be terminated with SIGSEGV because
the extra data for `record_eq()` should be already freed after the
first tuple was processed.

`array_positions()` has the same issue.
---
 src/backend/utils/adt/array_userfuncs.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c
index 8d6fa41..9eb678a 100644
--- a/src/backend/utils/adt/array_userfuncs.c
+++ b/src/backend/utils/adt/array_userfuncs.c
@@ -795,7 +795,8 @@ array_position_common(FunctionCallInfo fcinfo)
 					   format_type_be(element_type))));
 
 		my_extra->element_type = element_type;
-		fmgr_info(typentry->eq_opr_finfo.fn_oid, &my_extra->proc);
+		fmgr_info_cxt(typentry->eq_opr_finfo.fn_oid, &my_extra->proc,
+					  fcinfo->flinfo->fn_mcxt);
 	}
 
 	/* Examine each array element until we find a match. */
@@ -933,7 +934,8 @@ array_positions(PG_FUNCTION_ARGS)
 					   format_type_be(element_type))));
 
 		my_extra->element_type = element_type;
-		fmgr_info(typentry->eq_opr_finfo.fn_oid, &my_extra->proc);
+		fmgr_info_cxt(typentry->eq_opr_finfo.fn_oid, &my_extra->proc,
+					  fcinfo->flinfo->fn_mcxt);
 	}
 
 	/*
-- 
2.7.4




Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-12-08 Thread vinayak

On 2016/12/05 14:42, Ashutosh Bapat wrote:

On Mon, Dec 5, 2016 at 11:04 AM, Haribabu Kommi
 wrote:


On Fri, Nov 11, 2016 at 5:38 PM, Masahiko Sawada 
wrote:


2PC is a basic building block to support atomic commit, and there
are some optimization techniques to reduce the disadvantages of 2PC. As
you mentioned, it's hard to support a single model that would suit
several types of FDW. But even apart from sharding, because many other
databases that can be connected to PostgreSQL via FDW support 2PC,
2PC for FDW would be useful for more than just sharding.
That's why I have been focusing on implementing 2PC for FDW so far.


Moved to next CF with "needs review" status.

I think this should be changed to "returned with feedback". The
design and approach itself needs to be discussed. I think, we should
let authors decide whether they want it to be added to the next
commitfest or not.

When I first started with this work, Tom had suggested that I try to
make PREPARE and COMMIT/ROLLBACK PREPARED involving foreign servers or
at least postgres_fdw servers work. I think, most of my work that
Vinayak and Sawada have rebased to the latest master will be required
for getting what Tom suggested done. We wouldn't need a lot of changes
to that design. PREPARE involving foreign servers errors out right
now. If we start supporting prepared transactions involving foreign
servers that will be a good improvement over the current status-quo.
Once we get that done, we can continue working on the larger problem
of supporting ACID transactions involving foreign servers.
At the PGConf.Asia developers meeting, Bruce Momjian and other developers
discussed FDW-based sharding [1]. The suggestion from other hackers was that
we need to discuss the big picture and use cases of sharding. Bruce has
listed all the building blocks of built-in sharding on the wiki [2]. IIUC,
a transaction manager involving foreign servers is one part of sharding.
As per Bruce's wiki page, there are two use cases for transactions
involving multiple foreign servers:

1. Cross-node read-only queries on read/write shards:
This will require a global snapshot manager to make sure the shards 
return consistent data.

2. Cross-node read-write queries:
This will require a global snapshot manager and global transaction 
manager.


I agree with you that if we start supporting PREPARE and COMMIT/ROLLBACK
PREPARED involving foreign servers, that will be a good improvement.

[1] https://wiki.postgresql.org/wiki/PgConf.Asia_2016_Developer_Meeting
[2] https://wiki.postgresql.org/wiki/Built-in_Sharding

Regards,
Vinayak Pokale
NTT Open Source Software Center




Re: [HACKERS] Declarative partitioning - another take

2016-12-08 Thread Amit Langote
On 2016/12/09 0:25, Robert Haas wrote:
> On Wed, Dec 7, 2016 at 11:42 PM, Michael Paquier
>  wrote:
>>> Congrats to everyone working on this! This is a large step forward.
>>
>> Congratulations to all! It was a long way to this result.
> 
> Yes.  The last effort in this area which I can remember was by Itagaki
> Takahiro in 2010, so we've been waiting for this for more than 6
> years.  It's really good that Amit was able to put in the effort to
> produce a committable patch, and I think he deserves all of our thanks
> for getting that done - and NTT deserves our thanks for paying him to
> do it.
> 
> Even though I know he put in a lot more work than I did, let me just
> say: phew, even reviewing that was a ton of work.

Absolutely!  Your review comments and design suggestions have been
instrumental in improving (and cutting down on the size of) the patches.

> Of course, this is the beginning, not the end.

+1000!

> I've been thinking
> about next steps -- here's an expanded list:
> 
> - more efficient plan-time partition pruning (constraint exclusion is too 
> slow)
> - run-time partition pruning
> - partition-wise join (Ashutosh Bapat is already working on this)
> - try to reduce lock levels
> - hash partitioning
> - the ability to create an index on the parent and have all of the
> children inherit it; this should work something like constraint
> inheritance.  you could argue that this doesn't add any real new
> capability but it's a huge usability feature.
> - teaching autovacuum enough about inheritance hierarchies for it to
> update the parent statistics when they get stale despite the lack of
> any actual inserts/updates/deletes to the parent.  this has been
> pending for a long time, but it's only going to get more important
> - row movement (aka avoiding the need for an ON UPDATE trigger on each
> partition)
> - insert (and eventually update) tuple routing for foreign partitions
> - not scanning the parent
> - fixing the insert routing so that we can skip tuple conversion where 
> possible
> - fleshing out the documentation

I would definitely want to contribute to some of these items.  It's great
that many others plan to contribute toward this as well.

> One thing I'm wondering is whether we can optimize away some of the
> heavyweight locks.  For example, if somebody does SELECT * FROM ptab
> WHERE id = 1, they really shouldn't need to lock the entire
> partitioning hierarchy, but right now they do.  If the root knows
> based on its own partitioning key that only one child is relevant, it
> would be good to lock *only that child*.  For this feature to be
> competitive, it needs to scale to at least a few thousand partitions,
> and locking thousands of objects instead of one or two is bound to be
> slow.  Similarly, you can imagine teaching COPY to lock partitions
> only on demand; if no tuples are routed to a particular partition, we
> don't need to lock it.  There's a manageability component here, too:
> not locking partitions unnecessarily makes it easier to get DDL on
> other partitions through.  Alternatively, maybe we could rewrite the
> lock manager to be hierarchical, so that you can take a single lock
> that represents an AccessShareLock on all partitions and only need to
> make one entry in the lock table to do it.  That means that attempts
> to lock individual partitions need to check not only for a lock on
> that partition but also on anything further up in the hierarchy, but
> that might be a good trade if it gives us O(1) locking on the parent.
> And maybe we could also have a level of the hierarchy that represents
> every-table-in-the-database, for the benefit of pg_dump.  Of course,
> rewriting the lock manager is a big project not for the faint of
> heart, but I think if we don't it's going to be a scaling bottleneck.

Hierarchical lock manager stuff is interesting.  Are you perhaps alluding
to a new *intention* lock mode as described in the literature on multiple
granularity locking [1]?

> We also need to consider other parts of the system that may not scale,
> like pg_dump.  For a long time, we've been sorta-kinda willing to fix
> the worst of the scalability problems with pg_dump, but that's really
> no longer an adequate response.  People want 1000 partitions.  Heck,
> people want 1,000,000 partitions, but getting to where 1000 partitions
> works well would help PostgreSQL a lot.  Our oft-repeated line that
> inheritance isn't designed for large numbers of inheritance children
> is basically just telling people who have the use case where they need
> that to go use some other product.  Partitioning, like replication, is
> not an optional feature for a world-class database.  And, from a
> technical point of view, I think we've now got an infrastructure that
> really should be able to be scaled up considerably higher than what
> we've been able to do in the past.  When we were stuck with
> inheritance + constraint exclusion, we could say "well, 

Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-12-08 Thread Michael Paquier
On Thu, Dec 8, 2016 at 10:05 PM, Michael Paquier
 wrote:
> On Thu, Dec 8, 2016 at 5:55 PM, Heikki Linnakangas  wrote:
>> On 12/08/2016 10:18 AM, Michael Paquier wrote:
>>> Hmmm. How do we handle the case where the user name does not match
>>> then? The spec gives an error message e= specifically for this case.
>>
>> Hmm, interesting. I wonder how/when they imagine that error message to be
>> used. I suppose you could send a dummy server-first message, with a made-up
>> salt and iteration count, if the user is not found, so that you can report
>> that in the server-final message. But that seems unnecessarily complicated,
>> compared to just sending the error immediately. I could imagine using a
>> dummy server-first message to hide whether the user exists, but that
>> argument doesn't hold water if you're going to report an "unknown-user"
>> error, anyway.
>
> Using an error message directly would match MD5 and plain, but
> that's definitely a new protocol piece so I'd rather think that using
> e= once the client has sent its first message in the exchange should
> be answered with an appropriate SASL error...
>
>> Actually, we don't give away that information currently. If you try to log
>> in with password or MD5 authentication, and the user doesn't exist, you get
>> the same error as with an incorrect password. So, I think we do need to give
>> the client a made-up salt and iteration count in that case, to hide the fact
>> that the user doesn't exist. Furthermore, you can't just generate random
>> salt and iteration count, because then you could simply try connecting
>> twice, and see if you get the same salt and iteration count. We need to
>> deterministically derive the salt from the username, so that you get the
>> same salt/iteration count every time you try connecting with that username.
>> But it needs to be indistinguishable from a random salt, to the client. Perhaps a
>> SHA hash of the username and some per-cluster secret value, created by
>> initdb. There must be research papers out there on how to do this..
>
> A simple idea would be to use the system ID when generating this fake
> salt? That's generated by initdb, once per cluster. I am wondering if
> it would be risky to use it for the salt. For the number of iterations
> the default number could be used.

I have been thinking about this part quite a bit more, and here is the
simplest thing that we could do while respecting the protocol.
Re-reading upthread, that's more or less what I think you have in mind,
but it does not hurt to write out the whole flow to be clear:
1) Server gets the startup packet, matches it against pg_hba.conf, and
moves on to the SCRAM authentication code path.
2) Server sends back sendAuthRequest() to ask the user to provide a
password. This matches the plain/md5 behavior, as no errors would be
issued to the user until a password has been provided.
3) Client sends back the password, and the first message with the user name.
4) Server receives it and checks the data. If a failure happens at
this stage, just ERROR on the PG side without sending back an e= message.
This includes a username mismatch, an empty password, and expired
password validity. So we would never use e=unknown-user. This is
consistent with what you quoted upthread, that the server may end the
exchange before sending the final message.
5) Server sends back the challenge, and client answers back with its
reply to it.

Then comes the final stage of the exchange, at which point the server
would issue its final message, which would be e= in case of errors. If
something like an OOM happens, no message would be sent, so failing with
an OOM ERROR on the PG side would be fine as well.

6) Read final message from client and validate.
7) issue final message of server.

On failure at steps 6) or 7), an e= message is returned instead of the
final message. Does that look right?

One thing is: when do we look up pg_authid? After receiving the
first message from the client, or before beginning the exchange? As the
first message from the client has the user name, it would make sense to
do the lookup after receiving it, but from the PG perspective it would
make just as much sense to use the data already present in the startup
packet. The current patch does the latter. What do you think?
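
As an illustration of the mock-authentication idea quoted above, a
minimal sketch of deriving a deterministic fake salt for an unknown
user (purely hypothetical: the helper name, the use of OpenSSL's
SHA-256, and the choice of the system identifier as the per-cluster
secret are assumptions, not something any posted patch does):

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

/*
 * Hypothetical helper: derive a stable, random-looking salt for a user
 * name that does not exist, so repeated connection attempts always see
 * the same value and cannot tell the user apart from a real one.
 */
static void
mock_scram_salt(const char *username, uint64_t system_identifier,
				unsigned char salt[SHA256_DIGEST_LENGTH])
{
	SHA256_CTX	ctx;

	SHA256_Init(&ctx);
	SHA256_Update(&ctx, username, strlen(username));
	SHA256_Update(&ctx, &system_identifier, sizeof(system_identifier));
	SHA256_Final(salt, &ctx);
	/* a server would truncate this to the usual salt length and
	 * advertise the default iteration count */
}

int
main(void)
{
	unsigned char salt[SHA256_DIGEST_LENGTH];
	int			i;

	mock_scram_salt("nosuchuser", UINT64_C(6393271622370793671), salt);
	for (i = 0; i < SHA256_DIGEST_LENGTH; i++)
		printf("%02x", salt[i]);
	putchar('\n');
	return 0;
}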

By the way, I have pushed the extra patches you sent into this branch:
https://github.com/michaelpq/postgres/tree/scram
-- 
Michael




Re: [HACKERS] Declarative partitioning - another take

2016-12-08 Thread Amit Langote

Hi Stephen,

On 2016/12/08 22:35, Stephen Frost wrote:
>>> * The fact that there's no implementation of row movement should be
>>> documented as a limitation.  We should also look at removing that
>>> limitation.
>>
>> Yes, something to improve.  By the way, since we currently mention INSERT
>> tuple-routing directly in the description of the partitioned tables in the
>> CREATE TABLE command reference, is that also the place to list this
>> particular limitation?  Or is UPDATE command reference rather the correct
>> place?
> 
> Both.

Attached a documentation fix patch.

Actually, there was no mention on the INSERT reference page of
tuple-routing occurring in the case of partitioned tables, nor of the
possibility of an error if a *partition* is directly targeted in an
INSERT. Mentioned that as well.

Thanks,
Amit
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 8bf8af302b..01abe71f84 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -275,7 +275,8 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
  
   Rows inserted into a partitioned table will be automatically routed to
   the correct partition.  If no suitable partition exists, an error will
-  occur.
+  occur.  Also, if updating a row in a given partition causes it to move
+  to another partition due to the new partition key, an error will occur.
  
 
  
diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
index 06f416039b..00c984d8d5 100644
--- a/doc/src/sgml/ref/insert.sgml
+++ b/doc/src/sgml/ref/insert.sgml
@@ -526,6 +526,17 @@ INSERT oid count
  
+ 
+ 
+  Notes
+
+  
+   If the specified table is a partitioned table, each row is routed to
+   the appropriate partition and inserted into it.  If the specified table
+   is a partition, an error will occur if one of the input rows violates
+   the partition constraint.
+  
+ 
 
  
   Examples
diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
index 2de0f4aad1..e86993b9cf 100644
--- a/doc/src/sgml/ref/update.sgml
+++ b/doc/src/sgml/ref/update.sgml
@@ -279,6 +279,13 @@ UPDATE count
sub-selects is safer, though often harder to read and slower than
using a join.
   
+
+  
+   In case of partitioned tables, updating a row might cause it to move
+   to a new partition due to the new partition key.  An error will occur
+   in this case.  Also, if the specified table is a partition, an error
+   will occur if the new row violates the partition constraint.
+  
  
 
  



Re: [HACKERS] Declarative partitioning - another take

2016-12-08 Thread Tsunakawa, Takayuki
From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> On 2016/12/09 10:09, Tsunakawa, Takayuki wrote:
> > Another requirement was subpartitioning.  Will this be possible with the
> current infrastructure, or does this need drastic change?
> 
> It does support sub-partitioning, although the syntax is a bit different.

Super great!  I'm excited to try the feature when I have time.  I hope I can
contribute to the quality by finding any bugs.

Regards
Takayuki Tsunakawa




Re: [HACKERS] pgcrypto compilation error due to stack-allocated EVP_CIPHER_CTX

2016-12-08 Thread Michael Paquier
On Fri, Dec 9, 2016 at 1:11 AM, Asif Naeem  wrote:
> It makes sense. I would like to share more comments, as follows:
>
>> static int
>> bf_check_supported_key_len(void)
>> {
>> ...
>>  /* encrypt with 448bits key and verify output */
>>  evp_ctx = EVP_CIPHER_CTX_new();
>>  if (!evp_ctx)
>>   return 1;
>>  if (!EVP_EncryptInit_ex(evp_ctx, EVP_bf_ecb(), NULL, NULL, NULL))
>>   goto leave;
>>  if (!EVP_CIPHER_CTX_set_key_length(evp_ctx, 56))
>>   goto leave;
>>  if (!EVP_EncryptInit_ex(evp_ctx, NULL, NULL, key, NULL))
>>   goto leave;
>>  if (!EVP_EncryptUpdate(evp_ctx, out, &outlen, data, 8))
>>   goto leave;
>>  if (memcmp(out, res, 8) != 0)
>>   goto leave;/* Output does not match ->
>> strong cipher is
>>  * not supported */
>>  status = 1;
>> leave:
>>  EVP_CIPHER_CTX_free(evp_ctx);
>>  return status;
>> }
>
>
> It seems that it needs to return 0 instead of 1 in case of failure, i.e.

Yep that's wrong. Thanks for pointing that out.

>>  /* encrypt with 448bits key and verify output */
>>  evp_ctx = EVP_CIPHER_CTX_new();
>>  if (!evp_ctx)
>>   return 0;
>
> We can avoid multiple if conditions and goto statements with something like:
>
>>  if (EVP_EncryptInit_ex(evp_ctx, EVP_bf_ecb(), NULL, NULL, NULL) &&
>>  EVP_CIPHER_CTX_set_key_length(evp_ctx, 56) &&
>>  EVP_EncryptInit_ex(evp_ctx, NULL, NULL, key, NULL) &&
>>  EVP_EncryptUpdate(evp_ctx, out, &outlen, data, 8) &&
>>  memcmp(out, res, 8) == 0 )) /* Output does not match -> strong
>> cipher is not supported */
>>  status = 1;
>>  EVP_CIPHER_CTX_free(evp_ctx);
>>  return status;
>> }

I thought about doing that as well, to be honest :) One way or the
other is fine; still, I recall seeing the style of the current patch
more often in PG code, and that sticks better with current HEAD. But
my impression may be wrong.

> What is your opinion? I am hopeful I will be able to share all my findings
> tomorrow. Thanks.

Thanks for looking at the patch. Looking forward to hearing more!
-- 
Michael




Re: [HACKERS] Declarative partitioning - another take

2016-12-08 Thread Amit Langote
On 2016/12/09 10:09, Tsunakawa, Takayuki wrote:
> Another requirement was subpartitioning.  Will this be possible with the 
> current infrastructure, or does this need drastic change?

It does support sub-partitioning, although the syntax is a bit different.

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-12-08 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Alexander
> Korotkov
> Yes. Getting at least some of these features committed to v10 would be great
> and improve partitioning usability a lot.

I'm sorry for not contributing to the real partitioning feature, but I'm really 
looking forward to seeing the efficient plan-time and run-time partition 
pruning implemented in v10.  Recently, we failed to acquire a customer because 
they could not achieve their performance goal due to the slow partition pruning 
compared to Oracle.  The batch app prepares a SELECT statement against a 
partitioned table, then executes it millions of time with different parameter 
values.  It took a long time to process Bind messages.

Another requirement was subpartitioning.  Will this be possible with the 
current infrastructure, or does this need drastic change?

Regards
Takayuki Tsunakawa




Re: [HACKERS] Unlogged tables cleanup

2016-12-08 Thread Michael Paquier
On Fri, Dec 9, 2016 at 4:54 AM, Robert Haas  wrote:
> On Wed, Dec 7, 2016 at 11:20 PM, Michael Paquier
>  wrote:
>> OK, I rewrote a bit the patch as attached. What do you think?
>
> Committed and back-patched all the way back to 9.2.

Thanks!

>>> Right (I think).  If we set and clear delayChkpt around this work, we
>>> don't need the immediate sync.
>>
>> My point is a bit different than what you mean I think: the
>> transaction creating an unlogged relfilenode would not need to even
>> set delayChkpt in the empty() routines because other transactions
>> would not refer to it until this transaction has committed. So I am
>> arguing about just removing the sync phase.
>
> That doesn't sound right; see the comment for heap_create_init_fork.
> Suppose the transaction creating the unlogged table commits, a
> checkpoint happens, and then the operating system crashes.  Without
> the immediate sync, the operating system crash can cause the un-sync'd
> file to be lost, and because of the checkpoint the WAL record that
> creates it isn't replayed either.  So the file's just gone.

Doh. That would have made sense if the checkpoint actually flushed
the page when it was in shared buffers. But that's not the case.
-- 
Michael




Re: [HACKERS] Proposal for changes to recovery.conf API

2016-12-08 Thread Michael Paquier
On Fri, Dec 9, 2016 at 9:34 AM, Josh Berkus  wrote:
> On 12/08/2016 04:16 PM, Tom Lane wrote:
>> Josh Berkus  writes:
>>> On 12/01/2016 05:58 PM, Magnus Hagander wrote:
 And in fairness, having such a "guide to changes" chapter in each
 release probably *would* be a good idea. But it would take resources to
 make that. The release notes are good, but having a more hand-holding
 version explaining incompatible changes in "regular sentences" would
 probably be quite useful to users.
>>
>>> We will have enough major changes in 10.0 to warrant writing one of
>>> these.  Maybe not as part of the official docs, but as a set of wiki
>>> pages or similar.
>>
>> Seems to me this is exactly the release notes' turf.  If you think the
>> release notes aren't clear enough, step right up and help improve them.
>>
>> My own take on it is that the release notes are already a massive
>> amount of work, and putting duplicative material in a bunch of other
>> places isn't going to make things better, it'll just increase the
>> maintenance burden.
>
> This would mean adding literally pages of material to the release notes.
> In the past, folks have been very negative on anything which would make
> the release notes longer.  Are you sure?

As that's per-version information, that seems appropriate to me. There
could as well be a link in the release notes to the portion of the
docs holding this guide. This should definitely be self-contained in
the docs, and not mention the wiki. My 2c.
-- 
Michael




Re: [HACKERS] Proposal for changes to recovery.conf API

2016-12-08 Thread Josh Berkus
On 12/08/2016 04:16 PM, Tom Lane wrote:
> Josh Berkus  writes:
>> On 12/01/2016 05:58 PM, Magnus Hagander wrote:
>>> And in fairness, having such a "guide to changes" chapter in each
>>> release probably *would* be a good idea. But it would take resources to
>>> make that. The release notes are good, but having a more hand-holding
>>> version explaining incompatible changes in "regular sentences" would
>>> probably be quite useful to users.
> 
>> We will have enough major changes in 10.0 to warrant writing one of
>> these.  Maybe not as part of the official docs, but as a set of wiki
>> pages or similar.
> 
> Seems to me this is exactly the release notes' turf.  If you think the
> release notes aren't clear enough, step right up and help improve them.
> 
> My own take on it is that the release notes are already a massive
> amount of work, and putting duplicative material in a bunch of other
> places isn't going to make things better, it'll just increase the
> maintenance burden.

This would mean adding literally pages of material to the release notes.
In the past, folks have been very negative on anything which would make
the release notes longer.  Are you sure?

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Proposal for changes to recovery.conf API

2016-12-08 Thread Tom Lane
Josh Berkus  writes:
> On 12/01/2016 05:58 PM, Magnus Hagander wrote:
>> And in fairness, having such a "guide to changes" chapter in each
>> release probably *would* be a good idea. But it would take resources to
>> make that. The release notes are good, but having a more hand-holding
>> version explaining incompatible changes in "regular sentences" would
>> probably be quite useful to users.

> We will have enough major changes in 10.0 to warrant writing one of
> these.  Maybe not as part of the official docs, but as a set of wiki
> pages or similar.

Seems to me this is exactly the release notes' turf.  If you think the
release notes aren't clear enough, step right up and help improve them.

My own take on it is that the release notes are already a massive
amount of work, and putting duplicative material in a bunch of other
places isn't going to make things better, it'll just increase the
maintenance burden.

regards, tom lane




Re: [HACKERS] Proposal for changes to recovery.conf API

2016-12-08 Thread Josh Berkus
On 12/01/2016 05:58 PM, Magnus Hagander wrote:
> >> * Add docs: "Guide to changes in recovery.conf in 10.0"
> >
> > Hmm, we don't usually write the docs in terms of how things are
> > different from a previous version.  Might seem strange in 5 years.
> > Not sure what's best, here.
> 
> A good chunk in the release notes would make sense as well?
> 
> 
> It would.
> 
> And in fairness, having such a "guide to changes" chapter in each
> release probably *would* be a good idea. But it would take resources to
> make that. The release notes are good, but having a more hand-holding
> version explaining incompatible changes in "regular sentences" would
> probably be quite useful to users.

We will have enough major changes in 10.0 to warrant writing one of
these.  Maybe not as part of the official docs, but as a set of wiki
pages or similar.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Time to drop old-style (V0) functions?

2016-12-08 Thread Andres Freund
On 2016-12-08 18:03:04 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-12-08 17:38:38 -0500, Tom Lane wrote:
> >> The habit of zero-initializing Datums has got exactly nothing to do with
> >> V0 functions; it's about ensuring consistent results and avoiding
> >> heisenbugs from use of uninitialized memory.  I do not think we should
> >> drop it.
> 
> > Well, V0 functions don't have a real way to get information about NULL,
> > and we allow non-strict V0 functions, so?
> 
> Non-strict V0 functions are pretty fundamentally broken, although IIRC
> there was some hack whereby they could see the isnull marker for their
> first argument, which is why we didn't just disallow the case.  There was
> never any expectation that checking for value == 0 was an appropriate
> coding method for detecting nulls, because it couldn't work for
> pass-by-value data types.

Well, we have a bunch in our regression tests ;). And I'm not saying
it's *good* that they rely on that; I think it's a reason to drop the
whole V0 interface.

(I also suspect there's a bunch in brin related to this)


> Again, the point of initializing those values is not to support broken
> tests for nullness.  It's to ensure consistent behavior in case of
> buggy attempts to use null values.

Well, it also makes such attempts undetectable. I'm not really convinced
that that's such an improvement.


Greetings,

Andres Freund




Re: [HACKERS] Time to drop old-style (V0) functions?

2016-12-08 Thread Tom Lane
Andres Freund  writes:
> On 2016-12-08 17:38:38 -0500, Tom Lane wrote:
>> The habit of zero-initializing Datums has got exactly nothing to do with
>> V0 functions; it's about ensuring consistent results and avoiding
>> heisenbugs from use of uninitialized memory.  I do not think we should
>> drop it.

> Well, V0 functions don't have a real way to get information about NULL,
> and we allow non-strict V0 functions, so?

Non-strict V0 functions are pretty fundamentally broken, although IIRC
there was some hack whereby they could see the isnull marker for their
first argument, which is why we didn't just disallow the case.  There was
never any expectation that checking for value == 0 was an appropriate
coding method for detecting nulls, because it couldn't work for
pass-by-value data types.
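
For contrast, here is a minimal V1-style function (a made-up example,
not code from this thread) showing the explicit null handling that V0
never had:

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/*
 * Made-up example: with the V1 interface the fmgr macros expose a real
 * null flag per argument, so nullness is never guessed from a zeroed
 * Datum.
 */
PG_FUNCTION_INFO_V1(add_one);

Datum
add_one(PG_FUNCTION_ARGS)
{
	if (PG_ARGISNULL(0))		/* explicit null check, works for any type */
		PG_RETURN_NULL();

	PG_RETURN_INT32(PG_GETARG_INT32(0) + 1);
}

A V0 version of the same function would receive a bare int argument
and, for pass-by-value types, could not tell a NULL from a legitimate
zero, which is exactly the breakage described above.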

Again, the point of initializing those values is not to support broken
tests for nullness.  It's to ensure consistent behavior in case of
buggy attempts to use null values.  It's much like the fact that makeNode
zero-fills new node structs: that's mostly wasted work, if you want to
look at it in a certain way, but it's good for reproducibility.

regards, tom lane




Re: [HACKERS] Time to drop old-style (V0) functions?

2016-12-08 Thread Andres Freund
On 2016-12-08 14:53:58 -0800, Andres Freund wrote:
> On 2016-12-08 17:38:38 -0500, Tom Lane wrote:
> > The habit of zero-initializing Datums has got exactly nothing to do with
> > V0 functions; it's about ensuring consistent results and avoiding
> > heisenbugs from use of uninitialized memory.  I do not think we should
> > drop it.
> 
> Well, V0 functions don't have a real way to get information about NULL,
> and we allow non-strict V0 functions, so?

Oh, and the regression tests fail in V0 functions if you drop that.

Andres




Re: [HACKERS] Time to drop old-style (V0) functions?

2016-12-08 Thread Andres Freund
On 2016-12-08 17:38:38 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > I'm wondering if it's not time for $subject:
> > - V0 causes confusion / weird crashes when PG_FUNCTION_INFO_V1 was
> >   forgotten
> > - They have us keep weird hacks around just for the sake of testing V0
> > - they actually cost performance, because we have to zero initialize 
> > Datums, even if
> >   the corresponding isnull marker is set.
> > - they allow to call arbitrary functions pretty easily
> 
> If by the first point you mean "assume V1 when no info function is found",
> I object to that.  If you mean you want to require an info function, that
> might be OK.

I mean throwing an error.  Silently assuming V1 seems like a horrible
idea to me.  It doesn't seem unlikely that we want to introduce a new
call interface at some point given the runtime cost of the current one,
and that'd just bring back the current problem.


> The habit of zero-initializing Datums has got exactly nothing to do with
> V0 functions; it's about ensuring consistent results and avoiding
> heisenbugs from use of uninitialized memory.  I do not think we should
> drop it.

Well, V0 functions don't have a real way to get information about NULL,
and we allow non-strict V0 functions, so?

Regards,

Andres




Re: [HACKERS] Changed SRF in targetlist handling

2016-12-08 Thread Tom Lane
Robert Haas  writes:
> Tom, it's been about 3.5 months since you wrote this.  I think it
> would be really valuable if you could get to this RSN because the
> large patch set posted on the "Changed SRF in targetlist handling"
> thread is backed up behind this -- and I think that's really valuable
> work which I don't want to see slip out of this release.

Yeah, I was busy with other stuff during the recent commitfest.
I'll try to get back to this.  There's still only 24 hours in a day,
though.  (And no, [1] is not enough to help.)

regards, tom lane

[1] 
https://www.theguardian.com/science/2016/dec/07/earths-day-lengthens-by-two-milliseconds-a-century-astronomers-find




Re: [HACKERS] Time to drop old-style (V0) functions?

2016-12-08 Thread Tom Lane
Andres Freund  writes:
> I'm wondering if it's not time for $subject:
> - V0 causes confusion / weird crashes when PG_FUNCTION_INFO_V1 was
>   forgotten
> - They have us keep weird hacks around just for the sake of testing V0
> - they actually cost performance, because we have to zero initialize Datums, 
> even if
>   the corresponding isnull marker is set.
> - they allow to call arbitrary functions pretty easily

If by the first point you mean "assume V1 when no info function is found",
I object to that.  If you mean you want to require an info function, that
might be OK.

The habit of zero-initializing Datums has got exactly nothing to do with
V0 functions; it's about ensuring consistent results and avoiding
heisenbugs from use of uninitialized memory.  I do not think we should
drop it.

regards, tom lane




Re: [HACKERS] Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)

2016-12-08 Thread Robert Haas
On Wed, Nov 23, 2016 at 3:33 AM, Andres Freund  wrote:
> On 2016-11-18 08:00:40 -0500, Robert Haas wrote:
>> On Tue, Nov 15, 2016 at 2:28 PM, Andres Freund  wrote:
>> > I've a working fix for this, and for a similar issue Robert found. I'm
>> > still playing around with it, but basically the fix is to make the
>> > growth policy a bit more adaptive.
>>
>> Any chance you can post a patch soon?
>
> Here's my WIP series addressing this and related problems. With this
> we're again noticeably faster than the dynahash implementation, in both
> the case here, and the query you brought up over IM.
>
> This definitely needs some more TLC, but the general approach seems
> good. I particularly like that it apparently allows us to increase the
> default fillfactor without much downside according to my measurements.

Are you going to commit something here?  At least enough to make
Finalize HashAgg -> Gather -> Partial HashAgg terminate in finite
time?  Because the fact that it doesn't really sucks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_dump vs. TRANSFORMs

2016-12-08 Thread Tom Lane
Stephen Frost  writes:
> That's a good point, we might be doing things wrong in other places in
> the code by using FirstNormalObjectId on pre-8.1 servers.

> What I suggest then is an independent patch which uses a different
> variable than FirstNormalObjectId and sets it correctly based on the
> version of database that we're connecting to,

+1

> and after that's working
> correctly for HEAD vs. HEAD-8.0, and 9.6-9.2 vs. 9.6-7.1 (all I was able
> to get running within a few hours..  if someone wants to test against
> 7.0 or earlier that's fine, or if someone can provide a clean patch to
> make 7.0 work for me, that's fine too) and after that looks good and is
> committed, I'll rebase this work on that.

pg_dump never intended to support pre-7.0 servers.  I do have 7.0-7.3
servers in captivity and can do testing if you like.

regards, tom lane




Re: [HACKERS] Changed SRF in targetlist handling

2016-12-08 Thread Robert Haas
On Mon, Aug 22, 2016 at 4:20 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> On 2016-08-17 17:41:28 -0700, Andres Freund wrote:
>>> Tom, do you think this is roughly going in the right direction?
>
> I've not had time to look at this patch, I'm afraid.  If you still
> want me to, I can make time in a day or so.

Tom, it's been about 3.5 months since you wrote this.  I think it
would be really valuable if you could get to this RSN because the
large patch set posted on the "Changed SRF in targetlist handling"
thread is backed up behind this -- and I think that's really valuable
work which I don't want to see slip out of this release.  At the same
time, both that and this are quite invasive, and I don't want it all
to get committed the day before feature freeze, because that will mess
up the schedule.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Time to drop old-style (V0) functions?

2016-12-08 Thread Stephen Frost
Andres, all,

* Andres Freund (and...@anarazel.de) wrote:
> I'm wondering if it's not time for $subject:
> - V0 causes confusion / weird crashes when PG_FUNCTION_INFO_V1 was
>   forgotten
> - They have us keep weird hacks around just for the sake of testing V0
> - they actually cost performance, because we have to zero initialize Datums, 
> even if
>   the corresponding isnull marker is set.
> - they allow to call arbitrary functions pretty easily
> 
> I don't see any reason to keep them around. I seriously doubt anybody
> is using them seriously in anything but error.

+100

Thanks!

Stephen




Re: [HACKERS] pg_dump vs. TRANSFORMs

2016-12-08 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> (Actually, the most likely way in which this would break things is if
> >> it started causing built-in casts to get dumped ... have you checked?)
> 
> > So, this is fun.  Apparently casts had OIDs > FirstNormalObjectId back
> > in 8.0 and earlier, so pg_dump >= 9.0 dumps out all casts which don't
> > have functions for server versions 7.3-8.0.  Casts which do have a
> > function aren't included though, be they user-defined or not, because
> > they're excluded by getFuncs() and dumpCast() just punts.
> 
> > With my change, pg_dump'ing against 8.0 and earlier will dump out all
> > casts, including those with functions, since the function definitions
> > will now be pulled in for them by getFuncs().
> 
> I poked into that, and you're right --- it wasn't until 8.1 (commit
> 2193a856a) that we had a hard-and-fast rule that initdb-assigned OIDs
> would be less than FirstNormalObjectId.  Before that, the cutoff was
> variable and was recorded in pg_database.datlastsysoid.

Oh, I see.

> > What isn't clear to me is what to do about this.  Given the lack of
> > anyone complaining, and that this would at least ensure that the
> > user-defined casts are dumped, we could just go with this change and
> > tell people who are dumping against 8.0 and earlier databases to ignore
> > the errors from the extra CREATE CAST commands (they shouldn't hurt
> > anything, after all) during the restore.
> 
> There's a lot to be said for that.  It dumped too much before, it'll
> dump a bit more now, but neither case is fatal.  And it's unlikely
> that anybody really cares anymore.

Well, yes, but still, if it's not too hard to do...

> If you do want to do something about this, the way would be to retrieve
> datlastsysoid and use that as the cutoff with a pre-8.1 server.  I think
> there used to be code to do things that way in pg_dump; we must have
> removed it (rather prematurely).

That's a good point, we might be doing things wrong in other places in
the code by using FirstNormalObjectId on pre-8.1 servers.

What I suggest then is an independent patch which uses a different
variable than FirstNormalObjectId and sets it correctly based on the
version of database that we're connecting to, and after that's working
correctly for HEAD vs. HEAD-8.0, and 9.6-9.2 vs. 9.6-7.1 (all I was able
to get running within a few hours..  if someone wants to test against
7.0 or earlier that's fine, or if someone can provide a clean patch to
make 7.0 work for me, that's fine too) and after that looks good and is
committed, I'll rebase this work on that.
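
Roughly, the version-dependent cutoff could be computed with a small helper
along these lines (an untested sketch; the helper name and exact query are
placeholders, not part of the posted patch):

static Oid
getOidCutoff(Archive *fout)
{
	if (fout->remoteVersion >= 80100)
		return (Oid) FirstNormalObjectId;	/* fixed boundary since 8.1 */
	else
	{
		/* Pre-8.1: the boundary is per-database, in pg_database.datlastsysoid. */
		PGresult   *res;
		Oid			cutoff;

		res = ExecuteSqlQueryForSingleRow(fout,
							"SELECT datlastsysoid FROM pg_database "
							"WHERE datname = current_database()");
		cutoff = (Oid) strtoul(PQgetvalue(res, 0, 0), NULL, 10);
		PQclear(res);
		return cutoff;
	}
}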

That said, at least initial testing makes it look like it's still going
to be in the 10s-of-ms on 8.3 and earlier.  Looking at it a bit more, it
looks like part of the problem there is that, for some reason, we're
using a sequential scan inside a nested loop instead of using the
pg_cast_oid_index..  Setting enable_seqscan = false turns that into a
Bitmap Heap Scan which gets it down to only a few ms again.  ANALYZE
doesn't seem to help either, though I'm still not terribly concerned
about this.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] pg_dump vs. TRANSFORMs

2016-12-08 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Ok, thinking a bit more on this and testing, it looks like we record the
> > initdb-defined casts as 'pinned' in pg_depend all the way back to 7.3.
> > Therefore, we could use that as the gating factor for getFuncs() instead
> > of FirstNormalObjectId and then I can also modify getCasts() to exclude
> > pinned casts, which should fix pg_dump against 8.0 and earlier.
> 
> Don't really like that; won't it make the getFuncs query substantially
> more expensive?

Adding the check against pg_depend doesn't appear to make it much slower
than having the check against FirstNormalObjectId.  Adding either seems
to cause the getFuncs() query to go from ~1.7ms to ~3.4ms for HEAD, at
least on my system, which was also built with debugging and asserts and
all that, so the difference in the field is probably less.

Going back to older versions, the difference drops because we drop the
pg_init_privs logic for pre-9.6, and then drop the check for transforms
in pre-9.5.

All versions are just a few ms from HEAD down to 8.4.

In 8.3 and older, we do start to get slower because we don't use a
Merge Anti Join and instead use a Nested Loop Left Join.  I was getting
about:

8.3 - ~35ms
8.2 - ~44ms
8.1 - ~66ms
8.0 - ~82ms
7.4 - ~64ms
7.3 - ~61ms

Again, these were Assert- and debug-enabled builds, though, of course,
most of the slowdown comes from the missing anti-join rather than the
debug overhead.

I'm really not sure that's worth troubling over though as I am not aware
of anyone with hundreds of 8.3 or earlier databases that they're going
to need a really fast pg_dump to work on.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] pg_dump vs. TRANSFORMs

2016-12-08 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> (Actually, the most likely way in which this would break things is if
>> it started causing built-in casts to get dumped ... have you checked?)

> So, this is fun.  Apparently casts had OIDs > FirstNormalObjectId back
> in 8.0 and earlier, so pg_dump >= 9.0 dumps out all casts which don't
> have functions for server versions 7.3-8.0.  Casts which do have a
> function aren't included though, be they user-defined or not, because
> they're excluded by getFuncs() and dumpCast() just punts.

> With my change, pg_dump'ing against 8.0 and earlier will dump out all
> casts, including those with functions, since the function definitions
> will now be pulled in for them by getFuncs().

I poked into that, and you're right --- it wasn't until 8.1 (commit
2193a856a) that we had a hard-and-fast rule that initdb-assigned OIDs
would be less than FirstNormalObjectId.  Before that, the cutoff was
variable and was recorded in pg_database.datlastsysoid.

> What isn't clear to me is what to do about this.  Given the lack of
> anyone complaining, and that this would at least ensure that the
> user-defined casts are dumped, we could just go with this change and
> tell people who are dumping against 8.0 and earlier databases to ignore
> the errors from the extra CREATE CAST commands (they shouldn't hurt
> anything, after all) during the restore.

There's a lot to be said for that.  It dumped too much before, it'll
dump a bit more now, but neither case is fatal.  And it's unlikely
that anybody really cares anymore.

If you do want to do something about this, the way would be to retrieve
datlastsysoid and use that as the cutoff with a pre-8.1 server.  I think
there used to be code to do things that way in pg_dump; we must have
removed it (rather prematurely).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-12-08 Thread Karl O. Pinc
On Thu, 8 Dec 2016 11:27:34 -0500
Robert Haas  wrote:

> On Wed, Dec 7, 2016 at 11:02 PM, Karl O. Pinc  wrote:
> > I read this and knew that I hadn't finished review, but didn't
> > immediately respond because I thought the patch had to be
> > marked "ready for committer" on commitfest.postgresql.org
> > before the committers would spend a lot of time on it.  
> 
> Generally that's true, but I was trying to be helpful and trying also
> to move this toward some kind of conclusion.

It has been very helpful, particularly your last reply.  And I think
there's now a clear path forward.

(I'm not looking for any replies to the below, although of course would
welcome whatever you've got to say.)

> It is, of course, not my job to decide who is at fault in whatever may
> or may not have gone wrong during the reviewing process.

Understood.  And I'm not looking for any sort of judgment or
attempting to pass judgment.  It has been frustrating though
and your email provided an opportunity to vent, and to
provide some feedback, of whatever quality, to the review
process.

It's been that much more frustrating because I don't really
care one way or another about the feature.  I was just trying
to build up credit reviewing somebody else's patch, and instead
probably did the opposite with all the thrashing.  :)

>  I do think that the blizzard of small patches that you've
> submitted in some of your emails may possibly be a bit overwhelming.
> We shouldn't really need a stack of a dozen or more patches to
> implement one small feature.  Declarative partitioning only had 7.
> Why does this need more than twice that number?

I'm trying to break up changes into patches which are as small
as possible to make them more easily understood by providing
a guide for the reader/reviewer.  So rather than
a single patch I'd make, say, 3.  One for documentation to describe
the change.  Another which does whatever refactoring is necessary
to the existing code, but which otherwise does not introduce any
functional changes.  And another which adds the new code which makes
use of the refactoring.  At each stage the code should continue
to work without bugs.  The other party can then decide to incorporate
the patches into the main patch, which is itself another attached
patch.

Do this for several unrelated changes to the main patch and the patches
add up.

Mostly I wouldn't expect the various patches to be reviewed by the
committers; they'd get incorporated into the main patch.

This is how I break down the work when I look at code changes.
What's it do?  What does it change/break in the existing code?
How does the new stuff work?  But this process has not worked
here so I guess I'll stop.

But I expect you will see at least 3 patches submitted for
committer review.  I see a number of hardcoded constants,
now that the main patch adds additional code, that can
be made into symbols to, IMO, improve code clarity.
Gilles points out that this is an issue of coding style
and might be considered unnecessary complication.
So they are not in the main patch.  They are attached
(applied to v14 of the main patch; really, the first applies
to the master PG branch and the 2nd to v14 of the
pg_current_logfiles patch) if you want to look
at them now and provide some guidance as to whether they
should be dropped or included in the patch.

> > The extreme case is the attached "cleanup_rotate" patch.
> > (Which applies to v14 of this patch.)  It's nothing but
> > a little bit of tiding up of the master branch,

> I took a look at that patch just now and I guess I don't really see
> the point.

The point would be to make the code easier to read.  Saving cycles
is hardly ever worthwhile in my opinion.  The whole point
of code is to be read by people and be understandable.
If it's not understood it's useless going forward.

Once I've gone to the effort to understand something
written, that I'm going to be responsible for maintaining, I like to
see it written as clearly as possible.  In my experience
if you don't do this then little confusions multiply and
eventually the whole thing turns to mud.

The trade-off, as you say, is the overhead involved
in running minor changes through the production
process.  I figure this can be minimized by bundling
such changes with related substantial changes.

Again, it's not working so I'll drop it.

> > FYI.  The point of the "retry_current_logfiles"
> > patch series is to handle the case
> > where logfile_writename gets a ENFILE or EMFILE.
> > When this happens the current_logfiles file can
> > "skip" and not contain the name(s) of the current
> > log file for an entire log rotation.  To miss, say,
> > a month of logs because the logs only rotate monthly
> > and your log processing is based on reading the
> > current_logfiles file sounds like a problem.  
> 
> I think if you are trying to enumerate the names of your logfiles by
> any sort of polling mechanism, rather than by seeing what 

[HACKERS] Time to drop old-style (V0) functions?

2016-12-08 Thread Andres Freund
Hi,

I'm wondering if it's not time for $subject:
- V0 causes confusion / weird crashes when PG_FUNCTION_INFO_V1 was
  forgotten
- They have us keep weird hacks around just for the sake of testing V0
- they actually cost performance, because we have to zero-initialize
  Datums even if the corresponding isnull marker is set
- they make it pretty easy to call arbitrary functions

I don't see any reason to keep them around.  I seriously doubt anybody
is using them for anything except by mistake.
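
For anyone who hasn't bumped into the difference, the two conventions look
roughly like this (a sketch; add_one is a made-up example function):

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/*
 * Old-style (V0): plain C calling convention, no info macro.  If you forget
 * PG_FUNCTION_INFO_V1 for a V1 function, the server silently assumes this,
 * which is where the weird crashes come from.
 */
int32
add_one_v0(int32 arg)
{
	return arg + 1;
}

/*
 * New-style (V1): arguments and result are passed as Datums, with proper
 * null handling via the fcinfo machinery.
 */
PG_FUNCTION_INFO_V1(add_one);

Datum
add_one(PG_FUNCTION_ARGS)
{
	int32		arg = PG_GETARG_INT32(0);

	PG_RETURN_INT32(arg + 1);
}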

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication WIP

2016-12-08 Thread Petr Jelinek
On 08/12/16 20:16, Peter Eisentraut wrote:
> On 12/6/16 11:58 AM, Peter Eisentraut wrote:
>> On 12/5/16 6:24 PM, Petr Jelinek wrote:
>>> I think that the removal of changes to ReplicationSlotAcquire() that you
>>> did will result in making it impossible to reacquire a temporary slot once
>>> you have switched to a different one in the session, as the if (active_pid != 0)
>>> check will always be true for a temp slot.
>>
>> I see.  I suppose it's difficult to get a test case for this.
> 
> I created a test case, saw the error of my ways, and added your code
> back in.  Patch attached.
> 

Hi,

I am happy with this version, thanks for moving it forward.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2016-12-08 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Dec 7, 2016 at 11:55 PM, Heikki Linnakangas  wrote:
>> Should we be worried about breaking the API of tuplesort_get* functions?
>> They might be used by extensions. I think that's OK, but wanted to bring it
>> up. This would be only for master, of course, and any breakage would be
>> straightforward to fix.

> I don't think so. I'm not aware of any third party extensions that
> call tuplesort.c routines, despite having looked for them. I noticed
> that pg_repack doesn't. For any that do, they'll break in a
> predictable, obvious way.

Adding or subtracting function arguments is something we do all the time.
As long as it's not proposed for back-patch, I don't see a problem.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump vs. TRANSFORMs

2016-12-08 Thread Tom Lane
Stephen Frost  writes:
> Ok, thinking a bit more on this and testing, it looks like we record the
> initdb-defined casts as 'pinned' in pg_depend all the way back to 7.3.
> Therefore, we could use that as the gating factor for getFuncs() instead
> of FirstNormalObjectId and then I can also modify getCasts() to exclude
> pinned casts, which should fix pg_dump against 8.0 and earlier.

Don't really like that; won't it make the getFuncs query substantially
more expensive?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-08 Thread Tom Lane
I've pushed the previous patch to HEAD.  Attached is a proposed patch
(against 9.6) that we could use for the back branches; it takes the
brute force approach of just computing the correct value on-demand
in the two functions at issue.  The key question of course is whether
this is acceptable from a performance standpoint.  I did a simple test
using a 1000-entry VALUES list:

select count(a) from (
values
  ('0'::varchar(3), '0'::varchar(4)),
  ('1'::varchar(3), '1'::varchar(4)),
  ('2'::varchar(3), '2'::varchar(4)),
  ('3'::varchar(3), '3'::varchar(4)),
  ('4'::varchar(3), '4'::varchar(4)),
  ...
  ('996'::varchar(3), '996'::varchar(4)),
  ('997'::varchar(3), '997'::varchar(4)),
  ('998'::varchar(3), '998'::varchar(4)),
  ('999'::varchar(3), '999'::varchar(4))
) v(a,b);

Since all the rows do have the same typmod, this represents the worst
case where we have to scan all the way to the end to confirm the typmod,
and it has about as little overhead otherwise as I could think of doing.
I ran it like this:

pgbench -U postgres -n -c 1 -T 1000 -f bigvalues.sql regression

and could not see any above-the-noise-level difference --- in fact,
it seemed like it was faster *with* the patch, which is obviously
impossible; I blame that on chance realignments of loops vs. cache
line boundaries.

So I think this is an okay candidate for back-patching.  If anyone
wants to do their own performance tests, please do.

regards, tom lane

diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index 1e3ecbc..c51fd81 100644
*** a/src/backend/parser/parse_relation.c
--- b/src/backend/parser/parse_relation.c
*** static void expandTupleDesc(TupleDesc tu
*** 52,57 
--- 52,58 
  int rtindex, int sublevels_up,
  int location, bool include_dropped,
  List **colnames, List **colvars);
+ static int32 *getValuesTypmods(RangeTblEntry *rte);
  static int	specialAttNum(const char *attname);
  static bool isQueryUsingTempRelation_walker(Node *node, void *context);
  
*** expandRTE(RangeTblEntry *rte, int rtinde
*** 2157,2165 
--- 2158,2179 
  			{
  /* Values RTE */
  ListCell   *aliasp_item = list_head(rte->eref->colnames);
+ int32	   *coltypmods;
  ListCell   *lcv;
  ListCell   *lcc;
  
+ /*
+  * It's okay to extract column types from the expressions in
+  * the first row, since all rows will have been coerced to the
+  * same types.  Their typmods might not be the same though, so
+  * we potentially need to examine all rows to compute those.
+  * Column collations are pre-computed in values_collations.
+  */
+ if (colvars)
+ 	coltypmods = getValuesTypmods(rte);
+ else
+ 	coltypmods = NULL;
+ 
  varattno = 0;
  forboth(lcv, (List *) linitial(rte->values_lists),
  		lcc, rte->values_collations)
*** expandRTE(RangeTblEntry *rte, int rtinde
*** 2184,2196 
  
  		varnode = makeVar(rtindex, varattno,
  		  exprType(col),
! 		  exprTypmod(col),
  		  colcollation,
  		  sublevels_up);
  		varnode->location = location;
  		*colvars = lappend(*colvars, varnode);
  	}
  }
  			}
  			break;
  		case RTE_JOIN:
--- 2198,2212 
  
  		varnode = makeVar(rtindex, varattno,
  		  exprType(col),
! 		  coltypmods[varattno - 1],
  		  colcollation,
  		  sublevels_up);
  		varnode->location = location;
  		*colvars = lappend(*colvars, varnode);
  	}
  }
+ if (coltypmods)
+ 	pfree(coltypmods);
  			}
  			break;
  		case RTE_JOIN:
*** expandRTE(RangeTblEntry *rte, int rtinde
*** 2296,2301 
--- 2312,2319 
  		varnode = makeVar(rtindex, varattno,
  		  coltype, coltypmod, colcoll,
  		  sublevels_up);
+ 		varnode->location = location;
+ 
  		*colvars = lappend(*colvars, varnode);
  	}
  }
*** expandTupleDesc(TupleDesc tupdesc, Alias
*** 2413,2418 
--- 2431,2504 
  }
  
  /*
+  * getValuesTypmods -- expandRTE subroutine
+  *
+  * Identify per-column typmods for the given VALUES RTE.  Returns a
+  * palloc'd array.
+  */
+ static int32 *
+ getValuesTypmods(RangeTblEntry *rte)
+ {
+ 	int32	   *coltypmods;
+ 	List	   *firstrow;
+ 	int			ncolumns,
+ nvalid,
+ i;
+ 	ListCell   *lc;
+ 
+ 	Assert(rte->values_lists != NIL);
+ 	firstrow = (List *) linitial(rte->values_lists);
+ 	ncolumns = list_length(firstrow);
+ 	coltypmods = (int32 *) palloc(ncolumns * sizeof(int32));
+ 	nvalid = 0;
+ 
+ 	/* Collect the typmods from the first VALUES row */
+ 	i = 0;
+ 	foreach(lc, firstrow)
+ 	{
+ 		Node	   *col = (Node *) lfirst(lc);
+ 
+ 		coltypmods[i] = exprTypmod(col);
+ 		if (coltypmods[i] >= 0)
+ 			nvalid++;
+ 		i++;
+ 	}
+ 
+ 	/*
+ 	 * Scan remaining rows; as soon as we have a non-matching typmod for a
+ 	 * column, reset that typmod to -1.  We can bail out early if all typmods
+ 	 

Re: [HACKERS] pg_dump vs. TRANSFORMs

2016-12-08 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> I have a vague feeling that the code for dumping casts and/or transforms
> >> may have some assumptions that the underlying function is also being
> >> dumped.  Although maybe the assumption was really only what's fixed here,
> >> ie that there be a DumpableObject for the function.  Anyway, take a close
> >> look for that.
> 
> > I'll look around and see, though my hunch is that, at some point, we
> > were just pulling all functions and then an optimization was added to
> > exclude pg_catalog and no one noticed that it broke casts using built-in
> > functions.
> 
> Nah, that's historical revisionism --- the exclusion for system functions
> is very ancient.  It certainly predates transforms altogether, and
> probably predates the cast-dumping code in anything like its current form.

Apparently, that exclusion goes back to 7.3.

> I think the expectation was that casts using built-in functions were
> also built-in and so needn't be dumped.  There may be a comment about it
> somewhere, which would need to be revised now.

I don't see anything in current code or back to 9.2.  Going back
farther, I do find comments about skipping casts which only refer to
things in pg_* namespaces, which, of course, isn't correct either.

As such, creating a cast like so:

create cast (timestamptz as interval) with function age(timestamptz) as
assignment;

results in a cast which no version of pg_dump will actually dump out of
any PG server version since CREATE CAST was added.  I spent the effort
to get all the way back to 7.1 up and running, tho CREATE CAST wasn't
actually added until 7.3.

> (Actually, the most likely way in which this would break things is if
> it started causing built-in casts to get dumped ... have you checked?)

So, this is fun.  Apparently casts had OIDs > FirstNormalObjectId back
in 8.0 and earlier, so pg_dump >= 9.0 dumps out all casts which don't
have functions for server versions 7.3-8.0.  Casts which do have a
function aren't included though, be they user-defined or not, because
they're excluded by getFuncs() and dumpCast() just punts.

With my change, pg_dump'ing against 8.0 and earlier will dump out all
casts, including those with functions, since the function definitions
will now be pulled in for them by getFuncs().

What isn't clear to me is what to do about this.  Given the lack of
anyone complaining, and that this would at least ensure that the
user-defined casts are dumped, we could just go with this change and
tell people who are dumping against 8.0 and earlier databases to ignore
the errors from the extra CREATE CAST commands (they shouldn't hurt
anything, after all) during the restore.

Older versions of pg_dump don't have much to offer us regarding this
case- they didn't dump out user-defined casts which only used components
from pg_catalog either.

Ok, thinking a bit more on this and testing, it looks like we record the
initdb-defined casts as 'pinned' in pg_depend all the way back to 7.3.
Therefore, we could use that as the gating factor for getFuncs() instead
of FirstNormalObjectId and then I can also modify getCasts() to exclude
pinned casts, which should fix pg_dump against 8.0 and earlier.
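
Roughly, the getCasts() side of that could look something like this (an
untested sketch; the exact query text is illustrative only, not the actual
patch):

	PQExpBuffer query = createPQExpBuffer();

	/*
	 * Skip casts that initdb pinned, per pg_depend, instead of relying on
	 * the FirstNormalObjectId cutoff.
	 */
	appendPQExpBufferStr(query,
						 "SELECT c.oid, c.castsource, c.casttarget, "
						 "c.castfunc, c.castcontext "
						 "FROM pg_cast c "
						 "WHERE NOT EXISTS (SELECT 1 FROM pg_depend d "
						 "WHERE d.refclassid = 'pg_cast'::regclass "
						 "AND d.refobjid = c.oid "
						 "AND d.deptype = 'p')");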

I'll start working on a patch for that, please let me know if you see
any huge glaring holes in my thinking.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2016-12-08 Thread Peter Geoghegan
On Wed, Dec 7, 2016 at 11:55 PM, Heikki Linnakangas  wrote:
> Should we be worried about breaking the API of tuplesort_get* functions?
> They might be used by extensions. I think that's OK, but wanted to bring it
> up. This would be only for master, of course, and any breakage would be
> straightforward to fix.

I don't think so. I'm not aware of any third party extensions that
call tuplesort.c routines, despite having looked for them. I noticed
that pg_repack doesn't. For any that do, they'll break in a
predictable, obvious way.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel execution and prepared statements

2016-12-08 Thread Robert Haas
On Wed, Dec 7, 2016 at 9:23 PM, Amit Kapila  wrote:
> Attached patch changes the comment based on above suggestions.

Thanks.  Committed and back-patched to 9.6.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

2016-12-08 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Dec 8, 2016 at 2:11 PM, Stephen Frost  wrote:
> > Yes, that makes the compiler warning go away.
> 
> Great, pushed.

Awesome, thanks!

> > ... your compiler knows that key->partnatts will always be >= 1?
> 
> :-)
> 
> I think my compiler is too dumb to notice that int x; printf("%d", x);
> is a reference to an uninitialized variable.

Made me laugh, thanks again. :)

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

2016-12-08 Thread Robert Haas
On Thu, Dec 8, 2016 at 2:11 PM, Stephen Frost  wrote:
> Yes, that makes the compiler warning go away.

Great, pushed.

> ... your compiler knows that key->partnatts will always be >= 1?

:-)

I think my compiler is too dumb to notice that int x; printf("%d", x);
is a reference to an uninitialized variable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unlogged tables cleanup

2016-12-08 Thread Robert Haas
On Wed, Dec 7, 2016 at 11:20 PM, Michael Paquier
 wrote:
> OK, I rewrote a bit the patch as attached. What do you think?

Committed and back-patched all the way back to 9.2.

>> Right (I think).  If we set and clear delayChkpt around this work, we
>> don't need the immediate sync.
>
> My point is a bit different than what you mean I think: the
> transaction creating an unlogged relfilenode would not need to even
> set delayChkpt in the empty() routines because other transactions
> would not refer to it until this transaction has committed. So I am
> arguing about just removing the sync phase.

That doesn't sound right; see the comment for heap_create_init_fork.
Suppose the transaction creating the unlogged table commits, a
checkpoint happens, and then the operating system crashes.  Without
the immediate sync, the operating system crash can cause the un-sync'd
file to be lost, and because of the checkpoint the WAL record that
creates it isn't replayed either.  So the file's just gone.
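
For reference, the sequence in question is roughly this (paraphrasing
heap_create_init_fork(); see the comment there):

	/*
	 * Create the init fork, WAL-log it, and force it to disk immediately,
	 * so that a checkpoint followed by an OS crash can't leave the table
	 * with no init fork at all.
	 */
	RelationOpenSmgr(rel);
	smgrcreate(rel->rd_smgr, INIT_FORKNUM, false);
	log_smgrcreate(&rel->rd_node, INIT_FORKNUM);
	smgrimmedsync(rel->rd_smgr, INIT_FORKNUM);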

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for changes to recovery.conf API

2016-12-08 Thread Josh Berkus
On 12/04/2016 07:21 PM, Michael Paquier wrote:
> On Mon, Dec 5, 2016 at 11:34 AM, Haribabu Kommi
>  wrote:
>> As there was feedback from others related to the patch and no update
>> from the author was found.
>>
>> Closed in 2016-11 commitfest with "returned with feedback" status.
>> Please feel free to update the status once you submit the updated patch or
>> if the current status doesn't reflect the actual status of the patch.
> 
> Having a consensus here is already a great step forward. I am sure
> that a new version will be sent for the next CF.
> 

Please let's make sure this gets done for 10.  We're planning on
breaking a lot of config things in 10, so it would be MUCH easier on
users if we do the recovery.conf changes in that version as well.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication WIP

2016-12-08 Thread Peter Eisentraut
On 12/6/16 11:58 AM, Peter Eisentraut wrote:
> On 12/5/16 6:24 PM, Petr Jelinek wrote:
>> I think that the removal of changes to ReplicationSlotAcquire() that you
>> did will result in making it impossible to reacquire a temporary slot once
>> you have switched to a different one in the session, as the if (active_pid != 0)
>> check will always be true for a temp slot.
> 
> I see.  I suppose it's difficult to get a test case for this.

I created a test case, saw the error of my ways, and added your code
back in.  Patch attached.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From b6c389b5ccd17a4c8a68d429048c2e7db34bfd51 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 8 Dec 2016 12:00:00 -0500
Subject: [PATCH] Add support for temporary replication slots

This allows creating temporary replication slots that are removed
automatically at the end of the session or on error.
---
 contrib/test_decoding/Makefile  |  2 +-
 contrib/test_decoding/expected/ddl.out  |  4 +-
 contrib/test_decoding/expected/slot.out | 58 +++
 contrib/test_decoding/sql/slot.sql  | 20 ++
 doc/src/sgml/func.sgml  | 16 ++--
 doc/src/sgml/protocol.sgml  | 13 ++-
 src/backend/catalog/system_views.sql| 11 ++
 src/backend/replication/repl_gram.y | 22 +++
 src/backend/replication/repl_scanner.l  |  1 +
 src/backend/replication/slot.c  | 69 ++---
 src/backend/replication/slotfuncs.c | 24 
 src/backend/replication/walsender.c | 28 +++--
 src/backend/storage/lmgr/proc.c |  3 ++
 src/backend/tcop/postgres.c |  3 ++
 src/include/catalog/pg_proc.h   |  6 +--
 src/include/nodes/replnodes.h   |  1 +
 src/include/replication/slot.h  |  4 +-
 src/test/regress/expected/rules.out |  3 +-
 18 files changed, 237 insertions(+), 51 deletions(-)
 create mode 100644 contrib/test_decoding/expected/slot.out
 create mode 100644 contrib/test_decoding/sql/slot.sql

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index a6641f5040..d2bc8b8350 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -39,7 +39,7 @@ submake-test_decoding:
 
 REGRESSCHECKS=ddl xact rewrite toast permissions decoding_in_xact \
 	decoding_into_rel binary prepared replorigin time messages \
-	spill
+	spill slot
 
 regresscheck: | submake-regress submake-test_decoding temp-install
 	$(MKDIR_P) regression_output
diff --git a/contrib/test_decoding/expected/ddl.out b/contrib/test_decoding/expected/ddl.out
index a9ba615b5b..c104c4802d 100644
--- a/contrib/test_decoding/expected/ddl.out
+++ b/contrib/test_decoding/expected/ddl.out
@@ -702,7 +702,7 @@ SELECT pg_drop_replication_slot('regression_slot');
 
 /* check that the slot is gone */
 SELECT * FROM pg_replication_slots;
- slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
++---++--+++--+--+-+-
+ slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
+---++---++--+---+++--+--+-+-
 (0 rows)
 
diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out
new file mode 100644
index 00..5e6b70ba38
--- /dev/null
+++ b/contrib/test_decoding/expected/slot.out
@@ -0,0 +1,58 @@
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_p', 'test_decoding');
+ ?column? 
+--
+ init
+(1 row)
+
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_t', 'test_decoding', true);
+ ?column? 
+--
+ init
+(1 row)
+
+SELECT pg_drop_replication_slot('regression_slot_p');
+ pg_drop_replication_slot 
+--
+ 
+(1 row)
+
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_p', 'test_decoding', false);
+ ?column? 
+--
+ init
+(1 row)
+
+-- reconnect to clean temp slots
+\c
+SELECT pg_drop_replication_slot('regression_slot_p');
+ pg_drop_replication_slot 
+--
+ 
+(1 row)
+
+-- should fail because the temporary slot was dropped automatically
+SELECT pg_drop_replication_slot('regression_slot_t');
+ERROR:  replication slot "regression_slot_t" does not exist
+-- test switching between slots in a session
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding', true);
+ ?column? 
+--
+ init
+(1 row)
+
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot2', 'test_decoding', true);
+ ?column? 
+--
+ init

Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

2016-12-08 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Dec 8, 2016 at 1:49 PM, Stephen Frost  wrote:
> > * Robert Haas (rh...@postgresql.org) wrote:
> >> Implement table partitioning.
> >
> > My compiler apparently doesn't care for this:
> >
> > .../src/backend/catalog/partition.c: In function ‘partition_rbound_cmp’:
> > .../src/backend/catalog/partition.c:1787:13: warning: ‘cmpval’ may be used 
> > uninitialized in this function [-Wmaybe-uninitialized]
> >   if (cmpval == 0 && lower1 != lower2)
> 
> So, apparently your compiler doesn't recognize that the loop always
> has to execute at least once, because we don't support a table
> partitioned on zero attributes.  If you initialize cmpval to 0 at the
> top of the function, does that fix it?

Yes, that makes the compiler warning go away.

... your compiler knows that key->partnatts will always be >= 1?

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

2016-12-08 Thread Robert Haas
On Thu, Dec 8, 2016 at 1:49 PM, Stephen Frost  wrote:
> * Robert Haas (rh...@postgresql.org) wrote:
>> Implement table partitioning.
>
> My compiler apparently doesn't care for this:
>
> .../src/backend/catalog/partition.c: In function ‘partition_rbound_cmp’:
> .../src/backend/catalog/partition.c:1787:13: warning: ‘cmpval’ may be used 
> uninitialized in this function [-Wmaybe-uninitialized]
>   if (cmpval == 0 && lower1 != lower2)

So, apparently your compiler doesn't recognize that the loop always
has to execute at least once, because we don't support a table
partitioned on zero attributes.  If you initialize cmpval to 0 at the
top of the function, does that fix it?
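
That is, something along these lines at the top of partition_rbound_cmp()
(a sketch of the suggestion, not necessarily the final fix):

	int32		cmpval = 0;		/* keep -Wmaybe-uninitialized quiet; the loop
								 * below always runs at least once */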

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

2016-12-08 Thread Stephen Frost
* Robert Haas (rh...@postgresql.org) wrote:
> Implement table partitioning.

My compiler apparently doesn't care for this:

.../src/backend/catalog/partition.c: In function ‘partition_rbound_cmp’:
.../src/backend/catalog/partition.c:1787:13: warning: ‘cmpval’ may be used 
uninitialized in this function [-Wmaybe-uninitialized]
  if (cmpval == 0 && lower1 != lower2)
 ^

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-08 Thread Tom Lane
Robert Haas  writes:
> Maybe it would help for Jeff to use elog_node_display() on the nodes
> that are causing the problem - e.g. outerpathkeys and innerpathkeys
> and best_path->path_mergeclauses, or just best_path - at the point
> where the error is thrown. That might give us enough information to
> see what's broken.

I'll be astonished if that's sufficient evidence.  We already know that
the problem is that the input path doesn't claim to be sorted in a way
that would match the merge clauses, but that doesn't tell us how such
a path came to be generated (or, if it wasn't intentionally done, where
the data structure got clobbered later).

It's possible that setting a breakpoint at create_mergejoin_path and
capturing stack traces for all calls would yield usable insight.  But
there are likely to be lots of calls if this is an 8-way join query,
and probably only a few are wrong.

I'd much rather have a test case than try to debug this remotely.
Bandwidth too low.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code

2016-12-08 Thread Aleksander Alekseev
Tom, Geoff,

Thanks for your feedback! Here is version 2 of this patch.

> You could just change it to
> if (str[strspn(str, " \t\n\r\f")] == '\0')
> to mitigate calling strlen. It's safe to do so because strspn will
> only return values from 0 to strlen(str).

> [...] I have serious doubts that the "optimized" implementation
> you propose is actually faster than a naive one; it may be slower, and
> it's certainly longer and harder to understand/test.

I would like to point out that I never said it's optimized. However I
like the code Geoff proposed. It definitely doesn't make anything worse.
I decided to keep the pg_str_containsonly procedure (i.e. not to inline this
code) for now. Code with strspn looks OK in a simple example. However, in
a concrete context it looks like bad Perl code in ROT13 to me:

```
/* try to figure out what's exactly going on */
if (somelongname[strspn(somelongname /* yes, again */, "longlistofchars")] != '\0')
```
> Function name seems weirdly spelled.

I named it the same way pg_str_endswith is named. However, I'm open to
better suggestions here.

> Whether it's worth worrying about, I dunno.  This is hardly the only
> C idiom you need to be familiar with to read the PG code.

Well, at least this v2 version of the patch removes the second string
scan. And I still believe that this inlined strspn is not readable
or obvious at all.
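
For completeness, the helper itself stays tiny either way; something like
this (a sketch along the lines of Geoff's suggestion, not necessarily the
exact patch text):

#include <stdbool.h>
#include <string.h>

/* Return true if str consists only of characters in accept (true for ""). */
bool
pg_str_containsonly(const char *str, const char *accept)
{
	return str[strspn(str, accept)] == '\0';
}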

-- 
Best regards,
Aleksander Alekseev
diff --git a/src/backend/parser/parse_type.c b/src/backend/parser/parse_type.c
index a8bb472..a1c5853 100644
--- a/src/backend/parser/parse_type.c
+++ b/src/backend/parser/parse_type.c
@@ -17,6 +17,7 @@
 #include "access/htup_details.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
+#include "common/string.h"
 #include "lib/stringinfo.h"
 #include "nodes/makefuncs.h"
 #include "parser/parser.h"
@@ -696,7 +697,7 @@ typeStringToTypeName(const char *str)
 	ErrorContextCallback ptserrcontext;
 
 	/* make sure we give useful error for empty input */
-	if (strspn(str, " \t\n\r\f") == strlen(str))
+	if (pg_str_containsonly(str, " \t\n\r\f"))
 		goto fail;
 
 	initStringInfo();
diff --git a/src/backend/tsearch/ts_utils.c b/src/backend/tsearch/ts_utils.c
index 9c3e15c..81d6132 100644
--- a/src/backend/tsearch/ts_utils.c
+++ b/src/backend/tsearch/ts_utils.c
@@ -17,6 +17,7 @@
 #include 
 
 #include "miscadmin.h"
+#include "common/string.h"
 #include "tsearch/ts_locale.h"
 #include "tsearch/ts_utils.h"
 
@@ -45,7 +46,7 @@ get_tsearch_config_filename(const char *basename,
 	 * and case-insensitive filesystems, and non-ASCII characters create other
 	 * interesting risks, so on the whole a tight policy seems best.
 	 */
-	if (strspn(basename, "abcdefghijklmnopqrstuvwxyz0123456789_") != strlen(basename))
+	if (!pg_str_containsonly(basename, "abcdefghijklmnopqrstuvwxyz0123456789_"))
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("invalid text search configuration file name \"%s\"",
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 394042c..7e41359 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -32,6 +32,7 @@
 #include "catalog/pg_ts_config.h"
 #include "catalog/pg_ts_dict.h"
 #include "catalog/pg_type.h"
+#include "common/string.h"
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
 #include "parser/parse_type.h"
@@ -75,7 +76,7 @@ regprocin(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (pro_name_or_oid[0] >= '0' &&
 		pro_name_or_oid[0] <= '9' &&
-		strspn(pro_name_or_oid, "0123456789") == strlen(pro_name_or_oid))
+		pg_str_containsonly(pro_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		  CStringGetDatum(pro_name_or_oid)));
@@ -286,7 +287,7 @@ regprocedurein(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (pro_name_or_oid[0] >= '0' &&
 		pro_name_or_oid[0] <= '9' &&
-		strspn(pro_name_or_oid, "0123456789") == strlen(pro_name_or_oid))
+		pg_str_containsonly(pro_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		  CStringGetDatum(pro_name_or_oid)));
@@ -535,7 +536,7 @@ regoperin(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (opr_name_or_oid[0] >= '0' &&
 		opr_name_or_oid[0] <= '9' &&
-		strspn(opr_name_or_oid, "0123456789") == strlen(opr_name_or_oid))
+		pg_str_containsonly(opr_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		  CStringGetDatum(opr_name_or_oid)));
@@ -750,7 +751,7 @@ regoperatorin(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (opr_name_or_oid[0] >= '0' &&
 		opr_name_or_oid[0] <= '9' &&
-		strspn(opr_name_or_oid, "0123456789") == strlen(opr_name_or_oid))
+		pg_str_containsonly(opr_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		  CStringGetDatum(opr_name_or_oid)));
@@ -995,7 +996,7 @@ regclassin(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (class_name_or_oid[0] >= '0' &&
 		class_name_or_oid[0] <= '9' &&
-		

Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-08 Thread Robert Haas
On Thu, Dec 8, 2016 at 12:50 PM, Tom Lane  wrote:
> Jeff Janes  writes:
>> I have a DML statement which triggers the error:
>> ERROR:  XX000: outer pathkeys do not match mergeclauses
>> LOCATION:  create_mergejoin_plan, createplan.c:3722
>
> Hmm.
>
>> Any tips on investigating this further in situ?  Or is the best option just
>> to work harder on a minimal and disclosable test case?
>
> I think we need a test case --- not minimal necessarily, but something
> other people can reproduce.  You might find that setting enable_hashjoin
> and/or enable_nestloop to false makes it easier to provoke the error,
> since evidently this requires that we (a) generate a faulty mergejoin Path
> and then (b) choose it as the cheapest one, since the error occurs while
> converting it to a Plan.

Maybe it would help for Jeff to use elog_node_display() on the nodes
that are causing the problem - e.g. outerpathkeys and innerpathkeys
and best_path->path_mergeclauses, or just best_path - at the point
where the error is thrown. That might give us enough information to
see what's broken.
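
Concretely, something like this dropped into create_mergejoin_plan() just
before the elog(ERROR) should do it (a sketch; variable names as they appear
in createplan.c):

	/* Temporary debugging aid: dump the offending structures to the log. */
	elog_node_display(LOG, "best_path", best_path, true);
	elog_node_display(LOG, "outerpathkeys", outerpathkeys, true);
	elog_node_display(LOG, "innerpathkeys", innerpathkeys, true);
	elog_node_display(LOG, "mergeclauses", best_path->path_mergeclauses, true);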

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Back-patch use of unnamed POSIX semaphores for Linux?

2016-12-08 Thread Robert Haas
On Thu, Dec 8, 2016 at 12:40 PM, Tom Lane  wrote:
> Ah-hah, thanks for the insight.  I can now reproduce it, and I confirm
> that aside from removing the semaphores, our POSIX shmem segment(s)
> are removed from /dev/shm.  They presumably still are attached to whatever
> processes have them mapped already, but this behavior is going to break
> DSM usage in any case.  (The SysV shm segment does survive, presumably
> because systemd notices its positive attach count.)

Makes sense.  Actually, it would be fairly unlucky for the DSM thing to
cause a failure for parallel query as it exists today, because there's
only about 4ms between when the segment is created and when all
backends have attached.  But DSA - especially if we use it for anything
long-lived - will surely break.

> So I now agree that getting out from under SysV semaphores isn't going to
> fix our problems with systemd ... at least, not unless we migrate *to*
> not away from SysV shared memory for DSM.  Even then, we'd have to be
> careful that standard usage patterns keep every DSM segment continually
> attached to at least one process.  Dunno how practical that is.  And it
> blows chunks in the goal of not being constrained by SHMMAX.

dynamic_shared_memory_type = sysv is already supported, but it's not
the default precisely because of SHMMAX.  Keeping every DSM segment
attached to at least one process is normal for parallel query as it
exists today, but tough for any application that "pins" segments.  On
Windows, we do that: pinning a segment actually causes a user backend
to reach inside the postmaster's address space to open a file
descriptor.  Why Microsoft thought that was something a process should
be able to do to another process I couldn't say.  But on Linux where
segments don't go away on last close, and where such frightening APIs
don't exist, there's no guarantee that a segment will always be open
somewhere.

Hey, I have an idea.  Let's switch from processes to threads, and then
shared memory - including the dynamic kind - can be implemented using
malloc().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2016-12-08 Thread Tom Lane
Jeff Janes  writes:
> I have a DML statement which triggers the error:
> ERROR:  XX000: outer pathkeys do not match mergeclauses
> LOCATION:  create_mergejoin_plan, createplan.c:3722

Hmm.

> Any tips on investigating this further in situ?  Or is the best option just
> to work harder on a minimal and disclosable test case?

I think we need a test case --- not minimal necessarily, but something
other people can reproduce.  You might find that setting enable_hashjoin
and/or enable_nestloop to false makes it easier to provoke the error,
since evidently this requires that we (a) generate a faulty mergejoin Path
and then (b) choose it as the cheapest one, since the error occurs while
converting it to a Plan.

BTW, if you're not doing this in a debug (--enable-cassert) build, it'd
be useful to try it in one.  I'm a little suspicious that the root cause
might be a memory-stomp type of problem, ie somebody scribbling on a
pathkey data structure without accounting for it being shared with another
path.  It's possible that cassert memory checking would help catch that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Back-patch use of unnamed POSIX semaphores for Linux?

2016-12-08 Thread Tom Lane
Peter Eisentraut  writes:
> On 12/7/16 9:28 PM, Alex Hunsaker wrote:
>> Hrm, the following incantation seems to break for me on a fresh Fedora
>> 25 system:
>> 1) As root su to $USER and start postgres.
>> 2) ssh in as $USER and then logout
>> 3) # psql localhost
>> 
>> FATAL: semctl(4980742, 3, SETVAL, 0) failed: Invalid argument
>> LOG: server process (PID 14569) exited with exit code 1

> Yeah, the way to trigger this is to run the postgres server not in a
> "session", then log in interactively as that same user, thus creating a
> session, and then logging out from that session, thus completely logging
> out that user from all sessions.

> (Thus, the way to trigger the KillUserProcesses behavior is quite the
> opposite, because that only happens if you have the postgres server
> running in a session.)

Ah-hah, thanks for the insight.  I can now reproduce it, and I confirm
that aside from removing the semaphores, our POSIX shmem segment(s)
are removed from /dev/shm.  They presumably still are attached to whatever
processes have them mapped already, but this behavior is going to break
DSM usage in any case.  (The SysV shm segment does survive, presumably
because systemd notices its positive attach count.)

So I now agree that getting out from under SysV semaphores isn't going to
fix our problems with systemd ... at least, not unless we migrate *to*,
not away from, SysV shared memory for DSM.  Even then, we'd have to be
careful that standard usage patterns keep every DSM segment continually
attached to at least one process.  Dunno how practical that is.  And it
blows chunks in the goal of not being constrained by SHMMAX.

Oh well.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

2016-12-08 Thread Ashutosh Sharma
Well, we are currently passing an INFINITE timeout value to the
WaitForMultipleObjects() API, which is where it hangs. I tried changing this
INFINITE timeout interval to some finite value, say 1 sec, and could not
reproduce the issue. Having said that, I checked the older code where
this issue does not exist and found that there as well we pass an INFINITE
timeout value to WaitForMultipleObjects(). So I am not sure that passing an
INFINITE timeout is really the issue. Attached is a small patch with my
changes.

On Thu, Dec 8, 2016 at 1:51 PM, Michael Paquier 
wrote:

> On Wed, Dec 07, 2016 at 03:16:09PM +0530, Ashutosh Sharma wrote:
> > Problem Analysis:
> > -
> > Although I am very new to Windows, I tried debugging the issue and
> > found that the backend is not receiving the query executed after
> > "SELECT pldbg_attach_to_port(2)" and is infinitely waiting on
> > "WaitEventSetWaitBlock()" at WaitForMultipleObjects() to read the
> > input command. Below is the backtrace for the same.
>
> This code was indeed in need of review.  It seems that you are giving
> enough information to reproduce the problem. I'll try to dig into
> it...
> --
> Michael
>
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index b7e5129..eae2ebb 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -1395,7 +1395,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 	 * Need to wait for ->nevents + 1, because signal handle is in [0].
 	 */
 	rc = WaitForMultipleObjects(set->nevents + 1, set->handles, FALSE,
-cur_timeout);
+1000);
 
 	/* Check return code */
 	if (rc == WAIT_FAILED)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] postgres_fdw bug in 9.6

2016-12-08 Thread Jeff Janes
I have a setup where a 9.6.1 server uses postgres_fdw to connect to a 9.4.9
hot standby server.

I have a DML statement which triggers the error:

ERROR:  XX000: outer pathkeys do not match mergeclauses
LOCATION:  create_mergejoin_plan, createplan.c:3722

The error first starts appearing with this commit (on the local side):

commit aa09cd242fa7e3a694a31f8aed521e80d1e626a4
Author: Robert Haas 
Date:   Wed Mar 9 10:51:49 2016 -0500

postgres_fdw: Consider foreign joining and foreign sorting together.


The version of the remote side does not seem to matter.  I've also promoted
a test instance of the remote from hot standby to master and then upgraded
to 9.6.1, and neither step fixes the issue.

The statement is like this:

explain update foo_local set col3=foo_remote.col3 from foo_remote where
foo_local.id=foo_remote.id and foo_local.id in ('aaa','bbb','ccc','ddd');

Where foo_remote is a pretty complicated view (defined locally) over the
join of 8 foreign tables.

I am having trouble producing a self-contained, disclosable test case for
this.  Small changes cause the error to go away.  On the local side, it
doesn't seem to depend on the contents of the table, only the structure.
But on the remote side, truncating the central table for the query makes
the error go away.

Any tips on investigating this further in situ?  Or is the best option just
to work harder on a minimal and disclosable test case?

Cheers,

Jeff


Re: [HACKERS] [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code

2016-12-08 Thread Tom Lane
Aleksander Alekseev  writes:
> How about rewriting such a code like this?
> if (pg_str_containsonly(str, " \t\n\r\f"))

Function name seems weirdly spelled.  Also, I believe that in nearly all
use-cases the number of data characters that would typically be examined
is small, so I have serious doubts that the "optimized" implementation
you propose is actually faster than a naive one; it may be slower, and
it's certainly longer and harder to understand/test.

Whether it's worth worrying about, I dunno.  This is hardly the only
C idiom you need to be familiar with to read the PG code.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-12-08 Thread Robert Haas
On Thu, Dec 8, 2016 at 11:44 AM, Dmitry Ivanov  wrote:
>> That would be fantastic.  I and my colleagues at EnterpriseDB can
>> surely help review; of course, maybe you and some of your colleagues
>> would like to help review our patches, too.
>
> Certainly, I'll start reviewing as soon as I get familiar with the code.

Thanks!

>> Do you think this is
>> likely to be something where you can get something done quickly, with
>> the hope of getting it into v10?
>
> Yes, I've just cleared my schedule in order to make this possible. I'll
> bring in the patches ASAP.

Fantastic.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-12-08 Thread Robert Haas
On Thu, Dec 8, 2016 at 11:43 AM, Alexander Korotkov
 wrote:
> Great! And it is very cool that we have basic infrastructure already
> committed.  Thanks a lot to you and everybody involved.

Thanks.

>> of course, maybe you and some of your colleagues
>> would like to help review our patches, too.
> We understand our reviewing performance is not sufficient.  Will try to do
> better during next commitfest.

Not trying to throw stones, just want to get as much committed as
possible.  And I think our patches are good and valuable improvements
too, so I want to see them go in because they will help everybody.
Thanks for trying to increase the reviewing effort; it is sorely
needed.

>> Do you think this is
>> likely to be something where you can get something done quickly, with
>> the hope of getting it into v10?
>
> Yes, because we have a set of features already implemented in pg_pathman.  In
> particular we have the following features from your list and some more.
>
> - more efficient plan-time partition pruning (constraint exclusion is too
> slow)
> - run-time partition pruning
> - insert (and eventually update) tuple routing for foreign partitions
> - hash partitioning
> - not scanning the parent

That's a lot of stuff.  Getting even a couple of those would be a big win.

>> Time is growing short, but it would
>> be great to polish this a little more before we ship it.
>
> Yes. Getting at least some of these features committed to v10 would be great
> and improve partitioning usability a lot.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-12-08 Thread Dmitry Ivanov

> That would be fantastic.  I and my colleagues at EnterpriseDB can
> surely help review; of course, maybe you and some of your colleagues
> would like to help review our patches, too.


Certainly, I'll start reviewing as soon as I get familiar with the code.



> Do you think this is
> likely to be something where you can get something done quickly, with
> the hope of getting it into v10?


Yes, I've just cleared my schedule in order to make this possible. I'll 
bring in the patches ASAP.



--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-12-08 Thread Alexander Korotkov
On Thu, Dec 8, 2016 at 7:29 PM, Robert Haas  wrote:

> On Thu, Dec 8, 2016 at 11:13 AM, Dmitry Ivanov 
> wrote:
> > We (PostgresPro) have been working on pg_pathman for quite a while, and
> > since it's obviously going to become the thing of the past, it would be a
> > wasted effort if we didn't try to participate.
> >
> > For starters, I'd love to work on both plan-time & run-time partition
> > pruning. I created a custom node for run-time partition elimination, so I
> > think I'm capable of developing something similar.

> That would be fantastic.  I and my colleagues at EnterpriseDB can
> surely help review;


Great! And it is very cool that we have basic infrastructure already
committed.  Thanks a lot to you and everybody involved.


> of course, maybe you and some of your colleagues
> would like to help review our patches, too.


We understand our reviewing performance is not sufficient.  Will try to do
better during next commitfest.


> Do you think this is
> likely to be something where you can get something done quickly, with
> the hope of getting it into v10?


Yes, because we have a set of features already implemented in pg_pathman.  In
particular we have the following features from your list and some more.

- more efficient plan-time partition pruning (constraint exclusion is too
slow)
- run-time partition pruning
- insert (and eventually update) tuple routing for foreign partitions
- hash partitioning
- not scanning the parent

Time is growing short, but it would
> be great to polish this a little more before we ship it.
>

Yes. Getting at least some of these features committed to v10 would be great
and improve partitioning usability a lot.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Declarative partitioning - another take

2016-12-08 Thread Robert Haas
On Thu, Dec 8, 2016 at 11:13 AM, Dmitry Ivanov  wrote:
> We (PostgresPro) have been working on pg_pathman for quite a while, and
> since it's obviously going to become the thing of the past, it would be a
> wasted effort if we didn't try to participate.
>
> For starters, I'd love to work on both plan-time & run-time partition
> pruning. I created a custom node for run-time partition elimination, so I
> think I'm capable of developing something similar.

That would be fantastic.  I and my colleagues at EnterpriseDB can
surely help review; of course, maybe you and some of your colleagues
would like to help review our patches, too.  Do you think this is
likely to be something where you can get something done quickly, with
the hope of getting it into v10?  Time is growing short, but it would
be great to polish this a little more before we ship it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-12-08 Thread Robert Haas
On Wed, Dec 7, 2016 at 11:02 PM, Karl O. Pinc  wrote:
> I read this and knew that I hadn't finished review, but didn't
> immediately respond because I thought the patch had to be
> marked "ready for committer" on commitfest.postgresql.org
> before the committers would spend a lot of time on it.

Generally that's true, but I was trying to be helpful and trying also
to move this toward some kind of conclusion.

> I've been introducing some complication, but I hope it's necessary.
> To keep the review process simple my plan has been to submit
> multiple patches.  One with the basics and more on top of that
> that introduce complication that can be considered and accepted or
> rejected.  So I send emails with multiple patches, some that I think
> need to go into the core patch and others to be kept separate.
> But, although I try, this does not seem to have been communicated
> because I keep getting emails back that contain only a single patch.
>
> Maybe something's wrong with my review process but I don't know
> how to fix it.

It is, of course, not my job to decide who is at fault in whatever may
or may not have gone wrong during the reviewing process.  It is not
only not my job, but would be unproductive, since the goal here is for
people to contribute more, not less.  I have done enough review in
this community to have experienced the problem of people who say that
they took your suggestions when in fact they didn't, or who randomly
de-improve things in future patch versions that were more correct in
earlier versions.  I agree that on occasions when that happens, it is
indeed very frustrating.  Whether that's happened in this case, I
don't know.  I do think that the blizzard of small patches that you've
submitted in some of your emails may possibly be a bit overwhelming.
We shouldn't really need a stack of a dozen or more patches to
implement one small feature.  Declarative partitioning only had 7.
Why does this need more than twice that number?

> The extreme case is the attached "cleanup_rotate" patch.
> (Which applies to v14 of this patch.)  It's nothing but
> a little bit of tidying up of the master branch, but does
> touch code that's already being modified so it seems
> like the committers would want to look at it at the same
> time as when reviewing the pg_current_logfile patch.
> Once you've looked at the pg_current_logfile patch
> you've already looked at and modified code in the function
> the "cleanup_rotate" patch touches.

I took a look at that patch just now and I guess I don't really see
the point.  I mean, it will save a few cycles, and that's not nothing,
but it's not much, either.  I don't see a reason to complicate the
basic feature patch by entangling it with such a change - it just
makes it harder to get the actual feature committed.  And I kind of
wonder if the time it will take to review and commit that patch is
even worth it.  There must be hundreds of places in our code base
where you could do stuff like that, but only a few of those save
enough cycles to be worth bothering with.  To be fair, if that patch
were submitted independently of the rest of this and nobody objected,
I'd probably commit it; I mean, why not?  But this thread is now over
100 emails long without reaching a happy conclusion, and one good way
to expedite the path to a happy conclusion is to drop all of the
nonessential bits.  And that change certainly qualifies as
nonessential.

> FYI.  The point of the "retry_current_logfiles"
> patch series is to handle the case
> where logfile_writename gets a ENFILE or EMFILE.
> When this happens the current_logfiles file can
> "skip" and not contain the name(s) of the current
> log file for an entire log rotation.  To miss, say,
> a month of logs because the logs only rotate monthly
> and your log processing is based on reading the
> current_logfiles file sounds like a problem.

I think if you are trying to enumerate the names of your logfiles by
any sort of polling mechanism, rather than by seeing what files show
up in your logging directory, you are doing it wrong.  So, I don't see
that this matters at all.

> The point of the code is to report not just the real error, but what
> effect the real error has -- that the current_logfiles file did not
> get updated in a timely fashion.  Maybe this isn't necessary, especially
> if the write of current_logfiles gets retried on failure.  Or maybe
> the right thing to do is to give logfile_open() an argument that
> lets it elaborate its error message.  Any guidance here would
> be appreciated.

Generally speaking, I would say that it's the user's job to decide
what the impact of an error is; our job is only to tell them what
happened.  There are occasional exceptions; for example:

rhaas=# select ablance from pgbench_accounts;
ERROR:  column "ablance" does not exist
LINE 1: select ablance from pgbench_accounts;
   ^
HINT:  Perhaps you meant to reference the column "pgbench_accounts.abalance".

The HINT 

Re: [HACKERS] Declarative partitioning - another take

2016-12-08 Thread Dmitry Ivanov

Hi everyone,


Of course, this is the beginning, not the end.  I've been thinking
about next steps -- here's an expanded list:


- more efficient plan-time partition pruning (constraint 
exclusion is too slow)

- run-time partition pruning
- try to reduce lock levels
...


We (PostgresPro) have been working on pg_pathman for quite a while, and 
since it's obviously going to become the thing of the past, it would be a 
wasted effort if we didn't try to participate.


For starters, I'd love to work on both plan-time & run-time partition 
pruning. I created a custom node for run-time partition elimination, so I 
think I'm capable of developing something similar.



--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgcrypto compilation error due to stack-allocated EVP_CIPHER_CTX

2016-12-08 Thread Asif Naeem
It makes sense. I would like to share a few more comments:

static int
> bf_check_supported_key_len(void)
> {
> ...
>  /* encrypt with 448bits key and verify output */
>  evp_ctx = EVP_CIPHER_CTX_new();
>  if (!evp_ctx)
>   return 1;
>  if (!EVP_EncryptInit_ex(evp_ctx, EVP_bf_ecb(), NULL, NULL, NULL))
>   goto leave;
>  if (!EVP_CIPHER_CTX_set_key_length(evp_ctx, 56))
>   goto leave;
>  if (!EVP_EncryptInit_ex(evp_ctx, NULL, NULL, key, NULL))
>   goto leave;
>  if (!EVP_EncryptUpdate(evp_ctx, out, , data, 8))
>   goto leave;
>  if (memcmp(out, res, 8) != 0)
>   goto leave;/* Output does not match ->
> strong cipher is
>  * not supported */
>  status = 1;
> leave:
>  EVP_CIPHER_CTX_free(evp_ctx);
>  return status;
> }


It seems that it needs to return 0 instead of 1 in case of failure, i.e.

 /* encrypt with 448bits key and verify output */
>  evp_ctx = EVP_CIPHER_CTX_new();
>  if (!evp_ctx)
>   return 0;


We can avoid the multiple if conditions and the goto statement with something like this:

 if (EVP_EncryptInit_ex(evp_ctx, EVP_bf_ecb(), NULL, NULL, NULL) &&
>  EVP_CIPHER_CTX_set_key_length(evp_ctx, 56) &&
>  EVP_EncryptInit_ex(evp_ctx, NULL, NULL, key, NULL) &&
>  EVP_EncryptUpdate(evp_ctx, out, , data, 8) &&
>  memcmp(out, res, 8) == 0 )) /* Output does not match -> strong
> cipher is not supported */
>  status = 1;
>  EVP_CIPHER_CTX_free(evp_ctx);
>  return status;
> }


What is your opinion? I am hopeful I will be able to share all my
findings tomorrow. Thanks.
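
Putting both suggestions together, the tail of the function might look roughly
like the sketch below. This is only an illustration, not the actual patch: the
key/data/res test vectors stand in for the constants already present in
pgcrypto's openssl.c, "outlen" is a placeholder variable name, and the usual
<openssl/evp.h> and <string.h> includes from that file are assumed.

static int
bf_check_supported_key_len(void)
{
    static const unsigned char key[56] = {0};  /* 448-bit test key (placeholder) */
    static const unsigned char data[8] = {0};  /* test plaintext (placeholder) */
    static const unsigned char res[8] = {0};   /* expected ciphertext (placeholder) */
    unsigned char out[8];
    int         outlen;
    int         status = 0;
    EVP_CIPHER_CTX *evp_ctx;

    /* encrypt with 448-bit key and verify output */
    evp_ctx = EVP_CIPHER_CTX_new();
    if (!evp_ctx)
        return 0;

    if (EVP_EncryptInit_ex(evp_ctx, EVP_bf_ecb(), NULL, NULL, NULL) &&
        EVP_CIPHER_CTX_set_key_length(evp_ctx, 56) &&
        EVP_EncryptInit_ex(evp_ctx, NULL, NULL, key, NULL) &&
        EVP_EncryptUpdate(evp_ctx, out, &outlen, data, 8) &&
        memcmp(out, res, 8) == 0)
        status = 1;             /* output matches -> strong cipher is supported */

    EVP_CIPHER_CTX_free(evp_ctx);
    return status;
}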


On Wed, Dec 7, 2016 at 2:23 AM, Michael Paquier 
wrote:

> On Tue, Dec 6, 2016 at 11:42 PM, Asif Naeem  wrote:
> > Thanks for updated patch. Although EVP_CIPHER_CTX_cleanup() seems
> deprecated
> > in OpenSSL >= 1.1.0 i.e.
> >
> >> # if OPENSSL_API_COMPAT < 0x1010L
> >> #  define EVP_CIPHER_CTX_init(c)  EVP_CIPHER_CTX_reset(c)
> >> #  define EVP_CIPHER_CTX_cleanup(c)   EVP_CIPHER_CTX_reset(c)
> >> # endif
> >
> >
> > I guess use of deprecated function is fine, until OpenSSL library support
> > it.
>
> We could use some ifdef block with the OpenSSL version number, but I
> am not sure if that's worth complicating the code at this stage.
> --
> Michael
>


Re: [HACKERS] [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code

2016-12-08 Thread Geoff Winkless
On 8 December 2016 at 15:54, Aleksander Alekseev
 wrote:
> Hi.
>
> I noticed that there is a lot of repeating code like this:
>
> ```
> if (strspn(str, " \t\n\r\f") == strlen(str))
> ```
>
> I personally don't find it particularly readable, not to mention that
> traversing a string twice doesn't look like a good idea (you can check
> using objdump that the latest GCC 6.2 doesn't optimize this code).

You could just change it to

if (str[strspn(str, " \t\n\r\f")] == '\0')

to mitigate calling strlen. It's safe to do so because strspn will
only return values from 0 to strlen(str).

Geoff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use EVP API pgcrypto encryption, dropping support for OpenSSL 0.9.6 and older

2016-12-08 Thread Heikki Linnakangas

On 12/08/2016 05:51 PM, Christoph Berg wrote:

Re: Heikki Linnakangas 2016-10-17 <07ebd878-ff09-72d5-7df7-f7fde7b83...@iki.fi>

Committed this patch now.


Hi,

I've just taken up work again on PG 10 on Debian unstable.

With openssl 1.1.0c-2, pgcrypto errors out with:


Yeah, sorry about that. It's already been discussed at 
https://www.postgresql.org/message-id/20161201014826.ic72tfkahmevpwz7%40alap3.anarazel.de.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code

2016-12-08 Thread Aleksander Alekseev
Hi.

I noticed that there is a lot of repeating code like this:

```
if (strspn(str, " \t\n\r\f") == strlen(str))
```

I personally don't find it particularly readable, not to mention that
traversing a string twice doesn't look like a good idea (you can check
using objdump that the latest GCC 6.2 doesn't optimize this code).

How about rewriting such code like this?

```
if (pg_str_containsonly(str, " \t\n\r\f"))
```

Corresponding patch is attached. I don't claim that my implementation of
the pg_str_containsonly procedure is faster than strspn + strlen, but at
least such refactoring makes the code a little more readable.
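
For readers skimming the thread, a minimal sketch of what such a helper could
look like (the attached patch's actual implementation may differ; this variant
simply wraps the strspn idiom suggested elsewhere in the thread):

```
#include <string.h>
#include <stdbool.h>

/* Return true if every character of "str" appears in "chars". */
bool
pg_str_containsonly(const char *str, const char *chars)
{
	/* strspn() never scans past the terminating NUL, so this is safe. */
	return str[strspn(str, chars)] == '\0';
}
```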

-- 
Best regards,
Aleksander Alekseev
diff --git a/src/backend/parser/parse_type.c b/src/backend/parser/parse_type.c
index a8bb472..a1c5853 100644
--- a/src/backend/parser/parse_type.c
+++ b/src/backend/parser/parse_type.c
@@ -17,6 +17,7 @@
 #include "access/htup_details.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
+#include "common/string.h"
 #include "lib/stringinfo.h"
 #include "nodes/makefuncs.h"
 #include "parser/parser.h"
@@ -696,7 +697,7 @@ typeStringToTypeName(const char *str)
 	ErrorContextCallback ptserrcontext;
 
 	/* make sure we give useful error for empty input */
-	if (strspn(str, " \t\n\r\f") == strlen(str))
+	if (pg_str_containsonly(str, " \t\n\r\f"))
 		goto fail;
 
 	initStringInfo();
diff --git a/src/backend/tsearch/ts_utils.c b/src/backend/tsearch/ts_utils.c
index 9c3e15c..81d6132 100644
--- a/src/backend/tsearch/ts_utils.c
+++ b/src/backend/tsearch/ts_utils.c
@@ -17,6 +17,7 @@
 #include 
 
 #include "miscadmin.h"
+#include "common/string.h"
 #include "tsearch/ts_locale.h"
 #include "tsearch/ts_utils.h"
 
@@ -45,7 +46,7 @@ get_tsearch_config_filename(const char *basename,
 	 * and case-insensitive filesystems, and non-ASCII characters create other
 	 * interesting risks, so on the whole a tight policy seems best.
 	 */
-	if (strspn(basename, "abcdefghijklmnopqrstuvwxyz0123456789_") != strlen(basename))
+	if (!pg_str_containsonly(basename, "abcdefghijklmnopqrstuvwxyz0123456789_"))
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("invalid text search configuration file name \"%s\"",
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 394042c..7e41359 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -32,6 +32,7 @@
 #include "catalog/pg_ts_config.h"
 #include "catalog/pg_ts_dict.h"
 #include "catalog/pg_type.h"
+#include "common/string.h"
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
 #include "parser/parse_type.h"
@@ -75,7 +76,7 @@ regprocin(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (pro_name_or_oid[0] >= '0' &&
 		pro_name_or_oid[0] <= '9' &&
-		strspn(pro_name_or_oid, "0123456789") == strlen(pro_name_or_oid))
+		pg_str_containsonly(pro_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		  CStringGetDatum(pro_name_or_oid)));
@@ -286,7 +287,7 @@ regprocedurein(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (pro_name_or_oid[0] >= '0' &&
 		pro_name_or_oid[0] <= '9' &&
-		strspn(pro_name_or_oid, "0123456789") == strlen(pro_name_or_oid))
+		pg_str_containsonly(pro_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		  CStringGetDatum(pro_name_or_oid)));
@@ -535,7 +536,7 @@ regoperin(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (opr_name_or_oid[0] >= '0' &&
 		opr_name_or_oid[0] <= '9' &&
-		strspn(opr_name_or_oid, "0123456789") == strlen(opr_name_or_oid))
+		pg_str_containsonly(opr_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		  CStringGetDatum(opr_name_or_oid)));
@@ -750,7 +751,7 @@ regoperatorin(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (opr_name_or_oid[0] >= '0' &&
 		opr_name_or_oid[0] <= '9' &&
-		strspn(opr_name_or_oid, "0123456789") == strlen(opr_name_or_oid))
+		pg_str_containsonly(opr_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		  CStringGetDatum(opr_name_or_oid)));
@@ -995,7 +996,7 @@ regclassin(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (class_name_or_oid[0] >= '0' &&
 		class_name_or_oid[0] <= '9' &&
-		strspn(class_name_or_oid, "0123456789") == strlen(class_name_or_oid))
+		pg_str_containsonly(class_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		CStringGetDatum(class_name_or_oid)));
@@ -1186,7 +1187,7 @@ regtypein(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (typ_name_or_oid[0] >= '0' &&
 		typ_name_or_oid[0] <= '9' &&
-		strspn(typ_name_or_oid, "0123456789") == strlen(typ_name_or_oid))
+		pg_str_containsonly(typ_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		  CStringGetDatum(typ_name_or_oid)));
@@ -1358,7 +1359,7 @@ regconfigin(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (cfg_name_or_oid[0] >= '0' &&
 		cfg_name_or_oid[0] <= '9' &&
-		

Re: [HACKERS] Use EVP API pgcrypto encryption, dropping support for OpenSSL 0.9.6 and older

2016-12-08 Thread Christoph Berg
Re: Heikki Linnakangas 2016-10-17 <07ebd878-ff09-72d5-7df7-f7fde7b83...@iki.fi>
> Committed this patch now.

Hi,

I've just taken up work again on PG 10 on Debian unstable.

With openssl 1.1.0c-2, pgcrypto errors out with:

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
-Wendif-labels -Wmissing-format-attribute -Wformat-security 
-fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -g -O2 
-fdebug-prefix-map=/<>=. 
-specs=/usr/share/dpkg/no-pie-compile.specs -fstack-protector-strong -Wformat 
-Werror=format-security -I/usr/include/mit-krb5 -fPIC -pie 
-fno-omit-frame-pointer -fpic -I. -I/<>/build/../contrib/pgcrypto 
-I../../src/include -I/<>/build/../src/include -Wdate-time 
-D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include/tcl8.6 
 -c -o openssl.o /<>/build/../contrib/pgcrypto/openssl.c
/<>/build/../contrib/pgcrypto/openssl.c:253:17: error: field 
'evp_ctx' has incomplete type
  EVP_CIPHER_CTX evp_ctx;
 ^~~
/<>/build/../contrib/pgcrypto/openssl.c: In function 
'bf_check_supported_key_len':
/<>/build/../contrib/pgcrypto/openssl.c:373:17: error: storage 
size of 'evp_ctx' isn't known
  EVP_CIPHER_CTX evp_ctx;
 ^~~
/<>/build/../contrib/pgcrypto/openssl.c:373:17: warning: unused 
variable 'evp_ctx' [-Wunused-variable]
: recipe for target 'openssl.o' failed

Reverting 5ff4a67f63fd6d3eb01ff9707d4674ed54a89f3b fixes compilation.
(9.6 is fine.)
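
For context, the breakage comes from EVP_CIPHER_CTX becoming an opaque type in
OpenSSL 1.1.0, so it can no longer be declared on the stack. A minimal sketch
of the allocation pattern the code needs to move to (this is only an
illustration, not the actual pgcrypto fix being discussed):

#include <openssl/evp.h>

static int
init_cipher_ctx_example(void)
{
    EVP_CIPHER_CTX *ctx;            /* was: EVP_CIPHER_CTX ctx; on the stack */
    int         ok = 0;

    ctx = EVP_CIPHER_CTX_new();     /* heap-allocated by the library */
    if (ctx == NULL)
        return 0;
    if (EVP_EncryptInit_ex(ctx, EVP_bf_ecb(), NULL, NULL, NULL))
        ok = 1;
    EVP_CIPHER_CTX_free(ctx);       /* was: EVP_CIPHER_CTX_cleanup(&ctx); */
    return ok;
}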

Christoph


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-12-08 Thread Robert Haas
On Wed, Dec 7, 2016 at 11:42 PM, Michael Paquier
 wrote:
>> Congrats to everyone working on this! This is a large step forward.
>
> Congratulations to all! It was a long way to this result.

Yes.  The last effort in this area which I can remember was by Itagaki
Takahiro in 2010, so we've been waiting for this for more than 6
years.  It's really good that Amit was able to put in the effort to
produce a committable patch, and I think he deserves all of our thanks
for getting that done - and NTT deserves our thanks for paying him to
do it.

Even though I know he put in a lot more work than I did, let me just
say: phew, even reviewing that was a ton of work.

Of course, this is the beginning, not the end.  I've been thinking
about next steps -- here's an expanded list:

- more efficient plan-time partition pruning (constraint exclusion is too slow)
- run-time partition pruning
- partition-wise join (Ashutosh Bapat is already working on this)
- try to reduce lock levels
- hash partitioning
- the ability to create an index on the parent and have all of the
children inherit it; this should work something like constraint
inheritance.  you could argue that this doesn't add any real new
capability but it's a huge usability feature.
- teaching autovacuum enough about inheritance hierarchies for it to
update the parent statistics when they get stale despite the lack of
any actual inserts/updates/deletes to the parent.  this has been
pending for a long time, but it's only going to get more important
- row movement (aka avoiding the need for an ON UPDATE trigger on each
partition)
- insert (and eventually update) tuple routing for foreign partitions
- not scanning the parent
- fixing the insert routing so that we can skip tuple conversion where possible
- fleshing out the documentation

One thing I'm wondering is whether we can optimize away some of the
heavyweight locks.  For example, if somebody does SELECT * FROM ptab
WHERE id = 1, they really shouldn't need to lock the entire
partitioning hierarchy, but right now they do.  If the root knows
based on its own partitioning key that only one child is relevant, it
would be good to lock *only that child*.  For this feature to be
competitive, it needs to scale to at least a few thousand partitions,
and locking thousands of objects instead of one or two is bound to be
slow.  Similarly, you can imagine teaching COPY to lock partitions
only on demand; if no tuples are routed to a particular partition, we
don't need to lock it.  There's a manageability component here, too:
not locking partitions unnecessarily makes it easier to get DDL on
other partitions through.  Alternatively, maybe we could rewrite the
lock manager to be hierarchical, so that you can take a single lock
that represents an AccessShareLock on all partitions and only need to
make one entry in the lock table to do it.  That means that attempts
to lock individual partitions need to check not only for a lock on
that partition but also on anything further up in the hierarchy, but
that might be a good trade if it gives us O(1) locking on the parent.
And maybe we could also have a level of the hierarchy that represents
every-table-in-the-database, for the benefit of pg_dump.  Of course,
rewriting the lock manager is a big project not for the faint of
heart, but I think if we don't it's going to be a scaling bottleneck.
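
To make the current behavior concrete, here is one way to observe it (ptab is
the hypothetical partitioned table from the example above):

BEGIN;
SELECT * FROM ptab WHERE id = 1;
-- Even though only one partition can contain id = 1, every partition of
-- ptab currently shows up here with an AccessShareLock.
SELECT relation::regclass, mode
  FROM pg_locks
 WHERE locktype = 'relation' AND pid = pg_backend_pid();
COMMIT;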

We also need to consider other parts of the system that may not scale,
like pg_dump.  For a long time, we've been sorta-kinda willing to fix
the worst of the scalability problems with pg_dump, but that's really
no longer an adequate response.  People want 1000 partitions.  Heck,
people want 1,000,000 partitions, but getting to where 1000 partitions
works well would help PostgreSQL a lot.  Our oft-repeated line that
inheritance isn't designed for large numbers of inheritance children
is basically just telling people who have the use case where they need
that to go use some other product.  Partitioning, like replication, is
not an optional feature for a world-class database.  And, from a
technical point of view, I think we've now got an infrastructure that
really should be able to be scaled up considerably higher than what
we've been able to do in the past.  When we were stuck with
inheritance + constraint exclusion, we could say "well, there's not
really any point because you'll hit these other limits anyway".  But I
think now that's not really true.  This patch eliminates one of the
core scalability problems in this area, and provides infrastructure
for attacking some of the others.  I hope that people will step up and
do that.  There's a huge opportunity here for PostgreSQL to become
relevant in use cases where it currently falters badly, and we should
try to take advantage of it.  This patch is a big step by itself, but
if we ignore the potential to do more with this as the base we will be
leaving a lot of "win" on the table.

-- 
Robert Haas
EnterpriseDB: 

Re: [HACKERS] pg_dump vs. TRANSFORMs

2016-12-08 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> I have a vague feeling that the code for dumping casts and/or transforms
> >> may have some assumptions that the underlying function is also being
> >> dumped.  Although maybe the assumption was really only what's fixed here,
> >> ie that there be a DumpableObject for the function.  Anyway, take a close
> >> look for that.
> 
> > I'll look around and see, though my hunch is that, at some point, we
> > were just pulling all functions and then an optimization was added to
> > exclude pg_catalog and no one noticed that it broke casts using built-in
> > functions.
> 
> Nah, that's historical revisionism --- the exclusion for system functions
> is very ancient.  It certainly predates transforms altogether, and
> probably predates the cast-dumping code in anything like its current form.

Oh, certainly it predates transforms entirely, but I completely expect
that code to have been copied from dumpCast().

> I think the expectation was that casts using built-in functions were
> also built-in and so needn't be dumped.  There may be a comment about it
> somewhere, which would need to be revised now.

I'll look and see.

> (Actually, the most likely way in which this would break things is if
> it started causing built-in casts to get dumped ... have you checked?)

I did and it doesn't cause built-in casts to be dumped because we have
the FirstNormalObjectId check in selectDumpableCast().  We don't have
that for transforms (it just uses the generic selectDumpableObject
routine), but we also don't have any built-in transforms.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] pg_dump vs. TRANSFORMs

2016-12-08 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> I have a vague feeling that the code for dumping casts and/or transforms
>> may have some assumptions that the underlying function is also being
>> dumped.  Although maybe the assumption was really only what's fixed here,
>> ie that there be a DumpableObject for the function.  Anyway, take a close
>> look for that.

> I'll look around and see, though my hunch is that, at some point, we
> were just pulling all functions and then an optimization was added to
> exclude pg_catalog and no one noticed that it broke casts using built-in
> functions.

Nah, that's historical revisionism --- the exclusion for system functions
is very ancient.  It certainly predates transforms altogether, and
probably predates the cast-dumping code in anything like its current form.
I think the expectation was that casts using built-in functions were
also built-in and so needn't be dumped.  There may be a comment about it
somewhere, which would need to be revised now.

(Actually, the most likely way in which this would break things is if
it started causing built-in casts to get dumped ... have you checked?)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump vs. TRANSFORMs

2016-12-08 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > As pointed out by Peter E, this also impacts CASTs.  Attached is a patch
> > which addresses both by simply also pulling any functions which are
> > referenced from pg_cast or pg_transform when they have OIDs at or after
> > FirstNormalObjectId.  I also modified dumpCast() and dumpTransform() to
> > complain loudly if they're unable to dump out the cast or transform due
> > to not finding the function definition(s) necessary.
> 
> Please do not hack the already-overcomplicated query in getFuncs without
> bothering to adjust the long comment that describes what it's doing.

Right, just wanted to make sure no one had issue with this approach.

> I have a vague feeling that the code for dumping casts and/or transforms
> may have some assumptions that the underlying function is also being
> dumped.  Although maybe the assumption was really only what's fixed here,
> ie that there be a DumpableObject for the function.  Anyway, take a close
> look for that.

I'll look around and see, though my hunch is that, at some point, we
were just pulling all functions and then an optimization was added to
exclude pg_catalog and no one noticed that it broke casts using built-in
functions.

> Looks reasonable otherwise.

Ok, great, I'll get to work on building and testing all supported
versions of pg_dump vs. all server versions that I can reasonably get to
build on my current laptop, which I expect to be a matrix of 9.2-master
pg_dump against server versions ~7.4-master...

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-08 Thread Pavel Stehule
2016-12-08 14:03 GMT+01:00 Alvaro Herrera :

> Tom Lane wrote:
> > Alvaro Herrera  writes:
> > > Tom Lane wrote:
> > >> In HEAD, we could change the RTE data structure so that
> > >> transformValuesClause could save the typmod information in the RTE,
> > >> keeping the lookups cheap.
> >
> > > Hmm, I think this would be useful for the XMLTABLE patch too.  I talked
> > > a bit about it at
> > > https://www.postgresql.org/message-id/20161122204730.
> dgipy6gxi25j4e6a@alvherre.pgsql
> >
> > I dunno.  If your example there is correct that XMLTABLE can be called as
> > a plain function in a SELECT list, then I doubt that we want to tie
> > anything about it to the RTE data structure.  If anything, the case where
> > it appears in FROM seems to need to be treated as a generic RTE_FUNCTION
> > case.
>
> Well, XMLTABLE is specified by the standard to be part of ,
> which in turn is part of .  I can't immediately tell
> whether it allows XMLTABLE to be called like a regular function.  The
> current patch allows it, but maybe that's not right, and it's probably
> not that useful anyway.
>

It looks like a function, and we support it on both sides, so I implemented
both.

Probably there are only about 10 more lines related to this feature. Using this
function in the target list is not a critical feature - with LATERAL JOIN we
can live without it. It is just a small step forward for our users.

Again - the implementation of this feature is probably only a few lines.
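
For illustration, the two call styles under discussion look roughly like this
(table, document and column names are made up; the syntax follows the proposed
patch):

-- as a table function in FROM, the standard usage
SELECT xt.*
  FROM xmldata,
       XMLTABLE('/rows/row' PASSING xmldata.doc
                COLUMNS id int PATH '@id',
                        name text PATH 'name') AS xt;

-- called like a regular function in the SELECT list, which the current
-- patch also allows (whether that should stay is being debated above)
SELECT XMLTABLE('/rows/row' PASSING doc
                COLUMNS id int PATH '@id')
  FROM xmldata;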


>
> > I've been trying to avoid getting involved in the XMLTABLE patch, mainly
> > because I know zip about XML, but maybe I need to take a look.
>
> I think it'd be productive that you did so.  The XML part of it is
> reasonably well isolated, so you could give your opinion on the core
> parser / executor parts without looking at the XML part.
>

The critical part has no relation to XML; it is all about manipulating the tupledesc.

Regards

Pavel


>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Back-patch use of unnamed POSIX semaphores for Linux?

2016-12-08 Thread Peter Eisentraut
On 12/7/16 9:28 PM, Alex Hunsaker wrote:
> Hrm, the following incantation seems to break for me on a fresh Fedora
> 25 system:
> 1) As root su to $USER and start postgres.
> 2) ssh in as $USER and then logout
> 3) # psql localhost
> 
> FATAL: semctl(4980742, 3, SETVAL, 0) failed: Invalid argument
> LOG: server process (PID 14569) exited with exit code 1

Yeah, the way to trigger this is to run the postgres server not in a
"session", then log in interactively as that same user, thus creating a
session, and then logging out from that session, thus completely logging
out that user from all sessions.

(Thus, the way to trigger the KillUserProcesses behavior is quite the
opposite, because that only happens if you have the postgres server
running in a session.)
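
For anyone bitten by this in the meantime, the usual OS-level mitigation on
systemd systems is to tell logind not to remove IPC objects when a user's last
session ends, then restart systemd-logind. This is a workaround independent of
whichever fix gets back-patched:

# /etc/systemd/logind.conf
RemoveIPC=no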

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-12-08 Thread Stephen Frost
Amit,

* Amit Langote (langote_amit...@lab.ntt.co.jp) wrote:
> Hmm, I had mixed feelings about what to do about that as well.  So now, we
> have the description of various new features buried in the VI. Reference
> section of the documentation, which is meant simply as a command
> reference.  I agree that the new partitioning warrants more expansion in
> the DDL partitioning chapter.  Will see how that could be done.

Definitely.

> > * The fact that there's no implementation of row movement should be
> > documented as a limitation.  We should also look at removing that
> > limitation.
> 
> Yes, something to improve.  By the way, since we currently mention INSERT
> tuple-routing directly in the description of the partitioned tables in the
> CREATE TABLE command reference, is that also the place to list this
> particular limitation?  Or is UPDATE command reference rather the correct
> place?

Both.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Back-patch use of unnamed POSIX semaphores for Linux?

2016-12-08 Thread Peter Eisentraut
On 12/7/16 9:38 AM, Tom Lane wrote:
>> Even with that change, dynamic shared memory is still vulnerable to be
>> removed.
> Really?  I thought we concluded that it is safe because it is detectably
> attached to running processes.

The DSM implementation uses POSIX shared memory, which doesn't have an
attachment count like SysV shared memory.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PgConf.Russia 2017 Call for Papers

2016-12-08 Thread Alexander Korotkov
Hi!

We are now accepting talk proposals for PgConf.Russia 2017.

The Russian PostgreSQL conference will take place on 15-17 March 2017 in
Moscow, at the Digital October venue, same as for PgConf.Russia 2015. The
audience include a wide range of application and system developers,
database architects and administrators.

 * 15 (Wed) tutorials
 * 16 & 17 (Thu-Fri) talks - the main part of the conference

Please note that PgConf.Russia 2017 takes place in MARCH.

See https://pgconf.ru/en

The major topics for the talks are:

 * PostgreSQL scalability, performance and security.
 * PostgreSQL core development. Internals. Current and future projects.
 * Live experience of using PostgreSQL in industrial grade
systems. Integration, migration, application development.
 * Cluster. Scalable and fail-safe PostgreSQL-based systems.
 * PostgreSQL ecosystem. Related and supplementary software.
 * PostgreSQL community in Russia and abroad.

If you wish to report a practical case, or would like to give an online
demo, or have proposals expanding the named topics, please state this in
your talk proposal. We will be glad to make the conference program more
interesting.

The typical duration of a talk is 45 minutes, half-talks for 22 minutes are
also possible.

The application deadline is December 15, 2016. The preliminary program will
be published by December 20.

Talk proposal submission form is available at
https://pgconf.ru/en/2017/register_speaker

The confirmed speakers will get free admission to the conference (all
days), tickets to Moscow and accommodation booked and paid by the
organizers.

We will be glad to see you in Moscow!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-12-08 Thread Michael Paquier
On Thu, Dec 8, 2016 at 5:55 PM, Heikki Linnakangas  wrote:
> On 12/08/2016 10:18 AM, Michael Paquier wrote:
>> Hmmm. How do we handle the case where the user name does not match
>> then? The spec gives an error message e= specifically for this case.
>
> Hmm, interesting. I wonder how/when they imagine that error message to be
> used. I suppose you could send a dummy server-first message, with a made-up
> salt and iteration count, if the user is not found, so that you can report
> that in the server-final message. But that seems unnecessarily complicated,
> compared to just sending the error immediately. I could imagine using a
> dummy server-first messaage to hide whether the user exists, but that
> argument doesn't hold water if you're going to report an "unknown-user"
> error, anyway.

Directly using an error message would match what MD5 and plain do, but
that's definitely a new protocol piece, so I'd rather think that once the
client has sent its first message in the exchange, it should be answered
with an appropriate SASL error using e=...

> Actually, we don't give away that information currently. If you try to log
> in with password or MD5 authentication, and the user doesn't exist, you get
> the same error as with an incorrect password. So, I think we do need to give
> the client a made-up salt and iteration count in that case, to hide the fact
> that the user doesn't exist. Furthermore, you can't just generate random
> salt and iteration count, because then you could simply try connecting
> twice, and see if you get the same salt and iteration count. We need to
> deterministically derive the salt from the username, so that you get the
> same salt/iteration count every time you try connecting with that username.
> But it needs indistinguishable from a random salt, to the client. Perhaps a
> SHA hash of the username and some per-cluster secret value, created by
> initdb. There must be research papers out there on how to do this..

A simple idea would be to use the system ID when generating this fake
salt; that's generated by initdb, once per cluster. I am wondering if
it would be risky to use it for the salt. For the number of iterations,
the default could be used.
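
A minimal sketch of that idea, assuming OpenSSL's SHA-256 is available and
treating the exact input layout as an open question (this is not a settled
design, and the function name is made up):

#include <string.h>
#include <stdint.h>
#include <openssl/sha.h>

/*
 * Derive a deterministic "mock" salt for an unknown user so that repeated
 * connection attempts always see the same salt.  Mixing the system
 * identifier with the user name is only an assumption for illustration.
 */
static void
scram_mock_salt(uint64_t system_identifier, const char *username,
                unsigned char salt[SHA256_DIGEST_LENGTH])
{
    SHA256_CTX  ctx;

    SHA256_Init(&ctx);
    SHA256_Update(&ctx, &system_identifier, sizeof(system_identifier));
    SHA256_Update(&ctx, username, strlen(username));
    SHA256_Final(salt, &ctx);
}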

> To be really pedantic about that, we should also ward off timing attacks, by
> making sure that the dummy authentication is no faster/slower than a real
> one..

There is one catalog lookup when extracting the verifier from
pg_authid; I'd guess that if we generate a fake verifier, things should
get pretty close.

>> If this is taken into account we need to perform sanity checks at
>> initialization phase I am afraid as the number of iterations and the
>> salt are part of the verifier. So you mean that just sending out a
>> normal ERROR message is fine at an earlier step (with *logdetails
>> filled for the backend)? I just want to be sure I understand what you
>> mean here.
>
> That's right, we can send a normal ERROR message. (But not for the
> "user-not-found" case, as discussed above.)

I'd think that the cases where the password is empty or the password
has passed its valid duration should be returned with e=other-error. If
the caller sends a SCRAM request, it would be impolite (?) to just
throw an error once the exchange has begun.

> Although, currently, the whole pg_hba.conf file in that example is a valid
> file that someone might have on a real server. With the above addition, it
> would not be. You would never have the two lines with the same
> host/database/user combination in pg_hba.conf.

Okay.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typmod associated with multi-row VALUES constructs

2016-12-08 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> In HEAD, we could change the RTE data structure so that
> >> transformValuesClause could save the typmod information in the RTE,
> >> keeping the lookups cheap.
> 
> > Hmm, I think this would be useful for the XMLTABLE patch too.  I talked
> > a bit about it at
> > https://www.postgresql.org/message-id/20161122204730.dgipy6gxi25j4e6a@alvherre.pgsql
> 
> I dunno.  If your example there is correct that XMLTABLE can be called as
> a plain function in a SELECT list, then I doubt that we want to tie
> anything about it to the RTE data structure.  If anything, the case where
> it appears in FROM seems to need to be treated as a generic RTE_FUNCTION
> case.

Well, XMLTABLE is specified by the standard to be part of ,
which in turn is part of .  I can't immediately tell
whether it allows XMLTABLE to be called like a regular function.  The
current patch allows it, but maybe that's not right, and it's probably
not that useful anyway.

> I've been trying to avoid getting involved in the XMLTABLE patch, mainly
> because I know zip about XML, but maybe I need to take a look.

I think it'd be productive that you did so.  The XML part of it is
reasonably well isolated, so you could give your opinion on the core
parser / executor parts without looking at the XML part.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] varlena beyond 1GB and matrix

2016-12-08 Thread Kohei KaiGai
2016-12-08 16:11 GMT+09:00 Craig Ringer :
> On 8 December 2016 at 12:01, Kohei KaiGai  wrote:
>
>>> At a higher level, I don't understand exactly where such giant
>>> ExpandedObjects would come from.  (As you point out, there's certainly
>>> no easy way for a client to ship over the data for one.)  So this feels
>>> like a very small part of a useful solution, if indeed it's part of a
>>> useful solution at all, which is not obvious.
>>>
>> I expect an aggregate function that consumes millions of rows as source
>> of a large matrix larger than 1GB. Once it is formed to a variable, it is
>> easy to deliver as an argument of PL functions.
>
> You might be interested in how Java has historically dealt with similar 
> issues.
>
> For a long time the JVM had quite low limits on the maximum amount of
> RAM it could manage, in the single gigabytes for a long time. Even for
> the 64-bit JVM. Once those limitations were lifted, the garbage
> collector algorithm placed a low practical limit on how much RAM it
> could cope with effectively.
>
> If you were doing scientific computing with Java, lots of big
> image/video work, using GPGPUs, doing large scale caching, etc, this
> rapidly became a major pain point. So people introduced external
> memory mappings to Java, where objects could reference and manage
> memory outside the main JVM heap. The most well known is probably
> BigMemory (https://www.terracotta.org/products/bigmemory), but there
> are many others. They exposed this via small opaque handle objects
> that you used to interact with the external memory store via library
> functions.
>
> It might make a lot of sense to apply the same principle to
> PostgreSQL, since it's much less intrusive than true 64-bit VARLENA.
> Rather than extending all of PostgreSQL to handle special-case
> split-up VARLENA extended objects, have your interim representation be
> a simple opaque value that points to externally mapped memory. Your
> operators for the type, etc, know how to work with it. You probably
> don't need a full suite of normal operators, you'll be interacting
> with the data in a limited set of ways.
>
> The main issue would presumably be one of resource management, since
> we currently assume we can just copy a Datum around without telling
> anybody about it or doing any special management. You'd need to know
> when to clobber your external segment, when to copy(!) it if
> necessary, etc. This probably makes sense for working with GPGPUs
> anyway, since they like dealing with big contiguous chunks of memory
> (or used to, may have improved?).
>
> It sounds like only code specifically intended to work with the
> oversized type should be doing much with it except passing it around
> as an opaque handle, right?
>
Thanks for your proposal. Its characteristics look like a large object,
but volatile (because there is no backing storage).
As you mentioned, I also think its resource management is the core issue.
We have no reference count mechanism, so it is uncertain when we should
release an orphaned memory chunk; otherwise we have to accept the memory
consumption until the relevant memory context is cleaned up.
Moreover, we have to pay attention to the scenario where this opaque identifier
is delivered to a background worker; it then needs to be valid in another
process's context. (Or should we prohibit exchanging it at the planner stage?)
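
To make the discussion a bit more concrete, a rough sketch of the kind of
handle involved; the field set and the reference-counting question are exactly
the open issues mentioned above, so none of these names reflect a settled
design:

#include <stddef.h>
#include <stdint.h>

/*
 * Sketch only: an opaque handle a Datum could carry instead of a >1GB
 * varlena.  Names are made up for illustration.
 */
typedef struct ExternalChunkHandle
{
    uint64_t    chunk_id;   /* id used to re-attach from another process */
    size_t      length;     /* total size of the external allocation */
    void       *base;       /* mapping address in the current process, or NULL */
    int32_t     refcount;   /* open question: who releases the orphaned chunk? */
} ExternalChunkHandle;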

> Do you need to serialize this type to/from disk at all? Or just
> exchange it in chunks with a client? If you do need to, can you
> possibly do TOAST-like or pg_largeobject-like storage where you split
> it up for on disk storage then reassemble for use?
>
Even though I don't expect this big chunk to be entirely exported to/imported
from the client at once, it makes sense to save/load it to/from
disk, because construction of a big memory structure consumes far more CPU
cycles than a simple memory copy from buffers.
So, in other words, it does not need valid typinput/typoutput handlers, but
serialization/deserialization for disk I/O is helpful.

Thanks,
-- 
KaiGai Kohei 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Push down more full joins in postgres_fdw

2016-12-08 Thread Etsuro Fujita

On 2016/12/07 20:23, Etsuro Fujita wrote:

On 2016/12/07 15:39, Ashutosh Bapat wrote:



On 2016/11/22 18:28, Ashutosh Bapat wrote:


I guess the reason why you are doing it this way is that the SELECT clause for
the outermost query gets deparsed before the FROM clause. For the latter we
call deparseRangeTblRef(), which builds the tlist. So, while deparsing the
SELECT clause, we do not have a tlist to build from. In that case, I guess, we
have to build the tlist in get_subselect_alias_id() if it's not available and
stick it in fpinfo. Subsequent calls to get_subselect_alias_id() should find
the tlist there. Then in deparseRangeTblRef() assert that there's a tlist in
fpinfo and use it to build the SELECT clause of the subquery. That way, we
don't build the tlist unless it's needed and also use the same tlist for all
searches. Please use tlist_member() to search into the tlist.



I wrote:

This would probably work, but seems to me a bit complicated. Instead, I'd like
to propose that we build the tlist for each relation being deparsed as a
subquery in a given join tree, right before deparsing the SELECT clause in
deparseSelectStmtForRel, if is_subquery is false and lower_subquery_rels isn't
NULL, and store the tlist into the relation's fpinfo.  That would allow us to
build the tlist only when we need it, and to use tlist_member for the exact
comparison.  I think it would be much easier to implement that.



IIRC, this is in line with my original proposal of creating tlists
before deparsing anything. Along with that I also proposed bubbling
this array of tlists up the join hierarchy to avoid recursion ([1] point
#15, further elaborated in [2]).

[1]
https://www.postgresql.org/message-id/ad449b25-66ee-4c06-568c-0eb2e1bef9f9%40lab.ntt.co.jp

[2]
https://www.postgresql.org/message-id/CAFjFpRcn7%3DDNOXm-PJ_jVDqAmghKVf6tApQC%2B_RgMZ8tOPExcA%40mail.gmail.com



My proposal here would be a bit different from what you proposed; right
before deparseSelectSql in deparseSelectStmtForRel, build the tlists for
relations present in the given jointree that will be deparsed as
subqueries, by (1) traversing the given jointree and (2) applying
build_tlist_to_deparse to each relation to be deparsed as a subquery and
saving the result in that relation's fpinfo.  I think that would be
implemented easily, and the overhead would be small.


I implemented that to address your concern:
* deparseRangeTblRef uses the tlist created as above, to build the 
SELECT clause of the subquery.  (You said "Then in deparseRangeTblRef() 
assert that there's a tlist in fpinfo", but the tlist may be empty, so I 
didn't add any assertion to that function.)

* get_relation_column_alias_ids uses tlist_member with the tlist.

I also addressed the comments #1, #2 and #3 in [1].  For #1, I added 
"LIMIT 10" to the query.  Attached is the new version of the patch.


Other changes:
* As discussed before, renamed grouped_tlist in fpinfo to "tlist" and 
used it to store the tlist created as above.
* Also, renamed SS_REL_ALIAS_PREFIX to SUBQUERY_REL_ALIAS_PREFIX 
(Likewise for SS_COL_ALIAS_PREFIX).

* Relocated some functions.
* Revised comments a little bit.

Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/CAFjFpRfU4-gxqZ8ahoKM1ZtDJEThe3K8Lb_6beRKa8mmP%3Dv%3DfA%40mail.gmail.com
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 109,114  typedef struct deparse_expr_cxt
--- 109,116 
  /* Handy macro to add relation name qualification */
  #define ADD_REL_QUALIFIER(buf, varno)	\
  		appendStringInfo((buf), "%s%d.", REL_ALIAS_PREFIX, (varno))
+ #define SUBQUERY_REL_ALIAS_PREFIX	"s"
+ #define SUBQUERY_COL_ALIAS_PREFIX	"c"
  
  /*
   * Functions to determine whether an expression can be evaluated safely on
***
*** 159,165  static void printRemoteParam(int paramindex, Oid paramtype, int32 paramtypmod,
   deparse_expr_cxt *context);
  static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
  	   deparse_expr_cxt *context);
! static void deparseSelectSql(List *tlist, List **retrieved_attrs,
   deparse_expr_cxt *context);
  static void deparseLockingClause(deparse_expr_cxt *context);
  static void appendOrderByClause(List *pathkeys, deparse_expr_cxt *context);
--- 161,167 
   deparse_expr_cxt *context);
  static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
  	   deparse_expr_cxt *context);
! static void deparseSelectSql(List *tlist, bool is_subquery, List **retrieved_attrs,
   deparse_expr_cxt *context);
  static void deparseLockingClause(deparse_expr_cxt *context);
  static void appendOrderByClause(List *pathkeys, deparse_expr_cxt *context);
***
*** 167,172  static void appendConditions(List *exprs, deparse_expr_cxt *context);
--- 169,177 
  static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
  	RelOptInfo *joinrel, bool use_alias, List **params_list);
  static void deparseFromExpr(List *quals, deparse_expr_cxt 

Re: [HACKERS] Push down more full joins in postgres_fdw

2016-12-08 Thread Etsuro Fujita

On 2016/12/07 21:11, Ashutosh Bapat wrote:

On Wed, Dec 7, 2016 at 4:51 PM, Etsuro Fujita
 wrote:

On 2016/12/05 20:20, Ashutosh Bapat wrote:



3. Why is foreignrel variable changed to rel?
-extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root,
-RelOptInfo *foreignrel, List *tlist,
-List *remote_conds, List *pathkeys,
-List **retrieved_attrs, List **params_list);
+extern void deparseSelectStmtForRel(StringInfo buf, PlannerInfo
*root, RelOptInfo *rel,
+List *tlist, List *remote_conds, List *pathkeys,
+bool is_subquery, List **retrieved_attrs,
+List **params_list);



I changed the name that way to match with the function definition in
deparse.c.



That's something good to do but it's unrelated to this patch. I have
repeated this line several times in this thread :)


OK.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Major service downtime expected

2016-12-08 Thread Magnus Hagander
On Thu, Dec 8, 2016 at 9:49 AM, Magnus Hagander  wrote:

> Hi!
>
> (Copying this one to -hackers as well, for git coverage)
>
> We were just notified by one of our hosting providers, MeetMe, that they
> are about to start physically moving two of our servers to a different
> datacenter. Unfortunately, we were only given about an hour's warning on
> this, so we have been unable to migrate important services elsewhere before
> the downtime.
>
> The following services may be affected, and we have no idea how long the
> downtime will be (we've been given a maximum window of 6 hours, but we hope
> it's going to be around an hour):
>
> * Main website backend systems (anything dynamic on the website, cached
> content will continue to serve)
> * Website search
> * List archives (except cached pages). (Mailinglists themselves should
> continue to work)
> * git.postgresql.org (gitmaster not affected)
> * irc docbot
> * postgres.ca website and associated services
>
>
> We apologize for this inconvenience, and will work with the techs from
> MeetMe to get the systems back up as quickly as possible.
>
>
This issue should now be resolved and services back to normal. There will
be a round of security patching on the other servers later today as well,
but that should only be minutes worth of downtime at the worst.



-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-12-08 Thread Masahiko Sawada
On Thu, Dec 8, 2016 at 4:39 PM, Michael Paquier
 wrote:
> On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas  wrote:
>> You could do that, but first I would code up the simplest, cleanest
>> algorithm you can think of and see if it even shows up in a 'perf'
>> profile.  Microbenchmarking is probably overkill here unless a problem
>> is visible on macrobenchmarks.
>
> This is what I would go for! The current code is doing a simple thing:
> select the Nth element using qsort() after scanning each WAL sender's
> values. And I think that Sawada-san got it right. Even running on my
> laptop a pgbench run with 10 sync standbys using a data set that fits
> into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead
> using perf top on a non-assert, non-debug build. Hash tables and
> allocations get a far larger share. Using the patch,
> SyncRepGetSyncRecPtr is at the same level with a quorum set of 10
> nodes. Let's kick the ball for now. An extra patch could make things
> better later on if that's worth it.

Yeah, since both K and N are unlikely to be large, these algorithms take
almost the same time, and the current patch does the simple thing. If we
ever need over 100 or 1000 replication nodes, the optimization could be
required.
Attached latest v9 patch.
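
For quick reference, the two syntaxes described in the documentation changes
below look like this in postgresql.conf (standby names made up):

# priority-based: wait for the two highest-priority standbys of the three listed
synchronous_standby_names = 'FIRST 2 (s1, s2, s3)'

# quorum-based: wait for any two of the three listed standbys
synchronous_standby_names = 'ANY 2 (s1, s2, s3)'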

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0fc4e57..bc67a99 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3054,42 +3054,75 @@ include_dir 'conf.d'
 transactions waiting for commit will be allowed to proceed after
 these standby servers confirm receipt of their data.
 The synchronous standbys will be those whose names appear
-earlier in this list, and
+in this list, and
 that are both currently connected and streaming data in real-time
 (as shown by a state of streaming in the
 
-pg_stat_replication view).
-Other standby servers appearing later in this list represent potential
-synchronous standbys. If any of the current synchronous
-standbys disconnects for whatever reason,
-it will be replaced immediately with the next-highest-priority standby.
-Specifying more than one standby name can allow very high availability.
+pg_stat_replication view). If the keyword
+FIRST is specified, other standby servers appearing
+later in this list represent potential synchronous standbys.
+If any of the current synchronous standbys disconnects for
+whatever reason, it will be replaced immediately with the
+next-highest-priority standby. Specifying more than one standby
+name can allow very high availability.


 This parameter specifies a list of standby servers using
 either of the following syntaxes:
 
-num_sync ( standby_name [, ...] )
+[ANY] num_sync ( standby_name [, ...] )
+FIRST num_sync ( standby_name [, ...] )
 standby_name [, ...]
 
 where num_sync is
 the number of synchronous standbys that transactions need to
 wait for replies from,
 and standby_name
-is the name of a standby server. For example, a setting of
-3 (s1, s2, s3, s4) makes transaction commits wait
-until their WAL records are received by three higher-priority standbys
-chosen from standby servers s1, s2,
-s3 and s4.
+is the name of a standby server.
+FIRST and ANY specify the method used by
+the master to control the standby servers.
 
 
-The second syntax was used before PostgreSQL
+The keyword FIRST, coupled with num_sync, makes
+transaction commits wait until WAL records are received by the
+num_sync highest-priority standbys.
+For example, a setting of FIRST 3 (s1, s2, s3, s4)
+makes transaction commits wait until their WAL records are received
+by three higher-priority standbys chosen from standby servers
+s1, s2, s3 and s4.
+
+
+The keyword ANY, coupled with num_sync,
+makes transaction commits wait until WAL records are received
+from at least num_sync connected standbys among those
+defined in the list of synchronous_standby_names. For
+example, a setting of ANY 3 (s1, s2, s3, s4) makes
+transaction commits wait until WAL records are received from at
+least three of the four listed standbys s1,
+s2, s3 and s4.
+
+
+FIRST and ANY are case-insensitive keywords,
+and a standby whose name matches one of them must be double-quoted.
+
+
+The third syntax was used before PostgreSQL
 version 9.6 and is still supported. It's the same as the first syntax
-with num_sync equal to 1.
- 
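
For readers following the thread, the two forms from the patch would be
written in postgresql.conf like this (standby names are illustrative only):

synchronous_standby_names = 'FIRST 2 (s1, s2, s3)'  # priority-based: the two highest-priority standbys
synchronous_standby_names = 'ANY 2 (s1, s2, s3)'    # quorum-based: any two of the listed standbys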

Re: [HACKERS] Declarative partitioning - another take

2016-12-08 Thread Amit Langote
On 2016/12/08 3:20, Robert Haas wrote:
> On Wed, Dec 7, 2016 at 11:53 AM, Erik Rijkers  wrote:
>>> My bad.  The fix I sent last night for one of the cache flush issues
>>> wasn't quite right.  The attached seems to fix it.
>> Yes, fixed here too.  Thanks.
> 
> Thanks for the report - that was a good catch.
> 
> I've committed 0001 - 0006 with that correction and a few other
> adjustments.  There's plenty of room for improvement here, and almost
> certainly some straight-up bugs too, but I think we're at a point
> where it will be easier and less error-prone to commit follow on
> changes incrementally rather than by continuously re-reviewing a very
> large patch set for increasingly smaller changes.

Attached is a patch to fix some stale comments in the code and a minor
correction to one of the examples on the CREATE TABLE page.

Thanks,
Amit
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 8bf8af302b..bb6cffabf7 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1498,7 +1498,7 @@ CREATE TABLE cities (
 
 CREATE TABLE measurement_y2016m07
 PARTITION OF measurement (
-unitsales WITH OPTIONS DEFAULT 0
+unitsales DEFAULT 0
 ) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
 
 
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 6dab45f0ed..441b31c46e 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1492,7 +1492,7 @@ generate_partition_qual(Relation rel, bool recurse)
  *			Construct values[] and isnull[] arrays for the partition key
  *			of a tuple.
  *
- *	pkinfo			partition key execution info
+ *	pd				Partition dispatch object of the partitioned table
  *	slot			Heap tuple from which to extract partition key
  *	estate			executor state for evaluating any partition key
  *	expressions (must be non-NULL)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 270be0af18..2d2d383941 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1403,10 +1403,7 @@ BeginCopy(ParseState *pstate,
 	 errmsg("table \"%s\" does not have OIDs",
			RelationGetRelationName(cstate->rel))));
 
-		/*
-		 * Initialize state for CopyFrom tuple routing.  Watch out for
-		 * any foreign partitions.
-		 */
+		/* Initialize state for CopyFrom tuple routing. */
 		if (is_from && rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		{
 			PartitionDispatch *pd;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c77b216d4f..917795af9c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1614,8 +1614,8 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 
 	/*
 	 * In case of a partition, there are no new column definitions, only
-	 * dummy ColumnDefs created for column constraints.  We merge these
-	 * constraints inherited from the parent.
+	 * dummy ColumnDefs created for column constraints.  We merge them
+	 * with the constraints inherited from the parent.
 	 */
 	if (is_partition)
 	{
@@ -2030,8 +2030,8 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 			newcollid;
 
 /*
- * Partitions have only one parent, so conflict should never
- * occur
+ * Partitions have only one parent and have no column
+ * definitions of their own, so conflict should never occur.
  */
 Assert(!is_partition);
 
@@ -2118,8 +2118,8 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 
 	/*
 	 * Now that we have the column definition list for a partition, we can
-	 * check whether the columns referenced in column option specifications
-	 * actually exist.  Also, we merge the options into the corresponding
+	 * check whether the columns referenced in the column constraint specs
+	 * actually exist.  Also, we merge the constraints into the corresponding
 	 * column definitions.
 	 */
 	if (is_partition && list_length(saved_schema) > 0)
diff --git a/src/include/catalog/pg_partitioned_table.h b/src/include/catalog/pg_partitioned_table.h
index cec54ae62e..be8727b556 100644
--- a/src/include/catalog/pg_partitioned_table.h
+++ b/src/include/catalog/pg_partitioned_table.h
@@ -47,7 +47,7 @@ CATALOG(pg_partitioned_table,3350) BKI_WITHOUT_OIDS
 #ifdef CATALOG_VARLEN
 	oidvector		partclass;		/* operator class to compare keys */
 	oidvector		partcollation;	/* user-specified collation for keys */
-	pg_node_tree	partexprs;		/* list of expressions in the partitioning
+	pg_node_tree	partexprs;		/* list of expressions in the partition
 	 * key; one item for each zero entry in
 	 * partattrs[] */
 #endif
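
For reference, with the doc fix above the corrected example reads like this
as a self-contained snippet (the parent table definition is made up here for
illustration):

CREATE TABLE measurement (
    logdate    date NOT NULL,
    unitsales  int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2016m07
    PARTITION OF measurement (
        unitsales DEFAULT 0
    ) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');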



Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-12-08 Thread Heikki Linnakangas

On 12/08/2016 10:18 AM, Michael Paquier wrote:

> On Thu, Dec 8, 2016 at 5:54 AM, Heikki Linnakangas  wrote:
>> Attached those here, as add-on patches to your latest patch set.
>
> Thanks for looking at it!
>
>> I'll continue reviewing, but a couple of things caught my eye that you may want
>> to jump on, in the meanwhile:
>>
>> On error messages, the spec says:
>>
>>> o  e: This attribute specifies an error that occurred during
>>>   authentication exchange.  It is sent by the server in its final
>>>   message and can help diagnose the reason for the authentication
>>>   exchange failure.  On failed authentication, the entire server-
>>>   final-message is OPTIONAL; specifically, a server implementation
>>>   MAY conclude the SASL exchange with a failure without sending the
>>>   server-final-message.  This results in an application-level error
>>>   response without an extra round-trip.  If the server-final-message
>>>   is sent on authentication failure, then the "e" attribute MUST be
>>>   included.
>>
>> Note that it says that the server can send the error message with the e=
>> attribute, in the *final message*. It's not a valid response in the earlier
>> state, before sending server-first-message. I think we need to change the
>> INIT state handling in pg_be_scram_exchange() to not send e= messages to the
>> client. On an error at that state, it needs to just bail out without a
>> message. The spec allows that. We can always log the detailed reason in the
>> server log, anyway.
>
> Hmmm. How do we handle the case where the user name does not match
> then? The spec gives an error message e= specifically for this case.


Hmm, interesting. I wonder how/when they imagine that error message to 
be used. I suppose you could send a dummy server-first message, with a 
made-up salt and iteration count, if the user is not found, so that you 
can report that in the server-final message. But that seems 
unnecessarily complicated, compared to just sending the error 
immediately. I could imagine using a dummy server-first message to hide 
whether the user exists, but that argument doesn't hold water if you're 
going to report an "unknown-user" error, anyway.
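
For reference, the exchange looks roughly like this (values abbreviated; per 
the spec, e= is only valid in that last server message):

C: n,,n=someuser,r=<client-nonce>
S: r=<client-nonce><server-nonce>,s=<salt>,i=4096
C: c=biws,r=<client-nonce><server-nonce>,p=<proof>
S: v=<server-signature>     (success)
S: e=invalid-proof          (failure, server-final-message)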


Actually, we don't give away that information currently. If you try to 
log in with password or MD5 authentication, and the user doesn't exist, 
you get the same error as with an incorrect password. So, I think we do 
need to give the client a made-up salt and iteration count in that case, 
to hide the fact that the user doesn't exist. Furthermore, you can't 
just generate random salt and iteration count, because then you could 
simply try connecting twice, and see if you get the same salt and 
iteration count. We need to deterministically derive the salt from the 
username, so that you get the same salt/iteration count every time you 
try connecting with that username. But it needs to be indistinguishable from a 
random salt, to the client. Perhaps a SHA hash of the username and some 
per-cluster secret value, created by initdb. There must be research 
papers out there on how to do this..
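
Just to make the idea concrete, a minimal sketch of such a deterministic mock 
salt, assuming an HMAC over a per-cluster secret (the function name, constant, 
and use of OpenSSL's HMAC here are all made up for illustration, not anything 
in the patch):

#include <string.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>

#define MOCK_SALT_LEN 16

/*
 * Derive a stable, random-looking salt for a user name that has no
 * SCRAM verifier, so repeated connection attempts see the same salt
 * and cannot tell a missing user from an existing one.
 */
static void
mock_scram_salt(const unsigned char *cluster_secret, size_t secret_len,
                const char *username, unsigned char salt[MOCK_SALT_LEN])
{
    unsigned char digest[EVP_MAX_MD_SIZE];
    unsigned int  digest_len = 0;

    /* HMAC(cluster_secret, username) is deterministic per user name,
     * but indistinguishable from random without the secret. */
    HMAC(EVP_sha256(), cluster_secret, (int) secret_len,
         (const unsigned char *) username, strlen(username),
         digest, &digest_len);

    memcpy(salt, digest, MOCK_SALT_LEN);
}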


To be really pedantic about that, we should also ward off timing 
attacks, by making sure that the dummy authentication is no 
faster/slower than a real one..



> If this is taken into account we need to perform sanity checks at
> initialization phase I am afraid as the number of iterations and the
> salt are part of the verifier. So you mean that just sending out a
> normal ERROR message is fine at an earlier step (with *logdetails
> filled for the backend)? I just want to be sure I understand what you
> mean here.


That's right, we can send a normal ERROR message. (But not for the 
"user-not-found" case, as discussed above.)



>> As Peter E pointed out earlier, the documentation is lacking, on how to
>> configure MD5 and/or SCRAM. If you put "scram" as the authentication method
>> in pg_hba.conf, what does it mean? If you have a line for both "scram" and
>> "md5" in pg_hba.conf, with the same database/user/hostname combo, what does
>> that mean? Answer: The first one takes effect, the second one has no effect.
>> Yet the example in the docs now has that, which is nonsense :-). Hopefully
>> we'll have some kind of a "both" option, before the release, but in the
>> meanwhile, we need describe how this works now in the docs.
>
> OK, it would be better to add a paragraph in client-auth.sgml
> regarding the mapping of the two settings. For the example of file in
> postgresql.conf, I would have really thought that adding directly a
> line with "scram" listed was enough though. Perhaps a comment to say
> that if md5 and scram are specified the first one wins where a user
> and database name map?


So, I think this makes no sense:


>  # Allow any user from host 192.168.12.10 to connect to database
> -# "postgres" if the user's password is correctly supplied.
> +# "postgres" if the user's password is correctly supplied and is
> +# using the correct password method.
>  #
>  # TYPE  DATABASE        USER            ADDRESS                 METHOD
>  host    postgres 
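
To illustrate the "first match wins" point with a hypothetical pg_hba.conf
fragment (not from the patch):

# TYPE  DATABASE  USER  ADDRESS            METHOD
host    postgres  all   192.168.12.10/32   scram   # this entry is used
host    postgres  all   192.168.12.10/32   md5     # never reached for this combo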

[HACKERS] Effect of caching hash bucket size while costing

2016-12-08 Thread Srinivas Karthik V
Dear PostgreSQL Hackers,

I am working on the optimizer module of PostgreSQL 9.4.*. In costsize.c, in
final_cost_hashjoin(), the innerbucketsize is either:

a) taken from a cached copy,
   OR
b) calculated afresh from statistics by the following code snippet:

thisbucketsize = estimate_hash_bucketsize(root,
                                          get_leftop(restrictinfo->clause),
                                          virtualbuckets);
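
(For context, the cached path looks roughly like this; a simplified sketch
rather than an exact quote of costsize.c, and left_bucketsize is the
RestrictInfo field used for caching as far as I can tell:)

/* Use the estimate cached on the RestrictInfo if we have one. */
thisbucketsize = restrictinfo->left_bucketsize;
if (thisbucketsize < 0)
{
    /* Not cached yet: estimate once and remember it, so later cost
     * evaluations of the same clause reuse the value instead of
     * re-running estimate_hash_bucketsize(). */
    thisbucketsize = estimate_hash_bucketsize(root,
                                              get_leftop(restrictinfo->clause),
                                              virtualbuckets);
    restrictinfo->left_bucketsize = thisbucketsize;
}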

For the query I used, if I disable the caching of the innerbucketsize
calculation, I get a different plan with a cost change of around 1000
units.

1) Can you please let me know whether innerbucketsize * innerpathrows captures
the maximum bucket size?
2) Why is it not calculated afresh all the time?

For reference, below is the query I am using:

explain  select i_item_id,  avg(cs_quantity) , avg(cs_list_price) ,
avg(cs_coupon_amt) ,  avg(cs_sales_price)  from catalog_sales,
customer_demographics, date_dim, item, promotion where cs_sold_date_sk =
d_date_sk and cs_item_sk = i_item_sk and cs_bill_cdemo_sk = cd_demo_sk
and   cs_promo_sk = p_promo_sk and cd_gender = 'F' and  cd_marital_status =
'U' and  cd_education_status = 'Unknown' and  (p_channel_email = 'N' or
p_channel_event = 'N') and  d_year = 2002 and i_current_price <= 100 group
by i_item_id order by i_item_id

and the hashclause which was tried was (item.i_item_sk =
catalog_sales.cs_item_sk).

Thanks,
Srinivas Karthik


[HACKERS] Major service downtime expected

2016-12-08 Thread Magnus Hagander
Hi!

(Copying this one to -hackers as well, for git coverage)

We were just notified by one of our hosting providers, MeetMe, that they
are about to start physically moving two of our servers to a different
datacenter. Unfortunately, we were only given about an hour's warning on
this, so we have been unable to migrate important services elsewhere before
the downtime.

The following services may be affected, and we have no idea how long the
downtime will be (we've been given a maximum window of 6 hours, but we hope
it's going to be around an hour):

* Main website backend systems (anything dynamic on the website, cached
content will continue to be served)
* Website search
* List archives (except cached pages). (Mailing lists themselves should
continue to work)
* git.postgresql.org (gitmaster not affected)
* irc docbot
* postgres.ca website and associated services


We apologize for this inconvenience, and will work with the techs from
MeetMe to get the systems back up as quickly as possible.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0

2016-12-08 Thread Michael Paquier
On Wed, Dec 07, 2016 at 03:16:09PM +0530, Ashutosh Sharma wrote:
> Problem Analysis:
> -
> Allthough i am very new to Windows, i tried debugging the issue and
> could find that Backend is not receiving the query executed after
> "SELECT pldbg_attach_to_port(2)" and is infinitely waiting on
> "WaitEventSetWaitBlock()" at WaitForMultipleObjects() to read the
> input command. Below is the backtrace for the same.

This code was indeed in need of review.  It seems that you are giving
enough information to reproduce the problem. I'll try to dig into
it...
-- 
Michael




Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-12-08 Thread Michael Paquier
On Thu, Dec 8, 2016 at 5:54 AM, Heikki Linnakangas  wrote:
> Attached those here, as add-on patches to your latest patch set.

Thanks for looking at it!

> I'll continue reviewing, but a couple of things caught my eye that you may want
> to jump on, in the meanwhile:
>
> On error messages, the spec says:
>
>> o  e: This attribute specifies an error that occurred during
>>   authentication exchange.  It is sent by the server in its final
>>   message and can help diagnose the reason for the authentication
>>   exchange failure.  On failed authentication, the entire server-
>>   final-message is OPTIONAL; specifically, a server implementation
>>   MAY conclude the SASL exchange with a failure without sending the
>>   server-final-message.  This results in an application-level error
>>   response without an extra round-trip.  If the server-final-message
>>   is sent on authentication failure, then the "e" attribute MUST be
>>   included.
>
>
> Note that it says that the server can send the error message with the e=
> attribute, in the *final message*. It's not a valid response in the earlier
> state, before sending server-first-message. I think we need to change the
> INIT state handling in pg_be_scram_exchange() to not send e= messages to the
> client. On an error at that state, it needs to just bail out without a
> message. The spec allows that. We can always log the detailed reason in the
> server log, anyway.

Hmmm. How do we handle the case where the user name does not match
then? The spec gives an error message e= specifically for this case.
If this is taken into account we need to perform sanity checks at
initialization phase I am afraid as the number of iterations and the
salt are part of the verifier. So you mean that just sending out a
normal ERROR message is fine at an earlier step (with *logdetails
filled for the backend)? I just want to be sure I understand what you
mean here.

> As Peter E pointed out earlier, the documentation is lacking, on how to
> configure MD5 and/or SCRAM. If you put "scram" as the authentication method
> in pg_hba.conf, what does it mean? If you have a line for both "scram" and
> "md5" in pg_hba.conf, with the same database/user/hostname combo, what does
> that mean? Answer: The first one takes effect, the second one has no effect.
> Yet the example in the docs now has that, which is nonsense :-). Hopefully
> we'll have some kind of a "both" option, before the release, but in the
> meanwhile, we need describe how this works now in the docs.

OK, it would be better to add a paragraph in client-auth.sgml
regarding the mapping of the two settings. For the example of file in
postgresql.conf, I would have really thought that adding directly a
line with "scram" listed was enough though. Perhaps a comment to say
that if md5 and scram are specified the first one wins where a user
and database name map?
-- 
Michael

